tinysearch

A semantic search engine using BERT embeddings. This project was done as part of CSCE 636 (Neural Networks).

"Please note that support for this repo has been closed."

Existing search engines use keyword matching or tf-idf based matching to map the query to web documents and rank them. They also consider other factors, such as PageRank, hubs-and-authorities scores, and knowledge graphs, to make the results more meaningful. However, existing search engines fail to capture the meaning of a query when it becomes large and complex. BERT, introduced by Google in 2018, provides embeddings for words as well as sentences. In this project, I have developed a semantics-oriented search engine using neural networks and BERT embeddings that can search for a query and rank the documents from most meaningful to least meaningful. The results show an improvement over one existing search engine on complex queries for a given set of documents.
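To make the ranking idea concrete, here is a minimal sketch of ordering documents by how close their embeddings are to the query embedding. It is an illustration under stated assumptions, not the code in this repo: cosine similarity stands in for the trained neural network scorer, the function names `cosine_similarity` and `rank_documents` are hypothetical, and the toy 3-dimensional vectors stand in for the 768-dimensional vectors a BERT-base model produces.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings would come from the BERT server.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
ranking = rank_documents([1.0, 0.1, 0.0], docs)
print([doc_id for doc_id, _ in ranking])  # ['doc_a', 'doc_b', 'doc_c']
```

A learned scorer can replace `cosine_similarity` without changing `rank_documents`, since the ranking step only needs one relevance score per document.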

Install Dependencies:

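pip install bert-serving-server
pip install tensorflow
pip install keras
tkinter (bundled with most Python installations and not installable through pip; on Debian/Ubuntu it is provided by the python3-tk package)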
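Before starting the server, it can help to confirm that the dependencies are importable. The following is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumed to be the modules these packages provide (for example, `bert_serving.server` for bert-serving-server).

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a spec is malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names for the dependencies listed above.
REQUIRED = ["bert_serving.server", "tensorflow", "tkinter", "keras"]

for name in missing_modules(REQUIRED):
    print("missing:", name)
```

The check only locates the modules; it does not verify that the installed versions are compatible with each other.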

How to run:

Download the two folders (uncased and model) from the zipped file on Drive.
Run the bert-serving server as follows: bert-serving-start -model_dir=uncased_L-12_H-768_A-12/ -tuned_model_dir=model/ -ckpt_name=model.ckpt-78 -pooling_strategy=CLS_TOKEN -max_seq_len=125 -num_worker=4
To train the model, run python generate_embeddings.py and then python train.py. generate_embeddings.py fetches the embeddings of the Quora question pairs and saves them; train.py loads the saved embeddings and trains the neural network model.
To run the GUI, run: python gui_v4.py
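Once the server from the steps above is running, sentence embeddings can be requested from Python. The sketch below is an assumption about how a script might do this, not the contents of generate_embeddings.py: it relies on the separate bert-serving-client package (not listed in the dependencies above), assumes the server runs on localhost with default ports, and uses the hypothetical helper names `connect` and `embed_sentences`.

```python
def connect():
    """Open a connection to a bert-serving server on localhost."""
    # Imported lazily so the rest of the sketch works without the package.
    from bert_serving.client import BertClient
    return BertClient()


def embed_sentences(client, sentences):
    """Return one embedding per non-blank sentence.

    `client` is any object with an encode(list_of_str) method,
    such as the BertClient returned by connect().
    """
    # Drop blank lines before sending the batch to the server.
    cleaned = [s.strip() for s in sentences if s.strip()]
    if not cleaned:
        return []
    return list(client.encode(cleaned))
```

With the server up, `embed_sentences(connect(), ["how do I learn python?"])` would return a list holding one vector for that sentence. Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object when no server is available.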

Folders and files:

README.md
documents_list
generate_embeddings.py
gui_v4.py
query_and_relevance
quest_label
second_model.h5
train.py

manishpatel005/tinysearch
