Welcome to the Semantic Search with GloVe Optimization project! This repository contains an enhanced semantic search algorithm using GloVe word embeddings for efficient navigation of WordNet.
- Introduction
- Objective
- Key Features
- Usage
- Optimization Strategies
- Evaluation
- Folder Structure
- Dependencies
- Contributing
- License
The project is inspired by the "Twenty Questions" game, where the goal is to guess a target concept within a limited number of logical queries expressed in Conjunctive Normal Form (CNF). The algorithm leverages GloVe word embeddings to streamline the search process in WordNet.
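To make the CNF idea concrete, here is a minimal sketch of how such a query could be represented and checked. It assumes a query is a list of clauses (joined by AND), each clause a list of literals (joined by OR), and that a literal names a relation, a concept, and whether it is negated. The type names, the `evaluate` helper, and the toy facts are illustrative assumptions, not the data model used in wn_search.py.

```python
from typing import FrozenSet, List, Tuple

# Illustrative types only; the repository may model queries differently.
Fact = Tuple[str, str]            # (relation, concept), e.g. ("hypernym", "animal")
Literal = Tuple[str, str, bool]   # (relation, concept, is_positive)
Clause = List[Literal]            # literals joined by OR
CNF = List[Clause]                # clauses joined by AND


def holds(literal: Literal, facts: FrozenSet[Fact]) -> bool:
    """A positive literal holds if the fact is present, a negated one if it is absent."""
    relation, concept, is_positive = literal
    return ((relation, concept) in facts) == is_positive


def evaluate(query: CNF, facts: FrozenSet[Fact]) -> bool:
    """A CNF query is true when every clause has at least one literal that holds."""
    return all(any(holds(lit, facts) for lit in clause) for clause in query)


# Toy facts about a hidden target (hand-written, not read from WordNet).
dog_facts = frozenset({
    ("hypernym", "animal"),
    ("hypernym", "canine"),
    ("part_meronym", "tail"),
})

# (is an animal OR is a plant) AND (does NOT have a wing as a part)
query = [
    [("hypernym", "animal", True), ("hypernym", "plant", True)],
    [("part_meronym", "wing", False)],
]

print(evaluate(query, dog_facts))  # True
```

Each yes/no answer to a query like this splits the remaining candidates into those consistent with the answer and those that are not.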
The primary objective is to optimize the search algorithm, replacing an inefficient depth-first approach with a strategy guided by word embeddings. The target is to identify the correct synset quickly and accurately, in at most 100 steps on average.
- Integration of GloVe word embeddings for semantic analysis.
- Logical queries in CNF for exploring hypernyms, hyponyms, and part-meronyms.
- Binary elimination of possibilities to expedite the search.
- Modular and extensible code structure.
To use the semantic search algorithm, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/semantic-search-glove-optimization.git
- Navigate to the project directory:
cd semantic-search-glove-optimization
- Run the algorithm:
python wn_search.py
- Follow on-screen instructions and observe the search efficiency.
The algorithm employs the following optimization strategies:
- Utilization of GloVe embeddings to calculate centroids for remaining possibilities.
- Binary elimination of potential answers based on logical queries.
- Exploration of hypernyms, hyponyms, and part-meronyms in WordNet.
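One way the centroid and binary-elimination strategies could fit together is sketched below. It is a simplified illustration under stated assumptions, not the implementation in wn_search.py: the embeddings are tiny hand-written 2-D vectors rather than real GloVe vectors, the `ask` callback is a hypothetical stand-in for a logical query, and candidates are split by cosine similarity to the centroid instead of by WordNet relations.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(embeddings: Dict[str, Vector],
           ask: Callable[[Set[str]], bool]) -> Tuple[str, int]:
    """Halve the candidate set each step until one candidate remains.

    `ask(group)` is a hypothetical oracle call that answers whether the
    hidden target is in `group`.
    """
    remaining = sorted(embeddings)
    steps = 0
    while len(remaining) > 1:
        center = centroid([embeddings[w] for w in remaining])
        # Rank candidates from most to least similar to the centroid.
        ranked = sorted(remaining,
                        key=lambda w: cosine(embeddings[w], center),
                        reverse=True)
        half = len(ranked) // 2
        near, far = ranked[:half], ranked[half:]
        steps += 1
        remaining = near if ask(set(near)) else far
    return remaining[0], steps


# Toy 2-D "embeddings" (made up for illustration, not GloVe values).
toy = {
    "dog": [0.9, 0.1],
    "cat": [0.8, 0.2],
    "oak": [0.1, 0.9],
    "rose": [0.2, 0.8],
}

target = "dog"
found, steps = search(toy, lambda group: target in group)
print(found, steps)  # dog 2
```

Because every answer discards about half of the remaining candidates, the number of steps in this sketch grows with the base-2 logarithm of the candidate count.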
The success of the optimization is evaluated based on the algorithm's ability to find the target synset in WordNet within an average of 100 steps. The Oracle class in the wn_eval module is used for evaluation.
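The averaging behind this criterion can be sketched as follows. The `CountingOracle` class and the `halving_search` function are hypothetical stand-ins written for this example; they do not reflect the actual interface of the Oracle class in wn_eval, which this README does not document.

```python
from statistics import mean
from typing import Callable, Iterable, Set


class CountingOracle:
    """Hypothetical stand-in that counts queries; NOT the wn_eval.Oracle API."""

    def __init__(self, target: str) -> None:
        self._target = target
        self.steps = 0

    def ask(self, group: Set[str]) -> bool:
        """Answer whether the hidden target is in `group`, counting the query."""
        self.steps += 1
        return self._target in group


def average_steps(targets: Iterable[str],
                  run_search: Callable[[CountingOracle], str]) -> float:
    """Run one search per target and return the mean number of queries used."""
    counts = []
    for target in targets:
        oracle = CountingOracle(target)
        guess = run_search(oracle)
        if guess != target:
            raise ValueError(f"search returned {guess!r}, expected {target!r}")
        counts.append(oracle.steps)
    return mean(counts)


WORDS = ["cat", "dog", "oak", "rose"]  # toy candidate set


def halving_search(oracle: CountingOracle) -> str:
    """Placeholder search: repeatedly ask about the first half of the list."""
    remaining = list(WORDS)
    while len(remaining) > 1:
        half = len(remaining) // 2
        first = remaining[:half]
        remaining = first if oracle.ask(set(first)) else remaining[half:]
    return remaining[0]


print(average_steps(WORDS, halving_search))  # 2
```

A run would meet the criterion described above when the returned average is no greater than 100.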
- data/: Contains GloVe word embedding files.
- wn_eval.py: Module for evaluation using the Oracle class.
- wn_search.py: Main search algorithm implementation.
- README.md: Project documentation.
Ensure you have the following dependencies installed:
- nltk
- gensim
Install dependencies using: pip install -r requirements.txt
Feel free to contribute to the project by opening issues or submitting pull requests. Your feedback and enhancements are highly appreciated!
This project is licensed under the MIT License.
Happy searching!