
Tweet Search Engine


The dataset contained 10 million tweets about the coronavirus (COVID-19). The primary goal was, given a query, to retrieve the most relevant tweets from the corpus as quickly as possible. To improve runtime, I used multithreading throughout the different preprocessing stages.
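A minimal sketch of how a preprocessing step can be spread over a thread pool, assuming the tweets are already loaded as strings. `parse_tweet` and its regular expression are hypothetical stand-ins for the project's parser rules. Note that in CPython, threads speed up mainly I/O-bound work such as reading files; CPU-bound parsing is limited by the GIL, so a `ProcessPoolExecutor` may be faster for that part.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical tokenization rule: lowercase words, numbers, hashtags, mentions.
TOKEN_RE = re.compile(r"[a-z0-9#@]+")

def parse_tweet(text: str) -> List[str]:
    """Hypothetical parser: lowercase the tweet and split it into tokens."""
    return TOKEN_RE.findall(text.lower())

def parse_all(tweets: List[str], workers: int = 4) -> List[List[str]]:
    """Parse tweets concurrently; pool.map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_tweet, tweets))

if __name__ == "__main__":
    tweets = ["Stay home, stay safe #COVID19", "New corona cases reported today"]
    print(parse_all(tweets))
```

Because `pool.map` returns results in input order, each parsed token list still lines up with its original tweet.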


  • reader: reads the tweets from the dataset
  • parser_module: parses the tweets according to the rules defined in the class
  • stemmer: stems the tweets using Porter's stemming algorithm
  • indexer: creates the posting files and stores them on disk
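To illustrate what the indexer produces, here is a hedged sketch of an inverted index whose posting lists map each term to the tweets containing it, with a term frequency per tweet. It assumes the tweets were already parsed and stemmed (for example with a Porter stemmer such as nltk's `PorterStemmer`). The function names and the JSON on-disk format are assumptions for illustration, not the project's actual posting-file layout.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import Dict, List

Postings = Dict[str, Dict[str, int]]  # term -> {tweet_id: term frequency}

def build_index(docs: Dict[str, List[str]]) -> Postings:
    """Build an inverted index from already parsed and stemmed tweets."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for tweet_id, tokens in docs.items():
        for token in tokens:
            # Count how many times the term occurs in this tweet.
            index[token][tweet_id] = index[token].get(tweet_id, 0) + 1
    return dict(index)

def save_postings(index: Postings, path: str) -> None:
    """Write the posting lists to disk (JSON is an assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)

def load_postings(path: str) -> Postings:
    """Read the posting lists back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    docs = {"t1": ["corona", "vaccine", "corona"], "t2": ["vaccine", "trial"]}
    index = build_index(docs)
    path = os.path.join(tempfile.mkdtemp(), "postings.json")
    save_postings(index, path)
    print(load_postings(path)["vaccine"])  # {'t1': 1, 't2': 1}
```

For 10 million tweets, a real indexer would typically write postings in chunks and merge them rather than keeping one dictionary in memory.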


  • ranker: ranks the tweets; I implemented multiple ranking models supported by GloVe, Word2Vec, WordNet, SpellChecker and Thesaurus
  • searcher: returns the relevant tweets
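The ranking models are not described in detail here. As one hedged illustration of how word embeddings such as GloVe or Word2Vec can drive ranking, the sketch below scores each tweet by the cosine similarity between the average of its word vectors and the average of the query's word vectors. The tiny 3-dimensional `emb` dictionary is made-up toy data standing in for pretrained vectors, and `rank` is a hypothetical name, not the project's API.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

def mean_vector(tokens: List[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Average the embeddings of known tokens; None if no token is known."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query: List[str], tweets: Dict[str, List[str]],
         emb: Dict[str, Vector]) -> List[str]:
    """Return tweet ids sorted by similarity to the query, best first."""
    q = mean_vector(query, emb)
    if q is None:
        return []
    scores = {}
    for tweet_id, tokens in tweets.items():
        d = mean_vector(tokens, emb)
        if d is not None:
            scores[tweet_id] = cosine(q, d)
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # Toy vectors (hypothetical); real ones would be loaded from GloVe/Word2Vec.
    emb = {"covid": [1.0, 0.0, 0.0], "virus": [0.9, 0.1, 0.0],
           "vaccine": [0.7, 0.7, 0.0], "football": [0.0, 0.0, 1.0]}
    tweets = {"t1": ["virus", "spreading"], "t2": ["football", "match"],
              "t3": ["vaccine", "news"]}
    print(rank(["covid"], tweets, emb))  # ['t1', 't3', 't2']
```

A searcher could then return the top-k ids from this list; tweets whose words have no known vector are simply left out of the ranking in this sketch.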

💡 Prerequisite

Python 3.7

🛠️ Installation

With GitHub

git clone
cd SearchEngine/
python3 install