Build a simple search engine based on TF-IDF
The data is crawled from the top-100 song chart on nhaccuatui.com
The code editor used for this project is Visual Studio Code
The crawling methods are implemented in crawl.py
The string normalization method is implemented in textprocessing.py
The methods for constructing and indexing the data are implemented in indexing.py
The methods for building the GUI, querying, and ranking results are implemented in main.py
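As a rough illustration of how normalization, indexing, and TF-IDF ranking fit together, here is a minimal sketch using only the Python standard library. It is not the code from textprocessing.py, indexing.py, or main.py: the function names normalize, build_index, and search, the sample song titles, and the regex tokenizer are all assumptions made for this example. The project installs underthesea, presumably for Vietnamese word segmentation, which the sketch does not use.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a plain regex stands in for real Vietnamese word segmentation.
    NFC normalization keeps accented letters as single characters.
    """
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def build_index(docs):
    """Build an inverted index and an IDF table from {doc_id: text}.

    inverted_index[term][doc_id] holds the raw count of term in that document.
    idf[term] is log(N / df), where df is the number of documents with the term.
    """
    inverted_index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(normalize(text)).items():
            inverted_index[term][doc_id] = count
    n_docs = len(docs)
    idf = {
        term: math.log(n_docs / len(postings))
        for term, postings in inverted_index.items()
    }
    return inverted_index, idf


def search(query, inverted_index, idf, top_k=5):
    """Score documents by summing tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in normalize(query):
        for doc_id, tf in inverted_index.get(term, {}).items():
            scores[doc_id] += tf * idf[term]
    # Highest score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical sample data: three song titles keyed by document id
docs = {
    1: "Em của ngày hôm qua",
    2: "Nơi này có anh",
    3: "Em gì ơi",
}
inverted_index, idf = build_index(docs)
results = search("em hôm qua", inverted_index, idf)
for doc_id, score in results:
    print(doc_id, round(score, 3), docs[doc_id])
```

With this sample data, document 1 ranks first because it matches all three query terms, two of which ("hôm", "qua") are rare and therefore carry a higher IDF weight; document 3 matches only the more common term "em", and document 2 is not returned at all. A real implementation would likely also normalize scores by document length, which this sketch omits.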
pip install requests
pip install beautifulsoup4
pip install selenium
pip install underthesea
pip install PyQt5
Step 1: Crawl the data (skip this step if you are using the existing crawled data)
python crawl.py
Step 2: Build the data and inverted_index (skip this step if you are using the existing crawled data)
python indexing.py
Step 3: Start the GUI
python main.py