This project is a prototype of a text document retrieval system that uses TF-IDF (Term Frequency-Inverse Document Frequency) weighting and cosine similarity. The documents are news articles about the activities of the Data Science Study Program at Universitas Pembangunan Nasional "Veteran" Jawa Timur, available at https://sada.upnjatim.ac.id/category/berita/. The system searches these articles and displays the ones most relevant to the query a user enters.
Additionally, the project includes a graphical user interface (GUI) built with Tkinter and CustomTkinter. Through the GUI, users enter a search query and receive a list of the documents most relevant to it.
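The retrieval idea can be illustrated with a short, self-contained sketch. It is not the code in main.py: the function names (tokenize, tf_idf_vectors, query_vector, cosine, search) are hypothetical, tokenization is assumed to be plain lowercase whitespace splitting (no stemming or stopword removal), and the weighting assumes raw term frequency multiplied by idf = log10(N / df), which is one common TF-IDF variant among several.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split only; the real project may
    # apply Indonesian stopword removal and stemming instead.
    return text.lower().split()


def tf_idf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Build one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # idf(t) = log10(N / df(t)); a term found in every document gets weight 0.
    idf = {term: math.log10(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def query_vector(query: str, idf: dict[str, float]) -> dict[str, float]:
    # Query terms that never occur in the corpus have no IDF and are ignored.
    counts = Counter(tokenize(query))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    vectors, idf = tf_idf_vectors(docs)
    q = query_vector(query, idf)
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample documents, for illustration only.
    sample_docs = [
        "data science study program holds a workshop",
        "students win a national data competition",
        "campus library extends opening hours",
    ]
    for index, score in search("data science workshop", sample_docs):
        print(index, round(score, 3))
```

Because each vector is normalized by its length inside cosine(), a long article does not outrank a short one merely by repeating terms; only the direction of the weighted term vectors matters.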
Before running this project, make sure you have the following prerequisites:
- Python 3.9 or above.
Follow these steps to install the project:
- Clone this repository to your local directory:
  $ git clone https://github.com/harishartanto/information-retrieval.git
- Navigate to the project directory:
  $ cd information-retrieval
- Install the required dependencies:
  $ pip install -r requirements.txt
To run this project, follow these steps:
- Make sure you are in the project directory:
  $ cd information-retrieval
- Run the main.py file:
  $ python main.py
  This will open the graphical user interface (GUI) of the text document retrieval system.
- Enter your search query in the provided text box and click the "Search" button.
- The search results will be displayed as a list of documents, sorted by their relevance to the query.
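How the final list might be ordered can be sketched as follows, under stated assumptions: the helper name rank_results, the choice to hide documents with a score of 0, the top_k cutoff, and the tie-break by title are all hypothetical and not described by this project. The sketch takes already computed similarity scores keyed by document title.

```python
def rank_results(scores: dict[str, float], top_k: int = 10) -> list[tuple[str, float]]:
    """Hypothetical helper: order documents by relevance for display."""
    # Assumption: documents with zero similarity share no weighted term
    # with the query, so they are left out of the result list.
    matches = [(title, score) for title, score in scores.items() if score > 0]
    # Highest score first; equal scores fall back to alphabetical title order
    # so the displayed list is deterministic.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return matches[:top_k]


if __name__ == "__main__":
    # Made-up titles and scores, for illustration only.
    example_scores = {"Workshop A": 0.64, "Competition B": 0.05, "Library C": 0.0}
    for title, score in rank_results(example_scores, top_k=5):
        print(f"{score:.2f}  {title}")
```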
A scientific paper related to this project has been published in the Seminar Nasional Teknologi dan Sistem Informasi (SITASI) 2022. You can access the paper through the following link:
This project is licensed under the MIT License. Please read the LICENSE file for more details.