Yet Another Google (YAG) [Algorithms for Information Retrieval]

Search Engine Implementation in Python

This project was implemented as part of a course.

The problem statement:

Build a search engine for the Environmental News NLP archive.
Build a corpus for the archive with at least 418 documents.

Our search engine supports the following query types:

Simple Boolean Query (e.g., good deed, which is interpreted as "good AND deed")
Phrase Query (e.g., prince charles)
Wildcard Query (e.g., nat*, *til, nat*nal)
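
To make the three query types concrete, here is a minimal sketch of how each could be evaluated against a positional inverted index. It is an illustration under assumptions, not the project's code: the toy corpus `DOCS` and the function names are hypothetical, and the wildcard match is a linear scan with the standard-library `fnmatchcase` rather than the trie-based lookup the project builds with pygtrie.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical toy corpus (doc id -> text); not taken from the project dataset.
DOCS = {
    0: "prince charles praised a good deed",
    1: "a good national policy is a deed of the nation",
    2: "charles met the prince until national talks ended",
}


def build_positional_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def boolean_and(index, query):
    """Documents that contain every term of the query (implicit AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


def phrase(index, query):
    """Documents in which the query terms occur consecutively, in order."""
    terms = query.lower().split()
    hits = set()
    for doc_id in boolean_and(index, query):
        starts = index[terms[0]][doc_id]
        # A start position matches if term i is found at offset start + i.
        if any(all(start + i in index[term][doc_id]
                   for i, term in enumerate(terms))
               for start in starts):
            hits.add(doc_id)
    return hits


def wildcard(index, pattern):
    """Documents containing any vocabulary term that matches a * pattern."""
    hits = set()
    for term in index:
        if fnmatchcase(term, pattern.lower()):
            hits.update(index[term])
    return hits


INDEX = build_positional_index(DOCS)
print(boolean_and(INDEX, "good deed"))
print(phrase(INDEX, "prince charles"))
print(wildcard(INDEX, "nat*nal"))
```

Note that `fnmatchcase` also treats `?` and `[...]` as special characters, so this sketch accepts a slightly larger pattern language than the `*` wildcards listed above.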

Features include:

Corpus and Query Preprocessing
Inverted Index
Parallelized Index Construction
Ranked Results (top-K document retrieval)
Searching on a single index (e.g., republicans and democrats | CNN.201710.csv)
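
As an illustration of the inverted index and ranked top-K retrieval listed above, the sketch below scores documents with a log-weighted TF-IDF and picks the best K with a heap. The scoring formula, the toy corpus `RANK_DOCS`, and the function names are assumptions made for this example; the README does not state which ranking scheme the project's ranking.py uses.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (doc id -> text); not taken from the project dataset.
RANK_DOCS = {
    "a": "republicans and democrats debate climate",
    "b": "democrats propose climate climate bill",
    "c": "weather report",
}


def build_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def top_k(index, n_docs, query, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term not in the vocabulary, contributes nothing
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            # Assumed weighting: (1 + ln tf) * idf, summed over query terms.
            scores[doc_id] += (1 + math.log(tf)) * idf
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


RANK_INDEX = build_index(RANK_DOCS)
for doc_id, score in top_k(RANK_INDEX, len(RANK_DOCS), "climate bill", 2):
    print(doc_id, round(score, 3))
```

Using `heapq.nlargest` avoids sorting every scored document when only the top K are needed, which matters more as the corpus grows.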

Getting Started

The following steps will help you set up and run the project.

Prerequisites

Install the external libraries listed in requirements.txt:

python -m pip install -r requirements.txt

Executing Code

Windows

python main.py

Linux

python3 main.py

Built With

NLTK - For natural language processing and corpus preprocessing
pandas - For reading and interpreting the CSV files in the dataset
bidict - For the bidirectional dictionary
pygtrie - For index construction

Authors

Archana Prakash - GitHub
Hritvik Patel - GitHub
Shreyas BS - GitHub
Sriram SK - GitHub

License

This project is licensed under the MIT License. See the LICENSE file for details.
