
Explore info retrieval! Index compression, inverted index, ranking models (TF-IDF, BM25, language model), LSI, K-means, hierarchical cluster. Uncover insights efficiently!


ishijaiswal18/Information-Retrieval


WebCraze: Unleash the Power of Information Retrieval!

Welcome to the Info Retrieval Wonderland! This project is an exhilarating journey through the realms of web data, index compression, and data analysis. Whether you're a coding wizard or just getting started, there's something magical for everyone!

Section 1: Web Adventures

Part 1: Index Construction

Explore the web with Python magic! Crawl the top 20 pages for queries like "Forests of India" and build an inverted index. Witness the dance of Boolean queries and unveil the power of Tiger AND Safari!
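To make the idea concrete, here is a minimal sketch of an inverted index with a Boolean AND query. The three documents are hypothetical stand-ins for crawled page text, and the function names (`tokenize`, `build_inverted_index`, `boolean_and`) are illustrative, not taken from the repository.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted list of the doc IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, *terms):
    """Return the sorted doc IDs that contain every query term."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical stand-ins for crawled page text (assumption, not real crawl data).
docs = {
    1: "Tiger safari in the forests of India",
    2: "A safari guide to African wildlife",
    3: "The Bengal tiger lives in Indian forests",
}
index = build_inverted_index(docs)
print(boolean_and(index, "Tiger", "Safari"))  # only doc 1 has both terms
```

Storing postings as sorted lists keeps the index ready for the merge-based intersection explored in the next part.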

Part 2: Merge Posting-Lists

Master the art of intersecting postings with Boolean queries. Get ready for a wildlife adventure with queries like Wildlife AND Poaching. The most restrictive intersections (the shortest posting lists) go first - because efficiency is key!
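One way to sketch "most restrictive first" is a two-pointer merge over sorted posting lists, applied to the lists in order of increasing length. The posting lists below are made-up examples, and the function names are illustrative.

```python
def intersect(p1, p2):
    """Intersect two sorted posting lists with a linear two-pointer merge."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def intersect_many(posting_lists):
    """AND several posting lists, starting with the shortest ones."""
    if not posting_lists:
        return []
    ordered = sorted(posting_lists, key=len)
    result = list(ordered[0])
    for postings in ordered[1:]:
        if not result:  # nothing left to match, stop early
            break
        result = intersect(result, postings)
    return result


# Made-up posting lists for illustration.
wildlife = [1, 2, 4, 7, 9, 12]
poaching = [2, 9]
print(intersect_many([wildlife, poaching]))
```

Starting from the shortest list keeps every intermediate result no longer than that list, so later merges have less to scan.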

Part 3: Adding Skip-Pointers

Re-index with skip-pointers and witness the speed boost! Run each query 100 times and compare the time taken with and without skip-pointers. Skip into the future of efficient searching!
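As a rough sketch of the comparison, the code below intersects two sorted lists with and without skip pointers and times 100 runs of each. It assumes skips are placed every `int(sqrt(n))` positions (a common heuristic, not necessarily the repository's choice) and models them implicitly by index arithmetic rather than stored pointers. The posting lists are synthetic.

```python
import math
import timeit


def intersect_plain(p1, p2):
    """Baseline two-pointer intersection of sorted posting lists."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def intersect_with_skips(p1, p2):
    """Intersection that jumps ahead when a skip target does not overshoot."""
    s1 = max(1, int(math.sqrt(len(p1))))  # skip length for p1 (assumed heuristic)
    s2 = max(1, int(math.sqrt(len(p2))))  # skip length for p2
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Position i carries a skip pointer only if it is a multiple of s1.
            if i % s1 == 0 and i + s1 < len(p1) and p1[i + s1] <= p2[j]:
                i += s1
            else:
                i += 1
        else:
            if j % s2 == 0 and j + s2 < len(p2) and p2[j + s2] <= p1[i]:
                j += s2
            else:
                j += 1
    return result


# Synthetic posting lists: one long, one short.
long_list = list(range(0, 20_000, 2))
short_list = [10, 5_000, 17_776, 19_998]
for fn in (intersect_plain, intersect_with_skips):
    elapsed = timeit.timeit(lambda: fn(long_list, short_list), number=100)
    print(f"{fn.__name__}: {elapsed:.4f}s for 100 runs")
```

Timings depend on the machine and on list shapes; skips tend to pay off when one list is much shorter than the other.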

Part 4: Spelling Correction

Embark on a spelling adventure! Create a 3-gram index and correct queries like Tiger AND Saphari. Explore how our correction techniques impact the quality of retrieved documents.
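A 3-gram index can drive correction roughly like this: collect vocabulary terms that share 3-grams with the misspelled word, then pick the one with the highest Jaccard overlap. The `$` boundary padding, the Jaccard ranking, and the tiny vocabulary are assumptions made for this sketch; the repository may rank candidates differently (for example, by edit distance).

```python
from collections import defaultdict


def kgrams(term, k=3):
    """Return the set of k-grams of a term padded with '$' boundary markers."""
    padded = f"${term}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def build_kgram_index(vocabulary, k=3):
    """Map each k-gram to the set of vocabulary terms containing it."""
    index = defaultdict(set)
    for term in vocabulary:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return index


def correct(term, kgram_index, k=3):
    """Return the candidate with the highest Jaccard overlap, or the term itself."""
    query_grams = kgrams(term, k)
    candidates = set()
    for gram in query_grams:
        candidates |= kgram_index.get(gram, set())

    def jaccard(candidate):
        grams = kgrams(candidate, k)
        return len(query_grams & grams) / len(query_grams | grams)

    # Ties are broken alphabetically so the result is deterministic.
    return max(candidates, key=lambda c: (jaccard(c), c), default=term)


# Hypothetical vocabulary taken from an index's dictionary.
vocabulary = ["tiger", "safari", "sapphire", "forest"]
kgram_index = build_kgram_index(vocabulary)
print(correct("saphari", kgram_index))
```

The corrected term can then be fed back into the Boolean query, so Tiger AND Saphari is answered as Tiger AND Safari.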

Part 5: Scoring

Level up! Extend the system to perform TF-IDF scoring. Witness the magic of matching document IDs sorted by their score for each query. Because scoring adds a touch of enchantment!
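Here is one possible TF-IDF ranking sketch. It assumes log-scaled term frequency, `1 + log10(tf)`, multiplied by `log10(N / df)`; other weighting variants are equally valid, and the tokenized documents are invented for illustration.

```python
import math
from collections import Counter


def tfidf_rank(query_terms, docs):
    """Rank docs (doc_id -> token list) by summed TF-IDF of the query terms."""
    n_docs = len(docs)
    tf = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # Document frequency: number of docs in which each term appears.
    df = Counter(term for counts in tf.values() for term in counts)
    scores = {}
    for doc_id, counts in tf.items():
        score = 0.0
        for term in query_terms:
            if counts[term]:
                weight = 1 + math.log10(counts[term])
                score += weight * math.log10(n_docs / df[term])
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by doc ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented, already-tokenized documents.
docs = {
    1: ["tiger", "tiger", "safari"],
    2: ["safari", "guide"],
    3: ["forest", "india"],
}
for doc_id, score in tfidf_rank(["tiger", "safari"], docs):
    print(doc_id, round(score, 3))
```

Note that a term appearing in every document gets an IDF of zero under this weighting, so it contributes nothing to the score.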

Section 2: Index Compression Magic

Part 1: Medical Abstracts Compression

Dive into the world of medical abstracts. Create an inverted index and perform TF-IDF scoring. Unravel the secrets of 20 queries and see the space taken by the dictionary!

Part 2: Compression Variations

Apply dictionary string compression with and without blocking. Witness the evolution of dictionary sizes and query resolution times. Because compression is an art!
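The sketch below shows one way dictionary-as-a-string compression with blocking might look: sorted terms are concatenated into a single string, each prefixed by a length character, and a pointer is kept only for the first term of each block. Setting `block_size=1` gives the variant without blocking. The layout, function names, and sample terms are assumptions for illustration, not the repository's actual format.

```python
def build_blocked_dictionary(terms, block_size=4):
    """Concatenate sorted terms into one string with one pointer per block."""
    parts, pointers, offset = [], [], 0
    for i, term in enumerate(sorted(terms)):
        if i % block_size == 0:
            pointers.append(offset)  # pointer to the first term of the block
        entry = chr(len(term)) + term  # one character encodes the term length
        parts.append(entry)
        offset += len(entry)
    return "".join(parts), pointers


def first_term(dict_string, pointer):
    """Decode the term stored at the given offset."""
    length = ord(dict_string[pointer])
    return dict_string[pointer + 1:pointer + 1 + length]


def read_block(dict_string, pointers, block_index):
    """Decode every term in one block by walking the length prefixes."""
    pos = pointers[block_index]
    if block_index + 1 < len(pointers):
        end = pointers[block_index + 1]
    else:
        end = len(dict_string)
    terms = []
    while pos < end:
        length = ord(dict_string[pos])
        terms.append(dict_string[pos + 1:pos + 1 + length])
        pos += 1 + length
    return terms


def lookup(term, dict_string, pointers):
    """Binary-search the block leaders, then scan linearly inside the block."""
    lo, hi, block = 0, len(pointers) - 1, -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if first_term(dict_string, pointers[mid]) <= term:
            block = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return block >= 0 and term in read_block(dict_string, pointers, block)


# Sample terms, invented for illustration.
terms = ["abstract", "acute", "blood", "cancer", "cell",
         "dose", "drug", "heart", "liver"]
dict_string, pointers = build_blocked_dictionary(terms, block_size=4)
print(len(pointers), lookup("cell", dict_string, pointers))
```

Larger blocks mean fewer pointers to store but a longer linear scan per lookup, which is the size versus query-time trade-off this part measures.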

Section 3: Data Analysis Odyssey

Part 1: Dataset Evaluation

Analyze the dataset using Python magic! Evaluate inter-annotator agreement, build inverted indices, and compare Elasticsearch performance. It's a data analysis odyssey!
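The README does not say which agreement measure is used. As one common option, the sketch below computes Cohen's kappa for two annotators; the choice of kappa and the sample relevance judgments are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented relevance judgments (1 = relevant, 0 = not relevant).
annotator_a = [1, 1, 0, 0]
annotator_b = [1, 0, 0, 0]
print(cohens_kappa(annotator_a, annotator_b))
```

Kappa corrects raw agreement for chance: 1 means perfect agreement and 0 means no better than chance.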

Part 2: Pseudo-Relevance Feedback and Query Expansion

Explore pseudo relevance feedback and query expansion. Find the alpha maximizing MAP and witness the impact on IR engine performance. Because relevance is the name of the game!
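The role of alpha is not spelled out here. One plausible reading, sketched below, treats alpha as an interpolation weight between the original query vector and the centroid of the top-ranked (pseudo-relevant) documents, in the spirit of Rocchio feedback. This interpretation, the `n_terms` cutoff, and the sample vectors are all assumptions.

```python
from collections import Counter


def expand_query(query_vec, top_doc_vecs, alpha, n_terms=5):
    """Blend the query with the centroid of the pseudo-relevant documents."""
    centroid = Counter()
    for vec in top_doc_vecs:
        for term, weight in vec.items():
            centroid[term] += weight / len(top_doc_vecs)
    # Keep only the strongest feedback terms to limit query drift.
    feedback = dict(centroid.most_common(n_terms))
    expanded = {}
    for term in set(query_vec) | set(feedback):
        original = query_vec.get(term, 0.0)
        expanded[term] = alpha * original + (1 - alpha) * feedback.get(term, 0.0)
    return expanded


# Invented term-weight vectors for a query and its top two retrieved docs.
query_vec = {"tiger": 1.0}
top_doc_vecs = [
    {"tiger": 1.0, "safari": 1.0},
    {"tiger": 1.0, "reserve": 1.0},
]
print(expand_query(query_vec, top_doc_vecs, alpha=0.5))
```

Under this reading, finding the best alpha amounts to sweeping values such as 0.0, 0.1, ..., 1.0, re-running retrieval with each expanded query, and keeping the value with the highest MAP.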

Part 3: Document Ranking Showdown

Witness the battle of document ranking models - TF-IDF, BM25, Language Model, and LSI. Precision, recall, MAP - the metrics showdown begins!
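For a taste of one contender, here is a small Okapi BM25 scorer. It assumes the defaults `k1=1.5` and `b=0.75` and the non-negative IDF form `ln(1 + (N - df + 0.5) / (df + 0.5))`; the repository may use different parameters or a library implementation. The documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score docs (doc_id -> token list) against the query with Okapi BM25."""
    if not docs:
        return {}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    tf = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    scores = {}
    for doc_id, tokens in docs.items():
        score = 0.0
        for term in query_terms:
            freq = tf[doc_id][term]
            if freq == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


# Invented, already-tokenized documents.
docs = {
    1: ["tiger", "tiger", "safari"],
    2: ["safari", "guide"],
    3: ["forest", "india"],
}
scores = bm25_scores(["tiger", "safari"], docs)
print(sorted(scores, key=scores.get, reverse=True))
```

Unlike plain TF-IDF, BM25 saturates term frequency and normalizes by document length, which is often why the two models rank the same collection differently.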

Part 4: Clustering Magic

Apply K-Means and hierarchical clustering on chosen documents. Discover the secrets of RSS plots and compare purity/NMI values. Because clustering adds an extra layer of magic!
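Purity and NMI can both be computed from the cluster assignments and the gold class labels. The sketch below assumes NMI is normalized by the mean of the two entropies, which is one of several conventions; the labels are invented.

```python
import math
from collections import Counter


def purity(clusters, classes):
    """Fraction of items that belong to the majority class of their cluster."""
    best = Counter()
    for (cluster, _), count in Counter(zip(clusters, classes)).items():
        best[cluster] = max(best[cluster], count)
    return sum(best.values()) / len(clusters)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(clusters, classes):
    """Mutual information normalized by the mean of the two entropies."""
    n = len(clusters)
    cluster_counts, class_counts = Counter(clusters), Counter(classes)
    mutual_info = 0.0
    for (k, c), joint in Counter(zip(clusters, classes)).items():
        ratio = n * joint / (cluster_counts[k] * class_counts[c])
        mutual_info += (joint / n) * math.log(ratio)
    denom = (entropy(clusters) + entropy(classes)) / 2
    # Convention chosen here: a single cluster and a single class count as 1.0.
    return mutual_info / denom if denom > 0 else 1.0


# Invented cluster assignments and gold classes for six documents.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
print(round(purity(clusters, classes), 3), round(nmi(clusters, classes), 3))
```

Purity rises trivially as the number of clusters grows, so NMI is the fairer measure when comparing K-Means runs with different K against hierarchical clustering.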

Section 4: Querying Beyond Boundaries

Explore the universe of video game sales with Elasticsearch. Pose questions, extract insights, and unveil the gaming legends. Because every game has a story!

Getting Started

  1. Clone the repository.
  2. Navigate to the respective sections you want to explore.
  3. Follow the instructions in each section's README to unleash the magic.

Explore the wonders of information retrieval and data analysis! Each section is a new adventure, so grab your keyboard and embark on a journey through the code. Happy coding!
