## TF-IDF Retriever

In this notebook, we build a TF-IDF retriever over a small dataset. We start by importing the TFIDFRetriever class.

In [1]:
import sys
import os

sys.path.append(os.path.abspath(".."))
from src.TF_IDFRetriever import TFIDFRetriever

In [2]:
# Create TF-IDF retriever
retriever = TFIDFRetriever()

In [3]:
# Load the BNS sections from Markdown files
def load_md_files(base_folder):
    """Read every .md file one level below base_folder into a list of documents."""
    documents = []

    # Iterate through all folders in the base directory
    for folder in os.listdir(base_folder):
        folder_path = os.path.join(base_folder, folder)

        # Check if it's a directory
        if os.path.isdir(folder_path):
            # Iterate through all .md files in the folder
            for filename in os.listdir(folder_path):
                if filename.endswith(".md"):
                    file_path = os.path.join(folder_path, filename)

                    # Open the file and read its contents
                    with open(file_path, "r", encoding="utf-8") as file:
                        file_contents = file.read()

                    # Store each document as a dict whose "_id" is "folder/filename"
                    documents.append(
                        {"_id": f"{folder}/{filename}", "text": file_contents}
                    )

    return documents

In [4]:
bns_data = load_md_files("ilab_sdg/")
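
The same one-level directory walk can also be written with `pathlib`. The following is only a sketch of an alternative, not the loader used in this notebook: the name `load_md_files_pathlib` and the throwaway demo directory are hypothetical, and unlike the loader above it sorts the paths so the document order is deterministic.

```python
import tempfile
from pathlib import Path


def load_md_files_pathlib(base_folder):
    """Return [{"_id": "folder/file.md", "text": ...}] for .md files one level down."""
    docs = []
    # "*/*.md" matches Markdown files exactly one directory below base_folder.
    for path in sorted(Path(base_folder).glob("*/*.md")):
        docs.append(
            {
                "_id": f"{path.parent.name}/{path.name}",
                "text": path.read_text(encoding="utf-8"),
            }
        )
    return docs


# Demo on a temporary directory tree (hypothetical names, not the BNS data).
with tempfile.TemporaryDirectory() as tmp:
    chapter = Path(tmp) / "Chapter_X"
    chapter.mkdir()
    (chapter / "Section_1.md").write_text("Sample section text", encoding="utf-8")
    (chapter / "notes.txt").write_text("ignored", encoding="utf-8")
    demo_docs = load_md_files_pathlib(tmp)

print(demo_docs)
```

Only the `.md` file is picked up; the `.txt` file in the same folder is skipped.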

In [6]:
# Add every BNS section to the retriever, then update the index
for each_section in bns_data:
    retriever.add_document(each_section["_id"], each_section["text"])

retriever.update_index()
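
The source of `TFIDFRetriever` is not shown in this notebook, so its internals are unknown here. As a rough illustration of what a class with the same interface (`add_document`, `update_index`, `search`, `documents`) could do, here is a minimal pure-Python sketch. The class name `SketchTFIDFRetriever`, the tokenizer, the smoothed IDF formula, and the cosine scoring are all assumptions, not the actual implementation.

```python
import math
import re
from collections import Counter


class SketchTFIDFRetriever:
    """Hypothetical stand-in for TFIDFRetriever; not the notebook's actual class."""

    def __init__(self):
        self.documents = {}  # doc_id -> raw text
        self._idf = {}       # term -> inverse document frequency
        self._vectors = {}   # doc_id -> {term: L2-normalised TF-IDF weight}

    @staticmethod
    def _tokenize(text):
        # Lowercase and keep runs of letters and digits (an assumed tokenizer).
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_document(self, doc_id, text):
        self.documents[doc_id] = text

    def _vectorize(self, tokens):
        # Term frequency times IDF, ignoring terms unseen at indexing time.
        counts = Counter(t for t in tokens if t in self._idf)
        vec = {t: c * self._idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def update_index(self):
        tokenized = {d: self._tokenize(t) for d, t in self.documents.items()}
        n_docs = len(tokenized)
        # Document frequency: number of documents containing each term.
        df = Counter(term for toks in tokenized.values() for term in set(toks))
        # Smoothed IDF, so terms present in every document keep a positive weight.
        self._idf = {t: math.log((1 + n_docs) / (1 + f)) + 1 for t, f in df.items()}
        self._vectors = {d: self._vectorize(toks) for d, toks in tokenized.items()}

    def search(self, query, top_k=10):
        q = self._vectorize(self._tokenize(query))
        scores = []
        for doc_id, vec in self._vectors.items():
            # Dot product of unit vectors, i.e. cosine similarity.
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scores.append((doc_id, score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Tiny usage example with made-up documents.
demo = SketchTFIDFRetriever()
demo.add_document("a", "Robbery with a deadly weapon")
demo.add_document("b", "Theft is snatching if the offender seizes property")
demo.update_index()
demo_results = demo.search("robbery")
print(demo_results)
```

In this sketch, `search` returns `(doc_id, score)` pairs sorted by descending score, which matches the shape of the results printed in this notebook.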

In [7]:
# Search for a single-term query
print("Search for 'robbery':")
results = retriever.search("robbery")
print(results)

# Get the matching documents
print("\nTop matching documents:")
for doc_id, score in results:
    print(f"Document {doc_id} (Score: {score:.4f}): {retriever.documents[doc_id]}")

Search for 'robbery':
[('Chapter_XVII/Section_311.md', 0.3283260006317209), ('Chapter_XVII/Section_312.md', 0.3116363975419594), ('Chapter_XVII/Section_313.md', 0.25107645287498526), ('Chapter_IV/Section_56.md', 0.2252681703552038), ('Chapter_III/Section_35.md', 0.2065372123740925), ('Chapter_XIV/Section_254.md', 0.20380948750630068), ('Chapter_XVII/Section_310.md', 0.18244384710317216), ('Chapter_XVII/Section_309.md', 0.14342161585986243), ('Chapter_IV/Section_59.md', 0.1432756833333348), ('Chapter_III/Section_41.md', 0.11395446924549021)]

Top matching documents:
Document Chapter_XVII/Section_311.md (Score: 0.3283): CHAPTER XVII: OF OFFENCES AGAINST PROPERTY

Subchapter: Of robbery and dacoity

Section 311: Robbery, or dacoity, with attempt to cause death or grievous hurt
If, at the time of committing robbery or dacoity, the offender uses any deadly weapon, or causes grievous hurt to any person, or attempts to cause death or grievous hurt to any person, the imprisonment with which su
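
The loop above prints the full text of every hit, which gets long for legal sections. One option is to print a short preview instead. The sketch below uses the standard library's `textwrap.shorten`; the helper name `preview` and the 60-character width are illustrative choices, not part of the retriever.

```python
import textwrap


def preview(text, width=80):
    """Collapse whitespace and cut the text to at most `width` characters."""
    return textwrap.shorten(text, width=width, placeholder=" ...")


# Sample text taken from the start of the top hit shown above.
sample = (
    "CHAPTER XVII: OF OFFENCES AGAINST PROPERTY\n\n"
    "Subchapter: Of robbery and dacoity\n\n"
    "Section 311: Robbery, or dacoity, with attempt to cause death or grievous hurt"
)
snippet = preview(sample, width=60)
print(snippet)
```

In the result loop, `preview(retriever.documents[doc_id])` could replace the raw document text in the f-string.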




In [8]:
# Search for a multi-term query
print("Search for 'robbery and chain-snatching':")
results = retriever.search("robbery and chain-snatching")
print(results)

# Get the matching documents
print("\nTop matching documents:")
for doc_id, score in results:
    print(f"Document {doc_id} (Score: {score:.4f}): {retriever.documents[doc_id]}")

Search for 'robbery and chain-snatching':
[('Chapter_XVII/Section_304.md', 0.33812669799394435), ('Chapter_XVII/Section_311.md', 0.17157725031788007), ('Chapter_XVII/Section_312.md', 0.16285556454968503), ('Chapter_XVII/Section_313.md', 0.1313873776686545), ('Chapter_VI/Section_112.md', 0.13036309055536316), ('Chapter_IV/Section_56.md', 0.11783448296494729), ('Chapter_III/Section_35.md', 0.10803659298491969), ('Chapter_XIV/Section_254.md', 0.10657189517963878), ('Chapter_XVII/Section_310.md', 0.09551570002679212), ('Chapter_XVII/Section_309.md', 0.07503556081766766)]

Top matching documents:
Document Chapter_XVII/Section_304.md (Score: 0.3381): CHAPTER XVII: OF OFFENCES AGAINST PROPERTY

Subchapter: Of theft

Section 304: Snatching
(1) Theft is snatching if, in order to commit theft, the offender suddenly or quickly or forcibly seizes or secures or grabs or takes away from any person or from his possession any movable property. (2) Whoever commits snatching, shall be punished with impr
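
Comparing the two result lists, every section that appears in both scores lower for the longer query. A quick check is to divide each section's score for 'robbery and chain-snatching' by its score for 'robbery'. The scores below are copied from the printed outputs above. A roughly constant ratio would be consistent with a cosine-style score, where extra query terms enlarge the query vector's norm. That interpretation is an assumption, since the internals of TFIDFRetriever are not shown here.

```python
# Scores copied from the two printed result lists, for sections present in both.
robbery = {
    "Chapter_XVII/Section_311.md": 0.3283260006317209,
    "Chapter_XVII/Section_312.md": 0.3116363975419594,
    "Chapter_XVII/Section_313.md": 0.25107645287498526,
    "Chapter_IV/Section_56.md": 0.2252681703552038,
    "Chapter_III/Section_35.md": 0.2065372123740925,
    "Chapter_XIV/Section_254.md": 0.20380948750630068,
    "Chapter_XVII/Section_310.md": 0.18244384710317216,
    "Chapter_XVII/Section_309.md": 0.14342161585986243,
}
robbery_and_snatching = {
    "Chapter_XVII/Section_311.md": 0.17157725031788007,
    "Chapter_XVII/Section_312.md": 0.16285556454968503,
    "Chapter_XVII/Section_313.md": 0.1313873776686545,
    "Chapter_IV/Section_56.md": 0.11783448296494729,
    "Chapter_III/Section_35.md": 0.10803659298491969,
    "Chapter_XIV/Section_254.md": 0.10657189517963878,
    "Chapter_XVII/Section_310.md": 0.09551570002679212,
    "Chapter_XVII/Section_309.md": 0.07503556081766766,
}

# Ratio of the longer-query score to the single-term score, per section.
ratios = {
    doc_id: robbery_and_snatching[doc_id] / robbery[doc_id] for doc_id in robbery
}
for doc_id, ratio in ratios.items():
    print(f"{doc_id}: {ratio:.4f}")
```

All eight ratios fall between 0.52 and 0.53. Section 304 rises to the top of the second search, presumably because it also matches the added "snatching" term.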