## Boolean Retriever

In this notebook, we will build a Boolean retriever on a small dataset. Let's start by importing the BooleanRetrieval class.

In [None]:
import sys
import os

sys.path.append(os.path.abspath(".."))
from src.BooleanRetriever import BooleanRetrieval

#### We will instantiate the retriever

In [None]:
bool_retriever = BooleanRetrieval()

We will add some documents, each with an integer ID and its text

In [None]:
bool_retriever.add_document(1, "Python is a popular programming language.")
bool_retriever.add_document(2, "Java is also a popular programming language.")
bool_retriever.add_document(3, "Python and Java are both object-oriented.")
bool_retriever.add_document(4, "Python is known for its simplicity and readability.")
bool_retriever.add_document(5, "Data science often uses Python for analysis.")

#### We will build the index

In [None]:
bool_retriever.build_index()
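
For intuition, an inverted index maps each term to the set of IDs of the documents that contain it. The cell below is a minimal sketch of that idea, not the actual code behind `build_index()`. The helper names `tokenize` and `build_inverted_index`, and the choice to lowercase the text and split on non-alphanumeric characters, are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters and digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    # Map each term to the set of IDs of the documents that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


docs = {
    1: "Python is a popular programming language.",
    2: "Java is also a popular programming language.",
    3: "Python and Java are both object-oriented.",
}
index = build_inverted_index(docs)
print(sorted(index["python"]))
print(sorted(index["java"]))
```

Storing the postings as sets makes the Boolean operators cheap to express later, since AND, OR, and NOT correspond to set intersection, union, and difference.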

We will print the first document

In [None]:
# Take the first document ID without looping over the whole collection
first_key = next(iter(bool_retriever.documents))

print(f"Key is {first_key}")
print(bool_retriever.documents[first_key])

We will look up a term in the inverted index and print the documents that contain it

In [None]:
# Let's see what the inverted index holds for the term "python"
for each_doc in bool_retriever.inverted_index["python"]:
    print(bool_retriever.documents[each_doc])

#### We will search using a simple query

In [None]:
query = "python AND java"
results = bool_retriever.search(query)
for doc_id, content in results.items():
    print(f"DocID: {doc_id}\nContent: {content}\n")
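
Conceptually, an AND query returns the documents that appear in the postings of every query term, which is a set intersection. The sketch below illustrates this on a hand-written toy index that mirrors the five documents above. The function `search_and` is hypothetical and handles only the AND operator; it is not how `BooleanRetrieval.search` is necessarily implemented.

```python
def search_and(index, query):
    # Split on the literal AND operator and normalise each term to lowercase.
    terms = [term.strip().lower() for term in query.split(" AND ")]
    # Look up each term's postings; an unknown term has an empty posting set.
    postings = [set(index.get(term, ())) for term in terms]
    # A document matches only if it appears in every posting set.
    return set.intersection(*postings)


# Toy postings: "python" occurs in documents 1, 3, 4, 5 and "java" in 2 and 3.
toy_index = {"python": {1, 3, 4, 5}, "java": {2, 3}}
print(search_and(toy_index, "python AND java"))
```

With this toy index, only document 3 contains both terms, so it is the only match.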

## Real-World Example

In [None]:
# Let us load the Indian Cultural raw dataset
bool_retriever = BooleanRetrieval("BNS")

In [None]:
# Let us print the ID and text of the first record
print(bool_retriever.dataset[0]["_id"])
print(bool_retriever.dataset[0]["text"])

In [None]:
# Build the index, then print every document that contains "robbery"
bool_retriever.build_index()
for each_doc in bool_retriever.inverted_index["robbery"]:
    print(bool_retriever.documents[each_doc])

In [None]:
# Search for documents that contain both terms
search_results = bool_retriever.search("robbery AND chain-snatching")

In [None]:
# Print each matching document ID followed by its content
for doc_id, content in search_results.items():
    print(doc_id)
    print(content)
    print("\n")