# Query PDF Vector Database
# This notebook retrieves relevant passages from the PDF vector database based on user queries.

In [5]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

In [6]:
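# Load the saved index
index_path = "faiss_combined_index"  # Path to your saved index
# The embedding model must match the one used when the index was built
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# allow_dangerous_deserialization=True unpickles the stored docstore; only load index files you trust
vector_store = FAISS.load_local(index_path, embeddings, allow_dangerous_deserialization=True)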

In [7]:
def search_in_pdf(query: str, k: int = 5):
    """Search for relevant passages in the PDF.
    
    Args:
        query (str): The search query
        k (int): Number of results to return
        
    Returns:
        list: List of relevant text passages
    """
    # similarity_search returns Document objects; the passage text is in page_content
    results = vector_store.similarity_search(query, k=k)
    return [doc.page_content for doc in results]
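
To make the retrieval step less of a black box, here is a minimal sketch of what a top-k similarity search does conceptually: score every stored vector by its distance to the query vector and keep the k closest. It assumes Euclidean (L2) distance, which is, as far as I know, the default for a LangChain FAISS index but may not match how this index was built. The function name `top_k_by_l2` and the 2-D vectors are hypothetical stand-ins; real all-MiniLM-L6-v2 embeddings have 384 dimensions, and FAISS does the scan far more efficiently.

```python
import math

def top_k_by_l2(query_vec, doc_vecs, k=2):
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    # Score every stored vector by its Euclidean distance to the query
    scored = [(i, math.dist(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    # Smaller distance means more similar, so sort ascending
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy 2-D "embeddings" (made-up values, not real model output)
doc_vecs = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1)]
query_vec = (1.0, 0.0)

for idx, distance in top_k_by_l2(query_vec, doc_vecs, k=2):
    print(f"doc {idx}: distance {distance:.3f}")
```

In the notebook itself, `vector_store.similarity_search_with_score(query, k=k)` returns `(Document, score)` pairs if you want to inspect these distances for real queries.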

In [9]:
# Example usage
query = "How many types of probability are there in natural language processing?"  # Your query here
docs = vector_store.similarity_search(query, k=5)

print("Relevant passages:")
for i, doc in enumerate(docs, 1):
    print(f"\nPassage {i}:")
    print(doc.page_content)

Relevant passages:

Passage 1:
I Fundamental Algorithms for NLP: Some of the bigram probabilities above encode some facts that we think of as strictly
syntactic in nature, like the fact that what comes after eatis usually a noun or an
adjective, or that what comes after tois usually a verb. Others might be a fact about
the personal assistant task, like the high probability of sentences beginning with
the words I. And some might even be cultural rather than linguistic, like the higher
probability that people are looking for Chinese versus English food.

Passage 2:
I Fundamental Algorithms for NLP: P(ﬁshjThanks for all the )
Language models give us the ability to assign such a conditional probability to every
possible next word, giving us a distribution over the entire vocabulary. We can also
assign probabilities to entire sequences by combining these conditional probabilities
with the chain rule:
P(w1:n) =nY
i=1P(wijw<i)
The n-gram language models of Chapter 3 compute the probability of