### Rewrite-Retrieve-Read

A strategy that prompts the LLM to rewrite the user's query before retrieval, since the original query may be poorly worded or padded with irrelevant detail.
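To make the control flow concrete before any API keys are involved, here is a minimal offline sketch of the three stages. The `rewrite`, `retrieve`, and `read` callables and the `toy_*` helpers are hypothetical stand-ins for the LLM and the vector store, not part of the notebook; the point is only that the retriever sees the rewritten query while the final answer step still sees the original question.

```python
from typing import Callable, List


def rewrite_retrieve_read(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, str], str],
) -> str:
    # 1. Rewrite: turn the raw user input into a focused search query.
    query = rewrite(question)
    # 2. Retrieve: search with the rewritten query, not the raw input.
    passages = retrieve(query)
    # 3. Read: answer the ORIGINAL question using the retrieved context.
    context = "\n\n".join(passages)
    return read(context, question)


# Hypothetical stand-ins (assumptions for illustration) so the flow runs offline.
seen_queries: List[str] = []


def toy_rewrite(question: str) -> str:
    # Pretend the LLM kept only the final sentence, which holds the real question.
    return question.strip().split(". ")[-1]


def toy_retrieve(query: str) -> List[str]:
    seen_queries.append(query)  # record what the retriever was actually asked
    return ["Hogwarts has four houses."]


def toy_read(context: str, question: str) -> str:
    return f"Answering {question!r} from {len(context)} characters of context"


result = rewrite_retrieve_read(
    "I forgot the food on the cooker. What are the houses in Hogwarts?",
    toy_rewrite,
    toy_retrieve,
    toy_read,
)
print(seen_queries)
print(result)
```

The real cells below follow the same shape, with `ChatOpenAI` doing the rewrite and read steps and the `PGVector` retriever doing the retrieval.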

In [25]:
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_postgres.vectorstores import PGVector
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import chain
from langchain_core.runnables import Runnable
import re

In [2]:
connection = 'postgresql+psycopg://langchain:langchain@localhost:6024/langchain'
collection_name = "Harry_Potter_Complete"
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

db = PGVector(
    embeddings=embedding_model,
    connection=connection,
    collection_name=collection_name
)

retriever = db.as_retriever()

In [3]:
prompt = ChatPromptTemplate.from_template("""Answer the question based only on the provided context: {context}
question: {question}
""")
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

In [4]:
@chain
def qa(question):
    docs = retriever.invoke(question)
    context = '\n\n'.join(d.page_content for d in docs)
    formatted = prompt.invoke({"context" : context, "question" : question})
    answer = llm.invoke(formatted)
    answer_text = answer.content if hasattr(answer, 'content') else answer
    return answer_text

In [5]:
qa.invoke("""Today I woke up and brushed my teeth, then I sat down to read the news. But then I forgot the food on the cooker. What are the names of the houses in Hogwarts?""")

'The names of the houses in Hogwarts are Gryffindor, Hufflepuff, Ravenclaw, and Slytherin.'

#### I was hoping it would not answer the question, but it did anyway 😅

In [6]:
rewrite_prompt = ChatPromptTemplate.from_template("""Provide a better search query for web search engine to answer the given question, end the queries with ’**’. Question: {question} Answer:""")

rewriter_runnable = rewrite_prompt | llm

def parse_rewriter_output(message):
    # message may be a ChatMessage-like object or string
    text = message.content if hasattr(message, "content") else str(message)
    # split at "**" first, then trim spaces and surrounding quotes,
    # so a reply such as '"query"**' does not keep its trailing quote
    return text.split("**")[0].strip().strip('"').strip()

@chain
def qa_rrr(question: str):
    # get rewritten query (invoke the runnable, pass mapping)
    rewritten_msg = rewriter_runnable.invoke({"question": question})
    # parse the rewriter output into a query string
    query = parse_rewriter_output(rewritten_msg)

    # retrieve docs using the rewritten query
    docs = retriever.invoke(query)
    context = "\n\n".join(d.page_content for d in docs)

    # prepare the final prompt (use the ChatPromptTemplate)
    formatted_prompt = prompt.invoke({"context": context, "question": question})

    # call the LLM
    answer_msg = llm.invoke(formatted_prompt)
    return answer_msg.content if hasattr(answer_msg, "content") else str(answer_msg)
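The parsing step is easy to check without calling the model. Below is a small standalone sketch; `extract_query` is a hypothetical helper name, and the sample strings are made-up model replies rather than real outputs. It splits on the terminator before trimming quotes, so a quoted query followed by `**` comes out clean.

```python
def extract_query(text: str, terminator: str = "**") -> str:
    """Return the text before the first terminator, minus whitespace and quotes."""
    head = text.split(terminator)[0]
    return head.strip().strip('"').strip()


# Made-up replies (assumptions), covering the shapes a rewriter might return.
samples = [
    "Hogwarts houses names**",
    '"Hogwarts houses names"** extra commentary',
    "  no terminator here ",
]
for sample in samples:
    print(repr(extract_query(sample)))
```

If the model forgets the terminator entirely, the whole reply is used as the query, which seems like a reasonable fallback for a notebook experiment.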

In [7]:
qa_rrr.invoke("""Today I woke up and brushed my teeth, then I sat down to read the news. But then I forgot the food on the cooker. 
    What are the names of the houses in Hogwarts?""")

'The names of the houses in Hogwarts are Gryffindor, Hufflepuff, Ravenclaw, and Slytherin.'

### Multi-Query Retriever

In [26]:
perspective_prompt = ChatPromptTemplate.from_template("""You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based
 similarity search. Provide these alternative questions separated by newlines. Original question: {question}""")

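def parse_query_output(message):
    # message may be a chat message object or a plain string
    text = message.content if hasattr(message, "content") else str(message)
    pattern = r"^\d+\.\s*"
    cleaned = []
    for item in text.split('\n'):
        # strip leading "1. " style numbering and surrounding whitespace
        cleaned_item = re.sub(pattern, "", item.strip()).strip()
        # skip blank lines so empty queries never reach the retriever
        if cleaned_item:
            cleaned.append(cleaned_item)
    return cleaned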

query_gen = perspective_prompt | llm | parse_query_output
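The list parsing can also be exercised offline. This is a sketch under assumptions: `split_queries` is a hypothetical standalone helper, and the sample text is a made-up reply. It also tolerates `2)` style numbering and `-` bullets, which the model may or may not produce.

```python
import re
from typing import List

# Leading "1." / "1)" numbering or a "-" bullet, with optional indentation.
_NUMBERING = re.compile(r"^\s*(?:\d+[.)]|-)\s*")


def split_queries(text: str) -> List[str]:
    queries = []
    for line in text.splitlines():
        cleaned = _NUMBERING.sub("", line).strip()
        if cleaned:  # drop blank lines between alternatives
            queries.append(cleaned)
    return queries


sample_reply = (
    "1. What is Harry's owl called?\n"
    "2) Name of Harry's owl?\n"
    "\n"
    "- Which owl does Harry own?"
)
print(split_queries(sample_reply))
```

Dropping blank lines matters here because every returned string becomes its own retrieval call.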

In [27]:
query_gen.invoke("What is the name of Harry's owl?")

['Can you tell me the name of the owl belonging to Harry?',
 "What is the owl's name that Harry owns?",
 "Do you know the name of Harry's pet owl?",
 'Which owl is associated with Harry by name?',
 'What is the specific name of the owl that belongs to Harry?']

In [28]:
retriever.invoke("Can you tell me the name of the owl belonging to Harry?")

[Document(id='9c4c9b62-9742-4b53-9beb-1af670bef233', metadata={'source': 'data/HP1.txt'}, page_content='Questions exploded inside Harry’s head like fireworks and he couldn’t decide which to ask first. After a few minutes he stammered, “What does it mean, they await my owl?”\n\n“Gallopin’ Gorgons, that reminds me,” said Hagrid, clapping a hand to his forehead with enough force to knock over a cart horse, and from yet another pocket inside his overcoat he pulled an owl — a real, live, rather ruffled-looking owl — a long quill, and a roll of parchment. With his tongue between his teeth he scribbled a note that Harry could read upside down:\n\n\n\nDear Professor Dumbledore,\n\nGiven Harry his letter.\n\nTaking him to buy his things tomorrow.\n\nWeather’s horrible. Hope you’re well.\n\nHagrid\n\n\n\nHagrid rolled up the note, gave it to the owl, which clamped it in its beak, went to the door, and threw the owl out 

In [29]:
def get_unique_union(document_list):
    # Flatten list of lists, and dedupe them
    deduped_docs = {
        doc.page_content: doc
            for sublist in document_list for doc in sublist
    }
    # return a flat list of unique docs
    return list(deduped_docs.values())

# query_gen produces the queries, retriever.batch fetches the documents for each query,
# and get_unique_union removes duplicates (documents are keyed on page_content).
# .batch runs all generated queries in parallel and returns a list of result lists
retrieval_chain = query_gen | retriever.batch | get_unique_union
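To see what the union step does, here is a small offline sketch. `Doc` is a hypothetical stand-in for LangChain's `Document` and `unique_union` is an illustrative variant, not the notebook's function. One difference worth noting: the dict comprehension above keeps the last document object seen for each `page_content`, while this variant keeps the first, which may matter if metadata differs between hits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    # Minimal stand-in (assumption) for a retrieved document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_union(batches: List[List[Doc]]) -> List[Doc]:
    seen: Dict[str, Doc] = {}
    for batch in batches:  # one batch of hits per generated query
        for doc in batch:
            # setdefault keeps the first document stored under each text
            seen.setdefault(doc.page_content, doc)
    # dicts preserve insertion order, so first-seen order is kept
    return list(seen.values())


batches = [
    [Doc("a", {"query": 1}), Doc("b")],
    [Doc("b"), Doc("a", {"query": 2}), Doc("c")],
]
merged = unique_union(batches)
print([d.page_content for d in merged])
```

Either way the result is a flat list with one entry per distinct chunk of text, which is what gets joined into the prompt context.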

In [30]:
@chain
def multi_query_qa(question):
    # get the docs
    docs = retrieval_chain.invoke(question)
    context = '\n\n'.join(d.page_content for d in docs)
    formatted = prompt.invoke({"context" : context, "question" : question})
    return llm.invoke(formatted).content

In [31]:
multi_query_qa.invoke("Who did Harry see go through the barrier to platform nine and three-quarters first?")

'The oldest boy, Percy, went through the barrier to platform nine and three-quarters first.'

In [32]:
query_gen.invoke("Who did Harry see go through the barrier to platform nine and three-quarters first?")

['Who was the first person Harry witnessed passing through the barrier to platform nine and three-quarters?',
 'Which individual did Harry observe going through the barrier to platform nine and three-quarters initially?',
 'Who was the initial person that Harry saw crossing the barrier to platform nine and three-quarters?',
 'Who was the first individual Harry encountered passing through the barrier to platform nine and three-quarters?',
 'Who was the person that Harry first saw going through the barrier to platform nine and three-quarters?']

In [33]:
multi_query_qa.invoke(" Who was the first person Harry saw going through the barrier to platform nine and three-quarters?")

'The first person Harry saw going through the barrier to platform nine and three-quarters was the oldest boy with flaming red hair.'

In [36]:
qa.invoke("Who was the first person Harry saw going through the barrier to platform nine and three-quarters, what was his full name?")

'The first person Harry saw going through the barrier to platform nine and three-quarters was Percy Weasley.'

In [37]:
multi_query_qa.invoke("Who was the person that Harry saw as the first to go through the barrier to platform nine and three-quarters?")

'The person that Harry saw as the first to go through the barrier to platform nine and three-quarters was Percy, the oldest boy with flaming red hair.'

#### Welp, even the multi-query retriever isn't giving a proper answer unless the name is asked for specifically

In [38]:
multi_query_qa.invoke("According to his chocolate frog card, what three things is Albus Dumbledore most known for?")

"According to his chocolate frog card, Albus Dumbledore is most known for his defeat of the dark wizard Grindelwald in 1945, for the discovery of the twelve uses of dragon's blood, and his work on alchemy with his partner, Nicolas Flamel."

In [39]:
multi_query_qa.invoke("What house was Susan Bones sorted into?")

'Hufflepuff'

In [42]:
multi_query_qa.invoke("Put these students in order according to the length of time it took the Sorting Hat to sort them: Ron Weasley, Seamus Finnigan, and Draco Malfoy")

'Seamus Finnigan, Ron Weasley, Draco Malfoy'

In [43]:
multi_query_qa.invoke("What’s Nearly Headless Nick’s full name?")

'Sir Nicholas de Mimsy-Porpington'

In [44]:
multi_query_qa.invoke("Name all the members of the Gryffindor Quidditch team and their positions.")

'The members of the Gryffindor Quidditch team and their positions are:\n- Oliver Wood: Keeper\n- Angelina Johnson: Chaser\n- Katie Bell: Chaser\n- Fred Weasley: Beater\n- George Weasley: Beater\n- Harry Potter: Seeker'

In [45]:
multi_query_qa.invoke("Name all the protections on the Sorcerer’s Stone and who placed them in order")

"The protections on the Sorcerer's Stone are Fluffy (Hagrid), enchantments (Professors Sprout, Flitwick, McGonagall, Quirrell, and Snape), and Dumbledore himself."