# RAG From Scratch: Query Transformations

Query transformations are a set of approaches that rewrite or otherwise modify a user's question to improve retrieval.

## Environment

`(1) Packages`

In [1]:
# ! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain

`(2) LangSmith`

https://docs.smith.langchain.com/

In [1]:
import os
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
# os.environ['LANGCHAIN_API_KEY'] = "YOUR_API_KEY"

In [2]:
# os.environ['OPENAI_API_KEY'] = <your-api-key>

## Part 5: Multi Query

Flow:

Docs:

* https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever

### Index

In [3]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

# Index
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

### Prompt

In [4]:
from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate {num_of_prompts_to_generate} 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_perspectives 
    | ChatOpenAI(temperature=0) 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)
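
The final `x.split("\n")` step keeps every line of the model output, including blank lines or lines padded with whitespace, and each of those would be sent to the retriever as a query. A minimal sketch of a stricter parser follows; `parse_queries` is a hypothetical helper, not part of the notebook, and the sample string is made up for illustration.

```python
def parse_queries(raw: str) -> list[str]:
    """Split model output into one query per line, dropping blank lines."""
    queries = []
    for line in raw.splitlines():
        cleaned = line.strip()
        if cleaned:  # skip empty or whitespace-only lines
            queries.append(cleaned)
    return queries


# Made-up model output with a blank line and a trailing newline
raw_output = "What is few-shot prompting?\n\nHow does few-shot learning work?\n"
print(parse_queries(raw_output))
```

Because the chain already pipes plain functions (as it does with `get_unique_union`), a helper like this could stand in for the lambda as the last step of `generate_queries`.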

In [5]:
generate_queries

ChatPromptTemplate(input_variables=['num_of_prompts_to_generate', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['num_of_prompts_to_generate', 'question'], template='You are an AI language model assistant. Your task is to generate {num_of_prompts_to_generate} \ndifferent versions of the given user question to retrieve relevant documents from a vector \ndatabase. By generating multiple perspectives on the user question, your goal is to help\nthe user overcome some of the limitations of the distance-based similarity search. \nProvide these alternative questions separated by newlines. Original question: {question}'))])
| ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x729e4d9c4b80>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x729e4da31580>, temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy='')
| StrOutputParser()
| RunnableLambda(...)

In [6]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "What is few shot?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question":question, "num_of_prompts_to_generate": 5})
len(docs)

5
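
`get_unique_union` deduplicates by serializing each document and passing the strings through `set`, which does not preserve the order in which documents were retrieved. If retrieval order matters, an order-preserving variant is one option. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for a LangChain `Document` and assumes the metadata is JSON-serializable; `unique_union_ordered` is an illustrative name, not a library function.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_union_ordered(results: list[list[Doc]]) -> list[Doc]:
    """Flatten per-query results and keep the first occurrence of each document."""
    seen = set()
    unique = []
    for sublist in results:
        for doc in sublist:
            # Serialize content and metadata into a hashable key
            key = json.dumps(
                {"content": doc.page_content, "metadata": doc.metadata},
                sort_keys=True,
            )
            if key not in seen:
                seen.add(key)
                unique.append(doc)
    return unique


results = [
    [Doc("A"), Doc("B")],  # documents returned for the first query
    [Doc("B"), Doc("C")],  # documents returned for the second query
]
print([d.page_content for d in unique_union_ordered(results)])  # ['A', 'B', 'C']
```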

In [21]:
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# RAG
quiz_template = """
You are an assistant for bank_name bank products and answer customer questions.
Use fragments of the received context to answer the question.
If you don't know the answer, say that you don't know, don't make up an answer.
Use a maximum of three sentences and be concise.\n
Question: {question} \n
Context: {context} \n
Answer:
"""

prompt = ChatPromptTemplate.from_template(quiz_template)

llm = ChatOpenAI(temperature=0)

rag_chain = (
    {
      "context": retriever, 
      "question": RunnablePassthrough()
     } 
    | prompt
    | llm
    | StrOutputParser()
)

In [13]:
answer = rag_chain.invoke("What is few shot?")

In [14]:
answer

'Few-shot learning presents a set of high-quality demonstrations, each consisting of both input and desired output, on the target task. As the model first sees good examples, it can better understand human intention and criteria for what kinds of answers are wanted. Therefore, few-shot learning often leads to better performance than zero-shot. However, it comes at the cost of more token consumption and may hit the context length limit when input and output text are long.'

In [22]:
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatOpenAI(temperature=0)

final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question, "num_of_prompts_to_generate": 5})

'Few-shot learning presents a set of high-quality demonstrations, each consisting of both input and desired output, on the target task. As the model first sees good examples, it can better understand human intention and criteria for what kinds of answers are wanted. Therefore, few-shot learning often leads to better performance than zero-shot. However, it comes at the cost of more token consumption and may hit the context length limit when input and output text are long.'
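
In `final_rag_chain`, the `context` value is the list of documents returned by `retrieval_chain`, so the prompt receives the string form of that list rather than clean text. A common pattern is to join the page contents before filling the template. The sketch below assumes a hypothetical `Doc` stand-in and a hypothetical `format_docs` helper; neither appears in the notebook.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_docs(docs: list[Doc]) -> str:
    """Join document texts with blank lines so the prompt sees plain text."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [Doc("Few-shot prompting shows examples."), Doc("Zero-shot prompting shows none.")]
context = format_docs(docs)
print(context)
```

In the chain, such a helper would sit after retrieval, for example `retrieval_chain | format_docs`, so that only the document text reaches the prompt.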

#### Generate answers for each prompt

In [35]:

test_questions = ["What is few shot?", "Explain the concept of few shot learning?", "What is the largest ocean on Earth?"]

answers = [{question: rag_chain.invoke(question)} for question in test_questions]


In [36]:
data = answers

### Score each prompt to evaluate its performance using Sentence Transformers

In [37]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Initialize the model
model = SentenceTransformer('all-MiniLM-L6-v2')

for item in data:
    for question, answer in item.items():
        # Generate embeddings
        question_embedding = model.encode([question])
        answer_embedding = model.encode([answer])

        # Calculate cosine similarity
        score = cosine_similarity(question_embedding, answer_embedding)

        print(f"Question: {question}")
        print(f"Answer: {answer}")
        print(f"Score: {score[0][0]}")
        print("\n")
        



Question: What is few shot?
Answer: Few-shot learning involves providing a model with a few high-quality demonstrations to improve its understanding of human intention and criteria for desired answers. This method often leads to better performance than zero-shot learning but may require more token consumption and could hit context length limits with long input and output text. For more information, you can refer to the source: https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/.
Score: 0.6792625188827515


Question: Explain the concept of few shot learning?
Answer: Few-shot learning involves providing the model with a few high-quality demonstrations to better understand human intention and criteria for desired answers. This approach often leads to better performance than zero-shot learning but may consume more tokens and hit context length limits with long input and output text. Instruction prompting is used to explain the task intent to the model and improve alignment wi
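
The score printed for each pair is the cosine similarity between the question embedding and the answer embedding, so it reflects how close the two texts are in embedding space rather than whether the answer is correct. A minimal pure-Python sketch of the computation follows; the two-dimensional vectors are toy values chosen for illustration, not real embeddings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```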

In [None]:
# Generate test cases or test scenarios for comparing prompts. The scenarios are situations, not actual questions;
# the expected output of each scenario is the general expectation of what a prompt should look like in that situation.

test_scenarios = [
    {
        "scenario": "",  # TODO: describe the situation
    }
]

In [40]:
def update_elo_ratings(rating1, rating2, score):
  """
  Update ELO ratings based on the score of a game.

  Parameters:
  rating1 (float): The ELO rating of player 1.
  rating2 (float): The ELO rating of player 2.
  score (float): The score of the game. 1 if player 1 wins, 0 if player 2 wins, 0.5 for a draw.

  Returns:
  float: The updated ELO rating of player 1.
  float: The updated ELO rating of player 2.
  """
  # Calculate expected scores
  expected_score1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
  expected_score2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))

  # Update ratings
  k = 32
  rating1 = rating1 + k * (score - expected_score1)
  rating2 = rating2 + k * ((1 - score) - expected_score2)

  return rating1, rating2

# Number of rounds
num_rounds = 10

# Initialize ELO ratings
elo_ratings = {question: 1000 for question in test_questions}

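# Calculate scores and update ELO ratings.
# Note: each `item` in `data` is a single-entry dict ({question: answer}), so both
# inner loops always see the same question, the `question1 != question2` check never
# passes, and no rating is updated. Merging the dicts first would allow cross-question pairs.
for _ in range(num_rounds):
  for item in data:
    for question1, answer1 in item.items():
      for question2, answer2 in item.items():
        if question1 != question2: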
          # Generate embeddings
          embedding1 = model.encode([answer1])
          embedding2 = model.encode([answer2])

          # Calculate cosine similarity
          score1 = cosine_similarity(embedding1, model.encode([question1]))[0][0]
          score2 = cosine_similarity(embedding2, model.encode([question2]))[0][0]

          # Update ELO ratings
          if score1 > score2:
            elo_ratings[question1], elo_ratings[question2] = update_elo_ratings(elo_ratings[question1], elo_ratings[question2], 1)
          elif score1 < score2:
            elo_ratings[question1], elo_ratings[question2] = update_elo_ratings(elo_ratings[question1], elo_ratings[question2], 0)
          else:
            elo_ratings[question1], elo_ratings[question2] = update_elo_ratings(elo_ratings[question1], elo_ratings[question2], 0.5)

print("ELO ratings:", elo_ratings)

ELO ratings: {'What is few shot?': 1000, 'Explain the concept of few shot learning?': 1000, 'What is the largest ocean on Earth?': 1000}
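
Every rating stays at 1000 here because no two different questions are ever compared. A pairwise comparison needs all candidates and their scores in one collection. The sketch below shows one way to do that under stated assumptions: the candidate names and similarity scores are hypothetical, and `update_elo` is an illustrative re-statement of the update rule above (K = 32, scale 400), not code from the notebook.

```python
from itertools import combinations


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32):
    """Return updated ratings; score_a is 1 for a win by A, 0 for a loss, 0.5 for a draw."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    expected_b = 1 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - expected_b)
    return new_a, new_b


# Hypothetical similarity scores, one per candidate
scores = {"prompt_a": 0.68, "prompt_b": 0.61, "prompt_c": 0.12}
ratings = {name: 1000.0 for name in scores}

# Compare every unordered pair of candidates exactly once
for first, second in combinations(scores, 2):
    if scores[first] > scores[second]:
        outcome = 1.0
    elif scores[first] < scores[second]:
        outcome = 0.0
    else:
        outcome = 0.5
    ratings[first], ratings[second] = update_elo(ratings[first], ratings[second], outcome)

print(ratings)
```

Since the similarity scores are fixed, repeating the loop for several rounds only widens the gaps between ratings; it does not change the ordering.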
