# Advanced Retrieval with LangChain Part 2

We first set up LangSmith and Ragas (used to create the test set), then load the John Wick data as in the previous notebook. We will also set up Qdrant.

In [1]:
import os
from uuid import uuid4
from dotenv import load_dotenv
load_dotenv()

unique_id = uuid4().hex[0:8]

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"Advanced RAG - {unique_id}"

In [2]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

generator_llm = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}
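
As a rough illustration of what these weights mean, the sketch below checks that they sum to 1 and shows how a 20-question test set would split across evolution types if the generator followed the proportions exactly. Plain strings stand in for the Ragas evolution objects, and `expected_counts` is a hypothetical helper, not part of Ragas. The number of questions actually generated can differ from these ideal counts.

```python
import math

# Plain strings stand in for the Ragas evolution objects used above.
distributions = {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1}

def expected_counts(distributions, test_size):
    """Hypothetical helper: ideal number of questions per evolution type."""
    total = sum(distributions.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return {name: round(weight * test_size) for name, weight in distributions.items()}

counts = expected_counts(distributions, test_size=20)
print(counts)
```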

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)
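
The two metadata conversions in the loop can be looked at in isolation. The sketch below mirrors the rating fallback and the `timedelta(days=4-i)` offset; `parse_rating`, `recency_offset_days`, and the fixed `reference_time` are names introduced here for illustration only (the notebook itself uses `datetime.now()`).

```python
from datetime import datetime, timedelta

def parse_rating(raw):
    """Mirror the loader logic: empty or missing ratings become 0."""
    return int(raw) if raw else 0

def recency_offset_days(movie_index, newest_index=4):
    """Days subtracted from the reference time; the newest film gets 0."""
    return newest_index - movie_index

reference_time = datetime(2024, 9, 28)  # fixed timestamp chosen for illustration
timestamps = {
    f"John Wick {i}": reference_time - timedelta(days=recency_offset_days(i))
    for i in range(1, 5)
}
for title, ts in timestamps.items():
    print(title, ts.date())
```

With this offset, John Wick 1 is stamped three days before the reference time and John Wick 4 exactly at it.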


In [4]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2024, 9, 28, 10, 37, 41, 849349)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

In [27]:
#testset = generator.generate_with_langchain_docs(documents, 20, distributions, with_debugging_logs=False)
#import pickle
#with open('testset_ragas_john_wick.pkl', 'wb') as file:
    #pickle.dump(testset, file)

In [5]:
import pickle
with open('testset_ragas_john_wick.pkl', 'rb') as file:
    testset = pickle.load(file)
testset.to_pandas()

| | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | What is a key element that contributes to the ... | [: 13\nReview: Following on from two delirious... | The unique appeal of John Wick: Chapter 3 - Pa... | simple | [{'source': 'john_wick_3.csv', 'row': 13, 'Rev... | True |
| 1 | What challenges does John Wick face after refu... | [: 20\nReview: After resolving his issues with... | After refusing to help Santino D'Antonio in th... | simple | [{'source': 'john_wick_2.csv', 'row': 20, 'Rev... | True |
| 2 | What are some common cliches associated with t... | [: 13\nReview: ... slaughtering a line from th... | The answer to given question is not present in... | simple | [{'source': 'john_wick_1.csv', 'row': 13, 'Rev... | True |
| 3 | What makes John Wick: Chapter 3 - Parabellum s... | [: 13\nReview: Following on from two delirious... | John Wick: Chapter 3 - Parabellum stands out a... | simple | [{'source': 'john_wick_3.csv', 'row': 13, 'Rev... | True |
| 4 | What factors contribute to the increase in bod... | [: 14\nReview: Another significant increase in... | The answer to given question is not present in... | simple | [{'source': 'john_wick_3.csv', 'row': 14, 'Rev... | True |
| 5 | What is the significance of practical stunt wo... | [: 22\nReview: John Wick is one of my favourit... | Practical stunt work in the action movie John ... | simple | [{'source': 'john_wick_2.csv', 'row': 22, 'Rev... | True |
| 6 | What elements of the film John Wick can be des... | [: 9\nReview: At first glance, John Wick sound... | The film John Wick can be described as having ... | simple | [{'source': 'john_wick_1.csv', 'row': 9, 'Revi... | True |
| 7 | How does the level of violence in the third Jo... | [: 20\nReview: Sadly the third John Wick film ... | The level of violence in the third John Wick f... | simple | [{'source': 'john_wick_3.csv', 'row': 20, 'Rev... | True |
| 8 | Why does Hollywood focus more on Marvel movies... | [: 10\nReview: Most American action flicks rel... | Hollywood focuses more on Marvel movies due to... | simple | [{'source': 'john_wick_4.csv', 'row': 10, 'Rev... | True |
| 9 | Which character in John Wick: Chapter 2 initia... | [: 20\nReview: After resolving his issues with... | Santino D'Antonio initiates a blood oath with ... | multi_context | [{'source': 'john_wick_2.csv', 'row': 20, 'Rev... | True |


Below are the naive (dense vector), BM25, and multi-query retrievers.

In [8]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI
#from langchain_cohere import CohereRerank
chat_model = ChatOpenAI()

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="johnwick_collection"
)
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
bm25_retriever = BM25Retriever.from_documents(documents)
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)


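Below is the setup for the Parent Document Retriever. Small child chunks (200 characters) are embedded and searched, and the full parent documents they came from are returned.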

In [11]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", 
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), 
    client=client
)

store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

parent_document_retriever.add_documents(parent_docs, ids=None)
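
To make the child-to-parent lookup concrete, here is a minimal pure-Python sketch of the idea. It is not LangChain's implementation: fixed-size character slices stand in for `RecursiveCharacterTextSplitter`, a toy word-overlap score stands in for embedding similarity, and all function names are hypothetical.

```python
def split_into_children(text, chunk_size):
    """Cut a parent text into fixed-size character chunks (crude splitter stand-in)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def word_overlap(query, chunk):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def build_index(parents, chunk_size):
    """Return (docstore, child_index): parents by id, plus (child_text, parent_id) pairs."""
    docstore = dict(enumerate(parents))
    child_index = [
        (child, parent_id)
        for parent_id, text in docstore.items()
        for child in split_into_children(text, chunk_size)
    ]
    return docstore, child_index

def retrieve_parents(query, docstore, child_index, k=2):
    """Score the small chunks, then return the full parents of the best matches."""
    ranked = sorted(child_index, key=lambda pair: word_overlap(query, pair[0]), reverse=True)
    parent_ids = []
    for child, parent_id in ranked[:k]:
        # Skip chunks with no overlap and avoid returning the same parent twice.
        if word_overlap(query, child) > 0 and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[pid] for pid in parent_ids]

parents = [
    "John Wick is a stylish action film. The dog is the heart of the story.",
    "Chapter 2 raises the stakes. Santino calls in a blood oath marker.",
]
docstore, child_index = build_index(parents, chunk_size=40)
print(retrieve_parents("blood oath", docstore, child_index))
```

The match is found in a 40-character child chunk, but the caller receives the whole review, which is the trade-off this retriever is designed around.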

Below is contextual compression (using reranking). The `langchain_cohere` import currently fails with the error shown below, probably because of a library version conflict, so this retriever is left out of the comparison.

In [14]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

RuntimeError: no validator found for <class 'pydantic.types.SecretStr'>, see `arbitrary_types_allowed` in Config
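
One way to start diagnosing this is to list the installed versions of the packages involved. Errors of this form often appear when code written against one Pydantic major version meets types from another, but that is an assumption here and has not been confirmed for this environment. The package list below is likewise a guess at what is relevant.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose versions are worth comparing; this list is an assumption.
PACKAGES = ["pydantic", "langchain", "langchain-core", "langchain-cohere", "cohere"]

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:18} {ver or 'not installed'}")
```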

Below is the setup for Ensemble retriever

In [13]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever,  multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
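
To my understanding, `EnsembleRetriever` combines the ranked results of its retrievers with a weighted form of Reciprocal Rank Fusion. The sketch below shows that scoring rule on made-up document ids; the function name, the hit lists, and the constant `c=60` are illustrative assumptions rather than a copy of the library code.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each list contributes weight / (rank + c) for every document it returns,
    with rank starting at 1.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up results from four retrievers, best hit first.
bm25_hits = ["review_7", "review_2", "review_9"]
dense_hits = ["review_2", "review_5", "review_7"]
parent_hits = ["review_2", "review_7"]
multi_query_hits = ["review_5", "review_2"]

ranked_lists = [bm25_hits, dense_hits, parent_hits, multi_query_hits]
equal_weighting = [1 / len(ranked_lists)] * len(ranked_lists)
fused = weighted_rrf(ranked_lists, equal_weighting)
print(fused)
```

Documents that several retrievers agree on rise to the top even when no single retriever ranks them first everywhere.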

Finally, there is a retriever built on semantic chunking.

In [28]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
semantic_documents = semantic_chunker.split_documents(documents)

semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="johnwick_collection_semantic"
)
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})
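
The `"percentile"` breakpoint type can be pictured as follows: measure the distance between neighbouring sentence embeddings and start a new chunk wherever that distance exceeds a chosen percentile of all distances. The sketch below is a simplified pure-Python version of that idea with hand-made 2-D vectors. The function names and the 95th-percentile default are assumptions for illustration, and the real `SemanticChunker` differs in details such as how sentences are grouped before embedding.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def percentile(values, pct):
    """Linearly interpolated percentile, with pct between 0 and 100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def semantic_chunks(sentences, vectors, pct=95):
    """Split where the distance between neighbours exceeds the pct-th percentile."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "The action is relentless.",
    "The stunts are superb.",
    "The popcorn was stale.",
    "The seats were cramped.",
]
# Hand-made vectors: the first two point one way, the last two another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chunks = semantic_chunks(sentences, vectors)
print(chunks)
```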

In [20]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)
def create_chain(retriever):
    return ({"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")})

In [21]:
from langsmith import traceable
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
)

test_df = testset.to_pandas()
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
]

def run_evaluation(retrieval_chain):
    answers = []
    contexts = []

    for question in test_questions:
        response = retrieval_chain.invoke({"question" : question})
        answers.append(response["response"].content)
        contexts.append([context.page_content for context in response["context"]])
    response_dataset = Dataset.from_dict({
        "question" : test_questions,
        "answer" : answers,
        "contexts" : contexts,
        "ground_truth" : test_groundtruths
    })
    return evaluate(response_dataset, metrics)

In [29]:
mapper = {
    "naive_retriever": naive_retriever,
    "bm25_retriever": bm25_retriever,
    "multi_query_retriever": multi_query_retriever,
    "parent_document_retriever": parent_document_retriever,
    "ensemble_retriever": ensemble_retriever,
    "semantic_retriever": semantic_retriever
}
 
def run_all():
    output = {}
    for ret in mapper:
        unique_id = uuid4().hex[0:8]
        @traceable(
                run_type="llm",
                name="OpenAI Call Decorator",
                project_name=f"{ret} - {unique_id}"
        )
        def _run_eval(retriever):
            return run_evaluation(retriever)

        output[ret] = _run_eval(create_chain(mapper[ret]))
    return output
        
output = run_all()

Evaluating: 100%|██████████| 85/85 [00:42<00:00,  2.01it/s]
Evaluating: 100%|██████████| 85/85 [00:33<00:00,  2.57it/s]
Evaluating: 100%|██████████| 85/85 [00:53<00:00,  1.58it/s]
Evaluating: 100%|██████████| 85/85 [00:29<00:00,  2.87it/s]
Evaluating: 100%|██████████| 85/85 [01:01<00:00,  1.38it/s]
Evaluating: 100%|██████████| 85/85 [00:49<00:00,  1.73it/s]


In [30]:
output

{'naive_retriever': {'faithfulness': 0.7670, 'answer_relevancy': 0.9133, 'context_recall': 0.9020, 'context_precision': 0.7678, 'answer_correctness': 0.6934},
 'bm25_retriever': {'faithfulness': 0.7085, 'answer_relevancy': 0.8600, 'context_recall': 0.8431, 'context_precision': 0.7010, 'answer_correctness': 0.6436},
 'multi_query_retriever': {'faithfulness': 0.7107, 'answer_relevancy': 0.8506, 'context_recall': 0.9314, 'context_precision': 0.7168, 'answer_correctness': 0.6541},
 'parent_document_retriever': {'faithfulness': 0.6672, 'answer_relevancy': 0.9118, 'context_recall': 0.7353, 'context_precision': 0.7353, 'answer_correctness': 0.7265},
 'ensemble_retriever': {'faithfulness': 0.8111, 'answer_relevancy': 0.9199, 'context_recall': 0.9020, 'context_precision': 0.7000, 'answer_correctness': 0.7352},
 'semantic_retriever': {'faithfulness': 0.8436, 'answer_relevancy': 0.9704, 'context_recall': 0.8784, 'context_precision': 0.7038, 'answer_correctness': 0.6228}}

In [32]:
import pandas as pd
df_naive = pd.DataFrame(list(output["naive_retriever"].items()), columns=['Metric', 'Naive'])
df_bm25 = pd.DataFrame(list(output["bm25_retriever"].items()), columns=['Metric', 'BM25'])
df_multi_query = pd.DataFrame(list(output["multi_query_retriever"].items()), columns=['Metric','MultiQuery'])
df_parent= pd.DataFrame(list(output["parent_document_retriever"].items()), columns=['Metric', 'ParentDoc'])
df_ensemble = pd.DataFrame(list(output["ensemble_retriever"].items()), columns=['Metric', 'Ensemble'])
df_semantic = pd.DataFrame(list(output["semantic_retriever"].items()), columns=['Metric', 'Semantic'])
df_merged = df_naive.merge(df_bm25, on='Metric').merge(df_multi_query, on='Metric').merge(df_parent, on='Metric')
df_merged = df_merged.merge(df_ensemble, on='Metric').merge(df_semantic, on='Metric')
df_merged

| | Metric | Naive | BM25 | MultiQuery | ParentDoc | Ensemble | Semantic |
|---|---|---|---|---|---|---|---|
| 0 | faithfulness | 0.766993 | 0.708497 | 0.710668 | 0.667157 | 0.811091 | 0.843604 |
| 1 | answer_relevancy | 0.913262 | 0.860031 | 0.850625 | 0.911831 | 0.919855 | 0.970443 |
| 2 | context_recall | 0.901961 | 0.843137 | 0.931373 | 0.735294 | 0.901961 | 0.878431 |
| 3 | context_precision | 0.767777 | 0.70098 | 0.716796 | 0.735294 | 0.699976 | 0.703841 |
| 4 | answer_correctness | 0.693357 | 0.643573 | 0.654057 | 0.726511 | 0.735202 | 0.622808 |


Semantic chunking gives the best answer relevancy (0.970) and faithfulness (0.844), while the ensemble retriever has the best answer correctness (0.735). Overall, semantic chunking does well, although it also has the lowest answer correctness (0.623). MultiQuery does really well on context recall (0.931).
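
The per-metric winners can be read off programmatically rather than by eye. The sketch below copies the rounded scores reported above into a plain dictionary (keys follow the `mapper` names) and picks the top retriever for each metric; `best_per_metric` is a helper introduced here for illustration.

```python
# Rounded scores copied from the evaluation results reported above.
output = {
    "naive_retriever": {"faithfulness": 0.7670, "answer_relevancy": 0.9133, "context_recall": 0.9020, "context_precision": 0.7678, "answer_correctness": 0.6934},
    "bm25_retriever": {"faithfulness": 0.7085, "answer_relevancy": 0.8600, "context_recall": 0.8431, "context_precision": 0.7010, "answer_correctness": 0.6436},
    "multi_query_retriever": {"faithfulness": 0.7107, "answer_relevancy": 0.8506, "context_recall": 0.9314, "context_precision": 0.7168, "answer_correctness": 0.6541},
    "parent_document_retriever": {"faithfulness": 0.6672, "answer_relevancy": 0.9118, "context_recall": 0.7353, "context_precision": 0.7353, "answer_correctness": 0.7265},
    "ensemble_retriever": {"faithfulness": 0.8111, "answer_relevancy": 0.9199, "context_recall": 0.9020, "context_precision": 0.7000, "answer_correctness": 0.7352},
    "semantic_retriever": {"faithfulness": 0.8436, "answer_relevancy": 0.9704, "context_recall": 0.8784, "context_precision": 0.7038, "answer_correctness": 0.6228},
}

def best_per_metric(results):
    """Map each metric to the retriever with the highest score on it."""
    metrics = next(iter(results.values())).keys()
    return {
        metric: max(results, key=lambda name: results[name][metric])
        for metric in metrics
    }

best = best_per_metric(output)
for metric, retriever in best.items():
    print(f"{metric:20} {retriever} ({output[retriever][metric]:.4f})")
```

No single retriever wins every metric, so the choice depends on which metric matters most for the application.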

As for cost, one can look at the LangSmith traces. The ensemble costs the most, as it runs several retrievers for each data point, including the multi-query retriever with its extra LLM call. Parent Document and BM25 are the cheapest, but BM25 is not recommended because of its weaker evaluation scores.
 
![image](langsmith.png)