# LangChain: Evaluation

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation
* LangChain evaluation platform

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY_4")
azure_openai_api_endpoint = os.getenv("AZURE_OPENAI_API_ENDPOINT_4")
deployment_name = os.getenv("AZURE_DEPLOYMENT_NAME_4")


## Create our QandA application

In [2]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [3]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [4]:
from langchain.embeddings import OpenAIEmbeddings


index = VectorstoreIndexCreator(
    embedding = OpenAIEmbeddings(),
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])


**The RetrievalQA chain was recently deprecated. It was used to build natural-language question answering over a data source using retrieval-augmented generation (RAG).**

In [7]:
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(api_key=azure_openai_api_key,
                        api_version="2023-12-01-preview",
                        azure_endpoint=azure_openai_api_endpoint,
                        model=deployment_name,
                        temperature=0.9
                        )

qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)

### Coming up with test datapoints

In [5]:
data[10]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 10}, page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.")

In [6]:
data[11]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 11}, page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.')

### Hard-coded examples

In [8]:
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]
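A side note on the string literals above: a backslash at the end of a line inside a quoted string continues the string, but the indentation of the following line becomes part of the text. That is why the first query shows up later in this notebook as `Do the Cozy Comfort Pullover Set        have side pockets?`. A minimal sketch of one way to avoid this, using adjacent string literals (the name `examples_clean` is hypothetical and is not used elsewhere in the notebook):

```python
# Adjacent string literals inside parentheses are concatenated by Python,
# so no indentation leaks into the query text.
examples_clean = [
    {
        "query": (
            "Do the Cozy Comfort Pullover Set "
            "have side pockets?"
        ),
        "answer": "Yes",
    },
    {
        "query": (
            "What collection is the Ultra-Lofty "
            "850 Stretch Down Hooded Jacket from?"
        ),
        "answer": "The DownTek collection",
    },
]

# repr() makes any stray whitespace visible.
for example in examples_clean:
    print(repr(example["query"]))
```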

### LLM-Generated examples

In [9]:
from langchain.evaluation.qa import QAGenerateChain


In [10]:
example_gen_chain = QAGenerateChain.from_llm(llm=llm)

In [11]:
example_gen_chain

QAGenerateChain(verbose=False, prompt=PromptTemplate(input_variables=['doc'], input_types={}, partial_variables={}, template='You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\n{doc}\n<End Document>'), llm=AzureChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x000002402F2ADA30>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x000002402F2AF470>, root_client=<openai.lib.azure.AzureOpenAI object at 0x0000024030D80B00>, root_async_client=<openai.lib.azure.AsyncAzureOpenAI object at 0x000002402F2ADA60>, model_name='gpt-4o-assistant', temperature=0.9, model_kwargs={}, openai_api_key=SecretStr('**********'), disab

In [12]:
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
) 



In [15]:
new_examples[0]

{'qa_pairs': {'query': "What features contribute to the comfort and support of the Women's Campside Oxfords as described in the document?",
  'answer': "The Women's Campside Oxfords feature a super-soft canvas for a broken-in feel, a comfortable EVA innersole with Cleansport NXT® antimicrobial odor control, a moderate arch contour of the innersole, and an EVA foam midsole for cushioning and support."}}

### Combine examples

In [20]:
examples += new_examples
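After this step `examples` mixes two shapes: the hard-coded entries are flat dicts with `query` and `answer` keys, while the generated ones are nested under a `qa_pairs` key (see `new_examples[0]` above). Here is a small sketch of a normalizing helper, assuming only these two shapes occur. The name `flatten_example` is hypothetical and the sample data is made up for illustration:

```python
def flatten_example(example: dict) -> dict:
    """Return a flat {'query', 'answer'} dict for either example shape."""
    # Generated examples wrap the pair in a 'qa_pairs' key; hard-coded ones do not.
    pair = example.get("qa_pairs", example)
    if "query" not in pair or "answer" not in pair:
        # Fail loudly rather than silently dropping an example, which would
        # misalign examples and predictions later on.
        raise ValueError(f"Unrecognized example shape: {example!r}")
    return {"query": pair["query"], "answer": pair["answer"]}


# Illustrative data only (not real catalog entries).
mixed = [
    {"query": "Does the set have side pockets?", "answer": "Yes"},
    {"qa_pairs": {"query": "What is the mat made of?", "answer": "Recycled polyester"}},
]
flat = [flatten_example(e) for e in mixed]
print(flat)
```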

In [17]:
qa.run(examples[0]["query"])

> Entering new RetrievalQA chain...

> Finished chain.


'Yes, the Cozy Comfort Pullover Set has side pockets.'

## Manual Evaluation

In [18]:
import langchain
langchain.debug = True

In [19]:
qa.run(examples[0]["query"])

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "Do the Cozy Comfort Pullover Set        have side pockets?"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "Do the Cozy Comfort Pullover Set        have side pockets?",
  "context": ": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditiona

'Yes, the Cozy Comfort Pullover Set has side pockets.'

In [20]:
# Turn off the debug mode
langchain.debug = False

## Your turn with the non-deprecated version

Create a retrieval QA chain with LangChain, with or without LCEL. Apply it to the `examples`, and test it with QAEvalChain as shown below.

Hint: use the documentation! In particular, see [here](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval_qa.base.RetrievalQA.html)

In [17]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# See full prompt at https://smith.langchain.com/hub/rlm/rag-prompt
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "<<<<>>>>>".join(doc.page_content for doc in docs)


qa_chain = (
    {
        "context": index.vectorstore.as_retriever() | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

qa_chain.invoke("What is the Mountain Man Fleece Jacket made of?")



'The Mountain Man Fleece Jacket is made from 100% recycled polyester.'

The LCEL implementation exposes the internals of retrieving the documents, formatting them, and passing them to the LLM through a prompt, but it is more verbose. You can customize this composition logic and wrap it in a helper function, or use the higher-level helper functions `create_retrieval_chain` and `create_stuff_documents_chain`.
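To make the formatting and prompting steps concrete without calling a model, here is a plain-Python sketch of what the "stuff" approach does with retrieved documents: it joins their text with the separator and substitutes the result into the prompt. The `Doc` class and `build_rag_prompt` function are hypothetical stand-ins, not LangChain APIs. The template text is copied from the `rlm/rag-prompt` output printed in this notebook, and the two documents are made-up snippets.

```python
from dataclasses import dataclass

SEPARATOR = "<<<<>>>>>"  # same separator as format_docs in this notebook

# Template text as printed for rlm/rag-prompt in this notebook.
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following "
    "pieces of retrieved context to answer the question. If you don't know "
    "the answer, just say that you don't know. Use three sentences maximum "
    "and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_docs(docs):
    # "Stuffing": concatenate every retrieved document into one context string.
    return SEPARATOR.join(doc.page_content for doc in docs)


def build_rag_prompt(question, docs):
    # Fill the template with the question and the stuffed context.
    return RAG_TEMPLATE.format(question=question, context=format_docs(docs))


# Made-up documents for illustration only.
docs = [
    Doc("name: Example Jacket\ndescription: Made of fleece."),
    Doc("name: Example Pants\ndescription: Made of cotton."),
]
prompt_text = build_rag_prompt("What is the Example Jacket made of?", docs)
print(prompt_text)
```

The printed string is roughly the text the model receives as its single human message. In the LCEL chain above, the retriever supplies the documents instead of a hard-coded list.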

## LLM-assisted evaluation

Original notebook:

In [None]:
predictions = qa.apply(examples)

In [None]:
from langchain.evaluation.qa import QAEvalChain

In [None]:
eval_chain = QAEvalChain.from_llm(llm)

In [None]:
graded_outputs = eval_chain.evaluate(examples, predictions)

In [None]:
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()

In [None]:
graded_outputs[0]

### Your turn:

Here we want to apply our qa_chain to all the examples, to get one prediction per example.

> How do you apply a chain to a list of examples?

A chain is a `RunnableSequence` object.

See the [documentation](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.RunnableSequence.html) to find out which methods can be called on a chain.

In [54]:
# set the LANGCHAIN_API_KEY environment variable (create key in settings)
from langchain import hub
prompt = hub.pull("langchain-ai/retrieval-qa-chat")
prompt



ChatPromptTemplate(input_variables=['context', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag='ChatMessageChunk')], typing.Annotated[langchain_core.messages.system.SystemMessageChunk, Tag(tag='SystemMes

In [46]:
# set the LANGCHAIN_API_KEY environment variable (create key in settings)
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")
prompt



ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])

In [49]:
from langchain.prompts import ChatPromptTemplate
prompt_template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {input} 
Context: {context} 
Answer:"""

prompt = ChatPromptTemplate.from_template(template=prompt_template)
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {input} \nContext: {context} \nAnswer:"), additional_kwargs={})])

In [58]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# See full prompt at https://smith.langchain.com/hub/rlm/rag-prompt
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "<<<<>>>>>".join(doc.page_content for doc in docs)


qa_chain = (
    {
        "context": index.vectorstore.as_retriever() | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

qa_chain.invoke("What is the Mountain Man Fleece Jacket made of?")



'The Mountain Man Fleece Jacket is made of 100% recycled polyester.'

In [77]:
batch_queries = []
examples_formatted = []

for example in examples:
    if 'query' in example:
        batch_queries.append(example['query'])
        examples_formatted.append(example)
    elif 'qa_pairs' in example:
        batch_queries.append(example['qa_pairs']['query'])
        examples_formatted.append(example['qa_pairs'])

batch_queries

['Do the Cozy Comfort Pullover Set        have side pockets?',
 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
 "What features contribute to the comfort and support of the Women's Campside Oxfords as described in the document?",
 'What are the primary materials used in the construction of the Recycled Waterhog Dog Mat, Chevron Weave, and what environmental benefits do they provide?',
 "What are the key features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece that ensure comfort and safety for the wearer?",
 'What are some key features of the Refresh Swimwear, V-Neck Tankini that make it suitable for watersports and how does it incorporate eco-friendly materials?',
 'What are the key features of the EcoFlex 3L Storm Pants that make them suitable for year-round outdoor activities, and how does the TEK O2 technology enhance their performance?']

In [78]:
examples_formatted

[{'query': 'Do the Cozy Comfort Pullover Set        have side pockets?',
  'answer': 'Yes'},
 {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek collection'},
 {'query': "What features contribute to the comfort and support of the Women's Campside Oxfords as described in the document?",
  'answer': "The Women's Campside Oxfords feature a super-soft canvas for a broken-in feel, a comfortable EVA innersole with Cleansport NXT® antimicrobial odor control, a moderate arch contour of the innersole, and an EVA foam midsole for cushioning and support."},
 {'query': 'What are the primary materials used in the construction of the Recycled Waterhog Dog Mat, Chevron Weave, and what environmental benefits do they provide?',
  'answer': 'The Recycled Waterhog Dog Mat, Chevron Weave, is constructed from 24 oz. polyester fabric made from 94% recycled materials and features rubber backing. The use of recycled plastic materials helps kee

In [80]:
predictions = qa_chain.batch(batch_queries)
predictions

['Yes, the Cozy Comfort Pullover Set has side pockets.',
 'The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.',
 "The Women's Campside Oxfords offer comfort and support through a super-soft canvas for a broken-in feel, a comfortable EVA innersole with antimicrobial odor control, and an EVA foam midsole for cushioning and support. They also feature a moderate arch contour and a chain-tread-inspired molded rubber outsole.",
 'The Recycled Waterhog Dog Mat, Chevron Weave, is constructed primarily from recycled plastic materials, with 94% of the 24 oz. polyester fabric being recycled, and features a rubber backing. The environmental benefits include keeping plastic out of landfills, trails, and oceans. This contributes to reducing waste and promoting sustainability.',
 "The key features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece ensuring comfort and safety are the four-way-stretch, chlorine-resistant fabric, and UPF 50+ sun protection, wh

In [81]:
predictions_as_dict = []

# eval_chain.evaluate expects a list of dicts for the predictions, just as for the question-answer examples
for prediction in predictions:
    predictions_as_dict.append({"prediction": prediction})
predictions_as_dict

[{'prediction': 'Yes, the Cozy Comfort Pullover Set has side pockets.'},
 {'prediction': 'The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.'},
 {'prediction': "The Women's Campside Oxfords offer comfort and support through a super-soft canvas for a broken-in feel, a comfortable EVA innersole with antimicrobial odor control, and an EVA foam midsole for cushioning and support. They also feature a moderate arch contour and a chain-tread-inspired molded rubber outsole."},
 {'prediction': 'The Recycled Waterhog Dog Mat, Chevron Weave, is constructed primarily from recycled plastic materials, with 94% of the 24 oz. polyester fabric being recycled, and features a rubber backing. The environmental benefits include keeping plastic out of landfills, trails, and oceans. This contributes to reducing waste and promoting sustainability.'},
 {'prediction': "The key features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece ensuring comfort and safety are 

In [82]:
from langchain.evaluation.qa import QAEvalChain

eval_chain = QAEvalChain.from_llm(llm)

In [88]:
graded_outputs = eval_chain.evaluate(examples_formatted, predictions_as_dict, prediction_key='prediction')



In [90]:
for i, example in enumerate(examples_formatted):
    print(f"Example {i}:")
    print("Question: " + examples_formatted[i]['query'])
    print("Real Answer: " + examples_formatted[i]['answer'])
    print("Predicted Answer: " + predictions_as_dict[i]['prediction'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()

Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set has side pockets.
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT

Example 2:
Question: What features contribute to the comfort and support of the Women's Campside Oxfords as described in the document?
Real Answer: The Women's Campside Oxfords feature a super-soft canvas for a broken-in feel, a comfortable EVA innersole with Cleansport NXT® antimicrobial odor control, a moderate arch contour of the innersole, and an EVA foam midsole for cushioning and support.
Predicted Answer: The Women's Campside Oxfords offer comfort and support through a super-soft canvas for a broken-in feel, a comfortable EVA innerso
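
Once every example has a grade, a simple tally gives an overall score. The sketch below assumes each entry of `graded_outputs` carries its grade under the `'results'` key, as in the loop above, and that grades are the strings `CORRECT` or `INCORRECT`. Only `CORRECT` appears in the output shown here, so the second label is an assumption. The sample data and the name `summarize_grades` are hypothetical.

```python
from collections import Counter


def summarize_grades(graded_outputs, key="results"):
    """Count grades and return (counts, accuracy)."""
    # Compare the whole normalized string: a substring check would be wrong
    # because "INCORRECT" contains "CORRECT".
    grades = [g[key].strip().upper() for g in graded_outputs]
    counts = Counter(grades)
    accuracy = counts["CORRECT"] / len(grades) if grades else 0.0
    return counts, accuracy


# Hypothetical grades for illustration, not results from this notebook.
sample = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": " INCORRECT"},
    {"results": "CORRECT"},
]
counts, accuracy = summarize_grades(sample)
print(dict(counts), f"accuracy={accuracy:.2f}")
```

Because the grader is itself an LLM, a tally like this is best read alongside a manual spot check of a few graded examples.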

## [LangSmith Platform](https://smith.langchain.com/)

What is LangSmith? What can you do with it?

Among other things, LangSmith helps you trace and evaluate your language model applications and intelligent agents, so you can move from prototype to production.