# LangChain: Evaluation

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation
* LangChain evaluation platform

In [1]:
import os
import openai

#openai.api_key = os.environ['OPENAI_API_KEY']

## Create our QandA application

In [2]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.text_splitter import CharacterTextSplitter

## Load and split data

In [3]:

file = '../data/soilsense_info.txt'
loader = TextLoader(file_path=file)
data = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=40)
# CharacterTextSplitter splits on a separator (by default a blank line) rather than at a fixed character count,
# so it avoids cutting in the middle of a paragraph; as a result, some chunks can be larger than the specified chunk_size
data_split = text_splitter.split_documents(data)

print('Number of chunks: ', len(data_split))
print('First chunk: ', data_split[0])

Created a chunk of size 626, which is longer than the specified 300
Created a chunk of size 589, which is longer than the specified 300
Created a chunk of size 719, which is longer than the specified 300
Created a chunk of size 814, which is longer than the specified 300
Created a chunk of size 335, which is longer than the specified 300


Number of chunks:  29
First chunk:  page_content='Take the guesswork out of irrigation\nWe offer a simple, robust and affordable soil sensor system to help orchard managers, greenhouse growers and high-value field crop farmers manage and optimize irrigation\n\nSave water\nAvoid over-irrigation and irrigate only when necessary' metadata={'source': '../data/soilsense_info.txt'}
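
The "longer than the specified 300" warnings above follow from splitting on a separator instead of at a fixed length. Below is a minimal sketch of that idea in plain Python. It is a simplified illustration, not LangChain's actual implementation: the function name `split_on_separator` is hypothetical, and chunk overlap is ignored.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Simplified illustration: split on a separator, then greedily merge pieces.

    Hypothetical helper, not the LangChain implementation; overlap is ignored.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # Still within the limit: keep merging into the current chunk
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece longer than chunk_size is kept whole,
            # which is how an oversized chunk can appear
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "short paragraph\n\n" + "x" * 50 + "\n\nanother short one"
chunks = split_on_separator(sample, chunk_size=30)
print([len(c) for c in chunks])
```

The middle paragraph is 50 characters long and contains no separator, so it ends up as one chunk that exceeds the requested size of 30.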


## Create vectorstore
Using OpenAI embeddings

In [4]:
## Create vectorstore
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

db = DocArrayInMemorySearch.from_documents(
    data_split, 
    embeddings
)

## Setting up Retriever and Q&A Chain

In [5]:
#Have the retriever find the 3 most similar docs
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":3})
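
To make the idea of "the 3 most similar docs" concrete, here is a small sketch of top-k retrieval over toy vectors. It assumes cosine similarity as the metric and uses hypothetical helper names (`cosine_similarity`, `top_k`); it is not how DocArrayInMemorySearch is implemented internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings"; real embeddings have many more dimensions
toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_query = [1.0, 0.0]
print(top_k(toy_query, toy_docs, k=3))
```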

In [44]:
llm = ChatOpenAI(temperature = 0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True,
    return_source_documents=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
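
As the debug output in the Manual Evaluation section shows, the "stuff" chain type places all retrieved chunks into a single context string, joined by the configured document separator. A rough sketch of that step follows; the function name `stuff_documents` and the prompt wording are assumptions for illustration only, not LangChain's actual prompt template.

```python
def stuff_documents(texts, question, separator="<<<<>>>>>"):
    """Join retrieved chunk texts into one context and build a prompt.

    Hypothetical helper; the prompt wording is made up for illustration.
    """
    context = separator.join(texts)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}"
    )
    return context, prompt


context, prompt = stuff_documents(
    ["first retrieved chunk", "second retrieved chunk"],
    "Why did Markhaven switch?",
)
print(context)
```

A distinctive separator such as "<<<<>>>>>" makes it easy to see in the debug printout where one retrieved chunk ends and the next begins.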

### Coming up with test datapoints
Below we inspect some of the document chunks, to come up with questions that can be answered from the given paragraphs.

In [7]:
data_split[3]

Document(page_content="Ole, Operational Manager at Ørskov Foods\nØrskov Foods is Denmark's largest producer of apples. They are using SoilSense to control the growth of their trees in order to increase yield and optimize water usage.", metadata={'source': '../data/soilsense_info.txt'})

In [8]:
data_split[5]

Document(page_content='Real insights, automatic data analysis\nObtaining soil data is only half of the solution. If you need to spend hours analyzing graphs to make a decision, the system will not be used. That´s why we focus on providing simple insights based on our unique automatic data analysis', metadata={'source': '../data/soilsense_info.txt'})

### Hard-coded examples

In [59]:
examples = [
    {
        "query": "Are there any apple orchards using your product?",
        "answer": "Yes, Ørskov Food is for example a customer"
    },
    {
        "query": "Does the system automatically conduct data analysis on the soil measurements?",
        "answer": "Yes"
    }
]

### LLM-Generated examples

In [39]:
from langchain.evaluation.qa import QAGenerateChain
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())


In [45]:
import langchain
langchain.debug = False

new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data_split[:5]]
)

Let us inspect the source of data and one of the generated questions

In [53]:
new_examples[0]

{'qa_pairs': {'query': 'What does the soil sensor system offered by the company aim to help orchard managers, greenhouse growers, and high-value field crop farmers with?',
  'answer': 'The soil sensor system offered by the company aims to help orchard managers, greenhouse growers, and high-value field crop farmers manage and optimize irrigation.'}}

In [54]:
data_split[0]

Document(page_content='Take the guesswork out of irrigation\nWe offer a simple, robust and affordable soil sensor system to help orchard managers, greenhouse growers and high-value field crop farmers manage and optimize irrigation\n\nSave water\nAvoid over-irrigation and irrigate only when necessary', metadata={'source': '../data/soilsense_info.txt'})

The question looks reasonable, but QAGenerateChain returns a nested dict with the top-level key "qa_pairs". This key is defined in the LangChain source code, although the online example does not show it. Either way, it will cause the code to fail later on, because the automatic evaluation expects the query key to be at the top level.

In [55]:
new_examples[0]['qa_pairs']

{'query': 'What does the soil sensor system offered by the company aim to help orchard managers, greenhouse growers, and high-value field crop farmers with?',
 'answer': 'The soil sensor system offered by the company aims to help orchard managers, greenhouse growers, and high-value field crop farmers manage and optimize irrigation.'}


**I therefore flatten each dict with a list comprehension**

In [50]:
new_examples_flat = [{'query': item['qa_pairs']['query'], 'answer': item['qa_pairs']['answer']} for item in new_examples]


In [52]:
new_examples_flat[0]

{'query': 'What does the soil sensor system offered by the company aim to help orchard managers, greenhouse growers, and high-value field crop farmers with?',
 'answer': 'The soil sensor system offered by the company aims to help orchard managers, greenhouse growers, and high-value field crop farmers manage and optimize irrigation.'}
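
Since the nesting seems to depend on the LangChain version, the flattening step could be written so that it accepts both shapes. The sketch below assumes only the two shapes seen in this notebook (with and without the "qa_pairs" key); the helper name `flatten_example` is hypothetical.

```python
def flatten_example(item):
    """Return a flat {'query', 'answer'} dict from either a nested or a flat item."""
    # Use the nested dict if present, otherwise treat the item itself as flat
    inner = item.get("qa_pairs", item)
    return {"query": inner["query"], "answer": inner["answer"]}


nested = {"qa_pairs": {"query": "What is measured?", "answer": "Soil moisture"}}
flat = {"query": "What is measured?", "answer": "Soil moisture"}

print(flatten_example(nested))
print(flatten_example(flat))
```

With such a helper, the list comprehension above would become `[flatten_example(item) for item in new_examples]`.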

### Combine examples

In [60]:
examples += new_examples_flat

Inspecting the resulting "ground truth" question and its correct answer:

In [62]:
examples[4]['query']

'Why did Markhaven switch from their previous soil sensor system to SoilSense?'

In [63]:
examples[4]['answer'] 

'Markhaven switched from their previous soil sensor system to SoilSense because it was more reliable and easier to use.'

Testing out the LLM to see if it answers correctly:

In [66]:
response = qa({"query": examples[4]["query"]})



> Entering new RetrievalQA chain...

> Finished chain.


In [67]:
response

{'query': 'Why did Markhaven switch from their previous soil sensor system to SoilSense?',
 'result': 'Markhaven switched from their previous soil sensor system to SoilSense because it was more reliable and easier to use.',
 'source_documents': [Document(page_content='Listen to our customers\nJesper, Operational Manager at Markhaven\nMarkhaven is a large supplier of organic tomatoes and cucumbers. In 2020 they decided to switch their previous soil sensor system to SoilSense, because it was more reliable and easier to use.', metadata={'source': '../data/soilsense_info.txt'}),
  Document(page_content='Avoiding plant stress\nOur accurate soil-moisture sensors make sure that you will be warned before the soil gets too dry and starts limiting growth.\n\nSaving water\nAs the first system in the world, our algorithms automatically detect field capacity and warn you if you over-irrigate.', metadata={'source': '../data/soilsense_info.txt'}),
  Document(page_content='The best on the market\nThe 

## Manual Evaluation
Set langchain.debug = True to get a detailed printout of what is happening in the chain

In [68]:
import langchain
langchain.debug = True

In [70]:
qa({"query": examples[4]["query"]})

[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "Why did Markhaven switch from their previous soil sensor system to SoilSense?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "Why did Markhaven switch from their previous soil sensor system to SoilSense?",
  "context": "Listen to our customers\nJesper, Operational Manager at Markhaven\nMarkhaven is a large supplier of organic tomatoes and cucumbers. In 2020 they decided to switch their previous soil sensor system to SoilSense, because it was more reliable and easier to use.<<<<>>>>>Avoiding plant stress\nOur accurate soil-moisture sensors make sure that you will be warned before the soil gets too dry and starts limiting growth.\n\nSaving wate

{'query': 'Why did Markhaven switch from their previous soil sensor system to SoilSense?',
 'result': 'Markhaven switched from their previous soil sensor system to SoilSense because it was more reliable and easier to use.',
 'source_documents': [Document(page_content='Listen to our customers\nJesper, Operational Manager at Markhaven\nMarkhaven is a large supplier of organic tomatoes and cucumbers. In 2020 they decided to switch their previous soil sensor system to SoilSense, because it was more reliable and easier to use.', metadata={'source': '../data/soilsense_info.txt'}),
  Document(page_content='Avoiding plant stress\nOur accurate soil-moisture sensors make sure that you will be warned before the soil gets too dry and starts limiting growth.\n\nSaving water\nAs the first system in the world, our algorithms automatically detect field capacity and warn you if you over-irrigate.', metadata={'source': '../data/soilsense_info.txt'}),
  Document(page_content='The best on the market\nThe 

In [71]:
# Turn off the debug mode
langchain.debug = False
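
Because the debug switch is a module-level flag, it is easy to forget to turn it off again. One possible pattern is a small context manager that restores the previous value automatically. This is only a sketch: `temporarily_set` is a hypothetical helper, and a `SimpleNamespace` stands in for the langchain module so that the example runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the old value."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore even if the body raised an exception
        setattr(obj, name, previous)


# Stand-in for the langchain module, used here only for illustration
fake_module = SimpleNamespace(debug=False)

with temporarily_set(fake_module, "debug", True):
    inside = fake_module.debug  # flag is on inside the block
after = fake_module.debug  # flag is back to its previous value

print(inside, after)
```

In the notebook this would presumably be used as `with temporarily_set(langchain, "debug", True): qa({"query": ...})`.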

## LLM-assisted evaluation

In [72]:
examples

[{'query': 'Are there any apple orchards using your product?',
  'answer': 'Yes, Ørskov Food is for example a customer'},
 {'query': 'Does the system automatically conduct data analysis on the soil measurements?',
  'answer': 'Yes'},
 {'query': 'What does the soil sensor system offered by the company aim to help orchard managers, greenhouse growers, and high-value field crop farmers with?',
  'answer': 'The soil sensor system offered by the company aims to help orchard managers, greenhouse growers, and high-value field crop farmers manage and optimize irrigation.'},
 {'query': 'What are the benefits of irrigating correctly throughout the varied phenological stages of crops?',
  'answer': 'The benefits of irrigating correctly throughout the varied phenological stages of crops include increasing yield, reducing risk by getting notified immediately if the soil is getting too dry or an over-irrigation occurs, and saving time by eliminating the need for manual soil inspection.'},
 {'query':

In [73]:
predictions = qa.apply(examples)



> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


In [74]:
from langchain.evaluation.qa import QAEvalChain

In [75]:
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

In [79]:
graded_outputs = eval_chain.evaluate(examples, predictions)

In [80]:
predictions[0]

{'query': 'Are there any apple orchards using your product?',
 'answer': 'Yes, Ørskov Food is for example a customer',
 'result': "Yes, Ørskov Foods, Denmark's largest producer of apples, is using SoilSense to control the growth of their apple trees and optimize water usage.",
 'source_documents': [Document(page_content="Ole, Operational Manager at Ørskov Foods\nØrskov Foods is Denmark's largest producer of apples. They are using SoilSense to control the growth of their trees in order to increase yield and optimize water usage.", metadata={'source': '../data/soilsense_info.txt'}),
  Document(page_content='Proved in 8 countries on 3 continents\nFrom a Peruvian desert to a Swedish winter. Our system is currently being used by farmers in Denmark, Germany, Sweden, Lithuania, Kenya, and Peru.', metadata={'source': '../data/soilsense_info.txt'}),
  Document(page_content='Listen to our customers\nJesper, Operational Manager at Markhaven\nMarkhaven is a large supplier of organic tomatoes and c

In [81]:
graded_outputs

[{'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'}]
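
With more than a handful of examples, reading the grades one by one becomes tedious. The sketch below tallies them into a count per grade and an overall accuracy. It assumes the output shape shown above (a list of dicts with a 'results' key); the 'INCORRECT' label in the sample data and the helper name `summarize_grades` are assumptions.

```python
from collections import Counter


def summarize_grades(graded):
    """Count the grades and compute the share of 'CORRECT' answers."""
    counts = Counter(g["results"].strip().upper() for g in graded)
    total = sum(counts.values())
    accuracy = counts.get("CORRECT", 0) / total if total else 0.0
    return counts, accuracy


# Made-up sample in the same shape as graded_outputs
sample_grades = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
    {"results": "CORRECT"},
]
counts, accuracy = summarize_grades(sample_grades)
print(dict(counts), accuracy)
```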

In [82]:
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()

Example 0:
Question: Are there any apple orchards using your product?
Real Answer: Yes, Ørskov Food is for example a customer
Predicted Answer: Yes, Ørskov Foods, Denmark's largest producer of apples, is using SoilSense to control the growth of their apple trees and optimize water usage.
Predicted Grade: CORRECT

Example 1:
Question: Does the system automatically conduct data analysis on the soil measurements?
Real Answer: Yes
Predicted Answer: Yes, the system conducts automatic data analysis on the soil measurements to provide simple insights.
Predicted Grade: CORRECT

Example 2:
Question: What does the soil sensor system offered by the company aim to help orchard managers, greenhouse growers, and high-value field crop farmers with?
Real Answer: The soil sensor system offered by the company aims to help orchard managers, greenhouse growers, and high-value field crop farmers manage and optimize irrigation.
Predicted Answer: The soil sensor system offered by the company aims to help orc

## LangChain evaluation platform

The LangChain evaluation platform, LangChain Plus, can be accessed at https://www.langchain.plus/.