# LangChain: Evaluation

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation

In [1]:
import os
import openai

openai.api_key = os.environ['OPENAI_API_KEY']

import warnings
warnings.filterwarnings('ignore')

## Create our Q&A application

In [2]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [3]:
file = 'Tweets.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [4]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [5]:
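llm = ChatOpenAI(temperature=0.0)  # temperature 0 for more repeatable answers
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # put all retrieved documents into a single prompt
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={
        # Separator placed between documents inside the stuffed prompt
        "document_separator": "<<<<>>>>>"
    }
)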

### Coming up with test datapoints

In [6]:
data[10]

Document(page_content='\ufeffTweet: 微调LLM+向量记忆索引+text to video+control net+deepfake换脸+声音克隆+LLM微调情绪标签，最后再加个高清VR。\nProperty: https://twitter.com/goldengrape\nCreated: June 7, 2023 6:37 AM\nLink: \nTags: Digitlife\nTweet Link: https://twitter.com/goldengrape/status/1666134189488640000\nType: Tweet', metadata={'source': 'Tweets.csv', 'row': 10})

In [7]:
data[11]

Document(page_content='\ufeffTweet: 苹果官方介绍并没有把 #AppleVisionPro  定义为XR眼镜，操作系统也不是XrOS，苹果的定义是进入空间计算时代，连接数字世界和物理世界的桥梁。似乎苹果有着更大的野心和规划。\nProperty: https://twitter.com/xiaohuggg\nCreated: June 6, 2023 1:00 PM\nLink: \nTags: \nTweet Link: https://twitter.com/xiaohuggg/status/1665896981259231232\nType: Tweet', metadata={'source': 'Tweets.csv', 'row': 11})

### Hard-coded examples

In [9]:
examples = [
    {
        "query": "Does Apple published new \
            AR VR devices?",
        "answer": "Yes"
    },
    {
        "query": "How to make use of  \
        AI for personal productivity?",
        "answer": "Make use of OpenAI for work flow and coding"
    }
]
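The backslash continuations inside the query strings above carry the next line's indentation into the string itself, which is why these queries later print with long runs of spaces. A minimal sketch of collapsing that whitespace before evaluation follows. `normalize_query` and `normalize_examples` are hypothetical helpers, not part of LangChain, and the sample list only mirrors the shape of `examples`.

```python
import re


def normalize_query(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def normalize_examples(examples):
    """Return new example dicts with whitespace-normalized queries.

    The input list and its dicts are left unmodified.
    """
    return [{**ex, "query": normalize_query(ex["query"])} for ex in examples]


# Illustrative sample with the same stray indentation as the cell above.
sample = [
    {"query": "Does Apple published new             AR VR devices?",
     "answer": "Yes"},
]
cleaned = normalize_examples(sample)
print(cleaned[0]["query"])
```

Normalizing into a new list keeps the original examples available for comparison with earlier outputs.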

### LLM-Generated examples

In [10]:
from langchain.evaluation.qa import QAGenerateChain


In [11]:
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

In [12]:
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)

In [36]:
import langchain
langchain.debug = True

In [37]:
new_examples[3]

{'query': 'Who are the two researchers who wrote the book "Why greatness can\'t be planned-the myth of the objectives" and when was it translated into Chinese?',
 'answer': 'The two researchers who wrote the book are @kenneth0stanley and @joelbot3000, and it was translated into Chinese two months prior to the tweet in March 2015.'}

In [16]:
data[0]

Document(page_content='\ufeffTweet: CHAT GPT几个相对实用的场景\nProperty: https://twitter.com/vista8\nCreated: June 9, 2023 12:38 PM\nLink: \nTags: Tool\nTweet Link: https://twitter.com/vista8/status/1658508138050629632\nType: Tweet', metadata={'source': 'Tweets.csv', 'row': 0})

### Combine examples

In [17]:
examples += new_examples
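Because the combined list mixes hand-written and LLM-generated entries, a quick shape check before running the chain can catch malformed examples early. The sketch below assumes every example should be a dict with non-empty string values under `query` and `answer`. `validate_examples` is a hypothetical helper and the sample data is made up for illustration.

```python
def validate_examples(examples):
    """Return a list of (index, problem) pairs for malformed examples."""
    problems = []
    for i, ex in enumerate(examples):
        for key in ("query", "answer"):
            value = ex.get(key) if isinstance(ex, dict) else None
            # A usable field is a string with at least one non-space character.
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems


# Illustrative data: one good entry and two malformed ones.
combined = [
    {"query": "Q1?", "answer": "Yes"},
    {"query": "Q2?"},                         # answer missing
    {"query": "   ", "answer": "Something"},  # blank query
]
issues = validate_examples(combined)
print(issues)
```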

In [None]:
examples[0]
examples[1]
qa.run(examples[0]["query"])

## Manual Evaluation

In [19]:
import langchain
langchain.debug = True

In [20]:
qa.run(examples[1]["query"])



> Entering new RetrievalQA chain...

> Finished chain.


"One way to make use of AI for personal productivity is to integrate it into your workflow. This can involve using AI-powered tools and services to automate repetitive tasks, such as scheduling appointments or organizing your email inbox. Additionally, AI can be used to analyze data and provide insights that can help you make better decisions and optimize your workflow. Some examples of AI-powered tools that can be used for personal productivity include chatbots like ChatGPT, code completion tools like GitHub Copilot, and virtual assistants like Siri or Alexa. It's important to note that while AI can be a powerful tool for personal productivity, it's not a silver bullet solution and should be used in conjunction with other productivity strategies and techniques."

In [21]:
# Turn off the debug mode
langchain.debug = False
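Switching the flag on and off by hand makes it easy to leave debug output enabled by accident. One possible pattern is a small context manager that restores the previous value on exit. This is a sketch, not a LangChain feature: `temporary_attr` is a hypothetical helper, and a `SimpleNamespace` stands in for the `langchain` module so the example runs without the library installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore it."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the block raises, so the flag cannot stay switched on.
        setattr(obj, name, previous)


# Stand-in for the `langchain` module (assumption for this sketch).
fake_langchain = SimpleNamespace(debug=False)

with temporary_attr(fake_langchain, "debug", True):
    inside = fake_langchain.debug  # debug is on inside the block
after = fake_langchain.debug       # previous value is back after the block
print(inside, after)
```

In the notebook, the same pattern would wrap the `qa.run(...)` call, with the real module passed in place of the stand-in.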

## LLM-assisted evaluation

In [22]:
predictions = qa.apply(examples)



> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


In [23]:
from langchain.evaluation.qa import QAEvalChain

In [24]:
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

In [26]:
graded_outputs = eval_chain.evaluate(examples, predictions)

In [27]:
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()

Example 0:
Question: Does Apple published new             AR VR devices?
Real Answer: Yes
Predicted Answer: There is no information in the provided context that suggests Apple has published new AR/VR devices. The first tweet mentions that Apple has released job descriptions related to AR/VR and mentions a potential use of LLM technology in these devices. The second tweet discusses Apple's vision for their technology, but does not mention any new devices being released.
Predicted Grade: INCORRECT

Example 1:
Question: How to make use of          AI for personal productivity?
Real Answer: Make use of OpenAI for work flow and coding
Predicted Answer: One way to make use of AI for personal productivity is to integrate it into your workflow. This can involve using AI-powered tools and services to automate repetitive tasks, such as scheduling appointments or organizing your email inbox. Additionally, AI can be used to analyze data and provide insights that can help you make better decisions 
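Reading grades one example at a time gets tedious as the example set grows, so a small tally can help. The sketch below assumes each graded output is a dict whose `text` field holds a label such as `CORRECT` or `INCORRECT`, matching the printout above. `summarize_grades` is a hypothetical helper and the sample grades are invented for illustration, not results from this notebook.

```python
from collections import Counter


def summarize_grades(graded_outputs, key="text"):
    """Count grade labels and compute the share graded CORRECT."""
    # Normalize case and surrounding whitespace before counting.
    labels = [g[key].strip().upper() for g in graded_outputs]
    counts = Counter(labels)
    accuracy = counts["CORRECT"] / len(labels) if labels else 0.0
    return counts, accuracy


# Invented grades in the same shape as `graded_outputs`.
sample_grades = [
    {"text": "INCORRECT"},
    {"text": "CORRECT"},
    {"text": " correct\n"},
]
counts, accuracy = summarize_grades(sample_grades)
print(dict(counts), round(accuracy, 2))
```

Labels are compared by exact match after normalization, so `INCORRECT` is never counted as `CORRECT` even though it contains that substring.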

In [28]:
examples

[{'query': 'Does Apple published new             AR VR devices?',
  'answer': 'Yes'},
 {'query': 'How to make use of          AI for personal productivity?',
  'answer': 'Make use of OpenAI for work flow and coding'},
 {'query': 'What is the source of the metadata in this document?',
  'answer': "The source of the metadata in this document is 'Tweets.csv'."},
 {'query': 'What is the Twitter handle of the account mentioned in the document?',
  'answer': 'The Twitter handle mentioned in the document is @fuxiangPro.'},
 {'query': 'What are the two main things necessary to create a successful product according to the tweet?',
  'answer': 'The two main things necessary to create a successful product according to the tweet are building and selling.'},
 {'query': 'Who are the two researchers who wrote the book "Why greatness can\'t be planned-the myth of the objectives" and when was it translated into Chinese?',
  'answer': 'The two researchers who wrote the book are @kenneth0stanley and @joe

In [29]:
predictions

[{'query': 'Does Apple published new             AR VR devices?',
  'answer': 'Yes',
  'result': "There is no information in the provided context that suggests Apple has published new AR/VR devices. The first tweet mentions that Apple has released job descriptions related to AR/VR and mentions a potential use of LLM technology in these devices. The second tweet discusses Apple's vision for their technology, but does not mention any new devices being released."},
 {'query': 'How to make use of          AI for personal productivity?',
  'answer': 'Make use of OpenAI for work flow and coding',
  'result': "One way to make use of AI for personal productivity is to integrate it into your workflow. This can involve using AI-powered tools and services to automate repetitive tasks, such as scheduling appointments or organizing your email inbox. Additionally, AI can be used to analyze data and provide insights that can help you make better decisions and optimize your workflow. Some examples of 
