# LangChain: Evaluation

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation
* LangChain evaluation platform

This section covers how to evaluate the performance of a LangChain application.

## Create our QandA application

In [1]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [2]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [3]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.embeddings import HuggingFaceEmbeddings

# Initialize the Hugging Face embedding model
embeddings = HuggingFaceEmbeddings()

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings
).from_loaders([loader])


1. Download deepseek-r1:32b with ollama.  
2. Start the ollama service in the background: nohup ollama serve  >/dev/null 2>&1 &  
3. I ran this on a 3090 with 24 GB of VRAM, of which 21 GB was in use.

In [4]:
llm = ChatOpenAI(
    model_name='qwen2',
    openai_api_base="http://127.0.0.1:11434/v1",
    openai_api_key="EMPTY",
    streaming=False
)


In [5]:
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
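
The `"stuff"` chain type places all retrieved documents into a single prompt, joined by `document_separator`. The following is a rough sketch of that idea in plain Python, not LangChain's actual implementation; `stuff_documents`, `build_prompt`, and the prompt wording are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: mimics how a "stuff" chain builds one context string.
# `stuff_documents` and `build_prompt` are hypothetical, not LangChain APIs.

def stuff_documents(page_contents, separator="<<<<>>>>>"):
    """Join the text of all retrieved documents into one context string."""
    return separator.join(page_contents)


def build_prompt(question, page_contents, separator="<<<<>>>>>"):
    """Place the joined context and the question into a single prompt."""
    context = stuff_documents(page_contents, separator)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Made-up stand-ins for the page_content of two retrieved documents.
retrieved = [
    "name: Cozy Comfort Pullover Set, Stripe",
    "name: Ultra-Lofty 850 Stretch Down Hooded Jacket",
]
prompt = build_prompt("Do the pants have side pockets?", retrieved)
print(prompt)
```

A distinctive separator such as `<<<<>>>>>` makes document boundaries easy to spot when inspecting the prompt in debug output.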

### Coming up with test datapoints

In [6]:
data[10]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 10}, page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.")

In [7]:
data[11]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 11}, page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.')

### Hard-coded examples

Method 1: write the question-answer pairs by hand.

In [8]:
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]
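
Note that a backslash continuation inside a string literal keeps the indentation of the following line, which is why these queries show a run of spaces in later output. A minimal sketch of one way to tidy such strings, assuming the extra whitespace is unwanted; `normalize_query` is a hypothetical helper, not part of LangChain.

```python
# A backslash inside a string literal joins the next line, indentation included.
query = "Do the Cozy Comfort Pullover Set\
        have side pockets?"


def normalize_query(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


print(repr(query))                   # still contains the indentation spaces
print(repr(normalize_query(query)))  # single spaces only
```

Alternatively, adjacent string literals inside parentheses avoid the problem at the source, since Python concatenates them without adding whitespace.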

### LLM-Generated examples

Method 2: use QAGenerateChain to automatically generate question-answer pairs from the documents.

In [9]:
from langchain.evaluation.qa import QAGenerateChain


In [10]:
example_gen_chain = QAGenerateChain.from_llm(llm)

In [11]:
# the warning below can be safely ignored

In [12]:
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)



In [13]:
new_examples[0]

{'qa_pairs': {'query': "What are the key features of Women's Campside Oxfords?",
  'answer': "The Women's Campside Oxfords come with several notable features:"}}

In [14]:
data[0]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0}, page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.")

### Combine examples

In [15]:
examples += new_examples

In [16]:
qa.run(examples[0]["query"])


> Entering new RetrievalQA chain...

> Finished chain.


'Yes, the Cozy Comfort Pullover Set has pull-on pants with side pockets.'

## Manual Evaluation

In [17]:
import langchain
langchain.debug = True

In [18]:
qa.run(examples[0]["query"])

[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "Do the Cozy Comfort Pullover Set        have side pockets?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "Do the Cozy Comfort Pullover Set        have side pockets?",
  "context": ": 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features \

'Yes, the Cozy Comfort Pullover Set does have side pockets.'

In [19]:
# Turn off the debug mode
langchain.debug = False

## LLM-assisted evaluation

In [21]:
examples

[{'query': 'Do the Cozy Comfort Pullover Set        have side pockets?',
  'answer': 'Yes'},
 {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek collection'},
 {'qa_pairs': {'query': "What are the key features of Women's Campside Oxfords?",
   'answer': "The Women's Campside Oxfords come with several notable features:"}},
 {'qa_pairs': {'query': 'What makes the Waterhog dog mat eco-friendly?',
   'answer': "The Waterhog dog mat is eco-friendly because it's constructed from recycled plastic materials, thereby helping to keep plastic out of landfills, trails, and oceans."}},
 {'qa_pairs': {'query': "What features does the Infant and Toddler Girls' Coastal Chill Swimsuit offer in terms of sun protection?",
   'answer': "The Infant and Toddler Girls' Coastal Chill Swimsuit offers UPF 50+ rated fabric which provides the highest rated sun protection possible, blocking 98% of the sun's harmful rays."}},
 {'qa_pairs': {'query':

In [22]:
print(qa.input_keys)

['query']
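
`qa.input_keys` shows that the chain looks up each example under the key `query`, so every example needs that key at the top level. The LLM-generated examples above are nested under `qa_pairs` and would not satisfy this. A small sketch of a check for it, using a hypothetical helper `find_malformed` and made-up sample data:

```python
# Hypothetical helper: report examples that lack the keys the chain expects.

def find_malformed(examples, required_keys=("query", "answer")):
    """Return the indices of examples missing any required top-level key."""
    return [
        i for i, example in enumerate(examples)
        if not all(key in example for key in required_keys)
    ]


# Made-up data: the second item mimics the nested 'qa_pairs' shape.
sample = [
    {"query": "Do the pants have side pockets?", "answer": "Yes"},
    {"qa_pairs": {"query": "What is it made of?", "answer": "Canvas"}},
]
print(find_malformed(sample))
```

Running a check like this before calling the chain surfaces shape problems early rather than as a `KeyError` midway through a batch.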


In [23]:
# Convert every example to a uniform {"query", "answer"} format
fixed_examples = []
for example in examples:
    if 'qa_pairs' in example:
        fixed_examples.append({
            "query": example['qa_pairs']['query'],
            "answer": example['qa_pairs']['answer']
        })
    else:
        fixed_examples.append(example)

# Print the fixed examples
print(fixed_examples)

[{'query': 'Do the Cozy Comfort Pullover Set        have side pockets?', 'answer': 'Yes'}, {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?', 'answer': 'The DownTek collection'}, {'query': "What are the key features of Women's Campside Oxfords?", 'answer': "The Women's Campside Oxfords come with several notable features:"}, {'query': 'What makes the Waterhog dog mat eco-friendly?', 'answer': "The Waterhog dog mat is eco-friendly because it's constructed from recycled plastic materials, thereby helping to keep plastic out of landfills, trails, and oceans."}, {'query': "What features does the Infant and Toddler Girls' Coastal Chill Swimsuit offer in terms of sun protection?", 'answer': "The Infant and Toddler Girls' Coastal Chill Swimsuit offers UPF 50+ rated fabric which provides the highest rated sun protection possible, blocking 98% of the sun's harmful rays."}, {'query': 'What makes the Refresh Swimwear, V-Neck Tankini Contrasts eco-friendly?

In [25]:
predictions = qa.apply(fixed_examples)



> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


In [26]:
from langchain.evaluation.qa import QAEvalChain

In [27]:
eval_chain = QAEvalChain.from_llm(llm)

In [29]:
graded_outputs = eval_chain.evaluate(fixed_examples, predictions)



In [32]:
for i, eg in enumerate(fixed_examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
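    # The grade is stored under 'results' in this run (see graded_outputs[0] below);
    # some older LangChain versions used 'text' instead.
    # print("Predicted Grade: " + graded_outputs[i]['results'])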
    print()

Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set has side pockets.

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.

Example 2:
Question: What are the key features of Women's Campside Oxfords?
Real Answer: The Women's Campside Oxfords come with several notable features:
Predicted Answer: The key features of Women's Campside Oxfords include:

1. **Ultracomfortable Design**: They boast a super-soft canvas material that provides a broken-in feel from the first time you wear them.
2. **Quality Construction**: The construction ensures durability and comfort throughout use.
3. **Comfortable Innersole**: Thick cushioning is provided by an EVA innersole with Cleansport NXT® antimicrobial odor control, which help

In [33]:
graded_outputs[0]

{'results': 'CORRECT'}
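
Once every example has a grade, the grades can be tallied into a single score. The sketch below is one possible approach, not part of LangChain: `classify_grade` and `summarize_grades` are hypothetical helpers, the sample data is made up, and it assumes each grade is stored under the `results` key as in the output above. Because a local model may wrap the grade in extra text, the classifier matches on substrings instead of exact equality.

```python
from collections import Counter

# Hypothetical helpers: tally grades returned by an LLM grader.
# Assumes each item stores the grade text under the "results" key.


def classify_grade(text):
    """Map free-form grader text to CORRECT, INCORRECT, or UNKNOWN."""
    upper = text.strip().upper()
    if "INCORRECT" in upper:  # check first: "INCORRECT" contains "CORRECT"
        return "INCORRECT"
    if "CORRECT" in upper:
        return "CORRECT"
    return "UNKNOWN"


def summarize_grades(graded_outputs, key="results"):
    """Count grades and return (counts, accuracy over all examples)."""
    counts = Counter(classify_grade(item[key]) for item in graded_outputs)
    total = sum(counts.values())
    accuracy = counts["CORRECT"] / total if total else 0.0
    return counts, accuracy


# Made-up grades for illustration only.
sample_grades = [
    {"results": "CORRECT"},
    {"results": "GRADE: INCORRECT"},
    {"results": "CORRECT"},
    {"results": "I am not sure."},
]
counts, accuracy = summarize_grades(sample_grades)
print(dict(counts), accuracy)
```

Counting unparseable grades as `UNKNOWN` instead of dropping them keeps the accuracy conservative and makes grader failures visible.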