# LangChain: Q&A over Documents

A typical use case is a tool that lets you query a product catalog for items of interest.

In [1]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [2]:
file = './data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

## Using LangChain `VectorstoreIndexCreator`

In [3]:
from langchain.indexes import VectorstoreIndexCreator

In [8]:
#!pip install docarray

In [4]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [5]:
query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

In [6]:
response = index.query(llm=ChatOpenAI(), question=query, chain_type="stuff")

In [7]:
display(Markdown(response))

| Name | Description |
| --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Rated UPF 50+ for superior protection from the sun's UV rays. Made of 100% polyester and is wrinkle-resistant. Provides the highest rated sun protection possible. Front and back cape venting that lets in cool breezes and two front bellows pockets. |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable sun protection rated to UPF 50+. Originally designed for fishing, this lightest hot-weather shirt offers UPF 50+ coverage and is great for extended travel. SunSmart technology blocks 98% of the sun's harmful UV rays, while the high-performance fabric is wrinkle-free and quickly evaporates perspiration. |
| Sun Shield Shirt by | High-performance sun shirt is guaranteed to protect from harmful UV rays. Made of 78% nylon and 22% Lycra Xtra Life fiber, this shirt is UPF 50+ rated. Wicks moisture for quick-drying comfort and fits comfortably over your favorite swimsuit. Abrasion-resistant for season after season of wear. |
| Men's TropicVibe Shirt, Short-Sleeve | Men’s sun-protection shirt with built-in UPF 50+. Made of 71% Nylon and 29% Polyester. Wrinkle-resistant. Front and back cape venting lets in cool breezes and two front bellows pockets. |

All four shirts provide sun protection with UPF 50+ rating, blocking 98% of the sun's harmful rays. The Men's Tropical Plaid Short-Sleeve Shirt and Men's TropicVibe Shirt, Short-Sleeve both have front and back cape venting that lets in cool breezes and two front bellows pockets. The Men's Plaid Tropic Shirt, Short-Sleeve is ultracomfortable and originally designed for fishing, while the Sun Shield Shirt by is made of nylon and Lycra Xtra Life fiber, wicks moisture for quick-drying comfort and fits comfortably over your favorite swimsuit.

## Using LLM

In [14]:
from langchain.document_loaders import CSVLoader
loader = CSVLoader(file_path=file)

In [15]:
docs = loader.load()

In [16]:
docs[0]

Document(page_content="Unnamed: 0: 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 0})

In [17]:
from langchain.embeddings import OpenAIEmbeddings  # wrapper around the OpenAI embeddings API
embeddings = OpenAIEmbeddings()

### Try to play with embeddings

[embeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html?highlight=OpenAIEmbeddings#langchain.embeddings.OpenAIEmbeddings) convert text into vectors, which makes it easy to find pieces of text with similar meaning.

![embeddings](./img/embeddings.png)
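
To make "similar meaning" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embedding vectors. The three-dimensional vectors and their names are hypothetical stand-ins chosen for illustration; they are not real OpenAI embeddings, and the notebook does not state which similarity measure the vector store uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumption: real vectors are much longer).
sun_shirt = [0.9, 0.1, 0.0]
uv_hat = [0.8, 0.2, 0.1]
down_jacket = [0.0, 0.2, 0.9]

# Items with related meaning should score higher than unrelated ones.
print(cosine_similarity(sun_shirt, uv_hat))
print(cosine_similarity(sun_shirt, down_jacket))
```

A score close to 1 means the two vectors point in nearly the same direction; a score close to 0 means they are nearly unrelated.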

In [20]:
embed = embeddings.embed_query("Hi, my name is Elton. Nice to meet you.")

In [21]:
print(len(embed))  # OpenAI embeddings represent the input text as a fixed-length vector of 1536 values

1536


In [17]:
print(embed[:5]) # show the first 5 items.

[-0.01969853788614273, -0.020303869619965553, -0.005772730801254511, -0.030367527157068253, 0.01030326820909977]


In [18]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [19]:
query = "Please suggest a shirt with sunblocking"

In [20]:
docs = db.similarity_search(query)

In [21]:
len(docs)

4

In [22]:
docs[0]

Document(page_content='Unnamed: 0: 255\nname: Sun Shield Shirt by\ndescription: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \n\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\n\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\n\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\n\nSun Protection That Won\'t Wear Off\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.', metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 255})
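
Conceptually, a similarity search scores every stored vector against the query vector and returns the best matches. The sketch below illustrates that idea with a plain Python list; it is not how `DocArrayInMemorySearch` is implemented. The function `top_k`, the toy vectors, and the default `k=4` (chosen only to mirror the four results returned above) are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=4):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical in-memory store of (text, embedding) pairs.
store = [
    ("Sun Shield Shirt", [0.9, 0.1, 0.0]),
    ("Tropical Plaid Shirt", [0.8, 0.3, 0.1]),
    ("Down Hooded Jacket", [0.0, 0.2, 0.9]),
    ("Campside Oxfords", [0.1, 0.9, 0.2]),
    ("Dog Mat", [0.2, 0.1, 0.7]),
]

query_vec = [1.0, 0.1, 0.0]  # stand-in for the embedded query
results = top_k(query_vec, store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

A brute-force scan like this is fine for a catalog of about a thousand rows; larger collections usually call for an approximate nearest-neighbor index.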

In [23]:
retriever = db.as_retriever()

In [32]:
llm = ChatOpenAI(temperature = 0.0)
llm

ChatOpenAI(verbose=False, callbacks=None, callback_manager=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-3.5-turbo', temperature=0.0, model_kwargs={}, openai_api_key='<redacted>', openai_api_base='', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None)

In [25]:
qdocs = "".join(doc.page_content for doc in docs)

In [26]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 

In [27]:
display(Markdown(response))

| Name | Description |
| --- | --- |
| Sun Shield Shirt | High-performance sun shirt with UPF 50+ sun protection, moisture-wicking fabric, and abrasion resistance. Recommended by The Skin Cancer Foundation. |
| Men's Plaid Tropic Shirt | Ultracomfortable shirt with UPF 50+ sun protection, wrinkle-free fabric, and front/back cape venting. Made with 52% polyester and 48% nylon. |
| Men's TropicVibe Shirt | Lightweight shirt with built-in UPF 50+ sun protection, front/back cape venting, and wrinkle-resistant fabric. Made with 71% nylon and 29% polyester. |
| Men's Tropical Plaid Short-Sleeve Shirt | Lightest hot-weather shirt with UPF 50+ sun protection, front/back cape venting, and wrinkle-resistant fabric. Made with 100% polyester. |

Each shirt provides UPF 50+ sun protection, blocking 98% of the sun's harmful rays. They also have additional features such as moisture-wicking fabric, front/back cape venting, and wrinkle-resistant fabric. The Sun Shield Shirt is recommended by The Skin Cancer Foundation.

### RetrievalQA sends only the content relevant to the query to the LLM, not every document, which saves tokens

In [28]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
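
The "stuff" chain type can be pictured as pasting the retrieved documents into a single prompt ahead of the question. The sketch below shows that idea only; the prompt wording and the helper `build_stuff_prompt` are assumptions, not LangChain's actual template or API.

```python
def build_stuff_prompt(documents, question, separator="\n\n"):
    """Join the retrieved documents into one context block and append the question."""
    context = separator.join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieved snippets; a real retriever returns full catalog rows.
retrieved = [
    "name: Sun Shield Shirt\ndescription: UPF 50+ rated sun shirt.",
    "name: Men's TropicVibe Shirt\ndescription: Built-in UPF 50+.",
]

prompt = build_stuff_prompt(retrieved, "Which shirts block the sun?")
print(prompt)
```

Because only the retrieved rows are placed in the prompt, its size depends on the number of retrieved documents rather than on the size of the whole catalog.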

In [29]:
query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [30]:
response = qa_stuff.run(query)

> Entering new RetrievalQA chain...

> Finished chain.


In [31]:
display(Markdown(response))

| Name | Description |
| --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Rated UPF 50+ for superior protection from the sun's UV rays. Made of 100% polyester and is wrinkle-resistant. With front and back cape venting that lets in cool breezes and two front bellows pockets. Provides the highest rated sun protection possible. |
| Men's Plaid Tropic Shirt, Short-Sleeve | Rated to UPF 50+, helping you stay cool and dry. Made with 52% polyester and 48% nylon, this shirt is machine washable and dryable. Additional features include front and back cape venting, two front bellows pockets and an imported design. With UPF 50+ coverage, you can limit sun exposure and feel secure with the highest rated sun protection available. |
| Sun Shield Shirt by | High-performance sun shirt is guaranteed to protect from harmful UV rays. Made of 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. |
| Men's TropicVibe Shirt, Short-Sleeve | Sun-protection shirt with built-in UPF 50+. Made of Shell: 71% Nylon, 29% Polyester. Lining: 100% Polyester knit mesh. Machine wash and dry. Additional features include wrinkle resistance, front and back cape venting, and two front bellows pockets. |

All of the shirts listed provide sun protection with a UPF rating of 50+ which blocks 98% of the sun's harmful rays. The Men's Tropical Plaid Short-Sleeve Shirt and Men's Plaid Tropic Shirt, Short-Sleeve both have front and back cape venting and two front bellows pockets. The Sun Shield Shirt by is made of nylon and Lycra Xtra Life fiber, and is abrasion-resistant. The Men's TropicVibe Shirt, Short-Sleeve is made of nylon and polyester, and is wrinkle-resistant.

### VectorstoreIndexCreator

`VectorstoreIndexCreator` can be used to simplify these steps; under the hood it also uses `RetrievalQA`.

In [12]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

file = './data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

index = VectorstoreIndexCreator(
    text_splitter=CharacterTextSplitter(chunk_size=10000, chunk_overlap=0),
    embedding=OpenAIEmbeddings(),
    vectorstore_cls=DocArrayInMemorySearch,
    vectorstore_kwargs={"k": 2}  # k=2 is intended to limit retrieval to the two most relevant text chunks
).from_loaders([loader])
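
The `chunk_size` and `chunk_overlap` parameters control how long texts are cut into pieces before embedding. The sketch below shows the general idea with fixed character windows. It is a simplification: LangChain's `CharacterTextSplitter` splits on a separator and then merges pieces, so its output can differ. The function `split_text` is hypothetical.

```python
def split_text(text, chunk_size, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

print(split_text("abcdefghij", chunk_size=4))
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))

# A text shorter than chunk_size stays in one piece.
print(split_text("name: Sun Shield Shirt", chunk_size=10000))
```

With `chunk_size=10000`, any text shorter than that limit is kept whole, which suits short catalog rows.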

In [13]:
query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."
response = index.query(llm=ChatOpenAI(), question=query, chain_type="stuff")

In [14]:
display(Markdown(response))

| **Shirt Name** | **Description** | **UPF Rating** | **Fabric** | 
| --- | --- | --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Lightweight and wrinkle-resistant shirt with front and back cape venting, two front bellows pockets, and UPF 50+ sun protection. | UPF 50+ | 100% Polyester |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable shirt originally designed for fishing with front and back cape venting, two front bellows pockets, and UPF 50+ sun protection. | UPF 50+ | 52% Polyester, 48% Nylon |
| Sun Shield Shirt by | High-performance sun shirt with moisture-wicking fabric that fits comfortably over swimsuits and is abrasion-resistant. Has UPF 50+ sun protection. | UPF 50+ | 78% Nylon, 22% Lycra Xtra Life fiber |
| Men's TropicVibe Shirt, Short-Sleeve | Lightweight and wrinkle-resistant shirt with front and back cape venting, two front bellows pockets, and built-in UPF 50+ sun protection. | UPF 50+ | Shell: 71% Nylon, 29% Polyester. Lining: 100% Polyester knit mesh. |

Each of these shirts provides UPF 50+ sun protection, blocking 98% of the sun's harmful rays. The Men's Tropical Plaid Short-Sleeve Shirt and Men's Plaid Tropic Shirt, Short-Sleeve both have front and back cape venting and two front bellows pockets. The Sun Shield Shirt by is designed to fit comfortably over swimsuits and is abrasion-resistant. The Men's TropicVibe Shirt, Short-Sleeve has built-in UPF 50+ sun protection and is wrinkle-resistant.

# LangChain: Evaluation

Verifying the quality of the model.

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation

In [15]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [16]:
file = 'data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [17]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [18]:
llm = ChatOpenAI(temperature = 0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"  # separator between documents: the retriever returns several passages relevant to the query, and this string is placed between them
    }
)

In [19]:
data[10]

Document(lc_kwargs={'page_content': "Unnamed: 0: 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.", 'metadata': {'source': 'data/OutdoorClothingCatalog_1000.csv', 'row': 10}}, page_content="Unnamed: 0: 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable 

In [20]:
data[11]

Document(lc_kwargs={'page_content': 'Unnamed: 0: 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.', 'metadata': {'source': 'data/OutdoorClothingCatalog_1000.csv', 'row': 11}}, page_content='Unnamed: 0: 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jack

### Hard-coded examples

Manually prepared questions.

In [21]:
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]
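
A backslash line continuation inside a string literal keeps the indentation of the following line, so the queries above contain a run of spaces in the middle. The sketch below shows one way to collapse that whitespace before sending the query. The helper `normalize_whitespace` is hypothetical, and the raw string is built explicitly here (an assumption made so that the spacing is unambiguous).

```python
# Equivalent to the first hard-coded query: the continuation line's
# indentation ends up inside the string as a run of spaces.
raw_query = "Do the Cozy Comfort Pullover Set" + " " * 8 + "have side pockets?"

def normalize_whitespace(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())

clean_query = normalize_whitespace(raw_query)
print(repr(raw_query))
print(repr(clean_query))
```

The extra spaces did not prevent a correct answer in this notebook, but normalizing keeps prompts and logs tidy.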

### Using LLM to generate examples

In [30]:
from langchain.evaluation.qa import QAGenerateChain

In [23]:
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

In [24]:
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)

In [25]:
new_examples[0]

{'query': "What is the weight of one pair of Women's Campside Oxfords?",
 'answer': "The approximate weight of one pair of Women's Campside Oxfords is 1 lb.1 oz."}

In [28]:
data[0]

Document(lc_kwargs={'page_content': "Unnamed: 0: 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", 'metadata': {'source': 'data/OutdoorClothingCatalog_1000.csv', 'row': 0}}, page_content="Unnamed: 0: 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortab

In [26]:
len(new_examples)

5

## Manual Evaluation

In [32]:
import langchain
langchain.debug = True  # turn on debug mode

In [33]:
examples += new_examples
qa.run(examples[0]['query'])

[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
  "query": "Do the Cozy Comfort Pullover Set        have side pockets?"
}
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input:
{
  "question": "Do the Cozy Comfort Pullover Set        have side pockets?",
  "context": "Unnamed: 0: 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\

'Yes, the Cozy Comfort Pullover Set has side pockets.'

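> The response above matches the expected answer in our test example, which indicates that the model answered this question correctly.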

In [34]:
langchain.debug = False

## LLM assisted evaluation
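
One reason to grade with an LLM is that plain string comparison is too strict for free-form answers. The sketch below uses the first hard-coded example and its answer from the manual evaluation; the helper `exact_match` is hypothetical and only illustrates the limitation.

```python
def exact_match(expected, predicted):
    """Case-insensitive equality after trimming surrounding whitespace."""
    return expected.strip().lower() == predicted.strip().lower()

expected = "Yes"
predicted = "Yes, the Cozy Comfort Pullover Set has side pockets."

# The prediction is correct in meaning, yet it fails a literal comparison.
print(exact_match(expected, predicted))  # False
```

A grader that judges meaning rather than exact wording avoids this kind of false negative.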

In [35]:
predictions = qa.apply(examples)

Error in on_chain_start callback: 'name'
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID a93cbc7abd6d55d7609133d314a6642e in your message.).

> Finished chain.

In [36]:
from langchain.evaluation.qa import QAEvalChain

In [37]:
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

In [38]:
graded_outputs = eval_chain.evaluate(examples, predictions)
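
The per-example grades can also be condensed into a single score. The sketch below assumes each graded output is a dict whose `'text'` value is `CORRECT` or `INCORRECT`, as in this notebook; the key name may differ in other LangChain versions. The helper `accuracy` and the sample data are hypothetical.

```python
def accuracy(graded_outputs, key="text"):
    """Fraction of graded outputs whose grade is exactly CORRECT."""
    if not graded_outputs:
        return 0.0
    correct = sum(
        1 for graded in graded_outputs
        if graded[key].strip().upper() == "CORRECT"
    )
    return correct / len(graded_outputs)

# Hypothetical grades shaped like the evaluator's output.
sample_grades = [
    {"text": "CORRECT"},
    {"text": "CORRECT"},
    {"text": "INCORRECT"},
]

print(f"accuracy: {accuracy(sample_grades):.2f}")
```

Comparing with `==` instead of a substring check matters here, because the word `INCORRECT` contains `CORRECT`.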

In [39]:
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()

Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set has side pockets.
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT

Example 2:
Question: What is the weight of one pair of Women's Campside Oxfords?
Real Answer: The approximate weight of one pair of Women's Campside Oxfords is 1 lb.1 oz.
Predicted Answer: The weight of one pair of Women's Campside Oxfords is approximately 1 lb. 1 oz.
Predicted Grade: CORRECT

Example 3:
Question: What are the dimensions of the small and medium Recycled Waterhog Dog Mat, Chevron Weave?
Real Answer: The small Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28" and the medium has dimensions of 22.5" x 34.5".
