# LangChain: Q&A over Documents

One example is a tool that lets you query a product catalog for items of interest.

In [13]:
%pip install -r requirements.txt

import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic.chains import LLMChain

llm_model = "gpt-4o-mini"

Note: you may need to restart the kernel to use updated packages.


In [8]:
from langchain_classic.chains import RetrievalQA
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain_openai import OpenAI  # completion-style LLM wrapper

In [None]:
from langchain_community.document_loaders import CSVLoader

file = "OutdoorClothingCatalog_1000.csv"
loader = CSVLoader(file_path=file, encoding="utf-8")

In [None]:
from langchain_classic.indexes import VectorstoreIndexCreator
# %pip install docarray


In [22]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch, 
    embedding=OpenAIEmbeddings()
).from_loaders([loader])

In [23]:
query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

**Note**:
- The notebook uses `langchain==0.0.179` and `openai==0.27.7`
- With these library versions, `VectorstoreIndexCreator` defaults to `text-davinci-003` as its base model, which has been deprecated since 1 January 2024.
- The replacement model, `gpt-3.5-turbo-instruct`, is used for the `query` instead.
- Because of this replacement, the `response` format may differ from the one shown in the video.

In [24]:
llm_replacement_model = OpenAI(temperature=0, 
                               model='gpt-3.5-turbo-instruct')

response = index.query(query, 
                       llm = llm_replacement_model)



In [25]:
display(Markdown(response))



| Name | Description | Sun Protection Rating |
| --- | --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Made of 100% polyester, UPF 50+ rating, wrinkle-resistant, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Men's Plaid Tropic Shirt, Short-Sleeve | Made of 52% polyester and 48% nylon, UPF 50+ rating, SunSmart technology, wrinkle-free, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Men's TropicVibe Shirt, Short-Sleeve | Made of 71% nylon and 29% polyester, UPF 50+ rating, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Sun Shield Shirt | Made of 78% nylon and 22% Lycra Xtra Life fiber, UPF 50+ rating, moisture-wicking, abrasion-resistant, fits over swimsuit | SPF 50+, blocks 98% of harmful UV rays |

## Step By Step

In [29]:
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader(file_path=file, encoding="utf-8")


In [30]:
docs = loader.load()

In [31]:
docs[0]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0}, page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.")
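The `page_content` above is one `column: value` line per CSV field, with the row number stored in the metadata. The following is a minimal sketch of that row-to-document mapping using only the standard library. It is not `CSVLoader`'s actual implementation: the `rows_to_documents` helper, the plain-dict document shape, and the two-row sample data are all hypothetical.

```python
import csv
import io

# Hypothetical two-row sample standing in for OutdoorClothingCatalog_1000.csv.
# The first column has an empty header, as the real catalog appears to.
SAMPLE_CSV = """,name,description
0,Women's Campside Oxfords,Soft canvas lace-to-toe Oxford.
1,Sun Shield Shirt,UPF 50+ rated sun shirt.
"""


def rows_to_documents(csv_text, source):
    """Turn each CSV row into a dict with page_content and metadata."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, mirroring the output above.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": page_content,
                "metadata": {"source": source, "row": row_number},
            }
        )
    return documents


sample_docs = rows_to_documents(SAMPLE_CSV, "sample.csv")
print(sample_docs[0]["page_content"])
```

The empty header of the first column is why the real `page_content` starts with `: 0`.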

In [33]:
embeddings = OpenAIEmbeddings()

In [34]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [35]:
print(len(embed))

1536


In [36]:
print(embed[:5])

[-0.021964654326438904, 0.006758837960660458, -0.01824948936700821, -0.03923514857888222, -0.014007173478603363]


In [37]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [38]:
query = "Please suggest a shirt with sunblocking"

In [39]:
docs = db.similarity_search(query)

In [40]:
len(docs)

4
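The search returns four documents, which matches a top-k lookup with `k=4`. Conceptually, the store embeds the query, scores it against every stored vector, and keeps the best matches. Below is a rough sketch of that idea, assuming cosine similarity as the metric. The `top_k` helper and the 3-dimensional toy vectors are hypothetical illustrations, not the `DocArrayInMemorySearch` internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=4):
    """Return the k texts whose vectors are most similar to query_vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical 3-dimensional "embeddings"; the real ones above have 1536 dimensions.
toy_store = [
    ("sun shirt", [0.9, 0.1, 0.0]),
    ("rain jacket", [0.1, 0.9, 0.0]),
    ("hiking boots", [0.0, 0.2, 0.9]),
    ("sun hat", [0.8, 0.0, 0.3]),
    ("wool socks", [0.0, 0.5, 0.8]),
]
toy_query = [1.0, 0.0, 0.0]

results = top_k(toy_query, toy_store, k=2)
print(results)  # ['sun shirt', 'sun hat']
```

A full scan like this is fine for a thousand-row catalog held in memory. Larger collections usually call for an approximate nearest-neighbor index.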

In [41]:
docs[0]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 255}, page_content=': 255\nname: Sun Shield Shirt by\ndescription: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \n\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\n\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\n\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\n\nSun Protection That Won\'t Wear Off\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.')

In [42]:
retriever = db.as_retriever()

In [43]:
llm = ChatOpenAI(temperature = 0.0, model=llm_model)

In [44]:
qdocs = "".join(doc.page_content for doc in docs)


In [46]:
response = llm.invoke(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [48]:
display(Markdown(response.content))

Here’s a summary of the shirts with sun protection in a markdown table format:

| Name                                   | Description Summary                                                                                                                                                                                                                     |
|----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Sun Shield Shirt                       | High-performance sun shirt with UPF 50+ protection, blocking 98% of UV rays. Slightly fitted, made of 78% nylon and 22% Lycra Xtra Life fiber. Moisture-wicking, abrasion-resistant, and comfortable over swimsuits. Handwash recommended.  |
| Men's Plaid Tropic Shirt, Short-Sleeve| Lightweight, UPF 50+ rated shirt originally designed for fishing. Made of 52% polyester and 48% nylon, it features wrinkle-free fabric, evaporates perspiration quickly, and includes front and back cape venting with two bellows pockets.  |
| Men's TropicVibe Shirt, Short-Sleeve  | Lightweight sun-protection shirt with UPF 50+ rating. Traditional fit, made of 71% nylon and 29% polyester with a 100% polyester knit mesh lining. Features include wrinkle resistance, cape venting, and two front bellows pockets.          |
| Men's Tropical Plaid Short-Sleeve Shirt| Lightest hot-weather shirt with UPF 50+ protection, made of 100% polyester. Traditional fit, wrinkle-resistant, with front and back cape venting and two front bellows pockets. Provides high sun protection by blocking 98% of UV rays.     |

This table summarizes the key features and benefits of each shirt, focusing on their sun protection capabilities.

In [49]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
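The `"stuff"` chain type places every retrieved document into a single prompt and makes one LLM call, much like the manual `qdocs` concatenation earlier. The sketch below shows that assembly step only. The `build_stuff_prompt` helper and the prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(documents, question):
    """Concatenate every retrieved document into one prompt string."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical stand-ins for the page_content of retrieved documents.
retrieved = [
    "name: Sun Shield Shirt\ndescription: UPF 50+ rated.",
    "name: Men's TropicVibe Shirt\ndescription: UPF 50+ rating.",
]

prompt = build_stuff_prompt(retrieved, "Which shirts offer sun protection?")
print(prompt)
```

Because everything goes into one call, this approach is simple and cheap, but it only works while the combined documents fit within the model's context window.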

In [50]:
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [51]:
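# `run` is deprecated; `invoke` returns a dict whose answer is under "result".
response = qa_stuff.invoke({"query": query})["result"]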

> Entering new RetrievalQA chain...

> Finished chain.


In [52]:
display(Markdown(response))

Here is a table summarizing the shirts with sun protection:

| Name                                      | Description Summary                                                                                      |
|-------------------------------------------|---------------------------------------------------------------------------------------------------------|
| Men's Tropical Plaid Short-Sleeve Shirt   | Lightweight, UPF 50+ sun protection, 100% polyester, wrinkle-resistant, with cape venting and bellows pockets. |
| Men's Plaid Tropic Shirt, Short-Sleeve    | Designed for fishing, UPF 50+ coverage, made of 52% polyester and 48% nylon, moisture-wicking, with cape venting and bellows pockets. |
| Men's TropicVibe Shirt, Short-Sleeve      | Lightweight, UPF 50+ protection, 71% nylon and 29% polyester, wrinkle-resistant, with cape venting and bellows pockets. |
| Sun Shield Shirt                          | Slightly fitted, UPF 50+ protection, 78% nylon and 22% Lycra, moisture-wicking, abrasion resistant, recommended by The Skin Cancer Foundation. |

In [53]:
response = index.query(query, llm=llm)

In [54]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])