# LangChain: Q&A over Documents

This notebook shows how to use LangChain to answer questions over your own documents. An example is a tool that lets you query a product catalog for items of interest.

![QnA](pics/lesson%204%20-%20QnA.png)
![Embeddings](pics/lesson%204%20-%20Embeddings.png)
![Vectordatabase](pics/lesson%204%20-%20vectordatabase.png)
![Vectordatabase](pics/lesson%204%20-%20vectordatabase%202.png)
![Stuff method](pics/lesson%204%20-%20stuff%20method.png)
![Additional methods](pics/lesson%204%20-%20additional%20methods.png)

In [1]:
from dotenv import load_dotenv

env_path="../.env"
load_dotenv(dotenv_path=env_path)

True

In [31]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

from langchain.indexes import VectorstoreIndexCreator
from langchain.embeddings import OpenAIEmbeddings

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

In [32]:
file = 'data/Lesson 4 - OutdoorClothingCatalog_1000.csv'

loader = CSVLoader(file_path=file, encoding="utf-8")
docs = loader.load()

In [33]:
print("Total rows:", len(docs))
print("Total characters:", sum(len(doc.page_content) for doc in docs))
print("\nExample: -----------")
docs[0]

Total rows: 1000
Total characters: 765924

Example: -----------


Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': 'data/Lesson 4 - OutdoorClothingCatalog_1000.csv', 'row': 0})
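The `page_content` above is one `column: value` line per CSV field, and the leading `: 0` suggests the first column has an empty header. A minimal standard-library sketch of that row-to-document mapping follows. The helper name `rows_to_documents`, the plain-dict document shape, and the shortened sample row are assumptions for illustration; this is not the actual `CSVLoader` implementation.

```python
import csv
import io


def rows_to_documents(csv_text, source="<in-memory>"):
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, in column order.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": page_content,
                "metadata": {"source": source, "row": row_number},
            }
        )
    return documents


# Shortened, illustrative row; the first column has an empty header.
sample_csv = (
    ",name,description\n"
    "0,Women's Campside Oxfords,Soft canvas lace-to-toe Oxford\n"
)
sample_docs = rows_to_documents(sample_csv)
print(sample_docs[0]["page_content"])
print(sample_docs[0]["metadata"])
```

Keeping the row number in the metadata is what makes it possible to trace an answer back to a specific catalog entry later.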

In [34]:
embeddings = OpenAIEmbeddings()

In [35]:
embed_0 = embeddings.embed_query("Hi my name is Harrison")
embed_1 = embeddings.embed_query("Hi my name is Henry")
embed_2 = embeddings.embed_query("This car is broken")
# Russian for "Hi, my name is Harrison": checks whether meaning survives translation
embed_3 = embeddings.embed_query("Привет, мое имя Гаррисон")


print(len(embed_0))
print(embed_0[:5])

# Reshape to 2D arrays because sklearn's cosine_similarity expects 2D arrays
embed_0_np = np.array(embed_0).reshape(1, -1)
embed_1_np = np.array(embed_1).reshape(1, -1)
embed_2_np = np.array(embed_2).reshape(1, -1)
embed_3_np = np.array(embed_3).reshape(1, -1)

# Cosine similarity:
# -1 - opposite
#  0 - orthogonal (unrelated)
#  1 - identical

print("Similarity of embed_0 and embed_1:", cosine_similarity(embed_0_np, embed_1_np)[0][0])
print("Similarity of embed_0 and embed_2:", cosine_similarity(embed_0_np, embed_2_np)[0][0])
print("Similarity of embed_0 and embed_3:", cosine_similarity(embed_0_np, embed_3_np)[0][0])

1536
[-0.021913960576057434, 0.006774206645786762, -0.018190348520874977, -0.039148248732089996, -0.014089343138039112]
Similarity of embed_0 and embed_1: 0.8918040809585281
Similarity of embed_0 and embed_2: 0.7504907710763276
Similarity of embed_0 and embed_3: 0.8609367148963373
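The score printed for each pair is the dot product of the two vectors divided by the product of their lengths. The sketch below writes that out in plain Python on small made-up vectors (not real embeddings). The helper name `cosine` is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors chosen to hit the three reference points.
same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])                # 0
opposite = cosine([1.0, 2.0], [-1.0, -2.0])                # close to -1

print(same_direction, orthogonal, opposite)
```

Because the result depends only on direction, not on vector length, scaling a vector leaves its similarity to any other vector unchanged.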


## Vectorstore Index

In [36]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

In [37]:
query = (
    "Please list all your shirts with sun protection "
    "in a table in markdown and summarize each one."
)

response = index.query(query)

In [38]:
display(Markdown(response))



| Name | Description |
| --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | UPF 50+ rated, 100% polyester, wrinkle-resistant, front and back cape venting, two front bellows pockets |
| Men's Plaid Tropic Shirt, Short-Sleeve | UPF 50+ rated, 52% polyester and 48% nylon, machine washable and dryable, front and back cape venting, two front bellows pockets |
| Men's TropicVibe Shirt, Short-Sleeve | UPF 50+ rated, 71% Nylon, 29% Polyester, 100% Polyester knit mesh, machine wash and dry, front and back cape venting, two front bellows pockets |
| Sun Shield Shirt by | UPF 50+ rated, 78% nylon, 22% Lycra Xtra Life fiber, handwash, line dry, wicks moisture, fits comfortably over swimsuit, abrasion resistant |

All four shirts provide UPF 50+ sun protection, blocking 98% of the sun's harmful rays. The Men's Tropical Plaid Short-Sleeve Shirt is made of 100% polyester and is wrinkle-resistant

## Vector Index Step by Step

In [39]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [42]:
# response_docs = db.similarity_search(query, k=4)

In [49]:
# Return the 4 most similar documents, each paired with its similarity score.
response_docs = db.similarity_search_with_score(query, k=4)

In [54]:
print(len(response_docs))
response_docs[0]

4


(Document(page_content=": 618\nname: Men's Tropical Plaid Short-Sleeve Shirt\ndescription: Our lightest hot-weather shirt is rated UPF 50+ for superior protection from the sun's UV rays. With a traditional fit that is relaxed through the chest, sleeve, and waist, this fabric is made of 100% polyester and is wrinkle-resistant. With front and back cape venting that lets in cool breezes and two front bellows pockets, this shirt is imported and provides the highest rated sun protection possible. \n\nSun Protection That Won't Wear Off. Our high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun's harmful rays.", metadata={'source': 'data/Lesson 4 - OutdoorClothingCatalog_1000.csv', 'row': 618}),
 0.8367698057250467)
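Conceptually, an in-memory similarity search can be as simple as scoring every stored vector against the query vector and keeping the best `k`. The sketch below shows that brute-force idea on toy three-dimensional vectors. It assumes the score is a cosine similarity, and the names `top_k` and `toy_store` are hypothetical; it does not reproduce how `DocArrayInMemorySearch` is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=4):
    """Score every (text, vector) pair and return the k best as (text, score)."""
    scored = [(text, cosine(query_vector, vector)) for text, vector in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for real 1536-dimensional embeddings.
toy_store = [
    ("sun shirt", [0.9, 0.1, 0.0]),
    ("rain jacket", [0.1, 0.9, 0.0]),
    ("hiking boots", [0.0, 0.2, 0.9]),
]

hits = top_k([1.0, 0.0, 0.0], toy_store, k=2)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```

A linear scan like this is fine for 1000 catalog rows; larger collections usually call for an approximate nearest-neighbor index instead.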

In [58]:
# Create a ChatOpenAI model with temperature 0.0. Temperature controls how
# random the output is: at 0.0 the model favors the most likely tokens, so
# repeated calls give (nearly) the same answer. Higher values give more varied output.
llm = ChatOpenAI(temperature=0.0)

# Join the page_content of every retrieved document into one string.
# Each item in response_docs is a (Document, score) tuple; a blank line keeps
# neighboring catalog entries from running together.
qdocs = "\n\n".join(doc.page_content for doc, _score in response_docs)

# Send the retrieved text followed by the question to the model in a single prompt.
response = llm.call_as_llm(f"{qdocs}\n\nQuestion: {query}")
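The effect of temperature described in the comments can be illustrated with the textbook temperature-scaled softmax: scores are divided by the temperature before being turned into probabilities. This is a generic sketch on made-up scores; how the OpenAI API treats `temperature=0.0` internally is not shown in this notebook, and handling 0 as a greedy pick is an assumption here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumption: treat 0 as greedy, all probability on the largest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate, which is why a value of 0.0 suits factual question answering over a catalog.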

In [59]:
display(Markdown(response))

| Name | Description | Sun Protection Rating |
| --- | --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Made of 100% polyester, this shirt is rated UPF 50+ for superior protection from the sun's UV rays. It has front and back cape venting, two front bellows pockets, and is wrinkle-resistant. | SPF 50+ |
| Men's Plaid Tropic Shirt, Short-Sleeve | This shirt is made with 52% polyester and 48% nylon, and is rated UPF 50+. It has front and back cape venting, two front bellows pockets, and is wrinkle-free and quickly evaporates perspiration. | SPF 50+ |
| Men's TropicVibe Shirt, Short-Sleeve | Made of 71% Nylon and 29% Polyester, this shirt has built-in UPF 50+ sun protection. It has front and back cape venting, two front bellows pockets, and is wrinkle-resistant. | SPF 50+ |
| Sun Shield Shirt | Made of 78% nylon and 22% Lycra Xtra Life fiber, this shirt is UPF 50+ rated and wicks moisture for quick-drying comfort. It fits comfortably over swimsuits and is abrasion-resistant. | SPF 50+ |

Each of these shirts provides UPF 50+ sun protection, blocking 98% of the sun's harmful rays. They are made of high-performance fabrics that are wrinkle-resistant, quick-drying, and comfortable to wear. They also feature front and back cape venting and two front bellows pockets. The Sun Shield Shirt is specifically designed to fit comfortably over swimsuits and is abrasion-resistant.
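The manual steps in this section amount to the "stuff" approach: all retrieved text goes into one prompt. Its main limit is prompt size, so a practical version may need to stop adding chunks once a budget is reached. The sketch below uses a character budget as a rough stand-in for a token limit. The function name `build_stuff_prompt`, the `max_chars` parameter, and the sample chunks are all assumptions for illustration.

```python
def build_stuff_prompt(chunks, question, max_chars=4000):
    """Concatenate chunks in order until the character budget is used up.

    Returns the prompt and the number of chunks that fit.
    """
    suffix = f"\n\nQuestion: {question}"
    kept = []
    used = len(suffix)
    for chunk in chunks:
        cost = len(chunk) + 2  # +2 for the blank-line separator
        if used + cost > max_chars:
            break
        kept.append(chunk)
        used += cost
    return "\n\n".join(kept) + suffix, len(kept)


# Made-up chunks standing in for retrieved page_content strings.
prompt, n_kept = build_stuff_prompt(
    ["name: Shirt A\nUPF 50+", "name: Shirt B\nUPF 50+"],
    "Which shirts have sun protection?",
)
print(n_kept)
print(prompt)
```

Since chunks are added in the order given, passing them in best-first order means a tight budget drops the least relevant ones.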

## RetrievalQA

In [60]:
# Wrap the vector store in a retriever: given a query string, it returns the
# most similar documents from db.
retriever = db.as_retriever()

In [62]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)

In [63]:
response = qa_stuff.run(query)



> Entering new RetrievalQA chain...

> Finished chain.


In [64]:
display(Markdown(response))

| Shirt Number | Name | Description |
| --- | --- | --- |
| 618 | Men's Tropical Plaid Short-Sleeve Shirt | This shirt is made of 100% polyester and is wrinkle-resistant. It has front and back cape venting that lets in cool breezes and two front bellows pockets. It is rated UPF 50+ for superior protection from the sun's UV rays. |
| 374 | Men's Plaid Tropic Shirt, Short-Sleeve | This shirt is made with 52% polyester and 48% nylon. It is machine washable and dryable. It has front and back cape venting, two front bellows pockets, and is rated to UPF 50+. |
| 535 | Men's TropicVibe Shirt, Short-Sleeve | This shirt is made of 71% Nylon and 29% Polyester. It has front and back cape venting that lets in cool breezes and two front bellows pockets. It is rated UPF 50+ for superior protection from the sun's UV rays. |
| 255 | Sun Shield Shirt | This shirt is made of 78% nylon and 22% Lycra Xtra Life fiber. It is handwashable and line dry. It is rated UPF 50+ for superior protection from the sun's UV rays. It is abrasion-resistant and wicks moisture for quick-drying comfort. |

All of these shirts provide UPF 50+ sun protection, blocking 98% of the sun's harmful rays. They are all designed to be lightweight and comfortable in hot weather. They all have front and back cape venting that lets in cool breezes and two front bellows pockets. The Men's Tropical Plaid Short-Sleeve Shirt is made of 100% polyester and is wrinkle-resistant. The Men's Plaid Tropic Shirt, Short-Sleeve is made with 52% polyester and 48% nylon. The Men's TropicVibe Shirt, Short-Sleeve is made of 71% Nylon and 29% Polyester. The Sun Shield Shirt is made of 78% nylon and 22% Lycra Xtra Life fiber and is abrasion-resistant and wicks moisture for quick-drying comfort.
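Put together, a retrieval chain of the "stuff" type follows three steps: retrieve relevant text, place it in a single prompt, and make one model call. The sketch below shows only that control flow with stand-in callables. `retrieval_qa`, `fake_retrieve`, `fake_llm`, and the prompt wording are hypothetical and are not the prompt or internals that `RetrievalQA` actually uses.

```python
from typing import Callable, List


def retrieval_qa(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve context, stuff it into one prompt, and call the model once."""
    chunks = retrieve(question)                # 1. fetch relevant text
    context = "\n\n".join(chunks)              # 2. stuff it into one prompt
    prompt = (
        "Use the context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                         # 3. single model call


def fake_retrieve(question: str) -> List[str]:
    # Stand-in for a vector store retriever; always returns one made-up chunk.
    return ["name: Sun Shield Shirt\ndescription: UPF 50+ rated"]


def fake_llm(prompt: str) -> str:
    # Stand-in for a chat model; echoes the last prompt line to show what it saw.
    return prompt.splitlines()[-1]


answer = retrieval_qa("Any sun shirts?", fake_retrieve, fake_llm)
print(answer)
```

Passing the retriever and model in as callables keeps the flow easy to test without network access; in the notebook those roles are played by `retriever` and `llm`.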