# LangChain: Q&A over Documents

Watch:

https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer 

This lesson builds question answering over documents. An example might be a tool that lets you query a product catalog for items of interest.

In [1]:
import os
import openai

from dotenv import load_dotenv, find_dotenv
_: bool = load_dotenv(find_dotenv())  # read local .env file
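`load_dotenv` copies `KEY=VALUE` pairs from a local `.env` file into the process environment, which is where the OpenAI client looks for `OPENAI_API_KEY`. The parser below is a simplified, hypothetical sketch of that behaviour (`load_env_text` is an illustrative name; python-dotenv itself also handles details such as `export` prefixes and variable expansion). Like `load_dotenv`'s default, it does not overwrite variables that are already set.

```python
import os
from typing import MutableMapping, Optional


def load_env_text(
    text: str, environ: Optional[MutableMapping[str, str]] = None
) -> MutableMapping[str, str]:
    """Hypothetical minimal .env parser: KEY=VALUE lines, '#' comments."""
    env = os.environ if environ is None else environ
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("'\"")
        env.setdefault(key, value)  # variables that already exist win
    return env


# Use a plain dict instead of os.environ so the example has no side effects.
fake_env: dict[str, str] = {"OPENAI_API_KEY": "already-set"}
load_env_text("# secrets\nOPENAI_API_KEY=sk-from-file\nMODEL='gpt-3.5-turbo-1106'\n", fake_env)
print(fake_env)
```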

In [2]:
llm_model: str = "gpt-3.5-turbo-1106"

In [10]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [4]:
file: str = 'OutdoorClothingCatalog_1000.csv'
loader: CSVLoader = CSVLoader(file_path=file)
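`CSVLoader` turns each row of the CSV into one `Document`. The sketch below mimics that idea in plain Python so it runs without LangChain. It assumes each row is rendered as `column: value` lines, with the source file and row number kept in `metadata`; `SimpleDocument` and `load_csv_rows` are hypothetical names, not LangChain API.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_rows(text: str, source: str) -> list[SimpleDocument]:
    """Turn every CSV row into one document made of 'column: value' lines."""
    docs: list[SimpleDocument] = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
        docs.append(SimpleDocument(content, {"source": source, "row": i}))
    return docs


sample = "name,description\nSun Shirt,UPF 50+ rated\nRain Jacket,Waterproof shell\n"
docs = load_csv_rows(sample, "catalog.csv")
print(docs[0].page_content)
print(docs[1].metadata)
```

One document per row keeps each catalog item self-contained, so a search hit returns a whole product description rather than a fragment.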

In [5]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

In [6]:
index: VectorStoreIndexWrapper = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])


In [7]:
query: str = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

In [8]:
response: str = index.query(query)



In [9]:
display(Markdown(response))



| Name | Description |
| --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | UPF 50+ rated, 100% polyester, wrinkle-resistant, front and back cape venting, two front bellows pockets |
| Men's Plaid Tropic Shirt, Short-Sleeve | UPF 50+ rated, 52% polyester and 48% nylon, machine washable and dryable, front and back cape venting, two front bellows pockets |
| Men's TropicVibe Shirt, Short-Sleeve | UPF 50+ rated, 71% nylon, 29% polyester, 100% polyester knit mesh, machine washable and dryable, front and back cape venting, two front bellows pockets |
| Sun Shield Shirt by | UPF 50+ rated, 78% nylon, 22% Lycra Xtra Life fiber, handwash, line dry, wicks moisture, fits comfortably over swimsuit, abrasion resistant |

All four shirts provide UPF 50+ sun protection, blocking 98% of the sun's harmful rays. The Men's Tropical Plaid Short-Sleeve Shirt is made of 100% polyester and is wrinkle-

![LLM_on_Documents](llm.png)

![Embeddings](embeddings.png)

![Vector_Database](vector_db.png)

![Vector_Database](vector_db2.png)
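The diagrams above rest on one idea: texts with similar meaning get embedding vectors that point in similar directions, and the vector database ranks stored chunks by that closeness. A common closeness measure is cosine similarity. The sketch below computes it for tiny made-up 3-dimensional vectors; real embeddings work the same way, just with far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings", for illustration only.
sun_shirt = [0.9, 0.1, 0.0]
uv_top = [0.8, 0.2, 0.1]
rain_boot = [0.0, 0.1, 0.9]

print(cosine_similarity(sun_shirt, uv_top))     # close to 1: similar items
print(cosine_similarity(sun_shirt, rain_boot))  # close to 0: unrelated items
```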

## Step By Step

In [12]:
from langchain.document_loaders import CSVLoader

loader: CSVLoader = CSVLoader(file_path=file)

In [13]:
from langchain.schema import Document

docs: list[Document] = loader.load()

In [14]:
doc: Document = docs[0]

In [15]:
from langchain.embeddings import OpenAIEmbeddings

embeddings: OpenAIEmbeddings = OpenAIEmbeddings()

In [16]:
embed: list[float] = embeddings.embed_query("Hi my name is Harrison")

In [17]:
print(len(embed))

1536
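Whatever the length of the input, `embed_query` returns a vector of fixed size (1536 numbers for the OpenAI embedding model used here). The hashing bag-of-words embedder below is a deliberately crude, hypothetical stand-in for a learned model: it only captures word overlap, not meaning, but it shows the fixed-dimension contract that makes vectors comparable. `toy_embed` and its `dim=16` default are assumptions for illustration.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hypothetical embedder: hash each word into one of `dim` buckets."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    # L2-normalise so that a plain dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


short_vec = toy_embed("Hi my name is Harrison")
long_vec = toy_embed("Hi my name is Harrison and I am testing embeddings " * 10)
print(len(short_vec), len(long_vec))  # same length for both inputs
```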


In [18]:
print(embed[:5])

[-0.021867523073664925, 0.0068068644355059665, -0.01818099626590071, -0.039104867912227115, -0.014066680401294425]


In [19]:
db: DocArrayInMemorySearch = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)
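`DocArrayInMemorySearch.from_documents` embeds every document once and keeps the vectors in memory next to the text. Embedding a 1,000-row catalog means many embedding requests, which can hit API rate limits on low-tier accounts. The sketch below shows a hypothetical in-memory store (`TinyVectorStore`, `from_texts`, and `batch_size` are illustrative names, not LangChain API) that embeds texts in batches to reduce the number of calls. The fake embedder is a placeholder for a real model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class StoredItem:
    text: str
    vector: list[float]


class TinyVectorStore:
    """Hypothetical in-memory vector store: one (text, vector) pair per document."""

    def __init__(self) -> None:
        self.items: list[StoredItem] = []

    @classmethod
    def from_texts(
        cls,
        texts: Sequence[str],
        embed_batch: Callable[[list[str]], list[list[float]]],
        batch_size: int = 2,
    ) -> "TinyVectorStore":
        store = cls()
        # One embedding call per batch rather than one per text.
        for start in range(0, len(texts), batch_size):
            batch = list(texts[start:start + batch_size])
            for text, vector in zip(batch, embed_batch(batch)):
                store.items.append(StoredItem(text, vector))
        return store


calls: list[int] = []


def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    """Placeholder embedder: records batch sizes, returns [length, word count]."""
    calls.append(len(batch))
    return [[float(len(t)), float(len(t.split()))] for t in batch]


store = TinyVectorStore.from_texts(["sun shirt", "rain jacket", "wool socks"], fake_embed_batch)
print(len(store.items), calls)
```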


In [None]:
query: str = "Please suggest a shirt with sunblocking"

In [None]:
docs: list[Document] = db.similarity_search(query)
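`similarity_search` embeds the query, compares it with the stored vectors, and returns the closest documents; LangChain's default is `k=4`, so `len(docs)` normally comes out as 4. The brute-force top-k search below shows the mechanics with made-up 2-D vectors. `top_k` is a hypothetical helper, and a production vector store may use an approximate index instead of scanning every vector.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 4) -> list[str]:
    """Brute-force nearest neighbours: score every stored vector, keep the k best."""
    scored = ((cosine(query_vec, vec), text) for text, vec in store.items())
    return [text for _, text in heapq.nlargest(k, scored)]


# Made-up vectors: axis 0 ~ "sun protection", axis 1 ~ "rain protection".
store = {
    "UPF 50+ sun shirt": [0.95, 0.05],
    "sun hat": [0.9, 0.2],
    "rain jacket": [0.1, 0.95],
    "waterproof boots": [0.05, 0.9],
    "hiking socks": [0.4, 0.4],
}
print(top_k([1.0, 0.0], store, k=2))
```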

In [None]:
len(docs)

In [None]:
docs[0]

In [None]:
from langchain.schema.vectorstore import VectorStoreRetriever

retriever: VectorStoreRetriever = db.as_retriever()
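A retriever is a thin interface over the vector store: it takes a query string and returns relevant documents, so a chain does not need to know how the search is done (LangChain retrievers expose this through methods such as `get_relevant_documents`). The `Retriever` protocol and `KeywordRetriever` class below are hypothetical illustrations of that contract; they use simple word overlap instead of embeddings to keep the sketch self-contained.

```python
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical interface: anything that maps a query to a list of texts."""

    def retrieve(self, query: str) -> list[str]: ...


class KeywordRetriever:
    """Toy retriever ranking texts by how many query words they share."""

    def __init__(self, texts: list[str], k: int = 4) -> None:
        self.texts = texts
        self.k = k

    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        scored = [(len(words & set(t.lower().split())), t) for t in self.texts]
        scored = [item for item in scored if item[0] > 0]  # drop non-matches
        scored.sort(key=lambda item: item[0], reverse=True)
        return [t for _, t in scored[: self.k]]


def build_context(retriever: Retriever, query: str) -> str:
    """Code written against the interface works with any retriever."""
    return "\n".join(retriever.retrieve(query))


r = KeywordRetriever(["sun shirt with UPF 50+", "rain jacket", "sun hat"], k=2)
print(r.retrieve("shirt for sun"))
```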

In [None]:
llm: ChatOpenAI = ChatOpenAI(temperature=0.0, model=llm_model)

In [None]:
# Separate documents with blank lines so one row does not run into the next.
qdocs: str = "\n\n".join(doc.page_content for doc in docs)
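Joining the retrieved `page_content` strings and placing them before the question is the simplest way to give the model context, often called "stuffing". Everything must fit in the model's context window, so a practical version caps how much text it includes. The sketch below is a hypothetical helper (`stuff_prompt`, with a `max_chars` budget as a rough stand-in for a token limit; the value 200 is an arbitrary assumption) that stuffs documents in order until the budget is reached.

```python
def stuff_prompt(chunks: list[str], question: str, max_chars: int = 200) -> str:
    """Concatenate chunks in order until the character budget would be exceeded."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # stop before overflowing the (approximate) context budget
        kept.append(chunk)
        used += len(chunk)
    context = "\n\n".join(kept)
    return f"{context}\n\nQuestion: {question}"


chunks = [
    "name: Sun Shirt\ndescription: UPF 50+",
    "name: Sun Hat\ndescription: wide brim",
    "x" * 500,  # an oversized chunk that does not fit the budget
]
prompt = stuff_prompt(chunks, "Which items offer sun protection?")
print(prompt)
```

Because retrieval returns the most similar documents first, truncating from the end drops the least relevant text.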

In [None]:
response: str = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [None]:
display(Markdown(response))

In [None]:
from langchain.chains.retrieval_qa.base import BaseRetrievalQA

qa_stuff: BaseRetrievalQA = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
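With `chain_type="stuff"`, `RetrievalQA` performs the earlier manual steps for you: retrieve documents, stuff them into one prompt, and make a single LLM call. The sketch below wires those steps together with plain callables. `StuffQA`, the prompt wording, the keyword retriever, and the fake LLM are all hypothetical placeholders, not LangChain's actual classes or prompt.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt wording; LangChain's built-in "stuff" prompt differs.
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question.\n\n"
    "{context}\n\nQuestion: {question}\nAnswer:"
)


@dataclass
class StuffQA:
    """Hypothetical 'stuff' chain: retrieve, fill one prompt, make one LLM call."""

    retrieve: Callable[[str], list[str]]
    llm: Callable[[str], str]
    verbose: bool = False

    def run(self, question: str) -> str:
        context = "\n\n".join(self.retrieve(question))
        prompt = PROMPT_TEMPLATE.format(context=context, question=question)
        if self.verbose:
            print(prompt)  # show the exact prompt sent to the model
        return self.llm(prompt)


catalog = ["Sun Shirt: UPF 50+ rated", "Rain Jacket: waterproof shell"]


def keyword_retrieve(question: str) -> list[str]:
    """Placeholder retriever: keep catalog lines sharing a word with the question."""
    words = set(question.lower().split())
    return [item for item in catalog if words & set(item.lower().split())]


def fake_llm(prompt: str) -> str:
    """Placeholder model: 'answers' with the first context line it was given."""
    context = prompt.split("\n\n")[1]
    return context.splitlines()[0] if context.strip() else "I don't know."


qa = StuffQA(retrieve=keyword_retrieve, llm=fake_llm)
print(qa.run("Which shirt has UPF protection?"))
```

Keeping retrieval and the model as separate callables mirrors how the chain above takes `retriever=` and `llm=` independently, so either can be swapped.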

In [None]:
query: str = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
response: str = qa_stuff.run(query)

In [None]:
display(Markdown(response))

In [None]:
response: str = index.query(query, llm=llm)

In [None]:
index: VectorStoreIndexWrapper = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])