# LangChain: Q&A over Documents

An example might be a tool that lets you query a product catalog for items of interest. The first part of this notebook applies the same approach to a PDF résumé.

In [None]:
#pip install --upgrade langchain

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

Note: LLMs do not always produce the same results. When executing the code in your notebook, you may get slightly different answers than those in the video.
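One reason answers vary is that the model generates text by sampling each next token from a probability distribution, and `temperature` reshapes that distribution. The sketch below applies a temperature-scaled softmax to made-up scores (not real model output; the candidate tokens are hypothetical). It treats temperature 0 as always picking the top-scoring token, which is the intent behind the `temperature=0.0` setting used later in this notebook. Hosted models can still vary slightly at temperature 0, so treat this as an illustration of the idea, not a guarantee.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token candidates and scores, for illustration only.
tokens = ["Pune", "Portland", "Leuven"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)
print([sample_token(tokens, logits, 1.0, rng) for _ in range(5)])  # may mix tokens
print([sample_token(tokens, logits, 0.0, rng) for _ in range(5)])  # always "Pune"
```

Lower temperatures concentrate probability on the highest-scoring candidate; higher ones spread it out and make repeated runs diverge more.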

In [2]:
llm_model = "gpt-4-32k"

In [3]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [4]:
from langchain.document_loaders import PyPDFLoader

# Use a forward slash (or a raw string) so "\s" is not read as an escape sequence.
file = 'Chat-with-Your-Data/sameer_mahajan.pdf'
loader = PyPDFLoader(file_path=file)

In [5]:
from langchain.indexes import VectorstoreIndexCreator

In [None]:
#!pip install docarray

In [6]:
from langchain.embeddings.azure_openai import AzureOpenAIEmbeddings

embedding = AzureOpenAIEmbeddings(
    deployment="embeddings",
    openai_api_version=os.environ['OPENAI_DEPLOYMENT_VERSION'],
    openai_api_key=os.environ['OPENAI_API_KEY'],
    azure_endpoint=os.environ['OPENAI_ENDPOINT'],
)

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embedding,
).from_loaders([loader])
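`VectorstoreIndexCreator` bundles several steps: it loads the documents, splits them into chunks, embeds each chunk, and stores the vectors in the chosen vector store. In the LangChain versions this notebook appears to target, the default splitter is a `RecursiveCharacterTextSplitter` with `chunk_size=1000` and `chunk_overlap=0`; check your installed version. The sketch below is a simpler fixed-window splitter (a swapped-in technique, not LangChain's recursive algorithm) meant only to show what chunk size and overlap do; `split_text` and the sample text are hypothetical.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Cut text into windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Hypothetical snippet of résumé text.
resume_text = "Education: MS Computer Science. BTech Computer Science and Engineering."
for chunk in split_text(resume_text, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

Overlap costs some duplicated text but lowers the chance that a relevant passage is cut in half and missed at retrieval time.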

In [7]:
query = "what is this person's education?"

In [8]:
from langchain.chat_models import AzureChatOpenAI

chat = AzureChatOpenAI(
    openai_api_base=os.environ["OPENAI_ENDPOINT"],
    openai_api_version=os.environ["OPENAI_DEPLOYMENT_VERSION"],
    deployment_name="exploratory-deployment",
    openai_api_key=os.environ["OPENAI_API_KEY"],
    openai_api_type="azure",
    temperature=0.0,
)
# The index retrieves the most relevant chunks and has the chat model answer from them.
response = index.query(question=query, llm=chat)

In [9]:
display(Markdown(response))

Sameer Mahajan has a BTech in Computer Science and Engineering from the Indian Institute of Technology, Bombay, India, which he completed in 1993. He also has an MS in Computer Science from Georgia Institute of Technology, Atlanta, USA, with a minor in MIS, which he completed in December 1997.

In [11]:
print(index.query(question="What is this person's name?", llm=chat))
print(index.query(question="which cities are listed here?", llm=chat))
print(index.query(question="what does this document contain?", llm=chat))
print(index.query(question="what are universities in this document?", llm=chat))
print(index.query(question="what is profile summary?", llm=chat))

This person's name is Sameer Mahajan.
The cities listed here are Pune, India; Portland, USA; and Leuven, Belgium.
This document appears to be a professional resume or CV. It contains information about the individual's work experience, including roles as a Software Design Engineer/Test and Software Development Engineer at various companies. It also details specific projects and responsibilities within these roles, such as working on file system adapter, ghosted sync, distributed test infrastructure, and more. The document also lists the individual's education, including a MS in Computer Science from Georgia Institute of Technology and a B. Tech. in Computer Science and Engineering from Indian Institute of Technology. Additionally, it mentions several US patents held by the individual and their participation in various conferences and symposiums.
The universities mentioned in this document are Georgia Institute of Technology in Atlanta, USA and Indian Institute of Technology in Bombay, I

## Step By Step

In [None]:
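from langchain.document_loaders import CSVLoader

# CSVLoader needs a CSV file, but `file` above points at a PDF.
# Placeholder path: replace it with your own product-catalog CSV.
catalog_file = 'product_catalog.csv'
loader = CSVLoader(file_path=catalog_file)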

In [None]:
docs = loader.load()

In [None]:
docs[0]
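Each element of `docs` is a LangChain `Document` with `page_content` and `metadata`. For CSV input, `CSVLoader` produces one document per row, formatting the row as `column: value` lines and recording the file path and row number in `metadata`. The sketch below imitates that with the standard `csv` module so you can see the shape without an API key; the `Doc` dataclass, `load_csv_rows`, and the sample rows are hypothetical stand-ins, not LangChain code.

```python
import csv
import io
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_csv_rows(csv_text, source):
    """Turn each CSV row into one Doc whose text is 'column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for i, row in enumerate(reader):
        content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
        docs.append(Doc(page_content=content, metadata={"source": source, "row": i}))
    return docs

# Hypothetical two-row catalog; a real file would be read from disk.
sample = "name,description\nSunShield Shirt,UPF 50+ sun protection\nTrail Jacket,Waterproof shell\n"
for doc in load_csv_rows(sample, "catalog.csv"):
    print(doc.metadata, doc.page_content.replace("\n", " | "))
```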

In [None]:
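# This notebook authenticates against Azure OpenAI (see the environment variables
# above), so reuse the Azure embedding model instead of the public OpenAI endpoint.
embeddings = embedding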

In [None]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [None]:
print(len(embed))

In [None]:
print(embed[:5])
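`embed_query` returns a plain list of floats; the printed length is the embedding model's dimension. Texts are compared by how closely their vectors point in the same direction, commonly measured with cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors (hypothetical, not real embeddings) to show the computation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1 = same direction, 0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; real ones have hundreds or thousands of dimensions.
shirt = [0.9, 0.1, 0.0]
sun_shirt = [0.8, 0.3, 0.1]
backpack = [0.0, 0.2, 0.9]

print(round(cosine_similarity(shirt, sun_shirt), 3))  # related items score higher
print(round(cosine_similarity(shirt, backpack), 3))   # unrelated items score lower
```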

In [None]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [None]:
query = "Please suggest a shirt with sunblocking"

In [None]:
docs = db.similarity_search(query)

In [None]:
len(docs)
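`similarity_search` embeds the query and returns the stored documents whose vectors are most similar to it; `len(docs)` shows how many came back. Its `k` parameter controls that count and, in recent LangChain releases, defaults to 4. Below is a brute-force sketch of the idea over a hypothetical in-memory list of `(text, vector)` pairs; a real store such as `DocArrayInMemorySearch` also handles the embedding call and storage for you.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_search(query_vec, store, k=4):
    """Return the k (text, score) pairs from store most similar to query_vec.

    store is a list of (text, vector) pairs. Every stored vector is scored
    (exhaustive search), which is fine for small in-memory collections.
    """
    scored = ((text, cosine_similarity(query_vec, vec)) for text, vec in store)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical store with toy 3-d vectors standing in for real embeddings.
store = [
    ("Sun-protection shirt, UPF 50+", [0.9, 0.4, 0.0]),
    ("Cotton t-shirt", [0.8, 0.0, 0.1]),
    ("Hiking backpack", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.5, 0.0]  # pretend embedding of "shirt with sunblocking"
for text, score in similarity_search(query_vec, store, k=2):
    print(f"{score:.3f}  {text}")
```

`db.as_retriever()` in the next cell wraps this kind of search behind a retriever interface that chains can call.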

In [None]:
docs[0]

In [None]:
retriever = db.as_retriever()

In [None]:
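# Reuse the Azure chat model defined earlier (temperature=0.0). ChatOpenAI would call
# the public OpenAI endpoint, which may not accept the Azure key in OPENAI_API_KEY.
llm = chat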

In [None]:
# Concatenate the retrieved documents, separated by blank lines, into one context string.
qdocs = "\n\n".join(doc.page_content for doc in docs)

In [None]:
response = llm.call_as_llm(
    f"{qdocs} Question: Please list all your shirts with sun protection "
    "in a table in markdown and summarize each one."
)

In [None]:
display(Markdown(response))
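The cells above perform "stuffing" by hand: every retrieved document's text is pasted into a single prompt ahead of the question. A small helper makes that layout explicit and guards against overflowing the model's context window. The helper name, the separator, and the character budget below are assumptions for illustration; real limits are measured in tokens, not characters.

```python
def build_stuffed_prompt(page_contents, question, max_chars=8000, separator="\n\n"):
    """Concatenate retrieved texts and append the question.

    Documents are added in order until the next one would exceed max_chars,
    a rough character budget standing in for the model's token limit.
    """
    kept = []
    used = 0
    for text in page_contents:
        extra = len(text) + (len(separator) if kept else 0)
        if used + extra > max_chars:
            break  # stop before overflowing the budget
        kept.append(text)
        used += extra
    context = separator.join(kept)
    return f"{context}\n\nQuestion: {question}"

# Hypothetical retrieved snippets.
snippets = ["Shirt A: UPF 50+ fabric.", "Shirt B: SPF-rated sleeves.", "Tent C: 2-person."]
print(build_stuffed_prompt(snippets, "Which shirts offer sun protection?", max_chars=60))
```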

In [None]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
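`chain_type="stuff"` sends all retrieved documents to the LLM in one call, like the manual version earlier. `RetrievalQA.from_chain_type` also accepts other chain types such as `"map_reduce"` and `"refine"`, which trade extra LLM calls for the ability to handle more text than fits in one prompt. The sketch below contrasts stuff and map-reduce using a fake LLM function so it runs offline; `fake_llm` and both helper functions are hypothetical, not LangChain APIs.

```python
def fake_llm(prompt):
    """Offline stand-in for an LLM call: echoes lines that mention 'UPF'."""
    hits = [line for line in prompt.splitlines() if "UPF" in line]
    return "\n".join(hits) if hits else "none"

def stuff_answer(llm, docs, question):
    # One call: every document goes into a single prompt.
    prompt = "\n".join(docs) + f"\nQuestion: {question}"
    return llm(prompt), 1

def map_reduce_answer(llm, docs, question):
    # Map: one call per document, each seeing only that document.
    partials = [llm(f"{doc}\nQuestion: {question}") for doc in docs]
    # Reduce: one more call combines the non-empty partial answers.
    useful = [p for p in partials if p != "none"]
    combined = llm("\n".join(useful) + f"\nQuestion: {question}")
    return combined, len(docs) + 1

sample_docs = ["Shirt A: UPF 50+", "Tent B: waterproof", "Shirt C: UPF 30"]
question = "Which shirts block the sun?"
print(stuff_answer(fake_llm, sample_docs, question))       # (answer, number of LLM calls)
print(map_reduce_answer(fake_llm, sample_docs, question))  # (answer, number of LLM calls)
```

Both return the same answer here, but map-reduce needs one call per document plus a final combining call, while each individual prompt stays small.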

In [None]:
query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
response = qa_stuff.run(query)

In [None]:
display(Markdown(response))

In [None]:
# Rebuild the index from the CSV loader before querying it.
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

In [None]:
response = index.query(query, llm=llm)
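`VectorstoreIndexCreator` packs the step-by-step cells (load, split, embed, store, search) into one call. To make that pipeline concrete without calling an API, the sketch below swaps in a word-count "embedding" (a lexical stand-in, not a semantic model) and ranks entries by cosine similarity; `ToyIndex`, `embed`, and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts. Real embeddings are dense learned vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two sparse word-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyIndex:
    """Hypothetical mini version of the embed -> store -> search pipeline."""

    def __init__(self, texts):
        # "Store": keep each text next to its vector.
        self.entries = [(t, embed(t)) for t in texts]

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

index_demo = ToyIndex([
    "Sun shirt with UPF 50 sun protection",
    "Rain jacket, waterproof",
    "Linen shirt, breathable",
])
print(index_demo.search("shirt with sun protection", k=1))
```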

Reminder: Download your notebook to your local computer to save your work.