# LangChain: Q&A over Documents

Question answering over documents is one of the most popular use cases of LangChain. We will query a document that was not used to train the model. An example is a tool that lets you query a product catalog for items of interest.

To do this, we will use vector embeddings and a vector database, which takes us to the more advanced building blocks of LangChain.
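
To make the idea of vector embeddings more concrete, here is a minimal sketch of how similarity between two embeddings can be measured. The three-dimensional vectors and item names are made up for illustration (real embedding models return far longer vectors), and cosine similarity is assumed as the metric; the vector store used later in this notebook may score matches differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, NOT produced by a real embedding model.
toy_embeddings = {
    "sun shirt": [0.9, 0.1, 0.0],
    "rain jacket": [0.1, 0.9, 0.2],
}
# Pretend this is the embedding of "shirt with sun protection".
query_vector = [0.8, 0.2, 0.1]

# Score every item against the query and pick the closest one.
scores = {
    name: cosine_similarity(query_vector, vec)
    for name, vec in toy_embeddings.items()
}
best_match = max(scores, key=scores.get)
print(best_match)
```

Texts with similar meaning are expected to end up with vectors that point in similar directions, which is what makes this kind of comparison useful for retrieval.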

In [11]:
pip install --upgrade langchain

Successfully installed langchain-0.0.322

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

Note: LLMs do not always produce the same results. When you run the code in your notebook, you may get slightly different answers than those in the video.

In [2]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

Here we use a lightweight vector database that lives in memory and therefore requires no separate installation.
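
As a rough illustration of what "lives in memory" means, the sketch below keeps texts and their vectors in a plain Python list and ranks them by distance to a query vector. The class name `TinyInMemoryStore`, the two-dimensional vectors, and the item names are all hypothetical, and Euclidean distance is chosen only for brevity; this is not how `DocArrayInMemorySearch` is implemented.

```python
import math

class TinyInMemoryStore:
    """Hypothetical toy store: keeps (text, vector) pairs in a Python list."""

    def __init__(self):
        # Held only in process memory; everything is lost on kernel restart.
        self._items = []

    def add(self, text, vector):
        self._items.append((text, list(vector)))

    def search(self, query_vector, k=2):
        """Return the k texts whose vectors are closest to query_vector."""
        ranked = sorted(
            self._items,
            key=lambda item: math.dist(item[1], query_vector),
        )
        return [text for text, _ in ranked[:k]]

store = TinyInMemoryStore()
store.add("Sun Shield Shirt", [0.9, 0.1])   # made-up vectors
store.add("Waterproof Boots", [0.1, 0.9])
store.add("UPF 50 Hat", [0.8, 0.3])

nearest = store.search([1.0, 0.0], k=2)
print(nearest)
```

Nothing is written to disk and no server is started, which is why no separate installation step is needed for a store of this kind.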

In [3]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [15]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

In [5]:
from langchain.indexes import VectorstoreIndexCreator

In [6]:
pip install docarray

Collecting docarray
  Downloading docarray-0.39.1-py3-none-any.whl.metadata (36 kB)

In [16]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [17]:
query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

In [18]:
response = index.query(query)

ValidationError: 2 validation errors for DocArrayDoc
text
  Field required [type=missing, input_value={'embedding': [0.00331889..., -0.02122992287370924]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
metadata
  Field required [type=missing, input_value={'embedding': [0.00331889..., -0.02122992287370924]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
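
The error URL above points at pydantic 2.4, so one thing worth checking while debugging is which versions of the packages involved are installed. The sketch below only reports versions using the standard library. Whether a version mismatch is actually the cause here is an assumption, not something the traceback confirms, and the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Packages that appear in the install logs and the traceback above.
versions = installed_versions(["langchain", "docarray", "pydantic"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```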

In [19]:
display(Markdown(response))

NameError: name 'response' is not defined

## Step By Step

In [None]:
from langchain.document_loaders import CSVLoader
loader = CSVLoader(file_path=file)

In [None]:
docs = loader.load()

In [None]:
docs[0]

In [None]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [None]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [None]:
print(len(embed))

In [None]:
print(embed[:5])

In [None]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [None]:
query = "Please suggest a shirt with sunblocking"

In [None]:
docs = db.similarity_search(query)

In [None]:
len(docs)

In [None]:
docs[0]

In [None]:
retriever = db.as_retriever()

In [None]:
llm = ChatOpenAI(temperature = 0.0, model=llm_model)

In [None]:
qdocs = "".join(doc.page_content for doc in docs)


In [None]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [None]:
display(Markdown(response))

In [None]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
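
The `chain_type="stuff"` option means that all retrieved documents are placed ("stuffed") into a single prompt together with the question. The following is a rough plain-Python sketch of that idea, not LangChain's actual prompt template; the function name `build_stuff_prompt`, the prompt wording, and the catalog snippets are assumptions for illustration.

```python
def build_stuff_prompt(doc_texts, question):
    """Join all document texts into one context block and append the question."""
    context = "\n\n".join(doc_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical catalog snippets standing in for retrieved documents.
retrieved = [
    "name: Sun Shield Shirt\ndescription: UPF 50+ rated.",
    "name: Tropical Breeze Shirt\ndescription: Lightweight, blocks UV rays.",
]
prompt = build_stuff_prompt(retrieved, "Which shirts offer sun protection?")
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined documents fit in the model's context window.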

In [None]:
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
response = qa_stuff.run(query)

In [None]:
display(Markdown(response))

In [None]:
response = index.query(query, llm=llm)

In [None]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

Reminder: Download your notebook to your local computer to save your work.