# LangChain: Q&A over Documents

Question answering over documents is one of the most popular LangChain use cases. In this notebook we query a document that was not used to train the model. An example is a tool that lets you query a product catalog for items of interest.

To do this we use vector embeddings and a vector database, which takes us to the more advanced building blocks of LangChain.

In [1]:
pip install --upgrade langchain

Collecting langchain
  Downloading langchain-0.0.323-py3-none-any.whl.metadata (15 kB)

In [10]:
!pip install python-dotenv

Found existing installation: dotenv 0.0.5
Uninstalling dotenv-0.0.5:
Proceed (Y/n)? ^C
ERROR: Operation cancelled by user
Found existing installation: python-dotenv 1.0.0
Uninstalling python-dotenv-1.0.0:
^C
ERROR: Operation cancelled by user


In [8]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

ImportError: cannot import name 'load_dotenv' from 'dotenv' (/usr/local/python/3.10.8/lib/python3.10/site-packages/dotenv/__init__.py)
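
The ImportError above is a likely sign that a different package named `dotenv` is installed alongside `python-dotenv`, since both provide a module called `dotenv`. Removing `dotenv` and reinstalling `python-dotenv` is the usual fix. As a stopgap, the sketch below reads a simple `.env` file by hand. `load_env_file` and `load_env` are hypothetical helpers, not part of any library, and the parser only handles plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical fallback: copy KEY=VALUE pairs from a .env file into os.environ."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Variables that are already set are left untouched.
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))
    return True


def load_env(path=".env"):
    """Prefer python-dotenv; fall back to the manual parser if the import fails."""
    try:
        from dotenv import load_dotenv, find_dotenv
    except ImportError:
        return load_env_file(path)
    return load_dotenv(find_dotenv())
```

Calling `_ = load_env()` in place of the failing cell keeps the rest of the notebook unchanged.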

Note: LLMs do not always produce the same results. When you run the code in your notebook, you may get slightly different answers than those in the video.

In [None]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

Here we use a lightweight vector database that lives in memory and therefore needs no separate installation.

In [None]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [None]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

In [None]:
from langchain.indexes import VectorstoreIndexCreator

In [None]:
pip install docarray

In [None]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [None]:
query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

In [None]:
response = index.query(query)

In [None]:
display(Markdown(response))

## Step By Step

Embeddings represent a piece of text as a numeric vector that captures its semantic meaning, so texts with similar meaning end up with similar vectors.

![Embeddings](Embeddings.png)
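
To make "similar vectors" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three-dimensional vectors are hand-picked toy values for illustration only. Real embeddings come from a model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-picked toy vectors (assumption: not produced by any embedding model).
toy_embeddings = {
    "sun shirt": [0.9, 0.8, 0.1],
    "UPF 50+ hiking shirt": [0.8, 0.9, 0.2],
    "waterproof boots": [0.1, 0.2, 0.9],
}

query_vector = toy_embeddings["sun shirt"]
for text, vector in toy_embeddings.items():
    print(f"{text}: {cosine_similarity(query_vector, vector):.3f}")
```

The two shirt vectors point in nearly the same direction, so they score higher against each other than either does against the boots vector.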

The following image shows the vector database at two points in time. At build time, the embedding of each snippet is stored in the vector database together with the snippet itself. At query time, the snippets most relevant to the query are retrieved from the vector store.

![Vector DB](VectorDB.png)
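
The build and query phases can be sketched in plain Python. This is a toy illustration, not how `DocArrayInMemorySearch` is implemented: `toy_embed` is a stand-in that counts words instead of calling an embedding model, and `ToyVectorStore` is a hypothetical class.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: lowercase word counts (not a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self):
        self.entries = []  # list of (vector, snippet) pairs

    def add(self, snippets):
        # Build time: store each snippet together with its embedding.
        for snippet in snippets:
            self.entries.append((toy_embed(snippet), snippet))

    def search(self, query, k=2):
        # Query time: embed the query and return the k most similar snippets.
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query_vector, entry[0]), reverse=True
        )
        return [snippet for _, snippet in ranked[:k]]


store = ToyVectorStore()
store.add([
    "sun protection shirt with UPF 50+",
    "waterproof hiking boots",
    "lightweight shirt for hot weather",
])
top = store.search("shirt with sun protection", k=2)
print(top)
```

A real vector store follows the same two steps but uses model-generated embeddings, which is why it can match "sunblocking" to "sun protection" even when the words differ.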

In [None]:
from langchain.document_loaders import CSVLoader
loader = CSVLoader(file_path=file)

In [None]:
docs = loader.load()

In [None]:
docs[0]

In [None]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [None]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [None]:
print(len(embed))

In [None]:
print(embed[:5])

In [None]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [None]:
query = "Please suggest a shirt with sunblocking"

In [None]:
docs = db.similarity_search(query)

In [None]:
len(docs)

In [None]:
docs[0]

In [None]:
retriever = db.as_retriever()

In [None]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [None]:
# Concatenate the text of all retrieved documents into a single string
qdocs = "".join(doc.page_content for doc in docs)


In [None]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [None]:
display(Markdown(response))

In [None]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
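
The `"stuff"` chain type places all retrieved documents into a single prompt and sends it to the LLM in one call. The sketch below shows the idea in plain Python. `build_stuff_prompt` is a hypothetical helper, the instruction wording is an assumption rather than LangChain's actual prompt template, and the two snippets are made-up sample data.

```python
def build_stuff_prompt(snippets, question):
    """Hypothetical helper: put every retrieved snippet into one prompt."""
    context = "\n\n".join(snippets)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Made-up sample snippets standing in for retrieved catalog rows.
snippets = [
    "name: Sun Shield Shirt\ndescription: UPF 50+ sun protection.",
    "name: Trail Boots\ndescription: Waterproof leather boots.",
]
prompt = build_stuff_prompt(snippets, "Which shirts offer sun protection?")
print(prompt)
```

This approach is simple and needs only one LLM call, but the combined text must fit into the model's context window.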

In [None]:
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
response = qa_stuff.run(query)

In [None]:
display(Markdown(response))

In [None]:
response = index.query(query, llm=llm)

In [None]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

Reminder: Download your notebook to your local computer to save your work.