In [19]:
# !pip install faiss-cpu

## Indexes

In [20]:
from dotenv import load_dotenv, find_dotenv
import os
import openai 

load_dotenv(find_dotenv())
openai.api_key = os.environ['OPENAI_API_KEY']

### Loaders

To use your own data with an LLM, the documents must first be stored in a vector database. The first step is to load them into memory with a loader.

In [21]:
from langchain.document_loaders import DirectoryLoader, TextLoader

loader = DirectoryLoader('./FAQ/', glob="**/*.txt", loader_cls=TextLoader, show_progress=True)
docs = loader.load()

100%|██████████| 3/3 [00:00<00:00, 83.80it/s]
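
To make the loading step concrete, here is a minimal sketch of what a directory loader does conceptually: find files by glob pattern and read each into a record with content and a source path. It uses only the standard library; `load_text_files` and the record layout are illustrative assumptions, not LangChain's implementation.

```python
import tempfile
from pathlib import Path


def load_text_files(root, pattern="**/*.txt"):
    """Hypothetical helper: read every file matching `pattern` under `root`."""
    records = []
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file():
            records.append({
                "page_content": path.read_text(encoding="utf-8"),
                "metadata": {"source": str(path)},
            })
    return records


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    faq_dir = Path(tmp) / "FAQ"
    faq_dir.mkdir()
    (faq_dir / "General.txt").write_text("Q: Hours?\nA: 11 a.m. to 10 p.m.", encoding="utf-8")
    (faq_dir / "Menu.txt").write_text("Q: Vegan options?\nA: Yes.", encoding="utf-8")
    loaded = load_text_files(faq_dir)

print(len(loaded), [Path(r["metadata"]["source"]).name for r in loaded])
```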


### Text Splitter

Texts are not loaded into the database one-to-one but in pieces, so-called "chunks".
You can define the chunk size and the overlap between consecutive chunks.

In [22]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
)

documents = text_splitter.split_documents(docs)
documents[0]

Document(page_content='Q: What are the hours of operation for your restaurant?\nA: Our restaurant is open from 11 a.m. to 10 p.m. from Monday to Saturday. On Sundays, we open at 12 p.m. and close at 9 p.m.\n\nQ: What type of cuisine does your restaurant serve?\nA: Our restaurant specializes in contemporary American cuisine with an emphasis on local and sustainable ingredients.', metadata={'source': 'FAQ\\General.txt'})
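
The effect of chunk size and overlap can be sketched with a plain sliding window over characters. This is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch ignores. `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=100):
    """Hypothetical helper: fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.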

### Embeddings
Texts are not stored in the database as raw text but as vector representations. An embedding is a vector representation of a piece of text that captures its semantic meaning, so texts with similar meaning end up close together in the vector space.

In [23]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(openai_api_key=openai.api_key)
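
To illustrate what "semantic meaning in a vector space" buys you, the sketch below compares hand-made toy vectors with cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings have far more dimensions and come from the embedding model, not from hand-picked numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy vectors (assumption): related phrases point in similar directions.
toy_vectors = {
    "opening hours": [0.9, 0.1, 0.0],
    "closing time": [0.8, 0.2, 0.1],
    "vegan menu": [0.1, 0.9, 0.2],
}

reference = toy_vectors["opening hours"]
sim_close = cosine_similarity(reference, toy_vectors["closing time"])
sim_far = cosine_similarity(reference, toy_vectors["vegan menu"])
print(sim_close > sim_far)
```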

### Loading vectors into VectorDB (FAISS)

The vectors created by OpenAIEmbeddings can now be stored in the database. The database can be saved as a .pkl file.

In [24]:
from langchain.vectorstores.faiss import FAISS
import pickle

vectorstore = FAISS.from_documents(documents, embeddings)

with open("vectorstore.pkl","wb") as f:
    pickle.dump(vectorstore, f)
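
What the vector store does at query time can be sketched as a brute-force nearest-neighbour search: compare the query vector with every stored vector and return the closest texts. `TinyVectorStore` is a hypothetical stand-in written for illustration; FAISS provides optimized exact and approximate indexes rather than this Python loop.

```python
class TinyVectorStore:
    """Hypothetical in-memory store: exact search by squared Euclidean distance."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def add(self, vector, text):
        self._vectors.append(list(vector))
        self._texts.append(text)

    def search(self, query_vector, k=2):
        scored = []
        for vector, text in zip(self._vectors, self._texts):
            dist = sum((q - v) ** 2 for q, v in zip(query_vector, vector))
            scored.append((dist, text))
        scored.sort(key=lambda pair: pair[0])  # smallest distance first
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
# Invented toy vectors (assumption); a real pipeline would use model embeddings.
store.add([0.9, 0.1, 0.0], "Opening hours chunk")
store.add([0.5, 0.5, 0.1], "Reservations chunk")
store.add([0.1, 0.9, 0.2], "Vegan menu chunk")

nearest = store.search([0.85, 0.15, 0.0], k=2)
print(nearest)
```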

### Prompts
With an LLM, you can give the model an identity before a conversation starts or define what the question and answer should look like.

In [25]:
from langchain.prompts import PromptTemplate

prompt_template = """You are a helpful assistant of our restaurant.

{context}

Question: {question}
Answer here."""

PROMPT = PromptTemplate(
    template = prompt_template, input_variables=["context", "question"]
)
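
Filling such a template amounts to substituting the named placeholders. The sketch below does this with plain `str.format`, which is a reasonable mental model for a two-variable template; `render_prompt` and the sample values are illustrative assumptions.

```python
template_text = """You are a helpful assistant of our restaurant.

{context}

Question: {question}
Answer here."""


def render_prompt(template, **values):
    """Hypothetical helper: substitute {name} placeholders; a missing name raises KeyError."""
    return template.format(**values)


filled = render_prompt(
    template_text,
    context="A: Our restaurant opens at 11 a.m. from Monday to Saturday.",
    question="When does the restaurant open?",
)
print(filled)
```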

### Chains

Chain classes wire the retriever, the prompt, and the LLM together, which makes it easy to control how the LLM is used.

In [26]:
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

chain_type_kwargs = {"prompt":PROMPT}

llm = OpenAI(openai_api_key=openai.api_key)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", 
                                 retriever=vectorstore.as_retriever(),
                                 chain_type_kwargs=chain_type_kwargs)

query = "When does the restaurant open?"
qa.run(query)

' Our restaurant is open from 11 a.m. to 10 p.m. from Monday to Saturday. On Sundays, we open at 12 p.m. and close at 9 p.m.'
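
The `"stuff"` chain type places all retrieved chunks into a single prompt. A minimal sketch of that flow, with a stub in place of the model so it runs offline, might look as follows. `answer_with_stuffing` and `echo_llm` are hypothetical names, and the blank-line separator is an assumption about how chunks are joined.

```python
def stuff_documents(docs, separator="\n\n"):
    """Join all retrieved chunks into one context string."""
    return separator.join(docs)


def answer_with_stuffing(question, retrieved_docs, template, llm):
    """Hypothetical flow: stuff chunks into the prompt, then call the model once."""
    context = stuff_documents(retrieved_docs)
    prompt = template.format(context=context, question=question)
    return llm(prompt), prompt


def echo_llm(prompt):
    # Stub standing in for a real model call (assumption): reports the prompt size.
    return f"(model received {len(prompt)} characters)"


qa_template = "{context}\n\nQuestion: {question}\nAnswer here."
retrieved = [
    "A: We are open from 11 a.m. to 10 p.m. from Monday to Saturday.",
    "A: On Sundays, we open at 12 p.m. and close at 9 p.m.",
]
answer, final_prompt = answer_with_stuffing(
    "When does the restaurant open?", retrieved, qa_template, echo_llm
)
print(final_prompt)
```

Because every chunk goes into one prompt, this approach only works while the combined text fits in the model's context window.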

### Memory

In the example just shown, each request stands alone. A great strength of an LLM, however, is that it can take the entire chat history into account when responding.
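
Conceptually, a buffer memory just records each turn and hands the accumulated history back under a named key. The sketch below shows that idea with a hypothetical `TinyBufferMemory` class; it is not LangChain's implementation, and the `Human:`/`AI:` formatting is an assumption.

```python
class TinyBufferMemory:
    """Hypothetical buffer: stores (question, answer) turns and renders them as text."""

    def __init__(self, memory_key="chat_history"):
        self.memory_key = memory_key
        self.turns = []

    def save_turn(self, question, answer):
        self.turns.append((question, answer))

    def load(self):
        lines = []
        for question, answer in self.turns:
            lines.append(f"Human: {question}")
            lines.append(f"AI: {answer}")
        # The history is exposed under the configured key.
        return {self.memory_key: "\n".join(lines)}


toy_memory = TinyBufferMemory()
toy_memory.save_turn("Do you offer vegan food?", "Yes, we do.")
history = toy_memory.load()
print(history["chat_history"])
```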

In [27]:
from langchain.memory import ConversationBufferMemory

# memory_key must match the 'chat_history' input that ConversationalRetrievalChain expects;
# misspelled keyword arguments (memory_keys, return_message) are silently ignored.
memory = ConversationBufferMemory(memory_key='chat_history',
                                  return_messages=True,
                                  output_key='answer')

### Use Memory in chains

The memory class can now easily be used in a chain. You can see it working when a follow-up question says "it" and the bot still understands what is meant from the earlier turns of the conversation.

In [28]:
from langchain.chains import ConversationalRetrievalChain

qa = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(model_name="text-davinci-003", temperature=0.7, openai_api_key=openai.api_key),
    memory=memory,
    retriever=vectorstore.as_retriever(),
    combine_docs_chain_kwargs={"prompt": PROMPT},
)


query = "Do you offer vegan food?"
qa({"question": query})
qa({"question": "How much does it cost?"})
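
`ConversationalRetrievalChain` expects two inputs, `question` and `chat_history`. The caller passes the question, and the memory has to supply the history under exactly that key. The sketch below models this contract with a hypothetical `run_toy_chain` function; the fallback key name `history` reflects my understanding of the memory's default and should be treated as an assumption.

```python
REQUIRED_INPUTS = {"question", "chat_history"}


def run_toy_chain(user_inputs, memory_variables):
    """Hypothetical chain entry point: merge memory into the inputs, then validate keys."""
    inputs = {**memory_variables, **user_inputs}
    missing = REQUIRED_INPUTS - set(inputs)
    if missing:
        raise ValueError(f"Missing some input keys: {missing}")
    return inputs


# Memory configured with the matching key: the chain gets everything it needs.
ok = run_toy_chain(
    {"question": "How much does it cost?"},
    {"chat_history": "Human: Do you offer vegan food?\nAI: Yes, we do."},
)

# Memory that exposes its history under a different key: validation fails.
try:
    run_toy_chain({"question": "How much does it cost?"}, {"history": "..."})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The takeaway is that the memory's key and the chain's expected input name must agree, which is why the memory should be created with `memory_key='chat_history'`.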