# RAG implementation using LangChain
In this notebook we will explore a simple RAG pipeline built with the LangChain framework. With the basic concepts from the intro notebook, you should be able to fill in the gaps and run your first RAG pipeline. To explore further, see the [RAG Q&A example](https://python.langchain.com/docs/use_cases/question_answering) in the LangChain documentation.

## Loading documents
First, we need to load the relevant knowledge documents so that the model can refer to them when answering questions. I have prepared a couple of files with customer support policies, located in the `policies` directory. LangChain offers a large number of document loaders; for example, you can load the content of websites or remote storage. For more details, refer to the [documentation](https://python.langchain.com/docs/modules/data_connection/document_loaders/).

In this exercise we are going to use `DirectoryLoader`, which scans a directory for files and, by default, loads each one with `UnstructuredFileLoader` to extract its text. Files are selected with a glob pattern; here `*.txt` matches files with the `txt` extension.
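To see what a glob pattern selects, here is a small standard-library sketch. It does not use LangChain; the temporary directory and file names are made up for illustration, and the assumption is only that the loader's `glob` argument follows the usual glob rules.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory that stands in for `policies` (file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "returns.txt").write_text("Return policy")
    (root / "shipping.txt").write_text("Shipping policy")
    (root / "notes.md").write_text("Not a policy")
    (root / "archive").mkdir()
    (root / "archive" / "old.txt").write_text("Old policy")

    # "*.txt" matches only .txt files directly inside the directory.
    top_level = sorted(p.name for p in root.glob("*.txt"))
    # "**/*.txt" also descends into subdirectories.
    recursive = sorted(p.name for p in root.glob("**/*.txt"))

print(top_level)
print(recursive)
```

With the `*.txt` pattern, the markdown file and the file in the subdirectory are both skipped.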

In [None]:
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(
    'policies', 
    glob="*.txt", 
    show_progress=True,
)

docs = loader.load()

print("Documents loaded:", len(docs))

In [None]:
# Preview content of a document
print(docs[0].page_content)

## Splitting documents
Retrieval is especially valuable for enormous knowledge bases that are hard for humans to traverse. For example, imagine thousands of pages of legal documentation; reading it would take a single person many days. One limitation of LLMs is their limited context window, which comes from the quadratic complexity of the [transformer attention layer](https://nlp.seas.harvard.edu/2018/04/03/attention.html) with respect to input length. Because of that, long documents should be split into smaller, meaningful chunks of text. The split can't be made at arbitrary positions, because that would break sentences apart and could lose information. Thankfully, LangChain provides a library of [text splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) that you can use.

In this exercise the policies are relatively short and easily fit into the context window, so a text splitter with default settings will leave them undivided. However, you can experiment with `chunk_size` to see how the splitter slices a document into chunks.
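To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately naive sketch in plain Python. The helper `split_with_overlap` is hypothetical and is not how LangChain's splitters work: it cuts at fixed character positions, which is exactly the kind of mid-sentence break that a separator-aware splitter tries to avoid.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


text = "Refunds are issued within 14 days. Contact support to start a return."
chunks = split_with_overlap(text, chunk_size=30, chunk_overlap=10)
for chunk in chunks:
    print(repr(chunk))
```

The last 10 characters of each chunk are repeated at the start of the next one, so a sentence cut at a boundary still appears whole in at least part of a neighbouring chunk.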

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split texts into chunks. Our documents are quite short so they won't be split. 
# To experiment with different settings uncomment the arguments to override default settings.
text_splitter = RecursiveCharacterTextSplitter(
    # chunk_size=1000,
    # chunk_overlap=20,
    # length_function=len,
)

documents = text_splitter.split_documents(docs)
print("Number of chunks:", len(documents))
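The idea behind a recursive, separator-aware splitter can be sketched in a few lines of plain Python. This is a simplified illustration under my own assumptions (the function `recursive_split` and its separator order are hypothetical, and overlap is left out); it is not the library's implementation.

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first; use finer ones only for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to hard character cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


policy = (
    "Returns\nItems can be returned within 30 days.\n\n"
    "Shipping\nOrders ship in 2 business days."
)
by_paragraph = recursive_split(policy, chunk_size=50)
small_chunks = recursive_split(policy, chunk_size=20)
print(by_paragraph)
print(small_chunks)
```

With `chunk_size=50` each paragraph stays intact; with `chunk_size=20` the function falls back to line and then word boundaries, so no word is cut in half.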

## Initialize vector store
LangChain supports a number of vector databases, ranging from a scikit-learn based implementation to cloud-based databases. For the full list of integrations, refer to the [documentation](https://python.langchain.com/docs/integrations/vectorstores). Here we are going to use [FAISS](https://python.langchain.com/docs/integrations/vectorstores/faiss) (Facebook AI Similarity Search), which is easy to install with the Python package manager. Create a vector store by passing the documents and an embedding model to `FAISS.from_documents`.

In [None]:
from langchain_community.vectorstores import FAISS

embeddings = ### TODO: create embedding model ###

vector_store = FAISS.from_documents(### TODO: provide documents and embedding model ###)

The vector store provides a similarity search method out of the box, which makes it easy to retrieve related documents.
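Conceptually, similarity search embeds the query and ranks the stored texts by how close their vectors are. The sketch below shows that idea with a toy bag-of-words "embedding" and cosine similarity. Everything in it (`embed`, `cosine`, `similarity_search`, the sample policies) is a hypothetical stand-in; a real embedding model returns dense vectors and FAISS uses an optimized index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real model would return a dense float vector)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_search(query: str, texts: list[str], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to the query vector."""
    query_vector = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vector, embed(t)), reverse=True)
    return ranked[:k]


policies = [
    "wrong size items can be exchanged for free",
    "refunds are issued to the original payment method",
    "lost packages are reshipped after an investigation",
]
top = similarity_search("I received wrong size of the item", policies, k=1)
print(top)
```

A word-count vector only matches exact tokens, which is why real pipelines use learned embeddings that also capture paraphrases.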

In [None]:
retrieved = vector_store.similarity_search("I received wrong size of the item")
print("Retrieved documents:", len(retrieved))
print("Document content:", retrieved[0].page_content)

## RAG pipeline
Now that we have all the pieces, we can create a chain that takes a question and answers it using the knowledge from the policies. In the previous notebook you learned how to assemble components into a pipeline using the pipe operator `|`. Here we are going to use [helper functions](https://python.langchain.com/docs/modules/chains) provided by LangChain to compose a more complex RAG chain.

- [`create_stuff_documents_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html#langchain.chains.combine_documents.stuff.create_stuff_documents_chain): This chain takes a list of documents and formats them all into a prompt, then passes that prompt to an LLM. It passes ALL documents, so you should make sure they fit within the context window of the LLM you are using.
- [`create_retrieval_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html#langchain.chains.retrieval.create_retrieval_chain): This chain takes in a user inquiry, which is then passed to the retriever to fetch relevant documents. Those documents (and original inputs) are then passed to an LLM to generate a response.

Your chat prompt template should contain the `{context}` and `{input}` fields. With that in place, you can combine the prompt and the LLM using `create_stuff_documents_chain`.
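The "stuff" strategy itself is simple: join the documents into one block of text and substitute it into the template. Here is a plain-Python sketch of that idea, assuming documents are joined with blank lines; the template wording and the helper `stuff_documents` are hypothetical and only illustrate where `{context}` and `{input}` end up.

```python
# Hypothetical template: only the two field names matter for this illustration.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {input}"
)


def stuff_documents(docs: list[str], question: str, template: str = PROMPT_TEMPLATE) -> str:
    """Put ALL documents into the prompt; nothing is filtered or summarized."""
    context = "\n\n".join(docs)
    return template.format(context=context, input=question)


prompt_text = stuff_documents(
    ["Policy A text.", "Policy B text."],
    "Can I return an item?",
)
print(prompt_text)
```

Because every document is included verbatim, the prompt length grows with the number and size of documents, which is why the context window limit matters here.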

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

llm = ### TODO: Create a model ###

prompt = ChatPromptTemplate.from_template("""
   ### TODO: create prompt template with `context` and `input` fields ###
""")

document_chain = create_stuff_documents_chain(### TODO: provide llm and prompt ###)
document_chain

Next, we will chain together the retriever (which is simply a wrapper around the vector store) and the document chain that you created above. The result is a chain that retrieves relevant documents from the vector store and produces an answer for a given query.
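The data flow of such a chain can be sketched without LangChain: retrieve documents for the input, hand them to the document-combining step as `context`, and return everything together with the `answer`. The retriever and the "LLM" below are stubs, and `make_retrieval_chain`, `fake_retrieve`, and `fake_combine` are hypothetical names used only for this illustration.

```python
KNOWLEDGE = [
    "Orders can be cancelled within 24 hours.",
    "We accept cards and PayPal.",
]


def fake_retrieve(query: str) -> list[str]:
    """Stub retriever: returns documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in KNOWLEDGE if words & set(d.lower().rstrip(".").split())]


def fake_combine(inputs: dict) -> str:
    """Stub for the document chain: a real one would prompt an LLM with the context."""
    return f"(stub answer based on {len(inputs['context'])} document(s))"


def make_retrieval_chain(retrieve, combine_docs):
    """Wire retriever and document chain together; keep inputs, context, and answer."""
    def chain(inputs: dict) -> dict:
        docs = retrieve(inputs["input"])
        answer = combine_docs({**inputs, "context": docs})
        return {**inputs, "context": docs, "answer": answer}
    return chain


toy_chain = make_retrieval_chain(fake_retrieve, fake_combine)
result = toy_chain({"input": "How can I cancel my order"})
print(result["answer"])
```

Returning the retrieved `context` alongside the `answer` makes it easy to check which documents an answer was based on.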

In [None]:
from langchain.chains import create_retrieval_chain

retriever = vector_store.as_retriever()
retrieval_chain = create_retrieval_chain(### TODO: provide retriever and document chain ###)
retrieval_chain

## Running the chain
The final chain implements the Runnable interface as well. All you need to do is provide your question as the input.
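As a reminder of what a runnable-style interface means in practice, here is a toy class with `invoke`, `batch`, and the pipe operator. `ToyRunnable` is a hypothetical, heavily simplified stand-in and not LangChain's `Runnable`; it only shows why every component, and every composition of components, can be called the same way.

```python
class ToyRunnable:
    """Wraps a function and exposes a uniform invoke/batch interface."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        return [self.invoke(v) for v in values]

    def __or__(self, other):
        # `a | b` builds a new runnable that feeds a's output into b.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


normalize = ToyRunnable(lambda d: {"input": d["input"].strip().lower()})
answer = ToyRunnable(lambda d: f"You asked: {d['input']}")

toy_pipeline = normalize | answer
single = toy_pipeline.invoke({"input": "  Package was lost "})
many = toy_pipeline.batch([{"input": "Order cancellation"}, {"input": "Item arrived damaged"}])
print(single)
print(many)
```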

Some of the questions you can ask:
- Accepted methods of payment
- Customer was charged twice
- Package was lost
- Order cancellation
- Item arrived damaged

In [None]:
response = retrieval_chain.invoke({"input": "### TODO: Insert your question ###"})
print(response["answer"])

## Further work
- Check whether the LLM is willing to give away your company secrets: ask it to tell you something confidential.
- Try using a system prompt, as in the intro notebook, to keep the model from going astray and to restrict it to allowed actions (see `ChatPromptTemplate.from_messages`).
- To further improve the pipeline, you can implement a [memory mechanism](https://python.langchain.com/docs/use_cases/question_answering/chat_history) that holds the previous conversation so you can ask follow-up questions!
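As a starting point for the last two ideas, the sketch below shows in plain Python how a system message and a running chat history fit into one message list. The `("system" | "human" | "ai", text)` tuples mirror the role/text pairs used for chat prompts, but `SYSTEM_PROMPT`, `build_messages`, and `stub_llm` are hypothetical, and the stub stands in for a real chat model.

```python
SYSTEM_PROMPT = "You are a support assistant. Answer only from the provided policies."


def build_messages(history: list[tuple[str, str]], question: str) -> list[tuple[str, str]]:
    """System message first, then earlier turns, then the new question."""
    return [("system", SYSTEM_PROMPT), *history, ("human", question)]


def stub_llm(messages: list[tuple[str, str]]) -> str:
    """Stand-in for a chat model: only reports how many messages it received."""
    return f"(stub reply after seeing {len(messages)} messages)"


history: list[tuple[str, str]] = []
for question in ["Item arrived damaged", "How long will the replacement take?"]:
    messages = build_messages(history, question)
    reply = stub_llm(messages)
    # Store both sides of the turn so the next question can refer back to it.
    history += [("human", question), ("ai", reply)]

print(history)
```

In a real pipeline the history would also need trimming or summarizing, since it counts against the same context window as the retrieved documents.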