# RAG: putting everything together

In this notebook we will put all the building blocks together to build our own RAG application.

## Bringing back all of our work from previous notebooks

In [None]:
from langchain_openai import AzureChatOpenAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [None]:
azure_deployment=""
api_key=""
openai_api_version="2024-02-01"
azure_endpoint=""

### Define the LLM

In [None]:
gpt_35 = AzureChatOpenAI(
    azure_deployment=azure_deployment,
    api_key=api_key,
    openai_api_version=openai_api_version,
    azure_endpoint=azure_endpoint
)

llm=gpt_35

### Load our vectorstore and embeddings

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = FAISS.load_local("vector_store", embeddings=embedding_model, allow_dangerous_deserialization=True)

## Retrieval and Generation: Retrieve
Now let’s write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

First we need to define our logic for searching over documents. LangChain defines a Retriever interface which wraps an index that can return relevant Documents given a string query.

The most common type of [Retriever](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/) is the VectorStoreRetriever, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any VectorStore can easily be turned into a Retriever with `VectorStore.as_retriever()`:


In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

In [None]:
query = "What are revenue units?"
retrieved_docs = retriever.invoke(query)

In [None]:
retrieved_docs[2]

## What's happening under the hood?
It may seem obscure what `retriever.invoke(query)` is doing, but it's really simple. It's a 4-step process:
1. The `query` is passed through our embedding model and gets transformed into a vector; let's call it `query_vector`.
2. The `query_vector` is then compared to all the vectors in the vectorstore. Remember that those vectors are just a mathematical representation of parts of the documents.
3. We then take the `k` vectors that are the most "similar" to our `query_vector` (here `k` is 6).
4. We return a list with the documents those vectors belong to, i.e. the ones nearest to the `query_vector`.
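
The four steps above can be sketched in plain Python. This is a toy illustration under loud assumptions, not what FAISS or the real embedding model does: `toy_embed` is a hypothetical bag-of-words stand-in for the embedding model, the vocabulary and documents are made up, and the search is a brute-force cosine similarity (a real vector store may use a different metric and an optimized index).

```python
import math

# Made-up vocabulary and documents, for illustration only.
VOCAB = ["revenue", "unit", "shift", "employee", "schedule"]
DOCS = [
    "A revenue unit groups revenue for reporting",
    "An employee can swap a shift",
    "The schedule shows every shift",
]

def toy_embed(text):
    """Hypothetical embedding: count the words starting with each vocabulary term."""
    words = text.lower().split()
    return [float(sum(w.startswith(term) for w in words)) for term in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# The "vectorstore": document vectors are computed once, ahead of time.
DOC_VECTORS = [toy_embed(doc) for doc in DOCS]

def toy_retrieve(query, k=2):
    query_vector = toy_embed(query)                                   # step 1
    scores = [cosine_similarity(query_vector, v) for v in DOC_VECTORS]  # step 2
    ranked = sorted(range(len(DOCS)), key=lambda i: scores[i], reverse=True)  # step 3
    return [DOCS[i] for i in ranked[:k]]                              # step 4

top_docs = toy_retrieve("What are revenue units?", k=1)
print(top_docs)
```

Only the document about revenue units shares vocabulary with the query, so it ranks first; the other two documents score zero.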

## Retrieval and Generation: Generate
Let’s put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.

Let's start by defining the message we will send to the LLM:

In [None]:
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)


In [None]:
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
print(example_messages[0].content)

## Putting everything together

We will create a chain called `rag_chain` that will have only one input: the user's `question`.

The `question` will be forked and passed through two different pipelines:
1. The retrieval pipeline, where the `retriever` compares the question to the documents inside the vectorstore and the retrieved documents are joined into a single string by the `format_docs` function. That string is passed to `prompt` as the `context` variable.
2. The `question` itself is passed through unchanged as the other variable of the `prompt`.

Once the prompt is filled with context and the question, we will send it to the `llm`, and we will print out the outcome.

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [None]:
rag_chain.invoke("what is a revenue unit?")
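
To make the data flow of `rag_chain` more concrete, the same fork can be written with ordinary functions. This is a rough sketch, not how LangChain executes the chain internally: `FakeDoc`, `fake_retriever`, `fake_llm` and the shortened `TEMPLATE` are hypothetical stand-ins so the example runs without a vector store or an Azure deployment.

```python
class FakeDoc:
    """Hypothetical stand-in exposing only `page_content`."""
    def __init__(self, page_content):
        self.page_content = page_content

def fake_retriever(question):
    # A real retriever would search the vectorstore; this returns fixed text.
    return [FakeDoc("First retrieved chunk."), FakeDoc("Second retrieved chunk.")]

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def fake_llm(prompt_text):
    # Stand-in for the model call; reports the prompt size instead of answering.
    return f"(model reply to a {len(prompt_text)}-character prompt)"

# Shortened version of the notebook's template, for illustration.
TEMPLATE = "{context}\n\nQuestion: {question}\n\nHelpful Answer:"

def build_prompt(question):
    # The fork: the same question feeds both entries of the dict.
    inputs = {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                              # RunnablePassthrough()
    }
    return TEMPLATE.format(**inputs)                       # | prompt

def plain_rag(question):
    return fake_llm(build_prompt(question))                # | llm | StrOutputParser()

print(build_prompt("what is a revenue unit?"))
print(plain_rag("what is a revenue unit?"))
```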

# Your turn!

Now it's up to you. Here are some exercises for you to play with; feel free to mess around with them :)

# Exercise 1: Validation
Right now our agent can answer questions about Planday... but also about coding in Python, or about the weather in Mexico. I think you can see how this can be abused... How can you put some guard rails in place to avoid it?

Maybe modify the prompt... maybe separate it into two prompts... who knows
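
If you go for the two-step idea, one possible shape (a sketch, not the solution) is a gate that runs before the RAG chain. Everything here is an assumption for illustration: `toy_is_on_topic` is a keyword check standing in for a first LLM call that classifies the question, and the keywords and `REFUSAL` text are made up.

```python
REFUSAL = "Sorry, I can only answer questions about Planday."

def toy_is_on_topic(question):
    # Placeholder for a classification prompt sent to the LLM.
    keywords = ("planday", "revenue", "shift", "schedule")
    return any(word in question.lower() for word in keywords)

def guarded_answer(question, answer_fn, is_on_topic=toy_is_on_topic):
    """Only call `answer_fn` when the question passes the topic check."""
    if not is_on_topic(question):
        return REFUSAL
    return answer_fn(question)

# `answer_fn` would be `rag_chain.invoke` in the notebook; a lambda keeps this runnable.
blocked = guarded_answer("Write a python function that sums fibonacci numbers", lambda q: "answer")
allowed = guarded_answer("What is a revenue unit?", lambda q: "answer")
print(blocked)
print(allowed)
```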

The following prompt shouldn't be possible:

In [None]:
print(rag_chain.invoke("Write a python function that sums all fibonacci numbers between 1 and 18"))

In [None]:
# Your code:

# Exercise 2: Follow-up questions
Right now, our agent can answer questions about Planday. But if you ask a follow-up question, it has no idea what you were talking about, because an LLM has **no memory**. The only way to give it memory is to manually add the past exchanges to each new request. How could you do it...?
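
As a possible starting point (one approach among several), the past exchanges can be kept in a plain list and rendered as text before each request. The helper names and the `max_turns` cut-off are hypothetical, and the stored answer is a placeholder rather than real model output.

```python
chat_history = []  # list of (question, answer) tuples, oldest first

def remember(history, question, answer):
    history.append((question, answer))

def format_history(history, max_turns=3):
    """Render the most recent turns as text that can be added to a prompt."""
    recent = history[-max_turns:]
    return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)

remember(chat_history, "What is a revenue unit?", "(previous answer)")
history_text = format_history(chat_history)
print(history_text)
```

The rendered text could then fill an extra placeholder in the template, for example a hypothetical `{chat_history}` variable that the current `template` does not have yet.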

In [None]:
# Your code:

# Exercise 3: Cite your sources!
We know LLMs are prone to hallucinate... how can you make it return the sources where the knowledge came from?

Pssst: maybe you want to look into modifying the `format_docs` function, although there are several ways of doing it
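
A minimal sketch of that hint, assuming each document carries a `metadata` dictionary with a `"source"` key. Whether your stored chunks actually have that key depends on how the vectorstore was built, so treat it as an assumption. `FakeDoc` is a hypothetical stand-in so the example runs on its own.

```python
class FakeDoc:
    """Hypothetical stand-in with `page_content` and `metadata` attributes."""
    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

def format_docs_with_sources(docs):
    blocks = []
    for i, doc in enumerate(docs, start=1):
        # Fall back to "unknown" when a chunk has no "source" entry.
        source = doc.metadata.get("source", "unknown")
        blocks.append(f"[{i}] (source: {source})\n{doc.page_content}")
    return "\n\n".join(blocks)

example = format_docs_with_sources([
    FakeDoc("First chunk.", {"source": "a.md"}),
    FakeDoc("Second chunk.", {}),
])
print(example)
```

With the sources visible in the context, the prompt can then ask the model to mention the numbered chunks it relied on.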

In [None]:
# Your code: