# The basics
This Jupyter notebook makes sure that you have everything you need to run LangChain, and explains the basics of interacting with GPT models.
Adapted from [LangChain's tutorial](https://python.langchain.com/v0.2/docs/tutorials/llm_chain/)

## Installing packages
We will use [LangChain](https://www.langchain.com/langchain) as our SDK to interact with different LLMs. Its abstractions over the different models, along with easy plug-in vector DBs and support for adding "memory" to a use case, make it one of the best tools for prototyping GenAI products. For that, we need to make sure we have the right packages installed.

Quick note on Jupyter notebooks:
* Ctrl+Enter runs the cell
* Any cell starting with `!` is a command that you could also run in a terminal
* Feel free to modify the code inside them and play around with the results

In [None]:
!git clone https://github.com/ericksantillan-planday/chatbot-tutorial.git

In [None]:
!pip install -r chatbot-tutorial/requirements.txt

# Calling our GPT model
We will use a model deployed on Azure. We interact with the model by sending a POST request to a specific endpoint. This is what the request body looks like:
```json
{
  "temperature": 1,
  "top_p": 1,
  "stream": false,
  "stop": null,
  "max_tokens": 4096,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "logit_bias": {},
  "user": "user-1234",
  "messages": [
    {}
  ],
  "data_sources": [
    {}
  ],
  "n": 1,
  "seed": 1,
  "response_format": {
    "type": "json_object"
  },
  "tools": [
    {
      "type": "function",
      "function": {
        "description": "string",
        "name": "string",
        "parameters": {
          "additionalProp1": {}
        }
      }
    }
  ],
  "tool_choice": "none",
  "functions": [
    {
      "name": "string",
      "description": "string",
      "parameters": {
        "additionalProp1": {}
      }
    }
  ],
  "function_call": "none"
}
```
As you can see, there are many different properties, and we won't go through all of them in this workshop. The most important one is the `messages` property, which contains all the messages from the user as well as the answers from the model.
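
As a rough illustration, assuming the standard Chat Completions roles (`system`, `user`, `assistant`), a request body whose `messages` list holds an earlier exchange plus a new question could be built in Python like this. The conversation text is made up, and only `temperature` is taken from the schema above; the other properties are simply left out.

```python
import json

# Made-up conversation: each message is a dict with a "role" and a "content"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello!"},
    {"role": "assistant", "content": "Hi! How can I help you today?"},  # an earlier model answer
    {"role": "user", "content": "What can you do?"},                     # the new question
]

# Only a couple of the properties from the schema above
body = {"temperature": 1, "messages": messages}

payload = json.dumps(body)  # the JSON string that would be sent in the POST request
print(payload)
```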

At this point, you could use whatever tool you want to interact with the LLM, for example Postman, or curl from your terminal. Here is one example:

```shell
curl "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d '{"messages":[{"role": "user", "content": "hello!"}]}'
```
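
For reference, here is a minimal sketch of the same request in Python using only the standard library (`urllib.request`). `build_chat_request` is a hypothetical helper, and the placeholders are the same as in the curl command; the call that actually sends the request is left commented out so the snippet runs without credentials.

```python
import json
import urllib.request


def build_chat_request(resource: str, deployment: str, api_key: str,
                       messages: list, api_version: str = "2024-02-01") -> urllib.request.Request:
    """Hypothetical helper: builds (but does not send) the same POST request as the curl example."""
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


request = build_chat_request(
    "YOUR_RESOURCE_NAME", "YOUR_DEPLOYMENT_NAME", "YOUR_API_KEY",
    [{"role": "user", "content": "hello!"}],
)

# With real values from the moderator, you could send it like this:
# with urllib.request.urlopen(request) as response:
#     answer = json.loads(response.read())
#     print(answer["choices"][0]["message"]["content"])
```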

Please ask the moderator for the endpoint, API key and all the needed information to run the model.

## Using LangChain
The easiest way to start interacting with the API from Python is LangChain's [AzureChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.azure.AzureChatOpenAI.html#langchain_openai.chat_models.azure.AzureChatOpenAI). This object is one of LangChain's `ChatModels`, which are LangChain "Runnables": they expose a standard interface for interacting with them. This also makes it easy to swap one LLM for another without changing the rest of the code.

To simply call the model, we can pass a string (or a list of messages) to the `.invoke()` method.

In [None]:
from langchain_openai import AzureChatOpenAI

In [None]:
azure_deployment=""
api_key=""
openai_api_version="2024-02-01"
azure_endpoint=""

In [None]:
gpt_35 = AzureChatOpenAI(
    azure_deployment=azure_deployment,
    api_key=api_key,
    openai_api_version=openai_api_version,
    azure_endpoint=azure_endpoint
)

In [None]:
gpt_35.invoke("hello!")

### Messages
We can also use messages to keep track of our inputs, and to distinguish between `SystemMessage`, `HumanMessage` and `AIMessage`. For example:

In [None]:
from langchain_core.messages import HumanMessage, SystemMessage

In [None]:
messages = [
    SystemMessage(content="Translate the following from English into Italian"),
    HumanMessage(content="hi!"),
]

gpt_35.invoke(messages)

### OutputParsers
Notice that the response from the model is an `AIMessage`. This contains a string response along with other metadata about the response. Often we just want to work with the string response, which we can extract using a simple output parser.

In [None]:
from langchain_core.output_parsers import StrOutputParser

In [None]:
parser = StrOutputParser()

result = gpt_35.invoke(messages)
parser.invoke(result)

More commonly, we can "chain" the model with this output parser, so that the parser is called every time the chain runs. This chain takes the input type of the language model (a string or a list of messages) and returns the output type of the output parser (a string).

We can easily create the chain using the `|` operator. The `|` operator is used in LangChain to combine two elements together.

In [None]:
chain = gpt_35 | parser

In [None]:
chain.invoke(messages)

### Prompt Templates
Right now we are passing a list of messages directly into the language model. Where does this list of messages come from? Usually, it is constructed from a combination of user input and application logic. This application logic usually takes the raw user input and transforms it into a list of messages ready to pass to the language model. Common transformations include adding a system message or formatting a template with the user input.

PromptTemplates are a concept in LangChain designed to assist with this transformation. They take in raw user input and return data (a prompt) that is ready to pass into a language model.

Let's create a PromptTemplate here. It will take in two user variables:

* `language`: The language to translate text into
* `text`: The text to translate

In [None]:
from langchain_core.prompts import ChatPromptTemplate

In [None]:
system_template = "Translate the following into {language}:"

prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{text}")]
)

In [None]:
prompt_template.messages

The input to this prompt template is a dictionary (Python's closest equivalent to a JSON object). We can invoke the prompt template on its own to see what it does:

In [None]:
prompt_template.invoke({"language" : "french", "text": "hello!"})

We can now combine this with the model and the output parser from above. This will chain all three components together.

In [None]:
chain = prompt_template | gpt_35 | parser

chain.invoke({"language": "french", "text": "hi"})

# Exercise!

Make a chain that takes a superhero and an animal as input and returns a creative name for the new superhero-animal.

## What is RAG?
RAG is a technique for augmenting LLM knowledge with additional data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).

LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally.

A typical RAG application has two main components:
* **Indexing**: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

* **Retrieval and generation**: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

In this notebook we will focus on Indexing.

Indexing has three main parts:
* Load: First we need to load our data. This is done with DocumentLoaders.
* Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.
* Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.

## Getting data
For this workshop we have already downloaded and preprocessed some data for you. You can find it in the `docs/` folder of the cloned repository (`chatbot-tutorial/docs/`).

## Loading data
The data has been saved as markdown files. We will use the `DirectoryLoader` (with a `TextLoader` for each file). To see other types of loaders, check [Document Loaders](https://python.langchain.com/v0.2/docs/how_to/#document-loaders). You may want to go ahead and try to load your own data ;)

In [None]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader

In [None]:
loader = DirectoryLoader("chatbot-tutorial/docs/", loader_cls=TextLoader, glob="*.md")
docs = loader.load()
len(docs)

In [None]:
docs[0].page_content

# Indexing: Split
Our loaded documents are not very long, which makes this step *optional*, but for larger documents it is mandatory, as they can be too long to fit in the context window of many models. Even for models that could fit a full document in their context window, models can struggle to find information in very long inputs.

To handle this, we'll split each Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant parts of our documents at run time.

In this case we’ll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.

We set `add_start_index=True` so that the character index at which each split Document starts within the initial Document is preserved as metadata attribute “start_index”.
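
To build intuition for `chunk_size`, `chunk_overlap` and `start_index`, here is a deliberately simplified splitter that cuts fixed-size character windows. It is not the recursive algorithm used by `RecursiveCharacterTextSplitter` (which prefers to break on separators such as new lines); `split_with_overlap` is a hypothetical helper, used here with tiny numbers so the overlap is easy to see.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Hypothetical fixed-window splitter: consecutive chunks share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


for chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(chunk)
# {'start_index': 0, 'text': 'abcd'}
# {'start_index': 2, 'text': 'cdef'}
# {'start_index': 4, 'text': 'efgh'}
# {'start_index': 6, 'text': 'ghij'}
```

As with `add_start_index=True`, the stored `start_index` lets you map each chunk back to its position in the original text.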

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

In [None]:
len(all_splits)

In [None]:
all_splits[0].metadata

In [None]:
all_splits[0].page_content

# Indexing: Store

Now we need to index our text chunks so that we can search over them at runtime. 

There are several ways to do this, since it is a classic Information Retrieval problem. You could, for example, use TF-IDF or bag-of-words representations.
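
To make the bag-of-words idea concrete, here is a tiny sketch (not something LangChain provides) that scores made-up chunks by how many query words they share. `bag_of_words` and `overlap_score` are hypothetical helpers; note that this only matches exact words, so synonyms or rephrasings score zero.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: lower-cased word counts, ignoring word order."""
    return Counter(text.lower().split())


def overlap_score(query: str, chunk: str) -> int:
    """Number of query words that also appear in the chunk (counting repeats)."""
    query_words, chunk_words = bag_of_words(query), bag_of_words(chunk)
    return sum(min(count, chunk_words[word]) for word, count in query_words.items())


chunks = [  # made-up chunks
    "this page explains revenue units",
    "this page explains shift swaps",
    "this page explains payroll exports and revenue reports",
]
query = "what are revenue units"

for chunk in sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True):
    print(overlap_score(query, chunk), chunk)
# 2 this page explains revenue units
# 1 this page explains payroll exports and revenue reports
# 0 this page explains shift swaps
```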

However, the most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of “similarity” search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are high dimensional vectors).
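
In formula form, the cosine similarity of two vectors $a$ and $b$ is $\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$, which ranges from -1 (opposite directions) to 1 (same direction) and ignores the vectors' lengths. A plain-Python sketch with made-up 2-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction: (approximately) 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # perpendicular: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite: -1.0
```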

Different embedding models can be used; here we compute embeddings locally with a small model called `sentence-transformers/all-mpnet-base-v2`, downloaded from Hugging Face, a platform that has become something like the "GitHub" of ML models. You can browse more models [here](https://huggingface.co/models).

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

In [None]:
vectorstore = FAISS.from_documents(documents=all_splits, embedding=embedding_model)

In [None]:
# Save locally for next steps
vectorstore.save_local("vector_store")

# For the curious ones
If you want to see how a document is stored, you can run the following cells:

In [None]:
all_splits[0].page_content

In [None]:
len(all_splits)

In [None]:
vectorstore.index_to_docstore_id

In [None]:
vectorstore.index.reconstruct(0)

# RAG: putting everything together

In this notebook we will put all the building blocks together to build our own RAG application.

## Bringing back all of our work from previous notebooks

In [None]:
from langchain_openai import AzureChatOpenAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

### Define the LLM

In [None]:
# Assumes `gpt_35` was created as in "Using LangChain"
# Keeping the model in one variable makes it easy to swap the LLM used
llm = gpt_35

### Load our vectorstore and embeddings

In [None]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = FAISS.load_local("vector_store", embeddings=embedding_model, allow_dangerous_deserialization=True)

## Retrieval and Generation: Retrieve
Now let’s write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

First we need to define our logic for searching over documents. LangChain defines a Retriever interface which wraps an index that can return relevant Documents given a string query.

The most common type of [Retriever](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/) is the VectorStoreRetriever, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any VectorStore can easily be turned into a Retriever with `VectorStore.as_retriever()`:


In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

In [None]:
query = "What are revenue units?"
retrieved_docs = retriever.invoke(query)

In [None]:
retrieved_docs[2]

## What's happening under the hood?
It may seem obscure what this one line of code is doing, but it's really simple. It's a 4-step process:
1. The `query` is passed through our embedding model and transformed into a vector; let's call it `query_vector`.
2. The `query_vector` is then compared to all the vectors in the vectorstore. Remember that the vectors in the vectorstore are just a mathematical representation of parts of the documents.
3. We then take the vectors that are the most "similar" to our `query_vector` (here, the `k=6` closest ones).
4. We return a list with the documents whose vectors are nearest to the `query_vector`.
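
The four steps can be sketched in plain Python. To keep it self-contained, `toy_embed` is a hypothetical word-count "embedding" over a tiny vocabulary (a real model returns dense learned vectors), and the chunks are made up. LangChain's `FAISS` wrapper, to our knowledge, defaults to Euclidean (L2) distance, so this sketch ranks by that rather than by cosine similarity.

```python
import math

VOCABULARY = ["revenue", "units", "shift", "swap", "payroll"]


def toy_embed(text: str) -> list:
    """Hypothetical embedding model: counts of each vocabulary word.
    A real model such as all-mpnet-base-v2 returns dense, learned vectors instead."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


def euclidean_distance(a: list, b: list) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# What the vector store holds: one vector per (made-up) chunk
toy_chunks = ["revenue units per location", "how to swap a shift", "payroll and revenue exports"]
toy_index = [(toy_embed(chunk), chunk) for chunk in toy_chunks]


def toy_retrieve(query: str, k: int = 2) -> list:
    query_vector = toy_embed(query)                          # 1. embed the query
    scored = [(euclidean_distance(query_vector, vector), chunk)
              for vector, chunk in toy_index]                # 2. compare with every stored vector
    scored.sort(key=lambda pair: pair[0])                    # 3. most similar = smallest distance
    return [chunk for _, chunk in scored[:k]]                # 4. return the k nearest chunks


print(toy_retrieve("what are revenue units"))
# ['revenue units per location', 'payroll and revenue exports']
```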

# Retrieval and Generation: Generate
Let’s put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.

Let's start by defining the prompt template for the message we will send to the LLM:

In [None]:
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)


In [None]:
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
print(example_messages[0].content)

## Putting everything together

We will create a chain called `rag_chain` that will have only one input: the user's `question`.

The `question` will be forked and passed through two different pipelines:
1. The retrieval pipeline, where the question is compared to the documents inside the vectorstore using the `retriever`, and the retrieved documents are joined into a single string by the `format_docs` function. This string is passed to `prompt` as the `context` property.
2. The `question` itself, passed through unchanged by `RunnablePassthrough()`, becomes the other property of the `prompt`.

Once the prompt is filled with the context and the question, we will send it to the `llm` and turn its answer into a plain string with `StrOutputParser()`.

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [None]:
rag_chain.invoke("what is a revenue unit?")

# Your turn!

Now it's up to you! Here we propose some exercises for you to play with; feel free to mess around with them :)

# Exercise 1: Validation
Right now our agent can answer questions about Planday... but also about coding in Python, or about the weather in Mexico. You can probably see how this could be abused... How can you put some guardrails in place to avoid it?

Maybe modify the prompt... maybe separate it into two prompts... who knows

The following prompt shouldn't be possible:

In [None]:
print(rag_chain.invoke("Write a Python function that sums all Fibonacci numbers between 1-18"))

In [None]:
# Your code:

# Exercise 2: Follow-up questions
Right now, our agent can answer questions about Planday. But if you ask a follow-up question, it has no idea what you were talking about, because an LLM has **no memory**. The only way to give it memory is to manually add the past messages to each new request. How could you do it...?

In [None]:
# Your code:

# Exercise 3: Cite your sources!
We know LLMs are prone to hallucinate... how can you make it return the sources the knowledge came from?

Pssst: maybe you want to look into modifying the `format_docs` function, although there are several ways of doing it

In [None]:
# Your code: