## Building RAG Applications with LangChain

#### <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color:white; font-size:180%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > TABLE OF CONTENTS<br><div>
* [IMPORTS](#1)
* [INTRODUCTION](#2)
* [Prompt Templates](#3)
* [LLM Models](#4)
* [Retriever](#5)
* [Chains](#6)
    * [Prebuild Chains](#6a)
    * [Primitives](#6b)

* [Memory](#7)
* [Use Case Integration](#8)
* [PLANNED WAY FORWARD](#9) 

In [3]:
import langchain

# Prompts:
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    PromptTemplate
)

from langchain_core.messages import SystemMessage

## LLMs:
from langchain_openai import OpenAI, ChatOpenAI
from langchain.llms import HuggingFaceHub

## Chains
from operator import itemgetter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_history_aware_retriever
from langchain_core.runnables import RunnableLambda, RunnablePassthrough, RunnableParallel

## Retriever
from langchain.vectorstores import Chroma
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.schema import Document

## Memory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory



<a id="2"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > INTRODUCTION<br><div>

This tutorial is only meant to be a general overview about how to use LangChain Framework to build LLMs applications. We are only going to focus in the subset of functionalities that implement the LCEL capabilities. Also, the aim of this tutorial is to group the main components of these solutions into a single notebook. To this end, it is mainly based on the documentation and examples provided by Langchain  with own comments ([Langchain Docs](https://python.langchain.com/v0.2/docs/concepts/#components)), and is not presented as an own creation.

<a id="3"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > Prompt Templates<br><div>

The first point when building an application around a LLM is, indeed, the prompts, which are the text inputs that we send to the model.
Following the line of the introduction, we are only going to talk about the functionalities that this library provides around these.

As we pointed in the introduction, the main goal of the LCEL framework definition, is to be able to abstract complex interactions and chains, into a simple sequence of steps. For that methods that are part of a chain should be objects of the [Runnable Interface](https://python.langchain.com/v0.1/docs/expression_language/interface/), which implements object with a set of define methods, as invoke (to call the chain with an input)

In [4]:
prompt_template = PromptTemplate.from_template(
    "Tell me a joke about {topic}."
)


# Here the invoke method just replaces the placeholders
# If there is only one input the function identifies it with a string input
prompt_template.invoke("Topic")




StringPromptValue(text='Tell me a joke about Topic.')

Here the invoke method just replaces the placeholders. If there is only one input the function identifies it with a string input. 

But for cases were we have more than one variable we need to input a dictionary:

In [5]:

prompt_template_2_slots = PromptTemplate.from_template(
    "Tell me a {adjetive} joke about {topic}."
)

prompt_template_2_slots.invoke({'adjetive':"smart", "topic": "Football"})

StringPromptValue(text='Tell me a smart joke about Football.')

The other Prompt Template that implements the Interface is the ChatPromptTemplate, which is simmilar to the previous but thought for Chat applications:

In [6]:
chat_template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are a helpful assistant that re-writes the user's text to "
                "sound more upbeat."
            )
        ),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)

chat_template.invoke({"text": "i dont like eating tasty things."})

ChatPromptValue(messages=[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."), HumanMessage(content='i dont like eating tasty things.')])

As we said before, whenever we only have one possible variable input, it accepts string inputs, but this is not the recommended usage, because could lead to mistakes when adding more variables

In [7]:
chat_template.invoke("I dont like eating tasty things.")

ChatPromptValue(messages=[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."), HumanMessage(content='I dont like eating tasty things.')])

This Runnable is most likely going to be the entrypoint for most of out applications, but its usage it's quite straightforward, so we don't need to lose more time on it.

<a id="4"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > LLM Models<br><div>
The most important component of any LLM integration system, is, as it's logic, the model being used, here we are going to show how to integrate several models inside of our chain.

## Let's start with native integrations with LangChain

In [8]:
# We need to have our key in the environment variables, as explained in README
llm = OpenAI()

We can see now that this instance of the object already has all necessary methods to be integrated in a chain (it is a Runnable Object).

In [9]:
llm.invoke("Hello")

'\n\nI have a problem with the stl files, when I import them into cura they are very small, I have to enlarge them by 1000% and then when I print them they are still very small.  \nDo you know a solution?\n\nThank you\n\nGreetings\n\nErnst\n\nHello Ernst,\n\nThank you for your question.\n\nThis is a common issue that can happen when stl files are not exported properly. It is possible that the stl files you are using have been exported in millimeters instead of meters, which is the standard unit for stl files. This can cause the files to appear very small when imported into Cura.\n\nOne solution is to scale the stl files by 1000% as you have been doing. However, this can result in low quality prints as the stl files were not designed to be printed at such a large scale.\n\nAnother solution is to re-export the stl files in the correct unit of measurement. Many 3D modeling software programs have the option to export stl files in different units, so make sure to double check the settings b

## Now we can Join our first two steps to see the results

In [10]:
first_chain = prompt_template_2_slots | llm

This is the syntax to be used to concatenate two Runnables to create a chain, were the output of the first element will serve as input for the second

In [11]:
first_chain.invoke({'adjetive':"smart", "topic": "Football"})

'\n\nWhy did the football coach go to the bank?\n\nTo get his quarterback!'

As we can see the resulting object has the invoke property as the original ones, and it takes as inputs the same inputs that the prompt template.

Let's now show a different LLM, as the ones that the HuggingFace API provides [Get HuggingFace Key](https://huggingface.co/docs/hub/security-tokens) (there are several free models):

In [12]:
llm_hugging_face = HuggingFaceHub(repo_id = "google/flan-t5-large")

  warn_deprecated(
  from .autonotebook import tqdm as notebook_tqdm


In [13]:
first_chain = prompt_template_2_slots | llm_hugging_face
first_chain.invoke({'adjetive':"smart", "topic": "Football"})

'football is a game of skill'

## What if we need to use an LLM that has not yet been wrapped by LangChain?
We still can define a class that implements the same interface around any LLM.

<a id="5"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" >Retriever<br><div>
In order the be able to further contextualize and implement 'Complex' Chains, we are going to introduce already here the context of Retriever. In general, this can be defined as the step of 'Retrieving' additional 'external' data that the application needs, in order to contextualize the process. The Most popular retrievers are those who interact with either Vector Stores, retrieving a set of documents by simmilarity or Knowledge graphs, generating a query for this based on the wanted information. As you might begin to guess, langchain provides multiple 'Built-in' methods to communicate with the most representative services, but more importantly, it gives the bases to create a solution following a defined schema. In that sense we could also implement a 'Custom' Retriever, following the definition of this Base class ([Custom Retriever](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/custom_retriever/)).
Here We will not go into this component in detail, and will limit to a already existing integration, to create a basic retriever.

One of the key concepts for the Vector Database retrieving process, is that, we are going to store textual data, in a numerical high dimensional representation (Vectors), using for it a Embedding pretrained model. Therefore, for specific use cases, where the text belongs to a subdomain (Legal data, medical data, etc...), may be highly convenient to try to addapt the model used to those requirements.

In [14]:
# Initialize the pre-trained model for generating embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedding_model = SentenceTransformerEmbeddings(model_name=model_name)


# Sample public data: A list of short Wikipedia summaries
texts = [
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
    "The Mona Lisa is a half-length portrait painting by the Italian artist Leonardo da Vinci.",
    "The Great Wall of China is a series of fortifications that were built across the historical northern borders of China."
]

# Create Document objects from the texts
docs = [Document(page_content=text) for text in texts]

# load it into Chroma
db = Chroma.from_documents(docs, embedding_model, persist_directory="./chroma_db")



print("Chroma vector database has been populated successfully.")

  warn_deprecated(


Chroma vector database has been populated successfully.


With this code, we have saved out Vector Database to a local directory, now we can reload it to use that database instance as our retriever

In [15]:
from langchain.vectorstores import Chroma
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
# load from disk
# Initialize the pre-trained model for generating embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedding_model = SentenceTransformerEmbeddings(model_name=model_name)
db3 = Chroma(persist_directory="./chroma_db", embedding_function=embedding_model)

  warn_deprecated(


Langchain already builds this client, so it has a functionality implemented with the Retriever format:

In [16]:
db3.as_retriever(search_kwargs= {'k':2}).invoke("What is La Mona Lisa?")

[Document(page_content='The Mona Lisa is a half-length portrait painting by the Italian artist Leonardo da Vinci.'),
 Document(page_content='The Mona Lisa is a half-length portrait painting by the Italian artist Leonardo da Vinci.')]

<a id="6"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" >Chains<br><div>


Even if we have introduced it earlier, here we will go deeper in the concept of chain, the main feature of LangChain. A Chain is the sequence of calls to a set of functionalities (LLMs, tools...). As we mentioned, the prefered way to deal with Chans, and build pipelines using LangChain framework, is by using their LangChain Expression Language ([LCEL](https://python.langchain.com/v0.1/docs/expression_language/)). In this section we are going to describe from Preexisting chains, to how to create a custom Runnable to add to a chain.

<a id="6a"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:100%; text-align:left;padding:2.0px; background: maroon; border-bottom: 6px solid black" >Existing Chains<br><div>

As, we said, here, and following recommendations from LangChain Documentation itself [Chains](https://python.langchain.com/v0.1/docs/modules/chains/), we will only focus in LCEL prebuilt chains, and show the funtionality of some of them. 

* [create_stuff_documents_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html): This chain is meant specifically to improve and format the retrieved content in RAG applications. It allows an LLM, a prompt, an Output parser, and a Document Prompt (to format each document separately), and retrieves a Runnable that, given a set of Documents, formats them to the specified format.


In [17]:
prompt = ChatPromptTemplate.from_messages(
    [("system", "What are everyone's favorite colors:\n\n{context}")]
)
llm = ChatOpenAI(model="gpt-3.5-turbo")
chain = create_stuff_documents_chain(llm, prompt)

docs = [
    Document(page_content="Jesse loves red but not yellow"),
    Document(page_content = "Jamal loves green but not as much as he loves orange")
]

chain.invoke({"context": docs})

'\n\nJesse: Red\nJamal: Orange, Green'

This is really usefull, as we will see, in RAG cases, to not send the retrieved information without any structure or coherence, between them, and with the goal of the Main Prompt task. Another prebuild chain that would be really usefull is:

[create_history_aware_retriever](https://api.python.langchain.com/en/latest/chains/langchain.chains.history_aware_retriever.create_history_aware_retriever.html#langchain.chains.history_aware_retriever.create_history_aware_retriever): This chain modifies the query used to retrieve 'Documents' from a retriever, based in the conversation history. This would be really usefull for a 'Chatbot' Style assistant, so the retrieved information takes into consideration all the previous conversation, and not only the last query from the user. To fully test this functionality we will have to wait until we get into the 'Retriever' and Memory Sections of this Tutorial.

<a id="6b"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:100%; text-align:left;padding:2.0px; background: maroon; border-bottom: 6px solid black" >Primitives<br><div>

Following LangChain Naming convention, here we are going to explain how we can include any functionality that our Use case could require, using functionalities that allow to extend the Chain functionalities, by creating Runnables, passing information (fields) unaltered to the next step, execute steps in parallel, among others...

In [18]:
# Visualize that this prompt defines our problem:

multiple_variables_prompt = PromptTemplate.from_template(
    "This is a simple matematical operator, given a Value: {x}, another Value {y}, and {z}, calculate: x*y, x-y, z*x"
)

numerical_chain = multiple_variables_prompt | llm

Our entry to this prompt, and the consequent chain would be a dictionary with three keys (x,y,z):

In [19]:
numerical_chain.invoke({"x":-3,"y":6,"z":9})

AIMessage(content='x*y = -3 * 6 = -18\nx-y = -3 - 6 = -9\nz*x = 9 * -3 = -27', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 35, 'prompt_tokens': 42, 'total_tokens': 77}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-ddf698c1-63bf-4d0b-b1f0-25305e88a03a-0', usage_metadata={'input_tokens': 42, 'output_tokens': 35, 'total_tokens': 77})

But, what if, by any reason, the x specified in the prompt in not the same x that the one we want to use in the equation of the chain ? For instance, if we want x to be always positive, but we can receive negative values, we should apply the absulute value. 

In [20]:
def absolute_value(x):
    return abs(x)

In [21]:

abs_value_chain = (
    {
        "x": itemgetter("x") | RunnableLambda(absolute_value),
        "y": itemgetter("y") | RunnablePassthrough(),
        "z": itemgetter("z") | RunnablePassthrough()
    }
    | multiple_variables_prompt
    | llm
)


We see here both the concept of RunnableLambda:  A method that converts a custom function into a Runnable, and RunnablePassthrough, a Runnable that gets and input and forwards it to the next step in the chain

In [22]:
abs_value_chain.invoke({"x":-3,"y":6,"z":9})

AIMessage(content='x*y = 3 * 6 = 18\nx-y = 3 - 6 = -3\nz*x = 9 * 3 = 27', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 35, 'prompt_tokens': 42, 'total_tokens': 77}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-aad931d1-d1be-43f3-9094-6e77278a8616-0', usage_metadata={'input_tokens': 42, 'output_tokens': 35, 'total_tokens': 77})

*RunnableParallel*: We have just used this functionality, is nothing more than a 'dictionary' of Runnable values, that are applied in parallel. In the previous example we ruuned a operation for each key (x,y,z) in parallel. The following to notations are equivalent:

In [23]:
abs_value_chain = (
    {
        "x": itemgetter("x") | RunnableLambda(absolute_value),
        "y": itemgetter("y") | RunnablePassthrough(),
        "z": itemgetter("z") | RunnablePassthrough()
    }
    | multiple_variables_prompt
    | llm
)


abs_value_chain_parallel = (
    RunnableParallel(
        x= (itemgetter("x") | RunnableLambda(absolute_value)),
        y= (itemgetter("y") | RunnablePassthrough()),
        z= (itemgetter("z") | RunnablePassthrough())
    )
    | multiple_variables_prompt
    | llm

)
abs_value_chain_parallel.invoke({"x":-3,"y":6,"z":9})

AIMessage(content='x*y = 3*6 = 18\nx-y = 3-6 = -3\nz*x = 9*3 = 27', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 42, 'total_tokens': 74}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-45a98804-4be6-4236-a18d-66ca7d84dc40-0', usage_metadata={'input_tokens': 42, 'output_tokens': 32, 'total_tokens': 74})

<a id="7"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" >Memory<br><div>


The last fundamental component, for some of the potential applications (specially chatbots) is the memory. Similar as what happened with the Chains, Langchain has several Legacy memory handling functionalities, tbat do not follow the LCEL schema, and are not recommended for production ready use cases [Memory Handling](https://python.langchain.com/v0.1/docs/modules/memory/). Therefore, we will focus here only in the *ChatMessageHistory*, the only one that is recommended for production ready scenarios.

In [24]:
from langchain.memory import ChatMessageHistory
history = ChatMessageHistory()
history.add_user_message("hi!")
history.add_ai_message("whats up?")

In [25]:
history.messages

[HumanMessage(content='hi!'), AIMessage(content='whats up?')]

As we can see this is just a really simple and straightforward Class, that implements several methods to store and retrieve messages, and we can access them throught an attribute. It would be completely feasible to handle ourself the message recording and tracking at runtime, but there are already prebuild (and probably more consistent) chains, that are build around this object, and take care of the Memory handling:

In [26]:
# We need to use a base chain:
prompt_template = PromptTemplate.from_template(
    """You are an art museum guide, your main objective is to provide information about paintings and artistic styles. You should consider the current history of the conversation:
    {history} 
    User Question
    {question}."""
)



memory_chain = prompt_template | llm


store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


with_message_history = RunnableWithMessageHistory(
    memory_chain,
    get_session_history,
    input_messages_key="question",
    # output_messages_key="output_message",
    history_messages_key="history",
)

In [27]:
with_message_history.invoke({"question":"Hello, can you tell me the style of the Mona Lisa?"},
                            config={"configurable": {"session_id": "baz"}})

AIMessage(content="Art Museum Guide Response\nOf course! The Mona Lisa, painted by Leonardo da Vinci in the early 16th century, is an iconic example of the High Renaissance style. This style is characterized by a focus on realistic representation, harmonious composition, and a sense of balance and proportion. Leonardo da Vinci's use of sfumato, a technique that creates soft, gradual transitions between colors and tones, is particularly evident in the Mona Lisa, giving her enigmatic smile and lifelike appearance.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 100, 'prompt_tokens': 58, 'total_tokens': 158}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-b33af289-3c45-4e93-ac3e-47c99950c961-0', usage_metadata={'input_tokens': 58, 'output_tokens': 100, 'total_tokens': 158})

In [28]:
with_message_history.invoke({"question":"What did you Just said?"},
                            config={"configurable": {"session_id": "baz"}})

AIMessage(content="Art Museum Guide Response\nOf course! The Mona Lisa, painted by Leonardo da Vinci in the early 16th century, is an iconic example of the High Renaissance style. This style is characterized by a focus on realistic representation, harmonious composition, and a sense of balance and proportion. Leonardo da Vinci's use of sfumato, a technique that creates soft, gradual transitions between colors and tones, is particularly evident in the Mona Lisa, giving her enigmatic smile and lifelike appearance.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 100, 'prompt_tokens': 299, 'total_tokens': 399}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-7363a676-fc07-4aad-b853-ed53c94d755a-0', usage_metadata={'input_tokens': 299, 'output_tokens': 100, 'total_tokens': 399})

In [29]:
with_message_history.get_session_history(session_id="baz")

InMemoryChatMessageHistory(messages=[HumanMessage(content='Hello, can you tell me the style of the Mona Lisa?'), AIMessage(content="Art Museum Guide Response\nOf course! The Mona Lisa, painted by Leonardo da Vinci in the early 16th century, is an iconic example of the High Renaissance style. This style is characterized by a focus on realistic representation, harmonious composition, and a sense of balance and proportion. Leonardo da Vinci's use of sfumato, a technique that creates soft, gradual transitions between colors and tones, is particularly evident in the Mona Lisa, giving her enigmatic smile and lifelike appearance.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 100, 'prompt_tokens': 58, 'total_tokens': 158}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-b33af289-3c45-4e93-ac3e-47c99950c961-0', usage_metadata={'input_tokens': 58, 'output_tokens': 100, 'total_toke

<a id="8"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" >Use Case Integration<br><div>



To finish this tutorial, we are going to develop an end to end real arquitecture for the following use case requirements:

An Art Museum Guide, that needs to remember previous questions, needs to answer with more focus in the context provided, and should only remember the last 2 messages.

In [30]:
# First step would be to define a prompt, with placeholders for all the information that we want to send to the LLM

prompt_museum_guide =  PromptTemplate.from_template("""
You are the guide of an Art museum, you should answer to people questions, about Paints and
artistic styles, focusing you answers in the usage of the provided context always if that is possible, and considering what has been already told in the conversation.

Context: {context}

Current Conversation: {history}

User question: {question}
""")


# Now that we have in mind the fields that we need to provide to our prompt, we can define the components that are going to provide them, first one would be the retriever:
retriever = db3.as_retriever(search_kwargs={'k': 2})

# Now the LLM that we will use:
llm = ChatOpenAI()

# Lets add this together already to check the behaviour:

first_chain = {'context':retriever,'question':RunnablePassthrough(),'history':RunnablePassthrough()} | prompt_museum_guide | llm


first_chain.invoke("What style is La Mona Lisa ?")





AIMessage(content='Guide: The Mona Lisa is painted in the style of the Italian Renaissance, which was a period of great artistic and cultural achievement in Italy. Leonardo da Vinci, the artist who painted the Mona Lisa, was a prominent figure in the Renaissance movement, known for his attention to detail, use of perspective, and realistic portrayal of the human form.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 68, 'prompt_tokens': 127, 'total_tokens': 195}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-ab7b0d8a-df91-48c9-af39-9f2300f72340-0', usage_metadata={'input_tokens': 127, 'output_tokens': 68, 'total_tokens': 195})

To get a more detailed view of the different steps that are happening in our chain we can set debug to true:

In [31]:
langchain.debug=True
first_chain.invoke("What style is La Mona Lisa ?")

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": "What style is La Mona Lisa ?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question,history>] Entering Chain run with input:
[0m{
  "input": "What style is La Mona Lisa ?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question,history> > chain:RunnablePassthrough] Entering Chain run with input:
[0m{
  "input": "What style is La Mona Lisa ?"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question,history> > chain:RunnablePassthrough] [0ms] Exiting Chain run with output:
[0m{
  "output": "What style is La Mona Lisa ?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<context,question,history> > chain:RunnablePassthrough] Entering Chain run with input:
[0m{
  "input": "What style is La Mona Lisa ?"
}
[

AIMessage(content='Guide: La Mona Lisa is painted in the style of Renaissance art, which was a period of great cultural and artistic achievement in Europe, particularly in Italy. Leonardo da Vinci, the artist who painted the Mona Lisa, was a key figure in the Renaissance art movement, known for his attention to detail and realistic portrayal of the human form.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 67, 'prompt_tokens': 127, 'total_tokens': 194}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-38345606-466a-40a7-b5bb-92a9f7c4e80e-0', usage_metadata={'input_tokens': 127, 'output_tokens': 67, 'total_tokens': 194})

## Now we wrap this chain inside of the History chain

In [32]:
with_message_history = RunnableWithMessageHistory(
    itemgetter("input") | first_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

In [33]:
with_message_history.invoke({ "input": "What is the Mona Lisa?"},
    config={"configurable": {"session_id": "abc123"}}) 

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<history>] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<history> > chain:load_history] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<history> > chain:load_history] [0ms] Exiting Chain run with output:
[0m{
  "output": []
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWithMess

AIMessage(content='The Mona Lisa is a half-length portrait painting by the Italian artist Leonardo da Vinci. It is one of the most famous and iconic paintings in the world, known for its enigmatic expression and masterful use of technique. Leonardo da Vinci is considered one of the greatest artists of the Renaissance period, known for his attention to detail and realistic depiction of his subjects.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 72, 'prompt_tokens': 125, 'total_tokens': 197}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f1a3d52f-85cc-4c0d-8fc1-af2eed7f1ff6-0', usage_metadata={'input_tokens': 125, 'output_tokens': 72, 'total_tokens': 197})

In [34]:
with_message_history.invoke({ "input": "What?"},
    config={"configurable": {"session_id": "abc123"}}) 


[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory] Entering Chain run with input:
[0m{
  "input": "What?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history] Entering Chain run with input:
[0m{
  "input": "What?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<history>] Entering Chain run with input:
[0m{
  "input": "What?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<history> > chain:load_history] Entering Chain run with input:
[0m{
  "input": "What?"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<history> > chain:load_history] [1ms] Exiting Chain run with output:
[0m[outputs]
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<history>] [4ms] Ex

AIMessage(content="Guide: I apologize for any confusion. I was just mentioning that the Mona Lisa is a famous half-length portrait painting created by Leonardo da Vinci. Leonardo da Vinci was an Italian artist known for his remarkable artworks during the Renaissance period. The Mona Lisa is considered a masterpiece due to its intricate details and mysterious smile. If you have any specific questions about the painting or Leonardo da Vinci's artistic style, feel free to ask.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 84, 'prompt_tokens': 117, 'total_tokens': 201}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-1545fa75-1f4d-48cc-b0a1-03f908e716db-0', usage_metadata={'input_tokens': 117, 'output_tokens': 84, 'total_tokens': 201})

Now we have our chain, that retrieves documents and then uses also the history to feed the LLM and generate a more logical and up to date answer, but this is still quite raw, there are many things that could be optimised, let's tackle two:

In [35]:
rephrase_prompt =PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {input}
Standalone Question:""")


Prompt template extracte from langchain hub: langchain-ai/chat-langchain-rephrase

In [36]:
llm = ChatOpenAI()
chat_retriever_chain = create_history_aware_retriever(
    llm, retriever, rephrase_prompt
)

In [37]:
rephrase_prompt

PromptTemplate(input_variables=['chat_history', 'input'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.\n\nChat History:\n{chat_history}\nFollow Up Input: {input}\nStandalone Question:')

Now we can replace this inside of the real retriever, so the retriever will be used with a new rephrased sentence, considering both the current question and the previous context

In [38]:

# We need to change the prompt template fields
prompt_museum_guide =  PromptTemplate.from_template("""
You are the guide of an Art museum, you should answer to people questions, about Paints and
artistic styles, focusing you answers in the usage of the provided context always if that is possible, and considering what has been already told in the conversation.

Context: {context}

Current Conversation: {chat_history}

User question: {question}
""")

# Lets add this together already to check the behaviour:
first_chain = {'context':chat_retriever_chain,'question':RunnablePassthrough(),'chat_history':RunnablePassthrough()} | prompt_museum_guide | llm





In [39]:
with_message_history = RunnableWithMessageHistory(
    {'chat_history': RunnablePassthrough(), "input": RunnablePassthrough()  } | first_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

In [40]:
with_message_history.invoke({ "input": "What is the Mona Lisa?"},
    config={"configurable": {"session_id": "abc123"}})

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history>] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history> > chain:load_history] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history> > chain:load_history] [1ms] Exiting Chain run with output:
[0m[outputs]
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWi

AIMessage(content="The Mona Lisa is a renowned half-length portrait painting by Leonardo da Vinci, showcasing his exceptional talent and attention to detail. Leonardo da Vinci, an Italian artist of the Renaissance period, was known for his mastery of technique and realistic depiction of his subjects. The painting's enigmatic expression and intricate details contribute to its status as a masterpiece. If you have any more questions about Leonardo da Vinci's artistic style or the Mona Lisa itself, feel free to ask!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 91, 'prompt_tokens': 1953, 'total_tokens': 2044}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-98c2d458-bae3-4d19-9663-c3909c1e35a4-0', usage_metadata={'input_tokens': 1953, 'output_tokens': 91, 'total_tokens': 2044})

In [41]:
with_message_history.invoke({ "input": "And what is it's style?"},
    config={"configurable": {"session_id": "abc123"}}) 

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory] Entering Chain run with input:
[0m{
  "input": "And what is it's style?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history] Entering Chain run with input:
[0m{
  "input": "And what is it's style?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history>] Entering Chain run with input:
[0m{
  "input": "And what is it's style?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history> > chain:load_history] Entering Chain run with input:
[0m{
  "input": "And what is it's style?"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history> > chain:load_history] [1ms] Exiting Chain run with output:
[0m[outputs]
[36;1m[1;3m[chain/end][0m [1m[chain:Runnab

AIMessage(content="The style of the Mona Lisa can be classified as High Renaissance. This period was known for its emphasis on harmony, balance, and realism in art. Leonardo da Vinci's meticulous attention to detail and his use of sfumato (a technique of blending tones to create a soft transition between colors) in the Mona Lisa exemplify the characteristics of the High Renaissance style. This painting is a prime example of the era's focus on naturalism and the portrayal of the human form with precision and grace. If you have any more questions about this particular style or any other artistic styles, feel free to ask!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 120, 'prompt_tokens': 2909, 'total_tokens': 3029}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-c2f9ef7e-01e5-4a6a-9193-a78a11901c46-0', usage_metadata={'input_tokens': 2909, 'output_tokens': 120, 'total_tok

Not without effort, we can see there that now, using the conversation history as reference, the LLM is able to rephrase the question inputed by the user to specify that it is talking about 'La Mona Lisa'. 

This query rephrasing step, to not use the same query to filter out the Vector Database that the user has write in natural language, is one of the key points that can be greatly improved, and there are multiple techniques available ([Langchain Retriever Functions](https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/)). Let's try to customize other of the key points in this type of uses, the memory. We can add a custom Runnable to modify how the history of messages is inserted in the prompt, lets create a function that only takes the last 2 messages for instance

In [42]:
def select_last_2_messages(chat_history):

    if len(chat_history) <2:
        return chat_history
    else:
        print("Shortering chat messages")
        print(chat_history)
        print(chat_history[-2:])
        return chat_history[-2:]

In [43]:
with_message_history = RunnableWithMessageHistory(
    {'chat_history': itemgetter('chat_history') | RunnableLambda(select_last_2_messages), "input": itemgetter('input') | RunnablePassthrough()  } | first_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

In [44]:
with_message_history.invoke({ "input": "What is the Mona Lisa?"},
    config={"configurable": {"session_id": "abc123"}})

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history>] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history> > chain:load_history] Entering Chain run with input:
[0m{
  "input": "What is the Mona Lisa?"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWithMessageHistory > chain:insert_history > chain:RunnableParallel<chat_history> > chain:load_history] [1ms] Exiting Chain run with output:
[0m[outputs]
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableWi

AIMessage(content="The style of the Mona Lisa is classified as High Renaissance. This period emphasized harmony, balance, and realism in art. Leonardo da Vinci's meticulous attention to detail and his use of sfumato in the painting exemplify the characteristics of the High Renaissance style. This painting is a prime example of the era's focus on naturalism and the portrayal of the human form with precision and grace. If you have any more questions about this particular style or any other artistic styles, feel free to ask!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 99, 'prompt_tokens': 685, 'total_tokens': 784}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-7feffd99-97d4-4c5d-aa27-b34d6f403a35-0', usage_metadata={'input_tokens': 685, 'output_tokens': 99, 'total_tokens': 784})

<a id="9"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" >Planed way forward<br><div>

In this tutorial we have covered in a general point of view the basics elements that allows us to build a RAG chain and application. In an upcoming 'Advanced langchain tutorial' we will cover more in depth components like:

* Evaluation (LLMs as Judge, RAGAS, langSmith...)
* Advanced RAG techniques (Reranking, GraphRAG, etc)
* Agents