<a href="https://colab.research.google.com/github/rcginne/ML-foundations/blob/master/2_2-langchain-rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a target="_blank" href="https://colab.research.google.com/github/vanderbilt-data-science/ai_summer/blob/main/2_2-langchain-rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# AI Solutions with Langchain and RAG
> For Vanderbilt University AI Summer 2024<br>Prepared by Dr. Charreau Bell

_Code versions applicable: May 14, 2024_

## Learning Outcomes:
* Participants will be able to articulate the essential steps and components of a retrieval-augmented generation (RAG) system and implement a standard RAG system using langchain.
* Participants will gain familiarity in inspecting the execution pathways of LLM-based systems.
* Participants will gain familiarity in approaches for the evaluation of LLM-based systems.

### Computing Environment Setup

In [10]:
! pip install langchain==0.1.20 langchain_openai grandalf sentence-transformers
! pip install pypdf chromadb faiss-cpu

Collecting langchain-core<0.2.0,>=0.1.52 (from langchain==0.1.20)
  Using cached langchain_core-0.1.53-py3-none-any.whl.metadata (5.9 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain==0.1.20)
  Using cached langsmith-0.1.147-py3-none-any.whl.metadata (14 kB)
Using cached langchain_core-0.1.53-py3-none-any.whl (303 kB)
Using cached langsmith-0.1.147-py3-none-any.whl (311 kB)
Installing collected packages: langsmith, langchain-core
  Attempting uninstall: langsmith
    Found existing installation: langsmith 0.4.33
    Uninstalling langsmith-0.4.33:
      Successfully uninstalled langsmith-0.4.33
  Attempting uninstall: langchain-core
    Found existing installation: langchain-core 0.3.78
    Uninstalling langchain-core-0.3.78:
      Successfully uninstalled langchain-core-0.3.78
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-google-genai 2.



In [1]:
%pip install -U langchain-google-genai

Collecting langchain-core>=0.3.75 (from langchain-google-genai)
  Using cached langchain_core-0.3.78-py3-none-any.whl.metadata (3.2 kB)
Collecting langsmith<1.0.0,>=0.3.45 (from langchain-core>=0.3.75->langchain-google-genai)
  Using cached langsmith-0.4.33-py3-none-any.whl.metadata (14 kB)
Using cached langchain_core-0.3.78-py3-none-any.whl (449 kB)
Using cached langsmith-0.4.33-py3-none-any.whl (387 kB)
Installing collected packages: langsmith, langchain-core
  Attempting uninstall: langsmith
    Found existing installation: langsmith 0.1.147
    Uninstalling langsmith-0.1.147:
      Successfully uninstalled langsmith-0.1.147
  Attempting uninstall: langchain-core
    Found existing installation: langchain-core 0.1.53
    Uninstalling langchain-core-0.1.53:
      Successfully uninstalled langchain-core-0.1.53
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflict

In [2]:
# Best practice is to do all imports at the beginning of the notebook, but we have separated them here for learning purposes.
import os

In [3]:
# auth replicated here for reference just in case you choose to do something similar
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')

## Langchain Introduction

### Overview of System

[Overview of Langchain](https://python.langchain.com/v0.1/docs/get_started/introduction/)

<figure>
<img src='https://python.langchain.com/v0.1/svg/langchain_stack.svg' height=600/>
    <figcaption>
        Langchain Overview, from <a href=https://python.langchain.com/v0.1/docs/get_started/introduction>Langchain Introduction</a>
    </figcaption>
</figure>

### Quick Start
To get up and running quickly, start with the [Quick Start](https://python.langchain.com/v0.1/docs/get_started/quickstart/) page.

### Details of Individual Composition Components
To learn more about any of the individual components used below, use the [Components Page](https://python.langchain.com/v0.1/docs/modules/)

## Review of python formatted strings
To prepare ourselves for langchain, we'll first review formatted strings.

In [None]:
# basic functionality of print
print('Tell me a story about cats')

# with variables
prompt_string = 'Tell me a story about cats'
print('As string ', prompt_string)

# as formatted string
prompt_string = 'Tell me a story about cats'
print(f"With formatted string: {prompt_string}")

Motivating example: you are building a GPT that tells stories. The user just needs to provide the topic.

In [None]:
# as a template string
string_prompt_template = f"Tell me a story about {{topic}}"
string_prompt_template

In [None]:
# you can fill in the template at a later time
string_prompt_template.format(topic='cats')
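One thing a template gives you over a finished string is that its placeholders can be inspected before they are filled. The following is a minimal standard-library sketch of that idea; the helper name `placeholders` and the extra `{length}` field are hypothetical and not part of the notebook.

```python
from string import Formatter

def placeholders(template: str) -> list[str]:
    """Return the named fields of a str.format-style template, in order."""
    return [field for _, field, _, _ in Formatter().parse(template) if field]

story_template = "Tell me a {length} story about {topic}"

# inspect which variables the template expects, then fill them in later
print(placeholders(story_template))
print(story_template.format(length="short", topic="cats"))
```

Knowing the expected variables up front is what lets a prompt template report a missing input before anything is sent to a model.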

## Langchain Prompt Templates
> Formatting and arranging prompt strings

Langchain prompt templates work just like this, but with additional functionality targeted towards LLM interaction. There are lots of different prompt templates, but here, we'll focus on two: `PromptTemplate`, and `ChatPromptTemplate`.

**Additional resources**: [Guide on Prompt Templates](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/)

In [None]:
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser

In [None]:
# create system message for shorter responses
brief_system_message = 'You are a helpful assistant. Be brief, succinct, and clear in your responses. Only answer what is asked.'

### PromptTemplate

In [None]:
# Example 1
template = """
You are a helpful assistant. Answer the user's question based ONLY on the provided context.
Context: {context}
Question: {question}
"""
context = "RAG stands for retrieval augmented generation"
question = "What is RAG?"

template.format(context=context, question=question)

In [None]:
lc_template = ChatPromptTemplate.from_template(template)
flc = lc_template.invoke({'context': context, 'question':question})
print(flc) # chat prompt template
print(flc.messages[0]) # basemessage
print(flc.messages[0].content) # content
print(flc.messages[0].type) # role

In [None]:
# Example 2
template_string = "Recommend a song for someone who likes {genre} music and is feeling {mood}."
template = PromptTemplate.from_template(template_string)
istr = template.invoke({"genre": "hiphop", "mood": "sad"})
fstr = template.format(genre='hiphop', mood='good')
istr.text

In [None]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """
You are a brilliant research assistant. Use the following context to answer the user's question.
If the context does not contain the answer, politely state that you cannot answer based on the provided documents.

Context:
---
{context}
---

Question: {question}
"""

rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a brilliant research assistant, dedicated to accurate answers."),
        # Note the use of the RAG_TEMPLATE string here for the main user prompt
        ("human", RAG_TEMPLATE),
    ]
)

# Simulate the RAG inputs
simulated_context = "The LangChain Expression Language (LCEL) uses the pipe '|' operator to chain runnables."
simulated_question = "How do you chain components in LCEL?"

# Format the RAG prompt
rag_messages = rag_prompt.invoke({
    "context": simulated_context,
    "question": simulated_question
})

print("--- RAG-ready Message Content ---")
print(rag_messages.messages[1].content)

### ChatPromptTemplate

In [36]:
# create prompt template
lc_chat_prompt_template = ChatPromptTemplate.from_template("tell me a story about {topic}")

# has invocation functionality resulting in chat-style messages
lc_chat_prompt_template.invoke({'topic':'cats'})

ChatPromptValue(messages=[HumanMessage(content='tell me a story about cats', additional_kwargs={}, response_metadata={})])

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage

# create message-based chat prompt template
lc_chat_prompt_template = ChatPromptTemplate.from_messages(
    # Use tuples (role, template_string) for messages with placeholders
    [
        # System message (static content, no need for formatting)
        ("system", "You are a helpful and witty assistant."),

        # Human message (uses placeholder {topic})
        ("human", "Tell me a fun fact about {topic}."),
    ]
)

# invoke the chat prompt template
formatted_prompt = lc_chat_prompt_template.invoke({'topic':'cats'})

print("--- Resulting Prompt Messages ---")
for message in formatted_prompt.messages:
    print(f"Role: {message.type.capitalize()}")
    print(f"Content: {message.content}\n")

In [45]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# --- 2. Define the Chat History (Previous Turns) ---

# This is a list of previous interactions. Note that it uses
# HumanMessage and AIMessage objects (or the tuple shorthand)
# to define the role and content.
conversation_history = [
    HumanMessage(content="Hello, I am planning a trip to Italy."),
    AIMessage(content="That sounds wonderful! Italy has many great cities. Which city interests you the most?"),
]

prompt_template = ChatPromptTemplate.from_messages(
        [
        # 1. System Instruction: Sets the persona/rules
        ("system", "You are a helpful travel agent specialized in European cities. Keep your answers brief and friendly."),

        # 2. History Placeholder: This is where ALL previous messages will be injected
        # The variable name 'chat_history' must match the key used in the .invoke() call
        MessagesPlaceholder(variable_name="chat_history"),

        # 3. Human's Current Question: This is the newest question from the user
        # It uses a standard input variable {input}
        ("human", "{input}"),
    ]
)

messages = prompt_template.invoke({'chat_history': conversation_history, 'input': 'What is the best way to travel from Rome to Florence?'})
messages

ChatPromptValue(messages=[SystemMessage(content='You are a helpful travel agent specialized in European cities. Keep your answers brief and friendly.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Hello, I am planning a trip to Italy.', additional_kwargs={}, response_metadata={}), AIMessage(content='That sounds wonderful! Italy has many great cities. Which city interests you the most?', additional_kwargs={}, response_metadata={}), HumanMessage(content='What is the best way to travel from Rome to Florence?', additional_kwargs={}, response_metadata={})])
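The output above shows the order in which the pieces are assembled: system message first, then every prior turn, then the newest human message. A plain-Python sketch of that ordering, using `(role, content)` tuples instead of LangChain message objects; `build_messages` is a hypothetical helper, not a LangChain function.

```python
def build_messages(system_text, chat_history, user_input):
    """Return system message + prior turns + newest human message, in order."""
    return [("system", system_text), *chat_history, ("human", user_input)]

history = [
    ("human", "Hello, I am planning a trip to Italy."),
    ("ai", "That sounds wonderful! Which city interests you the most?"),
]

messages = build_messages(
    "You are a helpful travel agent.",
    history,
    "What is the best way to travel from Rome to Florence?",
)

for role, content in messages:
    print(f"{role}: {content}")
```

The history is spliced in as-is, which is why the placeholder needs a list of messages rather than a single string.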

## Langchain Expression Language (LCEL)
**Resource:** [LCEL Overview](https://python.langchain.com/v0.1/docs/expression_language/)
Main Points:
* Runnable Protocol
* Known inputs and outputs on invoke
* Flexibility in chain assembly
* [Standard Interface](https://python.langchain.com/v0.1/docs/expression_language/interface/)
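To make the pipe syntax less mysterious, here is a toy sketch of how an object can support `|` by defining `__or__`. This is an illustration of the idea only and not LangChain's implementation; the `Step` class and the fake model are hypothetical.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

fill_prompt = Step(lambda inputs: f"tell me a joke about {inputs['topic']}")
fake_model = Step(lambda prompt: {"content": f"(model reply to: {prompt})"})
parse_text = Step(lambda message: message["content"])

chain = fill_prompt | fake_model | parse_text
print(chain.invoke({"topic": "cats"}))
```

Each step only needs to know the type it receives and the type it returns, which is the "known inputs and outputs on invoke" point in the list above.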

# Basic Model Chains/ Model I/O

**Resource**: [Detailed Guide](https://python.langchain.com/v0.1/docs/modules/)

## Basic Prompt/Model Chain
See [Prompt+LLM](https://python.langchain.com/docs/expression_language/cookbook/prompt_llm_parser) for more information

In [6]:
# from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

In [38]:
prompt = ChatPromptTemplate.from_template("tell me a oneliner joke about {topic}")
# model = ChatOpenAI(model="gpt-3.5-turbo")
# chain = prompt | model

In [44]:
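# Observe what the prompt looks like when we substitute words into it
chat_prompt_value = prompt.invoke({'topic': "cats"})
# pass the filled-in prompt to the model
# (requires `model`, created in the ChatGoogleGenerativeAI cell below, to be defined first)
ai_message = model.invoke(chat_prompt_value)
messages = chat_prompt_value.messages
messages.append(ai_message)
messages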

[HumanMessage(content='tell me a oneliner joke about cats', additional_kwargs={}, response_metadata={}),
 AIMessage(content='I\'m pretty sure my cat thinks my job is "human can opener."', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []}, id='run--77e45cb6-9ac8-481b-9ed9-14766ad8e143-0', usage_metadata={'input_tokens': 10, 'output_tokens': 1637, 'total_tokens': 1647, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 1621}})]

In [9]:
model = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
chain = prompt | model


In [10]:
# Now, actually call the entire chain on it
res = chain.invoke({'topic': 'cats'})
print(res)

content="Cats don't have owners, they have live-in staff." additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='run--5494e93d-e404-4097-9f8f-b5e4f8a6804b-0' usage_metadata={'input_tokens': 10, 'output_tokens': 1163, 'total_tokens': 1173, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 1149}}


A little helper visualization:

In [11]:
# create visualization
chain.get_graph().print_ascii()

        +-------------+          
        | PromptInput |          
        +-------------+          
                *                
                *                
                *                
     +--------------------+      
     | ChatPromptTemplate |      
     +--------------------+      
                *                
                *                
                *                
   +------------------------+    
   | ChatGoogleGenerativeAI |    
   +------------------------+    
                *                
                *                
                *                
+------------------------------+ 
| ChatGoogleGenerativeAIOutput | 
+------------------------------+ 


## Even more simplified prompt chains

In [13]:
from langchain_core.output_parsers import StrOutputParser

In [14]:
# Create total user prompt chain
prompt = ChatPromptTemplate.from_template("{text}")

# Add output parser
chain = prompt | model | StrOutputParser()

In [15]:
# Now, the user can submit literally whatever
res = chain.invoke({'text':"What is Rag in a single line"})
print(res)

A Raga is a melodic framework in Indian classical music, defined by specific notes, characteristic phrases, and rules for their use, designed to evoke a particular mood or emotion.


## What just happened? Inspecting model behavior
Several ways to do this:
* `langchain` verbosity/debugging
* `langsmith`
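Both options rest on the same idea: record what goes into and comes out of each step. A rough plain-Python sketch of that idea, assuming each step is an ordinary function; `traced` and `trace_events` are hypothetical names and this is not how either tool is implemented.

```python
import time

trace_events = []

def traced(name, func):
    """Wrap func so each call records its input, output, and duration."""
    def wrapper(value):
        start = time.perf_counter()
        result = func(value)
        trace_events.append({
            "step": name,
            "input": value,
            "output": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

# hypothetical stand-ins for a prompt step and a model step
fill_prompt = traced("prompt", lambda text: f"Human: {text}")
fake_model = traced("model", lambda prompt: prompt.upper())

fake_model(fill_prompt("what is an f-string?"))
for event in trace_events:
    print(event["step"], "->", event["output"])
```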

### Langchain
Resource: [Guides -> Langchain Debugging](https://python.langchain.com/v0.1/docs/guides/development/debugging/)

In [16]:
from langchain.globals import set_debug, set_verbose

In [17]:
set_debug(True)
set_verbose(True)

In [48]:
prompt = ChatPromptTemplate.from_template("{text}")
# When there is only one placeholder, we don't need to name it explicitly when invoking the prompt
prompt.invoke("what is it?")

ChatPromptValue(messages=[HumanMessage(content='what is it?', additional_kwargs={}, response_metadata={})])

In [18]:
# Basic prompt -> model -> parser chain
chain = prompt | model | StrOutputParser()
chain.invoke('What is a python f-string?')

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "input": "What is a python f-string?"
}
[chain/start] [chain:RunnableSequence > prompt:ChatPromptTemplate] Entering Prompt run with input:
{
  "input": "What is a python f-string?"
}
[chain/end] [chain:RunnableSequence > prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:
[outputs]
[llm/start] [chain:RunnableSequence > llm:ChatGoogleGenerativeAI] Entering LLM run with input:
{
  "prompts": [
    "Human: What is a python f-string?"
  ]
}
[llm/end] [chain:RunnableSequence > llm:ChatGoogleGenerativeAI] [27.45s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "A Python **f-string** (short for \"formatted string literal\") is a powerful and convenient way to embed expressions inside string literals. Introduced in Python 3.6, it provides a concise and 

'A Python **f-string** (short for "formatted string literal") is a powerful and convenient way to embed expressions inside string literals. Introduced in Python 3.6, it provides a concise and readable syntax for creating formatted strings.\n\n### How to Create an f-string\n\nYou create an f-string by prefixing a string literal with the letter `f` or `F` (e.g., `f"..."` or `F"..."`). Inside the string, you can then place Python expressions within curly braces `{}`. These expressions will be evaluated at runtime and their results inserted into the string.\n\n### Key Features and Benefits\n\n1.  **Conciseness and Readability:**\n    *   They are much more readable than older methods like the `%` operator or `str.format()` because the variables and expressions are placed directly inline with the string text.\n\n    ```python\n    name = "Alice"\n    age = 30\n\n    # f-string\n    print(f"My name is {name} and I am {age} years old.")\n    # Output: My name is Alice and I am 30 years old.\n

### Langsmith
Resource: [Tracing Langchain with Langsmith](https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain)

Don't have a langsmith API Key yet? You'll need a user account on [Langsmith](https://smith.langchain.com/). Then, follow these [instructions provided by langsmith](https://docs.smith.langchain.com/#2-create-an-api-key).

In [19]:
# reset this
set_debug(False)

In [None]:
# enable tracing and set project name
os.environ['LANGCHAIN_TRACING_V2'] = "false"

# uncomment the following two lines before running the cell if you have a Langchain/Langsmith API Key
#os.environ['LANGCHAIN_API_KEY'] = userdata.get('LANGCHAIN_API_KEY')
#os.environ['LANGCHAIN_TRACING_V2'] = "true"

# set langchain project
os.environ['LANGCHAIN_PROJECT'] = 'May15'

In [20]:
# use the basic chain from above
chain = (prompt | model | StrOutputParser()) #add new component for tracing
response = chain.invoke("What is a python f-string?")
response

'A **Python f-string** (short for "formatted string literal") is a way to embed expressions inside string literals, using a minimal syntax. Introduced in **Python 3.6**, f-strings provide a concise and convenient way to create formatted strings.\n\n### How to use them:\n\nYou prefix a string literal with the letter `f` (or `F`). Inside the string, you can use curly braces `{}` to contain Python expressions. These expressions will be evaluated at runtime and their results will be inserted into the string.\n\n### Basic Syntax and Examples:\n\n```python\nname = "Alice"\nage = 30\n\n# Basic usage\ngreeting = f"Hello, my name is {name} and I am {age} years old."\nprint(greeting) # Output: Hello, my name is Alice and I am 30 years old.\n\n# Embedding expressions\nx = 10\ny = 5\ncalculation = f"The sum of {x} and {y} is {x + y}."\nprint(calculation) # Output: The sum of 10 and 5 is 15.\n\n# Calling functions/methods\ndef get_status():\n    return "active"\n\nstatus_message = f"User status: {g

#### View langsmith traces
We can take a look at this trace on [Langsmith](https://smith.langchain.com)

## Adding Memory
Adapted from: [LCEL Adding Message History](https://python.langchain.com/v0.1/docs/expression_language/how_to/message_history/)
Also see:
- [Langchain -> Use Cases -> Chatbots -> Memory Management](https://python.langchain.com/v0.1/docs/use_cases/chatbots/memory_management/)
- [Components -> More -> Memory](https://python.langchain.com/v0.1/docs/modules/memory/)

In [28]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import MessagesPlaceholder

In [30]:
# create chat template with standard elements
# model = ChatOpenAI(name='gpt-3.5-turbo')
# 1. Define the System Instruction
brief_system_message = (
    "You are a concise and helpful AI assistant. Always respond with "
    "a single sentence, using the previous conversation for context."
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", brief_system_message),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{most_recent_user_message}"),
    ]
)

turns_chain = prompt | model | StrOutputParser()

In [32]:
system_message = (
    "You are a concise and helpful AI assistant. Always respond with "
    "a single sentence, using the previous conversation for context."
)
ex_prompt = ChatPromptTemplate.from_messages([
    ('system', system_message),
    MessagesPlaceholder(variable_name="chat_history"),
    ('human', "{most_recent_user_message}")
])
ex_prompt
history = [('human', 'tell me a joke about cats'), ('ai', 'cats jump on beds')]
user_message = 'what is so funny about that joke?'
ex_turns_chain = ex_prompt | model | StrOutputParser()

ex_turns_chain.invoke({'chat_history': history, 'most_recent_user_message': user_message})

"It's funny because cats often inconveniently jump on beds, especially when you're trying to sleep or have just made the bed."

In [33]:
# quickly try out chain, pretending we've already said something to the system
first_chat_turn_messages = [("human", "Tell me a joke about cats"),
                            ("ai", "Cats jump on beds")]

next_user_message = "What was funny about that joke?"
turns_chain.invoke({'most_recent_user_message': next_user_message,
                    'chat_history': first_chat_turn_messages})

'There was no inherent humor or punchline in that factual statement.'

In [54]:
# all saved conversations
chat_conversation_threads = {}

# define function to create new conversation or load old one based on session_id
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in chat_conversation_threads:
        chat_conversation_threads[session_id] = ChatMessageHistory()
    return chat_conversation_threads[session_id]

# create chat history enabled chain
chat_with_message_history = RunnableWithMessageHistory(
    turns_chain,
    get_session_history,
    input_messages_key="most_recent_user_message",
    history_messages_key="chat_history",
).with_config(run_name = 'Chat with Message History')

Let's try it!

In [50]:
chat_conversation = {}
def get_user_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in chat_conversation:
        chat_conversation[session_id] = ChatMessageHistory()
    return chat_conversation[session_id]

chat_with_history = RunnableWithMessageHistory(
    turns_chain, # chain to invoke
    get_user_session_history, # method to call to store or retrieve the user history
    input_messages_key="most_recent_user_message", # tells which key in the invocation input contains the user messages
    history_messages_key='chat_history', # which key in the chain expects the chat history. Matches to the MessagesPlaceHolder variable name
).with_config(run_name = "Chat with Message History")

In [53]:
user_message = "tell me a joke"
session_id = 'user1'
chat_with_history.invoke({'most_recent_user_message': user_message},
                         config={"configurable": {"session_id": session_id}})

"Why don don't scientists trust atoms? Because they make up everything!"

In [55]:
# send first message
user_message_1 = "Tell me a joke about cats"
session_id_1 = "convo_1"
chat_with_message_history.invoke({'most_recent_user_message': user_message_1},
                                config={"configurable": {"session_id": session_id_1}})

'Why did the cat sit on the computer? To keep an eye on the mouse!'

In [56]:
user_message_2 = "What was funny about that joke?"
# send second message
chat_with_message_history.invoke({'most_recent_user_message': user_message_2},
                                    config={"configurable": {"session_id": session_id_1}})

'The humor comes from the pun on "mouse," referring to both a computer mouse and a rodent that cats hunt.'

In [58]:
# now let's look at the history
chat_conversation_threads[session_id_1].messages

[HumanMessage(content='Tell me a joke about cats', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Why did the cat sit on the computer? To keep an eye on the mouse!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='What was funny about that joke?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='The humor comes from the pun on "mouse," referring to both a computer mouse and a rodent that cats hunt.', additional_kwargs={}, response_metadata={})]
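The stored history above is the result of two things happening on every call: prior turns are looked up by session id, and the new turn is appended afterwards. A plain-Python sketch of that bookkeeping, assuming a dictionary of lists as the store; `chat` and `count_reply` are hypothetical stand-ins, not LangChain functions.

```python
session_store = {}

def get_history(session_id):
    """Create the history list on first use, then return the same list."""
    return session_store.setdefault(session_id, [])

def chat(session_id, user_message, reply_fn):
    """Call reply_fn with the prior turns, then record both sides of the turn."""
    history = get_history(session_id)
    reply = reply_fn(history, user_message)
    history.append(("human", user_message))
    history.append(("ai", reply))
    return reply

# hypothetical stand-in for a model: reports how many earlier messages it saw
def count_reply(history, user_message):
    return f"I have seen {len(history)} earlier messages."

chat("convo_1", "Tell me a joke about cats", count_reply)
second = chat("convo_1", "What was funny about that joke?", count_reply)
other = chat("convo_2", "Hello", count_reply)
print(second)
print(other)
```

Because each session id maps to its own list, a new id starts with an empty history while an existing id keeps accumulating turns.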

#### View langsmith traces
We can take a look at this trace on [Langsmith](https://smith.langchain.com)

# Retrieval Augmented Generation (RAG)
## Review
* Conceptual and step-by-step guide about [RAG](https://python.langchain.com/v0.1/docs/use_cases/question_answering/)
* Learn more about implementing [RAG](https://python.langchain.com/docs/expression_language/cookbook/retrieval)

**Data Ingestion (Creating a Vector Store of Documents)**
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png' height=300/>
    <figcaption>
        Source: Data Ingestion (Preparing Embeddings), from <a href=https://python.langchain.com/v0.1/docs/use_cases/question_answering/>Langchain Use Case: Q&A with RAG</a>
    </figcaption>
</figure>

**Retrieval and Generation**
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png' height=300/>
    <figcaption>
        Source: Retrieval and Generation, from <a href=https://python.langchain.com/v0.1/docs/use_cases/question_answering/>Langchain Use Case: Q&A with RAG</a>
    </figcaption>
</figure>



## Document Loaders and Splitters
[Data Ingestion/Vector Store Preparation Guide ](https://python.langchain.com/docs/modules/data_connection/)
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/data_connection-95ff2033a8faa5f3ba41376c0f6dd32a.jpg' height=300/>
    <figcaption>
        Langchain Retrieval Component, from <a href=https://python.langchain.com/docs/modules/data_connection/>Langchain Components</a>
    </figcaption>
</figure>

**Other extremely useful resources**:
* **[Components -> Retrieval -> Document Loaders](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/)**: Use the sidebar to navigate through the different types of document loaders. For every integration available through langchain, see [Components -> Integrations -> Components](https://python.langchain.com/v0.1/docs/integrations/document_loaders/)
* **[Components -> Retrieval -> Text Splitters](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)**: Use the sidebar to navigate through the different types of text splitters. For every integration available through langchain, see [Components -> Integrations -> Components](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)

In [None]:
# example pdf links
doc_1 = 'https://registrar.vanderbilt.edu/documents/Undergraduate_School_Catalog_2023-24_UPDATED2.pdf'
doc_2 = 'https://www.tnmd.uscourts.gov/sites/tnmd/files/Pro%20Se%20Nonprisoner%20Handbook.pdf'
doc_3 = 'https://www.uscis.gov/sites/default/files/document/guides/M-654.pdf'

### Example: pdfloader and recursive character splitter

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
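# choose a loader and a document (doc_3 is one possible choice; any of the PDFs above works)
loader = PyPDFLoader(doc_3)

# Add the kind of text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=250,
)

# use the text splitter to load the document and split it into chunks
chunks = loader.load_and_split(text_splitter)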

In [None]:
# see how many chunks were made
print(len(chunks))

In [None]:
# inspect a single chunk
chunks[0]

In [None]:
# view first 3 chunks
for chunk_index, chunk in enumerate(chunks[:3]):
    print(f'****** Chunk {chunk_index} ******\n{chunk.page_content}\n')
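To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width splitter in plain Python. It is not the recursive algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and sentences; `split_with_overlap` is a hypothetical helper for illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

The last characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.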

### Example: Loading website data and splitting

In [None]:
from bs4 import SoupStrainer
from langchain_community.document_loaders import WebBaseLoader

In [None]:
constitution_website = "https://constitutioncenter.org/the-constitution/full-text"

# load using WebBaseLoader
web_loader = WebBaseLoader(constitution_website,
                       bs_kwargs = {'parse_only':SoupStrainer(['article'])})

# read the document from the website (without splitting)
web_document = web_loader.load()

In [None]:
# only the first few characters
print(web_document[0].page_content[:330])

Now, we'll split in a slightly different way. Since we've already scraped the website, we will just directly use the splitter. Note that after we load the website, we have a data type of (list of) `Document`.

In [None]:
website_splitter = RecursiveCharacterTextSplitter(chunk_size=330, chunk_overlap=100,
                                                  add_start_index=True)  # record each chunk's start index in its metadata
website_chunks = website_splitter.split_documents(web_document)
len(website_chunks)

In [None]:
website_chunks[:3]

If you know less about the Constitution and more about Star Wars (or another topic available on Wikipedia), feel free to run the cell below to use that text moving forward. It defines the `web_chunks` variable used in the rest of the notebook. You may need to adjust the `chunk_size` and `chunk_overlap` options.

In [None]:
# alternate data
webloader = WebBaseLoader('https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope',
                       bs_kwargs = {'parse_only':SoupStrainer('div', id='bodyContent')})
web_chunks = webloader.load_and_split(RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100, add_start_index=True))
print('Number of chunks generated: ', len(web_chunks))
print('\n\nSample: ')
web_chunks[:5]

## Vector Stores: A way to store embeddings (hidden states) of your data
The choice of vector store influences how "relevant" documents can be identified, speed of document retrieval, and organization.

Helpful resources:
* **[Brief Langchain Reference](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/)**
* **[Vector Store Integrations](https://python.langchain.com/v0.1/docs/integrations/vectorstores/)**
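A vector store ranks documents by comparing the query's embedding with each stored embedding. The sketch below shows that comparison with cosine similarity, assuming a toy word-count "embedding" in place of a real model; `embed` and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (real models give dense vectors)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "a new hope was released in 1977",
    "the empire strikes back was released in 1980",
    "cats sleep most of the day",
]
query = "when was a new hope released"

# score every document against the query and keep the best match
scores = [(cosine_similarity(embed(query), embed(doc)), doc) for doc in documents]
best_score, best_doc = max(scores)
print(round(best_score, 3), best_doc)
```

A real vector store does the same ranking over learned embeddings and uses an index so it does not have to compare the query against every vector.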

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [None]:
# create the vector store
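# embed the Star Wars chunks and store them (swap in website_chunks to use the Constitution text)
db = Chroma.from_documents(web_chunks, OpenAIEmbeddings())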

### Similarity Search for Documents

In [None]:
# query the vector store
query = 'When was a new hope released?'

# use a similarity search between the vectors
db.similarity_search(query)


In [None]:
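# get the distance score alongside results (lower means closer; the metric depends on how the collection is configured)
relevant_docs = db.similarity_search_with_score(query)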
relevant_docs

In [None]:
# another query, but instead use normalized score
query = 'What is the plot of A New Hope?'
relevant_docs = db.similarity_search_with_relevance_scores(query)  # scores normalized to the range 0 to 1
relevant_docs

## Retrievers: How we select the most relevant data

In [None]:
# or use the db as a retriever with lcel
retriever = db.as_retriever()
retrieved_docs = retriever.invoke(query)
retrieved_docs
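Conceptually, a retriever is just a callable that takes a query and returns the `k` most relevant documents. A small plain-Python sketch of that interface, assuming shared-word count as the relevance measure in place of embeddings; `make_retriever` is a hypothetical helper.

```python
def make_retriever(documents, k=2):
    """Return a function that gives the k documents sharing the most words with a query."""
    def retrieve(query):
        query_words = set(query.lower().split())

        def overlap(doc):
            return len(query_words & set(doc.lower().split()))

        # highest overlap first; ties keep their original order
        return sorted(documents, key=overlap, reverse=True)[:k]
    return retrieve

docs = [
    "Luke Skywalker joins the Rebel Alliance",
    "Princess Leia hides the plans in R2-D2",
    "Cats sleep most of the day",
]

top_one = make_retriever(docs, k=1)
print(top_one("who hides the plans"))
```

Changing `k` plays the same role as the `search_kwargs={"k": ...}` setting on a LangChain retriever: it trades more context for a longer prompt.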

## RAG
Use RAG when we want the model to generate an answer with retrieved documents included in the prompt. For this, we're going to switch to a different embedding model, which is downloaded to and run on our own machine (or on the Colab runtime, if you are using Google Colab).

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_core.runnables import RunnableLambda, RunnablePassthrough, RunnableSequence, RunnableParallel

In [None]:
# use different embedding model
embeddings_fn = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2") #, model_kwargs={"device":'mps'})
hf_db = FAISS.from_documents(web_chunks, embeddings_fn)
hf_retriever = hf_db.as_retriever(search_kwargs={"k":1})

# make sure it works
query = 'What is the plot of A New Hope?'
hf_retriever.invoke(query)

### Default RAG: Question Answering

In [None]:
# Basic question answering template
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# compose prompt
rag_prompt = ChatPromptTemplate.from_template(template)

# create model (so we don't have to depend on the model definition at the top of the notebook)
model = ChatOpenAI(model_name='gpt-3.5-turbo')

In [None]:
# We need to format the retrieved documents better:
# join each chunk's text with its metadata so the prompt carries citation info
def format_docs(docs):
    return "\n\n".join([f'Reference text:\n{doc.page_content}\n\nCitation Info: {doc.metadata}' for doc in docs])

In [None]:
# inspect behavior of format_docs
format_docs(web_chunks[:3])

In [None]:
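# compose the chain
rag_chain = (
    # retrieve and format the context for the prompt; pass the question through unchanged
    {"context": hf_retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)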

In [None]:
# run the chain
rag_chain.with_config(run_name = 'basic_rag_chain').invoke('What is the plot of a new hope')
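
The chain above is composed with the `|` operator, where each step's output becomes the next step's input. As a rough illustration of that idea only, the toy `Step` class and `fan_out` helper below are hypothetical names invented for this sketch; they are not LangChain's implementation of runnables.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # (a | b).invoke(x) runs a first, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


def fan_out(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


lookup = Step(lambda q: "context for: " + q)   # pretend retriever + formatter
passthrough = Step(lambda q: q)                # forwards the question unchanged
fill_prompt = Step(lambda d: f"{d['context']}\nQuestion: {d['question']}")
shout = Step(lambda text: text.upper())        # pretend model + output parser

toy_chain = fan_out(context=lookup, question=passthrough) | fill_prompt | shout
result = toy_chain.invoke("who is leia")
print(result)
```

The dictionary step is what lets a single input string feed both the `context` and `question` slots of the prompt.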

### RAG with Sources
Resource: [Langchain: Returning Sources](https://python.langchain.com/v0.1/docs/use_cases/question_answering/sources/)

In [None]:
from langchain_core.runnables import RunnableParallel

In [None]:
# Basic prompt -> model -> parser chain
single_turn_chain = (
    rag_prompt
    | model
    | StrOutputParser()
)

# Break previous chain in half to access context and question in returned response
rag_chain_with_source = RunnableParallel(
    {"context": hf_retriever | format_docs, "question": RunnablePassthrough()}
).assign(answer=single_turn_chain)

In [None]:
# invoke
response = rag_chain_with_source.with_config(run_name = 'sources_rag_chain').invoke("What happened to Princess Leia in a New Hope?")

# print full response
for key, value in response.items():
    print(f"{key}: {value}\n")
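
The response is a dictionary because the parallel step produces `context` and `question`, and the assign step then adds `answer` without discarding them. A plain-Python sketch of that data flow follows; `run_parallel`, `assign`, and the canned context string are hypothetical stand-ins, not the LangChain classes.

```python
def run_parallel(branches, value):
    """Apply each branch function to the same input; return a dict of results."""
    return {name: fn(value) for name, fn in branches.items()}


def assign(state, **new_fields):
    """Return a copy of `state` with extra keys computed from the existing dict."""
    updated = dict(state)
    for name, fn in new_fields.items():
        updated[name] = fn(state)
    return updated


question = "What happened to Princess Leia?"
state = run_parallel(
    {"context": lambda q: "Leia is captured by Darth Vader.", "question": lambda q: q},
    question,
)
toy_response = assign(state, answer=lambda s: f"Based on the context: {s['context']}")

for key, value in toy_response.items():
    print(f"{key}: {value}\n")
```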

### RAG with Chat History?

The RAG system above is single-turn: each question is answered with no memory of earlier exchanges. How do we add chat memory? See below for an implementation guide:
- [Use cases: Q&A with RAG: Add Chat History](https://python.langchain.com/v0.1/docs/use_cases/question_answering/chat_history/). Builds on a RAG system, so it will be of interest.
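
One common pattern is to rewrite a follow-up question into a standalone one using the stored history before retrieval, then record each turn. The sketch below shows only that bookkeeping. `contextualize` is a stub standing in for an LLM rewrite step, and `echo_rag` is a placeholder for a real RAG chain; both are assumptions for illustration.

```python
def contextualize(question, history):
    """Stub for an LLM rewrite step: attach the previous user turn so the
    follow-up can be understood (and retrieved against) on its own."""
    previous_user_turns = [text for role, text in history if role == "user"]
    if not previous_user_turns:
        return question
    return f"{question} (follow-up to: {previous_user_turns[-1]})"


def chat_turn(question, history, answer_fn):
    """One conversational turn: rewrite, answer, then record both messages."""
    standalone = contextualize(question, history)
    answer = answer_fn(standalone)
    history.append(("user", question))
    history.append(("assistant", answer))
    return answer


def echo_rag(q):
    """Placeholder for a real RAG chain; it only echoes the query it received."""
    return f"answer to <{q}>"


history = []
first = chat_turn("Who is Princess Leia?", history, echo_rag)
second = chat_turn("What happened to her?", history, echo_rag)
print(second)
```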

## LLM System Metrics
Resource: [Guides -> Evaluation](https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/)

In [None]:
from langchain.evaluation import load_evaluator

In [None]:
# configure what we want to evaluate
rs_question = "What happened to Princess Leia in a New Hope?"
rs_answer = rag_chain.with_config(run_name = 'basic_rag_chain').invoke(rs_question)

In [None]:
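# load an evaluator that uses the conciseness criterion
evaluator = load_evaluator("criteria", criteria="conciseness")

# evaluate whether our model was concise or not
eval_result = evaluator.evaluate_strings(
    prediction=rs_answer,  # the chain's answer being graded
    input=rs_question,     # the question that produced it
)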

# print result
eval_result
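
Criteria-based evaluation of this kind generally works by asking a judge model to grade a response against a written criterion. The following is a rough sketch of that pattern, not LangChain's implementation: the prompt wording, the `fake_judge` placeholder, and the parsing rules are all assumptions made for illustration.

```python
def build_grading_prompt(question, prediction, criterion_name, criterion_text):
    """Assemble the instructions a judge model would receive."""
    return (
        f"Question: {question}\n"
        f"Submission: {prediction}\n"
        f"Criterion ({criterion_name}): {criterion_text}\n"
        "Explain your reasoning, then give a final verdict of Y or N on the last line."
    )


def parse_verdict(judge_output):
    """Split the judge's reply into reasoning, a Y/N value, and a 1/0 score."""
    lines = judge_output.strip().splitlines()
    value = lines[-1].strip().upper()
    return {
        "reasoning": "\n".join(lines[:-1]).strip(),
        "value": value,
        "score": 1 if value == "Y" else 0,
    }


def fake_judge(prompt):
    """Placeholder for an LLM call; always returns the same canned reply."""
    return "The submission answers directly without filler.\nY"


grading_prompt = build_grading_prompt(
    "What happened to Princess Leia in a New Hope?",
    "She is captured by Darth Vader and later rescued.",
    "conciseness",
    "Is the submission concise and to the point?",
)
toy_result = parse_verdict(fake_judge(grading_prompt))
print(toy_result)
```

Because the grade comes from a model, the result depends on the judge model and on the prompt wording, which is worth keeping in mind when comparing scores across runs.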

View other criteria available through LangChain

In [None]:
from langchain.evaluation import Criteria
list(Criteria)

# Homework
The following exercises are designed to help you gain depth in what you've learned about RAG today.

## [Required] Learning more about RAG
### Splitting Text (Conceptual)
There are many ways to split text, and each affects the resulting RAG system. Below is a resource (with sidebar dropdown) for you to read over. Then, for each text splitting approach relevant to your application, answer the following question:
* What is the proposed value in adopting this text splitting approach? What are some drawbacks?

### Splitting Text (Programmatic)
Above, we have adopted specific chunk sizes and splitting approaches. Choose one of the documents (or use your own) and:
* Modify the chunk size. How does this impact the resulting RAG performance? The cost?
* Implement a different type of text splitter (as applicable, i.e., not code text splitters if you're not splitting code). How does this impact the resulting RAG performance? The cost?
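
To build some intuition before experimenting, the sketch below shows how chunk size and overlap change the number of chunks and the total amount of text stored (and therefore embedded). It uses a simplistic fixed-width character splitter written for illustration; it is not `RecursiveCharacterTextSplitter`, which also tries to split on natural separators.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Fixed-width character chunks; each chunk starts
    `chunk_size - chunk_overlap` characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample_text = "x" * 2000  # stand-in for a loaded document
for size in (250, 500, 1000):
    pieces = chunk_text(sample_text, chunk_size=size, chunk_overlap=100)
    total_chars = sum(len(p) for p in pieces)
    print(f"chunk_size={size}: {len(pieces)} chunks, {total_chars} characters stored")
```

Smaller chunks mean more vectors to embed and search, and overlap means some text is embedded more than once; both affect cost as well as which passages get retrieved.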

### Customizing RAG
There are many, many, many ways to improve results with RAG. Below are some resources for you to read over then complete the following:
1. What is the proposed value in adopting this approach? In other words, what is the expected performance improvement by using this method?
2. How might it apply to your work?

* [**Query Analysis**](https://python.langchain.com/v0.1/docs/use_cases/query_analysis/). Make sure to peruse subtopics.
* [**Synthetic Data Generation**](https://python.langchain.com/v0.1/docs/use_cases/data_generation/).
* [**Tagging**](https://python.langchain.com/v0.1/docs/use_cases/tagging/).
* [**Routing Chain Logic Based on Inputs**](https://python.langchain.com/v0.1/docs/expression_language/how_to/routing/)\*\*.
* [**Chain Composition**](https://python.langchain.com/v0.1/docs/modules/chains/). Of particular interest here are the Legacy chains. Although they will probably be completely removed in the future, consider their behavior. In what cases might these behaviors be useful?

\*\* This is highly recommended reading, but it may not be suitable for programming novices. Although there is explanatory text, the code is what concretely demonstrates the concepts. Novices may find it helpful to copy, paste, and run the code to understand its behavior, although this may fall outside the time constraints of some participants.

## [Required] Learning more about Evaluation
Read the following text and answer these questions:
1. What is the purpose of each individual criterion? Does it require an external LLM for evaluation?
2. In what cases might this criterion be useful?

Depth Text: [**Evaluation, by Langchain**](https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/)

## [Highly recommended] Learning more about the LLM System Lifecycle
There is more to an LLM-based system than a user interface and the LLM chain. There is a whole framework around inspecting, testing, and evaluating these systems. Read the following and answer the questions below:
1. Summarize the purpose of the individual components of the LangSmith system (they generalize to all LLM systems).
2. Consider your favorite LLM UI (e.g., ChatGPT, Gemini, Claude, etc.). Describe how you think these components are utilized in that LLM system.

Depth Text: [**LangSmith User Guide**](https://docs.smith.langchain.com/old/user_guide)

## [Recommended] Practicing with RAG and Langchain
### Exercise 1: Modify the RAG system
Modify or create a new chain which:
1. Uses a different LLM than the one used in this notebook.
2. Uses a different document loader than the one used in this notebook.
3. Uses a different splitter than the one used in this notebook.
4. Uses a different vector store/retriever than the one used in this notebook.

Use the resources provided in the relevant sections of the notebook for other options.

### Exercise 2: Implement a new RAG system
1. [More challenging] Add chat history to one of your RAG chains. Make sure to enable tracing and inspect LangSmith to ensure that the chat history is used.
2. Create a Gradio user interface to use your chain in a more user-friendly way.
3. [Challenging] Implement an additional chain which uses one of the strategies you read about in the "Learning more about RAG" section.




