<a target="_blank" href="https://colab.research.google.com/github/vanderbilt-data-science/ai_summer/blob/main/2_2-langchain-rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# AI Solutions with Langchain and RAG
> For Vanderbilt University AI Summer 2024<br>Prepared by Dr. Charreau Bell

_Code versions applicable: May 14, 2024_

## Learning Outcomes:
* Participants will be able to articulate the essential steps and components of a retrieval-augmented generation (RAG) system and implement a standard RAG system using langchain.
* Participants will gain familiarity with inspecting the execution pathways of LLM-based systems.
* Participants will gain familiarity with approaches for the evaluation of LLM-based systems.

### Computing Environment Setup

In [None]:
! pip install langchain==0.1.20 langchain_openai grandalf sentence-transformers
! pip install pypdf chromadb faiss-cpu

In [None]:
# Best practice is to do all imports at the beginning of the notebook, but we have separated them here for learning purposes.
import os

In [None]:
# auth replicated here for reference just in case you choose to do something similar
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

## Langchain Introduction

### Overview of System

[Overview of Langchain](https://python.langchain.com/v0.1/docs/get_started/introduction/)

<figure>
<img src='https://python.langchain.com/v0.1/svg/langchain_stack.svg' height=600/>
    <figcaption>
        Langchain Overview, from <a href=https://python.langchain.com/v0.1/docs/get_started/introduction>Langchain Introduction</a>
    </figcaption>
</figure>

### Quick Start
To get started quickly, use the [Quick Start](https://python.langchain.com/v0.1/docs/get_started/quickstart/) page.

### Details of Individual Composition Components
To learn more about any of the individual components used below, use the [Components Page](https://python.langchain.com/v0.1/docs/modules/)

## Review of python formatted strings
To prepare ourselves for langchain, we'll first review formatted strings.

In [None]:
# basic functionality of print
print('Tell me a story about cats')

# with variables
prompt_string = 'Tell me a story about cats'
print('As string ', prompt_string)

# as formatted string
prompt_string = 'Tell me a story about cats'
print(f"With formatted string: {prompt_string}")

Motivating example: you are building a GPT that tells stories. The user just needs to provide the topic.

In [None]:
# as a template string: a plain (non-f) string keeps {topic} as a placeholder to fill in later
string_prompt_template = "Tell me a story about {topic}"
string_prompt_template

In [None]:
# you can fill in the template at a later time
string_prompt_template.format(topic='cats')
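
Before moving on to LangChain, it may help to see that a template string already "knows" which placeholders it expects. The sketch below uses only the standard library's `string.Formatter` to list them. The helper name `placeholder_names` and the `length` placeholder are made up for illustration and are not part of LangChain.

```python
from string import Formatter

def placeholder_names(template: str) -> list[str]:
    """Return the named placeholders found in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    return [field for _, field, _, _ in Formatter().parse(template) if field]

story_template = "Tell me a {length} story about {topic}"

# which inputs does the template need?
print(placeholder_names(story_template))

# fill them in at a later time
filled = story_template.format(length="short", topic="cats")
print(filled)
```

Prompt templates build on the same idea: the template declares its input variables, and the values are supplied when the template is invoked.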

## Langchain Prompt Templates
> Formatting and arranging prompt strings

Langchain prompt templates work just like this, but with additional functionality targeted towards LLM interaction. There are lots of different prompt templates, but here, we'll focus on two: `PromptTemplate`, and `ChatPromptTemplate`.

**Additional resources**: [Guide on Prompt Templates](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/quick_start/)

In [11]:
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser

In [12]:
# create system message for shorter responses
brief_system_message = 'You are a helpful assistant. Be brief, succinct, and clear in your responses. Only answer what is asked.'

### PromptTemplate

In [30]:
# Example 1
template = """
You are a helpful assistant. Answer the user's question based ONLY on the provided context.
Context: {context}
Question: {question}
"""
context = "RAG stands for retrieval augmented generation"
question = "What is RAG?"

template.format(context=context, question=question)

"\nYou are a helpful assistant. Answer the user's question based ONLY on the provided context.\nContext: RAG stands for retrieval augmented generation\nQuestion: What is RAG?\n"

In [43]:
lc_template = ChatPromptTemplate.from_template(template)
flc = lc_template.invoke({'context': context, 'question':question})
print(flc) # chat prompt template
print(flc.messages[0]) # basemessage
print(flc.messages[0].content) # content
print(flc.messages[0].type) # role

messages=[HumanMessage(content="\nYou are a helpful assistant. Answer the user's question based ONLY on the provided context.\nContext: RAG stands for retrieval augmented generation\nQuestion: What is RAG?\n")]
content="\nYou are a helpful assistant. Answer the user's question based ONLY on the provided context.\nContext: RAG stands for retrieval augmented generation\nQuestion: What is RAG?\n"

You are a helpful assistant. Answer the user's question based ONLY on the provided context.
Context: RAG stands for retrieval augmented generation
Question: What is RAG?

human


In [26]:
# Example 2
template_string = "Recommend a song for someone who likes {genre} music and is feeling {mood}."
template = PromptTemplate.from_template(template_string)
istr = template.invoke({"genre": "hiphop", "mood": "sad"})
fstr = template.format(genre='hiphop', mood='good')
istr.text

'Recommend a song for someone who likes hiphop music and is feeling sad.'

### ChatPromptTemplate

In [None]:
# create prompt template
lc_chat_prompt_template = ChatPromptTemplate.from_template("tell me a story about {topic}")

# has invocation functionality resulting in chat-style messages
lc_chat_prompt_template.invoke({'topic':'cats'})

In [None]:
# create message-based chat prompt template
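lc_chat_prompt_template = ChatPromptTemplate.from_messages(
    # create messages similar to OpenAI API: a list of (role, template) pairs
    [
        ("system", brief_system_message),
        ("human", "tell me a story about {topic}"),
    ]
)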

# invoke the chat prompt template
lc_chat_prompt_template.invoke({'topic':'cats'})
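
A message-based chat prompt is essentially a list of role-tagged templates that are filled in together. The following is a minimal plain-Python sketch of that idea, not LangChain's implementation: `toy_message_templates` and `fill_messages` are hypothetical names, and LangChain returns message objects rather than dictionaries.

```python
# (role, template) pairs, in the same spirit as the system/user messages of a chat API
toy_message_templates = [
    ("system", "You are a helpful assistant. Be brief."),
    ("human", "tell me a story about {topic}"),
]

def fill_messages(templates, **variables):
    """Fill every template and return a list of role/content dicts."""
    return [
        {"role": role, "content": text.format(**variables)}
        for role, text in templates
    ]

toy_messages = fill_messages(toy_message_templates, topic="cats")
for message in toy_messages:
    print(message["role"], "->", message["content"])
```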

## Langchain Expression Language (LCEL)
**Resource:** [LCEL Overview](https://python.langchain.com/v0.1/docs/expression_language/)
Main Points:
* Runnable Protocol
* Known inputs and outputs on invoke
* Flexibility in chain assembly
* [Standard Interface](https://python.langchain.com/v0.1/docs/expression_language/interface/)
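
To make the "Runnable protocol" and "flexibility in chain assembly" points concrete, here is a toy sketch of how a `|` operator can compose steps that all share an `invoke` method. The `Step` class is an illustration written for this notebook, not LangChain's actual `Runnable` implementation, and `fake_model` is a stand-in that never calls an LLM.

```python
class Step:
    """A toy 'runnable': wraps a function and supports | composition."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # self | other -> a new Step that runs self first, then other
        return Step(lambda value: other.invoke(self.invoke(value)))

fill_prompt = Step(lambda d: f"tell me a joke about {d['foo']}")
fake_model = Step(lambda prompt: f"[model reply to: {prompt}]")
to_upper = Step(str.upper)

# every piece has the same interface, so the pieces can be chained freely
toy_chain = fill_prompt | fake_model | to_upper
toy_result = toy_chain.invoke({"foo": "cats"})
print(toy_result)
```

Because each step has a known input and output, swapping one step (for example, a different parser at the end) does not require changing the others.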

# Basic Model Chains/ Model I/O

**Resource**: [Detailed Guide](https://python.langchain.com/v0.1/docs/modules/)

## Basic Prompt/Model Chain
See [Prompt+LLM](https://python.langchain.com/docs/expression_language/cookbook/prompt_llm_parser) for more information

In [None]:
from langchain_openai import ChatOpenAI

In [None]:
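prompt = ChatPromptTemplate.from_template("tell me a joke about {foo}")
# model choice here mirrors the one used later in the notebook; any chat model would work
model = ChatOpenAI(model='gpt-3.5-turbo')
# pipe the prompt into the model to form a chain
chain = prompt | model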

In [None]:
# Observe what the prompt looks like when we substitute words into it
prompt.invoke({'foo':"cats"})

In [None]:
# Now, actually call the entire chain on it
res = chain.invoke({'foo': "cats"})
print(res)

A little helper visualization:

In [None]:
# create visualization
chain.get_graph().print_ascii()

## Even more simplified prompt chains

In [None]:
# Create total user prompt chain
prompt = ChatPromptTemplate.from_template("{text}")

# Add output parser
chain = prompt | model | StrOutputParser()

In [None]:
# Now, the user can submit literally whatever
res = chain.invoke({'text':"Briefly and succinctly summarize Episodes 4-6 of Star Wars."})
print(res)

## What just happened? Inspecting model behavior
Several ways to do this:
* `langchain` verbosity/debugging
* `langsmith`

### Langchain
Resource: [Guides -> Langchain Debugging](https://python.langchain.com/v0.1/docs/guides/development/debugging/)

In [None]:
from langchain.globals import set_debug, set_verbose

In [None]:
set_debug(True)
set_verbose(True)

In [None]:
# Basic prompt -> model -> parser chain
chain = prompt | model | StrOutputParser()
# pass the input as a dict keyed by the prompt's variable name
chain.invoke({'text': 'What is a python f-string?'})

### Langsmith
Resource: [Tracing Langchain with Langsmith](https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain)

Don't have a langsmith API Key yet? You'll need a user account on [Langsmith](https://smith.langchain.com/). Then, follow these [instructions provided by langsmith](https://docs.smith.langchain.com/#2-create-an-api-key).

In [None]:
# reset both debugging flags set above
set_debug(False)
set_verbose(False)

In [None]:
# enable tracing and set project name
os.environ['LANGCHAIN_TRACING_V2'] = "false"

# uncomment the following two lines before running the cell if you have a Langchain/Langsmith API Key
#os.environ['LANGCHAIN_API_KEY'] = userdata.get('LANGCHAIN_API_KEY')
#os.environ['LANGCHAIN_TRACING_V2'] = "true"

# set langchain project
os.environ['LANGCHAIN_PROJECT'] = 'May15'

In [None]:
# use the basic chain from above
chain = (prompt | model | StrOutputParser()) #add new component for tracing
response = chain.invoke({'text': "What is a python f-string?"})
response

#### View langsmith traces
We can take a look at this trace on [Langsmith](https://smith.langchain.com)

## Adding Memory
Adapted from: [LCEL Adding Message History](https://python.langchain.com/v0.1/docs/expression_language/how_to/message_history/)
Also see:
- [Langchain -> Use Cases -> Chatbots -> Memory Management](https://python.langchain.com/v0.1/docs/use_cases/chatbots/memory_management/)
- [Components -> More -> Memory](https://python.langchain.com/v0.1/docs/modules/memory/)

In [None]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

In [None]:
# create chat template with standard elements
model = ChatOpenAI(model='gpt-3.5-turbo')  # `model` selects the LLM; `name` would only label the runnable
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", brief_system_message),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{most_recent_user_message}"),
    ]
)

turns_chain = prompt | model | StrOutputParser()

In [None]:
# quickly try out chain, pretending we've already said something to the system
first_chat_turn_messages = [("human", "Tell me a joke about cats"),
                            ("ai", "Cats jump on beds")]

next_user_message = "What was funny about that joke?"
turns_chain.invoke({'most_recent_user_message': next_user_message,
                    'chat_history': first_chat_turn_messages})

In [None]:
# all saved conversations
chat_conversation_threads = {}

# define function to create new conversation or load old one based on session_id
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in chat_conversation_threads:
        chat_conversation_threads[session_id] = ChatMessageHistory()
    return chat_conversation_threads[session_id]

# create chat history enabled chain
chat_with_message_history = RunnableWithMessageHistory(
    turns_chain,
    get_session_history,
    input_messages_key="most_recent_user_message",
    history_messages_key="chat_history",
).with_config(run_name = 'Chat with Message History')
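
The bookkeeping behind per-session memory can be sketched without any LLM at all. In this toy version, `toy_threads` plays the role of `chat_conversation_threads`, and `toy_chat` is a hypothetical stand-in for the history-enabled chain: it looks up the history for a session, produces a placeholder reply, and appends both turns.

```python
from collections import defaultdict

# session_id -> list of (role, text) tuples; a stand-in for ChatMessageHistory
toy_threads = defaultdict(list)

def toy_chat(session_id: str, user_message: str) -> str:
    history = toy_threads[session_id]
    # a real chain would send history + user_message to the model here
    reply = f"reply #{len(history) // 2 + 1} to: {user_message}"
    # save both sides of the turn so the next call can see them
    history.append(("human", user_message))
    history.append(("ai", reply))
    return reply

toy_chat("convo_1", "Tell me a joke about cats")
toy_chat("convo_1", "What was funny about that joke?")
toy_chat("convo_2", "Hello")

# each session keeps its own, separate history
print(len(toy_threads["convo_1"]), len(toy_threads["convo_2"]))
```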

Let's try it!

In [None]:
# send first message
user_message_1 = "Tell me a joke about cats"
session_id_1 = "convo_1"
chat_with_message_history.invoke({'most_recent_user_message': user_message_1},
                                config={"configurable": {"session_id": session_id_1}})

In [None]:
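# send second message (any follow-up works; this one reuses the question from the quick test above)
user_message_2 = "What was funny about that joke?"
chat_with_message_history.invoke({'most_recent_user_message': user_message_2},
                                    config={"configurable": {"session_id": session_id_1}})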

#### View langsmith traces
We can take a look at this trace on [Langsmith](https://smith.langchain.com)

# Retrieval Augmented Generation (RAG)
## Review
* Conceptual and step-by-step guide about [RAG](https://python.langchain.com/v0.1/docs/use_cases/question_answering/)
* Learn more about implementing [RAG](https://python.langchain.com/docs/expression_language/cookbook/retrieval)

**Data Ingestion (Creating a Vector Store of Documents)**
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png' height=300/>
    <figcaption>
        Source: Data Ingestion (Preparing Embeddings), from <a href=https://python.langchain.com/v0.1/docs/use_cases/question_answering/>Langchain Use Case: Q&A with RAG</a>
    </figcaption>
</figure>

**Retrieval and Generation**
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png' height=300/>
    <figcaption>
        Source: Retrieval and Generation, from <a href=https://python.langchain.com/v0.1/docs/use_cases/question_answering/>Langchain Use Case: Q&A with RAG</a>
    </figcaption>
</figure>
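
The two stages pictured above can be sketched end to end in plain Python. This is a deliberately simplified illustration under loud assumptions: the "embedding" is just a bag of lowercase word counts (real systems use a neural embedding model), the store is a list, and the final step only builds the prompt rather than calling an LLM. All names (`toy_embed`, `toy_store`, and so on) are hypothetical.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())

def toy_cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Ingestion: split the source into chunks and store (chunk, vector) pairs
toy_chunks = [
    "RAG stands for retrieval augmented generation.",
    "A vector store holds embeddings of document chunks.",
    "Cats sleep for most of the day.",
]
toy_store = [(chunk, toy_embed(chunk)) for chunk in toy_chunks]

# 2. Retrieval: embed the question and pick the most similar chunk
toy_question = "What does RAG stand for?"
question_vec = toy_embed(toy_question)
best_chunk = max(toy_store, key=lambda pair: toy_cosine(question_vec, pair[1]))[0]

# 3. Generation: build the prompt that would be sent to the LLM
toy_prompt = f"Answer based only on this context:\n{best_chunk}\n\nQuestion: {toy_question}"
print(toy_prompt)
```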



## Document Loaders and Splitters
[Data Ingestion/Vector Store Preparation Guide](https://python.langchain.com/docs/modules/data_connection/)
<figure>
<img src='https://python.langchain.com/v0.1/assets/images/data_connection-95ff2033a8faa5f3ba41376c0f6dd32a.jpg' height=300/>
    <figcaption>
        Langchain Retrieval Component, from <a href=https://python.langchain.com/docs/modules/data_connection/>Langchain Components</a>
    </figcaption>
</figure>

**Other extremely useful resources**:
* **[Components -> Retrieval -> Document Loaders](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/)**: Use the sidebar to navigate through the different types of document loaders. For all integrations available through langchain, see [Components -> Integrations -> Components](https://python.langchain.com/v0.1/docs/integrations/document_loaders/)
* **[Components -> Retrieval -> Text Splitters](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)**: Use the sidebar to navigate through the different types of text splitters. For all integrations available through langchain, see [Components -> Integrations -> Components](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)

In [None]:
# example pdf links
doc_1 = 'https://registrar.vanderbilt.edu/documents/Undergraduate_School_Catalog_2023-24_UPDATED2.pdf'
doc_2 = 'https://www.tnmd.uscourts.gov/sites/tnmd/files/Pro%20Se%20Nonprisoner%20Handbook.pdf'
doc_3 = 'https://www.uscis.gov/sites/default/files/document/guides/M-654.pdf'

### Example: pdfloader and recursive character splitter

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
# choose loader and document (any of the example PDFs above works; doc_1 is an arbitrary choice)
loader = PyPDFLoader(doc_1)

# Add the kind of text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=250,
)

# use the text splitter to split the document
chunks = loader.load_and_split(text_splitter)

In [None]:
# see how many chunks were made
print(len(chunks))

In [None]:
# inspect a single chunk
chunks[0]

In [None]:
# view first 3 chunks
for chunk_index, chunk in enumerate(chunks[:3]):
    print(f'****** Chunk {chunk_index} ******\n{chunk.page_content}\n')
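
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a minimal fixed-window splitter in plain Python. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch ignores. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-width character windows that share chunk_overlap characters with their neighbor."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return pieces

sample_text = "abcdefghijklmnopqrstuvwxyz"
sample_pieces = sliding_chunks(sample_text, chunk_size=10, chunk_overlap=4)
for piece in sample_pieces:
    print(piece)
```

Larger overlaps reduce the chance that a sentence is cut off from its context, at the cost of storing (and later embedding) more repeated text.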

### Example: Loading website data and splitting

In [None]:
from bs4 import SoupStrainer
from langchain_community.document_loaders import WebBaseLoader

In [None]:
constitution_website = "https://constitutioncenter.org/the-constitution/full-text"

# load using WebBaseLoader
web_loader = WebBaseLoader(constitution_website,
                       bs_kwargs = {'parse_only':SoupStrainer(['article'])})

# read the document from the website (without splitting)
web_document = web_loader.load()

In [None]:
# only the first few characters
print(web_document[0].page_content[:330])

Now, we'll split in a slightly different way. Since we've already scraped the website, we will just directly use the splitter. Note that after we load the website, we have a data type of (list of) `Document`.

In [None]:
# add_start_index=True records each chunk's character offset in its metadata
website_splitter = RecursiveCharacterTextSplitter(chunk_size=330, chunk_overlap=100, add_start_index=True)
website_chunks = website_splitter.split_documents(web_document)
len(website_chunks)

In [None]:
website_chunks[:3]

If you know less about the Constitution and more about Star Wars (or another topic available on Wikipedia), run the cell below to use that text moving forward. It stores its chunks in `web_chunks` rather than `website_chunks`. Later cells read `web_chunks`, so run this cell (or assign your own chunks to `web_chunks`) before continuing. You may need to adjust the `chunk_size` and `chunk_overlap` options.

In [None]:
# alternate data
webloader = WebBaseLoader('https://simple.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope',
                       bs_kwargs = {'parse_only':SoupStrainer('div', id='bodyContent')})
web_chunks = webloader.load_and_split(RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100, add_start_index=True))
print('Number of chunks generated: ', len(web_chunks))
print('\n\nSample: ')
web_chunks[:5]

## Vector Stores: A way to store embeddings (hidden states) of your data
The choice of vector store influences how "relevant" documents can be identified, speed of document retrieval, and organization.

Helpful resources:
* **[Brief Langchain Reference](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/)**
* **[Vector Store Integrations](https://python.langchain.com/v0.1/docs/integrations/vectorstores/)**
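
Vector stores differ in the metric they use to decide which vectors are "close". The sketch below, in plain Python with made-up three-dimensional vectors, compares two common choices. For unit-length vectors the two are directly related (squared Euclidean distance equals 2 minus twice the cosine similarity), so they produce the same ranking. Whether a particular store normalizes vectors or which metric it defaults to is something to check in that store's documentation.

```python
import math

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    # plain dot product; valid as cosine similarity because a and b are unit length
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query_vec = normalize([1.0, 1.0, 0.0])
doc_vecs = {
    "doc_a": normalize([2.0, 2.0, 0.1]),
    "doc_b": normalize([0.0, 1.0, 1.0]),
    "doc_c": normalize([-1.0, 0.0, 1.0]),
}

# higher similarity is better; lower distance is better
by_cosine = sorted(doc_vecs, key=lambda name: cosine_similarity(query_vec, doc_vecs[name]), reverse=True)
by_l2 = sorted(doc_vecs, key=lambda name: l2_distance(query_vec, doc_vecs[name]))
print(by_cosine)
print(by_l2)
```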

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [None]:
# create the vector store
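# embed the chunks with OpenAI embeddings and store them in Chroma (web_chunks assumed, to match the queries below)
db = Chroma.from_documents(web_chunks, OpenAIEmbeddings())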

### Similarity Search for Documents

In [None]:
# query the vector store
query = 'When was a new hope released?'

# use a similarity search between the vectors
db.similarity_search(query)

In [None]:
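# get distance scores alongside results (lower means more similar; the metric depends on how the store is configured)
relevant_docs = db.similarity_search_with_score(query)
relevant_docs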

In [None]:
# another query, but instead use normalized score (0 to 1, higher means more relevant)
query = 'What is the plot of A New Hope?'
relevant_docs = db.similarity_search_with_relevance_scores(query)
relevant_docs

## Retrievers: How we select the most relevant data

In [None]:
# or use the db as a retriever with lcel
retriever = db.as_retriever()
retrieved_docs = retriever.invoke(query)
retrieved_docs

## RAG
Retrieval alone only returns documents. In RAG, the retrieved documents are inserted into the prompt so that the model generates its answer from them. For this part, we switch to a different embedding model, which is downloaded to and run on our own machine (or on the Colab runtime, if you are using Google Colab).

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_core.runnables import RunnableLambda, RunnablePassthrough, RunnableSequence, RunnableParallel

In [None]:
# use different embedding model
embeddings_fn = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2") #, model_kwargs={"device":'mps'})
hf_db = FAISS.from_documents(web_chunks, embeddings_fn)
hf_retriever = hf_db.as_retriever(search_kwargs={"k":1})

# make sure it works
query = 'What is the plot of A New Hope?'
hf_retriever.invoke(query)

### Default RAG: Question Answering

In [None]:
# Basic question answering template
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# compose prompt
rag_prompt = ChatPromptTemplate.from_template(template)

# create model (so we don't have to depend on the model definition at the top of the notebook)
model = ChatOpenAI(model_name='gpt-3.5-turbo')

In [None]:
# We need to format the retrieved documents better
def format_docs(docs):
    return "\n\n".join([f'Reference text:\n{doc.page_content}\nCitation Info: {doc.metadata}' for doc in docs])

In [None]:
# inspect behavior of format_docs
format_docs(web_chunks[:3])

In [None]:
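# compose the chain
rag_chain = (
    ## Add special rag part: retrieve and format the context, and pass the question through unchanged
    {"context": hf_retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)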

In [None]:
# run the chain
rag_chain.with_config(run_name = 'basic_rag_chain').invoke('What is the plot of a new hope')

### RAG with Sources
Resource: [Langchain: Returning Sources](https://python.langchain.com/v0.1/docs/use_cases/question_answering/sources/)

In [None]:
from langchain_core.runnables import RunnableParallel

In [None]:
# Basic prompt -> model -> parser chain
single_turn_chain = (
    rag_prompt
    | model
    | StrOutputParser()
)

# Break previous chain in half to access context and question in returned response
rag_chain_with_source = RunnableParallel(
    {"context": hf_retriever | format_docs, "question": RunnablePassthrough()}
).assign(answer=single_turn_chain)

In [None]:
# invoke
response = rag_chain_with_source.with_config(run_name = 'sources_rag_chain').invoke("What happened to Princess Leia in a New Hope?")

# print full response
for key, value in response.items():
    print(f"{key}: {value}\n")
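
The data flow of a "parallel, then assign" chain can be traced in plain Python. In this sketch the retriever and the answering step are hypothetical stand-ins (`toy_retriever`, `toy_answer`) that return placeholder strings, so no model or vector store is needed. The point is only to show why the final response contains `context`, `question`, and `answer` together.

```python
def toy_retriever(question: str) -> str:
    # stand-in for `hf_retriever | format_docs`
    return f"[chunks retrieved for: {question}]"

def toy_answer(inputs: dict) -> str:
    # stand-in for `rag_prompt | model | StrOutputParser()`
    return f"[answer to '{inputs['question']}' using {inputs['context']}]"

def toy_rag_with_source(question: str) -> dict:
    # step 1: run both branches on the same input (the parallel part)
    result = {"context": toy_retriever(question), "question": question}
    # step 2: add a new key computed from the dict built so far (the assign part)
    result["answer"] = toy_answer(result)
    return result

toy_response = toy_rag_with_source("What happened to Princess Leia?")
for key, value in toy_response.items():
    print(f"{key}: {value}\n")
```

Keeping the intermediate `context` in the output is what lets a user check which passages the answer was based on.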

### RAG with Chat History?

So far, our RAG system is single-turn: each question is answered with no memory of earlier ones. How do we add chat memory? See below for an implementation guide:
- [Use cases: Q&A with RAG: Add Chat History.](https://python.langchain.com/v0.1/docs/use_cases/question_answering/chat_history/) It builds on a RAG system, so it will be of interest.

## LLM System Metrics
Resource: [Guides -> Evaluation](https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/)

In [None]:
from langchain.evaluation import load_evaluator

In [None]:
# configure what we want to evaluate
rs_question = "What happened to Princess Leia in a New Hope?"
rs_answer = rag_chain.with_config(run_name = 'basic_rag_chain').invoke(rs_question)

In [None]:
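# load an evaluator that uses the conciseness criterion (an LLM acts as the judge)
evaluator = load_evaluator("criteria", criteria="conciseness")

# evaluate whether our model was concise or not
eval_result = evaluator.evaluate_strings(
    prediction=rs_answer,  # the RAG chain's answer
    input=rs_question,     # the question that produced it
)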

# print result
eval_result

View other criteria available through LangChain:

In [None]:
from langchain.evaluation import Criteria
list(Criteria)

# Homework
The following exercises are designed to help you gain depth in what you've learned about RAG today.

## [Required] Learning more about RAG
### Splitting Text (Conceptual)
There are so many ways to split the text, and each has an impact on the resultant RAG system. Below is a resource (with sidebar dropdown) for you to read over and then answer the following question for the text splitting approaches (as relevant to your application):
* What is the proposed value in adopting this text splitting approach? What are some drawbacks?

### Splitting Text (Programmatic)
Above, we have adopted specific chunk sizes and splitting approaches. Choose one of the documents (or use your own) and:
* Modify the chunk size. How does this impact the resulting RAG performance? The cost?
* Implement a different type of text splitter (as applicable, i.e., not code text splitters if you're not splitting code). How does this impact the resulting RAG performance? The cost?

### Customizing RAG
There are many, many, many ways to improve results with RAG. Below are some resources for you to read over then complete the following:
1. What is the proposed value in adopting this approach? In other words, what is the expected performance improvement by using this method?
2. How might it apply to your work?

* [**Query Analysis**](https://python.langchain.com/v0.1/docs/use_cases/query_analysis/). Make sure to peruse subtopics.
* [**Synthetic Data Generation**](https://python.langchain.com/v0.1/docs/use_cases/data_generation/).
* [**Tagging**](https://python.langchain.com/v0.1/docs/use_cases/tagging/).
* [**Routing Chain Logic Based on inputs**](https://python.langchain.com/v0.1/docs/expression_language/how_to/routing/)\*\*.
* [**Chain Composition**](https://python.langchain.com/v0.1/docs/modules/chains/). Of particular interest here are the Legacy chains. Although they will probably be completely removed in the future, consider their behavior. In what cases might these behaviors be useful?

\*\* This is highly recommended reading, but may not be suitable for novice programmers. Although there is explanatory text, the code is what concretely demonstrates the concepts. Novices may find it helpful to copy, paste, and run the code to understand the behavior, although such a task may be outside the time constraints of some participants.

## [Required] Learning more about Evaluation
Read the following text and answer these questions:
1. What is the purpose of the individual criterion? Does it require an external LLM for evaluation?
2. In what cases might this criterion be useful?

Depth Text: [**Evaluation, by Langchain**](https://python.langchain.com/v0.1/docs/guides/productionization/evaluation/)

## [Highly recommended] Learning more about the LLM System Lifecycle
There is more to an LLM-based system than a user interface and the LLM chain. There is a whole framework around inspecting, testing, and evaluating these systems. Read the following and answer the questions below:
1. Summarize the purpose of the individual components of the langsmith system (they generalize to all LLM systems).
2. Consider your favorite LLM UI (e.g., ChatGPT, Gemini, Claude, etc.). Describe how you think these components are utilized in the LLM system.

Depth Text: [**LangSmith User Guide**](https://docs.smith.langchain.com/old/user_guide)

## [Recommended] Practicing with RAG and Langchain
### Exercise 1: Modify the RAG system
Modify or create a new chain which:
1. Uses a different LLM than the one used in this notebook.
2. Uses a different document loader
3. Uses a different splitter than the one used in this notebook.
4. Uses a different vector store/retriever than the one used in this notebook.

Use the resources provided in the relevant sections of the notebook for other options.

### Exercise 2: Implement a new RAG system
1. [More challenging] Add chat history to one of your RAG chains. Make sure to enable tracing and inspect langsmith to ensure that the chat history is used.
2. Create a gradio user interface to use your chain in a more user-friendly way.
3. [Challenging] Implement an additional chain which uses one of the strategies you read about in the "Learning more about RAG" section.




