# Lab 4: Build a RAG Application with LangChain, Part 3

In this lab we continue to build on [Lab 2](./2_rag.ipynb) and [Lab 3](./3_rag-with-chunking.ipynb).

Learning Objectives

* Learn how to use Azure AI Search as a vector store
* Learn how to add citations to the response

### Step 1: Set up what we covered in Lab 2 and Lab 3

Run the following to get ready for this lesson:

In [1]:
import os
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pandas as pd

load_dotenv()

llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_deployment=os.getenv("AZURE_OPENAI_MODEL_DEPLOYMENT_NAME"),
)

embeddings = AzureOpenAIEmbeddings()

parser = StrOutputParser()

prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", """You are a helpful assistant that is very brief but polite in your answers. Answer questions in less than 50 words.
            Answer the question based on the context below. If you can't 
            answer the question, reply "I don't know".

            Context: {context}
         """),
        ("human", "{question}")
    ],
)

DATASET_NAME = "./prep/output/master.json"
transcripts_dataset = pd.read_json(DATASET_NAME)

loader = DataFrameLoader(transcripts_dataset, page_content_column="text")
transcripts = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = text_splitter.split_documents(transcripts)
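
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` is implemented (the real splitter also tries to break on separators such as paragraphs and spaces), and the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: the real splitter also prefers to
    break on separators such as newlines and spaces.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is the role `chunk_overlap=100` plays in the cell above: a sentence cut at a chunk boundary still appears whole in at least one chunk.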

See how many documents we are working with in this lab:

In [2]:
len(documents)

723

### Step 2: Use Azure AI Search as the vector store

**Please change the index name to include your initials or name so the index won't contain duplicate entries from another user.**

Run the cells below to install the Azure SDK packages and configure the client used to interact with Azure AI Search:

In [5]:
!pip3 install azure-storage-blob azure-identity

Collecting azure-storage-blob
  Using cached azure_storage_blob-12.19.1-py3-none-any.whl.metadata (26 kB)
Collecting azure-identity
  Using cached azure_identity-1.16.0-py3-none-any.whl.metadata (76 kB)
Collecting azure-core<2.0.0,>=1.28.0 (from azure-storage-blob)
  Using cached azure_core-1.30.1-py3-none-any.whl.metadata (37 kB)
Collecting cryptography>=2.1.4 (from azure-storage-blob)
  Using cached cryptography-42.0.5-cp39-abi3-macosx_10_12_universal2.whl.metadata (5.3 kB)
Collecting isodate>=0.6.1 (from azure-storage-blob)
  Using cached isodate-0.6.1-py2.py3-none-any.whl.metadata (9.6 kB)
Collecting msal>=1.24.0 (from azure-identity)
  Using cached msal-1.28.0-py3-none-any.whl.metadata (11 kB)
Collecting msal-extensions>=0.3.0 (from azure-identity)
  Using cached msal_extensions-1.1.0-py3-none-any.whl.metadata (7.7 kB)
Collecting cffi>=1.12 (from cryptography>=2.1.4->azure-storage-blob)
  Using cached cffi-1.16.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (1.5 kB)
Collecting PyJWT

In [7]:
!pip3 install azure-search-documents

Collecting azure-search-documents
  Using cached azure_search_documents-11.4.0-py3-none-any.whl.metadata (22 kB)
Collecting azure-common~=1.1 (from azure-search-documents)
  Using cached azure_common-1.1.28-py2.py3-none-any.whl.metadata (5.0 kB)
Using cached azure_search_documents-11.4.0-py3-none-any.whl (283 kB)
Using cached azure_common-1.1.28-py2.py3-none-any.whl (14 kB)
Installing collected packages: azure-common, azure-search-documents
Successfully installed azure-common-1.1.28 azure-search-documents-11.4.0


In [9]:
from langchain_community.vectorstores.azuresearch import AzureSearch

vectorstore_address = os.getenv("AZURE_SEARCH_ENDPOINT")
vectorstore_password = os.getenv("AZURE_SEARCH_KEY")

# Add your name to the index name to avoid conflicts with other users
index_name = "bos-gab-index-rakesh"
vectorstore = AzureSearch(
    azure_search_endpoint=vectorstore_address,
    azure_search_key=vectorstore_password,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)
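
If the client fails to connect, a common cause is a missing entry in the `.env` file. The sketch below checks the two settings read in the cell above and builds a per-user index name. Both helper names (`missing_settings`, `personal_index_name`) are hypothetical. The clean-up rule assumes index names may only contain lowercase letters, digits, and dashes, which is my understanding of the Azure AI Search naming rules rather than something this lab states.

```python
import re

# Setting names taken from the cell above.
REQUIRED_SETTINGS = ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_KEY")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    return [name for name in required if not env.get(name)]


def personal_index_name(base: str, user: str) -> str:
    """Append a cleaned-up user suffix to a base index name.

    Assumption: only lowercase letters, digits, and dashes are allowed.
    """
    suffix = re.sub(r"[^a-z0-9]+", "-", user.lower()).strip("-")
    if not suffix:
        raise ValueError("user must contain at least one letter or digit")
    return f"{base}-{suffix}"


# Example with a plain dict standing in for os.environ.
example_env = {"AZURE_SEARCH_ENDPOINT": "https://example.search.windows.net"}
print(missing_settings(example_env))
print(personal_index_name("bos-gab-index", "Rakesh L."))
```

In the notebook you would pass `os.environ` to `missing_settings` after `load_dotenv()` has run.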

Now let's add the documents loaded in Step 1 into the AzureSearch vector store.

> NOTE: This will take a few minutes depending on the network speed.

In [10]:
vectorstore.add_documents(documents=documents)

['YjQzOTNiMjMtMDlmYi00MGI0LWEzMDktZmRkN2I0NGM3NjI0',
 'NTYyOWFjZTktMGI1MC00OTIzLTg1OWYtMzI0OWI2YzY0MTg3',
 'NThmNjZhMzAtMDgxMS00ZGQ2LTkyYWMtNzJmMzZkZTk3MDFm',
 'MjE2YWFmZDgtY2I0NS00YjQzLTg3ZDUtMDU5ZTg0Yzg3MTBh',
 'YmFhNTAwZjktMTQwYS00NjJmLTk1OWUtNmFjNzY0NWFmMDRi',
 'YjA5ZDAyYmYtNmI4Yy00Y2UxLWE2ZTUtMTZhZmYwZGMyMDU2',
 'OTk2MGQyNmUtNzM4MC00M2ViLTlhNmYtY2FkOTdiNDZlOGY2',
 'N2IyNGIyYmMtNmExNi00YmQ4LTgxYTYtOWI4Mzk0ODZlMjVm',
 'M2I4YWEwZWEtYWM2OS00YzAwLWE4NWMtMWQ4MWNlY2Y5NjBi',
 'ZTFlMjRkODItODUxMS00MzhkLTgzMmQtZGM4MDNlYWYwZDJl',
 'YmVlNTVjMzctMDA5Yy00NzNmLTg4MTUtMTE0NTFkODJiNTJl',
 'MTc2NWQ4ZTEtMjU3NC00MThlLWE2OTgtZDk5ZjZhMzcyMzBj',
 'NGY0YjUyNzktYmE0Yy00M2MwLTlkNjQtNDAwOWIwOWU2ZTk4',
 'ZmU4MTMwM2EtZjgxYi00NmM1LTk1MzAtYTA0ZGVhOGUzMzI0',
 'ZmQ1MWM3ZWMtMDE4MC00MTAxLWE2ZDctNjQ2MWM5ZThmYjc1',
 'MjM3MmE0OTQtOGRkMi00NzI5LThjOTItNjU4ODM5ZDAxMWUz',
 'N2RhNmZlMTAtNjc2NS00MjZhLWJlNDUtYjFjN2Y0MTZiN2Nk',
 'YmYzOTI4MDQtYmZlMS00Zjk3LWFhMjAtYzM5ZmIzYTdkYjIw',
 'NTY0MmMxYjMtOGY3Yi00Mzk3LTg0ZTgtNzEzNDMxY2Ri
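
Because the upload can take a while, one option is to send the documents in smaller batches so you can see progress. This is only a sketch: the `batched` helper is hypothetical, the toy list stands in for `documents`, and the batch size of 3 is arbitrary.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy stand-in for the `documents` list from Step 1.
toy_documents = [f"chunk-{i}" for i in range(7)]

for number, batch in enumerate(batched(toy_documents, batch_size=3), start=1):
    # In the lab, this is where a call such as
    # vectorstore.add_documents(documents=batch) would go.
    print(f"batch {number}: {len(batch)} documents")
```

If you already ran the cell above, don't upload again: the IDs in the output look freshly generated, so a second upload would likely add duplicate entries to your index.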

Now that there are documents to use, try a similarity search to verify things are working:

In [11]:
docs = vectorstore.similarity_search(
    query="What is langchain?",
    k=3,
    search_type="similarity",
)
docs

[Document(page_content="show that so you know there's the general architecture of the code and uh now our code is using you know python for the back end and it's just using the open AI SDK with the Azure you know document search SDK it's not using any fancy llm orchestration libraries and people always ask like why aren't you using Lang chain why aren't you using this why aren't you using that well part of the reason I'm not using any of those is because everybody has a different favorite library for Python and I'd have to pick one of them and I and then that would like alienate other people right so there are many popular rag orchestration libraries and you could totally use those as well you know whichever one works for you uh so the most popular ones in the open source world are Lang chain and llama index and both of them are available in multiple languages and then from Microsoft we've got Mantic kernel which is probably the most similar to linkchain and which is that's in Python a

OK, it seems to be working. Now let's move toward the functionality we tested in Lab 3.
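
For intuition about what a similarity search does, here is a brute-force sketch that ranks a few toy vectors against a query vector. The three-dimensional vectors and document names are made up, and cosine similarity is an assumption: the metric actually used depends on how the index is configured. Azure AI Search does this at scale with its own index structures, not with a loop like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k vectors most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Made-up embeddings; real ones have hundreds or thousands of dimensions.
toy_vectors = {
    "langchain-intro": [0.9, 0.1, 0.0],
    "go-language": [0.1, 0.9, 0.0],
    "aks-deep-dive": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, toy_vectors, k=2))
# ['langchain-intro', 'go-language']
```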

Create a retriever to use later:

In [12]:
retriever = vectorstore.as_retriever()

Test to see if it is working:

In [13]:
unique_docs = retriever.get_relevant_documents(query="What is langchain?")
unique_docs

[Document(page_content="at the code it will make a lot of sense so what is golang or i should say go not golang so it's an open source programming language it was it was found in 2009 i think i should know but i believe it's 2009 and let me give you guys a full screen version because you guys have been seeing the non-full screen version um so it was built by the the founders of the the c language and b language as well and the founders of of the language want this to be a super simple uh you know no nonsense slim language that just works because they were working a lot with the existing languages for the back end and they realized that they needed something better so it's funded by it's backed by google the team that built it and is working on it is is in google is google team as well so what's the idea behind it well it's a journal per general purpose language so it can be used to build all sorts of applications it's it is compiled into your native code so when we compile it on window

Now, just like at the end of Lab 3, try it out. Just remember the vector store is not running in-memory this time, but in Azure.

In [14]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | parser
)

chain.invoke("What is LangChain?")

"LangChain is a popular orchestration library used for generative AI applications, including RAG (Retrieval-Augmented-Generation). It's available in multiple languages and is commonly used in the open-source world."

### Step 3: Add citations for identifying the source

One of the important parts of a RAG application is to include citations - this helps verify the LLM is not making something up.

> NOTE: Citations do not guarantee the LLM didn't make up the response, but they let you check the answer against its sources, which helps you judge how much confidence to place in it.

In order to have the response include citations we need to do a couple of things:
1. Modify the document listing used as the context to include the title of the video
1. Modify the prompt with some instructions for how to use the title and how we want it returned

The first thing we need is a utility function we'll use to modify the string returned from the retriever so the format will look like: 
```code
"<video title>:<transcript text>"
```

Run the following to declare the utility function:

In [15]:
def format_docs(docs):
    return "\n\n".join([f"{d.metadata['title']}:{d.page_content}" for d in docs])
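
To see what the formatted context looks like without calling the retriever, you can run the function on stand-in objects. The `SimpleNamespace` objects below are only placeholders with the same two attributes that `format_docs` reads, and the titles and text are made up.

```python
from types import SimpleNamespace


def format_docs(docs):
    # Same function as in the cell above, repeated so this sketch runs on its own.
    return "\n\n".join([f"{d.metadata['title']}:{d.page_content}" for d in docs])


# Placeholder documents with the attributes format_docs uses.
fake_docs = [
    SimpleNamespace(metadata={"title": "Video A"}, page_content="first chunk"),
    SimpleNamespace(metadata={"title": "Video B"}, page_content="second chunk"),
]

print(format_docs(fake_docs))
# Video A:first chunk
#
# Video B:second chunk
```

This relies on every document having a `title` entry in its metadata, which assumes the dataset loaded in Step 1 has a `title` column. A document without one would raise a `KeyError`.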

Next, we need to modify the prompt template to let the LLM know how the context is formatted and how to use the title and content:

In [17]:
prompt_template_with_citations = ChatPromptTemplate.from_messages(
    [
        ("system", """Assistant helps people with their questions about the content of video transcripts. Be brief in your answers.
        Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. 
        Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
        Each source has a title followed by colon and the actual information, always include the source title for each fact you use in the response. 
         Use square brackets to reference the source, for example [Video title here]. Don't combine sources, list each source separately, for example [Video 1][Video 2].
            Context: {context}
         """),
        ("human", "{question}")
    ],
)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_template_with_citations
    | llm
    | parser
)

Now give it a try:

In [18]:
chain.invoke("What is langchain?")

'[Building a RAG app to chat with your data] Langchain is an open-source rag orchestration library available in multiple languages, including Python. It is one of the popular libraries for building RAG applications.'

In [19]:
chain.invoke("What is AKS?")

'Azure Kubernetes Service (AKS) is a managed Kubernetes service offered by Microsoft that simplifies the deployment of a managed Kubernetes cluster in Azure by offloading the operational overhead to Azure. It allows users to manage their AKS environment in Azure and run Kubernetes workloads, including quick demos. Vaibhav Gujral covers AKS in a deep dive session in the recording. [Running Kubernetes in Azure using AKS]'
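
Since the citations come back as plain text in square brackets, you can pull them out and compare them with the titles that were actually retrieved. This is a minimal sketch, not part of the lab's chain: the helper names are hypothetical, and the simple pattern assumes titles do not themselves contain square brackets.

```python
import re


def extract_citations(answer: str) -> list[str]:
    """Return the text of every [bracketed] citation in the answer."""
    return re.findall(r"\[([^\[\]]+)\]", answer)


def unknown_citations(answer: str, retrieved_titles) -> list[str]:
    """Return cited titles that were not among the retrieved sources."""
    known = set(retrieved_titles)
    return [title for title in extract_citations(answer) if title not in known]


answer = "AKS is a managed Kubernetes service. [Running Kubernetes in Azure using AKS]"
retrieved = ["Running Kubernetes in Azure using AKS"]

print(extract_citations(answer))
# ['Running Kubernetes in Azure using AKS']
print(unknown_citations(answer, retrieved))
# []
print(unknown_citations("Go was created in 2009. [Some other video]", retrieved))
# ['Some other video']
```

An empty result only means every cited title was retrieved. It does not confirm that the answer reflects what those sources say.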

### References

Here are some good resources to learn more about using Azure AI Search with LangChain:
* [Azure Cognitive Search and LangChain: A Seamless Integration for Enhanced Vector Search Capabilities](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-cognitive-search-and-langchain-a-seamless-integration-for/ba-p/3901448)
* LangChain Docs [Azure AI Search](https://python.langchain.com/docs/integrations/vectorstores/azuresearch/)
* [Azure AI Search client library for Python - version 11.4.0](https://learn.microsoft.com/en-us/python/api/overview/azure/search-documents-readme?view=azure-python)

## [Go To Next Lab](./5_rag-final.ipynb.ipynb)