# Retrieval Augmented Generation with LangChain

In [25]:
import sys
import os

# Use current working directory and go one level up
parent_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(parent_dir)

# Now you can import your config
from config import api_key

from openai import OpenAI

## Chapter 1 - Building RAG applications with LangChain

### Section 1.1 - Loading documents for RAG with LangChain

#### Loading PDF files for RAG
To begin implementing Retrieval Augmented Generation (RAG), you'll first need to load the documents that the model will access. These documents can come from a variety of sources, and LangChain supports document loaders for many of them.

In this exercise, you'll use a document loader to load a PDF document containing the paper, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Lewis et al. (2021). This file is available for you as `'rag_paper.pdf'`.

Note: `pypdf`, a dependency for loading PDF documents in LangChain, has already been installed for you.

In [5]:
# Import library
from langchain_community.document_loaders import PyPDFLoader

# Create a document loader for rag_paper.pdf
loader = PyPDFLoader('./data/rag-paper.pdf')

# Load the document
data = loader.load()
print(data[0])

page_content='Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
†Facebook AI Research; ‡University College London; ⋆New York University;
plewis@fb.com
Abstract
Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-
stream NLP tasks. However, their ability to access and precisely manipulate knowl-
edge is still limited, and hence on knowledge-intensive tasks, their performance
lags behind task-speciﬁc architectures. Additionally, providing provenance for their
decisions and updating their world knowledge remain open research problems. Pre-
trained models with a differentiable access mechanism to explicit non-parametric
memory have so far been only investigated for extractiv

#### Loading HTML files for RAG

It's possible to load documents from many different formats, including complex formats like HTML.

If you're not familiar with HTML, it's a markup language for creating web pages.

In this exercise, you'll load an HTML file containing a DataCamp blog post. The necessary classes have already been imported for you.
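
To make the idea of "loading HTML" concrete, here is a minimal standard-library sketch of what text extraction from markup involves. The HTML snippet and the `TextExtractor` class are invented for illustration; `UnstructuredHTMLLoader` relies on the `unstructured` package and is considerably more sophisticated than this.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet, invented for illustration
html_doc = """
<html>
  <head><title>Learn Python</title></head>
  <body>
    <h1>How to Learn Python</h1>
    <p>Python is a <b>high-level</b> language.</p>
  </body>
</html>
"""


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags; skip pure whitespace
        stripped = data.strip()
        if stripped:
            self.parts.append(stripped)


extractor = TextExtractor()
extractor.feed(html_doc)
extracted_text = " ".join(extractor.parts)
print(extracted_text)
```

The tags disappear and only the readable text remains, which is the part a RAG pipeline goes on to split and embed.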

In [19]:
# Import library
from langchain_community.document_loaders import UnstructuredHTMLLoader

# Create a document loader for unstructured HTML
loader = UnstructuredHTMLLoader('./data/datacamp-blog.html')

# Load the document
data = loader.load()

# Print the first document's content
print(data[0].page_content)

# Print the first document's metadata
print(data[0].metadata)

Skip to main content

HomeBlogPython

How to Learn Python From Scratch in 2024: An Expert Guide

Discover how to learn Python, its applications, and the demand for Python skills. Start your Python journey today ​​with our comprehensive guide.

Updated Jul 2024 · 19 min read

Share

As one of the most popular programming languages out there, many people want to learn Python. But how do you go about getting started? In this guide, we explore everything you need to know to begin your learning journey, including a step-by-step guide and learning plan and some of the most useful resources to help you succeed.

What is Python?

Python is a high-level, interpreted programming language created by Guido van Rossum and first released in 1991. It is designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java.

Python supports multiple programming paradigms, including procedural,

### Section 1.2 - Text splitting, embedding and vector storage

In [21]:
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

#### Getting started with text splitting

Time to start splitting! You've been provided with a statement about RAG stored in the string variable `text`. Your job is to split this string on occurrences of the `'.'` character. Take a look at the splitting results to see how this strategy performed.

In [22]:
text = '''RAG (retrieval augmented generation) is an advanced NLP model that combines retrieval mechanisms with generative capabilities. RAG aims to improve the accuracy and relevance of its outputs by grounding responses in precise, contextually appropriate data.'''

# Define a text splitter that splits on the '.' character
text_splitter = CharacterTextSplitter(
    separator=".",
    chunk_size=75,  
    chunk_overlap=10  
)

# Split the text using text_splitter
chunks = text_splitter.split_text(text)
print(chunks)
print([len(chunk) for chunk in chunks])

Created a chunk of size 125, which is longer than the specified 75


['RAG (retrieval augmented generation) is an advanced NLP model that combines retrieval mechanisms with generative capabilities', 'RAG aims to improve the accuracy and relevance of its outputs by grounding responses in precise, contextually appropriate data']
[125, 126]
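
Both chunks are longer than `chunk_size=75`, which is what the warning reports. A likely explanation is that a single-separator splitter only cuts at the separator and never inside a piece. The plain-Python sketch below imitates that behaviour; it is an approximation for illustration, not LangChain's implementation.

```python
text = (
    "RAG (retrieval augmented generation) is an advanced NLP model that combines "
    "retrieval mechanisms with generative capabilities. RAG aims to improve the "
    "accuracy and relevance of its outputs by grounding responses in precise, "
    "contextually appropriate data."
)
chunk_size = 75

# Cut only at the separator, then drop empty pieces and surrounding whitespace
pieces = [piece.strip() for piece in text.split(".") if piece.strip()]

for piece in pieces:
    status = "exceeds" if len(piece) > chunk_size else "fits within"
    print(f"{len(piece)} characters, {status} chunk_size={chunk_size}")
```

Because neither sentence contains another `'.'`, there is nowhere else to cut, so the size limit cannot be honoured.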


#### Recursively splitting documents

Splitting on a single character is simple and predictable, but it often produces sub-optimal chunks. In this exercise, you'll apply recursive character splitting to split the Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks paper you loaded in an earlier exercise.

Recall that recursive character splitting iterates over a list of characters, splitting on each in turn to see if chunks can be created beneath the `chunk_size` limit.
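
The recursion described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the `recursive_split` function and the sample text are hypothetical, separators are only kept where pieces are merged back together, and chunk overlap is not implemented. LangChain's own splitter handles these details differently.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into pieces of at most chunk_size characters.

    Separators are tried in order; a piece that is still too long is split
    again with the remaining separators. Small neighbouring pieces are merged
    greedily so chunks stay as close to chunk_size as possible.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, remaining = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with finer separators
            chunks.extend(recursive_split(piece, remaining, chunk_size))
            current = ""
    if current:
        chunks.append(current)
    return [chunk for chunk in chunks if chunk.strip()]


# Hypothetical sample text, invented for illustration
sample = "RAG retrieves documents.\nIt then generates an answer grounded in them. Short line."
chunks_demo = recursive_split(sample, ["\n", ".", " ", ""], chunk_size=40)
for chunk in chunks_demo:
    print(len(chunk), repr(chunk))
```

The first line fits as a whole, while the longer second line has to fall through to the `'.'` and `' '` separators before every piece fits under the limit.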


In [45]:
loader = PyPDFLoader("./data/rag-paper.pdf")
document = loader.load()

# Define a text splitter that splits recursively through the character list
text_splitter = RecursiveCharacterTextSplitter(
    separators=['\n', '.', ' ', ''],
    chunk_size=1000,  
    chunk_overlap=100  
)

# Split the document using text_splitter
chunks = text_splitter.split_documents(document)
print(chunks)
print([len(chunk.page_content) for chunk in chunks])
print(len(chunks))

[Document(metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2021-04-13T00:48:38+00:00', 'author': '', 'keywords': '', 'moddate': '2021-04-13T00:48:38+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': './data/rag-paper.pdf', 'total_pages': 19, 'page': 0, 'page_label': '1'}, page_content='Retrieval-Augmented Generation for\nKnowledge-Intensive NLP Tasks\nPatrick Lewis†‡, Ethan Perez⋆,\nAleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,\nMike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†\n†Facebook AI Research; ‡University College London; ⋆New York University;\nplewis@fb.com\nAbstract\nLarge pre-trained language models have been shown to store factual knowledge\nin their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-\nstream NLP tasks

In [57]:
# vector_store.search("What is BART?", search_type="similarity")

In [63]:
vector_store.get(limit=1)['documents']

['Retrieval-Augmented Generation for\nKnowledge-Intensive NLP Tasks\nPatrick Lewis†‡, Ethan Perez⋆,\nAleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,\nMike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†\n†Facebook AI Research; ‡University College London; ⋆New York University;\nplewis@fb.com\nAbstract\nLarge pre-trained language models have been shown to store factual knowledge\nin their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-\nstream NLP tasks. However, their ability to access and precisely manipulate knowl-\nedge is still limited, and hence on knowledge-intensive tasks, their performance\nlags behind task-speciﬁc architectures. Additionally, providing provenance for their\ndecisions and updating their world knowledge remain open research problems. Pre-\ntrained models with a differentiable access mechanism to explicit non-parametric']

#### Embedding and storing documents
The final step for preparing the documents for retrieval is embedding and storing them. You'll be using the text-embedding-3-small model from OpenAI for embedding the chunked documents, and storing them in a local Chroma vector database.

The `chunks` you created from splitting the Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks paper recursively have been pre-loaded.

Creating and using an OpenAI API key is not required in this exercise. You can leave the `<OPENAI_API_TOKEN>` placeholder, which will send valid requests to the OpenAI API.

In [48]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize the OpenAI embedding model
embedding_model = OpenAIEmbeddings(
    api_key=api_key, 
    model='text-embedding-3-small')

# Create a Chroma vector store and embed the chunks
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chromadb/"
)
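
To build intuition for what happens when the store is queried, here is a toy similarity search over hand-written vectors. Everything in it is an assumption made for illustration: the three-dimensional "embeddings" are invented, real embedding models return far longer vectors, and cosine similarity is only one common measure (a given vector store may be configured to use a different one).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", hand-written for illustration
store = {
    "RAG combines retrieval with generation.": [0.9, 0.1, 0.0],
    "BART is a sequence-to-sequence model.": [0.1, 0.9, 0.1],
    "Wikipedia is used as the knowledge source.": [0.2, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of a question about RAG

# Rank stored texts by similarity to the query, most similar first
ranked = sorted(
    store,
    key=lambda text: cosine_similarity(store[text], query_vector),
    reverse=True,
)
print(ranked[0])
```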

### Section 1.3 - Building an LCEL retrieval chain

#### Creating the retrieval prompt

A key piece of any RAG implementation is the retrieval prompt. In this exercise, you'll create a chat prompt template for your retrieval chain and test that the LLM is able to respond using only the context provided.

An `llm` has already been defined for you to use.

In [49]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key, temperature=0)

In [50]:
prompt = """
Use only the context provided to answer the following question. If you don't know the answer, reply that you are unsure.
Context: {context}
Question: {question}
"""

# Convert the string into a chat prompt template
prompt_template = ChatPromptTemplate.from_template(prompt)

# Create an LCEL chain to test the prompt
chain = prompt_template | llm

# Invoke the chain on the inputs provided
print(chain.invoke({"context": "DataCamp's RAG course was created by Meri Nova and James Chapman!", "question": "Who created DataCamp's RAG course?"}))

content="DataCamp's RAG course was created by Meri Nova and James Chapman." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 62, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_f7d56a8a2c', 'id': 'chatcmpl-BOniahrG6RGhAckGmFmVQTyGkborm', 'finish_reason': 'stop', 'logprobs': None} id='run-e2e887e5-364c-4bfb-8161-fe9e6e5e3a89-0' usage_metadata={'input_tokens': 62, 'output_tokens': 17, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}


#### Building the retrieval chain
Now for the finale of the chapter! You'll create a retrieval chain using LangChain's Expression Language (LCEL). This will combine the vector store containing your embedded document chunks from the RAG paper you loaded earlier, a prompt template, and an LLM so you can begin talking to your documents.

The `prompt_template` you created in the previous exercise is available for you to use.

The `vector_store` of embedded document chunks that you created previously has also been loaded for you, along with all of the libraries and classes required.

In [69]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Convert the vector store into a retriever
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k":2})

# Create the LCEL retrieval chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

# Invoke the chain
#print(chain.invoke("Who are the authors?"))
#print(chain.invoke("What is BART?"))
print(chain.invoke("What is the broader impact"))

The broader impact of the work discussed includes several positive societal benefits, such as being more strongly grounded in factual knowledge (specifically from Wikipedia), which reduces the likelihood of generating false information ("hallucinations") and offers greater control and interpretability. It can be applied in various scenarios that directly benefit society, such as in medical contexts or enhancing job effectiveness. However, there are potential downsides, including the inherent biases and inaccuracies of Wikipedia and the risk of generating misleading or abusive content, similar to concerns associated with other language models like GPT-2.
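
The data flow through the first two steps of the chain can be mimicked with ordinary functions. This is a conceptual sketch only: `fake_retriever`, `passthrough`, `build_prompt` and `run_chain` are hypothetical stand-ins, and no LLM is called.

```python
def fake_retriever(question):
    """Pretend retriever: returns a fixed passage regardless of the question."""
    return ["RAG grounds its answers in retrieved Wikipedia passages."]


def passthrough(value):
    """Mirrors the role of RunnablePassthrough: return the input unchanged."""
    return value


def build_prompt(inputs):
    """Fill the prompt placeholders from the dictionary produced in step 1."""
    return (
        "Use only the context provided to answer the following question.\n"
        f"Context: {inputs['context']}\n"
        f"Question: {inputs['question']}\n"
    )


def run_chain(question):
    # Step 1: the same input is sent to every branch of the dictionary
    inputs = {"context": fake_retriever(question), "question": passthrough(question)}
    # Step 2: the prompt is built from that dictionary
    # (the LLM call and output parsing would follow here)
    return build_prompt(inputs)


prompt_text = run_chain("What is the broader impact?")
print(prompt_text)
```

Note that the retrieved passages land in the prompt as a Python list; in a real chain it is common to join the documents' text into a single string first.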


## Chapter 2 - Improving the RAG Architecture

In [67]:
from langchain_community.document_loaders import UnstructuredMarkdownLoader

#### Loading code files
RAG applications can draw on more than plain text: they can also load Markdown files (`.md`) and code files such as Python (`.py`). In this exercise, you'll load a Markdown README and then a Python file containing the RAG architecture you created in Chapter 1. Let's load the files to get a reminder!

All of the classes needed to complete this exercise are already loaded.

In [77]:
# Create a document loader for README.md and load it
loader = UnstructuredMarkdownLoader('./data/README.md')

markdown_data = loader.load()
print(markdown_data[0])

page_content='🦜️🔗 LangChain

⚡ Build context-aware reasoning applications ⚡

Release Notes

CI

PyPI - License

PyPI - Downloads

GitHub star chart

Open Issues

Open in Dev Containers

Open in GitHub Codespaces

Twitter

Looking for the JS/TS library? Check out LangChain.js.

To help you ship LangChain apps to production faster, check out LangSmith. LangSmith is a unified developer platform for building, testing, and monitoring LLM applications. Fill out this form to speak with our sales team.

Quick Install

With pip: bash pip install langchain

With conda: bash conda install langchain -c conda-forge

🤔 What is LangChain?

LangChain is a framework for developing applications powered by large language models (LLMs).

For these applications, LangChain simplifies the entire application lifecycle:

Open-source libraries: Build your applications using LangChain's open-source building blocks, components, and third-party integrations. Use LangGraph to build stateful agents with first-class 

In [71]:
from langchain_community.document_loaders import PythonLoader

In [72]:
# Create a document loader for rag.py and load it
loader = PythonLoader('rag.py')

python_data = loader.load()
print(python_data[0])

page_content='__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import shutil
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

openai_api_key = os.environ["OPENAI_API_KEY"]

loader = PyPDFLoader("rag_paper.pdf")
documents = loader.load()
# Split the documents into manageable chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_documents = text_splitter.split_documents(documents)

# In

#### Splitting Python files

Although text and code files contain the same characters, code files contain structures beyond natural language. To retain this code-specific context during document splitting, you should program the splitter to first try to split on the most common code structure. Fortunately, LangChain provides functionality to do just that!

All of the necessary classes have been imported for you, including `Language` from `langchain_text_splitters`.
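
The idea of splitting on code structure first can be illustrated with a regular expression that cuts only before top-level `def` and `class` lines. This is a rough sketch with an invented source string; the Python-aware splitter in LangChain uses its own separator list and also falls back to generic separators when a block is still too long.

```python
import re

# Hypothetical source file, invented for illustration
source = '''import os

def load(path):
    return open(path).read()

class Index:
    def add(self, doc):
        self.docs.append(doc)

def main():
    print(load("notes.txt"))
'''

# Cut immediately before every top-level "def " or "class " line,
# so each definition stays in one piece (indented methods are not matched)
blocks = [
    block.strip()
    for block in re.split(r"\n(?=def |class )", source)
    if block.strip()
]

for number, block in enumerate(blocks, start=1):
    print(f"Block {number}:\n{block}\n")
```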

In [75]:
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# Create a Python-aware recursive character splitter
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=300, chunk_overlap=100
)

# Split the Python content into chunks
chunks = python_splitter.split_documents(python_data)

for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n")

Chunk 1:
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI
from langchain_huggingface import HuggingFaceEmbeddings

Chunk 2:
from langchain_openai import ChatOpenAI
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

Chunk 3:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import shutil
import getpass
import os



### Section 2.2 - Advanced splitting method

#### Splitting by tokens

Splitting documents with `RecursiveCharacterTextSplitter` or `CharacterTextSplitter` is convenient and can perform well in some cases, but both share one drawback: they split using characters as the base unit rather than tokens, which are what the model actually processes.

In this exercise, you'll split documents using a token text splitter, so you can verify the number of tokens in each chunk to ensure that they don't exceed the model's context window. A PDF document has been loaded as `document`.

`tiktoken` and all necessary classes have been imported for you.
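
The windowing logic behind token-based splitting can be sketched without any tokenizer library. The assumption here is loud: whitespace-separated words stand in for tokens, whereas `tiktoken` encodings use byte-pair encoding and produce different boundaries. `split_by_tokens` is a hypothetical helper, not part of LangChain.

```python
def split_by_tokens(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the sequence
    return chunks


# Assumption: words stand in for tokens in this sketch
tokens = "Large pre-trained language models store factual knowledge in their parameters".split()
token_chunks = split_by_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in token_chunks:
    print(len(chunk), chunk)
```

Each chunk holds at most `chunk_size` tokens, and the last token of one chunk reappears as the first token of the next, which is what `chunk_overlap=1` means.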

In [81]:
import tiktoken
from langchain_text_splitters import TokenTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# Create a document loader for rag_paper.pdf
loader = PyPDFLoader('./data/rag-paper.pdf')

# Load the document
document = loader.load()
# print(document[0])

# Get the encoding for gpt-4o-mini
encoding = tiktoken.encoding_for_model('gpt-4o-mini')

# Create a token text splitter
token_splitter = TokenTextSplitter(encoding_name=encoding.name, chunk_size=100, chunk_overlap=10)

# Split the PDF into chunks
chunks = token_splitter.split_documents(document)

for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\nNo. tokens: {len(encoding.encode(chunk.page_content))}\n{chunk}\n")

Chunk 1:
No. tokens: 100
page_content='Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
†Facebook AI Research; ‡University College London;' metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2021-04-13T00:48:38+00:00', 'author': '', 'keywords': '', 'moddate': '2021-04-13T00:48:38+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': './data/rag-paper.pdf', 'total_pages': 19, 'page': 0, 'page_label': '1'}

Chunk 2:
No. tokens: 100
page_content='Facebook AI Research; ‡University College London; ⋆New York University;
plewis@fb.com
Abstract
Large pre-trained language models have been shown to store factual knowledge
in 

#### Splitting semantically

All of the splitting strategies you've used up to this point have the same drawback: the split doesn't consider the context of the surrounding text, so context can easily be lost during splitting.

In this exercise, you'll create and apply a semantic text splitter, which is a cutting-edge experimental method for splitting text based on semantic meaning. When the splitter detects that the meaning of the text has deviated past a certain threshold, a split will be performed.
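
The threshold idea can be shown with a toy example: embed each sentence, measure the distance between neighbours, and start a new chunk wherever the distance jumps. The sentences, the two-dimensional vectors, and the fixed `threshold` are all invented for illustration; `SemanticChunker` derives its threshold from the distribution of distances, and its `"gradient"` mode is more involved than this.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity: 0 means same direction, larger means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Hypothetical sentences with hand-written 2-dimensional "embeddings"
sentences = [
    ("RAG retrieves relevant passages.", [1.0, 0.1]),
    ("The passages are passed to a generator.", [0.9, 0.2]),
    ("Wikipedia can contain biased content.", [0.1, 1.0]),
    ("Such bias may surface in generated text.", [0.2, 0.9]),
]
threshold = 0.3  # assumed fixed cut-off, chosen by hand for this example

semantic_chunks, current = [], [sentences[0][0]]
for (_, previous_vec), (text, vec) in zip(sentences, sentences[1:]):
    if cosine_distance(previous_vec, vec) > threshold:
        # Meaning shifted past the threshold: close the current chunk
        semantic_chunks.append(" ".join(current))
        current = []
    current.append(text)
semantic_chunks.append(" ".join(current))

for chunk in semantic_chunks:
    print(chunk)
```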

In [88]:
from langchain_openai import OpenAIEmbeddings
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.document_loaders import PyPDFLoader

# Create a document loader for rag_paper.pdf and load the document
loader = PyPDFLoader('./data/rag-paper.pdf')
document = loader.load()
# Instantiate an OpenAI embeddings model
embedding_model = OpenAIEmbeddings(api_key=api_key, model='text-embedding-3-small')

# Create the semantic text splitter with desired parameters
semantic_splitter = SemanticChunker(
    embeddings=embedding_model, breakpoint_threshold_type="gradient", breakpoint_threshold_amount=0.8
)

# Split the document
chunks = semantic_splitter.split_documents(document)
print(chunks[0])

page_content='Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
†Facebook AI Research; ‡University College London; ⋆New York University;
plewis@fb.com
Abstract
Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-
stream NLP tasks.' metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2021-04-13T00:48:38+00:00', 'author': '', 'keywords': '', 'moddate': '2021-04-13T00:48:38+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': './data/rag-paper.pdf', 'total_pages': 19, 'page': 0, 'page_label': '1'}


### Section 2.3 - Optimizing document retrieval

In [89]:
from langchain_community.retrievers import BM25Retriever

#### Understanding BM25

Before you start integrating a BM25 sparse retriever into your RAG architecture, it's best to test it on some short strings to get an intuition for how the retriever selects the documents.

You've been provided with three strings that you'll use as the basis for your BM25 retriever. The functionality required for this exercise is already loaded for you.

In [105]:
chunks = [
    "RAG stands for Retrieval Augmented Generation.",
    "Graph Retrieval Augmented Generation uses graphs to store and utilize relationships between documents in the retrieval process.",
    "There are different types of RAG architectures; for example, Graph RAG."
]

# Initialize the BM25 retriever
bm25_retriever = BM25Retriever.from_texts(chunks, k=3)

# Invoke the retriever
results = bm25_retriever.invoke("Graph RAG")

# Extract the page content from the first result
print("Most Relevant Document:")
print(results[0].page_content)

Most Relevant Document:
There are different types of RAG architectures; for example, Graph RAG.
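
The third string wins because it contains both query terms. The sketch below scores the same three strings with a hand-written BM25 formula to show where that ranking comes from. It is an approximation under stated assumptions: lowercase word tokenization, `k1=1.5`, `b=0.75`, and an IDF variant with `+1` inside the logarithm. The library behind `BM25Retriever` may tokenize and weight terms differently, so the absolute scores will not match.

```python
import math
import re
from collections import Counter

chunks = [
    "RAG stands for Retrieval Augmented Generation.",
    "Graph Retrieval Augmented Generation uses graphs to store and utilize relationships between documents in the retrieval process.",
    "There are different types of RAG architectures; for example, Graph RAG.",
]


def tokenize(text):
    # Assumption: lowercase word tokens with punctuation removed
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    scores = []
    for doc in tokenized:
        counts = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            # Rare terms get a higher inverse document frequency
            doc_freq = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
            # Term frequency saturates, and long documents are penalised
            tf = counts[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


scores = bm25_scores("Graph RAG", chunks)
best = scores.index(max(scores))
print(chunks[best])
```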


#### Sparse retrieval with BM25

Time to try out a sparse retrieval implementation! You'll create a BM25 retriever to ask questions about an academic paper on RAG, which has already been split into chunks stored in `chunks`. An OpenAI chat model and prompt have also been defined as `llm` and `prompt`, respectively. You can view the prompt provided by printing it in the console.

In [106]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key, temperature=0)

In [113]:
# Create a document loader for rag_paper.pdf and load the document
loader = PyPDFLoader('./data/rag-paper.pdf')
document = loader.load()
# Instantiate an OpenAI embeddings model
embedding_model = OpenAIEmbeddings(api_key=api_key, model='text-embedding-3-small')

# Create the semantic text splitter with desired parameters
semantic_splitter = SemanticChunker(
    embeddings=embedding_model, breakpoint_threshold_type="gradient", breakpoint_threshold_amount=0.8
)

# Split the document
chunks = semantic_splitter.split_documents(document)
# print(chunks[0])



In [114]:
prompt_string = """
Use only the context provided to answer the following question. If you don't know the answer, reply that you are unsure.
Context: {context}
Question: {question}\n"""

prompt = ChatPromptTemplate.from_template(prompt_string)

In [115]:
# Create a BM25 retriever from chunks
retriever = BM25Retriever.from_documents(
    documents=chunks, 
    k=5
)

# Create the LCEL retrieval chain
chain = ({"context": retriever, "question": RunnablePassthrough()}
         | prompt
         | llm
         | StrOutputParser()
)

# Invoke the chain
print(chain.invoke("What are knowledge-intensive tasks?"))

Knowledge-intensive tasks are tasks that require a significant amount of factual knowledge and understanding to perform effectively. In the context provided, they include activities such as open-domain question answering and generating questions based on specific answers, which demand the ability to retrieve and utilize factual information accurately.


### Section 2.4 - Introduction to RAG evaluation

**remark** - I found the video unclear. The material is rather complex, is introduced very briefly, and is too difficult to grasp from the video alone.

#### Ragas context precision evaluation
To start your RAG evaluation journey, you'll begin by evaluating the context precision RAG metric using the `ragas` framework. Recall that context precision is essentially a measure of how relevant the retrieved documents are to the input query.

In this exercise, you've been provided with an input query, the documents retrieved by a RAG application, and the ground truth, which is the most appropriate document to retrieve in the opinion of a human expert. You'll calculate the context precision on these strings before evaluating an actual LangChain RAG chain in the next exercise.

The text generated by the RAG application has been saved to the variable `model_response` for brevity.
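
As a reference point for interpreting the score, here is a sketch of the arithmetic as I understand it from the `ragas` documentation: precision is computed at each rank that holds a relevant document, and those values are averaged over the number of relevant documents. The function name `context_precision_at_k` is hypothetical, and the relevance labels are supplied by hand here, whereas `ragas` obtains them from an LLM judge.

```python
def context_precision_at_k(relevance):
    """Average of precision@k over the ranks that hold a relevant document.

    relevance is a list of 0/1 labels in retrieval order, e.g. [1, 0].
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    score, hits = 0.0, 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / total_relevant


print(context_precision_at_k([1, 0]))  # relevant document ranked first
print(context_precision_at_k([0, 1]))  # relevant document ranked second
```

The first call prints `1.0` and the second `0.5`: the metric rewards placing relevant documents near the top of the retrieved list.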

In [122]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key, temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key=api_key)

In [126]:
from ragas.metrics import context_precision
from ragas.integrations.langchain import EvaluatorChain

# Define the context precision chain
context_precision_chain = EvaluatorChain(metric=context_precision, llm=llm, embeddings=embeddings)

# Evaluate the context precision of the RAG chain
eval_result = context_precision_chain({
  "question": "How does RAG enable AI applications?",
  "ground_truth": "RAG enables AI applications by integrating external data in generative models.",
  "contexts": [
    "RAG enables AI applications by integrating external data in generative models.",
    "RAG enables AI applications such as semantic search engines, recommendation systems, and context-aware chatbots."
  ]
})

print(f"Context Precision: {eval_result['context_precision']}")

Context Precision: 0.0


#### Ragas faithfulness evaluation

In this exercise, you'll evaluate the faithfulness of the RAG architecture you created at the end of Chapter 1. This chain has been re-defined for you and is available through the variable `chain`.

You'll use the query provided, the chain's output, and the retrieved output to evaluate the faithfulness using the `ragas` framework.

The classes required have already been imported for you.
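
The notebook has no code for this exercise, so the following is only a sketch of the idea behind the metric rather than a `ragas` call. Faithfulness is the share of claims in the generated answer that the retrieved context supports. The claims and verdicts below are invented and assigned by hand; in `ragas`, an LLM extracts the claims and judges each one.

```python
def faithfulness_score(claim_supported):
    """Fraction of claims that are supported by the retrieved context.

    claim_supported maps each claim in the answer to True or False.
    """
    if not claim_supported:
        return 0.0
    return sum(claim_supported.values()) / len(claim_supported)


# Hypothetical claims with hand-assigned verdicts, for illustration only
claims = {
    "RAG combines retrieval with generation.": True,
    "RAG was released in 1991.": False,
}
print(faithfulness_score(claims))
```

With one of the two claims supported, the score is `0.5`; a fully grounded answer would score `1.0`.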

#### String evaluation
Time to really evaluate the final output by comparing it to an answer written by a subject matter expert. You'll use LangSmith's `LangChainStringEvaluator` class to perform this string comparison evaluation.

A `prompt_template` for string evaluation has already been written for you; it is defined in the setup cell below.

The output from the RAG chain is stored as `predicted_answer` and the expert's response is stored as `ref_answer`.

All of the necessary classes have been imported for you.

**remark** - the exercise needs quite a bit of setup, which is done in the cell below

In [147]:
from langchain_openai import ChatOpenAI
from langsmith.evaluation import LangChainStringEvaluator
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

eval_llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key, temperature=0)

prompt = """You are an expert professor specialized in grading students' answers to questions.
You are grading the following question:{query}
Here is the real answer:{answer}
You are grading the following predicted answer:{result}
Respond with CORRECT or INCORRECT:
Grade:"""

prompt_template = PromptTemplate(
    input_variables=["query", "answer", "result"],
    template= prompt)

prompt_template

query = "How does RAG improve question answering with LLMs"
predicted_answer = "RAG improves question answering with LLMs by generating correct answers even when the correct answer is not present in any retrieved document, achieving a notable accuracy of 11.8% in such cases, while extractive models would score 0%. Additionally, RAG models outperform other models like BART in terms of generating factually correct and diverse text, as well as being able to answer questions in a more flexible, abstractive manner rather than relying solely on extractive methods."
ref_answer = "Retrieval-Augmented Generation (RAG) improves question answering with large language models (LLMs) by combining a retrieval mechanism with a generative model. The retrieval system fetches relevant documents or passages from external knowledge sources, giving the LLM access to more up-to-date and accurate information than what it has learned during training. This allows RAG to generate responses that are grounded in factual data, reducing the risk of hallucination and improving the model's accuracy, especially in niche or specialized domains where the LLM alone may lack expertise. By leveraging both external knowledge and the generative abilities of LLMs, RAG enhances the quality, relevance, and factuality of the answers provided."

**exercise**

In [148]:
# Create the QA string evaluator
qa_evaluator = LangChainStringEvaluator(
    "qa",
    config={
        "llm": eval_llm,
        "prompt": prompt_template
    }
)

query = "How does RAG improve question answering with LLMs?"

# Evaluate the RAG output by evaluating strings
score = qa_evaluator.evaluator.evaluate_strings(
    prediction=predicted_answer,
    reference=ref_answer,
    input=query
)

print(f"Score: {score}")

Score: {'reasoning': 'INCORRECT', 'value': 'INCORRECT', 'score': 0}


Oh dear! Looks like this RAG application needs to go back to the drawing board. Perhaps some of the techniques learned in this chapter, like semantic splitting or sparse retrieval, would improve this metric, or perhaps tweaking the retrieval prompt to allow a bit more creativity.

## Chapter 3 - Introduction to Graph RAG