# Retrieval-Augmented Generation with LlamaIndex 🦙📚

> **Learning Objectives:**
> * 🧠 Understand the principles of Retrieval-Augmented Generation (RAG) and LlamaIndex
> * 🛠️ Set up a development environment for LlamaIndex
> * 🔍 Build and customize a basic RAG pipeline
> * 📊 Explore different index types in LlamaIndex

## Introduction




> 💡 **Retrieval-Augmented Generation (RAG)** is a technique that enhances large language models (LLMs) with external knowledge.
>
> Instead of relying solely on the model's pre-trained knowledge, RAG systems retrieve relevant information from a knowledge base to generate more accurate and contextually appropriate responses.



<img src = 'https://drive.google.com/uc?id=19zHBeEQrvtUgRy7ZdALmn1AASNq_uC5F'>

[Source](https://arxiv.org/abs/2402.19473)



🦙 **LlamaIndex** is a data framework that simplifies the process of connecting custom data sources to LLMs. It provides tools for data ingestion, structuring, and efficient retrieval, making it easier to build robust RAG systems.

🔑 Key Concepts:
- **RAG**: A technique that combines retrieval of relevant information with generative AI to produce more accurate and contextual responses.
- **Documents**: Raw text data, like files or web pages, that serve as the knowledge source.
- **Nodes**: Smaller, more manageable pieces of information extracted from documents.
- **Indices**: Data structures that organize nodes for efficient searching and retrieval.
- **Query Engine**: The component that processes user queries and generates responses using the index and LLM.
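
To make these concepts concrete, here is a minimal, library-free sketch of the retrieve-then-generate loop. It is not how LlamaIndex works internally: the two-entry `KNOWLEDGE_BASE`, the word-overlap scoring, and the helper names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and the final LLM call is left out.

```python
import re

# Hypothetical two-document knowledge base, used only for illustration.
KNOWLEDGE_BASE = {
    "ml": "Machine learning algorithms use data to learn patterns and make predictions.",
    "climate": "Climate change refers to long-term shifts in global weather patterns.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc_id: len(query_words & tokenize(KNOWLEDGE_BASE[doc_id])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, doc_ids: list[str]) -> str:
    """Place the retrieved text ahead of the question, as a RAG system would."""
    context = "\n".join(KNOWLEDGE_BASE[doc_id] for doc_id in doc_ids)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


question = "How do algorithms learn patterns from data?"
top_ids = retrieve(question)
prompt = build_prompt(question, top_ids)
print(prompt)
```

In a real pipeline the prompt would be sent to an LLM, and retrieval would use embeddings rather than word overlap.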

> Check out the LlamaIndex [documentation](https://docs.llamaindex.ai/en/stable/) for more information.





Let's get started by setting up our environment!

### Environment Setup

First, we need to install the necessary libraries and set up our OpenAI API key.

In [None]:
!pip install llama-index --quiet
!pip install openai --quiet
!pip install python-dotenv --quiet
import getpass
import os

from llama_index.llms.openai import OpenAI
from llama_index.core import Settings


In [None]:

# Set OpenAI API key
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
API_KEY = os.environ["OPENAI_API_KEY"]
# Configure the default LLM through the global Settings object
Settings.llm = OpenAI(model="gpt-3.5-turbo")

print("Environment set up successfully!")

## Exercise 1 - Building a Basic RAG Pipeline

Now that our environment is set up, let's create a simple RAG system using LlamaIndex. We'll start by creating some sample documents and then build an index to query them.

🧠 Understanding the Pipeline:

1. We create sample documents to serve as our knowledge base.

2. The `SimpleDirectoryReader` loads these documents into our system.

3. We create a `VectorStoreIndex`, which converts our documents into vector embeddings for efficient semantic search.

4. The `as_query_engine()` method creates an interface for querying our index.

5. Finally, we test our system with a simple query.


### 1.1 Loading Sample Documents

Let's first create some sample documents for this lab before we query them. Run the cell below to write the documents to disk.

In [None]:
# Sample Document 1: Introduction to Machine Learning
ml_doc = """
Machine Learning is a subset of artificial intelligence that focuses on the development of algorithms and statistical models that enable computer systems to improve their performance on a specific task through experience. Unlike traditional programming, where explicit instructions are provided to solve a problem, machine learning algorithms use data to learn patterns and make predictions or decisions without being explicitly programmed.

Key concepts in machine learning include:

1. Supervised Learning: The algorithm learns from labeled training data, attempting to find a function that best maps input variables to output variables.

2. Unsupervised Learning: The algorithm works on unlabeled data, trying to find hidden structures or patterns within the data.

3. Reinforcement Learning: The algorithm learns to make decisions by performing actions in an environment to maximize a reward signal.

4. Deep Learning: A subset of machine learning based on artificial neural networks with multiple layers, capable of learning complex patterns in large amounts of data.

Machine learning has numerous applications across various fields, including:
- Image and speech recognition
- Natural language processing
- Recommendation systems
- Autonomous vehicles
- Medical diagnosis
- Financial forecasting

As the field continues to evolve, new techniques and applications are constantly emerging, making machine learning one of the most exciting and rapidly growing areas in computer science and artificial intelligence.
"""

# Sample Document 2: The Impact of Climate Change
climate_doc = """
Climate change refers to long-term shifts in global weather patterns and average temperatures. Primarily caused by human activities, particularly the burning of fossil fuels, climate change is one of the most pressing issues facing our planet today. The impacts of climate change are far-reaching and affect every aspect of life on Earth.

Key effects of climate change include:

1. Rising Global Temperatures: The Earth's average temperature has increased by about 1°C since pre-industrial times, with most of this warming occurring in the past 40 years.

2. Extreme Weather Events: Climate change is leading to more frequent and severe heatwaves, droughts, hurricanes, and floods.

3. Sea Level Rise: Melting ice caps and glaciers, combined with thermal expansion of the oceans, are causing sea levels to rise, threatening coastal communities and ecosystems.

4. Biodiversity Loss: Changing temperatures and weather patterns are forcing many species to adapt or migrate, with some facing extinction.

5. Food and Water Security: Altered precipitation patterns and rising temperatures affect agriculture and water availability, potentially leading to food and water shortages.

6. Human Health: Climate change impacts human health through increased air pollution, the spread of infectious diseases, and heat-related illnesses.

Addressing climate change requires a multi-faceted approach, including:
- Transitioning to renewable energy sources
- Improving energy efficiency
- Sustainable land use and forest management
- Developing climate-resilient infrastructure
- International cooperation and policy implementation

The urgency of the climate crisis calls for immediate and decisive action from governments, businesses, and individuals worldwide to mitigate its effects and adapt to the changes already underway.
"""

# Sample Document 3: The Evolution of Artificial Intelligence
ai_doc = """
Artificial Intelligence (AI) is a branch of computer science that aims to create intelligent machines capable of mimicking human cognitive functions such as learning, problem-solving, and decision-making. The field has evolved significantly since its inception in the 1950s, with major breakthroughs and paradigm shifts shaping its development.

Key milestones in the evolution of AI include:

1. Early AI (1950s-1970s): Focused on symbolic AI and rule-based systems. This era saw the development of the Turing Test, the first AI programs like the Logic Theorist, and expert systems.

2. AI Winter (1970s-1980s): A period of reduced funding and interest in AI due to overhyped promises and limited progress.

3. Machine Learning Revolution (1990s-2000s): The rise of statistical approaches and machine learning algorithms, including support vector machines and decision trees.

4. Deep Learning Breakthrough (2010s-present): Advances in neural networks and deep learning, fueled by increased computational power and big data, led to significant progress in areas like computer vision and natural language processing.

5. Current Trends:
   - Reinforcement Learning: AI systems learning through interaction with environments.
   - Generative AI: Models capable of creating new content, including text, images, and audio.
   - Explainable AI: Developing methods to make AI decision-making processes more transparent and interpretable.
   - AI Ethics: Addressing concerns about bias, privacy, and the societal impact of AI.

Applications of modern AI span various domains:
- Healthcare: Disease diagnosis, drug discovery, and personalized treatment plans.
- Finance: Algorithmic trading, fraud detection, and risk assessment.
- Transportation: Autonomous vehicles and traffic management systems.
- Entertainment: Personalized content recommendations and AI-generated art.
- Education: Adaptive learning systems and automated grading.

As AI continues to advance, it raises important questions about the future of work, privacy, and the relationship between humans and machines. The field's ongoing development promises to revolutionize numerous aspects of society while also presenting new challenges and ethical considerations.
"""
# Create the docs directory (-p avoids an error when the cell is re-run)
!mkdir -p docs

# Write the documents to files
with open("./docs/machine_learning.txt", "w") as f:
    f.write(ml_doc)

with open("./docs/climate_change.txt", "w") as f:
    f.write(climate_doc)

with open("./docs/artificial_intelligence.txt", "w") as f:
    f.write(ai_doc)

print("Sample documents created successfully!")

### 1.2 Creating RAG Pipeline

Now we'll use LlamaIndex to build an index and ask questions about the stored documents. You are encouraged to read the documents yourself so you can judge the quality of the RAG system's answers.

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader('./docs/').load_data()
print(f"Loaded {len(documents)} documents")

# Create an index
index = VectorStoreIndex.from_documents(documents)

# Create a query engine
query_engine = index.as_query_engine()

# Test the query engine
response = query_engine.query("Which documents do we have loaded? Give brief bullet descriptions for each")
print(response)

### 💡 Why This Matters:
> You've just created an AI system that can understand and answer questions about custom data. Consider how this could be applied in various fields such as customer service, research assistance, or knowledge management.
>
> You may have also noticed how quickly you can prototype a RAG system with LlamaIndex: it takes only a few lines of code!

### 🧪 Experiment


1. Try querying your RAG system with different questions related to the sample documents.

2. How does the system handle questions about information not present in your data? Test this and explain what you observe.

In [None]:

# 1. Querying with different questions
questions = [
    "What are the main benefits of Artificial Intelligence?",
    "How is Machine learning related to AI?",
    "What should citizens know about Climate Change?"
]

for question in questions:
    response = query_engine.query(question)
    print(f"Question: {question}")
    print(f"Answer: {response}\n")



In [None]:

# 2. Out of scope questions
out_of_scope_questions = [
    "What is the capital of France?",
    "How do I bake a chocolate cake?",
    "Who won the World Cup in 2022?"
]

for question in out_of_scope_questions:
    response = query_engine.query(question)
    print(f"Question: {question}")
    print(f"Answer: {response}\n")

## Exercise 2 - Exploring Index Types

LlamaIndex offers various index types, each suited for different use cases. Let's explore three common types: Vector Store Index, Summary Index, and Tree Index.



### 2.1 Vector Store Index

We've already used this in our basic pipeline. It's great for semantic search and retrieving contextually relevant information.



🧠 The Vector Store Index creates embeddings for each chunk of text, allowing for semantic similarity search. This is particularly useful when you need to find information based on meaning rather than exact keyword matches.

<img src = 'https://drive.google.com/uc?id=1OqZf4eet5RBjOrKydyR6ChnTygPAziFh'>
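
The idea of similarity search can be sketched without any library. The example below assumes a toy "embedding" made of raw word counts, so unlike a real embedding model it only rewards shared words, not shared meaning. The names `embed` and `cosine_similarity` and the two sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real indices use a learned embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical chunks standing in for indexed nodes.
chunks = [
    "neural networks learn patterns from data",
    "sea levels rise as glaciers melt",
]
chunk_vectors = [embed(chunk) for chunk in chunks]

# Embed the query the same way, then pick the closest chunk.
query_vector = embed("how do networks learn from data")
scores = [cosine_similarity(query_vector, vec) for vec in chunk_vectors]
best_chunk = chunks[scores.index(max(scores))]
print(best_chunk)
```

The ranking step is the same in a real vector store; only the quality of the vectors changes.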




In [None]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, show_progress=True)
query_engine = vector_index.as_query_engine()

response = query_engine.query("Explain like I'm 5: What are ML and AI?")
print(response)

### 2.2 Summary Index

The Summary Index is the simplest index type: it stores the nodes as a sequential list.

🧠 The Summary Index is ideal when you need high-level overviews of your data. By default it performs no semantic search; a query passes every node to the LLM, which suits summarization but uses more tokens on large datasets.
<img src = 'https://drive.google.com/uc?id=1IQbRTxq9wBVfqy3lVfgxVJToWSivaThW'>




In [None]:
from llama_index.core import SummaryIndex

summary_index = SummaryIndex.from_documents(documents)
query_engine = summary_index.as_query_engine()

response = query_engine.query("Summarize the key points from all documents with bullets")
print(response)

### 2.3 Tree Index

This index type organizes nodes into a hierarchical tree.

🧠 The Tree Index builds its hierarchy bottom-up: leaf nodes hold the original chunks, and each parent node holds a summary of its children. Queries traverse from the root toward the most relevant leaves. This is particularly useful in domains where the hierarchical relationships between concepts are essential to understanding the data.

<img src = 'https://drive.google.com/uc?id=1LmxVOPEG6neEjwHZ9QGapCzdkPWMuyIT' width = 500>
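
A rough sketch of how such a hierarchy could be assembled, assuming parents are formed by summarizing fixed-size groups of child nodes. The `TreeNode` class, the `build_tree` helper, and the placeholder `summarize` function (a real index would call an LLM there) are hypothetical and only illustrate the shape of the data structure.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    text: str
    children: list["TreeNode"] = field(default_factory=list)


def summarize(texts: list[str]) -> str:
    """Placeholder summarizer (hypothetical): a real index would call an LLM here."""
    return " | ".join(text[:20] for text in texts)


def build_tree(chunks: list[str], num_children: int = 2) -> TreeNode:
    """Group nodes level by level until a single root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if num_children < 2:
        raise ValueError("num_children must be at least 2")
    level = [TreeNode(text=chunk) for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), num_children):
            group = level[start:start + num_children]
            parent = TreeNode(text=summarize([n.text for n in group]), children=group)
            next_level.append(parent)
        level = next_level
    return level[0]


def depth(node: TreeNode) -> int:
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((depth(child) for child in node.children), default=0)


root = build_tree(["chunk A", "chunk B", "chunk C", "chunk D"])
print(depth(root), len(root.children))
```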



In [None]:
from llama_index.core import TreeIndex

tree_index = TreeIndex.from_documents(documents)
query_engine = tree_index.as_query_engine()

response = query_engine.query("What are the relationships between ML and AI?")
print(response)

**You can learn more about indexing strategies in the LlamaIndex [index guide](https://docs.llamaindex.ai/en/stable/module_guides/indexing/index_guide/).**

### 2.4 Index Suitability

1. **Vector Store Index**: Best for semantic search and finding contextually relevant information. Suitable for applications like chatbots or question-answering systems where understanding the context is crucial.

2. **Summary Index**: Ideal for generating overviews or summaries of large documents. Useful in applications that need to provide quick, high-level insights from extensive data.

3. **Tree Index**: Excellent for capturing hierarchical relationships between concepts. Well suited to complex domains where hierarchy matters (e.g., medical ontologies in the medical domain).


### 🧪 Experiment


1. Build each type of index (Vector Store, Summary, and Knowledge Graph) using these documents.
2. Query each index with the same question and compare the results. How do the responses differ?
3. Reflect on which index type might be most suitable for different types of applications or datasets.

In [None]:


from llama_index.core import VectorStoreIndex, SummaryIndex, KnowledgeGraphIndex

# Vector Store Index
vector_index = VectorStoreIndex.from_documents(documents)

# Summary Index
summary_index = SummaryIndex.from_documents(documents)

# Knowledge Graph Index
kg_index = KnowledgeGraphIndex.from_documents(documents)

# Create query engines for each index
vector_query_engine = vector_index.as_query_engine()
summary_query_engine = summary_index.as_query_engine()
kg_query_engine = kg_index.as_query_engine()

In [None]:


question = "What are the key challenges in artificial intelligence and machine learning?"

print("Vector Store Index Response:")
print(vector_query_engine.query(question))

print("\nSummary Index Response:")
print(summary_query_engine.query(question))

print("\nKnowledge Graph Index Response:")
print(kg_query_engine.query(question))


## Exercise 3 - Storing Your Index

Now that we've explored different index types and customization options, let's learn how to store our indexed data for future use. This is crucial for avoiding the time and cost associated with re-indexing large datasets.

🧠 **Why Store Indices?**

Storing indices allows you to:
1. Save time by avoiding repeated data processing and embedding generation
2. Reduce costs, especially when working with large datasets or expensive embedding models
3. Maintain consistency across different runs or application instances
4. Quickly load and use your indexed data in production environments
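
The first two benefits can be illustrated with a small, library-free sketch. It assumes a stand-in `expensive_embed` function that counts its own calls, and a hypothetical `build_or_load` helper that writes vectors to a JSON file; neither is part of LlamaIndex.

```python
import json
import tempfile
from pathlib import Path

embedding_calls = 0


def expensive_embed(text: str) -> list[float]:
    """Stand-in for a paid embedding call (hypothetical); counts invocations."""
    global embedding_calls
    embedding_calls += 1
    return [float(len(text)), float(text.count(" "))]


def build_or_load(chunks: list[str], path: Path) -> dict[str, list[float]]:
    """Load vectors from disk when present; otherwise compute and save them."""
    if path.exists():
        return json.loads(path.read_text())
    vectors = {chunk: expensive_embed(chunk) for chunk in chunks}
    path.write_text(json.dumps(vectors))
    return vectors


chunks = ["machine learning basics", "climate change effects"]
with tempfile.TemporaryDirectory() as tmp_dir:
    store_path = Path(tmp_dir) / "toy_index.json"
    first = build_or_load(chunks, store_path)   # computes and persists
    second = build_or_load(chunks, store_path)  # served from disk

print(embedding_calls)
```

The second call never touches `expensive_embed`, which is the saving that persisting an index provides.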

Let's explore two main methods of storing our index: persisting to disk and using vector stores.


### 3.1 Persisting to Disk

The simplest way to store your indexed data is to call `.persist()` on the index's storage context. This works for any type of index.

In [None]:
# Assuming we're using the vector_index we created earlier
persist_dir = "./stored_index"
vector_index.storage_context.persist(persist_dir=persist_dir)

print(f"Index persisted to {persist_dir}")

To load the persisted index later:


In [None]:
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir=persist_dir)

# Load index
loaded_index = load_index_from_storage(storage_context)

print("Index loaded successfully!")


### 3.2 Using Vector Stores

For more advanced storage and faster retrieval, especially with large datasets, we can use specialized vector stores. Let's use Chroma as an example.

First, install Chroma:

In [None]:
!pip install chromadb --quiet
!pip install llama-index-vector-stores-chroma --quiet

Now, let's set up Chroma and store our index:

In [None]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

# Initialize Chroma client
db = chromadb.PersistentClient(path="./chroma_db")

# Create or get a collection
chroma_collection = db.get_or_create_collection("llamaindex_lab")

# Create ChromaVectorStore
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Create a new storage context using the vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create a new index using the storage context
chroma_index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

print("Index stored in Chroma successfully!")

To load and use the Chroma-stored index later:


In [None]:
# Get the existing collection
chroma_collection = db.get_collection("llamaindex_lab")

# Create ChromaVectorStore
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Load the index
loaded_chroma_index = VectorStoreIndex.from_vector_store(vector_store)

print("Chroma-stored index loaded successfully!")


### 🧪 Experiment

1. Experiment with persisting different types of indices (e.g., SummaryIndex, TreeIndex) to disk.


In [None]:

from llama_index.core import SummaryIndex, TreeIndex

# Create and persist SummaryIndex
summary_index = SummaryIndex.from_documents(documents)
summary_index.storage_context.persist(persist_dir="./summary_index")

# Create and persist TreeIndex
tree_index = TreeIndex.from_documents(documents)
tree_index.storage_context.persist(persist_dir="./tree_index")

print("Indices persisted successfully!")

# Loading persisted indices
from llama_index.core import StorageContext, load_index_from_storage

loaded_summary_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./summary_index"))
loaded_tree_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./tree_index"))

print("Indices loaded successfully!")

## Exercise 4 - Advanced Querying Techniques

Now that we have our index stored, let's explore some advanced querying techniques to get the most out of our RAG system.

### 4.1 Customizing the Query Pipeline

LlamaIndex allows you to customize various stages of the querying process. Let's break down the querying pipeline and customize each part:

In [None]:
from llama_index.core import get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# Custom retriever
retriever = VectorIndexRetriever(
    index=loaded_chroma_index,
    similarity_top_k=5  # Retrieve top 5 most similar nodes
)

# Custom response synthesizer
response_synthesizer = get_response_synthesizer(
    response_mode="compact"  # Other options include "refine", "tree_summarize", "accumulate"
)

# Custom post-processor
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

# Assemble custom query engine
custom_query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[postprocessor]
)

# Query using the custom engine
response = custom_query_engine.query("What are all the main applications of AI in healthcare?")
print(response)
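
The three stages assembled above (retrieve, post-process, synthesize) can be sketched in plain Python to show how data flows between them. This is an illustration under assumptions, not LlamaIndex's implementation: the `ScoredNode` class, the precomputed scores, and the three helper functions are hypothetical, and the LLM call is omitted.

```python
from dataclasses import dataclass


@dataclass
class ScoredNode:
    text: str
    score: float  # similarity to the query, assumed precomputed here


def retrieve(nodes: list[ScoredNode], top_k: int) -> list[ScoredNode]:
    """Stage 1: keep the top_k highest-scoring nodes."""
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:top_k]


def postprocess(nodes: list[ScoredNode], cutoff: float) -> list[ScoredNode]:
    """Stage 2: drop nodes whose score falls below the cutoff."""
    return [n for n in nodes if n.score >= cutoff]


def synthesize(query: str, nodes: list[ScoredNode]) -> str:
    """Stage 3: pack the surviving text into one prompt (the LLM call is omitted)."""
    context = "\n".join(n.text for n in nodes)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical candidates with made-up scores.
candidates = [
    ScoredNode("AI supports disease diagnosis.", 0.91),
    ScoredNode("AI assists drug discovery.", 0.78),
    ScoredNode("Sea levels are rising.", 0.35),
]
kept = postprocess(retrieve(candidates, top_k=5), cutoff=0.7)
prompt = synthesize("How is AI used in healthcare?", kept)
print(len(kept))
```

Note that in this sketch a cutoff that is too strict leaves nothing for the synthesis stage, so the cutoff and `top_k` are worth tuning together.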

### 4.2 Structured Outputs

Sometimes, you may want to ensure that your RAG system's outputs follow a specific structure. We can use Pydantic models to achieve this:
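
Before turning to the library, the core idea of a schema-checked output can be sketched with the standard library alone. The example assumes the model replied with a JSON string (hard-coded here) and uses a hypothetical `parse_applications` helper; Pydantic performs far richer validation than this stand-in.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Application:
    application: str
    description: str


def parse_applications(raw_json: str) -> list[Application]:
    """Check that each item has exactly the expected string fields."""
    expected = {f.name for f in fields(Application)}
    items = json.loads(raw_json)["applications"]
    parsed = []
    for item in items:
        if set(item) != expected:
            raise ValueError(f"unexpected fields: {sorted(set(item) ^ expected)}")
        if not all(isinstance(value, str) for value in item.values()):
            raise ValueError("all fields must be strings")
        parsed.append(Application(**item))
    return parsed


# Hypothetical model output, hard-coded for illustration.
llm_output = '{"applications": [{"application": "Triage", "description": "Prioritize cases."}]}'
apps = parse_applications(llm_output)
print(apps[0].application)
```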

In [None]:
!pip install llama-index-core --quiet
!pip install pydantic --quiet

In [None]:
from typing import List

from llama_index.llms.openai import OpenAI
from llama_index.program.openai import OpenAIPydanticProgram
from pydantic import BaseModel, Field


class Application(BaseModel):
    """Data model for a single application."""

    application: str = Field(description="Name of the application")
    description: str = Field(description="Brief description of the application")
    benefits: str = Field(description="Key benefits of using this application")
    use_cases: str = Field(description="Examples of use cases for this application")


class AIApplications(BaseModel):
    """Data model for a list of applications."""

    applications: List[Application]



In [None]:
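prompt_template_str = """\
Generate 4 applications of Artificial Intelligence in Healthcare.
"""

# The program sends the prompt straight to the LLM and parses the reply into
# AIApplications; it does not retrieve from the index built earlier.
program = OpenAIPydanticProgram.from_defaults(
    output_cls=AIApplications,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
output = program()
print(output)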



### 🧪 Experiment

1. Build a structured output class that captures key issues in climate change.

2. Test your class by making a `program` using `OpenAIPydanticProgram` and return the output.

In [None]:
### YOUR CODE HERE
# 1. Build Structured output class
from typing import List

from llama_index.llms.openai import OpenAI
from llama_index.program.openai import OpenAIPydanticProgram
from pydantic import BaseModel, Field


class ClimateChange(BaseModel):
    """Data model for a Climate change Issue."""

    issue: str = Field(description="Name of the climate change issue")
    description: str = Field(description="Brief Description of the climate change issue")
    action: str = Field(description="Action to be taken to address the issue")
    impact: str = Field(description="Impact of the issue")
    mitigation: str = Field(description="Mitigation measures to address the issue")

class ClimateIssues(BaseModel):
    """Data model for list of Climate change issues."""

    issues: List[ClimateChange]




In [None]:
### YOUR CODE HERE
# 2. Test the ClimateIssues class with a `program`
prompt_template_str = """\
Generate 4 key issues in Climate change.
"""
program = OpenAIPydanticProgram.from_defaults(
    output_cls=ClimateIssues,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
output = program()
print(output)


### Conclusion 🎓

In this extended lab, we've explored advanced techniques for storing indices and performing complex queries using LlamaIndex. We've learned how to:

- 🗂️ **Persist indices to disk and use vector stores for efficient storage and retrieval**
- 🔧 **Customize the query pipeline for more precise and relevant responses**
- 📋 **Generate structured outputs using Pydantic models**
- 🔍 **Compare index types (vector store, summary, and tree) on the same documents**

These techniques allow you to build more sophisticated and powerful RAG systems, capable of handling a wide range of use cases and query complexities.

🔑 Remember that the key to building effective RAG systems is experimentation and iteration. Don't hesitate to try different combinations of storage methods, query techniques, and output structures to find what works best for your specific use case.

## 🚀 Additional Exercises

1. Implement a caching mechanism to store and reuse query results for frequently asked questions.

2. Create a query pipeline that combines multiple index types (e.g., vector store and knowledge graph) for more comprehensive responses.

3. Develop a simple API endpoint that exposes your RAG system, allowing users to query it programmatically.

4. Experiment with different LLMs (e.g., GPT-4, Claude) and compare their performance in your RAG system.

5. Implement a feedback loop that allows users to rate responses and use this feedback to improve the retrieval and synthesis processes over time.