# Workshop Outline


1. Introduction
   - RAG overview
   - LlamaIndex basics

2. Building a Simple RAG Chatbot
   - Data preparation
   - Building a simple Q&A bot
   - Simple Q&A bot with conversation history

3. Advanced Strategies in RAG
    - Strategies while indexing data
    - Strategies while querying
    - Strategies while answering queries

4. Advanced RAG
    - Agents with reasoning capabilities

5. Building apps with ease with LlamaIndex
    - Chat with your PDF
    - Building a simple copilot for your website

6. Q&A




---



# Introduction


## Introduction to RAG



### What is RAG ?

Retrieval augmented generation, or RAG, is an architectural approach that can improve the efficacy of large language model (LLM) applications by leveraging custom data.

This is done by retrieving data/documents relevant to a question or task and providing them as context for the LLM.

By providing this extra context, the LLM can generate answers that are up to date, more likely to be factually correct, and relevant to a specific domain.
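
To make the retrieve-then-generate flow concrete, here is a minimal, library-free sketch. The three documents, the word-overlap scoring, and the helper names (`tokenize`, `retrieve`, `build_prompt`) are all assumptions made for illustration; a real system would use embeddings for retrieval and send the final prompt to an LLM.

```python
import re

# Toy knowledge base; a real system would load and chunk your own documents.
DOCS = [
    "The cafeteria is open from 8am to 3pm on weekdays.",
    "Employees receive 20 days of paid leave per year.",
    "The VPN client must be updated every quarter.",
]


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Place the retrieved text ahead of the question, as RAG does."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


question = "How many days of paid leave do employees get?"
prompt = build_prompt(question, retrieve(question, DOCS))
print(prompt)  # this string is what would be sent to the LLM
```

The LLM never sees the whole knowledge base, only the passages that the retrieval step judged relevant.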

### Sample RAG Architecture

![fm-jumpstar.png](https://docs.aws.amazon.com/images/sagemaker/latest/dg/images/jumpstart/jumpstart-fm-rag.jpg)




### Why do we need RAG ?

1. ***LLMs do not know your data*** - LLMs are often limited to their pre-trained knowledge and data. Once trained, many LLMs do not have the ability to access data beyond their training data cutoff point.

2. ***Factual grounding and consistency*** - LLMs are powerful tools for generating creative and engaging text, but they can sometimes struggle with factual accuracy. This is because LLMs are trained on massive amounts of text data, which may contain inaccuracies or biases.

### What are the use cases of RAG ?

1. ***Question and answer chatbots***: Incorporating LLMs with chatbots allows them to automatically derive more accurate answers from company documents and knowledge bases. Chatbots are used to automate customer support and website lead follow-up to answer questions and resolve issues quickly.

2. ***Search augmentation***: Incorporating LLMs with search engines that augment search results with LLM-generated answers can better answer informational queries and make it easier for users to find the information they need to do their jobs.

3. ***Knowledge engine*** — ask questions on your data (e.g., HR, compliance documents): Company data can be used as context for LLMs and allow employees to get answers to their questions easily, including HR questions related to benefits and policies and security and compliance questions.

### What are the benefits of RAG ?

The RAG approach has a number of key benefits, including:

1. ***Providing up-to-date and accurate responses***: RAG ensures that the response of an LLM is not based solely on static, stale training data. Rather, the model uses up-to-date external data sources to provide responses.
2. ***Reducing inaccurate responses, or hallucinations:*** By grounding the LLM's output in relevant external knowledge, RAG attempts to mitigate the risk of responding with incorrect or fabricated information (also known as hallucinations). Outputs can include citations of original sources, allowing human verification.
3. ***Providing domain-specific, relevant responses***: Using RAG, the LLM will be able to provide contextually relevant responses tailored to an organization's proprietary or domain-specific data.
4. ***Being efficient and cost-effective***: Compared to other approaches to customizing LLMs with domain-specific data, RAG is simple and cost-effective. Organizations can deploy RAG without needing to customize the model. This is especially beneficial when models need to be updated frequently with new data.


### Key phases in RAG

![redbricks.jpg](https://www.databricks.com/sites/default/files/inline-images/glossary-rag-image-2.png?v=1704903053)

## LlamaIndex basics


In [None]:
%pip install -q llama-index
%pip install -q llama-index-llms-gemini
%pip install -q google-generativeai
%pip install -q llama-index-embeddings-gemini
%pip install -q rich
%pip install -q requests
%pip install -q beautifulsoup4

### Concepts in Loading stage

[Nodes and Documents](https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/):

A **Document** is a container around any data source - for instance, a PDF, an API output, or data retrieved from a database.

A **Node** is the atomic unit of data in LlamaIndex and represents a "chunk" of a source Document. Nodes have metadata that relate them to the document they are in and to other nodes.



In [None]:
# from llama_index.core import Document, VectorStoreIndex
# from llama_index.core.node_parser import SentenceSplitter

# text_list = [
#     "LlamaIndex provides tools for beginners, advanced users, and everyone in between.",
#     "Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. For more complex applications, our lower-level APIs allow advanced users to customize and extend any module—data connectors, indices, retrievers, query engines, reranking modules—to fit their needs."
#   ]
# documents = [Document(text=t) for t in text_list]

# parser = SentenceSplitter()
# nodes = parser.get_nodes_from_documents(documents)

# print("Nodes", nodes)
# print("Documents", documents)


### Concepts in Indexing stage

[Indexes](https://docs.llamaindex.ai/en/stable/module_guides/indexing/)

Once you've ingested your data, LlamaIndex will help you index the data into a structure that's easy to retrieve. This usually involves generating vector embeddings which are stored in a specialized database called a vector store. Indexes can also store a variety of metadata about your data.



> Interested in learning more about indexes? See https://docs.llamaindex.ai/en/stable/module_guides/indexing/index_guide/



[Embeddings](https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/)

LLMs generate numerical representations of data called embeddings. When filtering your data for relevance, LlamaIndex will convert queries into embeddings, and your vector store will find data that is numerically similar to the embedding of your query.

![img](https://dkharazi.github.io/ecc71bb7c9e227b292dd909b02dbf4e8/embedding.svg)
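
"Numerically similar" is commonly measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors as stand-ins for real embeddings (which typically have hundreds of dimensions); the texts and numbers are assumptions chosen only to illustrate the lookup.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made vectors standing in for embeddings stored in a vector store.
store = {
    "Virat Kohli is a batsman": [0.9, 0.1, 0.0],
    "Mumbai Indians won the IPL": [0.1, 0.9, 0.1],
    "Paris is the capital of France": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of the query "Who is Kohli?"
query_embedding = [0.8, 0.2, 0.1]

best_match = max(
    store, key=lambda text: cosine_similarity(query_embedding, store[text])
)
print(best_match)
```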

In [None]:
# from llama_index.embeddings.gemini import GeminiEmbedding
# from google.colab import userdata

# gemini_api_key = userdata.get("GOOGLE_API_KEY")
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")
# embedding = gemini_embedding_model.get_text_embedding("Hello, world!")

# print(embedding)

### Concepts in querying stage

[Retrievers](https://docs.llamaindex.ai/en/stable/module_guides/querying/retriever/)

A retriever defines how to efficiently retrieve relevant context from an index when given a query. Your retrieval strategy is key to the relevancy of the data retrieved and the efficiency with which it's done.

[Routers](https://docs.llamaindex.ai/en/stable/module_guides/querying/router/)

A router determines which retriever will be used to retrieve relevant context from the knowledge base. More specifically, the RouterRetriever class is responsible for selecting one or multiple candidate retrievers to execute a query. It uses a selector to choose the best option based on each candidate's metadata and the query.

[Node Postprocessors](https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/)

A node postprocessor takes in a set of retrieved nodes and applies transformations, filtering, or re-ranking logic to them.

[Response Synthesizers](https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/)

A response synthesizer generates a response from an LLM, using a user query and a given set of retrieved text chunks.
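
The last two stages can be sketched without any library. The example below assumes the retriever has already returned `(text, score)` pairs; `similarity_cutoff` and `compact_prompt` are hypothetical helper names, and the packing logic is a simplification loosely inspired by a "compact" style of synthesis (it simply stops when the budget is reached instead of making further LLM calls).

```python
def similarity_cutoff(scored_chunks, cutoff):
    """Postprocessing step: drop chunks below the score cutoff, best first."""
    kept = [(text, score) for text, score in scored_chunks if score >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def compact_prompt(query, scored_chunks, max_chars=200):
    """Synthesis step: pack as many chunks as fit into a single prompt."""
    context, used = [], 0
    for text, _score in scored_chunks:
        if used + len(text) > max_chars:
            break  # simplification: overflow is dropped rather than refined
        context.append(text)
        used += len(text)
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuery: {query}\nAnswer:"


# Pretend output of a retriever: (chunk text, similarity score)
retrieved = [
    ("Rahul Dravid coached India between 2021 and 2024.", 0.91),
    ("The first IPL season was played in 2008.", 0.42),
    ("Ravi Shastri coached India until 2021.", 0.78),
]

filtered = similarity_cutoff(retrieved, cutoff=0.5)
prompt = compact_prompt("Who coached India in 2022?", filtered)
print(prompt)
```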



In [None]:
# from llama_index.core.data_structs import Node
# from llama_index.core.schema import NodeWithScore
# from llama_index.core.response_synthesizers import ResponseMode
# from llama_index.core import get_response_synthesizer
# from llama_index.llms.gemini import Gemini
# from google.colab import userdata

# gemini_api_key = userdata.get("GOOGLE_API_KEY")
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# response_synthesizer = get_response_synthesizer(
#     response_mode=ResponseMode.COMPACT, llm=gemini_llm
# )

# response = response_synthesizer.synthesize(
#     "Who was the Indian cricket team coach in 2022?", nodes=[NodeWithScore(node=Node(text="Rahul Dravid was the Indian cricket team coach between 2021 and 2024"), score=1.0)]
# )

# print(response)

# Building a simple RAG Chatbot

## Data preparation



Let's scrape the biographies of some Indian cricket players from Wikipedia, which we'll use while building a RAG app.

We'll build a chatbot that answers queries about Indian cricket players based on the biographies we scrape from Wikipedia.

In [None]:
# import os
# import requests
# from bs4 import BeautifulSoup


# INDIAN_PLAYERS = [
#     "https://en.wikipedia.org/wiki/MS_Dhoni",
#     "https://en.wikipedia.org/wiki/Ruturaj_Gaikwad",
#     "https://en.wikipedia.org/wiki/Shubman_Gill",
#     "https://en.wikipedia.org/wiki/Prithvi_Shaw",
#     "https://en.wikipedia.org/wiki/Suryakumar_Yadav",
#     "https://en.wikipedia.org/wiki/Devdutt_Padikkal",
#     "https://en.wikipedia.org/wiki/Chetan_Sakariya",
#     "https://en.wikipedia.org/wiki/Shreyas_Iyer",
#     "https://en.wikipedia.org/wiki/Yashasvi_Jaiswal",
#     "https://en.wikipedia.org/wiki/Virat_Kohli",
#     "https://en.wikipedia.org/wiki/Rohit_Sharma",
#     "https://en.wikipedia.org/wiki/Shikhar_Dhawan",
#     "https://en.wikipedia.org/wiki/Lokesh_Rahul",
# ]


# def scrape_wiki(url):
#     response = requests.get(url)
#     soup = BeautifulSoup(response.content, "html.parser")
#     name = soup.find("h1", class_="firstHeading").text

#     para_tags = soup.find_all("p")
#     full_page_text = ""

#     for para in para_tags:
#         full_page_text += para.text

#     return name, full_page_text


# # Store player bio taken from Wikipedia in text files.
# # exist_ok=True and mode "w" let this cell be re-run without raising FileExistsError.
# os.makedirs("/content/player_wiki", exist_ok=True)

# for player in INDIAN_PLAYERS:
#     name, text = scrape_wiki(player)
#     with open(f"/content/player_wiki/{name}.txt", "w", encoding="utf-8") as f:
#         f.write(text)


# # Look around what information is scraped
# files = os.listdir("/content/player_wiki")
# for player_name in files[:10]:
#   file_path = f"/content/player_wiki/{player_name}"

#   file_content = None

#   with open(file_path) as f:
#     file_content = f.read()

#   print(player_name.removesuffix(".txt"))  # strip(".txt") would remove characters, not the suffix
#   print(file_content[:500], end="\n\n")


Let's also scrape information about IPL teams from Wikipedia, which we'll use alongside the player bios when building the RAG app.

In [None]:
# IPL_TEAMS = [
#     "https://en.wikipedia.org/wiki/Chennai_Super_Kings",
#     "https://en.wikipedia.org/wiki/Delhi_Capitals",
#     "https://en.wikipedia.org/wiki/Kolkata_Knight_Riders",
#     "https://en.wikipedia.org/wiki/Mumbai_Indians",
#     "https://en.wikipedia.org/wiki/Punjab_Kings",
#     "https://en.wikipedia.org/wiki/Rajasthan_Royals",
#     "https://en.wikipedia.org/wiki/Royal_Challengers_Bangalore",
#     "https://en.wikipedia.org/wiki/Sunrisers_Hyderabad",
# ]

# # Store team bio taken from Wikipedia in text files
# os.makedirs("/content/team_wiki", exist_ok=True)

# for team in IPL_TEAMS:
#     name, text = scrape_wiki(team)
#     with open(f"/content/team_wiki/{name}.txt", "w", encoding="utf-8") as f:
#         f.write(text)

## Building a simple Q&A bot

We'll now look at how to build a Q&A bot over the data we scraped above.

To do this, we'll use LlamaIndex together with the Gemini API, which generates both the embeddings and the responses for user queries.

In [None]:
# from llama_index.llms.gemini import Gemini
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from google.colab import userdata
# from llama_index.core import Settings

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm

# # Read the data from the directory
# reader = SimpleDirectoryReader(input_dir="/content/player_wiki/")
# documents = reader.load_data()

# # Create an in-memory vector store index from the documents
# index = VectorStoreIndex.from_documents(documents=documents)

# query_engine = index.as_query_engine(similarity_top_k=3)

# while True:
#   query = input("Enter your query: ")
#   if query == "q":
#     break

#   response = query_engine.query(query)
#   print(f"Response: {response}")

## Limitations of this Q&A bot




*   Can't handle multi-part or complex queries effectively. Try asking it something like `With which team did Rohit Sharma win IPL titles, and what is his highest ODI score?`

*   Doesn't remember the previous queries and answers, and hence can't handle follow-up questions.



## Simple Q&A bot with conversation history


As highlighted above, the simple Q&A bot we've built can't remember its previous conversations.

We'll now look at how to add conversational history to the Q&A bot, and make it a full-fledged chatbot that you can have natural conversations with.
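
Conceptually, conversation history means carrying earlier turns into the next prompt. The sketch below is a library-free illustration of that idea; `ChatMemory` is a hypothetical class written for this example, not a LlamaIndex API, and the cap on stored messages is an assumed way to keep the prompt bounded.

```python
from collections import deque


class ChatMemory:
    """Keeps only the most recent messages so the prompt stays bounded."""

    def __init__(self, max_messages=4):
        # deque with maxlen silently discards the oldest entry when full
        self.messages = deque(maxlen=max_messages)

    def add(self, role, content):
        self.messages.append((role, content))

    def as_prompt(self, new_query):
        """Render prior turns followed by the new user query."""
        lines = [f"{role}: {content}" for role, content in self.messages]
        lines.append(f"user: {new_query}")
        lines.append("assistant:")
        return "\n".join(lines)


memory = ChatMemory(max_messages=4)
memory.add("user", "Who is MS Dhoni?")
memory.add("assistant", "MS Dhoni is an Indian cricketer.")

# The follow-up says "he"; the earlier turns give the model what it needs
# to resolve the reference.
follow_up_prompt = memory.as_prompt("Which IPL team does he play for?")
print(follow_up_prompt)
```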

🚀 Let's get started !

In [None]:
# from llama_index.llms.gemini import Gemini
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from google.colab import userdata
# from llama_index.core import Settings
# from llama_index.core.llms import ChatMessage, MessageRole

# messages = []
# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm

# # Read the data from the directory
# # reader = SimpleDirectoryReader(input_dir="/content/player_wiki/")
# # documents = reader.load_data()

# # # Create a in-memory vector store index from the documents
# # index = VectorStoreIndex.from_documents(documents=documents)
# chat_engine = index.as_chat_engine()

# while True:
#   query = input("Human: ")
#   if query == "q":
#     break

#   response = chat_engine.chat(query, chat_history=messages)

#   # Store the messages in an in-memory list. This can be persisted to a DB as well.
#   # str(response) keeps only the reply text rather than the whole response object.
#   messages.append(ChatMessage(role=MessageRole.USER, content=query))
#   messages.append(ChatMessage(role=MessageRole.ASSISTANT, content=str(response)))

#   print(f"AI: {response}")

With the above, we now have a chatbot that can answer questions in natural language based on the data we've collected.

Now let's proceed with augmenting this chatbot with capabilities to tackle advanced use-cases.



---



# Advanced strategies in RAG

## Strategies while indexing data

### Metadata extraction

In many cases, especially with long documents, a chunk of text may lack the context necessary to disambiguate the chunk from other similar chunks of text.

One way to address this is to manually label each chunk in our dataset or knowledge base, but that quickly becomes impractical as the data grows. Instead, we can use LLMs to extract metadata that helps differentiate chunks from each other.

We can have LLMs:

*   Extract specific metadata we're interested in (based on the domain of the documents)
*   Extract Q&A pairs from chunks, which we can use while querying.
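
One way extracted metadata can help is by being attached to the chunk text before it is embedded, so that similar-looking chunks end up with distinguishable representations. The sketch below illustrates that idea only; the chunk text, the metadata keys, and the `text_for_embedding` helper are all invented for this example.

```python
def text_for_embedding(chunk_text, metadata):
    """Prefix the chunk with its metadata so the embedding captures both."""
    header = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    return f"{header}\n\n{chunk_text}"


# Invented sample chunk: on its own it does not say who "he" is.
chunk = "He made his debut that season and took three wickets."

# Hypothetical metadata an LLM-based extractor might have produced.
metadata = {
    "player": "Example Player",
    "questions_this_excerpt_can_answer": "When did the player debut?",
}

enriched = text_for_embedding(chunk, metadata)
print(enriched)
```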



Let's see Q&A metadata extraction in action

In [None]:
# from llama_index.core.node_parser import TokenTextSplitter
# from llama_index.core.extractors import (
#     SummaryExtractor,
#     QuestionsAnsweredExtractor,
# )
# from llama_index.core.schema import MetadataMode
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from llama_index.core.ingestion import IngestionPipeline
# from llama_index.llms.gemini import Gemini
# from llama_index.embeddings.gemini import GeminiEmbedding
# from google.colab import userdata
# from llama_index.core import Settings
# import json
# import nest_asyncio

# nest_asyncio.apply()

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Load the data from the folder
# reader = SimpleDirectoryReader(input_files=["/content/player_wiki/Chetan Sakariya.txt"])
# documents = reader.load_data()

# # Split the data into chunks based on empty space
# node_parser = TokenTextSplitter(
#     separator=" ", chunk_size=256, chunk_overlap=128
# )
# original_nodes = node_parser.get_nodes_from_documents(documents[:1])

# # Initialise the Gemini LLM that the Q&A extractor will use
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# Settings.llm = gemini_llm
# Settings.embed_model = gemini_embedding_model


# question_nodes = QuestionsAnsweredExtractor(questions=3, llm=gemini_llm, num_workers=1).extract(nodes=original_nodes)

# for i in range(len(original_nodes)):
#   print(f"Node #{i} Content: {original_nodes[i].get_content()}", end="\n")
#   print(f"Extracted Questions: {question_nodes[i]}", end="\n\n\n")

### Semantic chunking

Semantic chunking splits text based on embedding similarity: sentences that are close together in embedding space (that is, sentences with similar meaning) are kept in the same chunk.

![img](https://miro.medium.com/v2/resize:fit:1400/1*mjbjx2U4d-H21jbCSJZcUA.png)

**How does it work ?**

- Starts by splitting the text by sentences, and computes embedding for them.
- Compares the first sentence's embedding with the second, and if they're within a given threshold, they're kept together.
- If the embedding score is not within the threshold, they're separated.
- This is repeated, until the entire text is traversed and processed.
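
The steps above can be sketched in a few lines of plain Python. This is a simplified illustration with a fixed similarity threshold and hand-made 2-dimensional vectors standing in for real sentence embeddings; library implementations may choose breakpoints differently (for example, with a percentile-based threshold).

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_chunks(sentences, embeddings, threshold=0.8):
    """Group consecutive sentences whose embeddings stay above the threshold."""
    chunks = [[sentences[0]]]
    for prev, curr, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        if cosine(prev, curr) >= threshold:
            chunks[-1].append(sentence)  # similar enough: same chunk
        else:
            chunks.append([sentence])  # topic shift: start a new chunk
    return [" ".join(chunk) for chunk in chunks]


sentences = [
    "Padikkal is a left-handed opener.",
    "He bats at the top of the order.",
    "Bengaluru has a mild climate.",
]
# Toy embeddings: the first two point the same way, the third does not.
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]

chunks = semantic_chunks(sentences, toy_embeddings)
print(chunks)
```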

Let's see how simple chunking compares with semantic chunking.

In [None]:
# from llama_index.core import SimpleDirectoryReader
# from llama_index.core.node_parser import (
#     SentenceSplitter,
#     SemanticSplitterNodeParser,
# )
# from llama_index.embeddings.gemini import GeminiEmbedding
# from google.colab import userdata
# from llama_index.core import Settings

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/text-embedding-004")

# # Mark the embedding model to be used as default
# Settings.embed_model = gemini_embedding_model

# # Load document
# documents = SimpleDirectoryReader(input_files=["/content/player_wiki/Devdutt Padikkal.txt"]).load_data()

# # Setup Semantic Splitter
# semantic_splitter = SemanticSplitterNodeParser(
#     buffer_size=1, breakpoint_percentile_threshold=95, embed_model=gemini_embedding_model
# )

# # Setup baseline splitter
# base_splitter = SentenceSplitter(chunk_size=128, chunk_overlap=64)

# semantic_nodes = semantic_splitter.get_nodes_from_documents(documents, show_progress=True)

# base_nodes = base_splitter.get_nodes_from_documents(documents)

# # Compare both
# print(len(semantic_nodes))
# print(len(base_nodes))

# print("Semantic Nodes")
# index_count = 0
# for semantic_node in semantic_nodes:
#   index_count += 1
#   print(f"Node #{index_count}")
#   print(semantic_node.get_content(), end="\n")

# print("Base nodes")
# index_count = 0
# for base_node in base_nodes:
#   index_count += 1
#   print(f"Node #{index_count}")
#   print(base_node.get_content(), end="\n")

## Strategies while querying

### Routing

Routers are modules that take in a user query and a set of "choices" (defined by metadata), and return one or more selected choices.

They are simple but powerful modules that use LLMs for decision making capabilities. They can be used for the following use cases and more:

- Selecting the right data source among a diverse range of data sources
- Deciding whether to do summarization (e.g. using summary index query engine) or semantic search (e.g. using vector index query engine)

![img](https://miro.medium.com/v2/resize:fit:1400/0*Ado23RkTTpNFRO0b)
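
To show the shape of the routing decision, here is a toy selector that picks a tool by word overlap between the query and each tool's description. This is a deliberately crude stand-in: a real router asks an LLM to make the choice. The tool names and the `select_tool` helper are assumptions for this sketch.

```python
import re


def words(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tool(query, tools):
    """Pick the tool whose description overlaps most with the query."""
    return max(tools, key=lambda name: len(words(query) & words(tools[name])))


# Tool name -> description (the "metadata" the selector decides on)
tools = {
    "player_tool": "Useful for queries that are about specific cricket players.",
    "team_tool": "Useful for queries that are about IPL teams.",
}

choice = select_tool("Which IPL teams have won the most titles?", tools)
print(choice)
```

The quality of the descriptions matters: the selector, whether keyword-based or LLM-based, has nothing else to go on.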

In [None]:
# from llama_index.llms.gemini import Gemini
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from google.colab import userdata
# from llama_index.core import Settings
# from llama_index.core.tools import QueryEngineTool
# from llama_index.core.query_engine import RouterQueryEngine
# from llama_index.core.selectors import LLMSingleSelector

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001", embed_batch_size=50)

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm

# # Read the player data from the directory
# reader = SimpleDirectoryReader(input_dir="/content/player_wiki/")
# player_documents = reader.load_data()

# # Create an in-memory vector store index from the player documents
# player_index = VectorStoreIndex.from_documents(documents=player_documents[:1])
# player_query_engine = player_index.as_query_engine()

# # Read the team data from the directory
# team_reader = SimpleDirectoryReader(input_dir="/content/team_wiki/")
# team_documents = team_reader.load_data()

# # Create an in-memory vector store index from the team documents
# team_index = VectorStoreIndex.from_documents(documents=team_documents[:1])
# team_query_engine = team_index.as_query_engine()

# player_tool = QueryEngineTool.from_defaults(
#     query_engine=player_query_engine,
#     description=(
#         "Useful for queries that are about specific cricket players."
#     ),
# )

# team_tool = QueryEngineTool.from_defaults(
#     query_engine=team_query_engine,
#     description=(
#         "Useful for queries that are about IPL Teams."
#     ),
# )

# query_engine = RouterQueryEngine(
#     selector=LLMSingleSelector.from_defaults(),
#     query_engine_tools=[
#         player_tool,
#         team_tool,
#     ],
# )

# response = query_engine.query("Who is Chetan Sakariya?")
# print(response)


### Sub-question query engine

In simple terms, a sub-question query engine takes in a complex query, breaks it down into smaller sub-queries, searches for an answer to each sub-query, and then generates a final response based on the intermediate answers.

![img](https://docs.llamaindex.ai/en/v0.10.19/_images/multi_step_diagram.png)
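
The decompose, answer, and combine flow can be sketched as follows. Everything here is a stand-in: `decompose` naively splits on " and " where a real engine would ask an LLM to generate sub-questions, the dictionary lookup replaces a query engine, and the team and player names are placeholders.

```python
def decompose(query):
    """Stand-in for the LLM step: split a compound question on ' and '."""
    parts = [part.strip(" ?") for part in query.split(" and ")]
    return [part + "?" for part in parts if part]


def answer_sub_question(sub_question, knowledge):
    """Stand-in for a query engine: look the answer up in a dict."""
    return knowledge.get(sub_question, "unknown")


def sub_question_query(query, knowledge):
    """Answer each sub-question, then combine the intermediate answers."""
    sub_questions = decompose(query)
    answers = [answer_sub_question(q, knowledge) for q in sub_questions]
    # A real engine would ask the LLM to synthesise these into one response.
    return " ".join(answers)


# Placeholder knowledge keyed by sub-question
knowledge = {
    "Who captains team A?": "Player X captains team A.",
    "where does team A play home games?": "Team A plays at Stadium Y.",
}

final = sub_question_query(
    "Who captains team A and where does team A play home games?", knowledge
)
print(final)
```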

In [None]:
# from llama_index.llms.gemini import Gemini
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from google.colab import userdata
# from llama_index.core import Settings
# from llama_index.core.tools import QueryEngineTool
# from llama_index.core.query_engine import SubQuestionQueryEngine

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm

# # Read the player data from the directory
# reader = SimpleDirectoryReader(input_dir="/content/player_wiki/")
# player_documents = reader.load_data()

# # Create an in-memory vector store index from the player documents
# player_index = VectorStoreIndex.from_documents(documents=player_documents[:1])
# player_query_engine = player_index.as_query_engine()

# query_engine_tools = [
#     QueryEngineTool.from_defaults(
#         query_engine=player_query_engine,
#         description="Useful for answering queries about cricket players"
#     ),
# ]

# query_engine = SubQuestionQueryEngine.from_defaults(
#     query_engine_tools=query_engine_tools,
#     use_async=False
# )

# response = query_engine.query(
#     "How was Chetan Sakariya's life before and after his IPL debut?"
# )

# print(response)

## Strategies while answering queries

### Setting up guard rails

When building a bot that can answer queries, it is important to set up guardrails: rules that specify how the bot should interact with the user.

So far, we haven't done this. Go to any of the previous code examples and try asking ***what is the distance between the moon and the earth?***, and it'll answer.

When we deploy such a bot in customer-facing scenarios, we don't want it to answer queries like these. We want it to answer only the queries that we're interested in.

Let's see how we can do this.
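
A prompt-level guardrail is just an extra instruction in the text the LLM receives. The library-free sketch below renders such a template with plain string formatting so the final prompt is visible; the template wording and the `render` helper are assumptions for illustration, and whether the model obeys the instruction still depends on the model.

```python
GUARDED_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "You must only answer queries about cricket. "
    "If the user asks about something else, say: I'm sorry, I can't answer that.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def render(template, context_str, query_str):
    """Fill the retrieved context and the user query into the template."""
    return template.format(context_str=context_str, query_str=query_str)


rendered = render(
    GUARDED_TEMPLATE,
    context_str="MS Dhoni is an Indian cricketer.",
    query_str="what is 1+1 ?",
)
print(rendered)  # the off-topic query now arrives together with the rule
```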

In [None]:
# from llama_index.llms.gemini import Gemini
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from google.colab import userdata
# from llama_index.core import Settings
# from llama_index.core import PromptTemplate

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm

# # Read the player data from the directory
# reader = SimpleDirectoryReader(input_dir="/content/player_wiki/")
# player_documents = reader.load_data()

# # Create an in-memory vector store index from the player documents
# player_index = VectorStoreIndex.from_documents(documents=player_documents[:1])
# player_query_engine = player_index.as_query_engine()

# response = player_query_engine.query("what is 1+1 ?")

# print("LLM Response: ", response)

# # Implement guard-rail in prompt to avoid the LLM answering a different query
# qa_prompt_tmpl_str = (
#     "Context information is below.\n"
#     "---------------------\n"
#     "{context_str}\n"
#     "---------------------\n"
#     "Given the context information and not prior knowledge, "
#     "answer the query.\n"
#     "You must only answer queries that are about cricket. If the user asks about something else, say: I'm sorry, I can't answer that.\n"
#     "Query: {query_str}\n"
#     "Answer: "
# )
# qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
# player_query_engine.update_prompts(
#     {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
# )

# response = player_query_engine.query("what is 1+1 ?")
# print("LLM Response after setting up guardrails: ", response)



> Still not convinced that we need to set up guardrails? See this: https://www.reddit.com/r/OpenAI/comments/18kjwcj/why_pay_indeed/



### Augmenting context with few-shot examples

In [None]:
# from llama_index.llms.gemini import Gemini
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from google.colab import userdata
# from llama_index.core import Settings
# from llama_index.core import PromptTemplate

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm

# # Read the player data from the directory
# reader = SimpleDirectoryReader(input_dir="/content/player_wiki/")
# player_documents = reader.load_data()

# # Create an in-memory vector store index from the player documents
# player_index = VectorStoreIndex.from_documents(documents=player_documents[:1])
# player_query_engine = player_index.as_query_engine()

# def few_shot_examples(**kwargs):
#   return  """
#     Question: Which team did Chetan Sakariya play for?
#     Answer: He played for Saurashtra in the Ranji Trophy.

#     Question: Which IPL teams did Sakariya play for?
#     Answer: He played for Rajasthan Royals and Delhi Capitals.
#   """

# qa_prompt_tmpl_str = """\
# Context information is below.
# ---------------------
# {context_str}
# ---------------------
# Given the context information and not prior knowledge, \
# answer the query.
# Please follow the style of the examples given below.

# {few_shot_examples}

# Query: {query_str}
# Answer: \
# """

# qa_prompt_tmpl = PromptTemplate(
#     qa_prompt_tmpl_str,
#     function_mappings={"few_shot_examples": few_shot_examples},
# )

# player_query_engine.update_prompts(
#     {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
# )


# response = player_query_engine.query("Teams that Sakariya played for ?")

# print(response)



---



# Advanced RAG

## Agents with reasoning capabilities - ReAct framework

***ReAct*** is a technique that enables LLMs to reason and take task-specific actions. It combines chain-of-thought reasoning with action planning: the LLM generates reasoning traces and task-specific actions in an interleaved manner, leveraging the synergy between them. The approach has been reported to outperform baselines on various tasks and to reduce issues such as hallucination and error propagation.

![img](https://statusneo.com/wp-content/uploads/2024/01/fe9fa1ac-dfde-4d91-8b5b-4497b742c414_1400x686.jpg)
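
The Thought, Action, Observation cycle can be illustrated without an LLM. In the sketch below, a hard-coded `SCRIPT` plays the role of the model's successive outputs and two functions return canned observations; all names and facts are placeholders, and a real agent would generate each step from the trace so far.

```python
def team_tool(question):
    return "Team A is captained by Player X."  # canned observation


def player_tool(question):
    return "Player X debuted in 2019."  # canned observation


TOOLS = {"team_tool": team_tool, "player_tool": player_tool}

# Scripted stand-in for the LLM: each step is (thought, action, action_input).
# An action of None means the model is ready to give its final answer.
SCRIPT = [
    ("I need to find the captain first.", "team_tool", "Who captains Team A?"),
    ("Now I need the captain's career.", "player_tool", "Career of Player X?"),
    ("I have enough information.", None, "Player X captains Team A and debuted in 2019."),
]


def react_loop(script, tools):
    """Alternate reasoning and tool calls until a final answer is produced."""
    trace = []
    for thought, action, action_input in script:
        trace.append(f"Thought: {thought}")
        if action is None:
            trace.append(f"Answer: {action_input}")
            return action_input, trace
        observation = tools[action](action_input)
        trace.append(f"Action: {action}({action_input!r})")
        trace.append(f"Observation: {observation}")
    return None, trace


answer, trace = react_loop(SCRIPT, TOOLS)
print("\n".join(trace))
```

Each observation is fed back before the next thought, which is what lets the second tool call depend on the result of the first.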

In [None]:
# from llama_index.core.agent import ReActAgent
# from llama_index.core.tools import  QueryEngineTool
# from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# from google.colab import userdata
# from llama_index.core import Settings
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.llms.gemini import Gemini

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise the Gemini Pro LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.0-pro", api_key=gemini_api_key)

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm

# # Read the player data from the directory
# reader = SimpleDirectoryReader(input_files=["/content/player_wiki/Ruturaj Gaikwad.txt"])
# player_documents = reader.load_data()

# # Create an in-memory vector store index from the player documents
# player_index = VectorStoreIndex.from_documents(documents=player_documents)
# player_query_engine = player_index.as_query_engine()

# # Read the team data from the directory
# team_reader = SimpleDirectoryReader(input_files=["/content/team_wiki/Chennai Super Kings.txt"])
# team_documents = team_reader.load_data()

# # Create an in-memory vector store index from the team documents
# team_index = VectorStoreIndex.from_documents(documents=team_documents)
# team_query_engine = team_index.as_query_engine()

# player_query_tool = QueryEngineTool.from_defaults(
#     query_engine=player_query_engine,
#     name="player_query_tool",
#     description=(
#         "Useful for queries that are about specific cricket players"
#     ),
# )

# team_query_tool = QueryEngineTool.from_defaults(
#     query_engine=team_query_engine,
#     name="team_query_tool",
#     description=(
#         "Useful for queries that are about IPL Teams"
#     ),
# )

# agent = ReActAgent.from_tools([player_query_tool, team_query_tool], verbose=True)

# response = agent.chat("Can you tell me who's the new captain of CSK, and share a brief history of his playing career")
# # response = agent.chat("Compare and contrast the performance of CSK before and after the 2021 season.")

# print(response)

# Building apps with ease with LlamaIndex

## Chat with your PDF

### Install dependencies

In [None]:
%pip install pymupdf

### Downloading a sample PDF

In [None]:
!wget -O test.pdf https://arxiv.org/pdf/2406.11652

### Let's write some code

In [None]:
# from llama_index.core import Settings, VectorStoreIndex
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.llms.gemini import Gemini
# from google.colab import userdata
# from llama_index.readers.file import PyMuPDFReader

# # from llama_index.readers.file.unstructured import UnstructuredReader

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash", api_key=gemini_api_key)

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm


# loader = PyMuPDFReader()
# documents = loader.load(file_path="/content/test.pdf")

# index = VectorStoreIndex.from_documents(documents=documents)

# query_engine = index.as_query_engine()

# response = query_engine.query("What Performance indicators were used ?")

# print(response)

## Building a simple copilot for your website

In [None]:
%pip install llama-index-readers-web

In [None]:
# from llama_index.core import Settings, VectorStoreIndex
# from llama_index.readers.web import SimpleWebPageReader
# from google.colab import userdata
# from llama_index.embeddings.gemini import GeminiEmbedding
# from llama_index.llms.gemini import Gemini

# gemini_api_key = userdata.get("GOOGLE_API_KEY")

# # Initialise Gemini Flash LLM to be used for generating answers
# gemini_llm = Gemini(model="models/gemini-1.5-flash-latest", api_key=gemini_api_key)

# # Initialise the gemini embedding model to be used for search
# gemini_embedding_model = GeminiEmbedding(api_key=gemini_api_key, model_name="models/embedding-001")

# # Mark the LLM and embedding model to be used as default
# Settings.embed_model = gemini_embedding_model
# Settings.llm = gemini_llm


# documents = SimpleWebPageReader(html_to_text=True).load_data(
#     ["https://www.thecricketmonthly.com/story/1442710/baracara-s-speed-demon--the-shamar-joseph-story"]
# )


# index = VectorStoreIndex.from_documents(documents)
# query_engine = index.as_query_engine()

# response = query_engine.query("Who is Shamar Joseph?")

# print(response)


# Q&A

![test.png](https://thumb.ac-illust.com/86/862a3ccb34edb9da6cd42cffbeeb0bfa_t.jpeg)


