<h3 style='color:#1D3E06'><b>TABLE OF CONTENTS</b></h3>

* [1. Initialization](#1)
* [2. Importing the required libraries](#2)
* [3. Data Crawling and Cleaning](#3)
  * [3.1 Data Crawling](#3.1)
  * [3.2 Data Cleaning](#3.2)
* [4. Chunking and Vector Embeddings](#4)
  * [4.1 LLM Loading](#4.1)
  * [4.2 Data Loading and Chunking](#4.2)
  * [4.3 Data Embedding and Vector Store](#4.3)
* [5. Query Engine](#5)
  * [5.1 Public Testcases](#5.1)
  * [5.2 Router Query Engine](#5.2)
  * [5.3 User Interface Demo](#5.3)
* [6. Conclusion](#6)

<a id="1"></a>
## <div style="text-align: left; background-color:#DEF5B9; font-family: Trebuchet MS; color:#1D3E06; padding: 15px; line-height:1;border-radius:1px; margin-bottom: 0em; text-align: center; font-size: 25px;border-style: solid;border-color: dark green">1. Initialization  </div>

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


In [2]:
!nvidia-smi

Fri Jul 26 11:11:51 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [3]:
!pip install llama-index llama-index-llms-huggingface llama-index-vector-stores-qdrant llama-index-embeddings-huggingface transformers



In [4]:
!pip install accelerate bitsandbytes



In [5]:
!pip install flash_attn



In [6]:
!pip install gradio



<a id="2"></a>
## <div style="text-align: left; background-color:#DEF5B9; font-family: Trebuchet MS; color:#1D3E06; padding: 15px; line-height:1;border-radius:1px; margin-bottom: 0em; text-align: center; font-size: 25px;border-style: solid;border-color: dark green">2. Importing the required libraries 📚</div>

In [7]:
import os
import re
import torch
import requests
import getpass
from bs4 import BeautifulSoup
from tqdm import tqdm
import pandas as pd
from time import time
import gradio as gr
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, Document, VectorStoreIndex, StorageContext
from llama_index.core.prompts import PromptTemplate
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.response.notebook_utils import display_response
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.agent import ReActAgent, FunctionCallingAgentWorker, AgentRunner
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore




<a id="3"></a>
## <div style="text-align: left; background-color:#DEF5B9; font-family: Trebuchet MS; color:#1D3E06; padding: 15px; line-height:1;border-radius:1px; margin-bottom: 0em; text-align: center; font-size: 25px;border-style: solid;border-color: dark green">3. Data Crawling and Cleaning 🛠️</div>

<a id="3.1"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">3.1 Data Crawling 🍷</div>

In [8]:
# Set up headers to mimic a browser request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Making a GET request with headers
try:
    r = requests.get('https://www.llamaindex.ai/blog', headers=headers, timeout=10)
    r.raise_for_status()  # Raise an exception for bad status codes

    # Print status code
    print(f"Status Code: {r.status_code}")

    # Parsing the HTML
    soup = BeautifulSoup(r.content, 'html.parser')

    # Print the title of the page
    print(f"Page Title: {soup.title.string if soup.title else 'No title found'}")

except requests.RequestException as e:
    print(f"An error occurred: {e}")

Status Code: 200
Page Title: Blog — LlamaIndex, Data Framework for LLM Applications


In [9]:
# Find all blog post cards
blog_cards = soup.find_all('div', class_='CardBlog_card__mm0Zw')
base_url = "https://www.llamaindex.ai"

# Extract and print the main text from each card
for card in blog_cards:
    # Extract title
    title_element = card.find('p', class_='CardBlog_title__qC51U').find('a')
    title = title_element.text.strip()
    url = base_url + title_element['href']

    # Extract publication date
    date = card.find('p', class_='Text_text__zPO0D Text_text-size-16__PkjFu').text.strip()

    # Print the extracted information
    print(f"Title: {title}")
    print(f"Date: {date}")
    print(f"URL: {url}")
    print("---")

Title: Introducing LlamaExtract Beta: structured data extraction in just a few clicks
Date: Jul 25, 2024
URL: https://www.llamaindex.ai/blog/introducing-llamaextract-beta-structured-data-extraction-in-just-a-few-clicks
---
Title: LlamaIndex Newsletter 2024-07-23
Date: Jul 23, 2024
URL: https://www.llamaindex.ai/blog/llamaindex-newsletter-2024-07-23
---
Title: Improving Vector Search - Reranking with PostgresML and LlamaIndex
Date: Jul 19, 2024
URL: https://www.llamaindex.ai/blog/improving-vector-search-reranking-with-postgresml-and-llamaindex
---
Title: The latest updates to LlamaCloud
Date: Jul 19, 2024
URL: https://www.llamaindex.ai/blog/the-latest-updates-to-llamacloud
---
Title: Case Study: How Scaleport.ai Accelerated Development and Improved Sales with LlamaCloud
Date: Jul 17, 2024
URL: https://www.llamaindex.ai/blog/case-study-how-scaleport-ai-accelerated-development-and-improved-sales-with-llamacloud
---
Title: Building a multi-agent concierge system
Date: Jul 17, 2024
URL: htt

In [10]:
with open('llama_blog.txt', 'w') as f:
    for card in blog_cards:
        title_element = card.find('p', class_='CardBlog_title__qC51U').find('a')
        title = title_element.text.strip()
        url = base_url + title_element['href']
        date = card.find('p', class_='Text_text__zPO0D Text_text-size-16__PkjFu').text.strip()
        f.write(f"Title: {title}\n")
        f.write(f"Date: {date}\n")
        f.write(f"URL: {url}\n")
        f.write("---\n")

In [11]:
df = pd.DataFrame(columns=['source', 'title', 'url', 'date', 'content'])

for card in blog_cards:
    title_element = card.find('p', class_='CardBlog_title__qC51U').find('a')
    title = title_element.text.strip()
    source = title_element['href']
    url = base_url + title_element['href']
    date = card.find('p', class_='Text_text__zPO0D Text_text-size-16__PkjFu').text.strip()

    df.loc[len(df)] = [source, title, url, date, '']

df.to_csv('llama_blog.csv', index=False)

In [12]:
def crawl_website(url):
    # Set up headers to mimic a browser request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
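    # A timeout keeps the call from hanging on an unresponsive server
    response = requests.get(url, headers=headers, timeout=10)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        content = soup.find('div', class_='BlogPost_htmlPost__Z5oDL')
        if content is None:
            # The post container was not found (the page layout may have changed)
            print("Post body not found.")
            return
        all_tags = content.find_all()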

        tags_to_extract = ['h1', 'h2', 'h3', 'p', 'li', 'code', 'blockquote', 'em']

        for tag in all_tags:
          if tag.name in tags_to_extract:
            if tag.name == 'h1':
              print(f"Title: {tag.text.strip()}")
            elif tag.name == 'h2':
              print(f"\n\nSection: {tag.text.strip()}:")
            elif tag.name == 'h3':
              print(f"\n\nSubsection: {tag.text.strip()}:")
            elif tag.name == 'li':
              print(f"- {tag.text.strip()}")
            elif tag.name == 'code':
              print(f"\n{tag.text.strip()}")
            else:
              print(tag.text.strip())
    else:
        print(f"Failed to retrieve the page. Status code: {response.status_code}")

url = "https://www.llamaindex.ai/blog/introducing-llama-agents-a-powerful-framework-for-building-production-multi-agent-ai-systems"
crawl_website(url)

We're excited to announce the alpha release of llama-agents, a new open-source framework designed to simplify the process of building, iterating, and deploying multi-agent AI systems and turn your agents into production microservices. Whether you're working on complex question-answering systems, collaborative AI assistants, or distributed AI workflows, llama-agents provides the tools and structure you need to bring your ideas to life.

llama-agents


Section: Key Features of llama-agents:
- Distributed Service Oriented Architecture: every agent in LlamaIndex can be its own independently running microservice, orchestrated by a fully customizable LLM-powered control plane that routes and distributes tasks.
- Communication via standardized API interfaces: interface between agents using a central control plane orchestrator. Pass messages between agents using a message queue.
- Define agentic and explicit orchestration flows: developers have the flexibility to directly define the sequence o

In [13]:
# Extract the content of the first blog post
first_post_url = "https://www.llamaindex.ai/blog/testing-anthropic-claudes-100k-token-window-on-sec-10-k-filings-473310c20dba"

try:
    r = requests.get(first_post_url, headers=headers, timeout=10)
    r.raise_for_status()  # Raise an exception for bad status codes

    soup = BeautifulSoup(r.content, 'html.parser')
    content = soup.find('div', class_='BlogPost_htmlPost__Z5oDL').text.strip()


    with open('temp.txt', 'w') as f:
        f.write(content)


except requests.RequestException as e:
    print(f"An error occurred: {e}")

In [14]:
# Extract the content of all blog posts
for index in tqdm(df.index):
    url = df['url'][index]
    try:
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()  # Raise an exception for bad status codes

        # Parsing the HTML
        soup = BeautifulSoup(r.content, 'html.parser')
        content = soup.find('div', class_='BlogPost_htmlPost__Z5oDL')
        if content is None:
            # Skip posts whose body container is missing (e.g. a changed page layout)
            continue
        all_tags = content.find_all()

        tags_to_extract = ['h1', 'h2', 'h3', 'p', 'li', 'code', 'blockquote', 'em']
        text = ""

        # Extract the content from the tags
        for tag in all_tags:
          if tag.name in tags_to_extract:
            if tag.name == 'h1':
              text = text + f"\n\nSection: {tag.text.strip()}:\n"
            elif tag.name == 'h2':
              text = text + f"\n\nSubsection: {tag.text.strip()}:\n"
            elif tag.name == 'h3':
              text = text + f"\n\nSubSubsection: {tag.text.strip()}:\n"
            elif tag.name == 'li':
              text = text + f"- {tag.text.strip()}\n"
            elif tag.name == 'code':
              text = text + f"\n{tag.text.strip()}\n"
            else:
              text = text + tag.text.strip() + "\n"

        # Update the DataFrame with the extracted content
        df.at[index, 'content'] = text

    except requests.RequestException as e:
        print(f"An error occurred: {e}")

# Save the DataFrame to a CSV file
df.to_csv('llama_blog.csv', index=False)

100%|██████████| 161/161 [00:52<00:00,  3.09it/s]
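
The tag-by-tag branching in the crawl loop above can also be written as a lookup table. The sketch below is one possible refactoring, not part of the original notebook: `TAG_TEMPLATES`, `format_tag`, and `render` are hypothetical names, and the input is a list of `(tag_name, text)` pairs rather than BeautifulSoup tags so that it runs without network access. The caller is still assumed to filter tags against `tags_to_extract` first.

```python
# Hypothetical lookup table: HTML tag name -> output template used by the crawl loop.
TAG_TEMPLATES = {
    "h1": "\n\nSection: {}:\n",
    "h2": "\n\nSubsection: {}:\n",
    "h3": "\n\nSubSubsection: {}:\n",
    "li": "- {}\n",
    "code": "\n{}\n",
}
DEFAULT_TEMPLATE = "{}\n"  # fallback for p, blockquote, em


def format_tag(name: str, text: str) -> str:
    """Return the formatted piece of text for one extracted tag."""
    template = TAG_TEMPLATES.get(name, DEFAULT_TEMPLATE)
    return template.format(text.strip())


def render(tags) -> str:
    """Concatenate formatted pieces; `tags` is an iterable of (tag_name, text) pairs."""
    return "".join(format_tag(name, text) for name, text in tags)


# Made-up sample input, for illustration only.
sample = [("h1", " Overview "), ("p", "Some text."), ("li", "first item")]
print(render(sample))
```

Keeping the templates in one dictionary means a new tag type needs one new entry rather than another `elif` branch.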


<a id="3.2"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">3.2 Data Cleaning 🧹</div>

In [15]:
# Load the CSV file
df = pd.read_csv('llama_blog.csv')
df.tail()

Unnamed: 0,source,title,url,date,content
156,/blog/using-llms-for-retrieval-and-reranking-2...,Using LLM’s for Retrieval and Reranking,https://www.llamaindex.ai/blog/using-llms-for-...,"May 17, 2023",\n\nSection: Summary:\nThis blog post outlines...
157,/blog/testing-anthropic-claudes-100k-token-win...,Testing Anthropic Claude’s 100k-token window o...,https://www.llamaindex.ai/blog/testing-anthrop...,"May 12, 2023","Anthropic’s 100K Context Window expansion, jus..."
158,/blog/llamaindex-on-twiml-ai-a-distilled-summa...,LlamaIndex on TWIML AI: A Distilled Summary (u...,https://www.llamaindex.ai/blog/llamaindex-on-t...,"May 10, 2023",\n\nSection: Overview:\nI had the pleasure of ...
159,/blog/a-new-document-summary-index-for-llm-pow...,A New Document Summary Index for LLM-powered Q...,https://www.llamaindex.ai/blog/a-new-document-...,"May 8, 2023","In this blog post, we introduce a brand new Ll..."
160,/blog/building-and-evaluating-a-qa-system-with...,Building and Evaluating a QA System with Llama...,https://www.llamaindex.ai/blog/building-and-ev...,"May 7, 2023",\n\nSection: Introduction:\nLlamaIndex (GPT In...


In [16]:
# Clean the text
def clean_text(text):
    text = re.sub(r'\s+', ' ', text)
    text = text.strip()
    return text

df['title'] = df['title'].apply(clean_text)
df['content'] = df['content'].apply(clean_text)

df.to_csv('llama_blog_clean.csv', index=False)
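
To see what `clean_text` does to the crawled content, the sketch below applies the same regular expression to a made-up sample string (the sample is an assumption, not taken from the dataset). It suggests that every newline, including the blank lines inserted before `Section:` markers, is collapsed into a single space, so the cleaned text no longer contains a `"\n\n"` paragraph break.

```python
import re


def clean_text(text: str) -> str:
    """Same logic as the cell above: collapse each whitespace run to one space."""
    return re.sub(r"\s+", " ", text).strip()


# Made-up sample in the shape produced by the crawl loop.
raw = "\n\nSection: Overview:\nFirst paragraph.\n\n- item one\n- item two\n"
cleaned = clean_text(raw)

print(cleaned)
print("\n\n" in cleaned)  # checks whether any paragraph break survived cleaning
```

This matters for the chunking step later on: a splitter configured with `paragraph_separator="\n\n"` would find no paragraph boundaries in text cleaned this way and would have to rely on sentence boundaries instead.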

In [17]:
df = pd.read_csv('llama_blog_clean.csv')
df.head()

Unnamed: 0,source,title,url,date,content
0,/blog/introducing-llamaextract-beta-structured...,Introducing LlamaExtract Beta: structured data...,https://www.llamaindex.ai/blog/introducing-lla...,"Jul 25, 2024",Structured extraction from unstructured data i...
1,/blog/llamaindex-newsletter-2024-07-23,LlamaIndex Newsletter 2024-07-23,https://www.llamaindex.ai/blog/llamaindex-news...,"Jul 23, 2024","Hello, Llama Followers! 🦙 Welcome to this week..."
2,/blog/improving-vector-search-reranking-with-p...,Improving Vector Search - Reranking with Postg...,https://www.llamaindex.ai/blog/improving-vecto...,"Jul 19, 2024",Subsection: Search and Reranking: Improving Re...
3,/blog/the-latest-updates-to-llamacloud,The latest updates to LlamaCloud,https://www.llamaindex.ai/blog/the-latest-upda...,"Jul 19, 2024",To build a production-quality LLM agent over y...
4,/blog/case-study-how-scaleport-ai-accelerated-...,Case Study: How Scaleport.ai Accelerated Devel...,https://www.llamaindex.ai/blog/case-study-how-...,"Jul 17, 2024",Subsection: The Challenge: Streamlining AI Dev...


<a id="4"></a>
## <div style="text-align: left; background-color:#DEF5B9; font-family: Trebuchet MS; color:#1D3E06; padding: 15px; line-height:1;border-radius:1px; margin-bottom: 0em; text-align: center; font-size: 25px;border-style: solid;border-color: dark green">4. Chunking and Vector Embeddings 🖌️ </div>

<a id="4.1"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">4.1 LLM Loading </div>

In [18]:
HF_TOKEN = getpass.getpass('Enter your Hugging Face token: ')

Enter your Hugging Face token: ··········


In [19]:
torch.cuda.is_available()

True

In [20]:
# Define a function to convert messages to a prompt
def messages_to_prompt(messages):
  prompt = ""
  for message in messages:
    if message.role == 'system':
      prompt += f"<|system|>\n{message.content}</s>\n"
    elif message.role == 'user':
      prompt += f"<|user|>\n{message.content}</s>\n"
    elif message.role == 'assistant':
      prompt += f"<|assistant|>\n{message.content}</s>\n"

  # ensure we start with a system prompt, insert blank if needed
  if not prompt.startswith("<|system|>\n"):
    prompt = "<|system|>\n</s>\n" + prompt

  # add final assistant prompt
  prompt = prompt + "<|assistant|>\n"

  return prompt

# Define the quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Define the LLM
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    query_wrapper_prompt=PromptTemplate("<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
    model_kwargs={"quantization_config": quantization_config, "token": HF_TOKEN},
    tokenizer_kwargs={"token": HF_TOKEN},
    generate_kwargs={"temperature": 0.7, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map= 'cuda' if torch.cuda.is_available() else 'cpu',
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
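
The prompt format produced by `messages_to_prompt` can be checked without loading the model. The sketch below is a condensed copy of the function that behaves the same for plain-string roles; `Msg` is a hypothetical stand-in for the chat message objects LlamaIndex passes in, assumed here to expose only `role` and `content`.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message with `role` and `content`."""
    role: str
    content: str


def messages_to_prompt(messages):
    """Condensed copy of the notebook's function (Zephyr-style chat markers)."""
    prompt = ""
    for message in messages:
        if message.role in ("system", "user", "assistant"):
            prompt += f"<|{message.role}|>\n{message.content}</s>\n"
    # Insert an empty system turn if the conversation does not start with one.
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt
    # End with an open assistant turn for the model to complete.
    return prompt + "<|assistant|>\n"


result = messages_to_prompt([Msg("user", "What is LlamaIndex?")])
print(result)
```

For a single user message, the result has the same shape as the `query_wrapper_prompt` template passed to `HuggingFaceLLM`, with `{query_str}` filled in.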

<a id="4.2"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">4.2 Data Loading and Chunking </div>

In [21]:
# Create a list of Document objects from the DataFrame
documents = [
    Document(
        text=row['content'],
        metadata={
            'source': row['source'],
            'title': row['title'],
            'url': row['url'],
            'date': row['date'],
        },
    )
    for index, row in df.iterrows()
]

In [22]:
# Load embedding
Settings.llm = llm
Settings.embed_model = HuggingFaceEmbedding(model_name="dunzhang/stella_en_1.5B_v5")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [23]:
# Use SentenceSplitter to generate a list of nodes from documents
nodes = SentenceSplitter(
    chunk_size=2048,
    chunk_overlap=256,
    paragraph_separator="\n\n",
).get_nodes_from_documents(documents)
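
The interaction between `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window. This is a simplified sketch, not `SentenceSplitter`'s actual algorithm: the real splitter counts tokens with a tokenizer and tries to break at sentence boundaries, whereas the hypothetical `chunk_tokens` below cuts a list at fixed positions.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Toy input: ten integer "tokens", windows of 4 with 1 shared item.
for chunk in chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With the notebook's settings (`chunk_size=2048`, `chunk_overlap=256`), each window in this simplified model would advance by 1792 tokens, so only posts longer than one window produce more than one chunk.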

In [24]:
nodes[0]

TextNode(id_='3c75b653-cf80-4370-a505-3b8d5aa51347', embedding=None, metadata={'source': '/blog/introducing-llamaextract-beta-structured-data-extraction-in-just-a-few-clicks', 'title': 'Introducing LlamaExtract Beta: structured data extraction in just a few clicks', 'url': 'https://www.llamaindex.ai/blog/introducing-llamaextract-beta-structured-data-extraction-in-just-a-few-clicks', 'date': 'Jul 25, 2024'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='a2577e04-9ee3-4e8f-805f-def3af8ffe43', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'source': '/blog/introducing-llamaextract-beta-structured-data-extraction-in-just-a-few-clicks', 'title': 'Introducing LlamaExtract Beta: structured data extraction in just a few clicks', 'url': 'https://www.llamaindex.ai/blog/introducing-llamaextract-beta-structured-data-extraction-in-just-a-few-clicks', 'date': 'Jul 25, 2024'}, hash='b2dd97f36af5cddd37eb79eba51b

In [25]:
# Create a SimpleDocumentStore and add the documents to it
docstore = SimpleDocumentStore()
docstore.add_documents(documents)

<a id="4.3"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">4.3 Data Embedding and Vector Store </div>

In [26]:
# Prompt the user to enter their Qdrant API key
QDRANT_API_KEY = getpass.getpass('Enter your Qdrant API key: ')
QDRANT_URL = "https://ed3265f8-b409-4cff-9259-4d2ebbc2cc93.us-east4-0.gcp.cloud.qdrant.io:6333"
QDRANT_COLLECTION_NAME = "llama-index_collection"

# Initialize the Qdrant client with the provided URL and API key
qdrant_client = QdrantClient(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
)

# Attempt to delete the specified collection, ignore any errors
try:
  qdrant_client.delete_collection(QDRANT_COLLECTION_NAME)
except Exception:
  pass

print(qdrant_client.get_collections())

Enter your Qdrant API key: ··········
collections=[]


In [27]:
# Initialize a QdrantVectorStore with the Qdrant client and collection name
vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name=QDRANT_COLLECTION_NAME,
)

# Create a StorageContext using the vector store
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create a VectorStoreIndex with the nodes and storage context
index = VectorStoreIndex(nodes, storage_context=storage_context)

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)


In [28]:
print(qdrant_client.get_collections())

collections=[CollectionDescription(name='llama-index_collection')]


In [29]:
qdrant_client.count(
    collection_name=QDRANT_COLLECTION_NAME,
    exact=True,
)

CountResult(count=206)

<a id="5"></a>
## <div style="text-align: left; background-color:#DEF5B9; font-family: Trebuchet MS; color:#1D3E06; padding: 15px; line-height:1;border-radius:1px; margin-bottom: 0em; text-align: center; font-size: 25px;border-style: solid;border-color: dark green">5. Query Engine </div>

<a id="5.1"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">5.1 Public Testcases with QueryEngineTool</div>

In [53]:
# torch.cuda.empty_cache()
# from llama_index.core.postprocessor import SentenceTransformerRerank

# rerank = SentenceTransformerRerank(
#     model="dunzhang/stella_en_1.5B_v5", top_n=3
# )
# Reranker disabled: the Colab instance ran out of disk space

# Create a query engine from the index
query_engine = index.as_query_engine(
    # node_postprocessors=[rerank]
)

In [54]:
now = time()
response = query_engine.query("What are key features of llama-agents?")
display_response(response)

print(f"Source: {response.source_nodes[0].metadata['source']}")
print(f"Title: {response.source_nodes[0].metadata['title']}")
print(f"URL: {response.source_nodes[0].metadata['url']}")
print(f"Date: {response.source_nodes[0].metadata['date']}")
print(f"Elapsed: {round(time() - now, 2)}s")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** llama-agents are a type of RAG (Retrieval-as-Generation) application that uses LLM (Language Learning Model) to generate responses to user queries by retrieving and synthesizing relevant information from a large corpus of text. They are designed to provide accurate and relevant responses to complex and open-ended questions, and can handle a wide range of data sources and formats. Some key features of llama-agents include:

1. Retrieval: llama-agents use a retrieval system to identify the most relevant documents or passages for a given query. This can involve techniques such as vector search, keyword search, or full-text search.

2. Generation: llama-agents use an LLM to generate a response to the user query based on the retrieved information. This can involve techniques such as chain-of-thought reasoning, summarization, or question answering.

3. Context: llama-agents can handle large context windows, allowing for more accurate and relevant responses to complex and open-ended questions.

4. Personalization: llama-agents can learn from user interactions and preferences, allowing for more personalized and relevant responses over time

Source: /blog/one-click-open-source-rag-observability-with-langfuse
Title: One-click Open Source RAG Observability with Langfuse
URL: https://www.llamaindex.ai/blog/one-click-open-source-rag-observability-with-langfuse
Date: Mar 18, 2024
Elapsed: 27.22s


In [41]:
now = time()
response = query_engine.query("""What are the two critical areas of RAG system performance that are assessed in the "Evaluating RAG with LlamaIndex" section of the OpenAI Cookbook?""")
display_response(response)

print(f"Source: {response.source_nodes[0].metadata['source']}")
print(f"Title: {response.source_nodes[0].metadata['title']}")
print(f"URL: {response.source_nodes[0].metadata['url']}")
print(f"Date: {response.source_nodes[0].metadata['date']}")
print(f"Elapsed: {round(time() - now, 2)}s")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** The two critical areas of RAG system performance that are assessed in the "Evaluating RAG with LlamaIndex" section of the OpenAI Cookbook are the Retrieval System and Response Generation.

Source: /blog/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5
Title: Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex
URL: https://www.llamaindex.ai/blog/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5
Date: Oct 5, 2023
Elapsed: 8.41s


In [42]:
now = time()
response = query_engine.query("What are the two main metrics used to evaluate the performance of the different rerankers in the RAG system?")
display_response(response)

print(f"Source: {response.source_nodes[0].metadata['source']}")
print(f"Title: {response.source_nodes[0].metadata['title']}")
print(f"URL: {response.source_nodes[0].metadata['url']}")
print(f"Date: {response.source_nodes[0].metadata['date']}")
print(f"Elapsed: {round(time() - now, 2)}s")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** The two main metrics used to evaluate the performance of the different rerankers in the RAG system are hit rate and MRR (mean reciprocal rank). These metrics are discussed in the context of the benchmarks presented in the blog post "Boosting RAG: Picking the Best Embedding & Reranker models" (https://www.llamaindex.ai/blog/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83).

Source: /blog/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83
Title: Boosting RAG: Picking the Best Embedding & Reranker models
URL: https://www.llamaindex.ai/blog/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83
Date: Nov 3, 2023
Elapsed: 15.14s
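
The metadata-printing lines repeated in the three test cells above could be wrapped in a small helper. The sketch below shows one way to do that; `format_source` is a hypothetical name, and the sample dictionary reuses the metadata printed for the reranker query. It takes a plain `dict` so it can be tried without a live query engine.

```python
def format_source(metadata: dict, elapsed: float) -> str:
    """Build the source summary printed after each query."""
    labels = [("Source", "source"), ("Title", "title"), ("URL", "url"), ("Date", "date")]
    # Missing keys fall back to "n/a" instead of raising KeyError.
    lines = [f"{label}: {metadata.get(key, 'n/a')}" for label, key in labels]
    lines.append(f"Elapsed: {round(elapsed, 2)}s")
    return "\n".join(lines)


sample_metadata = {
    "source": "/blog/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83",
    "title": "Boosting RAG: Picking the Best Embedding & Reranker models",
    "url": "https://www.llamaindex.ai/blog/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83",
    "date": "Nov 3, 2023",
}
summary = format_source(sample_metadata, 15.138)
print(summary)
```

In the notebook it would be called as `print(format_source(response.source_nodes[0].metadata, time() - now))`, assuming the response contains at least one source node.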


<a id="5.2"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">5.2 Router Query Engine </div>

In [43]:
# Create a QueryEngineTool for vector search and summary
vector_tool = QueryEngineTool(
    index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    index.as_query_engine(response_mode = "tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for getting a summary of the document."
    )
)

In [44]:
from llama_index.core.query_engine import RouterQueryEngine

# Create a RouterQueryEngine with the vector and summary tools
query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    verbose=True
)

# Query the engine with a question
response = query_engine.query("What are key features of llama-agents?")
display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Selecting query engine 0: To find specific facts about llama-agents, such as their memory capacity, processing speed, or training methods, choice 1 would be the most relevant..

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** llama-agents are a type of RAG (Retrieval-as-Generation) application that uses LLM (Language Learning Model) to generate responses to user queries by retrieving and synthesizing relevant information from a large corpus of text. They are designed to provide accurate and relevant responses to complex and open-ended questions, and can handle a wide range of data sources and formats. Some key features of llama-agents include:

1. Retrieval: llama-agents use a retrieval system to identify the most relevant documents or passages for a given query. This can involve techniques such as vector search, keyword search, or full-text search.

2. Generation: llama-agents use an LLM to generate a response to the user query based on the retrieved information. This can involve techniques such as chain-of-thought reasoning, summarization, or question answering.

3. Context: llama-agents can handle large context windows, allowing for more accurate and relevant responses to complex and open-ended questions.

4. Personalization: llama-agents can learn from user interactions and preferences, allowing for more personalized and relevant responses over time

<a id="5.3"></a>
### <div style="text-align: left; background-color:#F0DCED; font-family:Trebuchet MS;color:#8F2A46; padding: 14px; line-height: 1;border-radius:10px;border-style: solid;border-color: dark pink">5.3 User Interface Demo </div>

In [48]:
# Prompt the user to enter their input
prompt = input("Enter your prompt: ")

# Query the engine with the prompt
query_engine = index.as_query_engine()

now = time()
response = query_engine.query(prompt)
display_response(response)

print(f"**Final Response: ** {response.response}")
print(f"Source: {response.source_nodes[0].metadata['source']}")
print(f"Title: {response.source_nodes[0].metadata['title']}")
print(f"URL: {response.source_nodes[0].metadata['url']}")
print(f"Date: {response.source_nodes[0].metadata['date']}")
print(f"Elapsed: {round(time() - now, 2)}s")

Enter your prompt: What are key features of llama-agents?


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** llama-agents are a type of RAG (Retrieval-as-Generation) application that uses LLM (Language Learning Model) to generate responses to user queries by retrieving and synthesizing relevant information from a large corpus of text. They are designed to provide accurate and relevant responses to complex and open-ended questions, and can handle a wide range of data sources and formats. Some key features of llama-agents include:

1. Retrieval: llama-agents use a retrieval system to identify the most relevant documents or passages for a given query. This can involve techniques such as vector search, keyword search, or full-text search.

2. Generation: llama-agents use an LLM to generate a response to the user query based on the retrieved information. This can involve techniques such as chain-of-thought reasoning, summarization, or question answering.

3. Context: llama-agents can handle large context windows, allowing for more accurate and relevant responses to complex and open-ended questions.

4. Personalization: llama-agents can learn from user interactions and preferences, allowing for more personalized and relevant responses over time

**Final Response: ** llama-agents are a type of RAG (Retrieval-as-Generation) application that uses LLM (Language Learning Model) to generate responses to user queries by retrieving and synthesizing relevant information from a large corpus of text. They are designed to provide accurate and relevant responses to complex and open-ended questions, and can handle a wide range of data sources and formats. Some key features of llama-agents include:

1. Retrieval: llama-agents use a retrieval system to identify the most relevant documents or passages for a given query. This can involve techniques such as vector search, keyword search, or full-text search.

2. Generation: llama-agents use an LLM to generate a response to the user query based on the retrieved information. This can involve techniques such as chain-of-thought reasoning, summarization, or question answering.

3. Context: llama-agents can handle large context windows, allowing for more accurate and relevant responses to complex and

In [51]:
# Create a RouterQueryEngine with the vector and summary tools
query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    verbose=False
)

In [55]:
import gradio as gr

# Define a function to query the RAG system
def query_rag_system(question):
    response = query_engine.query(question).response
    return response

# Create Gradio interface
iface = gr.Interface(fn=query_rag_system,
                     inputs="text",
                     outputs="text",
                     title="LlamaIndex Chatbot",
                     description="Ask any question about LlamaIndex.")

# Launch the interface
iface.launch(share = True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://7fbead98693bf7ecbe.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




<a id="6"></a>
## <div style="text-align: left; background-color:#DEF5B9; font-family: Trebuchet MS; color:#1D3E06; padding: 15px; line-height:1;border-radius:1px; margin-bottom: 0em; text-align: center; font-size: 25px;border-style: solid;border-color: dark green">6. Conclusion </div>

<div style="border-radius:10px;
            border :#0A0104 solid;
            padding: 15px;
            font-size:110%;
            text-align: left">
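This project implemented a RAG-based question-answering system over 161 LlamaIndex blog posts using LlamaIndex, a 4-bit quantized Zephyr-7B model, and a Qdrant vector store holding 206 embedded chunks. The system answers queries about LlamaIndex and reports the source post for each answer, although the llama-agents test case shows that the top retrieved source does not always match the post a question targets.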