In [4]:
# pip install llama-index-llms-huggingface-api llama-index-embeddings-huggingface

In [5]:
# !pip install python-dotenv

In [9]:
# !pip install llama-index-llms-groq

## Simple setup

In [16]:
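from llama_index.llms.groq import Groq
from google.colab import userdata  # Colab-only helper for reading stored secrets

# The API key is read from a Colab secret named 'groq_api'
llm = Groq(model="llama3-70b-8192", api_key=userdata.get('groq_api'))

# Send a single prompt and print the completion
response = llm.complete("Hello, how are you?")
print(response)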


I'm just a language model, I don't have emotions or feelings like humans do, so I don't have good or bad days. However, I'm functioning properly and ready to assist you with any questions or tasks you may have! How can I help you today?


In [27]:
# pip install llama-index-vector-stores-chroma

`SimpleDirectoryReader`: A built-in loader for various file types from a local directory.

`LlamaParse`: LlamaIndex’s official tool for PDF parsing, available as a managed API.

`LlamaHub`: A registry of hundreds of data-loading libraries to ingest data from any source.

## Reading Documents

In [35]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="/content/files/")
documents = reader.load_data()



After loading the documents, we break them into smaller pieces called `Node` objects. A `Node` is a chunk of text from the original document that is easier for the model to work with and that still keeps a reference to its source `Document` object.



The `IngestionPipeline` helps us create these nodes through two key transformations.

`SentenceSplitter` breaks down documents into manageable chunks by splitting them at natural sentence boundaries.

`HuggingFaceEmbedding` converts each chunk into numerical embeddings - vector representations that capture the semantic meaning in a way AI can process efficiently.
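
To make the splitting step concrete, here is a minimal sketch of packing whole sentences into chunks under a size budget. It is a toy stand-in, not the `SentenceSplitter` implementation: the function name `split_into_chunks` and the `max_words` parameter are hypothetical, and it counts words, whereas the real splitter counts tokens and also reserves room for metadata.

```python
import re


def split_into_chunks(text: str, max_words: int = 25) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "A computer network connects devices. It lets them share data. "
    "Routers forward packets between networks. "
    "Switches connect devices inside one network."
)
chunks = split_into_chunks(text, max_words=10)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

No sentence is cut in half, so each chunk stays readable on its own. The trade-off is that a very small budget produces many tiny chunks.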

In [21]:
from llama_index.core import Document
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_overlap=0),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

In [None]:
nodes = await pipeline.arun(documents=documents)

Metadata length (12) is close to chunk size (25). Resulting chunks are less than 50 tokens. Consider increasing the chunk size or decreasing the size of your metadata to avoid this.

In [41]:
Document.example()

Document(id_='b1fbe37c-d5e2-4fa3-a39f-c418315606b1', embedding=None, metadata={'filename': 'README.md', 'category': 'codebase'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text_resource=MediaResource(embeddings=None, data=None, text='\nContext\nLLMs are a phenomenal piece of technology for knowledge generation and reasoning.\nThey are pre-trained on large amounts of publicly available data.\nHow do we best augment LLMs with our own private data?\nWe need a comprehensive toolkit to help perform this data augmentation for LLMs.\n\nProposed Solution\nThat\'s where LlamaIndex comes in. LlamaIndex is a "data framework" to help\nyou build LLM  apps. It provides the following tools:\n\nOffers data connectors to ingest your existing data sources and data formats\n(APIs, PDFs, docs, SQL, etc.)\nProvides ways to structure your data (indices, graphs) so that this data can be\neasily used with LLMs.

## Storing in ChromaDB

After creating our Node objects we need to index them to make them searchable, but before we can do that, we need a place to store our data.



In [28]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")

In [29]:
chroma_collection = db.get_or_create_collection("alfred")

## Creating a vector store

In [31]:
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

In [30]:
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

With `vector_store` set, the pipeline automatically writes the embedded nodes to the Chroma collection when it runs.

In [None]:
nodes = await pipeline.arun(documents=documents)

## Using the stored embeddings

This is where vector embeddings come in: by embedding both the query and the nodes in the same vector space, we can find relevant matches. `VectorStoreIndex` handles this for us. We pass it the same embedding model used during ingestion so that query vectors and node vectors are comparable.
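
The idea of matching a query against nodes in a shared vector space can be sketched with the standard library alone. This is an illustration under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model such as `BAAI/bge-small-en-v1.5`, and `retrieve` is a toy helper, not a LlamaIndex API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (hypothetical stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, nodes: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Embed the query with the same function as the nodes and rank by similarity."""
    q = embed(query)
    scored = [(cosine(q, embed(node)), node) for node in nodes]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


nodes = [
    "a computer network connects devices to share data",
    "chroma stores embeddings on disk",
    "routers forward packets across a network",
]
results = retrieve("what is a computer network", nodes, top_k=2)
for score, node in results:
    print(f"{score:.3f}  {node}")
```

The scores are only meaningful because the query and the nodes go through the same `embed` function. Mixing two different embedding models would put them in unrelated spaces.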

In [32]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

## Querying a VectorStoreIndex with prompts and LLMs

`as_retriever`: For basic document retrieval, returning a list of NodeWithScore objects with similarity scores

`as_query_engine`: For single question-answer interactions, returning a written response

`as_chat_engine`: For conversational interactions that maintain memory across multiple messages, returning a written response using chat history and indexed context


The query engine's `response_mode` controls how the retrieved chunks are turned into an answer:

`refine`: create and refine an answer by going through each retrieved text chunk in sequence. This makes a separate LLM call per retrieved Node.

`compact` (default): similar to `refine`, but concatenates the chunks beforehand, resulting in fewer LLM calls.

`tree_summarize`: create a detailed answer by going through each retrieved text chunk and creating a tree structure of the answer.
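
The difference between the three modes is easiest to see by counting LLM calls. The sketch below is a toy illustration of the control flow only, not LlamaIndex's implementation. `CountingLLM`, the character budget `max_chars`, and the fixed `fan_out` are all hypothetical; the real synthesizers pack chunks by token budget.

```python
class CountingLLM:
    """Hypothetical stand-in for an LLM that only counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"draft {self.calls}"


def refine(model, question, chunks):
    # One call per chunk; each call sees the previous draft answer.
    answer = ""
    for chunk in chunks:
        answer = model.complete(f"Q: {question}\nDraft: {answer}\nContext: {chunk}")
    return answer


def compact(model, question, chunks, max_chars=40):
    # Pack chunks into as few prompts as fit a toy character budget, then refine.
    packs, current = [], ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            packs.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packs.append(current)
    return refine(model, question, packs)


def tree_summarize(model, question, chunks, fan_out=2):
    # Summarize groups of chunks, then summarize the summaries until one is left.
    # fan_out must be at least 2 for the tree to shrink.
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        groups = [level[i:i + fan_out] for i in range(0, len(level), fan_out)]
        level = [model.complete(f"Q: {question}\nContext:\n" + "\n".join(g)) for g in groups]
        if len(level) == 1:
            return level[0]


chunks = ["chunk one text", "chunk two text", "chunk three text", "chunk four text"]
call_counts = {}
for name, mode in [("refine", refine), ("compact", compact), ("tree_summarize", tree_summarize)]:
    counter = CountingLLM()
    mode(counter, "What is a computer network?", chunks)
    call_counts[name] = counter.calls
print(call_counts)
```

With four chunks, this toy `refine` makes one call per chunk, while `compact` and `tree_summarize` make fewer because several chunks share a prompt.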

In [34]:
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
query_engine.query("What is the use of computer network")

Response(response='Empty Response', source_nodes=[], metadata=None)

An `Empty Response` with no `source_nodes` means the retriever returned nothing to synthesize from. The most likely cause here is that the Chroma collection was still empty when the query ran, so it is worth confirming that the ingestion cell above finished before querying.

## Evaluation and observability

LlamaIndex provides built-in evaluation tools to assess response quality. These evaluators leverage LLMs to analyze responses across different dimensions. Let’s look at the three main evaluators available:

`FaithfulnessEvaluator`: Evaluates the faithfulness of the answer by checking whether it is supported by the retrieved context.

`AnswerRelevancyEvaluator`: Evaluates the relevance of the answer by checking whether it addresses the question.

`CorrectnessEvaluator`: Evaluates the correctness of the answer by checking it against a reference answer.
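
All three evaluators follow an LLM-as-judge pattern: build a prompt from the response and its context, ask a model for a verdict, and parse the verdict into a pass/fail result. The sketch below shows that pattern in isolation. Everything in it is hypothetical (`ToyEvalResult`, `toy_faithfulness`, the prompt wording, and the stub judges that stand in for a real LLM); it is not how `FaithfulnessEvaluator` is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyEvalResult:
    passing: bool   # True when the judge says the response is supported
    feedback: str   # Raw verdict text returned by the judge


def toy_faithfulness(
    judge: Callable[[str], str], response: str, contexts: List[str]
) -> ToyEvalResult:
    """Ask a judge whether `response` is supported by `contexts`."""
    prompt = (
        "Is the statement supported by the context? Answer YES or NO.\n"
        f"Statement: {response}\n"
        "Context:\n" + "\n".join(contexts)
    )
    verdict = judge(prompt).strip()
    return ToyEvalResult(passing=verdict.upper().startswith("YES"), feedback=verdict)


contexts = ["A computer network connects devices so they can share data."]

# Stub judges with fixed verdicts, standing in for real LLM calls.
supported = toy_faithfulness(lambda p: "YES", "Networks let devices share data.", contexts)
unsupported = toy_faithfulness(lambda p: "NO: not in context", "Networks are made of cheese.", contexts)

print(supported.passing, unsupported.passing)
```

Because the verdict comes from a model, results can vary between runs and between judge models, so a single `passing` flag is best read as a signal rather than a guarantee.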

In [None]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# use the same LLM as the judge
evaluator = FaithfulnessEvaluator(llm=llm)

# query the index
response = query_engine.query(
    "What battles took place in New York City in the American Revolution?"
)

# check whether the response is supported by its retrieved source nodes
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing