# RAG with LlamaIndex

![RAG LlamaIndex](https://media.licdn.com/dms/image/v2/D5622AQHrFw7nRO7GFg/feedshare-shrink_800/feedshare-shrink_800/0/1732033489014?e=1735171200&v=beta&t=gy1fBYDQHck9vsDld3EUjoVbJM8bJ_MXlCbiuF55Erk)

In [6]:
!pip install -q llama-index==0.10.57 llama-index-llms-gemini==0.1.11 openai==1.37.0 google-generativeai==0.5.4

In [7]:
import os

# Set the following API Keys in the Python environment. Will be used later.
os.environ["OPENAI_API_KEY"] = "Add your OpenAI API Key"
os.environ["GOOGLE_API_KEY"] = "Add your Google API Key"

In [31]:
!curl -o ./mani-dataset.csv https://raw.githubusercontent.com/pavanbelagatti/LlamaIndex-RAG-Demo/refs/heads/main/Articles-SS.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20329  100 20329    0     0  88431      0 --:--:-- --:--:-- --:--:-- 93682


In [32]:
import csv

rows = []

# Load the CSV file
with open("./mani-dataset.csv", mode="r", encoding="utf-8") as file:
    csv_reader = csv.reader(file)

    for idx, row in enumerate(csv_reader):
        if idx == 0:
            continue
            # Skip header row
        rows.append(row)

# The number of characters in the dataset.
print("number of articles:", len(rows))

number of articles: 9


In [33]:
from llama_index.core import Document

# Convert the texts to Document objects so the LlamaIndex framework can process them.
documents = [Document(text=row[1]) for row in rows]

In [34]:
documents[0]

Document(id_='502a18e5-7503-4b81-a7e4-4aadb3b75616', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='In the world of generative AI, that’s Amazon Web Services (AWS) and SingleStore.\n\nAs gen AI progresses from novelty to necessity, SingleStore and AWS have jointly committed to accelerating possibilities for gen AI at the enterprise level. SingleStore’s latest AWS gGen AI agreement supports machine learning and generative AI initiatives with tailored resources, funding and shared expertise.\n\nThe partnership is highly strategic, given our complementary strengths and vision.   \n\nAs the largest and most widely used cloud computing platform, AWS is a natural choice for creating AI agents that function autonomously and can be orchestrated into deterministic workflows with predictable outcomes. Such orchestration is pivotal for ensuring reliability and efficiency in AI-driven processes, and is part of what makes AWS a p

In [35]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding


# Build index / generate embeddings using OpenAI embedding model
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[SentenceSplitter(chunk_size=768, chunk_overlap=64)],
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    show_progress=True,
)

Parsing nodes:   0%|          | 0/9 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/11 [00:00<?, ?it/s]

In [36]:
# Visualize Chunks and Chunks Overlap after the Sentence Splitter Transformation

documents = index.docstore.docs
for doc in documents.values():
  print(doc.text)
  print("-_-_-_-_-_-_-_-_")

In the world of generative AI, that’s Amazon Web Services (AWS) and SingleStore.

As gen AI progresses from novelty to necessity, SingleStore and AWS have jointly committed to accelerating possibilities for gen AI at the enterprise level. SingleStore’s latest AWS gGen AI agreement supports machine learning and generative AI initiatives with tailored resources, funding and shared expertise.

The partnership is highly strategic, given our complementary strengths and vision.   

As the largest and most widely used cloud computing platform, AWS is a natural choice for creating AI agents that function autonomously and can be orchestrated into deterministic workflows with predictable outcomes. Such orchestration is pivotal for ensuring reliability and efficiency in AI-driven processes, and is part of what makes AWS a platform of choice for developers in enterprises.
-_-_-_-_-_-_-_-_
How can you tell if a software or technology product is truly loved by its users?


Answering that question is

In [37]:
# Define a query engine that is responsible for retrieving related pieces of text,
# and using a LLM to formulate the final answer.

from llama_index.llms.gemini import Gemini

llm = Gemini(model="models/gemini-1.5-flash", temperature=1, max_tokens=512)

query_engine = index.as_query_engine(llm=llm)

In [38]:
response = query_engine.query("What is SingleStore's npm package name?")
print(response)

The npm package name is `@singlestore/client`.

