<a href="https://colab.research.google.com/github/isamdr86/towards-ai/blob/main/notebooks/04-RAG_with_VectorStore_ir.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Set Up Variables


In [1]:
!pip install -q llama-index==0.10.57 llama-index-llms-gemini==0.1.11 openai==1.37.0 google-generativeai==0.5.4 httpx==0.27.2 cohere==5.6.2 tiktoken==0.7.0 --force-reinstall --quiet


In [2]:
!pip install -q llama-index-vector-stores-chroma langchain_google_genai langchain==0.1.17 langchain-chroma langchain_openai==0.1.5 chromadb



In [3]:
import os

from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('openai_api_key')
os.environ["GOOGLE_API_KEY"] = userdata.get('google_api_key')

## LangChain vs LlamaIndex

LangChain and LlamaIndex are both frameworks for building applications with LLMs, but they focus on different things:

- LlamaIndex specializes in search and retrieval applications, emphasizing fast data retrieval and concise response generation with LLMs.

- LangChain is a general-purpose framework suited to applications such as chatbots and virtual assistants, with an emphasis on rapid prototyping and ease of application development.



# Load the Dataset (CSV)


## Download


The dataset includes several articles from the TowardsAI blog that explain the LLaMA2 model in depth. The cells below download it and read it into one long string.


In [4]:
!curl -o ./mini-dataset.csv https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-llama-articles.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  169k  100  169k    0     0   445k      0 --:--:-- --:--:-- --:--:--  446k


## Read File


In [5]:
import csv

text = ""

# Read the CSV file, skip the header row, and concatenate the second column of every row.
with open("./mini-dataset.csv", mode="r", encoding="utf-8") as file:
    csv_reader = csv.reader(file)

    for idx, row in enumerate(csv_reader):
        if idx == 0:
            continue
        text += row[1]

# The number of characters in the dataset.
print(len(text))

171044


# Chunking


In [6]:
chunk_size = 512
chunks = []

# Split the long text into smaller manageable chunks of 512 characters.
for i in range(0, len(text), chunk_size):
    chunks.append(text[i : i + chunk_size])

print(len(chunks))

335
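
The chunk count follows from the text length: 171044 characters split into 512-character pieces gives ceil(171044 / 512) = 335 chunks, and the last chunk is shorter than the rest. A minimal sketch that checks this arithmetic on a stand-in string of the same length (the helper name `split_fixed` is hypothetical and not part of the notebook):

```python
import math


def split_fixed(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of chunk_size characters."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


# Stand-in string with the same length as the dataset text above (assumption:
# only the length matters for the count, not the content).
sample = "x" * 171044
sample_chunks = split_fixed(sample, 512)

expected = math.ceil(len(sample) / 512)
print(len(sample_chunks), expected)  # 335 335
print(len(sample_chunks[-1]))        # 36 characters left over for the last chunk
```

Splitting on a fixed character count is simple, but it can cut a word or sentence in the middle, so a chunk boundary may separate text that belongs together.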


# Interface of Chroma with LlamaIndex


In [7]:
from llama_index.core import Document

# Convert the chunks to Document objects so the LlamaIndex framework can process them.
documents = [Document(text=t) for t in chunks]

## Save on Chroma


In [9]:
import chromadb

# Create a client and a new collection.
# chromadb.PersistentClient stores the data on disk under `path`;
# chromadb.EphemeralClient would keep it in memory only.
chroma_client = chromadb.PersistentClient(path="./mini-chunked-dataset")
chroma_collection = chroma_client.create_collection("mini-chunked-dataset2")

In [10]:
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# Define a storage context object using the created vector database.
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [11]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# Build index / generate embeddings using OpenAI embedding model
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
    storage_context=storage_context,
    show_progress=True,
)

Parsing nodes:   0%|          | 0/335 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/335 [00:00<?, ?it/s]

## Query Dataset


In [12]:
# Define a query engine that retrieves the most relevant pieces of text
# and uses an LLM to formulate the final answer.

from llama_index.llms.gemini import Gemini

llm = Gemini(model="models/gemini-1.5-flash", temperature=1, max_tokens=512)

query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)
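
`similarity_top_k=5` asks the retriever to return the five chunks whose embeddings are closest to the query embedding. The idea can be sketched in plain Python with a brute-force search. This is an illustration only: the 3-dimensional vectors are made up, the `cosine` and `top_k` helpers are hypothetical, cosine similarity is an assumed metric (the actual one depends on how the collection is configured), and Chroma itself relies on an approximate nearest-neighbor index rather than a full scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to query_vec, best first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" standing in for the real 335 chunk embeddings.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.05, 0.0]

print(top_k(toy_query, toy_vectors, k=2))  # [0, 1]
```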

In [14]:
response = query_engine.query("How many parameters LLaMA2 model has?")
print(response)

ValueError: Expected where to have exactly one operator, got {} in query.

# Interface of Chroma with LangChain


In [15]:
from langchain.schema.document import Document

# Convert the chunks to Document objects so the LangChain framework can process them.
documents = [Document(page_content=t) for t in chunks]

## Save on Chroma


In [16]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Add the documents to chroma DB and create Index / embeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
chroma_db = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory="./mini-chunked-dataset",
    collection_name="mini-chunked-dataset",
)

## Query Dataset


In [17]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Initialize the LLM.
# Alternative (requires `from langchain_openai import ChatOpenAI`):
# llm = ChatOpenAI(temperature=0, model="gpt-4o-mini", max_tokens=512)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_tokens=512,
)

In [18]:
from langchain.chains import RetrievalQA

query = "How many parameters LLaMA 2 model has?"
retriever = chroma_db.as_retriever(search_kwargs={"k": 4})
# Define a RetrievalQA chain that retrieves the most relevant pieces of text
# and uses an LLM to formulate the final answer.
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

response = chain.invoke(query)
print(response["result"])

The LLaMA 2 model comes in four sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters.
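
With `chain_type="stuff"`, all retrieved chunks are placed ("stuffed") into a single prompt together with the question, and the LLM is called once. A rough sketch of that assembly step, using a hypothetical `build_stuff_prompt` helper and an illustrative template (the wording of LangChain's actual prompt differs):

```python
def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into one context block above the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy chunks standing in for the k=4 chunks returned by the retriever.
toy_chunks = ["Chunk A about LLaMA 2.", "Chunk B about model sizes."]
prompt = build_stuff_prompt("How many parameters LLaMA 2 model has?", toy_chunks)
print(prompt)
```

Because everything goes into one prompt, the retrieved text has to fit in the model's context window. With `k=4` and 512-character chunks, that is at most about 2048 characters of context per query.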

