<a href="https://colab.research.google.com/github/isamdr86/towards-ai/blob/main/notebooks/Larger_Context_Larger_N_ir.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Setup Variables


In [1]:
!pip install -q huggingface_hub llama-index==0.10.37 tiktoken==0.7.0 chromadb==0.5.0 llama-index-llms-gemini==0.1.10 llama-index-vector-stores-chroma==0.1.7 llama-index-embeddings-openai


In [2]:
%%capture
!pip install openai==1.55.3 httpx==0.27.2 --force-reinstall --quiet

In [4]:
import os

from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('openai_api_key')
os.environ["GOOGLE_API_KEY"] = userdata.get('google_api_key')

# Load Gemini Model

In [5]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.core import Settings

Settings.llm = Gemini(model="models/gemini-1.5-flash")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /usr/local/lib/python3.10/dist-
[nltk_data]     packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


**Note: You can create a vector store from scratch using the code below, or you can load it from Hugging Face using the code provided in this notebook.**

## Vector Store

## Load the Dataset (JSON)

The dataset includes several articles from the TowardsAI blog, research paper contents, and documentation, which together provide in-depth explanations of AI models and the RAG method.

In [6]:
from huggingface_hub import hf_hub_download
file_path = hf_hub_download(
    repo_id="jaiganesan/ai_tutor_knowledge",
    filename="ai_tutor_knowledge.jsonl",
    repo_type="dataset",
    local_dir="/content",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


ai_tutor_knowledge.jsonl:   0%|          | 0.00/6.96M [00:00<?, ?B/s]

In [7]:
import json
with open(file_path, "r") as file:
    ai_tutor_knowledge = [json.loads(line) for line in file]
ai_tutor_knowledge[1]['content']

"Github Repo: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/NLP/Product-Categorization   From e-commerce to Customer support  all businesses require some kind of NER model to process huge amounts of texts from users.   To automate this whole  one requires NER models to extract relevant and important entities from text.   Final Result/OutputInput text = EL D68 (Green  32 GB) 3 GB RAM [3 GB RAM U+007C 32 GB ROM U+007C Expandable Upto 128 GB  15.46 cm (6.088 inch) Display  13MP Rear Camera U+007C 8MP Front Camera  4000 mAh Battery  Quad-Core Processor]   Output =   Green ->>>> COLOR 32 GB ->>>> STORAGE 3 GB RAM ->>>> RAM 3 GB RAM ->>>> RAM 32 GB ROM ->>>> STORAGE Expandable Upto 128 GB ->>>> EXPANDABLE_STORAGE 15.46 cm (6.088 inch) ->>>> SCREEN_SIZE 13MP Rear Camera ->>>> BACK_CAMERA 8MP Front Camera ->>>> FRONT_CAMERA 4000 mAh Battery ->>>> BATTERY_CAPACITY Quad-Core Processor ->>>> PROCESSOR_CORE   Data PreparationA tool for creating this dataset (https://github.com/tecoholic/n

In [8]:
from typing import List
from llama_index.core import Document

def create_docs_from_list(data_list: List[dict]) -> List[Document]:
    documents = []
    for data in data_list:
        documents.append(
            Document(
                doc_id=data["doc_id"],
                text=data["content"],
                metadata={  # type: ignore
                    "url": data["url"],
                    "title": data["name"],
                    "tokens": data["tokens"],
                    "source": data["source"],
                },
                excluded_llm_metadata_keys=[
                    "title",
                    "tokens",
                    "source",
                ],
                excluded_embed_metadata_keys=[
                    "url",
                    "tokens",
                    "source",
                ],
            )
        )
    return documents

doc = create_docs_from_list(ai_tutor_knowledge)
doc[2]

Document(id_='45501b72-9391-529e-8e5e-59a2604ba26e', embedding=None, metadata={'url': 'https://towardsai.net/p/machine-learning/adaboost-explained-from-its-original-paper', 'title': 'AdaBoost Explained From Its Original Paper', 'tokens': 1697, 'source': 'tai_blog'}, excluded_embed_metadata_keys=['url', 'tokens', 'source'], excluded_llm_metadata_keys=['title', 'tokens', 'source'], relationships={}, text="This publication is meant to show a very popular ML algorithm in complete detail  how it works  the math behind it  how to execute it in Python and an explanation of the proofs of the original paper. There will be math and code  but it is written in a way that allows you to decide which are the fun parts.   A bit on the origins of the algorithm: It was proposed by Yoav Freund and Robert E. Schapire in a 1997 paper  A Decision-Theoretic Generalization of On-Line Learning and an Application to Boostinga beautiful and brilliant publication for an effective and useful algorithm.   Lets star
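
The two exclusion lists in `create_docs_from_list` control which metadata fields are kept out of the text shown to the LLM and out of the text sent to the embedding model. The sketch below illustrates that filtering idea with plain dictionaries. The helper `visible_metadata` is hypothetical and is not part of LlamaIndex; the metadata values are copied from the document printed above.

```python
from typing import Dict, List


def visible_metadata(metadata: Dict[str, object], excluded_keys: List[str]) -> Dict[str, object]:
    """Return the metadata entries that remain after dropping the excluded keys."""
    return {key: value for key, value in metadata.items() if key not in excluded_keys}


# Metadata of the AdaBoost article shown in the output above.
metadata = {
    "url": "https://towardsai.net/p/machine-learning/adaboost-explained-from-its-original-paper",
    "title": "AdaBoost Explained From Its Original Paper",
    "tokens": 1697,
    "source": "tai_blog",
}

# Same exclusion lists as in create_docs_from_list.
llm_view = visible_metadata(metadata, ["title", "tokens", "source"])
embed_view = visible_metadata(metadata, ["url", "tokens", "source"])

print("LLM sees:", sorted(llm_view))          # only the url remains
print("Embedding sees:", sorted(embed_view))  # only the title remains
```

With these settings, the title influences the embedding while the URL is what the LLM can see alongside the chunk text.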

In [9]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.core.node_parser import TokenTextSplitter

# Define the splitter object that splits the text into segments of 512 tokens,
# with a 128-token overlap between consecutive segments.
text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=128)

import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.llms.openai import OpenAI
from llama_index.core.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
    KeywordExtractor,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.ingestion import IngestionPipeline

# Set up a persistent Chroma client so the collection is saved to disk,
# then fetch (or create) the collection that will hold the embedded nodes.
db = chromadb.PersistentClient(path="/content/ai_tutor_knowledge")
chroma_collection = db.get_or_create_collection("ai_tutor_knowledge")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
llm = OpenAI(temperature=0, model="gpt-4o-mini")

pipeline = IngestionPipeline(
    transformations=[
        text_splitter,
        QuestionsAnsweredExtractor(questions=2, llm=llm),
        SummaryExtractor(summaries=["prev", "self"], llm=llm),
        KeywordExtractor(keywords=10, llm=llm),
        OpenAIEmbedding(model = "text-embedding-3-small"),
    ],
    vector_store=vector_store,
)

# Run the transformation pipeline.
nodes = pipeline.run(documents=doc, show_progress=True)

Parsing nodes:   0%|          | 0/762 [00:00<?, ?it/s]

100%|██████████| 5834/5834 [1:04:21<00:00,  1.51it/s]
 66%|██████▌   | 3846/5834 [1:12:30<29:47,  1.11it/s]
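
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sliding-window sketch. It assumes whitespace-separated words stand in for tokens, whereas `TokenTextSplitter` counts tokenizer tokens, so real chunk boundaries will differ. The function `split_with_overlap` is a hypothetical illustration, not the library's implementation.

```python
from typing import List


def split_with_overlap(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Toy input: ten "tokens", windows of 4 with an overlap of 2.
words = [f"w{i}" for i in range(10)]
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

In the notebook's setting (512 with an overlap of 128), each new chunk would advance by 384 tokens under this simplified model.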

In [None]:
!zip -r vectorstore.zip ai_tutor_knowledge

# Download the Vector Store


You can access the vector store from the Hugging Face Hub.

In [None]:
from huggingface_hub import hf_hub_download
vectorstore = hf_hub_download(
    repo_id="jaiganesan/ai_tutor_knowledge",
    filename="vectorstore.zip",
    repo_type="dataset",
    local_dir="/content",
)

vectorstore.zip:   0%|          | 0.00/97.2M [00:00<?, ?B/s]

In [None]:
!unzip vectorstore.zip

Archive:  vectorstore.zip
   creating: ai_tutor_knowledge/
   creating: ai_tutor_knowledge/684af133-f877-4230-bde4-575cf53b6688/
  inflating: ai_tutor_knowledge/684af133-f877-4230-bde4-575cf53b6688/length.bin  
  inflating: ai_tutor_knowledge/684af133-f877-4230-bde4-575cf53b6688/index_metadata.pickle  
  inflating: ai_tutor_knowledge/684af133-f877-4230-bde4-575cf53b6688/link_lists.bin  
  inflating: ai_tutor_knowledge/684af133-f877-4230-bde4-575cf53b6688/header.bin  
  inflating: ai_tutor_knowledge/684af133-f877-4230-bde4-575cf53b6688/data_level0.bin  
  inflating: ai_tutor_knowledge/chroma.sqlite3  


In [None]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load the vector store from the local storage.
db = chromadb.PersistentClient(path="/content/ai_tutor_knowledge")
chroma_collection = db.get_or_create_collection("ai_tutor_knowledge")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

In [None]:
from llama_index.core import VectorStoreIndex

# Create the index based on the vector store.
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

In [None]:
for i in [2, 4, 6, 8, 10, 15, 20, 25, 30]:
    query_engine = index.as_query_engine(similarity_top_k=i)

    res = query_engine.query("Explain how RAG works?")

    print(f"top_{i} results:")
    print("\t", res.response)
    print("-_" * 20)

top_2 results:
	 Retrieval-Augmented Generation (RAG) models combine the strengths of pre-trained dense retrieval and sequence-to-sequence models. They retrieve relevant documents from a dense vector index, such as Wikipedia, using a pre-trained neural retriever. These retrieved documents are then passed to a seq2seq model, which generates outputs based on the retrieved information. Both the retriever and seq2seq modules are initialized from pre-trained models and fine-tuned together, allowing them to adapt to specific downstream tasks. 

-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
top_4 results:
	 Retrieval-Augmented Generation (RAG) models combine pretrained dense retrieval (DPR) and sequence-to-sequence (seq2seq) models to enhance performance on knowledge-intensive natural language processing (NLP) tasks. They retrieve relevant documents from a dense vector index using a pretrained neural retriever and then pass these documents to a seq2seq model, which generates outputs based on the r
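
The loop above only changes `similarity_top_k`, that is, how many of the best-matching chunks are handed to the LLM. A rough sketch of that selection step follows. It assumes cosine similarity as the ranking score, which may not match the distance metric the Chroma collection is actually configured with, and it uses made-up two-dimensional vectors in place of real embeddings. The helper names are hypothetical.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], chunk_vecs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy "embeddings" for four chunks and one query.
chunk_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
query_vec = [1.0, 0.1]

for k in (1, 2, 3):
    print(f"top_{k}:", [idx for idx, _ in top_k(query_vec, chunk_vecs, k)])
```

A larger `k` always includes the chunks a smaller `k` would return, plus progressively less similar ones, which is why more context does not automatically mean a better answer.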

# Evaluate


In [None]:
!wget https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/rag_eval_dataset.json

--2024-09-24 10:55:13--  https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/rag_eval_dataset.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 476714 (466K) [text/plain]
Saving to: ‘rag_eval_dataset.json’


2024-09-24 10:55:14 (9.85 MB/s) - ‘rag_eval_dataset.json’ saved [476714/476714]



In [None]:
# Load the evaluation dataset from the JSON file downloaded above.
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset

rag_eval_dataset = EmbeddingQAFinetuneDataset.from_json("./rag_eval_dataset.json")
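
The evaluation below only uses the `queries` mapping of this dataset and keeps the first 20 questions. As a small sketch of that selection, assume the dataset maps query IDs to question strings; the IDs and questions here are invented for illustration and are not taken from `rag_eval_dataset.json`.

```python
from typing import Dict, List

# Hypothetical stand-in for rag_eval_dataset.queries (query ID -> question text).
queries_by_id: Dict[str, str] = {
    f"query-{i}": f"Example question number {i}?" for i in range(50)
}


def first_n_queries(queries: Dict[str, str], n: int) -> List[str]:
    """Return the first n question strings in insertion order."""
    return list(queries.values())[:n]


batch = first_n_queries(queries_by_id, 20)
print(len(batch))
print(batch[0])
```

Because Python dictionaries preserve insertion order, the slice always picks the same questions for a given file, so runs with different `similarity_top_k` values are compared on an identical batch.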

In [None]:
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator,
    BatchEvalRunner,
)

from llama_index.llms.openai import OpenAI
llm_gpt4o_mini = OpenAI(temperature=0, model="gpt-4o-mini")

faithfulness_evaluator = FaithfulnessEvaluator(llm=llm_gpt4o_mini)
relevancy_evaluator = RelevancyEvaluator(llm=llm_gpt4o_mini)

# Run evaluation
queries = list(rag_eval_dataset.queries.values())
batch_eval_queries = queries[:20]

runner = BatchEvalRunner(
    {"faithfulness": faithfulness_evaluator, "relevancy": relevancy_evaluator},
    workers=32,
)

for i in [2, 4, 6, 8, 10, 15, 20, 25, 30]:
    # Build a query engine that retrieves the top-i chunks for each query.
    query_engine = index.as_query_engine(similarity_top_k=i, llm=llm_gpt4o_mini)

    eval_results = await runner.aevaluate_queries(
        query_engine, queries=batch_eval_queries
    )
    faithfulness_score = sum(
        result.passing for result in eval_results["faithfulness"]
    ) / len(eval_results["faithfulness"])
    print(f"top_{i} faithfulness_score: {faithfulness_score}")

    relevancy_score = sum(result.passing for result in eval_results["relevancy"]) / len(
        eval_results["relevancy"]
    )
    print(f"top_{i} relevancy_score: {relevancy_score}")

top_2 faithfulness_score: 0.6
top_2 relevancy_score: 0.9
top_4 faithfulness_score: 0.65
top_4 relevancy_score: 0.9
top_6 faithfulness_score: 0.65
top_6 relevancy_score: 0.85
top_8 faithfulness_score: 0.5
top_8 relevancy_score: 0.85
top_10 faithfulness_score: 0.5
top_10 relevancy_score: 0.9
top_15 faithfulness_score: 0.6
top_15 relevancy_score: 0.85
top_20 faithfulness_score: 0.65
top_20 relevancy_score: 0.85
top_25 faithfulness_score: 0.55
top_25 relevancy_score: 0.85
top_30 faithfulness_score: 0.6
top_30 relevancy_score: 0.85
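
Each score above is a pass rate: the number of queries whose evaluation passed, divided by the number of queries evaluated. The sketch below reproduces that arithmetic with a hypothetical `FakeResult` class standing in for the evaluator's result objects; only the `passing` attribute used in the loop above is modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResult:
    """Hypothetical stand-in for an evaluation result with a pass/fail flag."""
    passing: bool


def pass_rate(results: List[FakeResult]) -> float:
    """Fraction of results whose `passing` flag is True (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(result.passing for result in results) / len(results)


# 13 passes out of 20 queries, matching the batch size used in the notebook.
results = [FakeResult(True)] * 13 + [FakeResult(False)] * 7
print(pass_rate(results))

# With 20 queries, one query flipping between pass and fail moves the score by:
print(1 / len(results))
```

Since only 20 queries are evaluated, each score moves in steps of 0.05, so a difference such as 0.6 versus 0.65 corresponds to a single query and should be read with caution.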
