<a href="https://colab.research.google.com/github/isamdr86/towards-ai/blob/main/notebooks/Metadata_Filtering_ir.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Setup Variables


In [1]:
!pip install -q llama-index==0.10.37 llama-index-vector-stores-qdrant==0.2.10

In [2]:
%%capture
!pip install openai==1.55.3 httpx==0.27.2 tiktoken==0.7.0 --force-reinstall --quiet

In [None]:
import os
os.kill(os.getpid(), 9)

In [3]:
import os

from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('openai_api_key')

In [4]:
import nest_asyncio

nest_asyncio.apply()

# Load a Model


In [5]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0.9, model="gpt-4o-mini", max_tokens=512)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /usr/local/lib/python3.10/dist-
[nltk_data]     packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Package punkt_tab is already up-to-date!


# Create a VectoreStore


In [6]:
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore

# qdrant_client = QdrantClient(location=":memory:")
# or Persist storage
qdrant_client = QdrantClient(path="/content/")

In [7]:
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="ai_tutor_knowledge")

# Load the Dataset (JSON)


## Download


In [8]:
from huggingface_hub import hf_hub_download
file_path = hf_hub_download(repo_id="jaiganesan/ai_tutor_knowledge", filename="ai_tutor_knowledge.jsonl",repo_type="dataset",local_dir="/content")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


## Read File


In [9]:
import json
with open(file_path, "r") as file:
    ai_tutor_knowledge = [json.loads(line) for line in file]
ai_tutor_knowledge[1]['content']

"Github Repo: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/NLP/Product-Categorization   From e-commerce to Customer support  all businesses require some kind of NER model to process huge amounts of texts from users.   To automate this whole  one requires NER models to extract relevant and important entities from text.   Final Result/OutputInput text = EL D68 (Green  32 GB) 3 GB RAM [3 GB RAM U+007C 32 GB ROM U+007C Expandable Upto 128 GB  15.46 cm (6.088 inch) Display  13MP Rear Camera U+007C 8MP Front Camera  4000 mAh Battery  Quad-Core Processor]   Output =   Green ->>>> COLOR 32 GB ->>>> STORAGE 3 GB RAM ->>>> RAM 3 GB RAM ->>>> RAM 32 GB ROM ->>>> STORAGE Expandable Upto 128 GB ->>>> EXPANDABLE_STORAGE 15.46 cm (6.088 inch) ->>>> SCREEN_SIZE 13MP Rear Camera ->>>> BACK_CAMERA 8MP Front Camera ->>>> FRONT_CAMERA 4000 mAh Battery ->>>> BATTERY_CAPACITY Quad-Core Processor ->>>> PROCESSOR_CORE   Data PreparationA tool for creating this dataset (https://github.com/tecoholic/n

In [10]:
# Not necessary to use full dataset
documents = ai_tutor_knowledge[:100]+ai_tutor_knowledge[500:]

# Transforming


In [11]:
from typing import List
from llama_index.core import Document

def create_docs_from_list(data_list: List[dict]) -> List[Document]:
    documents = []
    for data in data_list:
        documents.append(
            Document(
                doc_id=data["doc_id"],
                text=data["content"],
                metadata={  # type: ignore
                    "url": data["url"],
                    "title": data["name"],
                    "tokens": data["tokens"],
                    "source": data["source"],
                },
                excluded_llm_metadata_keys=[
                    "title",
                    "tokens",
                    "source",
                ],
                excluded_embed_metadata_keys=[
                    "url",
                    "tokens",
                    "source",
                ],
            )
        )
    return documents

doc = create_docs_from_list(documents)
doc[2]

Document(id_='45501b72-9391-529e-8e5e-59a2604ba26e', embedding=None, metadata={'url': 'https://towardsai.net/p/machine-learning/adaboost-explained-from-its-original-paper', 'title': 'AdaBoost Explained From Its Original Paper', 'tokens': 1697, 'source': 'tai_blog'}, excluded_embed_metadata_keys=['url', 'tokens', 'source'], excluded_llm_metadata_keys=['title', 'tokens', 'source'], relationships={}, text="This publication is meant to show a very popular ML algorithm in complete detail  how it works  the math behind it  how to execute it in Python and an explanation of the proofs of the original paper. There will be math and code  but it is written in a way that allows you to decide which are the fun parts.   A bit on the origins of the algorithm: It was proposed by Yoav Freund and Robert E. Schapire in a 1997 paper  A Decision-Theoretic Generalization of On-Line Learning and an Application to Boostinga beautiful and brilliant publication for an effective and useful algorithm.   Lets star

In [12]:
from llama_index.core.node_parser import TokenTextSplitter

# Define the splitter object that split the text into segments with 512 tokens,
# with a 128 overlap between the segments.
text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=128)

In [13]:
from llama_index.core.extractors import (
    KeywordExtractor,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.ingestion import IngestionPipeline

# Create the pipeline to apply the transformation on each chunk,
# and store the transformed text in the chroma vector store.
pipeline = IngestionPipeline(
    transformations=[
        text_splitter,
        KeywordExtractor(keywords=10, llm=Settings.llm),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ],
    vector_store=vector_store,
)

# Run the transformation pipeline.
nodes = pipeline.run(documents=doc, show_progress=True)

Parsing nodes:   0%|          | 0/362 [00:00<?, ?it/s]

100%|██████████| 1181/1181 [05:09<00:00,  3.82it/s]


Generating embeddings:   0%|          | 0/1181 [00:00<?, ?it/s]



In [14]:
!zip ai_tutor_knowledge_metadata.zip /content/collection/ai_tutor_knowledge

  adding: content/collection/ai_tutor_knowledge/ (stored 0%)


In [15]:
len(nodes)

1181

In [16]:
nodes[0].metadata

{'url': 'https://towardsai.net/p/machine-learning/bert-huggingface-model-deployment-using-kubernetes-github-repo-03-07-2024',
 'title': 'BERT HuggingFace Model Deployment using Kubernetes [ Github Repo]  03/07/2024',
 'tokens': 768,
 'source': 'tai_blog',
 'excerpt_keywords': 'BERT, HuggingFace, Kubernetes, deployment, Docker, FastAPI, model serving, ML pipeline, containerization, scalability.'}

In [17]:
from llama_index.core import VectorStoreIndex

# Create the index based on the vector store.
index = VectorStoreIndex.from_vector_store(vector_store)

In [18]:
query_engine = index.as_query_engine(similarity_top_k=10)
res = query_engine.query("Explain how Advance RAG works?")

res.response

'Advanced RAG systems enhance the traditional retrieval-augmented generation approach by introducing more sophisticated methodologies for managing the retrieval and generation processes. \n\nFor instance, Self-Reflective RAG focuses on improving the quality of retrieved documents by incorporating a specialized instruction-tuning mechanism that generates self-reflection tags. These tags help the language model dynamically assess the relevance of retrieved documents before generating responses. \n\nCorrective RAG, on the other hand, introduces an external evaluator to refine the quality of retrieved documents, ensuring that only the most relevant and accurate information is utilized in the generation stage.\n\nAnother significant advancement is SPECULATIVE RAG, which decomposes the RAG process into two main steps: drafting and verification. A smaller specialized RAG drafter handles the initial drafting of responses using subsets of retrieved documents, while a larger generalist language 

# Metadata Filtering


In [19]:
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, FilterOperator

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="excerpt_keywords",
            operator=FilterOperator.TEXT_MATCH,
            value="PEFT",
        ),
    ]
)

# Query Dataset


In [20]:
# Define a query engine that is responsible for retrieving related pieces of text,
# and using a LLM to formulate the final answer.
query_engine = index.as_query_engine(filters=filters)

res = query_engine.query("How Parameter efficient fine tuning (PEFT) Works?")

In [21]:
res.response

'Parameter Efficient Fine Tuning (PEFT) works by making targeted adjustments to a pre-trained large language model (LLM) without fully retraining it, which can be computationally intensive. The process involves focusing on a subset of parameters to fine-tune, rather than altering every weight in the model. This is achieved through three primary approaches:\n\n1. **Selective Fine-Tuning**: This approach selects specific initial parameters of the LLM to adjust during training.\n\n2. **Reparameterization**: In this method, model weights are represented using a low-rank format, which reduces the number of parameters that need to be updated. For instance, a weight matrix can be broken down into smaller matrices that capture essential information with fewer parameters.\n\n3. **Additive Approaches**: This involves adding small trainable modules or parameters to the existing model. Two common techniques are:\n   - **Adapter Modules**: These are small neural networks integrated into specific la

In [22]:
# Show the retrieved nodes
for src in res.source_nodes:
    print("Node ID\t", src.node_id)
    print("Title\t", src.metadata["title"])
    print("Text\t", src.text)
    print("Score\t", src.score)
    print("Score\t", src.metadata["excerpt_keywords"])
    print("-_" * 20)

Node ID	 3fa93a0f-2735-4e57-8709-639d0ede5c4e
Title	 Fine-Tuning and Evaluating Large Language Models: Key Benchmarks and Metrics
Text	 SAMSUM is one of the datasets that FLAN T5 uses. There are several pre-trained FLAN T5 models that have been fine-tuned on SAMSUM  including Phil Schmid/flan-t5-base-samsum and jasonmcaffee/flan-t5-large-samsum on Hugging Face. If we want to fine-tune the FLAN T5 model specifically for formal dialogue conversations  we can do so using the DIALOGUESUM dataset.   Models fine-tuned on DialogSum can be applied to areas like customer support  meeting minutes generation  chatbot summarization  and more.   2. PEFT (Parameter efficient fine tuning)Training LLMs is computationally intensive. Full finetuning is computationally expensive as it might change each weight in the model. First  we start with a pretrained LLM like GPT-3. This model already has a vast amount of knowledge and understanding of language. Then we provide task-specific datasets  which could b

# Filter Metadata (source_name)


In [23]:
from llama_index.core.vector_stores import MetadataFilter,MetadataFilters,FilterOperator,FilterCondition

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="excerpt_keywords",
            operator=FilterOperator.TEXT_MATCH,
            value="BERT",
        ),
        MetadataFilter(
            key="source", operator=FilterOperator.EQ, value="tai_blog"
        ),
    ],
    condition=FilterCondition.AND,
)

In [24]:
# Define a query engine that is responsible for retrieving related pieces of text,
# and using a LLM to formulate the final answer.
query_engine = index.as_query_engine(filters=filters)

result = query_engine.query("Explain BERT?")

In [25]:
result.response

"BERT, or Bidirectional Encoder Representations from Transformers, is a language model developed by Google that utilizes a transformer architecture. It is specifically designed to process text in both directions simultaneously, which allows it to capture context more effectively than traditional unidirectional models that read text sequentially. \n\nBERT employs two main tasks during its pre-training: \n\n1. **Masked Language Modeling (MLM)**: In this task, around 15% of the input tokens are masked, and the model learns to predict these masked tokens based on the surrounding context. This helps BERT produce contextualized word vectors.\n\n2. **Next Sentence Prediction (NSP)**: This task involves predicting whether one sentence is likely to follow another, further enhancing the model's understanding of sentence relationships.\n\nBERT comes in two versions: BERT BASE, which has 12 layers and 110 million parameters, and BERT LARGE, which has 24 layers and 340 million parameters. It was pr

In [26]:
# Show the retrieved nodes
for src in result.source_nodes:
    print("Node ID\t", src.node_id)
    print("Title\t", src.metadata["title"])
    print("Text\t", src.text)
    print("Score\t", src.score)
    print("Score\t", src.metadata["excerpt_keywords"])
    print("-_" * 20)

Node ID	 137929fb-49d1-4164-b7e1-ab4c81c2cb8c
Title	 Attention is all you need: How Transformer Architecture in NLP started.
Text	 GPT-2  GPT-3  and GPT-4  which were decoder-only architectures. Another well-known example is BERT (Bidirectional Encoder Representations from Transformers)  an encoder-only transformer mode used as a component in sentence embedding models.   Lets talk about BERT!BERT stands for Bidirectional Encoder Representations from Transformers. It is a language model by Google that uses a transformer architecture to understand and generate human-like language. BERT is designed to simultaneously process text in both directions  allowing it to capture context more effectively than traditional unidirectional models  which read text sequentially from left to right or right to left.   Example of Bidirectional CapabilityConsider the sentence:    The bank is situated on the _______ of the river.   In a unidirectional model  understanding the blank would primarily rely on th

In [27]:
# When Mismatch between Keyword (value) and Query

filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="excerpt_keywords",
            operator=FilterOperator.TEXT_MATCH,
            value="BERT",
        ),
        MetadataFilter(
            key="source", operator=FilterOperator.EQ, value="tai_blog"
        ),
    ],
    condition=FilterCondition.AND,
)

query_engine = index.as_query_engine(filters=filters)

result = query_engine.query("Explain PEFT?")

print(result.response)


# Show the retrieved nodes
for src in result.source_nodes:
    print("Node ID\t", src.node_id)
    print("Title\t", src.metadata["title"])
    print("Text\t", src.text)
    print("Score\t", src.score)
    print("Score\t", src.metadata["excerpt_keywords"])
    print("-_" * 20)


The provided information does not mention PEFT or provide any details related to it. Therefore, an explanation of PEFT cannot be derived from the given excerpts.
Node ID	 137929fb-49d1-4164-b7e1-ab4c81c2cb8c
Title	 Attention is all you need: How Transformer Architecture in NLP started.
Text	 GPT-2  GPT-3  and GPT-4  which were decoder-only architectures. Another well-known example is BERT (Bidirectional Encoder Representations from Transformers)  an encoder-only transformer mode used as a component in sentence embedding models.   Lets talk about BERT!BERT stands for Bidirectional Encoder Representations from Transformers. It is a language model by Google that uses a transformer architecture to understand and generate human-like language. BERT is designed to simultaneously process text in both directions  allowing it to capture context more effectively than traditional unidirectional models  which read text sequentially from left to right or right to left.   Example of Bidirectional Cap