# Simple RAG Implementation


In this notebook we will build a simple RAG application based on a scholarship dataset from the Govt. of India.
* [Load the dataset](#Load-the-Dataset)
* [Chunk the data](#Chunk-the-Data---Splitting-into-smaller-pieces)


### Load the Dataset
The Govt. of India scholarship data is published at https://scholarships.gov.in/. A copy has already been uploaded to Hugging Face.
In this cell we download the dataset from Hugging Face.

In [12]:
! pip install pandas pyarrow huggingface_hub  # pyarrow reads parquet; huggingface_hub enables hf:// paths
import pandas as pd
from pprint import pprint

# Read scholarship data from parquet file
df = pd.read_parquet("hf://datasets/NetraVerse/indian-govt-scholarships/data/train-00000-of-00001.parquet")
df = df[['label', 'text']]

# Convert to records format
data = df.to_dict('records')
print(f"Loaded {len(data)} scholarship documents")
pprint(data[:1])





'(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')), '(Request ID: b79dad6d-e400-45fe-aeeb-decd4aa3e1b1)')' thrown while requesting GET https://huggingface.co/datasets/NetraVerse/indian-govt-scholarships/resolve/main/data/train-00000-of-00001.parquet
Retrying in 1s [Retry 1/5].






Loaded 10 scholarship documents
[{'label': 'AICTE_2010_F',
  'text': 'Page 1 of 8 \n'
          '  \n'
          ' \n'
          'PRAGATI SCHOLARSHIP SCHEME  \n'
          'Frequently Asked Questions (FAQs)  \n'
          'Q.1 Who is eligible for PRAGATI Scholarship?  \n'
          'Ans: Eligibility criteria underPRAGATI Scholarship scheme:  \n'
          'EligibilityforPragati-DegreeLevel '
          'EligibilityforPragati–DiplomaLevel \n'
          '1. Upto TwoGirls per family. 1. Upto TwoGirls perfamily. \n'
          '2. \n'
          ' Familyincomeshouldbelessthan Rs.8Lakh perannu\n'
          'm. 2. Familyincomeshouldbelessthan Rs.8Lakh perannum. \n'
          '3.  Studentsadmittedin UGDegreeLevel \n'
          'Programme/CourseinAICTEApprovedInstitutions. 3.Studentsadmittedin '
          'DiplomaLevel Programme/CourseinAIC\n'
          'TE ApprovedInstitutions. \n'
          '4.Thestudentsadmitted in  first year of their Degree \n'
          'Course OR Second year oftheir \n'


### Chunk the Data - Splitting into smaller pieces
* We will split the data into smaller chunks to make it easier to process and retrieve relevant information.
* Interactive chunking experience is available at https://chunkviz.up.railway.app/
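
To make the sliding window concrete, here is a minimal sketch that only computes the character offsets each chunk would cover. The helper name `chunk_spans` and the 1200-character length are illustrative assumptions, not part of the notebook; the window advances by `chunk_size - overlap` characters each step.

```python
def chunk_spans(text_length, chunk_size, overlap):
    """Return (start, end) character offsets for overlapping chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [
        (start, min(start + chunk_size, text_length))
        for start in range(0, text_length, step)
    ]

# Hypothetical 1200-character document, 500-character chunks, 100-character overlap
spans = chunk_spans(text_length=1200, chunk_size=500, overlap=100)
for start, end in spans:
    print(f"chunk covers characters {start}..{end}")
```

Each chunk repeats the last 100 characters of the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.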

In [13]:
# Set this to True to enable chunking, False to disable
ENABLE_CHUNKING = True 

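def chunk_text(text, chunk_size, overlap):
    '''Split text into overlapping chunks of at most chunk_size characters'''
    if not 0 <= overlap < chunk_size:
        # Otherwise the window would never advance and the loop would not end
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        chunks.append(chunk)
        start += chunk_size - overlap  # Step forward, keeping `overlap` characters
    return chunks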

if ENABLE_CHUNKING:
    # Create chunked version of data
    chunked_data = []
    for doc in data:
        text = doc['text']  #data under 'text' key
        chunks = chunk_text(text, chunk_size=500, overlap=100)  #500 characters with 100 characters overlap
        
        for i, chunk in enumerate(chunks):
            chunked_data.append({
                'label': doc['label'],
                'text': chunk,
                'chunk_id': i,
                'total_chunks': len(chunks)
            })
    
    # Reassign data with chunked data
    data = chunked_data
    
    print("CHUNKING ENABLED")
    print(f"Chunked into {len(data)} pieces")
    # Display first chunk example - FULL TEXT
    print("FIRST CHUNK EXAMPLE:")
    print(f"Chunk ID: {data[0]['chunk_id']} of {data[0]['total_chunks']}")
else:
    print("CHUNKING DISABLED - Using full documents")
    print(f"Total documents: {len(data)}")
    print("FIRST DOCUMENT EXAMPLE:")
    
print(f"Label: {data[0]['label']}")
print(f"Text Length: {len(data[0]['text'])} characters")
print(f"FULL TEXT:\n{data[0]['text']}")


CHUNKING ENABLED
Chunked into 447 pieces
FIRST CHUNK EXAMPLE:
Chunk ID: 0 of 33
Label: AICTE_2010_F
Text Length: 500 characters
FULL TEXT:
Page 1 of 8 
  
 
PRAGATI SCHOLARSHIP SCHEME  
Frequently Asked Questions (FAQs)  
Q.1 Who is eligible for PRAGATI Scholarship?  
Ans: Eligibility criteria underPRAGATI Scholarship scheme:  
EligibilityforPragati-DegreeLevel EligibilityforPragati–DiplomaLevel 
1. Upto TwoGirls per family. 1. Upto TwoGirls perfamily. 
2. 
 Familyincomeshouldbelessthan Rs.8Lakh perannu
m. 2. Familyincomeshouldbelessthan Rs.8Lakh perannum. 
3.  Studentsadmittedin UGDegreeLevel 
Programme/CourseinAICTEApprovedInstit


### 📦 Install required dependencies for vector database, embeddings, and deep learning
* The vector database is Qdrant.
* The embedding model comes from Sentence Transformers. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
* The deep learning framework is PyTorch (`torch`), which runs the Hugging Face models.



In [14]:
! pip install qdrant-client
! pip install sentence-transformers
! pip install torch



### 📦 Initialize Qdrant vector database client and SentenceTransformer embedding encoder
* Vector database is used to store and retrieve document chunks based on their semantic similarity to the query.
* SentenceTransformer is used to convert text into dense vector representations (embeddings).

In [15]:
from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

encoder = SentenceTransformer('all-MiniLM-L6-v2') # Model to create embeddings

### 📦 Initialize vector database for storing scholarship embeddings with cosine similarity
Other distance functions supported by Qdrant include:
* Dot product (models.Distance.DOT)
* Euclidean (models.Distance.EUCLID)
* Manhattan (models.Distance.MANHATTAN)
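
As a rough illustration of how these measures differ, the sketch below computes each one in plain Python for two toy 2-dimensional vectors. The function names and vectors are made up for this example and are not Qdrant APIs; Qdrant computes the chosen measure internally.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product of the vectors after normalizing each to unit length
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

query = [1.0, 0.0]
same_direction = [3.0, 0.0]   # points the same way as the query, but is longer
orthogonal = [0.0, 1.0]       # unrelated direction

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(
        name,
        "cosine:", cosine_similarity(query, vec),
        "dot:", dot(query, vec),
        "euclidean:", euclidean_distance(query, vec),
        "manhattan:", manhattan_distance(query, vec),
    )
```

For cosine and dot product a higher value means more similar, while for Euclidean and Manhattan a lower value means closer. Cosine ignores vector length, which is one reason it is a common choice for text embeddings.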

In [16]:
# Create collection to store the scholarship data
collection_name="scholarships"

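# recreate_collection is deprecated; drop any existing collection, then create it
if qdrant.collection_exists(collection_name):
    qdrant.delete_collection(collection_name)

qdrant.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), # Vector size is as defined in the used model
        distance=models.Distance.COSINE
    )
)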


True

### 📦 Generate embeddings for each document and upload to vector database
* In vector database, each data point is represented as a vector in a high-dimensional space.
* For all-MiniLM-L6-v2 model, each vector has 384 dimensions.

In [17]:
points_to_upload = []
for idx, doc in enumerate(data):
    points_to_upload.append(
        models.PointStruct(
            id=idx,
            vector=encoder.encode(doc["text"]).tolist(),  # Use 'text' field for scholarship data
            payload=doc
        )
    )

# Upload the pre-computed vectors and their payloads to Qdrant
qdrant.upload_points(
    collection_name=collection_name,
    points=points_to_upload
)

### 📦 Check the embeddings

In [18]:
# Display first document's text and embedding
first_doc = data[0]
first_text = first_doc['text']
first_vector = encoder.encode(first_text).tolist()

print("DOCUMENT TEXT:")
print(f"Text (first 100 chars): {first_text[:100]}...")
print("EMBEDDING VECTOR:")
print(f"Vector dimension: {len(first_vector)}")
print(f"First 20 values: {first_vector[:20]}")


DOCUMENT TEXT:
Text (first 100 chars): Page 1 of 8 
  
 
PRAGATI SCHOLARSHIP SCHEME  
Frequently Asked Questions (FAQs)  
Q.1 Who is eligib...
EMBEDDING VECTOR:
Vector dimension: 384
First 20 values: [-0.0012908128555864096, 0.012660644017159939, 0.002040782244876027, -0.03014940395951271, 0.0076779816299676895, 0.030943244695663452, -0.0029246823396533728, 0.025981172919273376, -0.04518456012010574, 0.0067618186585605145, 0.05067143961787224, -0.07189879566431046, -0.01719922386109829, -0.021779561415314674, -0.0004332665994297713, -0.051407407969236374, -0.06701146811246872, -0.05509680509567261, -0.02935010939836502, -0.04273590072989464]


### 📦 User query and searching the database
 * Define user query 
 * Convert user query to embedding using the same SentenceTransformer model.



In [19]:
user_prompt = "what is the percetnage reservations for women in NSPG Scheme"
query_vector = encoder.encode(user_prompt).tolist()

### 🎯 Search vector database
* Search the vector database for the top 3 (top k) most similar document chunks based on cosine similarity.
* Display the retrieved document chunks with metadata and similarity scores.
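
Conceptually, a top-k search scores every stored vector against the query and keeps the k best. The brute-force sketch below shows that idea on toy 2-dimensional vectors; the `top_k` helper and the `doc_a` to `doc_d` labels are hypothetical, and a real vector database such as Qdrant typically uses an index rather than scanning every point.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, points, k=3):
    """Return the k (score, payload) pairs with the highest cosine similarity."""
    scored = [
        (cosine_similarity(query_vector, vector), payload)
        for vector, payload in points
    ]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

# Toy stand-ins for (embedding, payload) pairs stored in the collection
toy_points = [
    ([1.0, 0.0], {"label": "doc_a"}),
    ([0.8, 0.6], {"label": "doc_b"}),
    ([0.0, 1.0], {"label": "doc_c"}),
    ([-1.0, 0.0], {"label": "doc_d"}),
]

results = top_k([1.0, 0.0], toy_points, k=3)
for score, payload in results:
    print(f"{payload['label']}: {score:.2f}")
```

The results come back ordered from most to least similar, which matches how the hits are used later: the best-matching chunks are passed to the language model as context.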

In [20]:
# Search for the scholarship chunks most similar to the query
hits = qdrant.query_points(
    collection_name=collection_name,
    query=query_vector,
    limit=3
)

# Keep the payloads (label, text, chunk metadata) of the search results
search_results = [hit.payload for hit in hits.points]

for hit in hits.points: # Print the search hits
    pprint(hit)

ScoredPoint(id=133, version=0, score=0.5962272196906675, payload={'label': 'FAQ_NSPG', 'text': 'FAQs for NATIONAL SCHOLARSHIP FOR POSTGRADUATE STUDIES\n1\n. \nWhat is NSPG Scheme?\nReply: \nNational Scholarship for Post Graduate Studies (NSPGS) is an umbrella\nscheme by merging four diﬀerent scholarship schemes for postgraduate studies\ni.e. (i) PG Indira Gandhi single girl child scholarship, (ii) PG scholarship for\nuniversity rank holders, (iii) PG scholarship for SC/ST students pursuing\nprofessional courses and (iv) Post Graduate Scholarship for GATE/GPAT.\n \n2\n. \nHow can I apply online for scho', 'chunk_id': 0, 'total_chunks': 14}, vector=None, shard_key=None, order_value=None)
ScoredPoint(id=140, version=0, score=0.59211491907457, payload={'label': 'FAQ_NSPG', 'text': 'marked for Science, Engineering &\nTechnology, medical, technical, agriculture, forestry programmes. The selection is\ndone purely on merit basis. The slots are allocated as per Govt. of India\nreservation pol

### 🤖 Load TinyLlama model
* TinyLlama is a compact 1.1B-parameter chat model built on the Llama 2 architecture and tokenizer, designed to be efficient while still performing well on a range of NLP tasks.
* We will use TinyLlama to generate responses based on the retrieved document chunks.
* https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0



In [21]:
# For Hugging Face models
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Set up device (GPU if available, else CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load TinyLlama model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

Using device: cuda


### 🤖 Generate response using TinyLlama without search results
* Generate a response to the user query using TinyLlama without incorporating any retrieved document chunks.
* This serves as a baseline to compare against the RAG approach.
* `return_tensors="pt"` makes the tokenizer return PyTorch tensors.
* `max_new_tokens` is the maximum number of new tokens to generate in the response.
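
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated tokens, so the answer has to be sliced off before decoding. The sketch below mimics that slicing with plain Python lists; the token IDs are made-up placeholders, not real TinyLlama tokens.

```python
# Toy stand-ins for token IDs; real values come from the tokenizer and the model
prompt_ids = [101, 7592, 2088, 102]            # tokens of the formatted chat prompt
answer_ids = [2023, 2003, 1996, 3437, 2]       # tokens the model would generate
generated_ids = prompt_ids + answer_ids        # shape of what generate() returns for one sequence

prompt_length = len(prompt_ids)                # plays the role of inputs["input_ids"].shape[-1]
new_token_ids = generated_ids[prompt_length:]  # keep only the newly generated tokens

print(new_token_ids)
```

Decoding only this slice keeps the system and user messages out of the printed response. The slice can never be longer than `max_new_tokens`.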

In [22]:
prompt = [
    {"role": "system", "content": "You are a helpful chatbot specializing in Indian government scholarships. Your top priority is to help users find relevant scholarship information and guide them with their queries. ONLY use information from the retrieved documents"},
    {"role": "user","content": user_prompt},
]
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
pprint("Response without RAG and with TinyLlama:")
pprint(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

'Response without RAG and with TinyLlama:'
('According to the given material, the percentage of reservations for women in '
 'NSPG Scheme is not mentioned. However, the text mentions that the scheme '
 'aims to provide financial assistance to women students for pursuing higher '
 'education. Therefore, it is safe to assume that the scheme provides '
 'financial assistance to women students as well.</s>')


### ✨ Generate response using TinyLlama WITH search results (RAG-ENHANCED)
* Generate a response to the user query using TinyLlama, incorporating the retrieved document chunks.
* `max_new_tokens` is the maximum number of new tokens to generate in the response.
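
Passing `str(search_results)` puts the raw Python dictionaries into the prompt. One possible alternative, sketched below, is to render the retrieved payloads as a numbered plain-text block. The `format_context` helper and its `max_chars_per_chunk` parameter are hypothetical, and the example texts are shortened from the chunks shown earlier.

```python
def format_context(search_results, max_chars_per_chunk=500):
    """Render retrieved payloads as a numbered, human-readable context block."""
    sections = []
    for number, doc in enumerate(search_results, start=1):
        text = " ".join(doc["text"].split())  # collapse newlines and repeated spaces
        sections.append(f"[{number}] Source: {doc['label']}\n{text[:max_chars_per_chunk]}")
    return "\n\n".join(sections)

# Payloads in the same shape as the notebook's search_results
example_results = [
    {
        "label": "FAQ_NSPG",
        "text": "What is NSPG Scheme?\nReply: \nNational Scholarship for Post Graduate Studies (NSPGS) is an umbrella\nscheme",
        "chunk_id": 0,
        "total_chunks": 14,
    },
    {
        "label": "AICTE_2010_F",
        "text": "PRAGATI SCHOLARSHIP SCHEME  \nFrequently Asked Questions (FAQs)",
        "chunk_id": 0,
        "total_chunks": 33,
    },
]

context = format_context(example_results)
print(context)
```

Whether this helps a small model like TinyLlama is something to check empirically; the intent is only to keep dictionary syntax and stray whitespace out of the prompt.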

In [26]:
# No need to reload the model - just create a new prompt with RAG context

prompt = [
    {"role": "system", "content": f"You are a helpful chatbot specializing in Indian government scholarships. Use the following retrieved documents to answer the user's question accurately. ONLY use information from the retrieved documents.\n\nRetrieved Documents:\n{str(search_results)}"},
    {"role": "user", "content": user_prompt},
]
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
pprint("Response with  RAG and with TinyLlama:")

outputs = model.generate(**inputs, max_new_tokens=500)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])
pprint(response)

# Print source documents in one line
sources = " | ".join([f"{doc['label']}" for doc in search_results])
print(f"\nüìö Sources: {sources}")

'Response with  RAG and with TinyLlama:'
('According to the retrieved documents, the percentage reservations for women '
 'in NSPG Scheme are 30%. This means that 30% of the available scholarship '
 'slots are reserved for women candidates.</s>')

📚 Sources: FAQ_NSPG | FAQ_NSPG | Guidelines_NATIONAL_SCHOLARSHIP_FOR_POSTGRADUATE_STUDIES_UGC_2324


<h4>üåê Launch interactive Gradio chatbot interface with full RAG pipeline</h4>

In [27]:
import gradio as gr

def scholarship_chatbot(message, history):
    # Encode user query
    query_vector = encoder.encode(message).tolist()
    
    # Search for relevant scholarships
    hits = qdrant.query_points(
        collection_name=collection_name,
        query=query_vector,
        limit=3
    )
    
    search_results = [hit.payload for hit in hits.points]
    
    # Generate response with LLM
    prompt = [
        {"role": "system", "content": f"You are a helpful chatbot specializing in Indian government scholarships. Use the following retrieved documents to answer accurately:\n\nRetrieved Documents:\n{str(search_results)}"},
        {"role": "user", "content": message}
    ]
    
    inputs = tokenizer.apply_chat_template(
        prompt,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    
    outputs = model.generate(**inputs, max_new_tokens=1024)
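    # skip_special_tokens=True drops markers such as </s> from the chat reply
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)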
    
    # Add source documents in one line
    sources = " | ".join([f"{doc['label']}" for doc in search_results])
    response_with_sources = f"{response}\n\n Sources: {sources}"
    
    return response_with_sources

# Launch Gradio interface
demo = gr.ChatInterface(
    scholarship_chatbot,
    title="üéì Indian Government Scholarship Chatbot",
    description="Ask me about Indian government scholarships!",
    examples=[
        "What scholarships are available for engineering students?",
        "Tell me about AICTE scholarships",
        "Are there scholarships for women in STEM?"
    ]
)

demo.launch()



It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://5404f89d983d01c36a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


