# RAG Chatbot Implementation: Information on Scholarships Provided by the Govt. of India


In this notebook we will build a simple RAG application based on a scholarship dataset from the Govt. of India. It has the following sections:
* Load the dataset
* Chunking - Splitting of the Data
* Vector Database
* Generating Embeddings
* Encoding the user query, creating the prompt, and generating the similarity score
* Generate the output and channel it through the LLM for a proper response

### Load the Dataset
The scholarship data published by the Govt. of India at https://scholarships.gov.in/ has already been uploaded to Hugging Face.
In this cell we download the dataset from Hugging Face.

In [1]:
! pip install pandas
import pandas as pd
from pprint import pprint

# Read scholarship data from parquet file
df = pd.read_parquet("hf://datasets/NetraVerse/indian-govt-scholarships/data/train-00000-of-00001.parquet")
df = df[['label', 'text']]

# Convert to records format
data = df.to_dict('records')
print(f"Loaded {len(data)} scholarship documents")
pprint(data[:1])



Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).


Loaded 10 scholarship documents
[{'label': 'AICTE_2010_F',
  'text': 'Page 1 of 8 \n'
          '  \n'
          ' \n'
          'PRAGATI SCHOLARSHIP SCHEME  \n'
          'Frequently Asked Questions (FAQs)  \n'
          'Q.1 Who is eligible for PRAGATI Scholarship?  \n'
          'Ans: Eligibility criteria underPRAGATI Scholarship scheme:  \n'
          'EligibilityforPragati-DegreeLevel '
          'EligibilityforPragati–DiplomaLevel \n'
          '1. Upto TwoGirls per family. 1. Upto TwoGirls perfamily. \n'
          '2. \n'
          ' Familyincomeshouldbelessthan Rs.8Lakh perannu\n'
          'm. 2. Familyincomeshouldbelessthan Rs.8Lakh perannum. \n'
          '3.  Studentsadmittedin UGDegreeLevel \n'
          'Programme/CourseinAICTEApprovedInstitutions. 3.Studentsadmittedin '
          'DiplomaLevel Programme/CourseinAIC\n'
          'TE ApprovedInstitutions. \n'
          '4.Thestudentsadmitted in  first year of their Degree \n'
          'Course OR Second year oftheir \n'


### Chunk the Data - Splitting into smaller pieces
* We will split the data into smaller chunks to make it easier to process and to retrieve only the relevant passages.
* An interactive chunking visualizer is available at https://chunkviz.up.railway.app/

In [None]:
# Set this to True to enable chunking, False to disable
ENABLE_CHUNKING = True 

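def chunk_text(text, chunk_size, overlap):
    '''Split text into overlapping chunks of at most chunk_size characters'''
    # The window must advance on every iteration, otherwise the loop never ends
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        chunks.append(chunk)
        start += chunk_size - overlap  # step forward, keeping `overlap` characters of context
    return chunks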

if ENABLE_CHUNKING:
    # Create chunked version of data
    chunked_data = []
    for doc in data:
        text = doc['text']  #data under 'text' key
        chunks = chunk_text(text, chunk_size=500, overlap=100)  #500 characters with 100 characters overlap
        
        for i, chunk in enumerate(chunks):
            chunked_data.append({
                'label': doc['label'],
                'text': chunk,
                'chunk_id': i,
                'total_chunks': len(chunks)
            })
    
    # Reassign data with chunked data
    data = chunked_data
    
    print(f"CHUNKING ENABLED")
    print(f"Chunked into {len(data)} pieces")
    # Display first chunk example - FULL TEXT
    print("FIRST CHUNK EXAMPLE:")
    print(f"Chunk ID: {data[0]['chunk_id']} of {data[0]['total_chunks']}")
else:
    print(f"CHUNKING DISABLED - Using full documents")
    print(f"Total documents: {len(data)}")
    print("FIRST DOCUMENT EXAMPLE:")
    
print(f"Label: {data[0]['label']}")
print(f"Text Length: {len(data[0]['text'])} characters")
print(f"FULL TEXT:\n{data[0]['text']}")


CHUNKING DISABLED - Using full documents
Total documents: 10
FIRST DOCUMENT EXAMPLE:
Label: AICTE_2010_F
Text Length: 12975 characters
FULL TEXT:
Page 1 of 8 
  
 
PRAGATI SCHOLARSHIP SCHEME  
Frequently Asked Questions (FAQs)  
Q.1 Who is eligible for PRAGATI Scholarship?  
Ans: Eligibility criteria underPRAGATI Scholarship scheme:  
EligibilityforPragati-DegreeLevel EligibilityforPragati–DiplomaLevel 
1. Upto TwoGirls per family. 1. Upto TwoGirls perfamily. 
2. 
 Familyincomeshouldbelessthan Rs.8Lakh perannu
m. 2. Familyincomeshouldbelessthan Rs.8Lakh perannum. 
3.  Studentsadmittedin UGDegreeLevel 
Programme/CourseinAICTEApprovedInstitutions. 3.Studentsadmittedin DiplomaLevel Programme/CourseinAIC
TE ApprovedInstitutions. 
4.Thestudentsadmitted in  first year of their Degree 
Course OR Second year oftheir 
DegreeCoursethroughlateralentry(LE) inanyofth
eAICTEapprovedInstitutionof respectiveyear. 4.The students admitted in first year of their 
DiplomaCourse ORSecondyearof DiplomaCou
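
To make the overlap concrete, here is a small sketch of the same sliding-window idea on a toy string. It re-declares `chunk_text` in a compact form so the example stands alone; the sample string and the sizes (10 characters, 4 overlapping) are illustrative choices, not values used by the notebook.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, overlap=4)
for i, chunk in enumerate(chunks):
    # Each chunk starts with the last 4 characters of the previous one
    print(i, repr(chunk))
```

Each window advances by `chunk_size - overlap` characters, so with the notebook's settings (500 and 100) a 12,975-character document is cut into 33 chunks.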

### 📦 Install required dependencies for vector database, embeddings, and deep learning
* The vector database used is Qdrant.
* The embedding model comes from Sentence Transformers. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
* The deep learning model is from Hugging Face.



In [3]:
! pip install qdrant-client
! pip install sentence-transformers
! pip install torch



### 📦 Initialize Qdrant vector database client and SentenceTransformer embedding encoder
* Vector database is used to store and retrieve document chunks based on their semantic similarity to the query.
* SentenceTransformer is used to convert text into dense vector representations (embeddings).

In [4]:
from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

qdrant = QdrantClient(":memory:") # Create in-memory Qdrant instance

encoder = SentenceTransformer('all-MiniLM-L6-v2') # Model to create embeddings

### 📦 Initialize vector database for storing scholarship embeddings with cosine similarity
Other distance metrics available in Qdrant include:
* Dot product (models.Distance.DOT)
* Euclidean (models.Distance.EUCLID)
* Manhattan (models.Distance.MANHATTAN)

In [5]:
# Create collection to store the scholarship data
collection_name="scholarships"

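# recreate_collection is deprecated: drop any existing collection, then create it
if qdrant.collection_exists(collection_name):
    qdrant.delete_collection(collection_name)

qdrant.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), # Vector size is as defined in the used model
        distance=models.Distance.COSINE
    )
)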


True
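
As a rough illustration of how cosine similarity differs from the other distance metrics listed above, the sketch below computes each one in plain Python on toy 2-dimensional vectors. The function names and vectors are hypothetical; this is not how Qdrant computes them internally.

```python
import math

def dot(a, b):
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity: depends only on the angle between the vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    """Euclidean distance: straight-line distance between the points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 3.0]  # orthogonal to a

for name, fn in [("cosine", cosine_similarity), ("dot", dot),
                 ("euclidean", euclidean), ("manhattan", manhattan)]:
    print(f"{name:10s} a-b: {fn(a, b):.3f}   a-c: {fn(a, c):.3f}")
```

Cosine similarity treats `a` and `b` as identical because they point the same way, while the dot product and the two distances are affected by vector length. For cosine and dot product a higher value means more similar; for Euclidean and Manhattan a lower value does.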

### 📦 Generate embeddings for each document and upload to vector database
* In a vector database, each data point is represented as a vector in a high-dimensional space.
* For the all-MiniLM-L6-v2 model, each vector has 384 dimensions.

In [6]:
points_to_upload = []
for idx, doc in enumerate(data):
    points_to_upload.append(
        models.PointStruct(
            id=idx,
            vector=encoder.encode(doc["text"]).tolist(),  # Use 'text' field for scholarship data
            payload=doc
        )
    )

# vectorize and upload points to Qdrant
qdrant.upload_points(
    collection_name=collection_name,
    points=points_to_upload
)

### 📦 Check the embeddings

In [7]:
# Display first document's text and embedding
first_doc = data[0]
first_text = first_doc['text']
first_vector = encoder.encode(first_text).tolist()

print("DOCUMENT TEXT:")
print(f"Text (first 100 chars): {first_text[:100]}...")
print("EMBEDDING VECTOR:")
print(f"Vector dimension: {len(first_vector)}")
print(f"First 20 values: {first_vector[:20]}")


DOCUMENT TEXT:
Text (first 100 chars): Page 1 of 8 
  
 
PRAGATI SCHOLARSHIP SCHEME  
Frequently Asked Questions (FAQs)  
Q.1 Who is eligib...
EMBEDDING VECTOR:
Vector dimension: 384
First 20 values: [-0.006886775139719248, 0.011079510673880577, -6.519130693050101e-05, -0.031075933948159218, 0.02162911742925644, 0.019189953804016113, -0.03200969845056534, 0.018779166042804718, -0.013022931292653084, 0.04516565799713135, 0.08221457153558731, -0.07967180758714676, 0.009064801037311554, -0.013389509171247482, 0.003615487366914749, -0.08470773696899414, -0.05057789385318756, -0.025727171450853348, -0.014768298715353012, -0.01017395593225956]


### 📦 User query and searching the database
 * Define user query 
 * Convert user query to embedding using the same SentenceTransformer model.



In [8]:
user_prompt = "what is the percetnage reservations for women in NSPG Scheme"
query_vector = encoder.encode(user_prompt).tolist()

### 🎯 Search vector database
* Search the vector database for the top 3 (top k) most similar document chunks based on cosine similarity.
* Display the retrieved document chunks with metadata and similarity scores.

In [9]:
# Search for the scholarship documents most similar to the query
# (QdrantClient is already imported and instantiated above)

hits = qdrant.query_points(
    collection_name=collection_name,
    query=query_vector,
    limit=3
)

#  hold the search results
search_results = [hit.payload for hit in hits.points]

for hit in hits.points: # Print the search hits
  pprint(hit)

ScoredPoint(id=5, version=0, score=0.628864160584484, payload={'label': 'FAQ_NSPG', 'text': "FAQs for NATIONAL SCHOLARSHIP FOR POSTGRADUATE STUDIES\n1\n. \nWhat is NSPG Scheme?\nReply: \nNational Scholarship for Post Graduate Studies (NSPGS) is an umbrella\nscheme by merging four different scholarship schemes for postgraduate studies\ni.e. (i) PG Indira Gandhi single girl child scholarship, (ii) PG scholarship for\nuniversity rank holders, (iii) PG scholarship for SC/ST students pursuing\nprofessional courses and (iv) Post Graduate Scholarship for GATE/GPAT.\n \n2\n. \nHow can I apply online for scholarship?\nReply: \nStudents can apply online through National Scholarship Portal (NSP).\n \n3\n. \nDoes reservation system is followed in this scholarship? If yes what percentage is\nreserved for each respective category?\nReply\n: \nYes. The slots are allocated as per Govt. of India reservation policy.\nHowever, 30% slots will be reserved for the women candidates. Preference is\ngiven to 
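
Conceptually, a top-k search scores every stored vector against the query vector and keeps the k best. The brute-force sketch below shows that idea on toy 3-dimensional vectors. The labels, vectors, and the `top_k` helper are hypothetical, and a production vector database may use an index rather than scanning every point.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, points, k=3):
    """Score every (label, vector) pair against the query and return the k best."""
    scored = [(cosine_similarity(query, vector), label) for label, vector in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return scored[:k]

# Toy stand-ins for stored embeddings (hypothetical labels and values)
points = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.7, 0.7, 0.0]),
    ("doc_c", [0.0, 0.0, 1.0]),
    ("doc_d", [0.9, 0.1, 0.0]),
]
query = [1.0, 0.1, 0.0]

results = top_k(query, points, k=3)
for score, label in results:
    print(f"{label}: {score:.4f}")
```

The `score` field in the `ScoredPoint` output above plays the same role as the similarity value here: the closer to 1, the better the match.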

### 🤖 Load TinyLlama model
* TinyLlama is a compact 1.1B-parameter chat model that uses the Llama 2 architecture and tokenizer, which keeps it light enough for modest hardware. Its maximum sequence length is 2048 tokens.
* We will use TinyLlama to generate responses based on the retrieved document chunks.
* https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0



In [10]:
# For Hugging Face models
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Set up device (GPU if available, else CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load TinyLlama model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

Using device: cuda


### 🤖 Generate response using TinyLlama without search results
* Generate a response to the user query using TinyLlama without incorporating any retrieved document chunks.
* This serves as a baseline to compare against the RAG approach.
* `return_tensors="pt"` makes the tokenizer return PyTorch tensors.
* `max_new_tokens`: the maximum number of new tokens to generate in the response.

In [11]:
prompt = [
    {"role": "system", "content": "You are a helpful chatbot specializing in Indian government scholarships. Your top priority is to help users find relevant scholarship information and guide them with their queries. ONLY use information from the retrieved documents"},
    {"role": "user","content": user_prompt},
]
inputs = tokenizer.apply_chat_template(
	prompt,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
pprint("Response without RAG and with TinyLlama:")
pprint(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

'Response without RAG and with TinyLlama:'
('According to the given material, the percentage of reservations for women in '
 'NSPG Scheme is not mentioned. However, the text mentions that the scheme '
 'aims to provide financial assistance to women students for pursuing higher '
 'education. Therefore, it is safe to assume that the scheme provides '
 'financial assistance to women students as well.</s>')


### ✨ Generate response using TinyLlama WITH search results (RAG-ENHANCED)
* Generate a response to the user query using TinyLlama, incorporating the retrieved document chunks.
* `max_new_tokens`: the maximum number of new tokens to generate in the response.

In [12]:
# No need to reload the model - just create a new prompt with RAG context

prompt = [
    {"role": "system", "content": f"You are a helpful chatbot specializing in Indian government scholarships. Use the following retrieved documents to answer the user's question accurately. ONLY use information from the retrieved documents.\n\nRetrieved Documents:\n{str(search_results)}"},
    {"role": "user", "content": user_prompt},
]
inputs = tokenizer.apply_chat_template(
	prompt,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)
pprint("Response with  RAG and with TinyLlama:")

outputs = model.generate(**inputs, max_new_tokens=500)
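# skip_special_tokens drops markers such as the trailing </s> from the decoded reply
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)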
pprint(response)

# Print source documents in one line
sources = " | ".join([f"{doc['label']}" for doc in search_results])
print(f"\n📚 Sources: {sources}")

Token indices sequence length is longer than the specified maximum sequence length for this model (8452 > 2048). Running this sequence through the model will result in indexing errors


'Response with  RAG and with TinyLlama:'


OutOfMemoryError: CUDA out of memory. Tried to allocate 8.52 GiB. GPU 0 has a total capacity of 14.74 GiB of which 1.20 GiB is free. Process 99055 has 13.54 GiB memory in use. Of the allocated memory 13.38 GiB is allocated by PyTorch, and 39.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
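
The warning and the out-of-memory error above both trace back to prompt size: the prompt is 8452 tokens, roughly four times the 2048-token limit reported for this model. The recorded run used full documents (see the `CHUNKING DISABLED` output earlier), so the three retrieved payloads were entire FAQs rather than 500-character chunks. Besides enabling chunking, one option is to cap how much retrieved text goes into the prompt. The sketch below is one minimal way to do that, assuming a whitespace word count is an acceptable rough stand-in for tokens. `build_context` and `max_words` are hypothetical names, not part of the notebook.

```python
def build_context(docs, max_words):
    """Join retrieved passages, in rank order, without exceeding max_words words.

    Word count is only a crude proxy for model tokens (an assumption of this
    sketch); a tokenizer-based count would be more faithful.
    """
    parts = []
    remaining = max_words
    for doc in docs:
        if remaining <= 0:
            break  # budget used up, drop lower-ranked passages
        words = doc["text"].split()[:remaining]  # truncate the passage if needed
        parts.append(f"[{doc['label']}] " + " ".join(words))
        remaining -= len(words)
    return "\n\n".join(parts)

# Toy retrieved results in the same shape as the notebook's payloads
docs = [
    {"label": "A", "text": "one two three four"},
    {"label": "B", "text": "five six seven"},
]
context = build_context(docs, max_words=5)
print(context)
```

In the notebook, the returned string could replace `str(search_results)` in the system prompt. The budget should leave room for the instructions, the question, and `max_new_tokens`. Note that `split()` collapses line breaks and repeated spaces in the passages.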

<h4>🌐 Launch interactive Gradio chatbot interface with full RAG pipeline</h4>

In [None]:
! pip install gradio
import gradio as gr

def scholarship_chatbot(message, history):
    # Encode user query
    query_vector = encoder.encode(message).tolist()
    
    # Search for relevant scholarships
    hits = qdrant.query_points(
        collection_name=collection_name,
        query=query_vector,
        limit=3
    )
    
    search_results = [hit.payload for hit in hits.points]
    
    # Generate response with LLM
    prompt = [
        {"role": "system", "content": f"You are a helpful chatbot specializing in Indian government scholarships. Use the following retrieved documents to answer accurately:\n\nRetrieved Documents:\n{str(search_results)}"},
        {"role": "user", "content": message}
    ]
    
    inputs = tokenizer.apply_chat_template(
        prompt,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    
    outputs = model.generate(**inputs, max_new_tokens=1024)
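    # skip_special_tokens drops markers such as the trailing </s> from the decoded reply
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)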
    
    # Add source documents in one line
    sources = " | ".join([f"{doc['label']}" for doc in search_results])
    response_with_sources = f"{response}\n\n Sources: {sources}"
    
    return response_with_sources

# Launch Gradio interface
demo = gr.ChatInterface(
    scholarship_chatbot,
    title="🎓 Indian Government Scholarship Chatbot",
    description="Ask me about Indian government scholarships!",
    examples=[
        "What scholarships are available for engineering students?",
        "Tell me about AICTE scholarships",
        "Are there scholarships for women in STEM?"
    ]
)

demo.launch()
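
When chunking is enabled, several of the three hits can come from the same source document, so the `Sources` line may repeat a label. A minimal sketch of de-duplicating labels while keeping retrieval order follows. `format_sources` is a hypothetical helper and the sample payloads are illustrative.

```python
def format_sources(search_results):
    """Return the unique source labels, in first-seen order, joined with ' | '."""
    labels = [doc["label"] for doc in search_results]
    unique_labels = list(dict.fromkeys(labels))  # dict keys preserve insertion order
    return " | ".join(unique_labels)

# Two chunks from the same document plus one from another
sample_results = [
    {"label": "FAQ_NSPG", "text": "chunk 0"},
    {"label": "AICTE_2010_F", "text": "chunk 3"},
    {"label": "FAQ_NSPG", "text": "chunk 1"},
]
print(format_sources(sample_results))
```

Because the order is preserved, the highest-ranked source still appears first in the citation line.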