# Retrieval-Augmented Generation (RAG) on a Research Paper

## Objective
To implement a Retrieval-Augmented Generation (RAG) pipeline on the research paper:

**"AI Driven Crop Disease Detection and Management System"**

The system will:
- Parse the PDF
- Split the text into overlapping fixed-size chunks
- Generate embeddings
- Store vectors using FAISS
- Retrieve relevant context
- Generate grounded answers using FLAN-T5


In [1]:
!pip install -q sentence-transformers faiss-cpu pypdf transformers accelerate



## Step 1: Import Required Libraries


In [2]:
import faiss
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from transformers import pipeline


## Step 2: Load the Research Paper PDF


In [3]:
reader = PdfReader("IJISRT25NOV542.pdf")

text = ""
for page in reader.pages:
    text += page.extract_text()

print("Total Characters:", len(text))


Total Characters: 14242


## Step 3: Split Text into Overlapping Chunks

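Chunking breaks the document into smaller pieces that can be embedded and retrieved individually. Here the text is cut into fixed-size windows of 800 characters with 150 characters of overlap, so a sentence that straddles a boundary still appears whole in at least one of the neighboring chunks.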


In [4]:
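def chunk_text(text, chunk_size=800, overlap=150):
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if overlap >= chunk_size:
        # Otherwise the start position would never advance.
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks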

chunks = chunk_text(text)

print("Total Chunks:", len(chunks))


Total Chunks: 22
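
The reported chunk count follows from the window arithmetic: a new chunk starts every `chunk_size - overlap` = 650 characters, so 14,242 characters yield ⌈14242 / 650⌉ = 22 chunks. A small check, where `expected_chunk_count` is a hypothetical helper introduced only for this illustration:

```python
import math

def expected_chunk_count(n_chars, chunk_size=800, overlap=150):
    """Number of chunks a sliding window produces over n_chars characters."""
    stride = chunk_size - overlap  # a new chunk starts every `stride` characters
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return math.ceil(n_chars / stride)

print(expected_chunk_count(14242))  # 22
```

Lowering `overlap` or raising `chunk_size` reduces the number of chunks, at the cost of less shared context between neighbors or coarser retrieval units.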


## Step 4: Generate Embeddings

We use the Sentence Transformers model `all-MiniLM-L6-v2` to convert each text chunk into a 384-dimensional dense vector.


In [5]:
embedder = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = embedder.encode(chunks, show_progress_bar=True)



## Step 5: Store Embeddings in FAISS Vector Database

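FAISS provides efficient similarity search over dense vectors. With only 22 chunks, a flat index that compares the query against every stored vector by L2 distance is sufficient.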


In [6]:
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)

index.add(np.asarray(embeddings, dtype="float32"))  # FAISS expects float32 input

print("Total Vectors in FAISS:", index.ntotal)


Total Vectors in FAISS: 22
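
`IndexFlatL2` performs exact, exhaustive search and reports squared Euclidean distances. As a rough illustration of that computation (not FAISS's actual implementation), the sketch below does the same thing in NumPy on made-up 2-dimensional vectors. `flat_l2_search` and the toy data are hypothetical and exist only for this example.

```python
import numpy as np

def flat_l2_search(stored, queries, k):
    """Exact k-nearest-neighbour search by squared L2 distance.

    stored:  (n, d) array of indexed vectors
    queries: (m, d) array of query vectors
    Returns (distances, indices), each of shape (m, k), nearest first.
    """
    # Pairwise differences via broadcasting: shape (m, n, d).
    diff = queries[:, None, :] - stored[None, :, :]
    sq_dist = np.sum(diff * diff, axis=2)        # shape (m, n)
    order = np.argsort(sq_dist, axis=1)[:, :k]   # k smallest per query
    return np.take_along_axis(sq_dist, order, axis=1), order

# Toy data: three stored vectors and one query (illustrative values only).
stored = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]], dtype="float32")
query = np.array([[0.9, 0.1]], dtype="float32")

distances, indices = flat_l2_search(stored, query, k=2)
print(indices[0])    # positions of the two nearest stored vectors
print(distances[0])  # their squared L2 distances
```

The cost grows linearly with the number of stored vectors, which is negligible here but is the reason larger collections often use approximate index types instead.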


## Step 6: Create Retrieval Function

This function embeds the query with the same model and returns the `top_k` chunks whose embeddings are closest to it.


In [7]:
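def retrieve(query, top_k=3):
    # Embed the query with the same model used for the chunks.
    query_embedding = np.asarray(embedder.encode([query]), dtype="float32")
    distances, indices = index.search(query_embedding, top_k)
    # FAISS pads with -1 when fewer than top_k vectors exist; skip those.
    return [chunks[i] for i in indices[0] if i != -1]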
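
Retrieval here ranks chunks by L2 distance. When embeddings are scaled to unit length, that ordering coincides with ranking by cosine similarity, because ‖a − b‖² = 2 − 2·cos(a, b). The sketch below checks the identity numerically on random stand-in vectors. It assumes unit-normalized inputs, which should be verified for the embedding model in use (for example by passing `normalize_embeddings=True` to `encode`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for embeddings, scaled to unit length (an assumption here).
vectors = rng.normal(size=(5, 8))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
query = vectors[0]
candidates = vectors[1:]

sq_l2 = np.sum((candidates - query) ** 2, axis=1)
cosine = candidates @ query

# For unit vectors: squared L2 distance = 2 - 2 * cosine similarity.
identity_holds = np.allclose(sq_l2, 2 - 2 * cosine)

# Smallest distance first and largest cosine first give the same ranking.
same_ranking = np.array_equal(np.argsort(sq_l2), np.argsort(-cosine))

print(identity_holds, same_ranking)
```

If the vectors are not normalized, the two rankings can differ, and the choice between an L2 index and an inner-product index starts to matter.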


## Step 7: Load Language Model (FLAN-T5)

We use FLAN-T5 (`google/flan-t5-base`), an instruction-tuned encoder-decoder (sequence-to-sequence) model, for answer generation.


In [16]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# FLAN-T5 is an encoder-decoder model, which the "text-generation" pipeline
# (causal LMs only) does not support, so load it with the seq2seq classes.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def generator(prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding; temperature only has an effect when do_sample=True.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Mirror the pipeline's return shape: generator(prompt)[0]["generated_text"].
    return [{"generated_text": text}]

## Step 8: Build RAG Pipeline

The system:
1. Retrieves relevant chunks
2. Injects them into a prompt
3. Generates a grounded response


In [17]:
def ask_question(query):
    retrieved_docs = retrieve(query)
    context = "\n".join(retrieved_docs)

    # Assemble the prompt piece by piece so that the function's indentation
    # does not leak into the text sent to the model.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{query}\n\n"
        "Answer:"
    )

    response = generator(prompt)[0]["generated_text"]

    # Strip the prompt if the model echoes it back (a no-op otherwise).
    answer = response.replace(prompt, "").strip()

    return answer
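
The prompt concatenates every retrieved chunk, so its length grows with `top_k` and the chunk size. If the generator has a bounded input window, one option is to add chunks in rank order only while a budget allows. The sketch below is an illustration under stated assumptions: `fit_context` and `count_tokens` are hypothetical helpers, and the default counter simply counts whitespace-separated words as a crude proxy that a real tokenizer's count would replace.

```python
def fit_context(chunks, budget, count_tokens=lambda s: len(s.split())):
    """Keep the highest-ranked chunks whose combined size fits within budget.

    chunks: retrieved passages, most relevant first
    budget: maximum total context size, in the units of count_tokens
    """
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that would overflow
        kept.append(chunk)
        used += cost
    return "\n".join(kept)

# Illustrative passages only (not taken from the paper).
ranked = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(fit_context(ranked, budget=5))
```

Stopping at the first overflow keeps the context a strict prefix of the ranking, so a lower-ranked chunk is never included ahead of a higher-ranked one.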


## Step 9: Test the RAG System


In [18]:
ask_question("What accuracy did the model achieve?")



In [19]:
ask_question("Describe the proposed hybrid model.")



In [20]:
ask_question("What environmental parameters were used?")



In [21]:
ask_question("What are the future enhancements?")


## Conclusion

This implementation demonstrates a complete Retrieval-Augmented Generation (RAG) pipeline:

PDF → Chunking → Embeddings → FAISS Index → Retrieval → Context Injection → LLM Answer Generation

Because the prompt restricts the model to the retrieved passages, answers are intended to be grounded in the research paper rather than in the model's general knowledge.
