# **Retrieval-Augmented Generation (RAG) System with GPT-2: Document Retrieval, Text Generation, and Query Response**


This notebook demonstrates a Retrieval-Augmented Generation (RAG) system using GPT-2. It combines document retrieval and text generation by encoding documents, retrieving relevant ones based on queries, and generating coherent responses. It showcases a practical approach to enhancing query answering with GPT-2.

A **RAG (Retrieval-Augmented Generation**) system is a hybrid model that combines two powerful components: information retrieval and text generation. It is designed to answer questions or generate responses by retrieving relevant documents from a knowledge base and then generating a coherent answer based on that information.

### Install Libraries

In [126]:
!pip install sentence-transformers -q
!pip install transformers -q
!pip install torch -q

### Importing Libraries

In [127]:
from sentence_transformers import SentenceTransformer, util
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

In [128]:
# Mount Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Loading pre-trained embedding model and GPT-2 model

This is fine-tuned GPT-2 model specifically designed to answer questions based on the provided context. The model has been fine-tuned to generate accurate and contextually relevant responses

In [129]:
# Initialize the SentenceTransformer model for generating embeddings from text.
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Load the tokenizer for GPT-2 from the specified directory.
tokenizer = GPT2Tokenizer.from_pretrained('/content/drive/MyDrive/gpt2-finetuned')

# Load the fine-tuned GPT-2 model from the specified directory.
gpt2_model = GPT2LMHeadModel.from_pretrained('/content/drive/MyDrive/gpt2-finetuned/checkpoint-1680')

### Seting the pad token to be the same as the eos token

In [130]:
# Set the padding token to be the same as the end-of-sequence (EOS) token.
# This ensures that padding tokens used during text generation or batching
tokenizer.pad_token = tokenizer.eos_token

### Example documents

In [131]:
documents = [
    "The Eiffel Tower is in Paris.",
    "The Great Wall of China is in China.",
    "Mount Everest is the highest mountain in the world.",
    "Praveen is a machine learning engineer.",
    "Mount Everest, the highest peak in the world.",
    "The Mona Lisa is a famous painting by Leonardo da Vinci.",
    "The Amazon Rainforest is located in South America.",
    "The Pacific Ocean is the largest ocean on Earth.",
    "Albert Einstein developed the theory of relativity.",
    "The Pyramids of Giza are one of the Seven Wonders of the Ancient World.",
    "The Sahara Desert is the largest hot desert in the world.",
    "Vincent van Gogh was a Dutch post-impressionist painter.",
    "The Taj Mahal is located in Agra, India.",
    "The Grand Canyon is a large canyon in the state of Arizona, USA.",
    "Shakespeare wrote many famous plays including 'Romeo and Juliet'.",
    "The Colosseum is an ancient amphitheater located in Rome, Italy.",
    "The Galápagos Islands are known for their unique wildlife.",
    "Leonardo da Vinci was also an inventor and scientist.",
    "The Berlin Wall once divided East and West Berlin.",
    "The Great Barrier Reef is the largest coral reef system in the world."
]


### Generating embeddings for documents

In [132]:
# Encode the list of documents into vectors (embeddings) using the embedding model.
doc_embeddings = embedding_model.encode(documents, convert_to_tensor=True)

### Function for retrieve document

In [133]:
def retrieve_documents(query, top_k=1):
    # Encode the query into a vector (embedding) using the embedding model.
    query_embedding = embedding_model.encode(query, convert_to_tensor=True)

    # Compute the cosine similarity between the query embedding and the document embeddings.
    similarities = util.pytorch_cos_sim(query_embedding, doc_embeddings)

    # Get the indices of the top_k most similar documents
    top_results = torch.topk(similarities, k=top_k)

    return [documents[idx] for idx in top_results.indices[0]]

### Function to generate a response using GPT-2

In [134]:
def stop_generation(decoded_text):
    # Check if '<que>' token is in the generated text
    return '<que>' in decoded_text

def generate_response(query):
    # retrieve relevant document
    relevant_docs = retrieve_documents(query)
    context = " ".join(relevant_docs)

    # Prepare input for GPT-2
    prompt = f"{context}\n<que>{query}"
    inputs = tokenizer(prompt, return_tensors="pt", max_length=1024, truncation=True, padding=True)

    # Initialize generation process
    generated_text = ""
    input_ids = inputs.input_ids
    attention_mask = inputs["attention_mask"]

    # Loop to generate text manually and stop if '<que>' token is detected
    while True:
        outputs = gpt2_model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_length=len(input_ids[0]) + 1,  # Generate 1 more tokens at a time
            pad_token_id=tokenizer.pad_token_id,
            num_beams=5,
            num_return_sequences=1,
            no_repeat_ngram_size=2,
            eos_token_id=tokenizer.eos_token_id,
            early_stopping=False,  # Disable early stopping
        )

        # Decode the generated tokens and add to the current generated text
        new_generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        generated_text += new_generated_text[len(generated_text):]  # Append new tokens

        # Check if '<que>' token is found in the text
        if stop_generation(generated_text):
            break  # Stop if the '<que>' token is found

        # Update input_ids for the next loop
        input_ids = outputs

    return generated_text.split("<que>")[0].strip()



### Get predictions

In [136]:
# Get the answers to a given question.
query = "who is praveen"
response = generate_response(query)
print(response)

Praveen is a machine learning engineer.
