# Building a RAG Pipeline
This project builds a retrieval-augmented generation (RAG) pipeline for querying academic documents, with an initial focus on math and machine learning research. It goes through the following steps:

1) installing the required libraries
2) preparing the knowledge base
3) creating embeddings for the knowledge base
4) encoding the user query
5) retrieving relevant documents
6) combining the query with the retrieved document
7) generating a response using GPT-2

### Installing Libraries

In [1]:
#!pip install sentence-transformers transformers faiss-cpu
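
Before running the rest of the notebook, it can help to confirm that the installed packages are importable. The following is a minimal sketch and not part of the original notebook: the helper name `check_imports` is hypothetical, and the module names listed (`sentence_transformers`, `transformers`, `faiss`, `torch`) are assumed to be the import names these packages expose.

```python
import importlib.util

# Hypothetical helper: maps each import name to True/False depending on
# whether Python can locate the module in the current environment.
def check_imports(module_names):
    return {name: importlib.util.find_spec(name) is not None
            for name in module_names}

status = check_imports(["sentence_transformers", "transformers", "faiss", "torch"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

Any package reported as missing can be installed by uncommenting the pip command above.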

### Preparing the Knowledge Base
The knowledge base is a list of strings. Ideally it would be drawn from academic papers on math and machine learning topics; the placeholder sentences below are used for now.

In [2]:
#example knowledge base
knowledge_base = [
    "The Tesla Model S has an estimated range of up to 370 miles on a single charge.",
    "The Tesla Model 3 is a more affordable option with a range of up to 350 miles.",
    "The Tesla Model X offers a range of around 340 miles and features falcon-wing doors.",
    "Tesla's autopilot system is a suite of advanced driver-assistance system features."
]
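
Once real papers replace the placeholder sentences, each paper will probably need to be split into shorter passages before embedding. The sketch below shows one possible approach, assuming simple whitespace tokenization and overlapping word windows. The helper `chunk_text`, its parameters, and the placeholder text are all hypothetical and not part of the original notebook.

```python
# Hypothetical helper: split a long text into overlapping word windows so
# that each chunk is short enough to embed as a single passage.
def chunk_text(text, chunk_size=50, overlap=10):
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Placeholder text standing in for the body of a paper.
paper_text = " ".join(f"word{i}" for i in range(120))
paper_chunks = chunk_text(paper_text, chunk_size=50, overlap=10)
print(len(paper_chunks))
```

The resulting list of chunks could then be used in place of `knowledge_base`. The overlap is a design choice intended to reduce the chance that a relevant sentence is cut in half at a chunk boundary.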

### Creating Embeddings for the Knowledge Base
The sentence-transformers library will be used to generate the embeddings for the documents in the knowledge base. It provides a range of pre-trained embedding (encoding) models, which gives flexibility in choosing one. This project will use the 'all-MiniLM-L6-v2' model, which has been specifically designed for sentence embeddings. It is based on the MiniLM architecture, which is a smaller version of the BERT model; it is 6 layers deep. Because of its smaller size, it is more computationally efficient.

In [3]:
#imports
from sentence_transformers import SentenceTransformer
import numpy as np

In [4]:
#loading a pre-trained model for sentence embedding
embedder = SentenceTransformer('all-MiniLM-L6-v2')

#generating the embeddings of the knowledge base
kb_embeddings = embedder.encode(knowledge_base, convert_to_tensor = True)
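
The retrieval step later in this notebook compares embeddings by cosine similarity, which depends only on the direction of the vectors and not on their length. The dependency-free sketch below illustrates this with toy vectors: after scaling each vector to unit length, a plain dot product gives the same score as cosine similarity. The helper names and the vectors are invented for illustration. `encode` also accepts a `normalize_embeddings=True` argument that is meant to return unit-length embeddings, although the notebook does not use it.

```python
import math

def l2_normalize(vec):
    # Scale the vector so that its Euclidean length is 1.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy 3-dimensional vectors, invented for illustration
# (real all-MiniLM-L6-v2 embeddings have 384 dimensions).
u = [3.0, 4.0, 0.0]
v = [1.0, 2.0, 2.0]

print(cosine(u, v))                            # cosine similarity of the raw vectors
print(dot(l2_normalize(u), l2_normalize(v)))   # dot product of the unit vectors
```

Both printed values agree up to floating-point rounding.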


### Encoding the User Query
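Before the query can be compared with the knowledge base, it must be embedded with the same model so that it lies in the same vector space as the document embeddings.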

In [6]:
#example query
query = "What is the range of the Tesla Model S"

#embedding the query
query_embedding = embedder.encode(query, convert_to_tensor = True)

### Retrieving the Relevant Documents
Cosine similarity is used to find the most relevant document(s) in the knowledge base.

In [7]:
import torch

In [8]:
#calculating the cosine similarity between the query embedding and the knowledge base embeddings
cosine_scores = torch.nn.functional.cosine_similarity(query_embedding, kb_embeddings)

#retrieving the top 1 most similar document

#finding the entry with the best cosine score
top_k = torch.topk(cosine_scores, k=1)

#getting the index of the document with the best cosine score
retrieved_doc_idx = top_k.indices[0].item()

#retrieving the document using the index from the above step
retrieved_doc = knowledge_base[retrieved_doc_idx]

#printing the retrieved doc
print(f"Retrieved Document: {retrieved_doc}")


Retrieved Document: The Tesla Model S has an estimated range of up to 370 miles on a single charge.
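
The cell above keeps only the single best match (`k=1`). With a larger knowledge base, it may be useful to retrieve several passages together with their scores. The following is a plain-Python sketch of that ranking step. The helper `top_k_documents`, the document labels, and the scores are all made up for illustration; in the notebook, the scores would come from something like `cosine_scores.tolist()`.

```python
# Hypothetical helper: return the k highest-scoring documents together
# with their scores, best first.
def top_k_documents(scores, documents, k=2):
    if len(scores) != len(documents):
        raise ValueError("scores and documents must have the same length")
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [(doc, score) for score, doc in ranked[:k]]

# Made-up similarity scores, one per document.
example_docs = ["doc A", "doc B", "doc C", "doc D"]
example_scores = [0.81, 0.42, 0.77, 0.10]

for doc, score in top_k_documents(example_scores, example_docs, k=2):
    print(f"{score:.2f}  {doc}")
```

If `k` exceeds the number of documents, the helper simply returns all of them in ranked order.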


### Combining the Query and the Retrieved Document
This step is necessary to provide more context for the generative model.

In [9]:
combined_input = f"Query: {query}\nContext: {retrieved_doc}\nAnswer:"
combined_input

'Query: What is the range of the Tesla Model S\nContext: The Tesla Model S has an estimated range of up to 370 miles on a single charge.\nAnswer:'
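
If more than one document is retrieved, the same Query/Context/Answer layout can be kept by joining the passages into a single context line. The sketch below is one way to do this; the helper name `build_prompt` is hypothetical, and joining passages with a space is an assumption rather than something the notebook prescribes.

```python
# Hypothetical helper: build a prompt in the notebook's Query/Context/Answer
# layout, joining one or more retrieved passages with spaces.
def build_prompt(query, retrieved_docs):
    context = " ".join(doc.strip() for doc in retrieved_docs)
    return f"Query: {query}\nContext: {context}\nAnswer:"

prompt = build_prompt(
    "What is the range of the Tesla Model S",
    ["The Tesla Model S has an estimated range of up to 370 miles on a single charge."],
)
print(prompt)
```

With a single retrieved document, this produces the same string as `combined_input`.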

### Generating a Response Using GPT-2
The pre-trained GPT-2 model is used to generate a response from the combined input.

In [10]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

In [11]:
#loading the pre-trained models
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


In [15]:
#encoding the combined input
input_ids = tokenizer.encode(combined_input, return_tensors = 'pt')

#set the attention mask
attention_mask = torch.ones(input_ids.shape, dtype=torch.long)

#generating the response
#note: temperature, top_p and top_k only take effect when do_sample=True is passed;
#by default generate() uses greedy decoding and ignores them
output = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_length=100,                 # Cap on total length (prompt plus generated tokens)
    num_return_sequences=1,         # Return only one sequence
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no pad token, so reuse EOS
    temperature=0.3,                # Lower temperature for more focused output
    top_p=0.8,                      # Nucleus sampling for more coherence
    top_k=40,                       # Use top-k sampling to reduce unlikely words
    no_repeat_ngram_size=3          # Avoid repetitive phrases
)

generated_text = tokenizer.decode(output[0], skip_special_tokens= True)

#printing the response
print(f"Retrieved Document: {retrieved_doc}")
print(f"Generated Response: {generated_text}")

Retrieved Document: The Tesla Model S has an estimated range of up to 370 miles on a single charge.
Generated Response: Query: What is the range of the Tesla Model S
Context: The Tesla Model S has an estimated range of up to 370 miles on a single charge.
Answer: The range of a Tesla Model X is based on the range and weight of the vehicle.
The range of an electric vehicle is based upon the weight of its battery pack.
Tesla Model S is a vehicle that is designed to be driven on a highway.
It is designed for use in a variety of situations.
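
As the output shows, the decoded text repeats the prompt before the model's continuation. The following sketch shows one way to keep only the answer portion, assuming the decoded text starts with the exact prompt string and falling back to the full text otherwise. The helper `extract_answer` and the example strings are hypothetical.

```python
# Hypothetical helper: drop the echoed prompt and keep only the first
# non-empty line of the model's continuation.
def extract_answer(generated_text, prompt):
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text  # fallback: prompt was not echoed verbatim
    lines = [line.strip() for line in continuation.splitlines() if line.strip()]
    return lines[0] if lines else ""

# Made-up strings standing in for combined_input and generated_text.
example_prompt = "Query: q\nContext: c\nAnswer:"
example_output = example_prompt + " first sentence.\nsecond sentence."
print(extract_answer(example_output, example_prompt))
```

In the notebook, this could be called as `extract_answer(generated_text, combined_input)`. Keeping only the first line is a design choice that trims the unrelated sentences GPT-2 tends to append; it does not make the answer itself more accurate.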

