# RAG Model for Document-Based Question Answering
**Author:** Khai Ta  
**Date:** November 2024

In this project, I implement a Retrieval-Augmented Generation (RAG) pipeline using OpenAI's API to answer questions about Lionel Messi. The pipeline splits a document into text chunks, generates embeddings for each chunk and for the question, ranks the chunks by cosine similarity to the question, and sends the top-ranked chunks together with the question to the LLM for a final answer.

## 1. Importing Libraries

We import the `openai` library for using OpenAI's API and `numpy` for handling numerical data in embeddings.

In [None]:
from openai import OpenAI
import numpy as np

## 2. Setting Up OpenAI API Key

We set up the OpenAI API key to access embeddings and language model completions. Make sure to replace `"YOUR_API_KEY"` with your actual OpenAI API key.

In [None]:
client = OpenAI(api_key="YOUR_API_KEY")
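
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the key from an environment variable. `OPENAI_API_KEY` is the variable the `openai` client looks up by default; the helper name `read_api_key` is hypothetical and not part of the original notebook.

```python
import os

def read_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise a clear error."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Usage (assumes the variable is already set in your shell):
# client = OpenAI(api_key=read_api_key())
```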

## 3. Loading and Chunking the Document

We load the document `messi.txt` and split it into fixed-size chunks of 100 characters each (the last chunk may be shorter). Small chunks keep each embedding request short and let us retrieve only the sections that are relevant to a question.

In [None]:
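def load_and_chunk(file_path, chunk_size=100):
    # Read the whole document into memory
    with open(file_path, "r") as file:
        text = file.read()

    # Slice the text into consecutive chunks of chunk_size characters
    result = []
    for i in range(0, len(text), chunk_size):
        result.append(text[i:i + chunk_size])

    return result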

chunks = load_and_chunk("messi.txt")
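
Fixed-size character chunks can cut a sentence, or even a word, in half, so a fact that straddles a boundary may not be fully present in any single chunk. A common mitigation is to let consecutive chunks overlap. The sketch below is one way to do that; the function name `chunk_with_overlap` and the `overlap` parameter are illustrative assumptions, not part of the original notebook.

```python
def chunk_with_overlap(text, chunk_size=100, overlap=20):
    """Split text into chunks of chunk_size characters, each sharing
    `overlap` characters with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    # Each new chunk starts (chunk_size - overlap) characters after the last one
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij"
print(chunk_with_overlap(sample, chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Overlap increases the number of chunks, and therefore the number of embeddings to compute, so the value is a trade-off rather than a fixed recommendation.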

## 4. Generating Embeddings for Document Chunks

Using OpenAI’s Embedding API, we create embeddings for each chunk in `messi.txt`. These embeddings represent each chunk as numerical vectors, which we later use to find the most relevant chunks based on similarity to the question.

In [None]:
def get_embedding(text):
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    return response.data[0].embedding

embeddings_dict = {}
for chunk in chunks:
    embeddings_dict[chunk] = get_embedding(chunk)
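
The loop above makes one API request per chunk. The embeddings endpoint also accepts a list of strings as `input`, so several chunks can be embedded in a single request. The following is a minimal sketch under that assumption; the function name `embed_in_batches` and the batch size of 64 are arbitrary choices for illustration.

```python
def embed_in_batches(client, texts, model="text-embedding-3-small", batch_size=64):
    """Embed texts with one API request per batch instead of one per text."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(input=batch, model=model)
        # Sort by index so each embedding lines up with its input text
        ordered = sorted(response.data, key=lambda item: item.index)
        embeddings.extend(item.embedding for item in ordered)
    return embeddings

# Usage (assumes `client` and `chunks` from the cells above):
# embeddings_dict = dict(zip(chunks, embed_in_batches(client, chunks)))
```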

## 5. Calculating Cosine Similarity and Retrieving Relevant Chunks

We generate an embedding for the question and then calculate cosine similarity between the question's embedding and each chunk's embedding. This allows us to identify the top 5 most relevant chunks, which we will include in the prompt for the LLM.

In [None]:
def cosine_similarity(vec1, vec2):
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def find_most_relevant_chunks(question, embeddings_dict, top_n=5):
    question_embedding = get_embedding(question)
    similarities = []

    for chunk, emb in embeddings_dict.items():
        similarity = cosine_similarity(question_embedding, emb)
        similarities.append((chunk, similarity))

    similarities.sort(key=lambda x: x[1], reverse=True)

    top_chunks = []
    for chunk, _ in similarities[:top_n]:
        top_chunks.append(chunk)

    return top_chunks

question = "What achievements has Messi had with the Argentina national team?"
top_chunks = find_most_relevant_chunks(question, embeddings_dict)
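
To make the ranking step concrete without calling the API, the sketch below applies the same cosine-similarity idea to toy 2-dimensional vectors, scoring all chunks at once with a matrix product instead of a Python loop. The function name `rank_by_cosine` and the toy vectors are assumptions made for illustration; real embeddings have many more dimensions.

```python
import numpy as np

def rank_by_cosine(query_vec, chunk_matrix, top_n=5):
    """Return the indices and scores of the top_n rows of chunk_matrix
    that are most similar to query_vec."""
    query = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(chunk_matrix, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms
    scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    # argsort is ascending, so negate the scores to rank from most to least similar
    order = np.argsort(-scores)[:top_n]
    return order.tolist(), scores[order].tolist()

toy_chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
indices, scores = rank_by_cosine([1.0, 0.0], toy_chunks, top_n=2)
print(indices)  # [0, 2]
```

The first toy chunk points in the same direction as the query (similarity 1), the third is at 45 degrees (about 0.707), and the second is orthogonal (0), so it is left out of the top 2.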

## 6. Creating the Prompt for the Language Model

We combine the question with the top relevant chunks to form a single prompt. This prompt will provide the LLM with both the question and the context necessary to generate an accurate answer.

In [None]:
def create_prompt(question, top_chunks):
    prompt = f"Question: {question}\n\nContext:\n" + "\n".join(top_chunks)
    return prompt

prompt = create_prompt(question, top_chunks)
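
The prompt above passes the context to the model but does not tell it how to use that context. A possible variant, sketched below, numbers each chunk and adds an explicit instruction to answer only from the supplied text. The function name `create_grounded_prompt` and the instruction wording are assumptions, not the original notebook's method.

```python
def create_grounded_prompt(question, top_chunks):
    """Build a prompt that numbers each chunk and asks the model
    to stay within the supplied context."""
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(top_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )

# Usage (assumes `question` and `top_chunks` from the cells above):
# prompt = create_grounded_prompt(question, top_chunks)
```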

## 7. Querying the Language Model for an Answer

Using the prompt, we query one of OpenAI’s chat models (here, `gpt-4`) to generate an answer based on the question and the contextual chunks from the document.

In [None]:
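def get_answer_from_llm(prompt):
    # The v1 client exposes chat completions under client.chat.completions
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    # The response is an object, so use attribute access rather than dict keys
    return response.choices[0].message.content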

answer = get_answer_from_llm(prompt)
print(answer)

## Conclusion

We used OpenAI's API to retrieve relevant passages from a document and generate answers to specific questions that are grounded in those passages. By chunking the document, generating embeddings, and calculating cosine similarity, we identified the sections of the text most similar to the question. These sections supplied the context for the prompt that was then sent to the language model. Note that the notebook does not measure answer quality, so the accuracy of the responses still needs to be checked against the source document.