<a href="https://colab.research.google.com/github/solomontessema/Generative-AI-with-Python/blob/main/notebooks/Introduction_to_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<table>
  <tr>
    <td><img src="https://ionnova.com/img/ionnova_logo_name_2.png" width="120px"></td>
    <td><h1>Day 12: RAG (Retrieval-Augmented Generation)</h1></td>
  </tr>
</table>

## 🔍 What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that improves language model responses by retrieving information from an external source, often called a **knowledge base**, and injecting it into the prompt as context. Grounding the model in this context lets it give more accurate, domain-specific answers than it could from its training data alone.

In this section, we demonstrate a simple RAG setup that uses a plain text file (`knowledge_base.txt`) as the knowledge source. When a user submits a question, the system combines it with the contents of the file and sends the result to a large language model (LLM). This first version does no real retrieval: every chunk of the file is included in every prompt.

### 🧪 Workflow Overview:
- Load knowledge base from `knowledge_base.txt`
- Append user query to the knowledge base
- Pass the combined context to GPT
- Return a contextualized response
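
The prompt-assembly step in the workflow above can be sketched as a small pure function. This is a minimal sketch, assuming chunks are separated by blank lines as in the notebook; the function name `build_rag_prompt` and the sample text are hypothetical.

```python
def build_rag_prompt(kb_text: str, question: str, separator: str = "\n---\n") -> str:
    """Split a knowledge base on blank lines and prepend it to the question."""
    # Split on blank lines, strip surrounding whitespace, and drop empty chunks
    chunks = [c.strip() for c in kb_text.split("\n\n") if c.strip()]
    # Join every chunk: static RAG sends the whole knowledge base each time
    context = separator.join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical two-chunk knowledge base
kb = "RAG adds retrieved text to the prompt.\n\nEmbeddings map text to vectors.\n\n"
print(build_rag_prompt(kb, "What does RAG add to the prompt?"))
```

Keeping this step free of API calls makes it easy to inspect exactly what the model will receive.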

> This foundational approach sets the stage for more advanced RAG systems using semantic search and vector databases.


In [None]:
from openai import OpenAI
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()
API_KEY = os.getenv("GPT_API_KEY")
client = OpenAI(api_key=API_KEY)

# Load knowledge base (simple text chunks for demo)
def load_knowledge_base(path="knowledge_base.txt"):
    with open(path, "r", encoding="utf-8") as f:
        # Chunks are assumed to be separated by blank lines; drop empty ones
        chunks = [c.strip() for c in f.read().split("\n\n") if c.strip()]
    return chunks


def chat_with_rag():
    print("Welcome to IonnovaBot! Type 'exit' to quit.\n")
    messages = [{"role": "system", "content": "You are a helpful assistant that uses external knowledge to answer questions."}]
    kb_chunks = load_knowledge_base()

    # "Retrieve" context: this static version simply uses every chunk
    context = "\n---\n".join(kb_chunks)

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            print("IonnovaBot: Goodbye!")
            break

        # Inject context into the current prompt only, so the full knowledge
        # base is not repeated in the conversation history on every turn
        prompt = f"Context:\n{context}\n\nQuestion: {user_input}"
        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages + [{"role": "user", "content": prompt}],
            temperature=0.7
        )

        reply = response.choices[0].message.content
        print(f"IonnovaBot: {reply}")
        # Keep only the plain question and answer in the history
        messages.append({"role": "user", "content": user_input})
        messages.append({"role": "assistant", "content": reply})

chat_with_rag()


## 🔁 Retrieval-Augmented Generation with Pinecone

This section builds on the previous text-based RAG demo by introducing a vector database (Pinecone) for semantic search and scalable context retrieval. Instead of injecting the entire knowledge base into the prompt, we now embed each chunk using OpenAI's embedding model and store it in Pinecone for efficient similarity-based querying.
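
Semantic search ranks chunks by how close their embeddings are to the query embedding. With `metric="cosine"`, as in the index created in the code below, the score is the cosine similarity. The following brute-force sketch shows the idea using made-up 3-dimensional vectors in place of real 1536-dimensional embeddings; the helper names are hypothetical, and a vector database such as Pinecone relies on specialized indexes so it does not have to score every stored vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ada-002 vectors have 1536 dimensions
store = {
    "chunk-0": [1.0, 0.0, 0.0],
    "chunk-1": [0.7, 0.7, 0.0],
    "chunk-2": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], store, k=2))
```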

### 🔍 What’s New:
- Embeds knowledge base chunks using `text-embedding-ada-002`
- Stores vectors in Pinecone with namespace partitioning
- Retrieves top-k relevant chunks based on user query embeddings
- Injects only the most relevant context into GPT-4 prompts
- Scales to knowledge bases too large to fit in a single prompt, and namespaces can keep different tenants' or domains' data separate
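
Namespace partitioning and top-k retrieval can be illustrated with a hypothetical in-memory index. `ToyNamespacedIndex` is not part of the Pinecone SDK, and the namespace `"acme"` and the empty-string default namespace are assumptions of this sketch; it only mimics the idea that upserts and queries are scoped to a namespace, so a query against a different namespace cannot see the stored vectors.

```python
import math
from collections import defaultdict


class ToyNamespacedIndex:
    """Hypothetical in-memory stand-in for a vector index with namespaces.

    Not the Pinecone client: it only mimics namespace-scoped upsert and query.
    """

    def __init__(self):
        # namespace -> {vector id -> (values, metadata)}
        self._data = defaultdict(dict)

    def upsert(self, vectors, namespace=""):
        for vid, values, metadata in vectors:
            # Re-using an id overwrites the old vector, which is how updates work here
            self._data[namespace][vid] = (values, metadata)

    def query(self, vector, top_k=5, namespace=""):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

        # Only vectors stored in the requested namespace are searched
        scored = [
            (vid, cosine(vector, values), meta)
            for vid, (values, meta) in self._data[namespace].items()
        ]
        scored.sort(key=lambda t: t[1], reverse=True)
        return scored[:top_k]


idx = ToyNamespacedIndex()
idx.upsert([("chunk-0", [1.0, 0.0], {"text": "about pricing"})], namespace="ionnova")
idx.upsert([("chunk-0", [0.0, 1.0], {"text": "other tenant"})], namespace="acme")

print(idx.query([1.0, 0.0], top_k=1, namespace="ionnova"))  # searches "ionnova" only
print(idx.query([1.0, 0.0], top_k=1))  # default namespace holds nothing
```

The same ID `chunk-0` lives in both namespaces without conflict, which is what lets one index serve several tenants or domains.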

### 🧰 Benefits Over Static RAG:
- More focused responses, because the model sees only the most relevant chunks
- Reduced token usage and prompt clutter
- Modular architecture for dynamic knowledge injection
- Supports real-time updates and multi-domain retrieval

> With this upgrade, IonnovaBot can answer questions over knowledge bases that are too large to paste into a single prompt, and the knowledge base can grow simply by upserting new vectors.


In [None]:
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()
OPENAI_API_KEY = os.getenv("GPT_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
client = OpenAI(api_key=OPENAI_API_KEY)

# Initialize Pinecone
pc = Pinecone(api_key=PINECONE_API_KEY)
index_name = "ionnova-rag"
if index_name not in pc.list_indexes().names():  # compare against index names, not index objects
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )
index = pc.Index(index_name)

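# Load and embed knowledge base
def load_and_embed_kb(path="knowledge_base.txt", namespace="ionnova"):
    with open(path, "r", encoding="utf-8") as f:
        # Drop empty chunks so blank text is never sent to the embeddings API
        chunks = [c.strip() for c in f.read().split("\n\n") if c.strip()]
    # Embed all chunks in one request (split into batches for a large knowledge base)
    response = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    embeddings = [item.embedding for item in sorted(response.data, key=lambda d: d.index)]
    # Fixed IDs make re-running safe: existing vectors with the same ID are overwritten
    vectors = [(f"chunk-{i}", emb, {"text": chunk}) for i, (chunk, emb) in enumerate(zip(chunks, embeddings))]
    index.upsert(vectors=vectors, namespace=namespace)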

# Retrieve top-k relevant chunks
def retrieve_context(query, top_k=5):
    query_embedding = client.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    ).data[0].embedding
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True, namespace="ionnova")
    return [match["metadata"]["text"] for match in results["matches"]]

# Chat loop
def chat_with_rag():
    print("Welcome to IonnovaBot! Type 'exit' to quit.\n")
    messages = [{"role": "system", "content": "You are a helpful assistant that uses external knowledge to answer questions."}]

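    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            print("IonnovaBot: Goodbye!")
            break

        # Retrieve only the chunks most similar to this question
        context_chunks = retrieve_context(user_input)
        context = "\n---\n".join(context_chunks)

        # Send retrieved context with the current question only; the history keeps
        # plain questions and answers so prompts do not grow with old context
        prompt = f"Context:\n{context}\n\nQuestion: {user_input}"
        response = client.chat.completions.create(
            model="gpt-4",
            messages=messages + [{"role": "user", "content": prompt}],
            temperature=0.7
        )

        reply = response.choices[0].message.content
        print(f"IonnovaBot: {reply}")
        messages.append({"role": "user", "content": user_input})
        messages.append({"role": "assistant", "content": reply})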

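# Populate Pinecone (safe to re-run: fixed chunk IDs overwrite existing vectors;
# newly upserted vectors may take a few seconds to become queryable)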
load_and_embed_kb()

chat_with_rag()
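
Both the OpenAI embeddings endpoint and Pinecone limit how much data a single request may carry, so a large knowledge base is usually embedded and upserted in batches. A minimal sketch of a batching helper follows; the name `batched` and the batch size of 3 are illustrative assumptions, so check each service's documentation for its current limits.

```python
from typing import Iterator


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical chunk texts standing in for the knowledge base
chunks = [f"chunk text {i}" for i in range(7)]
for batch in batched(chunks, 3):
    # In the notebook, each batch could become one embeddings request
    # (client.embeddings.create(model=..., input=batch)) and one index.upsert call
    print(len(batch))
```

On Python 3.12 and later, `itertools.batched` provides the same behavior, yielding tuples instead of lists.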


In [None]:
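# Must match the model used to populate the index (text-embedding-ada-002):
# vectors from different embedding models are not comparable, even at the same dimension
EMBED_MODEL = "text-embedding-ada-002"  # 1536-dim
EMBED_DIM = 1536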

def embed(text: str):
    """Return a single 1536-dim embedding vector for the given text."""
    resp = client.embeddings.create(model=EMBED_MODEL, input=[text])
    return resp.data[0].embedding


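query_text = "Who is Ionnova?"
query_vec = embed(query_text)

def run_query(index, vector, top_k=5, with_metadata=True, namespace="ionnova"):
    # Query the namespace the knowledge base was upserted into; omitting it
    # searches the default namespace, which holds none of these vectors
    return index.query(
        vector=vector,
        top_k=top_k,
        include_metadata=with_metadata,
        namespace=namespace
    )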

res_cosine = run_query(index, query_vec)
print(res_cosine)
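
The raw query response lists matches with an ID, a similarity score, and the stored metadata. With the cosine metric a higher score means a closer match, so one option is to discard weak matches before building the prompt. This is a minimal sketch: the helper `extract_texts`, the sample matches, and the 0.8 cutoff are hypothetical, and it assumes dict-style access to match fields, the same style the notebook already uses for `match["metadata"]["text"]`.

```python
def extract_texts(matches, min_score=0.0):
    """Return (id, score, text) for matches whose score is at least min_score."""
    results = []
    for match in matches:
        # Assumes each match exposes "id", "score" and "metadata" keys
        if match["score"] >= min_score:
            results.append((match["id"], match["score"], match["metadata"]["text"]))
    return results


# Hypothetical matches shaped like a cosine-metric query response
sample = [
    {"id": "chunk-3", "score": 0.91, "metadata": {"text": "relevant passage"}},
    {"id": "chunk-8", "score": 0.62, "metadata": {"text": "weakly related passage"}},
]
print(extract_texts(sample, min_score=0.8))
```

With a real response, the call would look like `extract_texts(res_cosine["matches"], min_score=0.8)`; a sensible cutoff depends on the embedding model and data.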