<a href="https://colab.research.google.com/github/patdring/GenerativeAI/blob/main/RAG_System_Photosynthesis_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-Augmented Generation (RAG) Example

This notebook demonstrates a simple Retrieval-Augmented Generation (RAG) system in Python. The system pairs a retrieval module with a generative model to answer questions from a small database of documents.

## Steps:
1. **Document Preparation**: We create a small database of example documents about photosynthesis and vectorize them with TF-IDF.
2. **Retrieval Module**: We store the TF-IDF vectors in a FAISS flat L2 index and return the `k` documents closest to a given query.
3. **Generative Model**: We use GPT-2 to generate an answer from a prompt that contains the question and the retrieved documents.
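
To make the retrieval step concrete, here is a minimal pure-Python sketch of what an exhaustive ("flat") L2 search computes: it scores every stored vector against the query and keeps the `k` smallest squared distances. The names `squared_l2`, `flat_l2_search`, `toy_vectors`, and `toy_query` are hypothetical and exist only for this illustration; the toy vectors are hand-picked unit vectors, not real TF-IDF rows from this notebook.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def flat_l2_search(vectors, query, k):
    """Score every stored vector and return the k nearest.

    Returns (distances, indices), ordered by increasing squared L2 distance.
    """
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Hypothetical toy data: unit-length vectors standing in for TF-IDF rows.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
toy_query = [0.8, 0.6, 0.0]

distances, indices = flat_l2_search(toy_vectors, toy_query, k=2)
print(indices)    # vector 2 is closest, then vector 0
print(distances)  # squared distances, roughly 0.08 and 0.40
```

For unit-length vectors, the squared L2 distance equals `2 - 2 * cosine`, so ranking by L2 distance gives the same order as ranking by cosine similarity. `TfidfVectorizer` normalizes each row to unit L2 norm by default, which is why a plain L2 index is a reasonable choice here.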

### Instructions:
1. Run the code cells step by step to see how the RAG system works.
2. Modify the example query to test with different questions.

In [3]:
!pip install transformers faiss-cpu scikit-learn



In [4]:
import faiss
import numpy as np
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline
from sklearn.feature_extraction.text import TfidfVectorizer

# Example database of documents
documents = [
    "Photosynthesis is a process used by plants and other organisms to convert light energy into chemical energy.",
    "Photosynthesis in plants generally involves the green pigment chlorophyll and generates oxygen as a by-product.",
    "The process of photosynthesis occurs in the chloroplasts of plant cells, where sunlight is used to convert carbon dioxide and water into glucose and oxygen.",
    "Chlorophyll absorbs light most efficiently in the blue portion of the electromagnetic spectrum followed by the red portion.",
    "Photosynthesis is essential for life on Earth as it provides the primary source of energy for nearly all organisms."
]

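# Step 1: Vectorize documents with TF-IDF and index them in FAISS
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
index = faiss.IndexFlatL2(X.shape[1])
# FAISS works on float32 arrays; the TF-IDF matrix is float64 by default
index.add(X.toarray().astype(np.float32))

# Function to retrieve the k documents most similar to a query
def retrieve_documents(query, k=3):
    query_vec = vectorizer.transform([query]).toarray().astype(np.float32)
    # D holds squared L2 distances, I holds the matching document indices
    D, I = index.search(query_vec, k)
    return [documents[i] for i in I[0]]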

# Step 2: Load GPT-2 model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

# Step 3: Generate answer
def generate_answer(question, retrieved_docs):
    context = " ".join(retrieved_docs)
    input_text = f"Question: {question}\nContext: {context}\nAnswer:"
    # max_length counts the prompt tokens plus the newly generated tokens
    generated = generator(input_text, max_length=150, num_return_sequences=1)
    return generated[0]['generated_text']

# Example query
question = "How does photosynthesis work?"
retrieved_docs = retrieve_documents(question)
answer = generate_answer(question, retrieved_docs)

# Display results
print("Retrieved Documents:")
for doc in retrieved_docs:
    print("-", doc)
print("\nGenerated Answer:")
print(answer)


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Retrieved Documents:
- Photosynthesis in plants generally involves the green pigment chlorophyll and generates oxygen as a by-product.
- Photosynthesis is a process used by plants and other organisms to convert light energy into chemical energy.
- Photosynthesis is essential for life on Earth as it provides the primary source of energy for nearly all organisms.

Generated Answer:
Question: How does photosynthesis work?
Context: Photosynthesis in plants generally involves the green pigment chlorophyll and generates oxygen as a by-product. Photosynthesis is a process used by plants and other organisms to convert light energy into chemical energy. Photosynthesis is essential for life on Earth as it provides the primary source of energy for nearly all organisms.
Answer: Light is a source of life and energy, and photosynthesis is necessary to provide for the life to grow and maintain and provide food.
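
The generated text above repeats the whole prompt before the answer, because the pipeline returns the prompt followed by its continuation. A minimal sketch of one way to keep only the answer is to cut the prompt off the front of the returned string. The helpers `build_prompt` and `strip_prompt` are hypothetical names introduced for this example, and `simulated_output` is assembled from the run shown above rather than from a fresh model call.

```python
def build_prompt(question, retrieved_docs):
    """Assemble the same Question/Context/Answer prompt used in the notebook."""
    context = " ".join(retrieved_docs)
    return f"Question: {question}\nContext: {context}\nAnswer:"


def strip_prompt(prompt, generated_text):
    """Drop the echoed prompt so that only the continuation remains."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # The prompt was not echoed, so return the text unchanged apart from whitespace.
    return generated_text.strip()


question = "How does photosynthesis work?"
docs = [
    "Photosynthesis in plants generally involves the green pigment chlorophyll and generates oxygen as a by-product.",
    "Photosynthesis is a process used by plants and other organisms to convert light energy into chemical energy.",
]
prompt = build_prompt(question, docs)

# Stand-in for generator(...)[0]['generated_text'], reusing the answer printed above.
simulated_output = prompt + (
    " Light is a source of life and energy, and photosynthesis is necessary"
    " to provide for the life to grow and maintain and provide food."
)

answer_only = strip_prompt(prompt, simulated_output)
print(answer_only)
```

Slicing by prompt length is used here instead of splitting on `"Answer:"` because the model may emit that marker again in its continuation. The `text-generation` pipeline also accepts `return_full_text=False`, which should have a similar effect without a helper.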
