<a href="https://colab.research.google.com/github/eliezerkapish/RAG-Example-1/blob/main/RAG_Example_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG Example 1 by Scoras Academy (https://github.com/Scoras-Academy/RAG)

In [1]:
!pip install transformers faiss-cpu torch scikit-learn

Collecting faiss-cpu
  Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.5 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.9.0


In [3]:
from transformers import AutoTokenizer, AutoModel, AutoModelForSeq2SeqLM
import torch
import faiss

# Example data
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "Python is a high-level programming language.",
]

# Load Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")

# Embed function to generate document embeddings
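def embed(texts):
    # An explicit max_length avoids the "no maximum length is provided" truncation warning
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state
    # Mean-pool over real tokens only, so padding does not skew the average
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return embeddings.numpy()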

# Generate embeddings for the documents
document_embeddings = embed(documents)

# Create a FAISS index
index = faiss.IndexFlatL2(document_embeddings.shape[1])
index.add(document_embeddings)

# Load Generator Model
generator_tokenizer = AutoTokenizer.from_pretrained("t5-small")
generator_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Search function for question answering
def rag(question, top_k=1):
    # Embed the question
    question_embedding = embed([question])
    # Retrieve the most similar documents
    distances, indices = index.search(question_embedding, top_k)
    retrieved_docs = [documents[idx] for idx in indices[0]]
    # Concatenate retrieved documents with the question
    input_text = " ".join(retrieved_docs) + " " + question
    inputs = generator_tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)
    # Generate an answer
    outputs = generator_model.generate(inputs, max_length=150, num_beams=4, early_stopping=True)
    answer = generator_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return answer

# Example usage
question = "What is the high-level programming language?"
print(rag(question))


Python is a high-level programming language.
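
The `embed` function in the cell above turns each text into one vector by averaging its token vectors. When several texts are padded to a common length, a common refinement is to average only over real tokens, using the attention mask. The following is a minimal, dependency-free sketch of that idea in plain Python; the helper name `masked_mean_pool` and the toy values are illustrative assumptions, not part of the notebook.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1; ignore padding (mask 0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Toy input (hypothetical values): two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]

pooled = masked_mean_pool(tokens, mask)
# Plain mean over all positions, padding included, for comparison.
naive = [sum(column) / len(tokens) for column in zip(*tokens)]

print(pooled)  # [2.0, 3.0]
print(naive)   # the padding row pulls this average toward zero
```

With a single input text there is no padding, so both averages coincide; the difference only shows up when texts of different lengths are batched together.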


## Code Explanation


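1. Document embeddings
We use an embedding model (`sentence-transformers/bert-base-nli-mean-tokens`) to convert each document into a numerical vector by averaging its token embeddings.

2. FAISS index
We build a flat L2 index (`faiss.IndexFlatL2`) over the document vectors. It performs an exact, exhaustive search for the vectors closest to a query.

3. Generator model
We load a text generation model (`t5-small`) together with its tokenizer.

4. RAG function
The `rag` function performs the following steps:

   - Embeds the question and retrieves the `top_k` most similar documents from the index.
   - Concatenates the retrieved documents with the question.
   - Generates the answer from that combined text using the generator model.

5. Usage example
We ask which programming language is high-level, and the pipeline prints `Python is a high-level programming language.`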
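
To make the index step concrete, here is a minimal sketch of what an exhaustive L2 search computes: the query is compared against every stored vector and the closest ones are returned. It is written in plain Python for illustration and is not FAISS code; the function name `l2_search` and the 2-D vectors are assumptions. Like `faiss.IndexFlatL2`, it reports squared Euclidean distances.

```python
def l2_search(vectors, query, top_k=1):
    """Return (distances, indices) of the top_k vectors closest to query.

    Distances are squared Euclidean distances, sorted in ascending order.
    """
    scored = []
    for idx, vector in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vector, query))
        scored.append((dist, idx))
    scored.sort()
    best = scored[:top_k]
    return [d for d, _ in best], [i for _, i in best]


# Hypothetical 2-D "embeddings" standing in for real document vectors.
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
toy_distances, toy_indices = l2_search(toy_vectors, query=[1.0, 2.0], top_k=2)

print(toy_indices)    # [1, 0]
print(toy_distances)  # [1.0, 5.0]
```

The returned indices play the same role as `indices[0]` in the `rag` function: they are positions in the original `documents` list.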

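The three steps of the RAG function (retrieve, combine, generate) can also be sketched independently of any particular model. In the sketch below, `rag_pipeline`, `keyword_retrieve`, and `echo_generate` are hypothetical stand-ins chosen so the example runs without downloading anything: the retriever ranks documents by shared words instead of embeddings, and the generator simply returns its prompt.

```python
def rag_pipeline(question, retrieve, generate, top_k=1):
    """Run the three RAG steps with caller-supplied retrieve/generate functions."""
    retrieved_docs = retrieve(question, top_k)          # 1. retrieve
    prompt = " ".join(retrieved_docs) + " " + question  # 2. combine
    return generate(prompt)                             # 3. generate


corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "Python is a high-level programming language.",
]


def keyword_retrieve(question, top_k):
    """Toy retriever: rank documents by the number of words shared with the question."""
    words = set(question.lower().replace("?", "").split())

    def overlap(doc):
        return len(words & set(doc.lower().replace(".", "").split()))

    return sorted(corpus, key=overlap, reverse=True)[:top_k]


def echo_generate(prompt):
    """Placeholder generator: returns the prompt unchanged."""
    return prompt


result = rag_pipeline(
    "What is the high-level programming language?",
    keyword_retrieve,
    echo_generate,
)
print(result)
# Python is a high-level programming language. What is the high-level programming language?
```

Passing the retriever and generator as arguments keeps the control flow visible; in the notebook those roles are filled by the FAISS index lookup and the T5 model.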
