<a href="https://colab.research.google.com/drive/1aaU4YZC-fswSImo1fV-w67FXPQg5Ictm?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

### 📊 What are Vector Embeddings?

A vector embedding represents a word, phrase, or longer text as a numerical vector in a multi-dimensional space. Texts with similar meanings map to nearby vectors, which lets a model capture meanings and relationships between words.
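
To make the idea of "nearby vectors" concrete, here is a minimal sketch with hand-made 3-dimensional vectors. The numbers are invented for illustration (real embedding models output hundreds of dimensions), and `cosine` is a hypothetical helper written for this example, not a library function.

```python
import math

# Toy 3-dimensional "embeddings"; the values are made up for illustration only.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_cat_kitten = cosine(toy_vectors["cat"], toy_vectors["kitten"])
sim_cat_car = cosine(toy_vectors["cat"], toy_vectors["car"])

# Related words point in almost the same direction, unrelated words do not.
print(f"cat vs kitten: {sim_cat_kitten:.3f}")
print(f"cat vs car   : {sim_cat_car:.3f}")
```

In this toy setup, "cat" and "kitten" score much higher than "cat" and "car", which is the property that similarity search relies on.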

![Vector Embedding](https://qdrant.tech/articles_data/what-are-embeddings/BERT-model.jpg)
Source: [Qdrant Blog](https://qdrant.tech/articles/what-are-embeddings/)



#### Embedding Models • Vector Stores • Vector Embeddings (Guide) → [PDF](https://github.com/genieincodebottle/generative-ai/blob/main/docs/vector-embeddings-guide.pdf)

## Install required libraries

In [2]:
!pip install -qU \
     langchain \
     langchain-chroma \
     langchain-community \
     langchain-google-genai \
     langchain-text-splitters \
     einops

## Import related libraries

In [None]:
import os
import getpass
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sklearn.metrics.pairwise import cosine_similarity

### Embedding Model Decision Flow

<br>

![MTEB Arena](https://raw.githubusercontent.com/genieincodebottle/generative-ai/main/images/embedding.png)

## A. Google's Embedding Model

### Provide Google API Key.

The same key works for both the Gemini LLM and the Google embedding model. You can create a Google API key at the following link:

- [Google API Key](https://aistudio.google.com/apikey)




In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

In [4]:
os.environ["GOOGLE_API_KEY"] = getpass.getpass()



### 🔓 Free-Tier: Google Gemini Embeddings  

**Model:** `text-embedding-004`  

**Pros:** Free for experimentation, multilingual (100+), up to 2,048 input tokens, 768-dimensional output by default, high quality, easy Google Cloud integration.  

**Cons:** Usage limits, closed-source, vendor lock-in, limited customization, infra dependency, possible service changes.  

🔗 [Docs](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#generative-ai-get-text-embedding-python_vertex_ai_sdk)  


# Implementing Basic RAG with Key Components

1. **Chroma:** Vector store for efficient similarity search  
2. **Embedding Model:** Google’s text-embedding model  
3. **ChatGoogleGenerativeAI:** Gemini LLM for response generation  
4. **Cosine Similarity:** For evaluating query–response–context relevance  

## Step 1: RAG in Action with Evaluation  

This implementation demonstrates the core workflow of a Basic RAG system:  

1. Chunking and embedding source documents  
2. Retrieving relevant documents via similarity search  
3. Generating responses using the retrieved context  
4. Evaluating response quality using similarity scores  

🔗 References:  
- [LangChain Chunking Strategies](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)  
- [LangChain Vectorstores](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/)  
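
The chunking step can be sketched with a plain sliding window over characters. This is a simplification under stated assumptions: `chunk_text` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it does not try to break at paragraph, line, or word boundaries. It only shows how `chunk_size` and `chunk_overlap` interact.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for piece in toy_chunks:
    print(piece)
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one neighbor. Larger overlaps reduce the chance of losing context at the cost of more chunks to embed.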


In [12]:
# Step 1: Initialize the Gemini language model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.3  # Adjust temperature or other parameters as needed
)

# Step 2: Load documents from a web URL
url = "https://en.wikipedia.org/wiki/Artificial_intelligence"
loader = WebBaseLoader(url)
data = loader.load()

# Step 3: Split text into chunks
# (Experiment with chunk_size and chunk_overlap for optimal results)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = text_splitter.split_documents(data)

# Add unique IDs to each text chunk
for idx, chunk in enumerate(chunks):
    chunk.metadata["id"] = idx

# Step 4: Get embedding model
gemini_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004"
)

# Step 5: Create vector store using embeddings
vectorstore = Chroma.from_documents(chunks, gemini_embeddings)

# Step 6: Define query
query = "What are the main applications of artificial intelligence in healthcare?"

# Step 7: Retrieve relevant documents
docs = vectorstore.similarity_search(query, k=5)
context = "\n\n".join([doc.page_content for doc in docs])
retrieval_method = "Basic similarity search"

# Step 8: Generate response
prompt = f"{context}\n\nQuestion: {query}\nAnswer:"
final_response = llm.invoke(prompt).content

# Step 9: Print results
print(f"Query: {query}")
print("=========================")
print(f"Final Answer: {final_response}")
print("=========================")
print(f"Retrieval Method: {retrieval_method}")

## Step 2: RAG Evaluation  

1. Generate embeddings for **query**, **response**, and **context**  
2. Measure **cosine similarity** between query–response and response–context  
3. Derive an **overall relevance score** as the average of these similarities  
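
As a worked example of this scoring, the sketch below uses tiny 2-dimensional vectors in place of real embeddings. The vectors and the helper name `cosine` are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up 2-dimensional stand-ins for the three embeddings.
query_vec = [1.0, 0.0]
response_vec = [1.0, 1.0]
context_vec = [2.0, 2.0]

# The query and response are 45 degrees apart, so the similarity is cos(45°).
query_response = cosine(query_vec, response_vec)
# The response and context point in the same direction, so the similarity is 1.
response_context = cosine(response_vec, context_vec)

# Overall relevance is the plain average of the two similarities.
relevance = (query_response + response_context) / 2
print(f"{query_response:.4f} {response_context:.4f} {relevance:.4f}")
```

Note that cosine similarity ignores vector length, which is why `context_vec` scores 1.0 against `response_vec` despite being twice as long. The resulting score reflects semantic closeness only and says nothing about whether the answer is factually correct.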


In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Step 10: Define evaluation function
def evaluate_response(query, embeddings, response, context):
    """
    Evaluate the relevance of the model's response by comparing embeddings.

    - Computes embeddings for query, response, and context
    - Calculates cosine similarities
    - Returns an average relevance score
    """
    # Compute embeddings
    query_embedding = embeddings.embed_query(query)
    response_embedding = embeddings.embed_query(response)
    context_embedding = embeddings.embed_query(context)

    # Compute cosine similarities
    query_response_similarity = cosine_similarity(
        [query_embedding], [response_embedding]
    )[0][0]

    response_context_similarity = cosine_similarity(
        [response_embedding], [context_embedding]
    )[0][0]

    # Compute overall relevance score (average)
    relevance_score = (
        query_response_similarity + response_context_similarity
    ) / 2

    return {
        "query_response_similarity": query_response_similarity,
        "response_context_similarity": response_context_similarity,
        "relevance_score": relevance_score,
    }

# Step 11: Evaluate the response
evaluation = evaluate_response(query, gemini_embeddings, final_response, context)

# Step 12: Print evaluation results
print("\nEvaluation Results")
print("=========================")
print(f"Query-Response Similarity   : {evaluation['query_response_similarity']:.4f}")
print(f"Response-Context Similarity : {evaluation['response_context_similarity']:.4f}")
print(f"Overall Relevance Score     : {evaluation['relevance_score']:.4f}")

# Similarly, OpenAI and Hugging Face embedding models can be used as follows
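
Swapping providers works because LangChain embedding classes share two methods, `embed_query` and `embed_documents`. The sketch below shows code written only against those two methods. `FakeEmbeddings` and `rank_by_similarity` are hypothetical names invented here so the example runs without an API key; the keyword-count "embedding" is a toy, not a real model.

```python
import math

class FakeEmbeddings:
    """Hypothetical stand-in with the same two methods as LangChain embedding classes."""

    def _embed(self, text):
        # Toy embedding: counts of three keywords. NOT a real embedding model.
        words = text.lower().split()
        return [float(words.count(k)) for k in ("health", "robot", "music")]

    def embed_documents(self, texts):
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        return self._embed(text)

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def rank_by_similarity(embeddings, query, texts):
    """Return (score, text) pairs sorted from most to least similar to the query."""
    query_vec = embeddings.embed_query(query)
    doc_vecs = embeddings.embed_documents(texts)
    scored = [(cosine(query_vec, vec), text) for vec, text in zip(doc_vecs, texts)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

texts = [
    "robot arms assemble cars",
    "health records help doctors",
    "music streaming is popular",
]
ranked = rank_by_similarity(FakeEmbeddings(), "AI in health care", texts)
print(ranked[0])
```

With valid credentials, `gemini_embeddings`, `openai_embeddings`, or `nomic_embeddings` from this notebook could be passed in place of `FakeEmbeddings()` without changing `rank_by_similarity`.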

## B. OpenAI Embedding Model

### Provide OpenAI API Key.

To use OpenAI embedding models, create an OpenAI API key at the following link:

- [OpenAI API Key](https://platform.openai.com/settings/organization/api-keys)

In [None]:
!pip install -qU langchain-openai

In [None]:
os.environ["OPENAI_API_KEY"] = getpass.getpass()

### 💰 Paid: OpenAI Embedding Models  

**Models:** `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`  

**Pros:** High-quality embeddings, multiple model sizes, seamless API integration, batch processing for cost efficiency, regularly updated, versatile across NLP tasks.  

**Cons:** Paid (costs can scale), closed-source, requires API key + internet, limited customization, data privacy considerations, subject to OpenAI policies.  

🔗 [Docs](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)  


In [None]:
from langchain_openai import OpenAIEmbeddings

openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

## C. Hugging Face Embedding Model

### Provide Hugging Face API Key.

To use Hugging Face embedding models, create a Hugging Face access token at the following link:

- [Hugging Face API Key](https://huggingface.co/settings/tokens)




In [None]:
!pip install -qU sentence-transformers \
                 langchain-huggingface

In [None]:
os.environ["HF_TOKEN"] = getpass.getpass()

### 🔓 Free: Hugging Face Open-Source Embeddings  

**Models:** gte-large-en-v1.5, bge-multilingual-gemma2, snowflake-arctic-embed-l, nomic-embed-text-v1.5, e5-mistral-7b-instruct, etc. → [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)  

**Pros:** Open-source, customizable, community-backed, Hugging Face integration, supports fine-tuning, broad NLP use cases.  

**Cons:** May underperform vs. commercial models, variable quality, limited support, high compute needs, community-dependent.  


### Hugging Face: Nomic AI Embedding Model  

You can choose from various Hugging Face open-source embedding models depending on your use case, performance needs, and system constraints. Model rankings and benchmarks are available on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).  

**Popular Models:**  
1. `nomic-ai/nomic-embed-text-v1.5`  
2. `nomic-ai/nomic-embed-text-v1`  
3. `sentence-transformers/all-MiniLM-L12-v2`  
4. `sentence-transformers/all-MiniLM-L6-v2`  


In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

# Change model_name to your chosen Hugging Face embedding model.
# trust_remote_code=True runs code from the model repo; use only with trusted repos.
nomic_embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1.5",
    model_kwargs={"trust_remote_code": True},
)
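
One detail worth checking for this model: the `nomic-embed-text` model card describes task prefixes such as `search_document: ` for stored texts and `search_query: ` for queries. The sketch below assumes the prefixes must be added by the caller; the helper names are hypothetical, and whether your wrapper already adds them should be verified against the model card and the `langchain-huggingface` documentation.

```python
# Prefix strings as described on the nomic-embed-text model card (assumption: added by the caller).
DOCUMENT_PREFIX = "search_document: "
QUERY_PREFIX = "search_query: "

def with_document_prefix(texts):
    """Prepend the document task prefix to each text before embedding it for storage."""
    return [DOCUMENT_PREFIX + text for text in texts]

def with_query_prefix(query):
    """Prepend the query task prefix before embedding a search query."""
    return QUERY_PREFIX + query

prefixed_docs = with_document_prefix(["AI is used in medical imaging."])
prefixed_query = with_query_prefix("How is AI used in healthcare?")
print(prefixed_docs[0])
print(prefixed_query)
```

The prefixed strings would then be passed to `nomic_embeddings.embed_documents(...)` and `nomic_embeddings.embed_query(...)` respectively. Other models in the list above, such as the `all-MiniLM` family, do not use these prefixes.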