<a href="https://colab.research.google.com/drive/1aaU4YZC-fswSImo1fV-w67FXPQg5Ictm?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

### 📊 What Are Vector Embeddings?

A vector embedding represents a word, phrase, or longer text as a numerical vector in a multi-dimensional space. Texts with similar meanings map to nearby vectors, which lets a model compare meanings and relationships between words numerically.

![Vector Embedding](https://qdrant.tech/articles_data/what-are-embeddings/BERT-model.jpg)
Source: [Qdrant Blog](https://qdrant.tech/articles/what-are-embeddings/)
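
To make "nearby vectors" concrete, here is a minimal sketch that compares hand-made 3-dimensional vectors with cosine similarity. The vectors and the `toy_vectors` name are illustrative assumptions, not output from a real embedding model, which would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made "embeddings" (illustrative values only, not model output)
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])

print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

With these values, the "cat" and "dog" vectors point in almost the same direction and score higher than "cat" and "car".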






## 📝 Step 1: Learn the Basics  
👉 **Embedding Models • Vector Stores • Vector Embeddings (Guide)**: [Read the PDF](https://github.com/genieincodebottle/generative-ai/blob/main/docs/vector-embeddings-guide.pdf)  

## 📦 Step 2: Install & Import required libraries

If you hit a library installation error, simply re-run the next cell; this usually resolves it.

In [1]:
# Install required libraries:
# - langchain: Core framework for building LLM-based apps (RAG, agents, chains, etc.)
# - langchain-chroma: Integration with Chroma vector store
# - langchain-community: Community-contributed loaders, tools, and integrations
# - langchain-google-genai: LangChain integration package to use Google’s Gemini LLMs and Embedding models
# - einops: Tensor operations library (used in some embedding/LLM models)

!pip install -qU \
     langchain \
     langchain-chroma \
     langchain-community \
     langchain-google-genai \
     einops


In [2]:
import os   # For environment variable handling and file system operations
import getpass   # For securely entering API keys or passwords in notebooks

from langchain_chroma import Chroma   # Vector store for efficient similarity search (from the langchain-chroma package installed above)
from langchain_community.document_loaders import WebBaseLoader   # Loader to fetch and load documents from web URLs
from langchain_text_splitters import RecursiveCharacterTextSplitter   # Splits large text into smaller overlapping chunks
from langchain.prompts import PromptTemplate   # Helps create reusable prompt templates
from langchain.chains import RetrievalQA   # Chain to perform RAG (retrieval + generation) workflow
from sklearn.metrics.pairwise import cosine_similarity   # For evaluating embeddings via cosine similarity



## 🧪 Step 3: Experiment with Google's Embedding Model (Free-tier)

**Embedding Models:**
- `text-embedding-004`: Recommended here because it has better free-tier availability.
- `gemini-embedding-001`: State-of-the-art performance across English, multilingual, and code tasks. It unifies previously specialized models such as `text-embedding-005` and `text-multilingual-embedding-002` and achieves better performance in their respective domains.  

🔗 [Docs](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#generative-ai-get-text-embedding-python_vertex_ai_sdk)  

🔗 [Research Paper - Gemini Embedding](https://arxiv.org/abs/2503.07891)








In [3]:
# ChatGoogleGenerativeAI → Wrapper to use Google Gemini LLM for chat/QA tasks
# GoogleGenerativeAIEmbeddings → Wrapper to use Google’s embedding models for vector representations
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings


## 🔑 Step 4: Generate Google API Key  

- The same API key works for both **Gemini LLM** and **Google Embedding Models**.  
- You can create your key here:  

  - [Get Google API Key](https://aistudio.google.com/apikey)  

- Once generated, **paste your API key** in the next step.  


In [4]:
os.environ["GOOGLE_API_KEY"] = getpass.getpass()



# Step 5: Implementing Basic RAG with Key Components

This implementation demonstrates the core workflow of a Basic RAG system:  

1. Chunking and embedding source documents  
2. Retrieving relevant documents via similarity search  
3. Generating responses using the retrieved context  
4. Evaluating response quality using similarity scores  

🛠️ **Tech Stack**:

- **Chroma:** Vector store for efficient similarity search  
- **Embedding Model:** Google’s text-embedding model  
- **ChatGoogleGenerativeAI:** Gemini LLM for response generation  
- **Cosine Similarity:** For evaluating query–response–context relevance  

🔗 **References**:  
- [LangChain Chunking Strategies](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/)  
- [LangChain Vectorstores](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/)  
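
To illustrate what `chunk_size` and `chunk_overlap` control, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplification: the `chunk_text` helper is hypothetical, and `RecursiveCharacterTextSplitter` additionally prefers to split on separators such as paragraph and line breaks, so its chunks will differ.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Synthetic 1200-character text so the boundaries are easy to inspect
sample_text = "".join(str(i % 10) for i in range(1200))
sample_chunks = chunk_text(sample_text, chunk_size=500, chunk_overlap=50)

for i, chunk in enumerate(sample_chunks):
    print(f"chunk {i}: {len(chunk)} characters")
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.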


In [5]:
# Step 1: Initialize the Gemini language model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.3  # Adjust temperature or other parameters as needed
)

# Step 2: Load documents from a web URL
url = "https://en.wikipedia.org/wiki/Artificial_intelligence"
loader = WebBaseLoader(url)
data = loader.load()

# Step 3: Split text into chunks
# (Experiment with chunk_size and chunk_overlap for optimal results)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = text_splitter.split_documents(data)

# Add unique IDs to each text chunk
for idx, chunk in enumerate(chunks):
    chunk.metadata["id"] = idx

# Step 4: Get embedding model
# Options:
# 1. text-embedding-004  (better free-tier availability)
# 2. Stable: gemini-embedding-001
# 3. Experimental: gemini-embedding-exp-03-07
gemini_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004"
)

# Step 5: Create vector store using embeddings
vectorstore = Chroma.from_documents(chunks, gemini_embeddings)

# Step 6: Define query
query = "What are the main applications of artificial intelligence in healthcare?"

# Step 7: Retrieve relevant documents
docs = vectorstore.similarity_search(query, k=5)
context = "\n\n".join([doc.page_content for doc in docs])
retrieval_method = "Basic similarity search"

# Step 8: Generate response
prompt = f"{context}\n\nQuestion: {query}\nAnswer:"
final_response = llm.invoke(prompt).content

# Step 9: Print results
print(f"Query: {query}")
print("=========================")
print(f"Final Answer: {final_response}")
print("=========================")
print(f"Retrieval Method: {retrieval_method}")

Query: What are the main applications of artificial intelligence in healthcare?
Final Answer: Based on the provided text, the main applications of artificial intelligence in healthcare are:

*   **Processing and integrating big data:** This is particularly important for organoid and tissue engineering development, which rely heavily on microscopy imaging.
*   **Overcoming discrepancies in funding:** AI may help to address imbalances in funding allocation across different research fields.
*   **Deepening the understanding of biomedically relevant pathways:** AI tools like AlphaFold 2 can provide new insights into these pathways.
*   **Increasing patient care and quality of life:** AI has the potential to improve healthcare outcomes.
*   **More accurately diagnosing and treating patients:** Medical professionals are ethically obligated to use AI if it leads to better diagnoses and treatments.
Retrieval Method: Basic similarity search
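
The prompt in the cell above simply places the retrieved context before the question. As a hedged variation, the sketch below wraps the same pieces in an explicit instruction to answer only from the context. The `build_rag_prompt` helper, its wording, and the sample chunks are assumptions for illustration; in the notebook the chunks would come from `doc.page_content`.

```python
def build_rag_prompt(context_chunks, query):
    """Join retrieved chunks and wrap them in an explicit grounding instruction."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Stand-in chunks for illustration only
sample_context = [
    "AI is used in medical imaging to help detect disease.",
    "AI tools such as AlphaFold 2 support biomedical research.",
]
prompt = build_rag_prompt(sample_context, "How is AI used in healthcare?")
print(prompt)
```

The resulting string could be passed to `llm.invoke(prompt)` in place of the f-string used above.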


# Step 6: RAG Evaluation  

1. Generate embeddings for **query**, **response**, and **context**  
2. Measure **cosine similarity** between query–response and response–context  
3. Derive an **overall relevance score** as the average of these similarities  
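
Written out, with $q$, $r$, and $c$ denoting the embedding vectors of the query, response, and context (symbols introduced here for illustration), the steps above amount to:

$$
\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert},
\qquad
\text{relevance} = \frac{\cos(q, r) + \cos(r, c)}{2}
$$

Each cosine lies in $[-1, 1]$, so the averaged score does too. A higher score indicates that the response sits close to both the query and the retrieved context in embedding space; it does not by itself verify factual correctness.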


In [6]:
from sklearn.metrics.pairwise import cosine_similarity

# Step 10: Define evaluation function
def evaluate_response(query, embeddings, response, context):
    """
    Evaluate the relevance of the model's response by comparing embeddings.

    - Computes embeddings for query, response, and context
    - Calculates cosine similarities
    - Returns an average relevance score
    """
    # Compute embeddings
    query_embedding = embeddings.embed_query(query)
    response_embedding = embeddings.embed_query(response)
    context_embedding = embeddings.embed_query(context)

    # Compute cosine similarities
    query_response_similarity = cosine_similarity(
        [query_embedding], [response_embedding]
    )[0][0]

    response_context_similarity = cosine_similarity(
        [response_embedding], [context_embedding]
    )[0][0]

    # Compute overall relevance score (average)
    relevance_score = (
        query_response_similarity + response_context_similarity
    ) / 2

    return {
        "query_response_similarity": query_response_similarity,
        "response_context_similarity": response_context_similarity,
        "relevance_score": relevance_score,
    }

# Step 11: Evaluate the response
evaluation = evaluate_response(query, gemini_embeddings, final_response, context)

# Step 12: Print evaluation results
print("\nEvaluation Results")
print("=========================")
print(f"Query-Response Similarity   : {evaluation['query_response_similarity']:.4f}")
print(f"Response-Context Similarity : {evaluation['response_context_similarity']:.4f}")
print(f"Overall Relevance Score     : {evaluation['relevance_score']:.4f}")


Evaluation Results
Query-Response Similarity   : 0.8698
Response-Context Similarity : 0.8791
Overall Relevance Score     : 0.8745


# Similarly, OpenAI and Hugging Face embedding models can be used as shown below

## B. OpenAI Embedding Model

### Provide OpenAI API Key

If you want to use OpenAI embeddings, create an OpenAI API key using the following link:

- [OpenAI API Key](https://platform.openai.com/settings/organization/api-keys)

In [None]:
!pip install -qU langchain-openai

In [None]:
os.environ["OPENAI_API_KEY"] = getpass.getpass()

### 💰 Paid: OpenAI Embedding Models  

**Models:** `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002` (ada v2)  

**Pros:** High-quality embeddings, multiple model sizes, seamless API integration, batch processing for cost efficiency, regularly updated, versatile across NLP tasks.  

**Cons:** Paid (costs can scale), closed-source, requires API key + internet, limited customization, data privacy considerations, subject to OpenAI policies.  

🔗 [Docs](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)  


In [None]:
from langchain_openai import OpenAIEmbeddings

openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

## C. Hugging Face Embedding Model

### Provide Hugging Face API Key

If you want to use Hugging Face embedding models, create a Hugging Face API key (access token) using the following link:

- [Hugging Face API Key](https://huggingface.co/settings/tokens)




In [None]:
!pip install -qU sentence-transformers \
                 langchain-huggingface

In [None]:
os.environ["HF_TOKEN"] = getpass.getpass()

### 🔓 Free: Hugging Face Open-Source Embeddings  

**Models:** gte-large-en-v1.5, bge-multilingual-gemma2, snowflake-arctic-embed-l, nomic-embed-text-v1.5, e5-mistral-7b-instruct, etc. → [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)  

**Pros:** Open-source, customizable, community-backed, Hugging Face integration, supports fine-tuning, broad NLP use cases.  
**Cons:** May underperform vs. commercial models, variable quality, limited support, high compute needs, community-dependent.


### Hugging Face: Nomic AI Embedding Model  

You can choose from various Hugging Face open-source embedding models depending on your use case, performance needs, and system constraints. Model rankings and benchmarks are available on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).  

**Popular Models:**  
1. `nomic-ai/nomic-embed-text-v1.5`  
2. `nomic-ai/nomic-embed-text-v1`  
3. `sentence-transformers/all-MiniLM-L12-v2`  
4. `sentence-transformers/all-MiniLM-L6-v2`  


In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

# Change model_name to your chosen Hugging Face embedding model
nomic_embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1.5",
    model_kwargs={"trust_remote_code": True},  # This model loads custom code from the Hub
)
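
Because the evaluation function earlier only calls `embed_query` on the embeddings object, the Google, OpenAI, and Hugging Face wrappers should be interchangeable there. The sketch below shows that idea with a hypothetical `ToyHashEmbeddings` stand-in that needs no API key. It hashes words into a fixed-size count vector, so its scores reflect word overlap only and are not comparable to a real embedding model's.

```python
import hashlib
import math


class ToyHashEmbeddings:
    """Hypothetical stand-in exposing embed_query / embed_documents like the wrappers above."""

    def __init__(self, dimensions=64):
        self.dimensions = dimensions

    def embed_query(self, text):
        # Count each lowercase token in a bucket chosen by a stable hash
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            vector[int(digest, 16) % self.dimensions] += 1.0
        return vector

    def embed_documents(self, texts):
        return [self.embed_query(text) for text in texts]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_response_similarity(embeddings, query, response):
    """Works with any object that provides embed_query(text) -> list of floats."""
    return cosine(embeddings.embed_query(query), embeddings.embed_query(response))


toy_embeddings = ToyHashEmbeddings()
score = query_response_similarity(
    toy_embeddings,
    "AI in healthcare",
    "AI in healthcare improves diagnosis",
)
print(f"Toy query-response similarity: {score:.4f}")
```

In the notebook, passing `openai_embeddings` or `nomic_embeddings` instead of `gemini_embeddings` to `evaluate_response` follows the same pattern, assuming the corresponding API key and packages are set up.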