# 🔗 **LangChain Models**

---

![author](https://img.shields.io/badge/author-mohd--faizy-red)

---

## **⭐01 - LLMs**

### 1. ChatGPT

```python
from langchain_openai import OpenAI     # Import OpenAI LLM wrapper
from dotenv import load_dotenv          # Load environment variables

load_dotenv()                           # Load variables from .env (e.g., API key)

llm = OpenAI(model='gpt-3.5-turbo-instruct')         # Initialize LLM with instruct-style model
result = llm.invoke("What is the capital of India")  # Run prompt through model

print(result)
```

### [2. Google Gemini](https://aistudio.google.com/apikey)

```python
# Code using Google Gemini
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

load_dotenv()                           # Load variables from .env (e.g., GOOGLE_API_KEY)

model = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    max_output_tokens=50,
    temperature=0.7
)

result = model.invoke("Hi who are you")

print(result.content)
```

### [3. Groq](https://console.groq.com/playground?model=llama-3.1-8b-instant)

```python
# 🚀 Code using Groq (Mixtral / LLaMA via LangChain)
from langchain_groq import ChatGroq
from dotenv import load_dotenv

load_dotenv()

model = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0.7
)

result = model.invoke("Tell me a fun fact about black holes.")

print(result.content)
```

### 4. **Ollama**

- Start `ollama` locally.
- Open `cmd`, run `ollama list`, and choose a model.

```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1:1.5b")
response = llm.invoke("What are black holes?")

print(response.content)
```

## **⭐02 - ChatModels**

### 🔷 **a) OpenAI**

🧪 **Example Usage:**

```python
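from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.7,
    max_tokens=300,
    streaming=True,
    api_key="your-api-key"   # Placeholder; prefer loading the key from the environment
)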
```

| Feature                | Type               | Description                                                                |
| ---------------------- | ------------------ | -------------------------------------------------------------------------- |
| `model` / `model_name` | `str`              | Name of the OpenAI model (e.g., `"gpt-3.5-turbo"`, `"gpt-4"`, `"gpt-4o"`). |
| `temperature`          | `float`            | Controls randomness in output (0 = deterministic, 2 = creative).           |
| `max_tokens`           | `int`              | Limits the length of the generated response.                               |
| `top_p`                | `float`            | Controls diversity via nucleus sampling (used with temperature).           |
| `frequency_penalty`    | `float`            | Penalizes frequent tokens (to reduce repetition).                          |
| `presence_penalty`     | `float`            | Encourages discussing new topics.                                          |
| `n`                    | `int`              | Number of completions to generate (default: 1).                            |
| `stop`                 | `str` or `list`    | Stop generation when these tokens are found.                               |
| `streaming`            | `bool`             | If `True`, enables streaming responses.                                    |
| `api_key`              | `str`              | OpenAI API key.                                                            |
| `organization`         | `str`              | OpenAI organization ID (optional).                                         |
| `timeout`              | `float`            | Timeout for the request in seconds.                                        |
| `max_retries`          | `int`              | Number of retry attempts if request fails.                                 |
| `request_timeout`      | `float` or `tuple` | Timeout for requests (alias for `timeout`).                                |
| `http_client`          | `httpx.Client`     | Use a custom HTTP client (advanced).                                       |
| `callbacks`            | `list`             | LangChain callback functions for observability/logging.                    |
| `verbose`              | `bool`             | Print logs for debugging.                                                  |
| `cache`                | `bool`             | Enables caching of responses.                                              |


```python
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()                            # Load variables from .env (e.g., OPENAI_API_KEY)

# Initialize ChatOpenAI with custom settings
model = ChatOpenAI(
    model='gpt-4',                       # Use GPT-4 model
    temperature=1.5,                     # High creativity in output
    max_completion_tokens=10             # Limit response length (approx.)
)

# Send a prompt and get the response
result = model.invoke("Write a 5 line poem on cricket")

print(result.content)
```

### 🔷 **b) Anthropic**

```python
from langchain_anthropic import ChatAnthropic
from dotenv import load_dotenv

load_dotenv()

model = ChatAnthropic(model='claude-3-5-sonnet-20241022')

result = model.invoke('What is the capital of India')

print(result.content)
```

### 🔷 **c) Google**

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

load_dotenv()

model = ChatGoogleGenerativeAI(model='gemini-1.5-flash')

result = model.invoke('What is the capital of India')

print(result.content)
```

### 🔷 **d) Hugging Face Using API**

- **Why `HuggingFaceEndpoint`?**
    - This allows you to use models hosted on Hugging Face’s cloud (Inference API).
    - You don't need to download the model or run it locally.
    - Hugging Face handles the heavy compute (GPU/TPU), scaling, and updates.
    - Recommended when your hardware is limited or when you need quick access to large models (e.g., 7B+).

```python
from langchain_huggingface import HuggingFaceEndpoint
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv
import os

load_dotenv()
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

# Use a text-generation model rather than a conversational one
llm = HuggingFaceEndpoint(
    repo_id="google/flan-t5-large",  # Text-to-text generation model
    huggingfacehub_api_token=hf_token,
)

prompt = PromptTemplate.from_template("Answer: {question}")
chain = prompt | llm

result = chain.invoke({"question": "What is the capital of France?"})
print(result)
```

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from dotenv import load_dotenv

load_dotenv()

llm = HuggingFaceEndpoint(
    repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation"
)

model = ChatHuggingFace(llm=llm)

result = model.invoke("What is the capital of France?")

print(result.content)
```

### 🔷 **e) Hugging Face Local**

- **Why `HuggingFacePipeline`?**
    - This runs the model locally using your system's CPU or GPU.
    - You need enough resources (RAM, VRAM, etc.) to load and run the model.
    - Recommended when:
        - You want offline access.
        - You're working with smaller models (e.g., TinyLlama 1B) that fit on your hardware.
        - You want full control over the inference process.

```python
import os

# Set the cache location before importing Hugging Face libraries so it takes effect
os.environ['HF_HOME'] = 'D:/huggingface_cache'

from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    task='text-generation',
    pipeline_kwargs=dict(
        temperature=0.5,
        max_new_tokens=100
    )
)
model = ChatHuggingFace(llm=llm)

result = model.invoke("What is the capital of France?")

print(result.content)
```

### **Disadvantages of running models locally**



| **Disadvantage**                 | **Details**                                                                                                   |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| **High Hardware Requirements**   | Running large models (e.g., LLaMA-2 70B) requires **expensive GPUs**.                                         |
| **Setup Complexity**             | You need to install and configure many dependencies like **PyTorch**, **CUDA**, and **transformers**.         |
| **Lack of RLHF**                 | Many open-source models aren’t **fine-tuned with human feedback (RLHF)**, so they struggle with instructions. |
| **Limited Multimodal Abilities** | Open models generally don’t support **images, audio, or video**, unlike models like **GPT-4**.               |


In contrast, using the **Hugging Face Inference API** offloads the hardware and complexity to the cloud, though it may come with usage costs.



## **⭐03 - Embedding Models**

### 🔍 What is **Semantic Search**?

**Semantic Search** is a search technique that understands the **meaning** (semantics) of a query, rather than just matching exact **keywords**.

---

### 🧠 Traditional Search vs Semantic Search

| Type               | How it works                    | Example                                                                     |
| ------------------ | ------------------------------- | --------------------------------------------------------------------------- |
| 🔤 Keyword Search  | Matches exact words             | Searching “apple” finds documents containing the word "apple" only          |
| 🧠 Semantic Search | Understands meaning and context | Searching “fruit company in California” may return results about Apple Inc. |

---

### 📦 How It Works (Simplified):

1. **Embed** all documents using a model like `text-embedding-3-small`.
2. **Embed** the user’s query using the same model.
3. **Compare** the query and document embeddings using **cosine similarity** (how "close" their meanings are).
4. **Return** the most similar results, even if they don't contain the exact words.
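
Steps 3 and 4 above can be sketched without calling any embedding API. The vectors below are hypothetical 3-dimensional stand-ins invented for illustration (real embeddings have hundreds or thousands of values), and `doc_a`, `doc_b`, and `doc_c` are placeholder names, so treat this as a rough sketch of the comparison step rather than a full pipeline.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "embeddings", made up for illustration only.
doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Step 3: score every document against the query.
# Step 4: sort best-first so the closest meaning comes out on top.
ranked = sorted(
    ((name, cosine_similarity(query_vector, vec)) for name, vec in doc_vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

best_name, best_score = ranked[0]
print(best_name, round(best_score, 3))
```

In this toy setup the query points in nearly the same direction as `doc_a`, so that document ranks first.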

---

### 🧪 Example

```python
query = "Capital city of France"
documents = ["Paris is beautiful", "Berlin is in Germany", "Rome is ancient"]

# Semantic Search will likely return: "Paris is beautiful"
# Even though it doesn't contain the words "capital" or "France"
```
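
To see why keyword matching fails on this example, here is a minimal word-overlap sketch. The `keyword_overlap` helper is a hypothetical name introduced for illustration and is not part of LangChain.

```python
query = "Capital city of France"
documents = ["Paris is beautiful", "Berlin is in Germany", "Rome is ancient"]

def keyword_overlap(query, document):
    """Count the words two strings share, ignoring case."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)

# No document shares a single word with the query,
# so a pure keyword search has nothing to rank on.
overlaps = [keyword_overlap(query, doc) for doc in documents]
print(overlaps)  # → [0, 0, 0]
```

Because every overlap is zero, only a meaning-based comparison (embeddings plus cosine similarity) can surface "Paris is beautiful".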

---

### 📍 Use Cases

* 🔎 Search engines (Google, YouTube, etc.)
* 📚 Question-answering systems
* 🤖 RAG (Retrieval-Augmented Generation)
* 🛍️ E-commerce product search
* 📄 Document similarity and clustering

---





### 🧠 **OpenAI Embedding Models**

| Model Name               | Dimensions | Released | Notes                                                                        |
| ------------------------ | ---------- | -------- | ---------------------------------------------------------------------------- |
| `text-embedding-3-small` | 1536       | Jan 2024 | Smaller, cheaper, fast — good for general tasks                              |
| `text-embedding-3-large` | 3072       | Jan 2024 | More accurate, better quality — recommended for high-quality semantic search |
| `text-embedding-ada-002` | 1536       | 2022     | Legacy model — widely used, still effective                                  |

---

### 📏 **What are Dimensions?**

* **Dimensions** refer to the **length of the embedding vector** (e.g., `[0.012, -0.034, ..., 0.921]` of size 1536).
* Higher dimensions → more expressive power, better semantic precision (but larger in size).
* You can **reduce dimensions** using the `dimensions` parameter (e.g., `dimensions=256`) for efficiency, but it may reduce quality.
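
The size trade-off can be estimated with simple arithmetic. This sketch assumes each value is stored as a 4-byte `float32` and ignores any index overhead. The one-million-vector corpus and the `index_size_gb` helper are hypothetical choices for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32

def index_size_gb(num_vectors, dimensions):
    """Raw storage for the vectors alone, in decimal gigabytes."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9

# Compare a reduced size (256) with the two default sizes from the table above.
sizes = {dims: index_size_gb(1_000_000, dims) for dims in (256, 1536, 3072)}

for dims, size in sizes.items():
    print(f"{dims} dimensions: {size:.3f} GB")
```

Under these assumptions, one million 3072-dimensional vectors take about 12.3 GB, while the same corpus at 256 dimensions takes about 1 GB.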

---

### ✅ **Examples in LangChain**

```python
from langchain_openai import OpenAIEmbeddings

# Use the latest small model (default dimensions = 1536)
OpenAIEmbeddings(model="text-embedding-3-small")

# Use large model with reduced dimensions (e.g., 256 instead of 3072)
OpenAIEmbeddings(model="text-embedding-3-large", dimensions=256)

# Legacy model (1536 dimensions)
OpenAIEmbeddings(model="text-embedding-ada-002")
```

---

### 💡 Recommendation

* Use `text-embedding-3-small` for cost-effective tasks.
* Use `text-embedding-3-large` for best accuracy (e.g., RAG, vector search).
* Avoid `ada-002` unless you're maintaining older systems.




### **🟢`GoogleGenerativeAIEmbeddings`**

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("GOOGLE_API_KEY")

# Request a reduced output dimensionality (e.g., 32).
# Whether this is honored may depend on the model and package version,
# so check the printed length below.
embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=api_key,
    output_dimensionality=32
)

result = embedding.embed_query("Delhi is the capital of India")
print(result)
print(f"Embedding length: {len(result)}")
```

```
[0.039407067000865936, -0.018922094255685806, -0.0009071125532500446, ...  # (output truncated)
```

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.getenv("GOOGLE_API_KEY")

# Create the Gemini embedding model
embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=api_key
)

# List of documents
documents = [
    "Delhi is the capital of India",
    "Kolkata is the capital of West Bengal",
    "Paris is the capital of France"
]

# Generate embeddings for documents
vector = embedding.embed_documents(documents)
print(vector)

# Output the vectors
for i, vec in enumerate(vector):
    print(f"Embedding for doc {i+1}: {vec[:5]}... ({len(vec)} dimensions)")
```

```
[[0.04069492220878601, -0.011983578093349934, -0.0547618567943573, ...  # (output truncated)
```

### **🟢`OpenAIEmbeddings`**

🔷 **a) Embedding an OpenAI Query**

```python
from langchain_openai import OpenAIEmbeddings  # Import OpenAI embedding model wrapper
from dotenv import load_dotenv                 # Load environment variables

load_dotenv()                                  # Load variables from .env (e.g., API key)

# Initialize embedding model with reduced dimensions (default is 3072 for 'text-embedding-3-large')
embedding = OpenAIEmbeddings(model='text-embedding-3-large', dimensions=32)

# Generate vector embedding for the given text
result = embedding.embed_query("Delhi is the capital of India")

# Print the embedding vector as a string
print(str(result))
```

```
[-0.0231, 0.0456, ..., 0.0123]  # (total: 32 values)

```

🔷 **b) Embedding OpenAI Documents**

```python
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv

load_dotenv()                                  # Load variables from .env (e.g., API key)

# Initialize embedding model with reduced dimensions
embedding = OpenAIEmbeddings(model='text-embedding-3-large', dimensions=32)

# List of input texts to embed
documents = [
    "Delhi is the capital of India",
    "Kolkata is the capital of West Bengal",
    "Paris is the capital of France"
]

# Generate embeddings for the list of documents
result = embedding.embed_documents(documents)

print(str(result))
```

```
[
  [0.0123, -0.0456, ..., 0.0031],   # Embedding for "Delhi is the capital of India"
  [0.0098, -0.0382, ..., 0.0147],   # Embedding for "Kolkata is the capital of West Bengal"
  [0.0235, -0.0512, ..., 0.0194]    # Embedding for "Paris is the capital of France"
]

```

### **🟢`HuggingFaceEmbeddings`**

**✅ Popular Pretrained Sentence Embedding Models**

Here are **commonly used and recommended** models for sentence embeddings via `HuggingFaceEmbeddings`:

| Model Name                                      | Description                       | Embedding Size |
| ----------------------------------------------- | --------------------------------- | -------------- |
| `sentence-transformers/all-MiniLM-L6-v2`        | Fast, small, well-balanced        | 384            |
| `sentence-transformers/all-MiniLM-L12-v2`       | Slightly larger version of MiniLM | 384            |
| `sentence-transformers/paraphrase-MiniLM-L6-v2` | Good for semantic similarity      | 384            |
| `sentence-transformers/all-mpnet-base-v2`       | Higher performance, more accurate | 768            |
| `intfloat/e5-small-v2`                          | Trained on diverse tasks, small   | 384            |
| `intfloat/e5-base-v2`                           | Strong general-purpose embeddings | 768            |
| `BAAI/bge-small-en-v1.5`                        | Popular in RAG apps, small        | 384            |
| `BAAI/bge-base-en-v1.5`                         | Better quality embeddings         | 768            |
| `BAAI/bge-large-en-v1.5`                        | High-performance embeddings       | 1024           |

---


**⚠️ Requirements**

Make sure you have the required packages:

```bash
pip install langchain-huggingface sentence-transformers
```

For newer models such as `bge-*` or `e5-*`, also install:

```bash
pip install transformers accelerate
```


```python
from langchain_huggingface import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

documents = [
    "Delhi is the capital of India",
    "Kolkata is the capital of West Bengal",
    "Paris is the capital of France"
]

vector = embedding.embed_documents(documents)

print(str(vector))

# Output the vectors
for i, vec in enumerate(vector):
    print(f"Embedding for doc {i+1}: {vec[:5]}... ({len(vec)} dimensions)")
```

```
[[0.0435495562851429, 0.023877227678894997, -0.04524127393960953, ...  # (output truncated)
```

## 📟**Document Similarity**

### ⭕`OpenAIEmbeddings`⭕

```python
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

load_dotenv()

embedding = OpenAIEmbeddings(model='text-embedding-3-large', dimensions=300)

documents = [
    "Virat Kohli is an Indian cricketer known for his aggressive batting and leadership.",
    "MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.",
    "Sachin Tendulkar, also known as the 'God of Cricket', holds many batting records.",
    "Rohit Sharma is known for his elegant batting and record-breaking double centuries.",
    "Jasprit Bumrah is an Indian fast bowler known for his unorthodox action and yorkers."
]

query = 'tell me about bumrah'

doc_embeddings = embedding.embed_documents(documents)
query_embedding = embedding.embed_query(query)

scores = cosine_similarity([query_embedding], doc_embeddings)[0]

index, score = sorted(list(enumerate(scores)),key=lambda x:x[1])[-1]

print(query)
print(documents[index])
print("similarity score is:", score)
```

### ⭕`HuggingFaceEmbeddings`⭕

```python
from langchain_huggingface import HuggingFaceEmbeddings
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Initialize embedding model
embedding = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

# List of documents
documents = [
    "Virat Kohli is an Indian cricketer known for his aggressive batting and leadership.",
    "Sachin Tendulkar, also known as the 'God of Cricket', holds many batting records.",
    "MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.",
    "Rohit Sharma is known for his elegant batting and record-breaking double centuries.",
    "Jasprit Bumrah is an Indian fast bowler known for his unorthodox action and yorkers."
]

# query = 'Tell me about bumrah?'
# query = 'Tell me about Dhoni?'
query = 'Tell me about former Indian captain'

doc_embeddings = embedding.embed_documents(documents)
query_embedding = embedding.embed_query(query)

# cosine_similarity expects two 2D inputs, so wrap the query embedding in a list.
# The result has shape (1, num_docs); [0] extracts that row, giving shape (num_docs,).
scores = cosine_similarity([query_embedding], doc_embeddings)[0]

# Sort the (index, score) pairs by score and take the last one, i.e., the highest score
index, score = sorted(list(enumerate(scores)), key=lambda x: x[1])[-1]

print(query)
print(scores)
print(documents[index])
print(index, score)
print(sorted(list(enumerate(scores)), key=lambda x: x[1]))
print("similarity score is:", score)
```

```
Tell me about former Indian captain
[0.48681862 0.50402896 0.6239113  0.5084582  0.4019441 ]
MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.
2 0.6239113024142589
[(4, np.float64(0.40194410439811434)), (0, np.float64(0.4868186179600195)), (1, np.float64(0.5040289616215774)), (3, np.float64(0.5084582037805135)), (2, np.float64(0.6239113024142589))]
similarity score is: 0.6239113024142589
```
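
The example above returns only the single best match. Retrieval setups such as RAG often want the top few documents instead. Here is a minimal sketch that reuses the rounded scores printed above, so it needs no model download. The `top_k` helper is a hypothetical name introduced for illustration.

```python
documents = [
    "Virat Kohli is an Indian cricketer known for his aggressive batting and leadership.",
    "Sachin Tendulkar, also known as the 'God of Cricket', holds many batting records.",
    "MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.",
    "Rohit Sharma is known for his elegant batting and record-breaking double centuries.",
    "Jasprit Bumrah is an Indian fast bowler known for his unorthodox action and yorkers."
]

# Scores copied from the printed output above (rounded as displayed there)
scores = [0.48681862, 0.50402896, 0.6239113, 0.5084582, 0.4019441]

def top_k(scores, k):
    """Return the k highest (index, score) pairs, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

top_two = top_k(scores, 2)
for index, score in top_two:
    print(f"{score:.4f}  {documents[index]}")
```

With these scores, the MS Dhoni document ranks first and the Rohit Sharma document second.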


### ⭕`GoogleGenerativeAIEmbeddings`⭕

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity
from dotenv import load_dotenv
import numpy as np
import os

# Load your .env file to get GOOGLE_API_KEY
load_dotenv()
api_key = os.getenv("GOOGLE_API_KEY")

# Initialize the embedding model
embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=api_key,
    output_dimensionality=768  # Optional; requested embedding size
)

# List of documents
documents = [
    "Virat Kohli is an Indian cricketer known for his aggressive batting and leadership.",
    "Sachin Tendulkar, also known as the 'God of Cricket', holds many batting records.",
    "MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.",
    "Rohit Sharma is known for his elegant batting and record-breaking double centuries.",
    "Jasprit Bumrah is an Indian fast bowler known for his unorthodox action and yorkers."
]

# Query
query = 'Tell me about former Indian captain'

# Generate embeddings
doc_embeddings = embedding.embed_documents(documents)
query_embedding = embedding.embed_query(query)

# Compute cosine similarity
scores = cosine_similarity([query_embedding], doc_embeddings)[0]  # shape: (num_docs,)

# Find the most similar document
index, score = sorted(list(enumerate(scores)), key=lambda x: x[1])[-1]

# Print results
print("Query:", query)
print("Cosine Similarity Scores:", scores)
print("Most Relevant Document:", documents[index])
print("Index:", index)
print("Similarity Score:", score)
print("Sorted Scores:", sorted(list(enumerate(scores)), key=lambda x: x[1]))
```

```
Query: Tell me about former Indian captain
Cosine Similarity Scores: [0.63876666 0.63108307 0.68281189 0.59895082 0.60484759]
Most Relevant Document: MS Dhoni is a former Indian captain famous for his calm demeanor and finishing skills.
Index: 2
Similarity Score: 0.6828118908852889
Sorted Scores: [(3, np.float64(0.5989508248226574)), (4, np.float64(0.6048475852066184)), (1, np.float64(0.6310830681778788)), (0, np.float64(0.638766662443817)), (2, np.float64(0.6828118908852889))]
```
