from dotenv import load_dotenv
load_dotenv()
import pprint

# 🧭 Vector Stores

Vector stores are specialized databases for **semantic search**. They index embeddings (numerical representations of your text) so you can retrieve relevant content based on **meaning**, not just keywords.

---

## 🧠 Why Use Vector Stores?

- 📚 Search unstructured data like text, images, audio  
- 🔍 Find relevant content using **semantic similarity**  
- 🧩 Great for building intelligent search and RAG apps

![Vector Stores](assets/vectorstores-2540b4bc355b966c99b0f02cfdddb273.png "Vector store")

---

## 🔌 Integrations

LangChain supports many vector store backends. A **standard interface** lets you swap providers with minimal code changes.

🔗 [See full list of integrations](https://python.langchain.com/docs/integrations/vectorstores/)

---

## 🛠️ Core Interface Methods

- `add_documents`: Add data to the vector store  
- `delete`: Remove documents  
- `similarity_search`: Retrieve documents similar to a query

🛠️ Initialization:
```python
from langchain_core.vectorstores import InMemoryVectorStore

# Initialize with an embedding model.
# SomeEmbeddingModel is a placeholder: substitute the embeddings class you actually use.
vector_store = InMemoryVectorStore(embedding=SomeEmbeddingModel())
```


🛠️ Add Documents:
```python
from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

documents = [document_1, document_2]

# Pass explicit ids so the documents can be referenced (and deleted) later
vector_store.add_documents(documents=documents, ids=["doc1", "doc2"])
```


🛠️ Delete Documents:
```python
vector_store.delete(ids=["doc1"])
```


🛠️ Search:
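```python
# Return the single document closest in meaning to the query
results = vector_store.similarity_search("What did I eat for breakfast?", k=1)
```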

---

## 📏 How Similarity Works

Embeddings are points in a high-dimensional space, so comparing two texts becomes a geometric comparison between their vectors. Common metrics:

- 📐 Cosine Similarity  
- 📍 Euclidean Distance  
- 🎯 Dot Product

The available metrics depend on the vector store, so check its docs to see which ones it supports.
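
To make the three metrics concrete, here is a minimal pure-Python sketch. The function names and the three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector store computes these scores internally.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; sensitive to both direction and length."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two points; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

# Hypothetical embeddings, chosen by hand for this example
pancakes = [1.0, 0.0, 0.0]
waffles = [2.0, 0.0, 0.0]   # same direction as pancakes, twice the length
weather = [0.0, 3.0, 0.0]   # orthogonal to pancakes

print(cosine_similarity(pancakes, waffles))
print(euclidean_distance(pancakes, waffles))
print(dot_product(pancakes, waffles))
print(cosine_similarity(pancakes, weather))
```

Here `pancakes` and `waffles` point in the same direction, so their cosine similarity is 1.0 even though their Euclidean distance is 1.0 and their dot product is 2.0. The orthogonal `weather` vector scores 0.0 on cosine similarity. This is why the choice of metric matters when vectors are not normalized to the same length.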

---

## 🧮 Under the Hood: Similarity Search

Search happens in two steps:

1. Embed the query  
2. Compare it with document embeddings to find matches

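Comparing the query against every stored vector gets slow as the collection grows, so many vector stores speed up step 2 with approximate nearest-neighbor indexes such as **HNSW** (Hierarchical Navigable Small World).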
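
The two steps above can be sketched as a brute-force search in plain Python. This is an illustration under loud assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model (it only matches exact words, not meaning), and the linear scan is the naive baseline that indexes like HNSW are designed to avoid.

```python
import math

def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding model: word counts per vocabulary term."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def brute_force_search(query, texts, vocabulary, k=1):
    # Step 1: embed the query
    query_vector = toy_embed(query, vocabulary)
    # Step 2: score every document against the query and keep the top k
    scored = [
        (cosine_similarity(query_vector, toy_embed(text, vocabulary)), text)
        for text in texts
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

vocabulary = ["pancakes", "eggs", "breakfast", "weather", "forecast", "cloudy"]
texts = [
    "pancakes and eggs for breakfast",
    "the weather forecast is cloudy",
]

top = brute_force_search("what is the weather forecast", texts, vocabulary, k=1)
print(top)
```

The weather document wins because it shares the terms "weather" and "forecast" with the query, while the breakfast document shares none of the vocabulary terms.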

🛠️ Similarity Search:
```python
query = "my query"
docs = vector_store.similarity_search(query)
```

`similarity_search` accepts `k` (the number of documents to return) and, in many vector stores, `filter` (a metadata filter).

---

## 🧾 Metadata Filtering

Filter results using metadata like source, date, or tags.

🛠️ Metadata Filtering:
```python
vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
```
🔎 Combining semantic search with structured filters improves precision. Filter syntax varies by vector store, so check your backend's docs.

🔗 [See Pinecone metadata filtering docs](#)
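
Conceptually, a metadata filter restricts which documents are eligible before they are ranked by similarity. The sketch below shows that idea in plain Python. The record layout, the helper names, and the hand-written two-dimensional vectors are all hypothetical, and whether a real vector store filters before or after the vector search depends on the backend.

```python
def matches_filter(metadata, metadata_filter):
    """True when every key in the filter is present in the metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )

def filtered_search(query_vector, records, metadata_filter, k=2):
    # Keep only the records whose metadata satisfies the filter
    candidates = [r for r in records if matches_filter(r["metadata"], metadata_filter)]
    # Rank the survivors by dot product with the query vector
    candidates.sort(
        key=lambda r: sum(q * v for q, v in zip(query_vector, r["vector"])),
        reverse=True,
    )
    return candidates[:k]

# Hypothetical records with hand-written embeddings
records = [
    {"text": "pancakes for breakfast", "vector": [0.9, 0.1], "metadata": {"source": "tweet"}},
    {"text": "cloudy tomorrow", "vector": [0.1, 0.9], "metadata": {"source": "news"}},
    {"text": "eggs this morning", "vector": [0.8, 0.3], "metadata": {"source": "tweet"}},
]

results = filtered_search([0.0, 1.0], records, {"source": "tweet"}, k=2)
print([r["text"] for r in results])
```

The news record has the highest raw similarity to the query vector, but the filter excludes it, so only the two tweets are returned.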

---

## 🚀 Advanced Search Techniques

| **Technique**                    | **Use When**                          | **What It Does**                                   |
|----------------------------------|---------------------------------------|----------------------------------------------------|
| Hybrid Search                    | You want **keyword + semantic** match | Merges keyword and vector search for better recall |
| Maximal Marginal Relevance (MMR) | You want **diverse results**          | Reduces redundancy by reranking search results     |

🔗 [How-to guide on hybrid search](https://python.langchain.com/docs/how_to/hybrid/)
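
To show how MMR trades relevance against redundancy, here is a minimal sketch of the greedy selection loop. The function name `mmr`, the `lambda_mult` weighting parameter, and the toy vectors are assumptions for illustration, not a specific vector store's implementation.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr(query_vector, doc_vectors, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, balancing relevance and diversity.

    lambda_mult=1.0 ranks by relevance only; lower values penalize
    documents that resemble ones already selected.
    """
    selected = []
    remaining = list(range(len(doc_vectors)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine_similarity(query_vector, doc_vectors[i])
            # Similarity to the closest already-selected document (0.0 if none yet)
            redundancy = max(
                (cosine_similarity(doc_vectors[i], doc_vectors[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical 2-dimensional embeddings
query = [1.0, 0.0]
docs = [
    [1.0, 0.1],    # 0: highly relevant
    [1.0, 0.12],   # 1: near-duplicate of document 0
    [0.7, -0.7],   # 2: less relevant, but different from document 0
]

print(mmr(query, docs, k=2, lambda_mult=0.5))
print(mmr(query, docs, k=2, lambda_mult=1.0))
```

With `lambda_mult=0.5`, the second pick skips the near-duplicate and selects document 2 instead. With `lambda_mult=1.0`, the redundancy penalty vanishes and the two most relevant documents (0 and 1) are returned.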
