# **1️⃣ What Are Vector Databases?**
* A vector database is a specialized database designed to store and search vector embeddings efficiently. These embeddings are dense numerical representations of data, such as text, images, audio, or video, and are often generated by deep learning models.

* Unlike traditional relational databases, which store structured data in rows and columns and match on exact values, vector databases store data as vectors and rank results with a similarity or distance metric (e.g., cosine similarity, Euclidean distance, dot product) to find the most relevant items.
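To make the three metrics concrete, here is a minimal pure-Python sketch. The function names are illustrative only, and real systems compute these on optimized arrays rather than Python lists. It shows that two vectors pointing in the same direction have a cosine similarity of 1 even though their Euclidean distance is non-zero, so the choice of metric affects what counts as "closest".

```python
import math

def dot(a, b):
    """Dot product: sum of element-wise products (higher = more similar)."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1] (higher = more similar)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line (L2) distance between a and b (lower = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 2.0, 0.0]
doc = [2.0, 4.0, 0.0]  # same direction as the query, twice as long

print(dot(query, doc))                 # 10.0
print(cosine_similarity(query, doc))   # ≈ 1.0 (same direction)
print(euclidean_distance(query, doc))  # ≈ 2.236 (the lengths differ)
```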

**🔹 Key Characteristics of Vector Databases**

✅ Efficient similarity search → Finds the closest matches to a query vector.

✅ Scalability → Handles millions to billions of vectors.

✅ Fast retrieval → Optimized for Approximate Nearest Neighbor (ANN) search.

✅ Hybrid search support → Many vector databases combine vector similarity with keyword-based search, so exact terms such as names or IDs still match.

✅ Multi-modal support → Works with text, images, audio, video, and multimodal embeddings.
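Approximate Nearest Neighbor (ANN) search is what makes retrieval fast at scale: instead of comparing the query with every stored vector, the index narrows the search to a small candidate set. Below is a minimal sketch of one ANN technique, random-hyperplane locality-sensitive hashing (LSH) for cosine similarity. The class `RandomHyperplaneLSH` and its parameters are hypothetical; FAISS and most vector databases use other index types (e.g., IVF or HNSW), so this only illustrates the general idea: hash vectors into buckets, then compare the query exactly against the vectors in its own buckets.

```python
import math
import random
from collections import defaultdict

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

class RandomHyperplaneLSH:
    """Hypothetical toy ANN index using random-hyperplane LSH (cosine)."""

    def __init__(self, dim, n_planes=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        self.vectors = []
        # Each hash table has its own set of random hyperplanes.
        self.tables = []
        for _ in range(n_tables):
            planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
            self.tables.append((planes, defaultdict(list)))

    @staticmethod
    def _hash(planes, v):
        # One bit per hyperplane: which side of the plane v lies on.
        return tuple(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 for p in planes)

    def add(self, v):
        idx = len(self.vectors)
        self.vectors.append(v)
        for planes, buckets in self.tables:
            buckets[self._hash(planes, v)].append(idx)
        return idx

    def search(self, q, k=1):
        # Candidates: union of the query's bucket in every table.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(self._hash(planes, q), []))
        # Exact cosine re-ranking, but only over the candidates.
        ranked = sorted(candidates, key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return ranked[:k]

rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
index = RandomHyperplaneLSH(dim=16)
for v in data:
    index.add(v)

print(index.search(data[123], k=1))  # [123]: a stored vector always lands in its own bucket
```

Raising `n_planes` makes buckets smaller (faster, but more true neighbors are missed); raising `n_tables` checks more buckets per query (slower, but higher recall). This speed/recall trade-off is why the search is called *approximate*.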

# **2️⃣ How Do Vector Databases Work?**
**🔹 Step 1: Convert Data into Vectors**
Before storing data in a vector database, it must be converted into numerical representations using an embedding model.
Example:

* Text embeddings → OpenAI’s text-embedding-3-small and text-embedding-ada-002, Sentence-BERT, BERT
* Image embeddings → CLIP, ResNet
* Audio embeddings → Wav2Vec
* Multimodal embeddings → BLIP, CLIP
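To see what "converting data into vectors" produces without calling an API, here is a hypothetical stand-in for an embedding model: a hashed bag-of-words vector. `toy_embed` is illustrative only. Unlike the models listed above, it captures word overlap, not meaning, but it has the same shape of output: a fixed-length list of floats, L2-normalized like many real embeddings.

```python
import hashlib
import math
import re

def toy_embed(text, dim=64):
    """Hypothetical stand-in for an embedding model (hashed bag of words).

    Each lowercase word is hashed into one of `dim` buckets, and the
    count vector is scaled to unit length. A real model would map
    synonyms close together; this toy only matches shared words.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable hash across runs (Python's hash() is salted).
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

v = toy_embed("Vector databases store and search embeddings.")
print(len(v))  # 64
```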

In [1]:
!pip install langchain-community langchain-openai faiss-cpu tiktoken

In [2]:
# Read the OpenAI API key from Colab's secrets manager (works only in Google Colab).
from google.colab import userdata

openai_api = userdata.get("OPENAI_API_KEY")

In [3]:
# OpenAIEmbeddings now lives in the langchain-openai package;
# the older `langchain.embeddings.openai` import path is deprecated.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=openai_api)

**🔹 Step 2: Store Vectors in a Vector Database**

In [5]:
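# Current import paths (the `langchain.vectorstores` / `langchain.schema` paths are deprecated)
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document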

# Sample documents
docs = [
    Document(page_content="Vector databases store and search embeddings."),
    Document(page_content="FAISS is a fast similarity search library."),
    Document(page_content="Pinecone is a cloud-native vector database.")
]

# Store in FAISS
vector_store = FAISS.from_documents(docs, embeddings)
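Conceptually, storing documents means keeping each text next to its vector so that a query vector can later be compared against all of them. The sketch below is a hypothetical in-memory store doing exact (brute-force) L2 search; `TinyVectorStore` is illustrative, its method name only mirrors LangChain's for readability, and it is not how LangChain or FAISS is implemented. The 2-D vectors are hand-picked stand-ins for real embeddings.

```python
import heapq
import math

class TinyVectorStore:
    """Hypothetical in-memory vector store with exact L2 search."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text, vector):
        # "Storing" = keeping the text and its vector side by side.
        self.texts.append(text)
        self.vectors.append(vector)

    def similarity_search(self, query_vector, k=2):
        # Score every stored vector by L2 distance and keep the k smallest.
        scored = (
            (math.dist(query_vector, v), text)
            for v, text in zip(self.vectors, self.texts)
        )
        return [text for _, text in heapq.nsmallest(k, scored)]

store = TinyVectorStore()
# Hand-picked 2-D vectors standing in for real embeddings.
store.add("Vector databases store and search embeddings.", [0.9, 0.1])
store.add("FAISS is a fast similarity search library.", [0.2, 0.9])
store.add("Pinecone is a cloud-native vector database.", [0.8, 0.3])

print(store.similarity_search([1.0, 0.2], k=2))
# ['Vector databases store and search embeddings.', 'Pinecone is a cloud-native vector database.']
```

Brute force costs one distance computation per stored vector per query, which is fine for a handful of documents but is exactly what ANN indexes avoid at scale.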


**🔹 Step 3: Retrieve Similar Data Using Similarity Search**

In [6]:
query = "What is a vector database?"
retrieved_docs = vector_store.similarity_search(query, k=2)

for doc in retrieved_docs:
    print(doc.page_content)


Vector databases store and search embeddings.
Pinecone is a cloud-native vector database.
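The ranking above depends on the distance metric. LangChain's FAISS wrapper ranks by Euclidean (L2) distance by default, whereas similarity is often described in terms of cosine. For unit-length vectors (OpenAI documents its embeddings as normalized to length 1), the two orderings coincide, because expanding the squared distance gives:

```latex
\|\mathbf{a}-\mathbf{b}\|^2
  = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\,\mathbf{a}\cdot\mathbf{b}
  = 2 - 2\cos(\mathbf{a},\mathbf{b})
  \qquad \text{when } \|\mathbf{a}\| = \|\mathbf{b}\| = 1
```

So for normalized embeddings, a smaller L2 distance always means a larger cosine similarity, and both metrics return the same top-k documents.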


# **3️⃣ Applications of Vector Databases in GenAI**
## **🔹 1. Retrieval-Augmented Generation (RAG)**
* Enhances LLMs by retrieving relevant documents from a vector database and adding them to the prompt before the model generates a response.
* Used in chatbots, customer support, and AI assistants.
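The "augmentation" step of RAG is mostly prompt assembly: the retrieved chunks are numbered and placed ahead of the user's question. A minimal sketch follows; `build_rag_prompt` and its instruction wording are hypothetical, and real pipelines also handle token limits and citation formatting. In this notebook, the chunks could come from `[d.page_content for d in vector_store.similarity_search(query, k=2)]`.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=3):
    """Hypothetical helper: pack retrieved text into an LLM prompt."""
    # Number each chunk so the model (and the user) can refer to sources.
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Vector databases store and search embeddings.",
    "Pinecone is a cloud-native vector database.",
]
prompt = build_rag_prompt("What is a vector database?", chunks)
print(prompt)
```

The resulting string is what gets sent to the LLM; the model then answers grounded in the retrieved text rather than only its training data.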

## **🔹 2. Semantic Search & Question Answering**
* Finds meaning-based matches instead of just keyword matches.
* Example: Searching "best laptop for gaming" retrieves relevant reviews even if the exact phrase is not in the dataset.

## **🔹 3. Personalized Recommendation Systems**

* Used in e-commerce, content streaming (Netflix, Spotify), and news feeds.
* Matches user preferences to recommended products, movies, or articles.
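One simple way to "match user preferences" with embeddings is content-based: average the vectors of items a user liked into a profile vector, then recommend the unseen items nearest to it. The sketch below assumes this averaging approach (production recommenders are usually more elaborate); `recommend` and the 3-D "genre" vectors are made up for illustration.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def recommend(item_vectors, liked, k=2):
    """Hypothetical content-based recommender.

    Builds a user profile as the mean of the liked items' vectors, then
    returns the k unseen items most cosine-similar to that profile.
    """
    dim = len(next(iter(item_vectors.values())))
    profile = [sum(item_vectors[name][d] for name in liked) / len(liked) for d in range(dim)]
    candidates = [name for name in item_vectors if name not in liked]
    candidates.sort(key=lambda name: cosine(profile, item_vectors[name]), reverse=True)
    return candidates[:k]

# Made-up 3-D vectors: [action, comedy, documentary]
items = {
    "space_battle": [0.9, 0.1, 0.0],
    "car_chase": [0.8, 0.2, 0.1],
    "sitcom_special": [0.1, 0.9, 0.0],
    "ocean_doc": [0.0, 0.1, 0.9],
    "heist_comedy": [0.6, 0.6, 0.0],
}
print(recommend(items, liked={"space_battle", "car_chase"}))  # ['heist_comedy', 'sitcom_special']
```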

## **🔹 4. Image & Video Search**
* Search for similar images or videos using embeddings.
* Example: Google Reverse Image Search.

## **🔹 5. Fraud Detection & Anomaly Detection**
* Flags records whose embeddings lie far from those of normal activity, e.g., unusual financial transactions, network events in cybersecurity, or patient records in healthcare.
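A common nearest-neighbor approach to anomaly detection scores each new record by how far it is from its closest "normal" neighbors. The sketch below assumes that approach; `knn_anomaly_score`, the 2-D vectors, and the `threshold` value are all hypothetical, and in practice the threshold would be tuned on held-out or labeled data.

```python
import math

def knn_anomaly_score(x, reference, k=3):
    """Mean distance from x to its k nearest reference vectors (higher = more unusual)."""
    nearest = sorted(math.dist(x, r) for r in reference)[:k]
    return sum(nearest) / len(nearest)

# Made-up 2-D embeddings of "normal" transactions, clustered near (1, 1).
normal = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0]]
threshold = 0.5  # assumed cut-off for this toy data

for tx in ([1.05, 1.0], [4.0, -2.0]):
    score = knn_anomaly_score(tx, normal)
    print(tx, round(score, 3), "ANOMALY" if score > threshold else "ok")
```

With a vector database, the `sorted(...)` over all references would be replaced by a k-nearest-neighbor query, which is what makes this approach practical on large transaction histories.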

## **🔹 6. Multimodal AI**
* Enables cross-modal searches (e.g., input an image and retrieve text descriptions).