# Databricks Vector Search Index Embeddings

This guide demonstrates how to set up and manage a Vector Search Index in Databricks using `VectorSearchClient`. The goal is to store, index, and search vector embeddings of the rows in a Delta Table, so that the index can serve as the context retriever for our RAG application.

## Get ready to put on your ML Engineering hat! 🧢


By following these steps, you will have:
- ✅ Created a Vector Search Endpoint
- ✅ Defined a Delta Table Index for embeddings

This setup allows efficient semantic search on vectorized data in Databricks, improving performance in RAG (Retrieval-Augmented Generation) or other AI-driven search applications.

In [0]:
%pip install -q databricks-vectorsearch
dbutils.library.restartPython()

In [0]:
import databricks
from databricks.vector_search.client import VectorSearchClient

# 1. Create the Vector Search Endpoint
---

The following initializes a Vector Search Endpoint in Databricks, which is a crucial component for Retrieval-Augmented Generation (RAG) systems. RAG enhances large language models (LLMs) by retrieving relevant external knowledge before generating responses, significantly improving accuracy and relevance.

**What is a Vector Search Engine and How Does it Work?**

A Vector Search Engine is a specialized search system that finds the most relevant results based on similarity between high-dimensional vector embeddings instead of traditional keyword matching. It is essential for applications like semantic search, recommendation systems, image retrieval, and Retrieval-Augmented Generation (RAG).

Databricks Vector Search uses the Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest neighbor search, with the L2 (Euclidean) distance as the metric for embedding vector similarity. Read more here: https://learn.microsoft.com/en-us/azure/databricks/generative-ai/vector-search
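
To make the L2 metric concrete, here is a minimal sketch of an exact (brute-force) nearest neighbor search in plain Python. It is not how Databricks implements search: HNSW is an approximate method that avoids comparing the query against every vector. The toy vectors, document ids, and function names are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, vectors, k=2):
    """Return the ids of the k vectors closest to `query` (exact search)."""
    ranked = sorted(vectors.items(), key=lambda item: l2_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical 3-dimensional embeddings; real embeddings are much larger.
toy_vectors = {
    "doc_a": [0.0, 0.0, 1.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [1.0, 0.0, 0.0],
}
query = [1.0, 0.0, 0.1]

print(nearest_neighbors(query, toy_vectors, k=2))
```

A smaller L2 distance means the two embeddings are more similar, so the closest ids come first in the result.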

In [0]:
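# Inside a notebook, VectorSearchClient() with no arguments authenticates
# with an automatically generated PAT token
client = VectorSearchClient()

# Create the endpoint and wait until it is ready
try:
    client.create_endpoint_and_wait(
        name="silveraiwolf_vector_search",
        endpoint_type="STANDARD"
    )
except Exception as e:
    # Usually raised because the endpoint already exists; print the actual
    # error so that other failures are not mislabeled
    print(f"Endpoint was not created: {e}")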
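
An alternative to catching the exception is to check for the endpoint before creating it. The sketch below assumes that `client.list_endpoints()` returns a dictionary with an `"endpoints"` list whose entries carry a `"name"` key; verify that shape against your installed `databricks-vectorsearch` version. The helper names `endpoint_exists` and `ensure_endpoint` are hypothetical.

```python
def endpoint_exists(client, name):
    """Return True if an endpoint called `name` is already listed."""
    # Assumed response shape: {"endpoints": [{"name": ...}, ...]}
    response = client.list_endpoints() or {}
    return any(ep.get("name") == name for ep in response.get("endpoints", []))


def ensure_endpoint(client, name, endpoint_type="STANDARD"):
    """Create the endpoint only when it is missing; return True if created."""
    if endpoint_exists(client, name):
        return False
    client.create_endpoint_and_wait(name=name, endpoint_type=endpoint_type)
    return True


# Usage in the notebook, with the `client` created earlier:
# ensure_endpoint(client, "silveraiwolf_vector_search")
```

Because the helpers take the client as a parameter, they can be tried against a stub object before being pointed at a real workspace.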

# 2. Creating the Vector Search Index
---

In Retrieval-Augmented Generation (RAG) workflows, a Delta Sync Index keeps retrieval from a knowledge base efficient and up to date. The index synchronizes with the source Delta Table and updates incrementally as the underlying data changes, so the RAG application retrieves current information. How the sync runs depends on the pipeline type: with `pipeline_type="TRIGGERED"`, as used in the cell below, the index is refreshed when a sync is triggered, while `"CONTINUOUS"` picks up changes automatically.
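
Incremental syncing relies on the source Delta Table having Change Data Feed enabled. The sketch below only builds the SQL statement, using the source table name from this notebook; the helper name `enable_cdf_sql` is hypothetical, and the commented `spark.sql` call assumes the `spark` session that Databricks notebooks predefine.

```python
def enable_cdf_sql(table_name):
    """Build the statement that turns on Change Data Feed for a Delta table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


statement = enable_cdf_sql("llm.rag.knowledge_base")
print(statement)

# In a Databricks notebook, where `spark` is predefined:
# spark.sql(statement)
```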

Embeddings are numerical representations of text that capture semantic meaning in a high-dimensional vector space.

When creating an index for a **vector search** system, each row of text in the database is converted into an **embedding**, which is typically a fixed-length array of floating-point numbers. This allows for quick similarity searches, where the system can find relevant documents based on meaning rather than exact word matches.  

For example:

- "Hello" → [0.12, 0.45, 0.78, ..., 0.99] (768 dimensions, for example)
- "Hello, how are you?" → [0.34, 0.87, 0.65, ..., 0.21] (same 768 dimensions)
- "Today is a wonderful day to learn about embeddings!" → [0.56, 0.32, 0.89, ..., 0.44] (again, 768 dimensions)

A **sentence embedding** represents an entire piece of text (e.g., a sentence, paragraph, or document) as a single vector, rather than embedding each word separately. Importantly, the size of the embedding is **fixed** by the model: regardless of how many words are in the text, the resulting vector always has the same number of dimensions (e.g., `768` for BERT-base models, or `1024` for the `databricks-gte-large-en` model used below).

This approach allows for efficient retrieval, semantic similarity comparison, and ranking in RAG applications, making it a fundamental technique for improving search and retrieval in AI-driven systems.
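
The fixed-size property can be illustrated with a toy embedding function. This is only a sketch: it hashes words into a small number of buckets and does not capture semantic meaning the way a trained embedding model does. The function name `toy_embed` and the dimension `8` are arbitrary choices for the example.

```python
import hashlib
import math

DIM = 8  # arbitrary toy size; real models use hundreds or thousands of dimensions


def toy_embed(text, dim=DIM):
    """Hash each word into one of `dim` buckets, then L2-normalize the counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


texts = [
    "Hello",
    "Hello, how are you?",
    "Today is a wonderful day to learn about embeddings!",
]
for text in texts:
    # The vector length is the same no matter how long the input is.
    print(len(toy_embed(text)), text)
```

Each line of output starts with the same length, which is what lets a vector index compare any two texts directly.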

In [0]:
index = client.create_delta_sync_index_and_wait(
  endpoint_name="silveraiwolf_vector_search",
  source_table_name="llm.rag.knowledge_base",
  index_name="llm.rag.knowledge_base_idx",
  pipeline_type="TRIGGERED",
  primary_key="id",
  embedding_source_column="content",
  embedding_model_endpoint_name="databricks-gte-large-en"
)
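
Once the index is ready, it can act as the context retriever. The sketch below wraps `index.similarity_search` in a hypothetical helper, `retrieve_context`. The column names `id` and `content` come from the index definition above; the response layout (`result` → `data_array`, with a score appended to each row) is an assumption to verify against your client version.

```python
def retrieve_context(index, question, num_results=3):
    """Return the `content` value of the top matches for `question`."""
    response = index.similarity_search(
        query_text=question,
        columns=["id", "content"],
        num_results=num_results,
    )
    # Assumed layout: each row holds the requested columns in order,
    # followed by a similarity score.
    rows = response.get("result", {}).get("data_array", [])
    return [row[1] for row in rows]


# Usage in the notebook, with the `index` created above:
# index.sync()  # refresh a TRIGGERED index after the source table changes
# passages = retrieve_context(index, "What is a vector search index?")
```

The returned passages can then be placed into the LLM prompt as retrieved context.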

# END OF NOTEBOOK