# Create Vector Search Index

This parameterized notebook creates and manages a Databricks Vector Search index.

## Workflow

1. **Configure** - Set parameters for your table and index
2. **Install** - Install the `databricks-vectorsearch` package
3. **Verify** - Check the endpoint, enable Change Data Feed, and validate the source table
4. **Create** - Check for an existing index, then create a Delta Sync index with managed embeddings
5. **Monitor** - Wait for the index to come ONLINE
6. **Test** - Run sample queries with the Python SDK and SQL
7. **Maintain** - Trigger syncs and check index status

## Parameters

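This notebook reads its settings from widgets, so you can customize it without editing code. Run the configuration cells in Step 1 to create the widgets and load their values. When the notebook runs as a job, job parameters override the widget defaults.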
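
The sketch below shows how the widget values map to the names the notebook uses. It runs outside Databricks because a plain dict stands in for the `dbutils.widgets.get(...)` lookups. `build_config` and `parse_filter_columns` are hypothetical helpers that mirror the parsing in Step 1; they are not part of the notebook.

```python
def parse_filter_columns(raw: str) -> list[str]:
    """Split a comma-separated widget value, dropping blanks and surrounding whitespace."""
    return [c.strip() for c in raw.split(",") if c.strip()]


def build_config(params: dict[str, str]) -> dict:
    """Hypothetical helper: derive fully qualified names from widget-style parameters.

    `params` is a plain dict standing in for dbutils.widgets.get(...) lookups.
    """
    catalog, schema = params["catalog"], params["schema"]
    return {
        "full_source_table": f"{catalog}.{schema}.{params['source_table']}",
        "full_index_name": f"{catalog}.{schema}.{params['index_name']}",
        "filter_columns": parse_filter_columns(params.get("filter_columns", "")),
    }


# Example using the widget defaults from Step 1 (with extra spaces and an empty entry)
config = build_config({
    "catalog": "juan_use1_catalog",
    "schema": "retail",
    "source_table": "gold_customer_reviews",
    "index_name": "gold_customer_reviews_idx",
    "filter_columns": "product_category, product_brand,,customer_segment,rating",
})
print(config["full_index_name"])   # juan_use1_catalog.retail.gold_customer_reviews_idx
print(config["filter_columns"])    # ['product_category', 'product_brand', 'customer_segment', 'rating']
```

Stripping whitespace and dropping empty entries means a trailing comma or stray space in the widget does not produce a bogus column name.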


## Step 1: Configuration (Parameterized)


In [None]:
# =============================================================================
# PARAMETERIZED CONFIGURATION
# =============================================================================
# Create widgets for notebook parameters (can be overridden when running as a job)

# Remove existing widgets if re-running
try:
    dbutils.widgets.removeAll()
except Exception:
    pass

# Define parameters with defaults for customer reviews use case
dbutils.widgets.text("catalog", "juan_use1_catalog", "1. Catalog")
dbutils.widgets.text("schema", "retail", "2. Schema")
dbutils.widgets.text("source_table", "gold_customer_reviews", "3. Source Table")
dbutils.widgets.text("index_name", "gold_customer_reviews_idx", "4. Index Name")
dbutils.widgets.text("endpoint_name", "one-env-shared-endpoint-11", "5. VS Endpoint")
dbutils.widgets.text("primary_key", "review_id", "6. Primary Key Column")
dbutils.widgets.text("embedding_column", "review_text", "7. Embedding Column")
dbutils.widgets.dropdown("embedding_model", "databricks-gte-large-en", 
                         ["databricks-bge-large-en", "databricks-gte-large-en"], 
                         "8. Embedding Model")
dbutils.widgets.dropdown("sync_mode", "TRIGGERED", 
                         ["TRIGGERED", "CONTINUOUS"], 
                         "9. Sync Mode")
dbutils.widgets.text("filter_columns", "product_category,product_brand,customer_segment,rating", 
                     "10. Filter Columns (comma-separated)")

print("✓ Widgets created - configure values above or use defaults")


In [None]:
# Load parameters from widgets
CATALOG = dbutils.widgets.get("catalog")
SCHEMA = dbutils.widgets.get("schema")
SOURCE_TABLE = dbutils.widgets.get("source_table")
INDEX_NAME = dbutils.widgets.get("index_name")
ENDPOINT_NAME = dbutils.widgets.get("endpoint_name")
PRIMARY_KEY = dbutils.widgets.get("primary_key")
EMBEDDING_COLUMN = dbutils.widgets.get("embedding_column")
EMBEDDING_MODEL = dbutils.widgets.get("embedding_model")
SYNC_MODE = dbutils.widgets.get("sync_mode")
FILTER_COLUMNS = [c.strip() for c in dbutils.widgets.get("filter_columns").split(",") if c.strip()]

# Construct full names
FULL_SOURCE_TABLE = f"{CATALOG}.{SCHEMA}.{SOURCE_TABLE}"
FULL_INDEX_NAME = f"{CATALOG}.{SCHEMA}.{INDEX_NAME}"

# Display configuration
print("=" * 70)
print("VECTOR SEARCH INDEX CONFIGURATION")
print("=" * 70)
print(f"""
  Source Table:     {FULL_SOURCE_TABLE}
  Index Name:       {FULL_INDEX_NAME}
  Endpoint:         {ENDPOINT_NAME}
  
  Primary Key:      {PRIMARY_KEY}
  Embedding Column: {EMBEDDING_COLUMN}
  Embedding Model:  {EMBEDDING_MODEL}
  Sync Mode:        {SYNC_MODE}
  
  Filter Columns:   {FILTER_COLUMNS}
""")
print("=" * 70)


## Step 2: Install Vector Search Package


In [None]:
%pip install databricks-vectorsearch --quiet
dbutils.library.restartPython()


In [None]:
# =============================================================================
# RE-INITIALIZE AFTER PYTHON RESTART
# =============================================================================
from databricks.vector_search.client import VectorSearchClient
import time

# Reload parameters after restart
CATALOG = dbutils.widgets.get("catalog")
SCHEMA = dbutils.widgets.get("schema")
SOURCE_TABLE = dbutils.widgets.get("source_table")
INDEX_NAME = dbutils.widgets.get("index_name")
ENDPOINT_NAME = dbutils.widgets.get("endpoint_name")
PRIMARY_KEY = dbutils.widgets.get("primary_key")
EMBEDDING_COLUMN = dbutils.widgets.get("embedding_column")
EMBEDDING_MODEL = dbutils.widgets.get("embedding_model")
SYNC_MODE = dbutils.widgets.get("sync_mode")
FILTER_COLUMNS = [c.strip() for c in dbutils.widgets.get("filter_columns").split(",") if c.strip()]

FULL_SOURCE_TABLE = f"{CATALOG}.{SCHEMA}.{SOURCE_TABLE}"
FULL_INDEX_NAME = f"{CATALOG}.{SCHEMA}.{INDEX_NAME}"

# Initialize Vector Search client
vsc = VectorSearchClient()
print("✓ Vector Search Client initialized")
print(f"  Target Index: {FULL_INDEX_NAME}")


## Step 3: Verify Prerequisites


In [None]:
# =============================================================================
# VERIFY ENDPOINT
# =============================================================================
print("Checking Vector Search endpoint...")
try:
    endpoint = vsc.get_endpoint(ENDPOINT_NAME)
    state = endpoint.get('endpoint_status', {}).get('state', 'Unknown')
    print(f"✓ Endpoint '{ENDPOINT_NAME}' exists")
    print(f"  State: {state}")
    
    if state != 'ONLINE':
        print(f"\n⚠️  Warning: Endpoint is not ONLINE (current: {state})")
        print("   Wait for the endpoint to come online before creating the index.")
        ENDPOINT_READY = False
    else:
        ENDPOINT_READY = True
except Exception as e:
    print(f"✗ Endpoint '{ENDPOINT_NAME}' not found")
    print(f"  Error: {e}")
    print("\n📋 To create endpoint, go to:")
    print("   Databricks UI → Compute → Vector Search → Create Endpoint")
    ENDPOINT_READY = False


### Enable Change Data Feed
Delta Sync Vector Search indexes require Change Data Feed (CDF) to be enabled on the source table. This allows the index to automatically track and sync changes.
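
The check-then-enable logic in the cell that follows comes down to reading one table property. The minimal sketch below assumes the `SHOW TBLPROPERTIES` output has already been collected into `(key, value)` pairs. `cdf_enable_statement` is a hypothetical helper that returns the `ALTER TABLE` statement only when CDF is off, and the table name in the example is made up.

```python
from typing import Iterable, Optional, Tuple

CDF_PROPERTY = "delta.enableChangeDataFeed"


def cdf_enable_statement(table: str, props: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the ALTER TABLE statement needed to enable CDF, or None if already enabled.

    `props` mimics rows from SHOW TBLPROPERTIES as (key, value) pairs.
    """
    values = dict(props)
    # Property values come back as strings, so compare case-insensitively
    if values.get(CDF_PROPERTY, "false").strip().lower() == "true":
        return None
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({CDF_PROPERTY} = true)"


print(cdf_enable_statement("main.retail.reviews", [("delta.minReaderVersion", "1")]))
# ALTER TABLE main.retail.reviews SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
print(cdf_enable_statement("main.retail.reviews", [(CDF_PROPERTY, "TRUE")]))
# None
```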

In [None]:
# =============================================================================
# ENABLE CHANGE DATA FEED (Required for Delta Sync)
# =============================================================================
# Vector Search Delta Sync indexes require Change Data Feed (CDF) to be enabled
# on the source table. This allows the index to track and sync changes.

print(f"Checking Change Data Feed status for {FULL_SOURCE_TABLE}...")

try:
    # Get current table properties
    props_df = spark.sql(f"SHOW TBLPROPERTIES {FULL_SOURCE_TABLE}")
    props = {row['key']: row['value'] for row in props_df.collect()}
    
    cdf_enabled = props.get('delta.enableChangeDataFeed', 'false').lower() == 'true'
    
    if cdf_enabled:
        print("✓ Change Data Feed is already enabled")
    else:
        print("  Change Data Feed is not enabled. Enabling now...")
        spark.sql(f"""
            ALTER TABLE {FULL_SOURCE_TABLE} 
            SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
        """)
        print("✓ Change Data Feed enabled successfully")
    
    CDF_READY = True
    
except Exception as e:
    print(f"✗ Error checking/enabling Change Data Feed: {e}")
    print(f"\n  Manual fix:")
    print(f"  ALTER TABLE {FULL_SOURCE_TABLE} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
    CDF_READY = False

In [None]:
# =============================================================================
# VERIFY SOURCE TABLE
# =============================================================================
print(f"Checking source table: {FULL_SOURCE_TABLE}...")

try:
    # Check table exists and get row count
    count_df = spark.sql(f"SELECT COUNT(*) as count FROM {FULL_SOURCE_TABLE}")
    row_count = count_df.collect()[0]['count']
    print(f"✓ Table exists with {row_count:,} rows")
    
    if row_count == 0:
        print("⚠️  Warning: Table is empty. Populate data before creating index.")
        TABLE_READY = False
    else:
        TABLE_READY = True
    
    # Verify required columns exist
    columns = [c.name for c in spark.table(FULL_SOURCE_TABLE).schema]
    
    missing_cols = []
    if PRIMARY_KEY not in columns:
        missing_cols.append(f"Primary key: {PRIMARY_KEY}")
    if EMBEDDING_COLUMN not in columns:
        missing_cols.append(f"Embedding column: {EMBEDDING_COLUMN}")
    for fc in FILTER_COLUMNS:
        if fc not in columns:
            missing_cols.append(f"Filter column: {fc}")
    
    if missing_cols:
        print("\n✗ Missing columns:")
        for mc in missing_cols:
            print(f"    - {mc}")
        TABLE_READY = False
    else:
        print("✓ All required columns present")
        print(f"  Primary key: {PRIMARY_KEY}")
        print(f"  Embedding column: {EMBEDDING_COLUMN}")
        print(f"  Filter columns: {FILTER_COLUMNS}")
        
except Exception as e:
    print(f"✗ Table not found or error: {e}")
    TABLE_READY = False

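The column check above is a set-membership test over the table schema. The minimal sketch below runs without Spark and assumes the column names have already been read from `spark.table(...).schema`. `find_missing_columns` is a hypothetical helper that mirrors the cell's logic, and the sample schema is made up.

```python
def find_missing_columns(columns, primary_key, embedding_column, filter_columns):
    """Return human-readable labels for required columns absent from `columns`."""
    present = set(columns)
    required = [("Primary key", primary_key), ("Embedding column", embedding_column)]
    required += [("Filter column", fc) for fc in filter_columns]
    return [f"{label}: {name}" for label, name in required if name not in present]


schema_columns = ["review_id", "review_text", "product_category", "rating"]
missing = find_missing_columns(
    schema_columns, "review_id", "review_text",
    ["product_category", "customer_segment", "rating"],
)
print(missing)  # ['Filter column: customer_segment']
```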

## Step 4: Check for Existing Index


In [None]:
# =============================================================================
# CHECK IF INDEX ALREADY EXISTS
# =============================================================================
print(f"Checking for existing index: {FULL_INDEX_NAME}...")
print(f"  Endpoint: {ENDPOINT_NAME}")

INDEX_EXISTS = False

try:
    existing_index = vsc.get_index(
        endpoint_name=ENDPOINT_NAME,
        index_name=FULL_INDEX_NAME
    )
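    status = existing_index.describe()
    status_info = status.get('status', {})
    # describe() typically reports 'detailed_state' (e.g. ONLINE_NO_PENDING_UPDATE);
    # fall back to 'state' in case the response uses that key instead
    state = status_info.get('detailed_state', status_info.get('state', 'Unknown'))
    
    print(f"\n⚠️  Index '{FULL_INDEX_NAME}' already exists")
    print(f"   State: {state}")
    
    # Show indexed row count if reported (directly under 'status' or in 'index_details')
    row_count = status_info.get('indexed_row_count',
                                status_info.get('index_details', {}).get('indexed_row_count'))
    if row_count is not None:
        print(f"   Indexed Rows: {row_count:,}")
    
    INDEX_EXISTS = True
    print("\n   To recreate, uncomment and run the deletion cell below.")
    
except Exception as e:
    # get_index raises when the index is missing (or not accessible)
    print(f"\n✓ Index '{FULL_INDEX_NAME}' does not exist")
    print(f"   Lookup error: {e}")
    print("   Ready to create new index.")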


In [None]:
# =============================================================================
# (OPTIONAL) DELETE EXISTING INDEX
# =============================================================================
# Uncomment the code below to delete an existing index before recreating

# if INDEX_EXISTS:
#     print(f"Deleting existing index '{FULL_INDEX_NAME}'...")
#     try:
#         vsc.delete_index(endpoint_name=ENDPOINT_NAME, index_name=FULL_INDEX_NAME)
#         print("✓ Index deleted successfully")
#         INDEX_EXISTS = False
#         print("  You can now proceed to create a new index.")
#     except Exception as e:
#         print(f"✗ Failed to delete index: {e}")
# else:
#     print("No existing index to delete.")

print("⬆️ Uncomment the code above to delete an existing index")


## Step 5: Create Delta Sync Index

This creates a vector search index with:
- **Delta Sync**: Keeps the index in sync with the source Delta table (continuously in CONTINUOUS mode, on demand via `sync_index()` in TRIGGERED mode)
- **Managed Embeddings**: Databricks computes embeddings using the specified model
- **Filter Columns**: Pre-filtering for efficient queries
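
The creation cell passes these choices to `create_delta_sync_index` as keyword arguments. The minimal sketch below assembles and sanity-checks those arguments before any client call, using the same keyword names as the call in this notebook. `build_index_args` is hypothetical and does not contact Databricks. The endpoint and table names in the example are made up.

```python
VALID_SYNC_MODES = {"TRIGGERED", "CONTINUOUS"}


def build_index_args(endpoint, source_table, index_name, sync_mode,
                     primary_key, embedding_column, embedding_model, filter_columns):
    """Hypothetical helper: assemble keyword arguments for create_delta_sync_index."""
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(VALID_SYNC_MODES)}, got {sync_mode!r}")
    # Sync the primary key, the text to embed, and the filter columns;
    # dict.fromkeys drops duplicates while keeping first-seen order
    columns_to_sync = list(dict.fromkeys([primary_key, embedding_column, *filter_columns]))
    return {
        "endpoint_name": endpoint,
        "source_table_name": source_table,
        "index_name": index_name,
        "pipeline_type": sync_mode,
        "primary_key": primary_key,
        "embedding_source_column": embedding_column,
        "embedding_model_endpoint_name": embedding_model,
        "columns_to_sync": columns_to_sync,
    }


args = build_index_args(
    "my-endpoint", "main.retail.reviews", "main.retail.reviews_idx", "TRIGGERED",
    "review_id", "review_text", "databricks-gte-large-en", ["rating", "review_id"],
)
print(args["columns_to_sync"])  # ['review_id', 'review_text', 'rating']
# In the notebook, these could then be passed as vsc.create_delta_sync_index(**args)
```

Validating the sync mode up front surfaces a typo in a job parameter before the (slower) API call fails.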


In [None]:
# =============================================================================
# CREATE DELTA SYNC VECTOR SEARCH INDEX
# =============================================================================
if not INDEX_EXISTS:
    print("=" * 70)
    print("CREATING VECTOR SEARCH INDEX")
    print("=" * 70)
    print(f"""
  Endpoint:         {ENDPOINT_NAME}
  Source Table:     {FULL_SOURCE_TABLE}
  Index Name:       {FULL_INDEX_NAME}
  
  Primary Key:      {PRIMARY_KEY}
  Embedding Column: {EMBEDDING_COLUMN}
  Embedding Model:  {EMBEDDING_MODEL}
  Sync Mode:        {SYNC_MODE}
  Filter Columns:   {FILTER_COLUMNS}
""")
    
    try:
        # Build columns to sync (primary key + embedding + filters)
        columns_to_sync = [PRIMARY_KEY, EMBEDDING_COLUMN] + FILTER_COLUMNS
        # Remove duplicates while preserving order
        columns_to_sync = list(dict.fromkeys(columns_to_sync))
        
        print(f"Columns to sync: {columns_to_sync}")
        print("\nCreating index...")
        
        index = vsc.create_delta_sync_index(
            endpoint_name=ENDPOINT_NAME,
            source_table_name=FULL_SOURCE_TABLE,
            index_name=FULL_INDEX_NAME,
            pipeline_type=SYNC_MODE,
            primary_key=PRIMARY_KEY,
            embedding_source_column=EMBEDDING_COLUMN,
            embedding_model_endpoint_name=EMBEDDING_MODEL,
            columns_to_sync=columns_to_sync
        )
        
        print("\n" + "=" * 70)
        print("✓ VECTOR SEARCH INDEX CREATION INITIATED")
        print("=" * 70)
        print("""
  Index creation and initial sync may take several minutes.
  
  Next steps:
  1. Run the monitoring cells below to check status
  2. Wait for state to become ONLINE
  3. For TRIGGERED mode, manually sync after data updates
""")
        INDEX_EXISTS = True
        
    except Exception as e:
        print(f"\n✗ Failed to create index: {e}")
        print("\nTroubleshooting:")
        print("  - Ensure endpoint is ONLINE")
        print("  - Verify source table has data")
        print("  - Check Change Data Feed is enabled on source table:")
        print(f"    ALTER TABLE {FULL_SOURCE_TABLE} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
        print("  - Verify you have permissions to create indexes")
else:
    print("⚠️  Index already exists. Skipping creation.")
    print("   To recreate, delete the existing index first.")


## Step 6: Monitor Index Status
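
The monitoring cells in this step poll the index until it is queryable. The polling pattern can be sketched independently of the client, as below. The sketch assumes `get_status` is any callable returning a dict shaped like the `status` section of `describe()`, with a `ready` flag and/or a `detailed_state` such as `ONLINE_NO_PENDING_UPDATE`. Those field names are an assumption about the response format. `is_online` and `wait_until` are hypothetical helpers, and the injectable `sleep`/`clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def is_online(status_info: dict) -> bool:
    """Assumed fields: ready=True or a detailed_state/state starting with ONLINE."""
    state = str(status_info.get("detailed_state", status_info.get("state", "")))
    return bool(status_info.get("ready")) or state.startswith("ONLINE")


def wait_until(get_status: Callable[[], dict], timeout_s: float = 1800, interval_s: float = 30,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until is_online() or the timeout elapses; return whether it came online."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_online(get_status()):
            return True
        sleep(interval_s)
    return False


# Simulated status sequence: provisioning twice, then online
states = iter([{"detailed_state": "PROVISIONING_INITIAL_SNAPSHOT"},
               {"detailed_state": "PROVISIONING_INITIAL_SNAPSHOT"},
               {"detailed_state": "ONLINE_NO_PENDING_UPDATE", "ready": True}])
print(wait_until(lambda: next(states), timeout_s=5, interval_s=0))  # True
```

Using `time.monotonic` for the deadline avoids surprises if the wall clock changes while waiting.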


In [None]:
# =============================================================================
# INDEX STATUS FUNCTIONS
# =============================================================================
def check_index_status(index_name=None, endpoint_name=None):
    """Check and display current index status."""
    # Resolve names from parameters or globals/widgets
    if index_name is None:
        try:
            index_name = FULL_INDEX_NAME
        except NameError:
            cat = dbutils.widgets.get("catalog")
            sch = dbutils.widgets.get("schema")
            idx = dbutils.widgets.get("index_name")
            index_name = f"{cat}.{sch}.{idx}"
    
    if endpoint_name is None:
        try:
            endpoint_name = ENDPOINT_NAME
        except NameError:
            endpoint_name = dbutils.widgets.get("endpoint_name")
    
    try:
        index = vsc.get_index(endpoint_name=endpoint_name, index_name=index_name)
        status = index.describe()
        
        print("=" * 60)
        print(f"INDEX STATUS: {index_name}")
        print(f"ENDPOINT: {endpoint_name}")
        print("=" * 60)
        
        status_info = status.get('status', {})
        # Prefer 'detailed_state' (e.g. ONLINE_NO_PENDING_UPDATE); fall back to 'state'
        state = status_info.get('detailed_state', status_info.get('state', 'Unknown'))
        print(f"  State: {state}")
        if 'ready' in status_info:
            print(f"  Ready: {status_info['ready']}")
        if 'message' in status_info:
            print(f"  Message: {status_info['message']}")
        
        # Row counts may sit directly under 'status' or nested in 'index_details'
        details = {**status_info.get('index_details', {}), **status_info}
        if 'indexed_row_count' in details:
            print(f"  Indexed Rows: {details['indexed_row_count']:,}")
        if 'pending_row_count' in details:
            print(f"  Pending Rows: {details['pending_row_count']:,}")
        
        print("=" * 60)
        return status
        
    except Exception as e:
        print(f"✗ Error checking index status: {e}")
        return None


def wait_for_index_online(index_name=None, endpoint_name=None, max_wait_minutes=30, check_interval_seconds=30):
    """Wait for index to reach ONLINE state."""
    # Resolve names
    if index_name is None:
        try:
            index_name = FULL_INDEX_NAME
        except NameError:
            cat = dbutils.widgets.get("catalog")
            sch = dbutils.widgets.get("schema")
            idx = dbutils.widgets.get("index_name")
            index_name = f"{cat}.{sch}.{idx}"
    
    if endpoint_name is None:
        try:
            endpoint_name = ENDPOINT_NAME
        except NameError:
            endpoint_name = dbutils.widgets.get("endpoint_name")
    
    print(f"Waiting for index to be ONLINE (max {max_wait_minutes} minutes)...")
    print(f"  Index: {index_name}")
    print(f"  Endpoint: {endpoint_name}")
    
    start_time = time.time()
    max_wait_seconds = max_wait_minutes * 60
    
    while time.time() - start_time < max_wait_seconds:
        elapsed_min = int((time.time() - start_time) / 60)
        
        try:
            index = vsc.get_index(endpoint_name=endpoint_name, index_name=index_name)
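            status = index.describe()
            status_info = status.get('status', {})
            state = status_info.get('detailed_state', status_info.get('state', 'Unknown'))
            
            # Treat the index as online once it reports ready or an ONLINE* state
            if status_info.get('ready') or str(state).startswith('ONLINE'):
                print(f"\n✓ Index is ONLINE after {elapsed_min} minutes!")
                return True
            else:
                indexed = status_info.get('indexed_row_count',
                                          status_info.get('index_details', {}).get('indexed_row_count', '?'))
                print(f"  [{elapsed_min}m] State: {state} | Indexed: {indexed}")
                time.sleep(check_interval_seconds)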
                
        except Exception as e:
            print(f"  [{elapsed_min}m] Error: {e}")
            time.sleep(check_interval_seconds)
    
    print(f"\n⚠️  Timeout after {max_wait_minutes} minutes. Index may still be provisioning.")
    return False


print("✓ Status functions defined:")
print("  - check_index_status(): Show current index state")
print("  - wait_for_index_online(): Wait for ONLINE state")


In [None]:
# Check current index status
check_index_status()


In [None]:
# Wait for index to be online (uncomment to use)
# wait_for_index_online(max_wait_minutes=15)


## Step 7: Trigger Sync (for TRIGGERED mode)


In [None]:
# =============================================================================
# SYNC FUNCTION
# =============================================================================
def sync_index(index_name=None, endpoint_name=None):
    """Trigger a manual sync for the index (TRIGGERED mode only)."""
    # Resolve names
    if index_name is None:
        try:
            index_name = FULL_INDEX_NAME
        except NameError:
            cat = dbutils.widgets.get("catalog")
            sch = dbutils.widgets.get("schema")
            idx = dbutils.widgets.get("index_name")
            index_name = f"{cat}.{sch}.{idx}"
    
    if endpoint_name is None:
        try:
            endpoint_name = ENDPOINT_NAME
        except NameError:
            endpoint_name = dbutils.widgets.get("endpoint_name")
    
    try:
        index = vsc.get_index(endpoint_name=endpoint_name, index_name=index_name)
        index.sync()
        print(f"✓ Sync triggered for '{index_name}'")
        print(f"  Endpoint: {endpoint_name}")
        print("  Check status in a few moments to monitor progress.")
        return True
    except Exception as e:
        print(f"✗ Failed to trigger sync: {e}")
        return False

print("✓ Sync function defined:")
print("  - sync_index(): Trigger manual sync for TRIGGERED mode")


In [None]:
# Trigger sync (uncomment to use after data updates)
# sync_index()


## Step 8: Test Queries

Once the index is ONLINE, test with sample queries.
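
`test_query` prints rows from `result.data_array` by position. To work with named fields instead, one approach is to zip each row with the column names. The sketch below assumes the response also carries `manifest.columns`, a list of `{"name": ...}` entries in row order that typically ends with a similarity score column. Treat that shape as an assumption and check it against a real response. `rows_as_dicts` is a hypothetical helper, and the sample response is hand-written, not real output.

```python
def rows_as_dicts(response: dict) -> list[dict]:
    """Convert a similarity_search-style response into a list of {column: value} dicts.

    Assumes response["manifest"]["columns"] lists column names in the same order as
    each row in response["result"]["data_array"].
    """
    names = [col["name"] for col in response.get("manifest", {}).get("columns", [])]
    rows = response.get("result", {}).get("data_array", [])
    return [dict(zip(names, row)) for row in rows]


# Hand-written sample response (not real output)
sample = {
    "manifest": {"columns": [{"name": "review_id"}, {"name": "review_text"}, {"name": "score"}]},
    "result": {"data_array": [["r-1", "Runs small, order a size up", 0.82],
                              ["r-2", "Fit is true to size", 0.61]]},
}
for hit in rows_as_dicts(sample):
    print(hit["review_id"], hit["score"])
```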


In [None]:
# =============================================================================
# TEST QUERY FUNCTION
# =============================================================================
def test_query(query_text, num_results=5, filters=None, index_name=None, endpoint_name=None):
    """
    Test the vector search index with a similarity query.
    
    Args:
        query_text: Natural language query
        num_results: Number of results to return
        filters: Filter dict for standard endpoints (e.g., {"product_category": "footwear"})
                 or filter string for storage-optimized endpoints
        index_name: Index to query (defaults to configured index)
        endpoint_name: Vector Search endpoint name
    
    Returns:
        Query results
    """
    # Resolve index name from parameter, global, or widget
    if index_name is None:
        try:
            index_name = FULL_INDEX_NAME
        except NameError:
            try:
                cat = dbutils.widgets.get("catalog")
                sch = dbutils.widgets.get("schema")
                idx = dbutils.widgets.get("index_name")
                index_name = f"{cat}.{sch}.{idx}"
            except Exception:
                print("✗ Index name not specified.")
                print("  Run configuration cells first or pass index_name parameter.")
                return None
    
    # Resolve endpoint name
    if endpoint_name is None:
        try:
            endpoint_name = ENDPOINT_NAME
        except NameError:
            try:
                endpoint_name = dbutils.widgets.get("endpoint_name")
            except Exception:
                print("✗ Endpoint name not specified.")
                print("  Run configuration cells first or pass endpoint_name parameter.")
                return None
    
    # Resolve column names
    try:
        pk, emb_col = PRIMARY_KEY, EMBEDDING_COLUMN
        filter_cols = FILTER_COLUMNS[:3]
    except NameError:
        try:
            pk = dbutils.widgets.get("primary_key")
            emb_col = dbutils.widgets.get("embedding_column")
            filter_cols = [c.strip() for c in dbutils.widgets.get("filter_columns").split(",") if c.strip()][:3]
        except Exception:
            pk, emb_col = "review_id", "review_text"
            filter_cols = ["product_category", "customer_segment", "rating"]
    
    display_cols = [pk, emb_col] + filter_cols
    
    try:
        index = vsc.get_index(endpoint_name=endpoint_name, index_name=index_name)
        
        query_params = {
            "query_text": query_text,
            "columns": display_cols,
            "num_results": num_results
        }
        if filters:
            query_params["filters"] = filters
        
        results = index.similarity_search(**query_params)
        
        print("=" * 70)
        print(f"QUERY: \"{query_text}\"")
        print(f"INDEX: {index_name}")
        if filters:
            print(f"FILTER: {filters}")
        print("=" * 70)
        
        data = results.get('result', {}).get('data_array', [])
        print(f"Found {len(data)} results\n")
        
        for i, row in enumerate(data, 1):
            print(f"Result {i}:")
            for j, col_name in enumerate(display_cols):
                if j < len(row):
                    value = row[j]
                    if col_name == emb_col and len(str(value)) > 100:
                        value = str(value)[:100] + "..."
                    print(f"  {col_name}: {value}")
            print()
        
        return results
        
    except Exception as e:
        print(f"✗ Query failed: {e}")
        print(f"  Index: {index_name}")
        print(f"  Endpoint: {endpoint_name}")
        print("  Troubleshooting:")
        print("    1. Run check_index_status() to verify index is ONLINE")
        print("    2. Ensure VectorSearchClient is initialized (Step 2)")
        return None


print("✓ Query function defined:")
print("  - test_query(query_text, num_results=5, filters=None)")


In [None]:
# =============================================================================
# SAMPLE TEST QUERIES (for customer reviews)
# =============================================================================
# Run when the index is ONLINE: the basic search below runs as-is;
# uncomment the other examples as needed
# (test_query auto-loads config from widgets if needed)
#
# NOTE: Standard endpoints use DICT filters: {"column": "value"}
#       Storage-optimized endpoints use STRING filters: "column = 'value'"

# Basic similarity search
test_query("sizing runs small need to order larger size")

# Search with filter (dict format for standard endpoints)
# test_query("comfortable shoes great for walking", filters={"product_category": "footwear"})

# Search for quality issues
# test_query("poor quality fabric feels cheap", num_results=3)

# Search by customer segment
# test_query("excellent service highly recommend", filters={"customer_segment": "vip"})

# Search with multiple filter values (matches ANY of the values)
# test_query("great quality material", filters={"product_category": ["footwear", "outerwear"]})


## Step 9: SQL Query Examples

You can also query the index using the SQL `VECTOR_SEARCH` function.

**Note**: SQL uses string-style filters (different from Python SDK dict filters).
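
Standard endpoints in the Python SDK take dict filters, while SQL and storage-optimized endpoints take filter strings, so a small translator can help. The sketch below is hypothetical. It renders equality as `column = 'value'` (the form shown in this notebook) and a list of values as `column IN (...)`. The `IN` form is an assumption about the accepted string syntax, so verify it against the filter documentation for your endpoint type. Values containing single quotes are rejected rather than escaped, because the escaping rules are not confirmed here.

```python
def to_filter_string(filters: dict) -> str:
    """Hypothetical: render a dict-style filter as an AND-joined filter string.

    Strings are single-quoted, numbers are left bare, lists become IN (...).
    Booleans and other types raise TypeError; strings containing a quote raise ValueError.
    """
    def literal(value):
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError(f"unsupported filter value: {value!r}")
        if isinstance(value, str):
            if "'" in value:
                raise ValueError(f"quote in filter value not supported here: {value!r}")
            return f"'{value}'"
        return str(value)

    clauses = []
    for column, value in filters.items():
        if isinstance(value, (list, tuple)):
            clauses.append(f"{column} IN ({', '.join(literal(v) for v in value)})")
        else:
            clauses.append(f"{column} = {literal(value)}")
    return " AND ".join(clauses)


print(to_filter_string({"product_category": ["footwear", "outerwear"], "rating": 5}))
# product_category IN ('footwear', 'outerwear') AND rating = 5
```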


In [None]:
# =============================================================================
# SQL VECTOR SEARCH EXAMPLES
# =============================================================================
# These queries can be run once the index is ONLINE

sql_examples = f"""
-- Basic similarity search
SELECT * FROM VECTOR_SEARCH(
  index => '{FULL_INDEX_NAME}',
  query => 'sizing runs small need to order larger',
  num_results => 10
)

-- Search with filter
SELECT * FROM VECTOR_SEARCH(
  index => '{FULL_INDEX_NAME}',
  query => 'comfortable shoes excellent quality',
  num_results => 10,
  filters => 'product_category = "footwear"'
)

-- Join with source table for additional columns
SELECT vs.*, src.rating, src.review_date
FROM VECTOR_SEARCH(
  index => '{FULL_INDEX_NAME}',
  query => 'shipping was fast arrived early',
  num_results => 5
) AS vs
LEFT JOIN {FULL_SOURCE_TABLE} AS src
  ON vs.{PRIMARY_KEY} = src.{PRIMARY_KEY}
"""

print("SQL VECTOR_SEARCH Examples:")
print("=" * 70)
print(sql_examples)


In [None]:
# Run SQL vector search (uncomment when index is ONLINE)
# display(spark.sql(f"""
#     SELECT * FROM VECTOR_SEARCH(
#       index => '{FULL_INDEX_NAME}',
#       query => 'sizing runs small need to order larger',
#       num_results => 5
#     )
# """))


## Step 10: Summary & Next Steps


In [None]:
# =============================================================================
# SUMMARY
# =============================================================================
def print_box(title, rows, width=69):
    """Print a titled box; rows longer than the box width will overflow the border."""
    print("┌" + "─" * width + "┐")
    print("│" + f"  {title}".ljust(width) + "│")
    print("├" + "─" * width + "┤")
    for row in rows:
        print("│" + f"  {row}".ljust(width) + "│")
    print("└" + "─" * width + "┘")
    print()


print("=" * 70)
print("VECTOR SEARCH INDEX SETUP SUMMARY")
print("=" * 70)
print()

print_box("INDEX CONFIGURATION", [
    f"Index Name:      {FULL_INDEX_NAME}",
    f"Endpoint:        {ENDPOINT_NAME}",
    f"Source Table:    {FULL_SOURCE_TABLE}",
    f"Embedding Model: {EMBEDDING_MODEL}",
    f"Sync Mode:       {SYNC_MODE}",
])

print_box("AVAILABLE FUNCTIONS", [
    "check_index_status()     - View current index state",
    "wait_for_index_online()  - Wait for ONLINE state",
    "sync_index()             - Trigger manual sync (TRIGGERED mode)",
    "test_query(text)         - Run similarity search",
])

print_box("NEXT STEPS", [
    "1. Wait for index to reach ONLINE state",
    "2. For TRIGGERED mode, run sync_index() after data updates",
    "3. Test queries using test_query() or SQL VECTOR_SEARCH",
    "4. Integrate with Knowledge Assistant Agent or application",
])

print_box("MAINTENANCE", [
    "• TRIGGERED mode: Call sync_index() after source table updates",
    "• CONTINUOUS mode: Syncs automatically (uses more resources)",
    "• Monitor status with check_index_status()",
    "• Delete and recreate if schema changes significantly",
])

print("=" * 70)
print("✓ Notebook complete!")
print("=" * 70)
