# Search Index Management - Exploratory Notebook

This notebook demonstrates how to use the search index writer and reader clients to manage LanceDB search indices for financial data analysis.

## Overview

- **Writer Service**: Creates and manages tables from Vena model hierarchy data
- **Reader Service**: Provides hybrid search capabilities over the indexed data
- **Multi-tenant**: Supports multiple tenants with isolated data

## Configuration Variables

Configure these variables for your exploration:

In [1]:
import os
from dotenv import load_dotenv

load_dotenv(override=True)

# Configuration variables - modify these as needed
TENANT_ID = 1466591000091951104  # Tenant identifier
MODEL_IDS = [1466593582621392896, 1466596072087486464]
AUTH = (os.getenv("VENA_USER"), os.getenv("VENA_KEY"))

# Service URLs
WRITER_URL = "http://localhost:8001"
READER_URL = "http://localhost:8002"

# Search parameters
SEARCH_QUERY = "revenue"  # Example search query
SEARCH_LIMIT = 5          # Number of results to return

print("Configuration:")
print(f"  Tenant ID: {TENANT_ID}")
print(f"  Model IDs: {MODEL_IDS}")
print(f"  Writer Service: {WRITER_URL}")
print(f"  Reader Service: {READER_URL}")

Configuration:
  Tenant ID: 1466591000091951104
  Model IDs: [1466593582621392896, 1466596072087486464]
  Writer Service: http://localhost:8001
  Reader Service: http://localhost:8002
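
Because `os.getenv` returns `None` for unset variables, `AUTH` can silently become `(None, None)` when the `.env` file is missing or incomplete. A minimal sketch of a fail-fast check, assuming the credentials live in `VENA_USER` and `VENA_KEY` as in the cell above; the helper name `require_env` is hypothetical and not part of the project:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Possible usage in the configuration cell (hypothetical helper, not a project API):
# AUTH = (require_env("VENA_USER"), require_env("VENA_KEY"))
```

Failing at configuration time tends to give a clearer error than an authentication failure surfacing later inside a client call.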


## Import Dependencies

In [2]:
import sys
from typing import List, Dict, Any

# Add project root to Python path
sys.path.append('.')

# Import the clients
from clients.writer_client import SearchIndexWriterClient
from clients.reader_client import SearchIndexReaderClient

print("✓ Dependencies imported successfully")

✓ Dependencies imported successfully


## Health Checks

First, let's verify that both services are running:

In [3]:
def check_services():
    """Check if both services are healthy."""
    print("Checking service health...")
    
    try:
        # Check writer service
        with SearchIndexWriterClient(base_url=WRITER_URL, auth=AUTH) as writer:
            writer_health = writer.health_check()
            print(f"✓ Writer Service: {writer_health}")
    except Exception as e:
        print(f"✗ Writer Service: {e}")
        return False
    
    try:
        # Check reader service
        with SearchIndexReaderClient(base_url=READER_URL, auth=AUTH) as reader:
            reader_health = reader.health_check()
            print(f"✓ Reader Service: {reader_health}")
    except Exception as e:
        print(f"✗ Reader Service: {e}")
        return False
    
    return True

services_healthy = check_services()

Checking service health...
✓ Writer Service: {'status': 'healthy', 'timestamp': '2025-07-14T09:01:13.037685', 'version': '1.0.0', 'dependencies': {'azure_openai': 'healthy', 'vena_api': 'healthy', 'lancedb': 'healthy'}}
✓ Reader Service: {'status': 'healthy', 'timestamp': '2025-07-14T09:01:13.048842', 'version': '1.0.0', 'dependencies': {'azure_openai': 'healthy', 'lancedb': 'healthy'}}
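
The health payloads above report a per-dependency status in addition to the overall one. A small sketch for surfacing any dependency that is not `healthy`, assuming `health_check()` returns a plain dict shaped like the output above; the helper name `unhealthy_dependencies` is illustrative only:

```python
from typing import Any, Dict, List


def unhealthy_dependencies(health: Dict[str, Any]) -> List[str]:
    """Return the names of dependencies whose status is not 'healthy'."""
    dependencies = health.get("dependencies", {})
    return sorted(name for name, status in dependencies.items() if status != "healthy")


# Illustrative payload: same shape as the reader output above,
# with one status altered by hand to show a failing dependency.
sample = {
    "status": "healthy",
    "dependencies": {"azure_openai": "unhealthy", "lancedb": "healthy"},
}
print(unhealthy_dependencies(sample))  # ['azure_openai']
```

A check like this could be folded into `check_services()` so that a degraded dependency is reported even when the top-level status looks fine.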


## Writer Service Operations

### List Existing Tables

In [4]:
def list_tables():
    """List all tables for the configured tenant."""
    with SearchIndexWriterClient(base_url=WRITER_URL, auth=AUTH) as writer:
        tables = writer.list_tables(tenant_id=TENANT_ID)
        
        if tables:
            print(f"Found {len(tables)} tables for tenant '{TENANT_ID}':")
            for table in tables:
                print(table)
        else:
            print(f"No tables found for tenant '{TENANT_ID}'")
        
        return tables

if services_healthy:
    existing_tables = list_tables()
else:
    print("Skipping table listing - services not healthy")

No tables found for tenant '1466591000091951104'


### Create a New Table

This will fetch data from the Vena API and create a new search index table:

In [5]:
def create_table(model_id: int):
    """Create a new table from Vena model hierarchy data."""
    print(f"Creating table for tenant '{TENANT_ID}' with model ID {model_id}...")
    
    with SearchIndexWriterClient(base_url=WRITER_URL, auth=AUTH) as writer:
        try:
            response = writer.create_table(
                tenant_id=TENANT_ID,
                model_id=model_id,
                force_recreate=True  # Recreate the table even if it already exists
            )
            
            print("✓ Table created successfully:")
            print(f"  Table name: {response.table_name}")

            return response
            
        except Exception as e:
            print(f"✗ Error creating table: {e}")
            return None

# Create one table per configured model (comment out this block to skip creation)
if services_healthy:
    for model_id in MODEL_IDS:
        table_response = create_table(model_id)
else:
    print("Skipping table creation - services not healthy")

Creating table for tenant '1466591000091951104' with model ID 1466593582621392896...
✓ Table created successfully:
  Table name: hierarchies_1466593582621392896
Creating table for tenant '1466591000091951104' with model ID 1466596072087486464...
✓ Table created successfully:
  Table name: hierarchies_1466596072087486464
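
Both table names in this output follow the pattern `hierarchies_<model_id>`, and the deletion cell later relies on the same pattern. The sketch below captures that observed convention in one place. `table_name_for` and `models_missing_tables` are hypothetical helpers, the sketch assumes `list_tables` yields plain table-name strings, and the naming pattern is inferred from this output rather than from a documented contract:

```python
from typing import Iterable, List


def table_name_for(model_id: int) -> str:
    """Build the table name seen in this notebook's output (an inferred convention)."""
    return f"hierarchies_{model_id}"


def models_missing_tables(model_ids: Iterable[int], existing_tables: Iterable[str]) -> List[int]:
    """Return the model IDs that do not yet have a matching table."""
    existing = set(existing_tables)
    return [model_id for model_id in model_ids if table_name_for(model_id) not in existing]


# Example with one of the two tables already present
print(models_missing_tables(
    [1466593582621392896, 1466596072087486464],
    ["hierarchies_1466593582621392896"],
))  # [1466596072087486464]
```

With a helper like this, table creation could be limited to models that lack a table instead of always passing `force_recreate=True`.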


## Reader Service Operations

### Hybrid Search

Run a hybrid search (combining semantic and keyword matching) over the indexed data:

In [6]:
def perform_hybrid_search(model_id: int, query: str, limit: int = 5, dimension_filter: str = None):
    """Perform hybrid search on the model."""
    print(f"Performing hybrid search for model {model_id} with query: '{query}'")
    
    with SearchIndexReaderClient(base_url=READER_URL, auth=AUTH) as reader:
        try:
            results = reader.hybrid_search(
                tenant_id=TENANT_ID,
                model_id=model_id,
                query=query,
                limit=limit,
                dimension_filter=dimension_filter
            )
            
            print(f"✓ Found {len(results.results)} results")
            for result in results.results:
                print(result)
                
            return results
            
        except Exception as e:
            print(f"✗ Error performing search: {e}")
            return None

# Example searches
if services_healthy:
    # Basic search
    for model_id in MODEL_IDS:
        search_results = perform_hybrid_search(model_id, SEARCH_QUERY, limit=SEARCH_LIMIT)
    
    # Search with dimension filter (uncomment to try)
    # filtered_results = perform_hybrid_search(MODEL_IDS[0], "cost", limit=3, dimension_filter="Account")
else:
    print("Skipping search - services not healthy")

Performing hybrid search for model 1466593582621392896 with query: 'revenue'
✓ Found 5 results
member_id='1514408816967024640_651631428002119680' member_name='Revenue' member_alias='Revenue' dimension='Account' score=0.03333333507180214 search_text='Revenue'
member_id='1514408816967024640_814709306394148865' member_name='CF - Change in Deferred Revenue' member_alias='CF - Change in Deferred Revenue' dimension='Account' score=0.032522473484277725 search_text='CF - Change in Deferred Revenue'
member_id='1514408816967024640_651631427951788032' member_name='Net Income' member_alias='Net Income' dimension='Account' score=0.016393441706895828 search_text='Net Income'
member_id='1514408816967024640_691121205046280192' member_name='Other Income/Expense' member_alias='Other Income/Expense' dimension='Account' score=0.01587301678955555 search_text='Other Income/Expense'
member_id='1514408816967024640_691119840308494336' member_name='EBIT' member_alias='EBIT' dimension='Account' score=0.015625 se
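
The scores in this output match what reciprocal rank fusion (RRF) with a constant of 60 and zero-based ranks would produce: 2/60 ≈ 0.0333 for a top hit in both rankings, 1/61 ≈ 0.01639 for a single second-place hit, and so on. The notebook does not state which fusion method the reader service uses, so treat this as an inference. The sketch below implements plain RRF; the two input rankings are hypothetical reconstructions chosen to be consistent with the scores shown, not actual service output:

```python
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each item scores the sum of 1 / (k + rank), with rank starting at 0."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda pair: pair[1], reverse=True))


# Hypothetical rankings (assumed for illustration, not returned by the service)
keyword_ranking: List[str] = ["Revenue", "CF - Change in Deferred Revenue"]
vector_ranking: List[str] = [
    "Revenue",
    "Net Income",
    "CF - Change in Deferred Revenue",
    "Other Income/Expense",
    "EBIT",
]

for name, score in rrf_fuse([keyword_ranking, vector_ranking]).items():
    print(f"{score:.6f}  {name}")
```

Under these assumed rankings the fused scores agree with the output above to about six decimal places. One practical consequence, if the service does use RRF, is that the scores reflect rank positions only and should not be read as similarity values.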

## Table Management

### Delete Tables (Use with caution!)

In [None]:
def delete_table(table_name: str, confirm: bool = False):
    """Delete a table. Use with caution!"""
    if not confirm:
        print("⚠️  Table deletion requires confirmation. Set confirm=True to proceed.")
        return False
    
    with SearchIndexWriterClient(base_url=WRITER_URL, auth=AUTH) as writer:
        try:
            success = writer.delete_table(tenant_id=TENANT_ID, table_name=table_name)
            
            if success:
                print(f"✓ Table '{table_name}' deleted successfully")
            else:
                print(f"✗ Failed to delete table '{table_name}'")
                
            return success
            
        except Exception as e:
            print(f"✗ Error deleting table: {e}")
            return False

# Example: deletes the tables created earlier in this notebook.
# WARNING: This will permanently delete the tables! Run this cell only for cleanup.
if services_healthy:
    for model_id in MODEL_IDS:
        delete_table(f"hierarchies_{model_id}", confirm=True)
else:
    print("Skipping table deletion - services not healthy")