# Genie Space Enhancement V2 - Debug Notebook

## Three-Space Architecture with Sequential Fixes

This notebook tests the V2 enhancement workflow:

1. **Three-Space Architecture**
   - Production: Original space (never modified)
   - Dev-Best: Best-performing configuration
   - Dev-Working: Where changes are tested

2. **Four Fix Types Only**
   - Metric Views (create/delete)
   - Metadata (descriptions, synonyms)
   - Sample Queries (parameterized templates)
   - Instructions (text instructions)

3. **Sequential Evaluation**
   - Apply one fix → Evaluate → Update best or rollback
   - ~40 minutes per full loop
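
The apply, evaluate, then update-or-rollback cycle can be sketched on a toy configuration. This is a minimal illustration, not the notebook's `SequentialEnhancer`: `sequential_apply`, `toy_score`, and the two fix functions are hypothetical names, and the integer score stands in for the number of benchmarks passed.

```python
from copy import deepcopy

def sequential_apply(config, fixes, evaluate):
    """Greedy accept-or-rollback loop (illustrative only)."""
    best_config = deepcopy(config)          # plays the role of dev-best
    best_score = evaluate(best_config)
    accepted, rejected = [], []
    for fix in fixes:
        working = deepcopy(best_config)     # dev-working starts from dev-best
        fix(working)                        # apply exactly one fix
        score = evaluate(working)
        if score > best_score:              # improved: update dev-best
            best_config, best_score = working, score
            accepted.append(fix.__name__)
        else:                               # not improved: discard the working copy
            rejected.append(fix.__name__)
    return best_config, best_score, accepted, rejected

# Hypothetical fixes and scorer, for demonstration only
def add_synonym(cfg):
    cfg["synonyms"].append("revenue")

def add_noisy_instruction(cfg):
    cfg["instructions"].append("irrelevant hint")

def toy_score(cfg):
    # Pretend each synonym fixes two benchmarks and each noisy instruction breaks one
    return 5 + 2 * len(cfg["synonyms"]) - len(cfg["instructions"])

production = {"synonyms": [], "instructions": []}
best, score, accepted, rejected = sequential_apply(
    production, [add_synonym, add_noisy_instruction], toy_score
)
print(score, accepted, rejected)
```

Because every candidate starts from a deep copy of the best configuration, the `production` dictionary is never mutated, which mirrors the rule that the production space is never modified.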

## Usage
1. Run cells in order
2. Check intermediate results
3. At the end, decide whether to promote to production

## 1️⃣ Environment Setup

In [None]:
# Project path setup
import sys
import os
from pathlib import Path

print(f"Current working directory: {os.getcwd()}")

# Find project root
current_path = Path(os.getcwd())
if current_path.name == 'genie_enhancer':
    project_root = current_path
else:
    search_path = current_path
    while search_path.name != 'genie_enhancer' and search_path != search_path.parent:
        search_path = search_path.parent
    project_root = search_path if search_path.name == 'genie_enhancer' else current_path

if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"✅ Project root: {project_root}")

In [None]:
# Import modules (updated for v3 lib/ structure)
import json
import yaml
import pandas as pd
from datetime import datetime

from lib.genie_client import GenieConversationalClient
from lib.space_api import SpaceUpdater
from lib.scorer import BenchmarkScorer
from lib.benchmark_parser import BenchmarkLoader
from lib.llm import DatabricksLLMClient
from lib.sql import SQLExecutor

# V3 Components (Batch Apply)
from lib.space_cloner import SpaceCloner
from lib.enhancer import EnhancementPlanner
from lib.applier import BatchApplier

print("✅ All modules imported successfully")

In [None]:
# Load configuration from config/app.yaml
config_path = 'config/app.yaml'
if os.path.exists(config_path):
    with open(config_path, 'r') as f:
        app_config = yaml.safe_load(f)

    for env_var in app_config.get('env', []):
        name = env_var.get('name')
        value = env_var.get('value')
        if name is None or value is None:
            continue  # skip incomplete entries
        os.environ[name] = str(value)  # os.environ only accepts strings
        print(f"{name} = {value}")

    print("\n✅ config/app.yaml loaded")
else:
    print(f"⚠️ {config_path} not found - using manual configuration below")

In [None]:
# Connection settings (update with your values)
# === GENIE SPACE CONNECTION ===
DATABRICKS_HOST = "your-workspace.cloud.databricks.com"  # Genie Space host
DATABRICKS_TOKEN = "YOUR_TOKEN_HERE"  # Genie token - TODO: Update
GENIE_SPACE_ID = "your-space-id"  # Production space ID
WAREHOUSE_ID = "your-warehouse-id"  # Required for metric views

# === LLM CONNECTION (can be different!) ===
LLM_HOST = "your-workspace.cloud.databricks.com"  # LLM host - TODO: Change if different
LLM_TOKEN = "YOUR_LLM_TOKEN_HERE"  # LLM token - TODO: Set your LLM token (different from Genie!)
LLM_ENDPOINT = "databricks-claude-sonnet-4"  # Model endpoint

os.environ['DATABRICKS_HOST'] = DATABRICKS_HOST
os.environ['DATABRICKS_TOKEN'] = DATABRICKS_TOKEN
os.environ['GENIE_SPACE_ID'] = GENIE_SPACE_ID

print(f"=== Genie Connection ===")
print(f"  Host: {DATABRICKS_HOST}")
print(f"  Token: {DATABRICKS_TOKEN[:10]}...{DATABRICKS_TOKEN[-4:] if len(DATABRICKS_TOKEN) > 14 else '(set your token!)'}")
print(f"  Space ID: {GENIE_SPACE_ID}")
print(f"\n=== LLM Connection ===")
print(f"  Host: {LLM_HOST}")
print(f"  Token: {LLM_TOKEN[:10]}...{LLM_TOKEN[-4:] if len(LLM_TOKEN) > 14 else '(set your token!)'}")
print(f"  Endpoint: {LLM_ENDPOINT}")
print(f"\n✅ Connection settings configured")

## 2️⃣ Initialize Clients

In [None]:
# Initialize Space Cloner (for three-space architecture)
print("Initializing Space Cloner...")
space_cloner = SpaceCloner(
    host=DATABRICKS_HOST,
    token=DATABRICKS_TOKEN
)
print("✅ Space Cloner initialized")

In [None]:
# Initialize LLM Client (uses LLM_HOST, can be different from Genie host)
print(f"Initializing LLM Client...")
print(f"  Host: {LLM_HOST}")
print(f"  Endpoint: {LLM_ENDPOINT}")

llm_client = DatabricksLLMClient(
    host=LLM_HOST,      # Uses LLM host (can be different from DATABRICKS_HOST)
    token=LLM_TOKEN,    # Uses LLM token
    endpoint_name=LLM_ENDPOINT
)

if llm_client.test_connection():
    print(f"✅ LLM Client initialized and connected")
else:
    print("❌ LLM connection failed")

## 3️⃣ Setup Three-Space Architecture

This creates:
- `{SpaceName}_dev_working` - Where changes are tested
- `{SpaceName}_dev_best` - Best-performing configuration

Production space is **never modified**.

In [None]:
# Setup three-space architecture
print("Setting up Three-Space Architecture...")
print("(This will clone the production space)\n")

setup_result = space_cloner.setup_three_spaces(
    production_space_id=GENIE_SPACE_ID,
    working_suffix="_dev_working",
    best_suffix="_dev_best"
)

if setup_result["success"]:
    print(f"\n✅ Three-Space Architecture Ready!")
    print(f"\nSpace IDs:")
    print(f"  Production:   {setup_result['production_id']}")
    print(f"  Dev-Working:  {setup_result['dev_working_id']}")
    print(f"  Dev-Best:     {setup_result['dev_best_id']}")
    print(f"\nProduction Name: {setup_result['production_name']}")
    
    # Store for later use
    PRODUCTION_ID = setup_result['production_id']
    DEV_WORKING_ID = setup_result['dev_working_id']
    DEV_BEST_ID = setup_result['dev_best_id']
    initial_config = setup_result['initial_config']
else:
    print(f"❌ Setup failed: {setup_result['error']}")

In [None]:
# Initialize Genie Client for DEV-WORKING space
print(f"Initializing Genie Client for dev-working space...")
genie_client = GenieConversationalClient(
    host=DATABRICKS_HOST,
    token=DATABRICKS_TOKEN,
    space_id=DEV_WORKING_ID,  # Use dev-working for testing
    verbose=True
)
print(f"✅ Genie Client initialized (space: {DEV_WORKING_ID})")

In [None]:
# Quick API test on dev-working
print("Testing dev-working space...")
test_response = genie_client.ask("What tables are available?", timeout=60)
print(f"Status: {test_response['status']}")
if test_response['status'] == 'COMPLETED':
    print("✅ Dev-working space is functional")
else:
    print(f"⚠️ Response: {test_response}")

## 4️⃣ Load Benchmarks & Initial Scoring

In [None]:
# Load benchmarks
benchmark_file = "benchmarks/benchmarks.json"
print(f"Loading benchmarks from: {benchmark_file}")

loader = BenchmarkLoader(benchmark_file)
all_benchmarks = loader.load()
print(f"\n✅ Loaded {len(all_benchmarks)} total benchmarks")

# Filter for testing (optional)
TEST_MODE = True
if TEST_MODE:
    # Use subset for faster testing
    benchmarks = [x for x in all_benchmarks if x['source_file']=='social_analytics_benchmark.md']
    print(f"⚠️ TEST MODE: Using {len(benchmarks)} benchmarks")
else:
    benchmarks = all_benchmarks
    print(f"FULL MODE: Using {len(benchmarks)} benchmarks")

In [None]:
# Initialize Benchmark Scorer
print("Initializing Benchmark Scorer...")
scorer = BenchmarkScorer(
    genie_client=genie_client,
    llm_client=llm_client,
    config={
        "parallel_workers": 2,      # Run 2 questions at once
        "question_timeout": 180     # 3 min timeout per question
    }
)
print("✅ Scorer initialized (parallel_workers=2)")

In [None]:
# Run initial scoring on dev-working
print("Scoring initial state on dev-working space...")
print("(This may take a few minutes)\n")

initial_results = scorer.score(benchmarks)

print("\n" + "=" * 80)
print("INITIAL BENCHMARK RESULTS")
print("=" * 80)
print(f"Score: {initial_results['score']:.1%}")
print(f"Passed: {initial_results['passed']}/{initial_results['total']}")
print(f"Failed: {initial_results['failed']}/{initial_results['total']}")
print(f"Duration: {initial_results['duration_seconds']:.1f}s")
print("=" * 80)

In [None]:
# Show failed benchmarks
failed_results = [r for r in initial_results['results'] if not r['passed']]

if failed_results:
    print(f"\n❌ Failed Benchmarks ({len(failed_results)}):\n")
    for i, result in enumerate(failed_results, 1):
        print(f"{i}. {result['question'][:70]}...")
        print(f"   Category: {result.get('failure_category', 'unknown')}")
        print()

## 5️⃣ Analyze Failures & Generate Fixes

Using the simplified prompt with **only 4 fix types**:
1. Metric Views
2. Metadata (descriptions, synonyms)
3. Sample Queries
4. Instructions
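
As an illustration of how generated fixes might be bucketed into these four types, the sketch below maps fix `type` strings to categories. The `add_synonym`, `add_column_description`, `add_example_query`, and `create_metric_view` types and the `metadata` category key appear later in this notebook; `add_instruction`, the remaining category keys, and the `group_fixes` helper are assumptions, and the real grouping logic may differ.

```python
from collections import defaultdict

# Fix type -> category. Entries marked "assumed" are not confirmed by the notebook.
FIX_CATEGORY = {
    "create_metric_view": "metric_views",      # category key assumed
    "add_synonym": "metadata",
    "add_column_description": "metadata",
    "add_example_query": "sample_queries",     # category key assumed
    "add_instruction": "instructions",         # fix type and key assumed
}

def group_fixes(fixes):
    """Group fix dicts by category, dropping any type outside the four supported ones."""
    grouped = defaultdict(list)
    for fix in fixes:
        category = FIX_CATEGORY.get(fix.get("type"))
        if category is None:
            continue  # unsupported fix type: ignore it
        grouped[category].append(fix)
    return dict(grouped)

# Made-up example fixes, for demonstration only
example_fixes = [
    {"type": "add_synonym", "table": "orders", "column": "amt", "synonym": "amount"},
    {"type": "add_column_description", "table": "orders", "column": "amt"},
    {"type": "add_example_query", "pattern_name": "top_n"},
    {"type": "rename_table"},  # not one of the four types, so it is dropped
]
grouped_example = group_fixes(example_fixes)
for category, items in grouped_example.items():
    print(f"{category}: {len(items)} fixes")
```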

In [None]:
# Initialize Sequential Enhancer with SQL Executor for metric views
print("Initializing SQL Executor...")
sql_executor = SQLExecutor(
    host=DATABRICKS_HOST,
    token=DATABRICKS_TOKEN,
    warehouse_id=WAREHOUSE_ID
)
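print(f"✅ SQL Executor initialized (warehouse: {WAREHOUSE_ID})")

# NOTE: SequentialEnhancer is not among the imports at the top of this notebook;
# import it from the module that defines it in your checkout before running this cell.
print("Initializing Sequential Enhancer...")
sequential_enhancer = SequentialEnhancer(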
    llm_client=llm_client,
    space_cloner=space_cloner,
    scorer=scorer,
    sql_executor=sql_executor,  # For creating metric views in Unity Catalog
    config={
        "catalog": "sandbox",
        "schema": "agent_poc",
        "metric_view_prefix": "mv_"
    }
)

print("✅ Sequential Enhancer initialized (with metric view support)")

In [None]:
# Analyze all failures and group fixes by type
print("Analyzing failures and generating fixes...")
print("(LLM will analyze each failure)\n")

grouped_fixes = sequential_enhancer.analyze_all_failures(
    benchmark_results=initial_results,
    space_config=initial_config
)

print("\n" + "=" * 80)
print("FIXES GENERATED (Grouped by Type)")
print("=" * 80)

total_fixes = 0
for category, fixes in grouped_fixes.items():
    count = len(fixes)
    total_fixes += count
    print(f"\n{category.upper()}: {count} fixes")
    for i, fix in enumerate(fixes, 1):
        fix_type = fix.get('type', 'unknown')
        print(f"  {i}. {fix_type}")
        if fix_type == 'add_synonym':
            print(f"     {fix['table']}.{fix['column']} → '{fix['synonym']}'")
        elif fix_type == 'add_column_description':
            print(f"     {fix['table']}.{fix['column']}")
        elif fix_type == 'add_example_query':
            print(f"     Pattern: {fix.get('pattern_name', 'N/A')}")
        elif fix_type == 'create_metric_view':
            print(f"     {fix['catalog']}.{fix['schema']}.{fix['metric_view_name']}")

print(f"\n{'=' * 80}")
print(f"TOTAL FIXES: {total_fixes}")
print(f"{'=' * 80}")

## 6️⃣ Sequential Fix Application

Apply fixes **one at a time**:
1. Apply fix to dev-working
2. Wait for indexing
3. Evaluate benchmarks
4. If improved → Update dev-best
5. If not improved → Rollback from dev-best

⚠️ **This will take ~40 minutes for a full loop**
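
For planning, the configuration cell in this section estimates runtime as the number of fixes times the indexing wait plus a 30-second evaluation allowance. The helper below reproduces that arithmetic; `estimate_minutes` is a hypothetical name, and actual scoring time depends on how many benchmarks are evaluated per fix.

```python
def estimate_minutes(total_fixes, indexing_wait_s=60, eval_allowance_s=30):
    """Rough loop duration in minutes: fixes x (indexing wait + evaluation allowance)."""
    return total_fixes * (indexing_wait_s + eval_allowance_s) / 60

# Print estimates for a few fix counts using the notebook's default 60 s wait
for n in (5, 10, 20):
    print(f"{n} fixes: ~{estimate_minutes(n):.0f} minutes")
```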

In [None]:
# Configuration for sequential loop
INDEXING_WAIT_TIME = 60  # seconds to wait after each change
DRY_RUN = True  # Set to False to actually apply changes

if DRY_RUN:
    print("⚠️ DRY RUN MODE - Changes will NOT be applied")
    print("Set DRY_RUN = False to run the sequential loop")
else:
    print("❗ LIVE MODE - Changes WILL be applied to dev-working")
    print(f"Indexing wait time: {INDEXING_WAIT_TIME}s")
    print(f"Estimated time: ~{total_fixes * (INDEXING_WAIT_TIME + 30) / 60:.0f} minutes")

In [None]:
# Run sequential enhancement loop
if not DRY_RUN and total_fixes > 0:
    print("Starting Sequential Enhancement Loop...")
    print(f"This will apply {total_fixes} fixes one at a time.\n")
    
    loop_result = sequential_enhancer.run_sequential_loop(
        benchmarks=benchmarks,
        grouped_fixes=grouped_fixes,
        indexing_wait_time=INDEXING_WAIT_TIME
    )
    
    print("\n" + "=" * 80)
    print("SEQUENTIAL LOOP RESULTS")
    print("=" * 80)
    print(f"Initial Score: {loop_result['initial_score']:.1%}")
    print(f"Final Score: {loop_result['final_score']:.1%}")
    print(f"Improvement: {loop_result['final_score'] - loop_result['initial_score']:+.1%}")
    print(f"Fixes Applied: {len(loop_result['fixes_applied'])}")
    print(f"Fixes Rejected: {len(loop_result['fixes_rejected'])}")
    print("=" * 80)
    
else:
    print("⚠️ Sequential loop skipped (DRY_RUN=True or no fixes)")
    loop_result = None

In [None]:
# Show detailed history (if loop was run)
if loop_result and loop_result.get('history'):
    print("\nFix Application History:")
    print("-" * 80)
    
    history_df = pd.DataFrame(loop_result['history'])
    display_cols = ['fix_category', 'fix_type', 'score_before', 'score_after', 'accepted']
    display(history_df[display_cols])

## 7️⃣ Final Results & Decision

Now you can decide:
1. **Promote** dev-best to production
2. **Keep** all spaces for review
3. **Delete** dev spaces (no promotion)

In [None]:
# Summary of three spaces
print("\n" + "=" * 80)
print("THREE-SPACE STATUS")
print("=" * 80)
print(f"\nProduction (unchanged): {PRODUCTION_ID}")
print(f"Dev-Working (test space): {DEV_WORKING_ID}")
print(f"Dev-Best (best config): {DEV_BEST_ID}")

if loop_result:
    print(f"\nBest Score Achieved: {loop_result['final_score']:.1%}")
else:
    print(f"\nInitial Score: {initial_results['score']:.1%}")

In [None]:
# OPTION 1: Promote dev-best to production
PROMOTE_TO_PRODUCTION = False  # Set to True to promote

if PROMOTE_TO_PRODUCTION:
    print("Promoting dev-best configuration to production...")
    result = space_cloner.promote_to_production()
    if result['success']:
        print("✅ Production updated with best configuration!")
    else:
        print(f"❌ Promotion failed: {result['error']}")
else:
    print("⚠️ Promotion skipped (PROMOTE_TO_PRODUCTION=False)")

In [None]:
# OPTION 2: Keep all spaces for review
print("\nTo keep all spaces for manual review, do nothing.")
print(f"\nYou can access:")
print(f"  - Production: https://{DATABRICKS_HOST}/genie/spaces/{PRODUCTION_ID}")
print(f"  - Dev-Working: https://{DATABRICKS_HOST}/genie/spaces/{DEV_WORKING_ID}")
print(f"  - Dev-Best: https://{DATABRICKS_HOST}/genie/spaces/{DEV_BEST_ID}")

In [None]:
# OPTION 3: Delete dev spaces (cleanup)
CLEANUP_DEV_SPACES = False  # Set to True to delete dev spaces

if CLEANUP_DEV_SPACES:
    print("Deleting dev spaces...")
    result = space_cloner.cleanup_dev_spaces()
    if result['success']:
        print(f"✅ Deleted: {result['deleted']}")
    else:
        print(f"⚠️ Cleanup issues: {result['error']}")
else:
    print("⚠️ Cleanup skipped (CLEANUP_DEV_SPACES=False)")

## 8️⃣ Debug Utilities

In [None]:
# Test a single fix manually
def test_single_fix(fix: dict):
    """Apply a single fix and evaluate."""
    print(f"Testing fix: {fix.get('type')}")
    
    # Get current config
    current_config = space_cloner.get_dev_working_config()
    
    # Apply fix
    new_config = sequential_enhancer._apply_single_fix(current_config, fix)
    
    # Show the fix that was applied (this prints the fix itself, not a config diff)
    print("\nApplied fix:")
    print(json.dumps(fix, indent=2, default=str))
    
    return new_config

# Example: Test first metadata fix
# if grouped_fixes.get('metadata'):
#     test_single_fix(grouped_fixes['metadata'][0])

In [None]:
# Validate current dev-working config
def validate_dev_working():
    """Validate the dev-working space configuration."""
    space_updater = SpaceUpdater(DATABRICKS_HOST, DATABRICKS_TOKEN)
    config = space_cloner.get_dev_working_config()
    validation = space_updater.validate_config(config)
    
    print("\nDev-Working Validation:")
    print(f"  Valid: {'✅' if validation['is_valid'] else '❌'}")
    
    if validation['errors']:
        print(f"  Errors: {len(validation['errors'])}")
        for e in validation['errors'][:5]:
            print(f"    - {e}")
    
    if validation['warnings']:
        print(f"  Warnings: {len(validation['warnings'])}")
    
    return validation

validate_dev_working()

In [None]:
# Save fixes to JSON for review
if grouped_fixes:
    output_file = "debug_fixes_v2.json"
    with open(output_file, 'w') as f:
        json.dump(grouped_fixes, f, indent=2, default=str)
    print(f"✅ Fixes saved to {output_file}")

## 🎯 Next Steps

After testing in this notebook:

1. ✅ Verify three-space architecture works
2. ✅ Check fix generation (4 types only)
3. ✅ Test sequential application with DRY_RUN=False
4. ✅ Decide on promotion to production

Then move to the **Streamlit App** for interactive use:
```bash
streamlit run databricks_apps/interactive_enhancement_app.py
```