# Snowflake AI Cost Toolkit - Local Development Setup

This notebook provides a streamlined testing environment for the AI Cost Toolkit.
It works both locally (using `config.env`) and in Snowflake environments.

**Features:**
- 🔧 Automatic session management (Snowflake or local)
- 📊 Data population and verification
- 💰 Cost analysis testing with smart SQL matching
- 📈 Comprehensive analytics function testing


## 1. Setup Session (Works Locally & in Snowflake)


In [1]:
# Import required libraries
import pandas as pd
import json

# Force reload utils module to get latest changes
import importlib
import utils
importlib.reload(utils)

from utils import (
    fetch_semantic_model_paths,
    get_cortex_analyst_logs,
    write_logs_to_table,
    create_sf_intelligence_query_history,
    refresh_sf_intelligence_query_history,
    get_session,
    create_local_session,
    get_ai_services_total_credits
)

print("📦 Modules loaded successfully")

# Get a session: get_session() uses the Snowflake session when available, else falls back to config.env
session = get_session()
print("✅ Session initialized successfully")


📦 Modules loaded successfully
🔧 No active session found, creating local session for development...
✅ Local Snowpark session created successfully
   Account: zqb38977.us-east-1
   User: kaitlyn
   Role: SNOWFLAKE_INTELLIGENCE_ADMIN_RL
   Warehouse: CORTEX_ANALYST_WH
   Database: CORTEX_ANALYTICS
   Schema: PUBLIC
✅ Session initialized successfully
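
The fallback shown in the output above (no active session, so a local one is created) can be sketched roughly as follows. This is a minimal illustration, not the actual `get_session()` in `utils.py`: the `SNOWFLAKE_`-prefixed key names in `config.env` and the helper names `parse_env_config`, `connection_parameters`, and `get_session_sketch` are assumptions.

```python
def parse_env_config(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip().strip('"').strip("'")
    return params


def connection_parameters(env, prefix="SNOWFLAKE_"):
    """Map e.g. SNOWFLAKE_ACCOUNT -> account, dropping empty values.

    The prefix and key names are assumptions about config.env.
    """
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix) and value
    }


def get_session_sketch(env_text):
    """Prefer the active Snowflake session; otherwise build a local one."""
    try:
        from snowflake.snowpark.context import get_active_session
        return get_active_session()
    except Exception:
        # No active session (or not running inside Snowflake): connect locally.
        from snowflake.snowpark import Session
        params = connection_parameters(parse_env_config(env_text))
        return Session.builder.configs(params).create()
```

Only `get_session_sketch` needs the Snowpark package; the two parsing helpers are plain Python and can be checked without a connection.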


## 2. Populate Cortex Analyst Logs Table


In [None]:
# Check if CORTEX_ANALYST_LOGS table exists before creating
def table_exists(session, table_name, schema_name="PUBLIC", database_name="CORTEX_ANALYTICS"):
    """Check if a table exists in the specified database and schema."""
    try:
        result = session.sql(f"""
            SELECT COUNT(*) as table_count 
            FROM INFORMATION_SCHEMA.TABLES 
            WHERE TABLE_SCHEMA = '{schema_name}' 
            AND TABLE_NAME = '{table_name}' 
            AND TABLE_CATALOG = '{database_name}'
        """).collect()
        return result[0]['TABLE_COUNT'] > 0
    except Exception:
        # Any query failure (e.g., missing privileges) is reported as "table not found"
        return False


# Create CORTEX_ANALYST_LOGS table only if it doesn't exist
if not table_exists(session, "CORTEX_ANALYST_LOGS"):
    print("📋 Creating CORTEX_ANALYST_LOGS table...")
    create_cortex_logs_table = """
    CREATE TABLE CORTEX_ANALYST_LOGS (
        TIMESTAMP                TIMESTAMP_NTZ,
        REQUEST_ID               STRING,
        SEMANTIC_MODEL_NAME      STRING,
        TABLES_REFERENCED        STRING,
        USER_NAME                STRING,
        SOURCE                   STRING,
        FEEDBACK                 STRING,
        RESPONSE_STATUS_CODE     INTEGER,
        USER_QUESTION            STRING,
        LATENCY_MS               NUMBER,
        GENERATED_SQL            STRING,
        ORCHESTRATION_PATH       STRING,
        QUESTION_CATEGORY        STRING,
        VERIFIED_QUERY_NAME      STRING,
        VERIFIED_QUERY_QUESTION  STRING,
        QUERY_TYPE               STRING,
        CORTEX_ANALYST_CREDITS   FLOAT
    )
    """

    session.sql(create_cortex_logs_table).collect()
    print("✅ CORTEX_ANALYST_LOGS table created successfully")
else:
    print("✅ CORTEX_ANALYST_LOGS table already exists, skipping creation")

# Check if query history table exists before creating
if not table_exists(session, "SF_INTELLIGENCE_QUERY_HISTORY"):
    print("📋 Creating SF_INTELLIGENCE_QUERY_HISTORY table...")
    target_table = "CORTEX_ANALYTICS.PUBLIC.SF_INTELLIGENCE_QUERY_HISTORY"

    try:
        create_sf_intelligence_query_history(session, target_table)
        print(f"✅ {target_table} created successfully")
    except Exception as e:
        print(f"⚠️  Warning: Could not create query history table: {e}")
        print("This may be due to insufficient permissions on ACCOUNT_USAGE views")
else:
    print("✅ SF_INTELLIGENCE_QUERY_HISTORY table already exists, skipping creation")
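
`table_exists` above interpolates its arguments directly into the SQL text and reads `INFORMATION_SCHEMA` from whichever database is current. A minimal sketch of a stricter query builder, assuming all three names are unquoted Snowflake identifiers (which are stored in uppercase); `safe_identifier` and `table_exists_query` are hypothetical helpers, not part of `utils.py`:

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore, then letters,
# digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def safe_identifier(name):
    """Return the uppercased identifier, or raise if it is not a plain one."""
    if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Not a plain unquoted identifier: {name!r}")
    return name.upper()


def table_exists_query(table_name, schema_name="PUBLIC",
                       database_name="CORTEX_ANALYTICS"):
    """Build the existence check against the named database's INFORMATION_SCHEMA."""
    table = safe_identifier(table_name)
    schema = safe_identifier(schema_name)
    database = safe_identifier(database_name)
    return (
        "SELECT COUNT(*) AS TABLE_COUNT "
        f"FROM {database}.INFORMATION_SCHEMA.TABLES "
        f"WHERE TABLE_SCHEMA = '{schema}' AND TABLE_NAME = '{table}'"
    )


print(table_exists_query("cortex_analyst_logs"))
```

The resulting string could be passed to `session.sql(...).collect()` exactly as in the cell above. Qualifying `INFORMATION_SCHEMA` with the database name keeps the check correct even when the session's current database is something else.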

In [2]:
# Test basic connection
result = session.sql("SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_DATABASE(), CURRENT_SCHEMA()").collect()
user_info = result[0]

print(f"👤 User: {user_info[0]}")
print(f"🎭 Role: {user_info[1]}")
print(f"🗄️ Database: {user_info[2]}")
print(f"📁 Schema: {user_info[3]}")

# Test semantic model path fetching
print("\n🔍 Testing fetch_semantic_model_paths()...")

try:
    df_results = fetch_semantic_model_paths(session)
    
    if not df_results.empty:
        print(f"✅ Found {len(df_results)} semantic model configurations")
        print("\nSemantic model files found:")
        display(df_results)
    else:
        print("⚠️  No semantic model configurations found")
        print("This is normal if you haven't set up Cortex Agents yet")
        
except Exception as e:
    print(f"⚠️  Error fetching semantic model paths: {e}")
    print("This may be because:")
    print("  • No Cortex Agents are configured in your account")
    print("  • Your role doesn't have access to SNOWFLAKE_INTELLIGENCE schema")


👤 User: KAITLYN
🎭 Role: SNOWFLAKE_INTELLIGENCE_ADMIN_RL
🗄️ Database: CORTEX_ANALYTICS
📁 Schema: PUBLIC

🔍 Testing fetch_semantic_model_paths()...
✅ Found 13 semantic model configurations

Semantic model files found:


Unnamed: 0,agent_name,tool_name,semantic_model_file
0,Contract,CONTRACT_SEARCH,
1,Contract,cobranding_contracts (1).yaml,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...
2,DATA_ENGINEER_ASSISTANT,Snowflake_Documentation,
3,ECOMMERCE_COMPANY,ecommerce_customer_behavior_data,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...
4,Equity Research,SEC Doc Search,
5,Equity Research,SENTIMENT_ANALYSIS,
6,Equity Research,sp500_semantic_model.yaml,@CUBE_TESTING.PUBLIC.ANALYST/sp500_semantic_mo...
7,MARKETING_GENIE,MARKETING_RESEARCH,
8,MARKETING_GENIE,ads_data,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...
9,MORTGAGE_LOAN_ANALYSIS,ml_explainability,


In [4]:
# Populate CORTEX_ANALYST_LOGS with the logs for each unique semantic model file
if not df_results.empty and 'semantic_model_file' in df_results.columns:
    semantic_model_files = df_results['semantic_model_file'].dropna(
    ).unique().tolist()
    print(
        f"📄 Processing {len(semantic_model_files)} unique semantic model files")

    for file in semantic_model_files:
        if file is not None:
            print(f"\n📊 Processing: {file}")

            try:
                # Get logs for this semantic model
                df = get_cortex_analyst_logs(session, file)

                if not df.empty:
                    print(f"   ✅ Retrieved {len(df)} log entries")

                    # Write to table using pandas write
                    session.write_pandas(
                        df,
                        table_name="CORTEX_ANALYST_LOGS",
                        auto_create_table=False,
                        overwrite=False  # Append to existing data
                    )
                    print("   ✅ Data written to CORTEX_ANALYST_LOGS table")
                else:
                    print("   ⚠️  No log entries found for this semantic model")

            except Exception as e:
                print(f"   ❌ Error processing {file}: {e}")
                continue

    print("\n🎉 Completed processing all semantic model files")
else:
    print("⚠️  No semantic model files found to process")
    print("Please ensure you have Cortex Agents configured with semantic models")

📄 Processing 5 unique semantic model files

📊 Processing: @DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/cobranding_contracts (1).yaml
   ✅ Retrieved 6 log entries
   ✅ Data written to CORTEX_ANALYST_LOGS table

📊 Processing: @CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_analytics.yaml
   ✅ Retrieved 31 log entries
   ✅ Data written to CORTEX_ANALYST_LOGS table

📊 Processing: @CUBE_TESTING.PUBLIC.ANALYST/sp500_semantic_model.yaml
   ✅ Retrieved 37 log entries
   ✅ Data written to CORTEX_ANALYST_LOGS table

📊 Processing: @CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODELS/media_mix_data.yaml
   ✅ Retrieved 36 log entries
   ✅ Data written to CORTEX_ANALYST_LOGS table

📊 Processing: @E2E_SNOW_MLOPS_DB.STREAMLIT.SEMANTIC/mortgage_lending_prediction_v2.yaml
   ✅ Retrieved 7 log entries
   ✅ Data written to CORTEX_ANALYST_LOGS table

🎉 Completed processing all semantic model files
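
Because the loop above appends with `overwrite=False`, re-running the cell would presumably write the same log entries a second time. One way to guard against that is to filter out entries that are already in the table before writing. The sketch below assumes `REQUEST_ID` uniquely identifies a log entry, which the notebook does not state; `rows_to_append` is a hypothetical helper operating on plain dictionaries.

```python
def rows_to_append(new_rows, existing_request_ids):
    """Keep only rows whose REQUEST_ID is not yet stored (and not repeated)."""
    seen = set(existing_request_ids)
    fresh = []
    for row in new_rows:
        request_id = row.get("REQUEST_ID")
        if request_id is None or request_id in seen:
            continue  # skip rows without an id and rows already present
        seen.add(request_id)
        fresh.append(row)
    return fresh


# Illustrative data only, not taken from the real logs.
existing = ["req-1", "req-2"]
incoming = [
    {"REQUEST_ID": "req-2", "USER_QUESTION": "already stored"},
    {"REQUEST_ID": "req-3", "USER_QUESTION": "new entry"},
    {"REQUEST_ID": "req-3", "USER_QUESTION": "duplicate within the batch"},
]
print(rows_to_append(incoming, existing))
```

In the notebook, the existing ids could come from a `SELECT REQUEST_ID FROM CORTEX_ANALYST_LOGS` query, and the same filter could be expressed on the DataFrame before the `write_pandas` call.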


### Verify Data Population

In [5]:
# Check how many records were loaded
record_count = session.sql(
    "SELECT COUNT(*) as total_records FROM CORTEX_ANALYST_LOGS").collect()[0]['TOTAL_RECORDS']
print(f"📊 Total records in CORTEX_ANALYST_LOGS table: {record_count:,}")

if record_count > 0:
    # Show sample data
    sample_data = session.sql("""
        SELECT 
            semantic_model_name,
            COUNT(*) as log_count,
            MIN(timestamp) as earliest_log,
            MAX(timestamp) as latest_log
        FROM CORTEX_ANALYST_LOGS 
        GROUP BY semantic_model_name 
        ORDER BY log_count DESC
    """).to_pandas()

    print("\n📈 Summary by Semantic Model:")
    display(sample_data)

    # Show query type breakdown
    query_types = session.sql("""
        SELECT 
            query_type,
            COUNT(*) as count,
            ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage
        FROM CORTEX_ANALYST_LOGS 
        GROUP BY query_type
        ORDER BY count DESC
    """).to_pandas()

    print("\n🔍 Query Type Breakdown:")
    display(query_types)
else:
    print("\n⚠️  No data was loaded. This could be because:")
    print("   • No Cortex Agents are configured")
    print("   • No queries have been made to the agents yet")
    print("   • Semantic model files are not accessible")

📊 Total records in CORTEX_ANALYST_LOGS table: 117

📈 Summary by Semantic Model:


Unnamed: 0,SEMANTIC_MODEL_NAME,LOG_COUNT,EARLIEST_LOG,LATEST_LOG
0,@CUBE_TESTING.PUBLIC.ANALYST/sp500_semantic_mo...,37,116085-09-19 22:17:59.503872,19887-10-23 14:36:00.609280
1,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,36,-24868-10-30 13:01:14.457088,-154931-05-09 03:52:32.434176
2,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,31,60517-04-12 04:14:53.539328,25165-01-09 01:22:05.531136
3,@E2E_SNOW_MLOPS_DB.STREAMLIT.SEMANTIC/mortgage...,7,264452-03-24 15:08:02.256896,264015-09-05 05:10:19.879424
4,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,6,-58047-12-10 15:21:29.660416,263924-03-20 07:44:33.289216



🔍 Query Type Breakdown:


Unnamed: 0,QUERY_TYPE,COUNT,PERCENTAGE
0,Non-Verified Query,116,99.15
1,Verified Query,1,0.85
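
The `EARLIEST_LOG` and `LATEST_LOG` values in the summary above (years such as 116085 and -24868) are not plausible dates. One common cause of such values is an epoch unit mismatch when a pandas datetime column is written to a table; that diagnosis is an assumption here, not something the notebook confirms. A minimal sketch of a sanity check on raw epoch values follows, with the plausible window and the helper names chosen purely for illustration:

```python
from datetime import datetime, timezone

# Window in which a log timestamp is considered plausible (an assumption).
MIN_TS = datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()
MAX_TS = datetime(2100, 1, 1, tzinfo=timezone.utc).timestamp()

# Divisors that convert each unit to seconds.
UNITS = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def guess_epoch_unit(value):
    """Return the unit under which `value` falls inside the window, else None."""
    for unit, scale in UNITS.items():
        if MIN_TS <= value / scale <= MAX_TS:
            return unit
    return None


def epoch_to_datetime(value):
    """Convert an epoch value to a UTC datetime using the guessed unit."""
    unit = guess_epoch_unit(value)
    if unit is None:
        raise ValueError(f"No unit makes {value!r} a plausible timestamp")
    return datetime.fromtimestamp(value / UNITS[unit], tz=timezone.utc)


for raw in (1_700_000_000, 1_700_000_000_000, 1_700_000_000_000_000_000):
    print(raw, guess_epoch_unit(raw), epoch_to_datetime(raw).isoformat())
```

Running a check like this on the `TIMESTAMP` column before writing, and comparing against what the table returns afterwards, would show whether the values are being reinterpreted on the way in.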


In [3]:
# Test AI Services queries
print("💰 Testing AI Services Credit Queries...\n")

# Import some of the LLM dashboard functions
from utils import (
    get_ai_services_total_credits,
    get_llm_inference_summary,
    get_cortex_analyst_summary
)

try:
    # Get total AI services credits for last 30 days
    credits_df = get_ai_services_total_credits(session, days=30)
    
    if not credits_df.empty:
        total_credits = credits_df['TOTAL_CREDITS'].iloc[0]
        print(f"✅ Total AI Services Credits (last 30 days): {total_credits:,.2f}")
        display(credits_df)
    else:
        print("ℹ️  No AI services usage found in the last 30 days")
        
except Exception as e:
    print(f"⚠️  Could not fetch AI services credits: {e}")

💰 Testing AI Services Credit Queries...

✅ Total AI Services Credits (last 30 days): 61.59


Unnamed: 0,TOTAL_CREDITS
0,61.59
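
For orientation, a total like the one above could come from a query along these lines. This is a sketch under assumptions: the notebook does not show how `get_ai_services_total_credits` is implemented, and the use of `SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY` filtered on `SERVICE_TYPE = 'AI_SERVICES'` is a guess at a reasonable source. `ai_services_credits_query` is a hypothetical helper.

```python
def ai_services_credits_query(days=30):
    """Build a query summing AI Services credits over the last `days` days."""
    days = int(days)
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT COALESCE(SUM(CREDITS_USED), 0) AS TOTAL_CREDITS\n"
        "FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY\n"
        "WHERE SERVICE_TYPE = 'AI_SERVICES'\n"
        f"  AND USAGE_DATE >= DATEADD('day', -{days}, CURRENT_DATE())"
    )


print(ai_services_credits_query(30))
```

The string could be run with `session.sql(...).to_pandas()`; reading `ACCOUNT_USAGE` views requires a role with access to the shared `SNOWFLAKE` database.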


## 3. Test Cost Analysis Functions


In [2]:
# Test create_cortex_analyst_query_history function
print("🔗 Testing create_cortex_analyst_query_history()...")

from utils import (
    create_cortex_analyst_query_history,
    total_cost_by_semantic_model,
    cost_breakdown_by_semantic_model
)

try:
    # First, get the Cortex Analyst logs we just populated
    cortex_logs_df = session.sql("SELECT * FROM CORTEX_ANALYST_LOGS").to_pandas()
    print(f"📊 Retrieved {len(cortex_logs_df)} Cortex Analyst log records")
    
    # Check if SF_INTELLIGENCE_QUERY_HISTORY table exists and has data
    try:
        query_history_count = session.sql("""
            SELECT COUNT(*) as count 
            FROM CORTEX_ANALYTICS.PUBLIC.SF_INTELLIGENCE_QUERY_HISTORY
        """).collect()[0]['COUNT']
        print(f"📋 Found {query_history_count:,} query history records")
        
        if query_history_count > 0:
            # Test the join function
            print("🔄 Creating joined query history...")
            ca_query_history = create_cortex_analyst_query_history(
                session, 
                cortex_logs_df, 
                "CORTEX_ANALYTICS.PUBLIC.SF_INTELLIGENCE_QUERY_HISTORY"
            )
            
            if not ca_query_history.empty:
                print(f"✅ Successfully created joined dataset with {len(ca_query_history)} records")
                
                # Show sample of joined data
                print("\n📈 Sample of joined data (first 3 records):")
                sample_columns = ['SEMANTIC_MODEL_NAME', 'USER_QUESTION', 'LATENCY_MS', 
                                'CORTEX_ANALYST_CREDITS', 'CREDITS_ATTRIBUTED_COMPUTE']
                available_columns = [col for col in sample_columns if col in ca_query_history.columns]
                display(ca_query_history[available_columns].head(3))
                
                # Test cost analysis functions
                print("\n💰 Testing cost analysis functions...")
                
                # Test total cost by semantic model
                total_costs = total_cost_by_semantic_model(ca_query_history)
                if not total_costs.empty:
                    print(f"✅ Total costs calculated for {len(total_costs)} semantic models")
                    print("\n💸 Total Costs by Semantic Model:")
                    display(total_costs.head())
                
                # Test cost breakdown
                cost_breakdown = cost_breakdown_by_semantic_model(ca_query_history)
                if not cost_breakdown.empty:
                    print(f"✅ Cost breakdown calculated for {len(cost_breakdown)} semantic models")
                    print("\n📊 Cost Breakdown by Semantic Model:")
                    display(cost_breakdown.head())
            else:
                print("⚠️  No matching records found between Cortex logs and query history")
                print("This might be because:")
                print("  • Query history table is empty")
                print("  • No matching queries between the two datasets")
                print("  • Time range mismatch")
        else:
            print("⚠️  No query history data found")
            print("💡 To populate query history, you may need to:")
            print("  • Run queries through Cortex Analyst")
            print("  • Wait for data to appear in ACCOUNT_USAGE views")
            print("  • Ensure your role has access to ACCOUNT_USAGE")
            
    except Exception as qh_error:
        print(f"⚠️  Could not access query history table: {qh_error}")
        print("This might be because:")
        print("  • Table doesn't exist (run the table creation cell above)")
        print("  • Insufficient permissions on ACCOUNT_USAGE views")
        
except Exception as e:
    print(f"❌ Error testing create_cortex_analyst_query_history: {e}")
    import traceback
    traceback.print_exc()


🔗 Testing create_cortex_analyst_query_history()...
📊 Retrieved 117 Cortex Analyst log records
📋 Found 67 query history records
🔄 Creating joined query history...
⚠️  No matching records found between Cortex logs and query history
This might be because:
  • Query history table is empty
  • No matching queries between the two datasets
  • Time range mismatch


In [None]:
# Re-test create_cortex_analyst_query_history with multi-layered matching and report the match rate
print("🔗 Testing create_cortex_analyst_query_history()...")

from utils import (
    create_cortex_analyst_query_history,
    total_cost_by_semantic_model,
    cost_breakdown_by_semantic_model
)

try:
    # Get the Cortex Analyst logs we just populated
    cortex_logs_df = session.sql("SELECT * FROM CORTEX_ANALYST_LOGS").to_pandas()
    print(f"📊 Retrieved {len(cortex_logs_df)} Cortex Analyst log records")
    
    # Check if SF_INTELLIGENCE_QUERY_HISTORY table exists and has data
    try:
        query_history_count = session.sql("""
            SELECT COUNT(*) as count 
            FROM CORTEX_ANALYTICS.PUBLIC.SF_INTELLIGENCE_QUERY_HISTORY
        """).collect()[0]['COUNT']
        print(f"📋 Found {query_history_count:,} query history records")
        
        if query_history_count > 0:
            # Test the improved join function with multi-layered matching
            print("🔄 Creating joined query history...")
            ca_query_history = create_cortex_analyst_query_history(
                session, 
                cortex_logs_df, 
                "CORTEX_ANALYTICS.PUBLIC.SF_INTELLIGENCE_QUERY_HISTORY"
            )
            
            if not ca_query_history.empty:
                print(f"✅ Successfully created joined dataset with {len(ca_query_history)} records")
                print(f"   Match rate: {len(ca_query_history)/len(cortex_logs_df)*100:.1f}%")
                
                # Show sample of joined data
                print("\n💰 Sample joined data showing cost analysis:")
                sample_columns = ['SEMANTIC_MODEL_NAME', 'USER_QUESTION', 'LATENCY_MS', 
                                'CORTEX_ANALYST_CREDITS', 'CREDITS_ATTRIBUTED_COMPUTE', 'TOTAL_CREDITS_WH_AND_CA']
                available_columns = [col for col in sample_columns if col in ca_query_history.columns]
                display(ca_query_history[available_columns].head(3))
                
                # Test cost analysis functions
                print("\n📊 Testing cost analysis functions...")
                
                # Test total cost by semantic model
                total_costs = total_cost_by_semantic_model(ca_query_history)
                if not total_costs.empty:
                    print(f"✅ Total costs calculated for {len(total_costs)} semantic models")
                    print("\n💸 Total Costs by Semantic Model:")
                    display(total_costs.head())
                
                # Test cost breakdown
                cost_breakdown = cost_breakdown_by_semantic_model(ca_query_history)
                if not cost_breakdown.empty:
                    print(f"✅ Cost breakdown calculated for {len(cost_breakdown)} semantic models")
                    print("\n📊 Cost Breakdown by Semantic Model:")
                    display(cost_breakdown.head())
            else:
                print("⚠️  No matching records found between Cortex logs and query history")
                print("This might be because:")
                print("  • Query history table is empty")
                print("  • No matching queries between the two datasets")
                print("  • Time range mismatch")
        else:
            print("⚠️  No query history data found")
            print("💡 To populate query history, you may need to:")
            print("  • Run queries through Cortex Analyst")
            print("  • Wait for data to appear in ACCOUNT_USAGE views")
            print("  • Ensure your role has access to ACCOUNT_USAGE")
            
    except Exception as qh_error:
        print(f"⚠️  Could not access query history table: {qh_error}")
        print("This might be because:")
        print("  • Table doesn't exist (run the table creation cell above)")
        print("  • Insufficient permissions on ACCOUNT_USAGE views")
        
except Exception as e:
    print(f"❌ Error testing create_cortex_analyst_query_history: {e}")
    import traceback
    traceback.print_exc()


🔍 Debugging join key mismatch...

📊 Sample GENERATED_SQL from Cortex Analyst logs:
  1. ...
  2. WITH __co_branding_agreements AS (
  SELECT
    have_force_majeure_value
  FROM doc_ai_qs_db.doc_ai_schema.co_branding_agreements
)
SELECT
  COUNT(*) AS contracts_with_force_majeure
FROM __co_branding...
  3. WITH __co_branding_agreements AS (
  SELECT
    have_renewal_options_value
  FROM doc_ai_qs_db.doc_ai_schema.co_branding_agreements
)
SELECT
  COUNT(*) AS contracts_with_renewal_clause
FROM __co_brand...

📋 Sample CLEANED_QUERY_TEXT from query history:
  1. WITH __co_branding_agreement  AS (
  SELECT
    have_renewal_option _value
  FROM doc_ai_q _db.doc_ai_ chema.co_branding_agreement 
)
SELECT
  *
FROM __co_branding_agreement 
WHERE
  have_renewal_opti...
  2. WITH __demo_mortgage_lending_te t_0 AS (
  SELECT
    time tamp,
    loan_id,
    loan_purpo e_name_home_improvement,
    loan_purpo e_name_home_purcha e,
    loan_purpo e_name_refinancing,
    mortga...
  3. CALL E2E_SNOW_MLOP

Unnamed: 0,SOURCE,AVG_LENGTH,MIN_LENGTH,MAX_LENGTH,TOTAL_RECORDS
0,Cortex_Logs_GENERATED_SQL,538.726496,0,2477,117
1,Query_History_CLEANED_TEXT,659.671642,64,2440,67
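
In the `CLEANED_QUERY_TEXT` samples above, every lowercase `s` appears to have been replaced by a space (`time tamp`, `loan_purpo e`). That is consistent with a whitespace pattern such as `\s` losing its backslash somewhere between Python and SQL, although this diagnosis is an inference from the output rather than something the notebook confirms. A minimal sketch of doing the normalization in Python instead, with an exact match first and a prefix-fragment fallback; the function names and the fragment length are assumptions, and this is not the matching logic in `utils.py`:

```python
import re


def normalize_sql(sql):
    """Strip comments and trailing semicolons, collapse whitespace, lowercase."""
    sql = re.sub(r"--[^\n]*", " ", sql)                     # line comments
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)   # block comments
    sql = re.sub(r"\s+", " ", sql)   # raw string keeps \s as "whitespace"
    return sql.strip().rstrip(";").strip().lower()


def match_queries(generated, history, fragment_length=80):
    """Return (generated_index, history_index, how) for each matched query."""
    exact, fragments = {}, {}
    for j, text in enumerate(history):
        norm = normalize_sql(text)
        exact.setdefault(norm, j)
        fragments.setdefault(norm[:fragment_length], j)

    matches = []
    for i, text in enumerate(generated):
        norm = normalize_sql(text)
        if not norm:
            continue  # empty GENERATED_SQL cannot be matched
        if norm in exact:
            matches.append((i, exact[norm], "exact"))
        elif norm[:fragment_length] in fragments:
            matches.append((i, fragments[norm[:fragment_length]], "fragment"))
    return matches


generated = ["", "SELECT 1", "select a, b from t where x = 1"]
history = ["select 1;", "SELECT a, b\nFROM t WHERE x = 2"]
print(match_queries(generated, history, fragment_length=20))
```

The comment stripping here is deliberately naive (it would also remove `--` inside a string literal), and prefix matching can pair queries that differ only in a later clause, so fragment matches are best treated as candidates rather than confirmed joins.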


In [None]:
# Test additional analysis functions with the populated data
print("📊 Testing additional analysis functions...")

from utils import (
    verified_query_count,
    top_verified_queries,
    slowest_queries,
    latency_summary_by_semantic_model,
    user_activity_by_semantic_model
)

try:
    # Get all Cortex Analyst logs for analysis
    all_logs_df = session.sql("SELECT * FROM CORTEX_ANALYST_LOGS").to_pandas()
    
    if not all_logs_df.empty:
        print(f"📋 Analyzing {len(all_logs_df)} Cortex Analyst log records...\n")
        
        # Test verified query count analysis
        print("🔍 Testing verified_query_count()...")
        verified_stats = verified_query_count(all_logs_df)
        if not verified_stats.empty:
            print(f"✅ Query type analysis completed for {len(verified_stats)} entries")
            print("\n📈 Query Type Breakdown by Semantic Model:")
            display(verified_stats.head())
        
        # Test top verified queries
        print("\n⭐ Testing top_verified_queries()...")
        top_queries = top_verified_queries(all_logs_df)
        if not top_queries.empty:
            print(f"✅ Found {len(top_queries)} top verified queries")
            print("\n🏆 Most Frequent Verified Queries:")
            display(top_queries.head())
        else:
            print("ℹ️  No verified queries found in the data")
        
        # Test slowest queries analysis (sample)
        print("\n🐌 Testing slowest_queries() - Top 3...")
        slow_queries = slowest_queries(all_logs_df, number=3)
        if not slow_queries.empty:
            print("✅ Identified slowest queries")
            display(slow_queries[['SEMANTIC_MODEL_NAME', 'USER_QUESTION', 'LATENCY_SECONDS']])
        
        # Test latency summary
        print("\n⚡ Testing latency_summary_by_semantic_model()...")
        latency_stats = latency_summary_by_semantic_model(all_logs_df)
        if not latency_stats.empty:
            print(f"✅ Latency statistics calculated for {len(latency_stats)} semantic models")
            print("\n📊 Latency Summary by Semantic Model:")
            display(latency_stats)
        
        # Test user activity analysis
        print("\n👥 Testing user_activity_by_semantic_model()...")
        user_activity = user_activity_by_semantic_model(all_logs_df)
        if not user_activity.empty:
            print(f"✅ User activity analyzed for {len(user_activity)} entries")
            print("\n👤 User Activity Summary:")
            display(user_activity.head(3))
        
        print("\n🎉 All analysis function tests completed successfully!")
        
    else:
        print("⚠️  No Cortex Analyst log data found for analysis")
        
except Exception as e:
    print(f"❌ Error testing analysis functions: {e}")
    import traceback
    traceback.print_exc()


🧪 Testing IMPROVED create_cortex_analyst_query_history function...
📊 Using 117 Cortex Analyst log records
🔗 Running improved SQL matching with multi-layered approach...
🔗 Join Debug Info:
   Query History records (after normalization): 67
   Cortex Analyst records (after normalization): 93
   Unique normalized SQLs in Query History: 49
   Unique normalized SQLs in Cortex Logs: 73
   Overlapping normalized SQLs: 0
   🔄 No exact matches found, trying fuzzy matching on SQL fragments...
   ✅ Found 5 matches using fuzzy SQL fragment matching!

🎉 SUCCESS! Found 5 matching records!

📈 Join Success Metrics:
   Original Cortex logs: 117
   Matched records: 5
   Match rate: 4.3%

💰 Sample joined data showing cost analysis:


Unnamed: 0,SEMANTIC_MODEL_NAME,USER_QUESTION,LATENCY_MS,CORTEX_ANALYST_CREDITS,CREDITS_ATTRIBUTED_COMPUTE,TOTAL_CREDITS_WH_AND_CA
0,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,how many contracts have a renewal clause,5590.0,0.067,0.001323,0.068323
1,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,which contracts have renewal terms?,1633.0,0.067,0.001323,0.068323
2,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,how many contracts have a renewal clause,2467.0,0.067,2.4e-05,0.067024
3,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,how many contracts have a renewal clause,5590.0,0.067,0.00177,0.06877
4,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,which contracts have renewal terms?,1633.0,0.067,0.00177,0.06877



📊 Testing cost analysis with joined data...
✅ Total cost analysis completed for 1 semantic models

💸 Total Costs by Semantic Model:


Unnamed: 0,SEMANTIC_MODEL_NAME,TOTAL_CREDITS_WH_AND_CA
0,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,0.3412


✅ Cost breakdown analysis completed for 1 semantic models

📋 Detailed Cost Breakdown:


Unnamed: 0,SEMANTIC_MODEL_NAME,cortex_analyst_credits,warehouse_credits,total_credits,query_count,percentage_of_total_cost,avg_credits_per_query
0,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,0.335,0.0062,0.3412,5,100.0,0.0682



🎯 The improved function successfully joined Cortex Analyst logs with query history!
   This enables full cost analysis combining both Cortex Analyst credits and warehouse costs!
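
The totals in the tables above can be reproduced from the five joined rows. The sketch below copies the `CORTEX_ANALYST_CREDITS` and `CREDITS_ATTRIBUTED_COMPUTE` values from the sample output and assumes `TOTAL_CREDITS_WH_AND_CA` is simply their sum, which is what the printed rows suggest:

```python
# (CORTEX_ANALYST_CREDITS, CREDITS_ATTRIBUTED_COMPUTE) for the five matched rows
rows = [
    (0.067, 0.001323),
    (0.067, 0.001323),
    (0.067, 0.000024),
    (0.067, 0.00177),
    (0.067, 0.00177),
]
total_logs = 117  # Cortex Analyst log records before the join

analyst_credits = sum(analyst for analyst, _ in rows)
warehouse_credits = sum(warehouse for _, warehouse in rows)
total_credits = analyst_credits + warehouse_credits
avg_credits_per_query = total_credits / len(rows)
match_rate = len(rows) / total_logs * 100

print(f"cortex_analyst_credits: {analyst_credits:.4f}")
print(f"warehouse_credits:      {warehouse_credits:.4f}")
print(f"total_credits:          {total_credits:.4f}")
print(f"avg_credits_per_query:  {avg_credits_per_query:.4f}")
print(f"match rate:             {match_rate:.1f}%")
```

Rounded to four decimals these agree with the breakdown table (0.335, 0.0062, 0.3412, 0.0682) and with the 4.3% match rate. Note that the total covers only the five matched queries, so it says nothing about the cost of the other 112 log records.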


📊 Testing additional analysis functions...
📋 Analyzing 117 Cortex Analyst log records...

🔍 Testing verified_query_count()...
✅ Query type analysis completed for 6 entries

📈 Query Type Breakdown by Semantic Model:


Unnamed: 0,SEMANTIC_MODEL_NAME,QUERY_TYPE,request_count,percentage
0,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,Non-Verified Query,31,100.0
1,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,Non-Verified Query,36,100.0
2,@CUBE_TESTING.PUBLIC.ANALYST/sp500_semantic_mo...,Non-Verified Query,37,100.0
3,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,Non-Verified Query,5,83.33
4,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,Verified Query,1,16.67



⭐ Testing top_verified_queries()...
✅ Found 1 top verified queries

🏆 Most Frequent Verified Queries:


Unnamed: 0,SEMANTIC_MODEL_NAME,VERIFIED_QUERY_NAME,VERIFIED_QUERY_QUESTION,frequency,percentage_of_verified_queries
0,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,number of contracts with renewal options,how many contracts have renewal options?,1,100.0



🐌 Testing slowest_queries()...
✅ Identified 20 slowest queries

⏱️  Slowest Queries (Top 5):


Unnamed: 0,SEMANTIC_MODEL_NAME,USER_QUESTION,LATENCY_SECONDS,ORCHESTRATION_PATH,QUESTION_CATEGORY
0,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,Show me sample data for ClickRecommendation ev...,23.94,regular_sqlgen,UNAMBIGUOUS_SQL
1,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,What is the distribution of customers across d...,16.7,regular_sqlgen,UNAMBIGUOUS_SQL
2,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,Count the number of ClickRecommendation events...,12.37,regular_sqlgen,UNAMBIGUOUS_SQL
3,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,What is the click-through rate of recommendati...,11.99,regular_sqlgen,UNAMBIGUOUS_SQL
4,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,Show me all the different event types availabl...,8.78,regular_sqlgen,UNAMBIGUOUS_SQL
5,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,2024 performance marketing spend summary and t...,31.26,regular_sqlgen,UNAMBIGUOUS_SQL
6,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,total 2024 marketing spend compared to previou...,21.92,regular_sqlgen,UNAMBIGUOUS_SQL
7,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,2024 performance marketing spend summary and t...,19.0,regular_sqlgen,UNAMBIGUOUS_SQL
8,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,2024 performance marketing spend and trends,14.85,regular_sqlgen,UNAMBIGUOUS_SQL
9,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,What was the total performance marketing spend...,13.77,regular_sqlgen,UNAMBIGUOUS_SQL



⚡ Testing latency_summary_by_semantic_model()...
✅ Latency statistics calculated for 4 semantic models

📊 Latency Summary by Semantic Model:


Unnamed: 0,SEMANTIC_MODEL_NAME,query_count,avg_latency_seconds,median_latency_seconds,min_latency_seconds,max_latency_seconds
0,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,26,6.24,4.27,2.5,23.94
1,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,36,8.45,6.82,2.9,31.26
2,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,6,4.47,3.44,1.63,10.26
3,@E2E_SNOW_MLOPS_DB.STREAMLIT.SEMANTIC/mortgage...,7,3.66,4.19,1.9,4.82



👥 Testing user_activity_by_semantic_model()...
✅ User activity analyzed for 5 entries

👤 User Activity Summary:


Unnamed: 0,USER_NAME,SEMANTIC_MODEL_NAME,queries_count,avg_latency_ms,total_credits,percentage_of_user_queries
0,KAITLYN,@CUBE_TESTING.PUBLIC.ANALYST/sp500_semantic_mo...,37,,2.48,31.62
1,KAITLYN,@CME.KAITLYN_ADS_IMAGE_ANALYTICS.SEMANTIC_MODE...,36,8454.42,2.41,30.77
2,KAITLYN,@CAR_GURU_DB.PUBLIC.SEMANTIC_MODEL/product_ana...,31,6235.5,2.08,26.5
3,KAITLYN,@E2E_SNOW_MLOPS_DB.STREAMLIT.SEMANTIC/mortgage...,7,3663.57,0.47,5.98
4,KAITLYN,@DOC_AI_QS_DB.DOC_AI_SCHEMA.SEMANTIC_MODELS/co...,6,4473.83,0.4,5.13



🎉 All analysis function tests completed successfully!
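
As a rough illustration of what a per-model latency summary involves, the sketch below groups latencies by semantic model and reports count, mean, median, minimum, and maximum in seconds. It is not the implementation of `latency_summary_by_semantic_model`; the function name and the sample data are hypothetical. Rows without a latency value are skipped, which would be consistent with the summary above listing four of the five models, though that reading is an inference.

```python
from statistics import mean, median


def latency_summary(records):
    """Summarize (model, latency_ms) pairs per model, in seconds."""
    by_model = {}
    for model, latency_ms in records:
        if latency_ms is None:
            continue  # no latency recorded for this request
        by_model.setdefault(model, []).append(latency_ms / 1000.0)

    summary = {}
    for model, seconds in by_model.items():
        summary[model] = {
            "query_count": len(seconds),
            "avg_latency_seconds": round(mean(seconds), 2),
            "median_latency_seconds": round(median(seconds), 2),
            "min_latency_seconds": round(min(seconds), 2),
            "max_latency_seconds": round(max(seconds), 2),
        }
    return summary


# Illustrative data only, not taken from the real logs.
records = [
    ("model_a", 2500), ("model_a", 4000), ("model_a", 10000), ("model_a", None),
    ("model_b", 1900), ("model_b", 4820),
    ("model_c", None),
]
for model, stats in latency_summary(records).items():
    print(model, stats)
```

A model whose rows all lack a latency value (`model_c` here) drops out of the summary entirely, so the number of summarized models can be lower than the number of models in the logs.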


## 4. Development Summary

### ✅ What This Notebook Tests:

1. **Local Session Management**: Connects using `config.env` for local development
2. **Semantic Model Discovery**: Finds and analyzes semantic model configurations  
3. **Data Population**: Populates Cortex Analyst logs from your semantic models
4. **Cost Analysis**: Tests AI services credit calculations and joins with query history
5. **Advanced Analytics**: Query analysis, latency stats, and user activity functions

### 🚀 Next Steps:

- **Run Streamlit App**: Use `streamlit run streamlit_app.py` to see the full dashboard
- **Deploy to Snowflake**: Upload to Snowsight for production use
- **Set up Monitoring**: Use the budget alerting functions for automated cost monitoring
- **Customize Analysis**: Modify functions in `utils.py` for your specific needs

### 💡 Development Tips:

- **Module Changes**: After modifying `utils.py`, re-run the first code cell (it calls `importlib.reload(utils)` and re-imports the functions) or restart the notebook kernel
- **Test Incrementally**: Use individual cells to test specific functions
- **Data Quality**: Use the verification cells to ensure proper data population
- **Performance**: Use latency analysis to identify optimization opportunities
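
On the module-changes tip: `importlib.reload` updates the module object, but names previously bound with `from module import name` keep pointing at the old objects until they are imported again. The self-contained sketch below demonstrates this with a temporary module; `demo_utils` is a hypothetical stand-in for `utils.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Demo-only setting: avoid cached .pyc files so the edited source is re-read.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_utils.py"
    module_path.write_text("def version():\n    return 'old'\n")
    sys.path.insert(0, tmp)
    try:
        import demo_utils
        from demo_utils import version

        # Simulate editing the module on disk, then reload it.
        module_path.write_text("def version():\n    return 'new, edited'\n")
        importlib.reload(demo_utils)

        stale = version()               # name bound before the reload
        fresh = demo_utils.version()    # attribute looked up on the reloaded module
        from demo_utils import version  # re-import rebinds the name
        rebound = version()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_utils", None)

print(stale, "|", fresh, "|", rebound)
```

This is why the first cell of the notebook both reloads `utils` and repeats the `from utils import (...)` statement: re-running that cell picks up the edited functions without a kernel restart.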

### 🔧 Local Development Features:

- **Automatic Session Detection**: Uses Snowflake session when available, falls back to `config.env`
- **Smart SQL Matching**: Advanced normalization for joining Cortex logs with query history
- **Comprehensive Testing**: Full test suite for all toolkit functions
