# 00 - Master Orchestrator: Bronze → Silver Processing

Main orchestration notebook for processing parquet files through Bronze and Silver layers.

## Architecture Overview

```
Parquet Files (Files/{source}/{run_ts}/)
    ↓
Bronze Layer (append with run_ts for CDC)
    ↓
Silver Layer (CDC merge: INSERT/UPDATE/DELETE)
    ↓
Watermark Update (incremental tables only)
```

## Process Flow

1. **Load Configuration** (DAG, enabled tables, retry filter)
2. **Check Incremental** → Run watermark merge if needed
3. **Bronze Processing** → Parallel table loading (worker count chosen from run history, default 10)
4. **Bronze Logging** → Batch log all results
5. **Silver Processing** → Parallel CDC merge (tables with business_keys)
6. **Silver Logging** → Batch log all results
7. **Summary Statistics** → Performance metrics, efficiency

## Key Features

- **Parallel Processing**: ThreadPoolExecutor for 5-10x speedup
- **Idempotency**: Check logs before reprocessing
- **Retry Support**: Process only specific tables
- **Error Resilience**: Continue on failure, comprehensive logging
- **Performance Tracking**: Efficiency metrics (theoretical vs actual time)
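
The parallel processing and error resilience features can be illustrated with a minimal sketch. It is not the orchestrator's actual worker code: `load_table`, `safe_worker`, `run_parallel`, and the table names are hypothetical stand-ins, and the only assumption carried over from this notebook is the pattern of wrapping each worker so that an exception becomes a `FAILED` result instead of aborting the run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_table(table_def):
    """Hypothetical stand-in for a real worker such as process_bronze_table."""
    if table_def.get("broken"):
        raise ValueError(f"cannot read {table_def['name']}")
    return {"table_name": table_def["name"], "status": "SUCCESS", "error_message": None}


def safe_worker(table_def):
    """Turn any unhandled exception into a FAILED result instead of re-raising."""
    try:
        return load_table(table_def)
    except Exception as exc:
        return {
            "table_name": table_def["name"],
            "status": "FAILED",
            "error_message": f"Unhandled exception: {str(exc)[:500]}",
        }


def run_parallel(tables, max_workers=4):
    """Run every table through safe_worker and collect results as they complete."""
    results = []
    # Never start more threads than there are tables (and always at least one).
    workers = max(1, min(max_workers, len(tables)))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(safe_worker, t) for t in tables]
        for future in as_completed(futures):
            results.append(future.result())
    return results


# Illustrative table definitions; "broken" simulates a failing load.
tables = [{"name": "table_a"}, {"name": "table_b", "broken": True}, {"name": "table_c"}]
results = run_parallel(tables)
by_name = {r["table_name"]: r["status"] for r in results}
print(by_name)
```

Because results arrive in completion order, the sketch keys them by table name before inspecting them. One failing table does not stop the others.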

## Parameters

- `source`: Source system name (e.g., "vizier")
- `run_ts`: Run timestamp (e.g., "20251105T142752505")
- `dag_path`: Path to DAG configuration JSON
- `retry_tables`: Optional list of tables to retry
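- `force_reload`: If true, ignore the processing log and reload all tables
- `debug`: Enable debug output
- `log_to_console`: Also stream logs to stdout/stderr
- `optimize_for`: Worker profile goal, `"throughput"` or `"efficiency"`; the worker count itself is derived from run history (default 10, between 2 and 12)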
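
How `retry_tables` and `force_reload` interact can be sketched as a single selection function. This is an illustration under assumptions, not the code in `modules.config_utils`: `select_tables`, the `enabled` key, and the `already_done` argument (names logged as successful for this `run_ts`) are hypothetical.

```python
def select_tables(tables, retry_tables=None, force_reload=False, already_done=frozenset()):
    """Return the table definitions a run should process.

    Hypothetical helper: retry_tables narrows the candidates by name, and
    tables listed in already_done are skipped unless force_reload is set.
    """
    # Only enabled tables are candidates (assumed default: enabled).
    candidates = [t for t in tables if t.get("enabled", True)]
    if retry_tables:
        wanted = set(retry_tables)
        candidates = [t for t in candidates if t["name"] in wanted]
    if force_reload:
        return candidates
    return [t for t in candidates if t["name"] not in already_done]


# Illustrative DAG entries.
dag_tables = [
    {"name": "table_a"},
    {"name": "table_b"},
    {"name": "table_c", "enabled": False},
]

names = lambda selected: [t["name"] for t in selected]
print(names(select_tables(dag_tables, already_done={"table_a"})))
print(names(select_tables(dag_tables, force_reload=True, already_done={"table_a"})))
```

With `force_reload` set, the log check is bypassed entirely, which matches the description of that parameter above.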

In [20]:
# Parameters (Papermill compatible)
source = "anva_meeus"                               # Source system name
run_ts = "20251001T183103260"                       # Run timestamp
dag_path = "config/dag_anva_meeus_week.json"        # DAG configuration path
retry_tables = None                                 # Optional: list of table names to retry
force_reload = True                                 # If True, ignore logs and reload all
debug = True                                        # Enable debug output
log_to_console = True                               # Also stream logs to stdout/stderr
optimize_for = "throughput"                         # Worker profile optimization goal, choose throughput or efficiency


## [1] Setup and Imports

In [21]:
from datetime import datetime, timezone
from concurrent.futures import ThreadPoolExecutor, as_completed
from uuid import uuid4

from modules.logging_utils import configure_logging
import logging
from modules.worker_utils import choose_worker_profile_from_history

log_file = configure_logging(run_name="master_orchestrator", enable_console_logging=log_to_console)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.info("Logfile: %s", log_file)

logger.info("="*80)
logger.info("MASTER ORCHESTRATOR STARTING")
logger.info("="*80)
logger.info(f"Source: {source}")
logger.info(f"Run TS: {run_ts}")
logger.info(f"DAG: {dag_path}")
logger.info(f"Retry tables: {retry_tables}")
logger.info(f"Force reload: {force_reload}")
logger.info(f"Debug: {debug}")
logger.info("="*80)


2025-12-03 15:07:08,390 [INFO] - Logfile: /data/lakehouse/gh_b_avd/lh_gh_bronze/Files/notebook_outputs/logs/master_orchestrator_20251203_145516.log
2025-12-03 15:07:08,392 [INFO] - MASTER ORCHESTRATOR STARTING
2025-12-03 15:07:08,392 [INFO] - Source: anva_meeus
2025-12-03 15:07:08,393 [INFO] - Run TS: 20251001T183103260
2025-12-03 15:07:08,393 [INFO] - DAG: config/dag_anva_meeus_week.json
2025-12-03 15:07:08,393 [INFO] - Retry tables: None
2025-12-03 15:07:08,393 [INFO] - Force reload: True
2025-12-03 15:07:08,394 [INFO] - Debug: True


## [2] Load Utility Notebooks

In [None]:
# Import all required utilities from modules
from modules.config_utils import (
    load_dag, get_enabled_tables, get_tables_to_process,
    get_tables_by_load_mode, get_dag_metadata, summarize_dag,
    get_business_keys
)

from modules.logging_utils import (
    build_run_date,
    get_successful_tables,
    log_batch,
    log_summary
)

from modules.path_utils import get_base_path

# Import worker functions directly
from modules.bronze_processor import process_bronze_table
from modules.silver_processor import process_silver_cdc_merge

logger.info("✓ Utility functions imported from modules")

In [22]:
from modules.spark_session import get_or_create_spark_session

spark = get_or_create_spark_session(
    app_name="DWH_Bronze_Silver_Processing",
    enable_hive=True
)

2025-12-03 15:07:08,421 [INFO] - ✓ Using existing Spark session
2025-12-03 15:07:08,422 [INFO] -   Spark version: 3.5.5
2025-12-03 15:07:08,423 [INFO] -   Application ID: app-20251203145518-0867
2025-12-03 15:07:08,423 [INFO] -   Application name: DWH_Bronze_Silver_Processing


In [23]:
# Notebook 01 no longer needed - all functions imported from modules
logger.info("✓ Logging utilities imported from modules (notebook 01 no longer needed)")

2025-12-03 15:07:08,448 [INFO] - ✓ Logging utilities imported from modules (notebook 01 no longer needed)


In [24]:
# This cell is no longer needed - all functions are imported from modules
logger.info("✓ All utilities imported from modules (notebook 02 no longer needed)")

2025-12-03 15:07:08,468 [INFO] - ✓ All utilities imported from modules (notebook 02 no longer needed)


## [3] Load DAG Configuration

In [25]:
# Load and validate DAG
logger.info(f"\n📋 Loading DAG configuration...")

# Get base path for Files directory (environment-aware)
base_files_path = get_base_path(spark)
logger.info(f"  Base Files path: {base_files_path}")

# Load DAG (handles both absolute and relative paths)
dag = load_dag(dag_path, base_path=base_files_path)
logger.info(f"✓ DAG loaded: {dag.get('source')}")

# Get metadata
dag_metadata = get_dag_metadata(dag)
base_files = dag_metadata['base_files']

logger.info(f"  Base files: {base_files}")

# Get tables to process
tables_to_process = get_tables_to_process(
    dag=dag,
    retry_tables=retry_tables,
    only_enabled=True
)

# Ensure schemas exist        
schemas = set()

for t in tables_to_process:
    delta_table = t.get("delta_table")
    delta_schema = t.get("delta_schema")

    if delta_table and "." in delta_table:
        # Form: schema.table inside delta_table
        schema = delta_table.split(".")[0]
    else:
        # Otherwise: use delta_schema, or fall back to 'bronze'
        schema = (delta_schema or "bronze")

    schemas.add(schema)

for schema in sorted(schemas):
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS `{schema}`")

logger.info("Schemas ensured: %s", ", ".join(sorted(schemas)))

logger.info(f"\n📊 Tables to process: {len(tables_to_process)}")

# Show summary
dag_summary = summarize_dag(dag)
logger.info(f"  Total enabled: {dag_summary['enabled_tables']}")
logger.info(f"  Load modes: {dag_summary['load_mode_counts']}")

if not tables_to_process:
    logger.info("\n⚠️  No tables to process. Exiting.")
    raise SystemExit(0)

2025-12-03 15:07:08,489 [INFO] - 
📋 Loading DAG configuration...
2025-12-03 15:07:09,483 [INFO] -   Base Files path: /data/lakehouse/gh_b_avd/lh_gh_bronze/Files
2025-12-03 15:07:09,486 [INFO] - ✓ DAG loaded: anva_meeus
2025-12-03 15:07:09,486 [INFO] -   Base files: greenhouse_sources
2025-12-03 15:07:09,494 [INFO] - Schemas ensured: anva_meeus
2025-12-03 15:07:09,495 [INFO] - 
📊 Tables to process: 58
2025-12-03 15:07:09,495 [INFO] -   Total enabled: 58
2025-12-03 15:07:09,496 [INFO] -   Load modes: {'snapshot': 57, 'window': 1}
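
The schema resolution rule used in the cell above (a schema prefix in `delta_table` wins, then `delta_schema`, then `bronze`) can be isolated into a small pure function, which makes it easy to check without a Spark session. `resolve_schema` is a hypothetical name for this sketch and the table definitions are illustrative.

```python
def resolve_schema(table_def, default="bronze"):
    """Return the schema a table definition targets.

    Precedence: 'schema.table' prefix in delta_table, then delta_schema,
    then the default.
    """
    delta_table = table_def.get("delta_table")
    if delta_table and "." in delta_table:
        # Everything before the first dot is the schema name.
        return delta_table.split(".", 1)[0]
    return table_def.get("delta_schema") or default


examples = [
    {"delta_table": "anva_meeus.Dim_Agent"},
    {"delta_table": "Dim_Agent", "delta_schema": "staging"},
    {"name": "Dim_Agent"},
]
schemas = sorted({resolve_schema(t) for t in examples})
print(schemas)
```

Collecting the results in a set, as the notebook does, ensures each `CREATE SCHEMA IF NOT EXISTS` statement runs once per schema rather than once per table.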


## [4] Generate Run ID

In [26]:
# Generate unique run ID
RUN_ID = f"{run_ts}_{uuid4().hex[:8]}"
run_date = build_run_date(run_ts)
logger.info(f"\n🆔 Run ID: {RUN_ID}")

2025-12-03 15:07:09,500 [INFO] - 
🆔 Run ID: 20251001T183103260_2dc7d8f3
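
The run ID is the `run_ts` string plus eight hex characters from a random UUID. The implementation of `build_run_date` is not shown in this notebook, so the sketch below only illustrates one plausible way to validate and parse a `run_ts`. It assumes the layout is `yyyymmddThhmmss` followed by milliseconds; `RUN_TS_FORMAT`, `parse_run_ts`, and `make_run_id` are hypothetical names.

```python
from datetime import datetime
from uuid import uuid4

# Assumed layout: date, 'T', time, then fractional seconds (milliseconds here).
RUN_TS_FORMAT = "%Y%m%dT%H%M%S%f"


def parse_run_ts(run_ts):
    """Parse a run timestamp string; raises ValueError if it is malformed."""
    return datetime.strptime(run_ts, RUN_TS_FORMAT)


def make_run_id(run_ts):
    """Validate run_ts, then append a short random suffix to make the ID unique."""
    parse_run_ts(run_ts)
    return f"{run_ts}_{uuid4().hex[:8]}"


parsed = parse_run_ts("20251001T183103260")
run_id = make_run_id("20251001T183103260")
print(parsed.date(), run_id)
```

Validating before building the ID means a malformed timestamp fails at the start of the run rather than later, when log records are written.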


## [5] Check for Incremental Tables (Watermark Merge)

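If incremental tables are present, the watermark merge notebook must run BEFORE Bronze loading starts.
In this version the cell only logs the config and runtime paths and skips the merge, because watermarks are managed by the extraction pipeline.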

In [27]:
logger.info(f"\n💧 Checking for incremental tables...")

# Filter incremental tables
incremental_tables = get_tables_by_load_mode(tables_to_process, "incremental")

if len(incremental_tables) > 0:
    logger.info(f"  Found {len(incremental_tables)} incremental tables")
    logger.info(f"  Tables: {[t['name'] for t in incremental_tables[:5]]}")
    
    # Get watermarks path from DAG
    wm_configpath = dag_metadata.get('watermarks_path', 'config/watermarks.json')
    
    # Build watermark folder path (where extraction pipeline writes watermarks)
    wm_folder = f"runtime/{source}/{run_ts}/"
    
    logger.info(f"  Config: {wm_configpath}")
    logger.info(f"  Runtime folder: {wm_folder}")
    
    # Note: In Fabric, this would use mssparkutils.notebook.run()
    # For local testing, we skip watermark merge (not critical for Bronze/Silver testing)
    logger.info(f"\n  ⚠️  Watermark merge would run here (11_bronze_watermark_merge.ipynb)")
    logger.info(f"     Skipping for now - watermarks managed by extraction pipeline")
else:
    logger.info(f"  ◯ No incremental tables - skipping watermark merge")

logger.info("="*80)

2025-12-03 15:07:09,528 [INFO] - 
💧 Checking for incremental tables...
2025-12-03 15:07:09,530 [INFO] -   ◯ No incremental tables - skipping watermark merge


## [6] Bronze Processing (Parallel)

Load all tables from parquet to Bronze Delta tables in parallel.

In [28]:
# Notebook 10 no longer needed - process_bronze_table imported from modules
logger.info("✓ Bronze worker imported from modules (notebook 10 no longer needed)")

2025-12-03 15:07:09,554 [INFO] - ✓ Bronze worker imported from modules (notebook 10 no longer needed)


In [29]:
logger.info(f"\n🔵 BRONZE: Loading parquet to Delta tables...")
logger.info(f"  Tables: {len(tables_to_process)}")

bronze_start = datetime.now(timezone.utc)
bronze_results = []

# Filter tables if not force_reload (check logs)
if not force_reload:
    logger.info(f"\n  📋 Checking logs for already processed tables...")
    
    # Get successfully processed tables from log
    processed_tables = get_successful_tables(spark, run_ts, layer="bronze")
    
    if processed_tables:
        logger.info(f"    Found {len(processed_tables)} already processed tables")
        
        # Filter out already processed
        tables_to_process_bronze = [
            t for t in tables_to_process 
            if t['name'] not in processed_tables
        ]

        logger.info(f"    Remaining: {len(tables_to_process_bronze)} tables")
    else:
        tables_to_process_bronze = tables_to_process
else:
    tables_to_process_bronze = tables_to_process
    logger.info(f"  ⚠️  Force reload enabled - processing all tables")

if not tables_to_process_bronze:
    logger.info(f"\n  ✓ All tables already processed for this run_ts")
else:
    logger.info(f"\n  🚀 Processing {len(tables_to_process_bronze)} tables in parallel...\n")
    
    # Wrapper function for parallel execution
    def process_table_wrapper(table_def):
        """Wrapper to catch exceptions and always return a result."""
        try:
            return process_bronze_table(
                spark=spark,
                table_def=table_def,
                source_name=source,
                run_id=RUN_ID,
                run_ts=run_ts,
                run_date=run_date,
                base_files=base_files,
                debug=False  # Disable per-table debug in parallel mode
            )
        except Exception as e:
            # If worker throws unhandled exception, create error result
            return {
                "log_id": f"{source}:{table_def['name']}:{run_ts}:error",
                "run_id": RUN_ID,
                "run_date": run_date,
                "run_ts": run_ts,
                "source": source,
                "table_name": table_def['name'],
                "load_mode": table_def.get('load_mode'),
                "status": "FAILED",
                "rows_read": None,
                "rows_processed": None,
                "start_time": datetime.now(timezone.utc),
                "end_time": datetime.now(timezone.utc),
                "duration_seconds": 0,
                "error_message": f"Unhandled exception: {str(e)[:500]}",
                "parquet_path": None,
                "delta_table": None,
            }
    
    # Choose the worker count from run history, according to the optimize_for goal
    MAX_WORKERS = choose_worker_profile_from_history(
        spark=spark,
        source_name=source,
        summary_table="logs.bronze_run_summary",
        default_workers=10,
        min_workers=2,
        max_workers_cap=12,
        lookback_runs=5,
        optimize_for=optimize_for,  # "throughput" focuses on rows/second
        debug=debug
    )
    # Cap on number of tables
    MAX_WORKERS = min(MAX_WORKERS, len(tables_to_process_bronze))
    logger.info(f"Using MAX_WORKERS={MAX_WORKERS} for bronze processing")


    # Parallel execution
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = {
            executor.submit(process_table_wrapper, table): table 
            for table in tables_to_process_bronze
        }
        
        completed = 0
        for future in as_completed(futures):
            result = future.result()
            bronze_results.append(result)
            completed += 1
            
            # Progress indicator
            status_icon = "✓" if result['status'] == 'SUCCESS' else "✗" if result['status'] == 'FAILED' else "◯"
            # Short error snippet (max 120 chars, single line)
            error_snippet = (result.get("error_message") or "")[:120].replace("\n", " ")
            
            logger.info(
                f"[{completed}/{len(tables_to_process_bronze)}]"
                f"{status_icon} {result['table_name']:<30} {result['status']:<10} "
                f"{(result.get('rows_processed') or 0):>10,} rows {error_snippet}"
                )

bronze_end = datetime.now(timezone.utc)
bronze_duration = float((bronze_end - bronze_start).total_seconds())

logger.info(f"\n✓ Bronze processing completed in {bronze_duration}s")

#sys.exit(0)

2025-12-03 15:07:09,574 [INFO] - 
🔵 BRONZE: Loading parquet to Delta tables...
2025-12-03 15:07:09,575 [INFO] -   Tables: 58
2025-12-03 15:07:09,576 [INFO] -   ⚠️  Force reload enabled - processing all tables
2025-12-03 15:07:09,576 [INFO] - 
  🚀 Processing 58 tables in parallel...

2025-12-03 15:07:10,374 [INFO] - [WORKER_OPTIMIZER] source=anva_meeus, median_rows=3,939,267, last_workers=8, target=8, new_workers=8, best_throughput=189093 throughput (rows/s)
2025-12-03 15:07:10,374 [INFO] - Using MAX_WORKERS=8 for bronze processing
25/12/03 15:07:13 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
2025-12-03 15:07:14,284 [INFO] - [1/58]✓ Dim_Agent                      SUCCESS         3,193 rows 
2025-12-03 15:07:14,286 [INFO] - [2/58]✓ Dim_Collectiviteit             SUCCESS        10,212 rows 
2025-12-03 15:07:14,290 [INFO] - [3/58]✓ Dim_DekkingVariabel            SUCCESS     

## [7] Bronze Logging and Summary

In [30]:
if bronze_results:
    logger.info(f"\n📊 Logging Bronze results...")
        
    # Calculate summary statistics
    success_count = sum(1 for r in bronze_results if r['status'] == 'SUCCESS')
    failed_count = sum(1 for r in bronze_results if r['status'] == 'FAILED')
    empty_count = sum(1 for r in bronze_results if r['status'] == 'EMPTY')
    skipped_count = sum(1 for r in bronze_results if r['status'] == 'SKIPPED')
    
    total_rows = sum(r.get('rows_processed', 0) or 0 for r in bronze_results)
    
    # Performance metrics
    sum_task_seconds = float(sum(r.get('duration_seconds', 0) or 0 for r in bronze_results))
    theoretical_min_sec = float(sum_task_seconds / MAX_WORKERS if MAX_WORKERS > 0 else sum_task_seconds)
    actual_time_sec = bronze_duration # float
    efficiency_pct = float((theoretical_min_sec / actual_time_sec * 100) if actual_time_sec > 0 else 0)
    
    # Failed tables list
    failed_tables = [r['table_name'] for r in bronze_results if r['status'] == 'FAILED']
    
    # Log summary
    bronze_summary = {
    "run_id": RUN_ID,
    "run_date": run_date,
    "run_ts": run_ts,
    "source": source,
    "total_tables": len(bronze_results),
    "tables_success": success_count,
    "tables_empty": empty_count,
    "tables_failed": failed_count,
    "tables_skipped": skipped_count,
    "total_rows": total_rows,
    "workers": MAX_WORKERS,
    "sum_task_seconds": sum_task_seconds,
    "theoretical_min_sec": theoretical_min_sec,
    "actual_time_sec": actual_time_sec,
    "efficiency_pct": efficiency_pct,
    "run_start": bronze_start,
    "run_end": bronze_end,
    "duration_seconds": bronze_duration,
    "error_message": None,
    "failed_tables": failed_tables,
    }

    run_log_id = log_summary(spark, bronze_summary, layer="bronze")

    log_batch(spark, records=bronze_results, layer="bronze", run_log_id=run_log_id)

    
    # Print summary
    logger.info(f"\n  Summary:")
    logger.info(f"    Success: {success_count}")
    logger.info(f"    Failed:  {failed_count}")
    logger.info(f"    Empty:   {empty_count}")
    logger.info(f"    Skipped: {skipped_count}")
    logger.info(f"    Total rows: {total_rows:,}")
    logger.info(f"    Efficiency: {efficiency_pct:.1f}%")
    
    if failed_tables:
        logger.info(f"\n  ⚠️  Failed tables: {failed_tables}")
else:
    logger.info(f"\n  ℹ️  No Bronze results to log")

#sys.exit(0)

2025-12-03 15:07:35,375 [INFO] - 
📊 Logging Bronze results...
2025-12-03 15:07:37,398 [INFO] - ✓ Logged Bronze summary to logs.bronze_run_summary
2025-12-03 15:07:37,944 [INFO] - ✓ Logged 58 Bronze records to logs.bronze_processing_log
2025-12-03 15:07:37,944 [INFO] - 
  Summary:
2025-12-03 15:07:37,945 [INFO] -     Success: 49
2025-12-03 15:07:37,945 [INFO] -     Failed:  0
2025-12-03 15:07:37,945 [INFO] -     Empty:   9
2025-12-03 15:07:37,946 [INFO] -     Skipped: 0
2025-12-03 15:07:37,946 [INFO] -     Total rows: 3,939,267
2025-12-03 15:07:37,946 [INFO] -     Efficiency: 79.5%
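
The efficiency figure compares an idealized lower bound (total task time spread evenly over all workers) with the measured wall-clock time. The sketch below restates that calculation as a standalone function; `parallel_efficiency` is a hypothetical name and the durations are made-up illustration values, not measurements from this run.

```python
def parallel_efficiency(task_seconds, workers, wall_seconds):
    """Return (theoretical_min_sec, efficiency_pct) for a parallel batch.

    theoretical_min_sec assumes the work divides perfectly over the workers;
    efficiency_pct is that bound as a percentage of the actual wall time.
    """
    total = float(sum(task_seconds))
    theoretical_min = total / workers if workers > 0 else total
    efficiency_pct = theoretical_min / wall_seconds * 100 if wall_seconds > 0 else 0.0
    return theoretical_min, efficiency_pct


# Four tasks totalling 16 s on 4 workers, finishing after 8 s of wall time.
theoretical, pct = parallel_efficiency([6, 4, 4, 2], workers=4, wall_seconds=8)
print(f"theoretical minimum: {theoretical:.1f}s, efficiency: {pct:.1f}%")
```

A value well below 100% usually means a few long-running tables dominate the wall time, or that per-task overhead such as scheduling and logging is not captured in the task durations.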


In [31]:
#spark.sql("SHOW TABLES IN logs").show(truncate=False)
#spark.table("logs.bronze_processing_log").printSchema()
#spark.table("logs.bronze_run_summary").printSchema()

#spark.sql("drop table if exists logs.silver_run_summary").show()
#spark.sql("select * from logs.bronze_run_summary order by run_end desc limit 5").show(truncate=False)

#spark.sql("drop table if exists logs.bronze_processing_log").show()
#spark.sql("drop table if exists logs.bronze_run_summary").show()

# spark.sql("drop table if exists logs.silver_processing_log").show()
# spark.sql("drop table if exists logs.silver_run_summary").show()
# spark.table("logs.bronze_run_summary") \
#       .orderBy("run_end", "source") \
#       .show(20, truncate=False)

# spark.table("logs.bronze_processing_log") \
#       .orderBy("run_ts", "table_name") \
#       .show(200, truncate=False)


## [8] Silver Processing (Parallel CDC Merge)

Process tables that have business_keys defined for CDC merge.
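
To make the role of `business_keys` concrete, the sketch below classifies rows into inserts, updates, deletes, and unchanged by comparing a current Silver state with an incoming snapshot. It is plain Python over dictionaries, not the Delta merge performed by `process_silver_cdc_merge`, whose implementation is not shown here. `classify_cdc` is a hypothetical name, and treating missing keys as deletes assumes the incoming data is a full snapshot.

```python
def classify_cdc(current_rows, incoming_rows, business_keys):
    """Split rows into insert/update/delete/unchanged using the business keys."""

    def key_of(row):
        # The business key is the tuple of key column values.
        return tuple(row[k] for k in business_keys)

    current = {key_of(r): r for r in current_rows}
    incoming = {key_of(r): r for r in incoming_rows}

    inserts = [row for key, row in incoming.items() if key not in current]
    updates = [row for key, row in incoming.items() if key in current and current[key] != row]
    unchanged = [row for key, row in incoming.items() if key in current and current[key] == row]
    # Assumes a full snapshot: a key absent from incoming means the row was deleted.
    deletes = [row for key, row in current.items() if key not in incoming]
    return {"insert": inserts, "update": updates, "delete": deletes, "unchanged": unchanged}


silver = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
bronze = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
changes = classify_cdc(silver, bronze, business_keys=["id"])
print({kind: len(rows) for kind, rows in changes.items()})
```

Without business keys there is no way to match an incoming row to an existing one, which is why tables lacking them are excluded from Silver processing.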

In [32]:
# Notebook 20 no longer needed - process_silver_cdc_merge imported from modules
logger.info("✓ Silver CDC merge worker imported from modules (notebook 20 no longer needed)")

2025-12-03 15:07:37,990 [INFO] - ✓ Silver CDC merge worker imported from modules (notebook 20 no longer needed)


In [33]:
logger.info(f"\n🔷 SILVER: CDC merge from Bronze...")

# Filter tables for Silver processing:
# 1. Must have business_keys defined
# 2. Must have been successfully loaded to Bronze

successful_bronze_tables = [r['table_name'] for r in bronze_results if r['status'] == 'SUCCESS']

tables_for_silver = [
    t for t in tables_to_process 
    if t.get('business_keys') and t['name'] in successful_bronze_tables
]

logger.info(f"  Tables with business_keys: {len([t for t in tables_to_process if t.get('business_keys')])}")
logger.info(f"  Successful Bronze loads: {len(successful_bronze_tables)}")
logger.info(f"  Tables to process in Silver: {len(tables_for_silver)}")

silver_results = []

if not tables_for_silver:
    logger.info(f"\n  ℹ️  No tables to process in Silver")
else:
    silver_start = datetime.now(timezone.utc)
    
    logger.info(f"\n  🚀 Processing {len(tables_for_silver)} tables in parallel...\n")
    
    # Wrapper function for parallel execution
    def process_silver_wrapper(table_def):
        """Wrapper to catch exceptions and always return a result."""
        try:
            return process_silver_cdc_merge(
                spark=spark,
                table_def=table_def,
                source_name=source,
                run_id=RUN_ID,
                run_ts=run_ts,
                debug=False
            )
        except Exception as e:
            return {
                "log_id": f"{source}:{table_def['name']}:{run_ts}:silver:error",
                "run_id": RUN_ID,
                "run_ts": run_ts,
                "source": source,
                "table_name": table_def['name'],
                "load_mode": table_def.get('load_mode'),
                "status": "FAILED",
                "rows_inserted": None,
                "rows_updated": None,
                "rows_deleted": None,
                "rows_unchanged": None,
                "total_silver_rows": None,
                "bronze_rows": None,
                "bronze_table": None,
                "silver_table": None,
                "start_time": datetime.now(timezone.utc),
                "end_time": datetime.now(timezone.utc),
                "duration_seconds": 0,
                "error_message": f"Unhandled exception: {str(e)[:500]}",
            }
    
    # Parallel execution
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = {
            executor.submit(process_silver_wrapper, table): table 
            for table in tables_for_silver
        }
        
        completed = 0
        for future in as_completed(futures):
            result = future.result()
            silver_results.append(result)
            completed += 1
            
            status_icon = "✓" if result['status'] == 'SUCCESS' else "✗"
            deletes = result.get('rows_deleted', 0) or 0
            delete_info = f" ({deletes} deleted)" if deletes > 0 else ""
            logger.info(f"    [{completed}/{len(tables_for_silver)}] {status_icon} {result['table_name']:<30} {result['status']:<10}{delete_info}")
            
    
    silver_end = datetime.now(timezone.utc)
    silver_duration = int((silver_end - silver_start).total_seconds())
    
    logger.info(f"\n✓ Silver processing completed in {silver_duration}s")
    #sys.exit(0)

2025-12-03 15:07:38,012 [INFO] - 
🔷 SILVER: CDC merge from Bronze...
2025-12-03 15:07:38,012 [INFO] -   Tables with business_keys: 0
2025-12-03 15:07:38,013 [INFO] -   Successful Bronze loads: 49
2025-12-03 15:07:38,013 [INFO] -   Tables to process in Silver: 0
2025-12-03 15:07:38,013 [INFO] - 
  ℹ️  No tables to process in Silver


## [9] Silver Logging and Summary

In [34]:
if silver_results:
    logger.info(f"\n📊 Logging Silver results...")
    
    # Batch log
    log_batch(spark, records=silver_results, layer="silver")
    
    # Calculate summary
    success_count = sum(1 for r in silver_results if r['status'] == 'SUCCESS')
    failed_count = sum(1 for r in silver_results if r['status'] == 'FAILED')
    skipped_count = sum(1 for r in silver_results if r['status'] == 'SKIPPED')
    
    total_inserts = sum(r.get('rows_inserted', 0) or 0 for r in silver_results)
    total_updates = sum(r.get('rows_updated', 0) or 0 for r in silver_results)
    total_deletes = sum(r.get('rows_deleted', 0) or 0 for r in silver_results)
    total_unchanged = sum(r.get('rows_unchanged', 0) or 0 for r in silver_results)
    
    failed_tables = [r['table_name'] for r in silver_results if r['status'] == 'FAILED']
    
    # Log summary
    silver_summary = {
        "run_id": RUN_ID,
        "source": source,
        "run_ts": run_ts,
        "run_start": silver_start,
        "run_end": silver_end,
        "duration_seconds": silver_duration,
        "total_tables": len(silver_results),
        "tables_success": success_count,
        "tables_failed": failed_count,
        "tables_skipped": skipped_count,
        "total_inserts": total_inserts,
        "total_updates": total_updates,
        "total_deletes": total_deletes,
        "total_unchanged": total_unchanged,
        "failed_tables": failed_tables,
    }
    
    log_summary(spark, summary=silver_summary, layer="silver")
    
    # Print summary
    logger.info(f"\n  Summary:")
    logger.info(f"    Success: {success_count}")
    logger.info(f"    Failed:  {failed_count}")
    logger.info(f"    Skipped: {skipped_count}")
    if total_inserts or total_updates or total_deletes:
        logger.info(f"    CDC: +{total_inserts or 0} ~{total_updates or 0} -{total_deletes}")
    
    if failed_tables:
        logger.info(f"\n  ⚠️  Failed tables: {failed_tables}")
else:
    logger.info(f"\n  ℹ️  No Silver results to log")

2025-12-03 15:07:38,036 [INFO] - 
  ℹ️  No Silver results to log


## [10] Final Summary

In [35]:
total_end = datetime.now(timezone.utc)
total_duration = int((total_end - bronze_start).total_seconds())

logger.info("\n" + "="*80)
logger.info("ORCHESTRATOR SUMMARY")
logger.info("="*80)
logger.info(f"Run ID: {RUN_ID}")
logger.info(f"Source: {source}")
logger.info(f"Run TS: {run_ts}")
logger.info(f"\nTiming:")
logger.info(f"  Bronze: {bronze_duration}s")
if silver_results:
    logger.info(f"  Silver: {silver_duration}s")
logger.info(f"  Total:  {total_duration}s")

logger.info(f"\nBronze Results:")
if bronze_results:
    bronze_success = sum(1 for r in bronze_results if r['status'] == 'SUCCESS')
    bronze_failed = sum(1 for r in bronze_results if r['status'] == 'FAILED')
    logger.info(f"  ✓ Success: {bronze_success}/{len(bronze_results)}")
    if bronze_failed > 0:
        logger.info(f"  ✗ Failed:  {bronze_failed}")
else:
    logger.info(f"  (No processing)")

logger.info(f"\nSilver Results:")
if silver_results:
    silver_success = sum(1 for r in silver_results if r['status'] == 'SUCCESS')
    silver_failed = sum(1 for r in silver_results if r['status'] == 'FAILED')
    logger.info(f"  ✓ Success: {silver_success}/{len(silver_results)}")
    if silver_failed > 0:
        logger.info(f"  ✗ Failed:  {silver_failed}")
else:
    logger.info(f"  (No processing)")

# Overall status (all() over an empty list is True, so no guard is needed)
all_bronze_ok = all(r['status'] in ('SUCCESS', 'EMPTY', 'SKIPPED') for r in bronze_results)
all_silver_ok = all(r['status'] in ('SUCCESS', 'SKIPPED') for r in silver_results)

# Test for NO_WORK first: with no results both flags are trivially True
if not bronze_results and not silver_results:
    overall_status = "NO_WORK"
elif all_bronze_ok and all_silver_ok:
    overall_status = "SUCCESS"
else:
    overall_status = "PARTIAL"

logger.info(f"\nOverall Status: {overall_status}")
logger.info("="*80)

if overall_status == "PARTIAL":
    logger.info(f"\n⚠️  Some tables failed. Check logs for details.")
    logger.info(f"   Use retry_tables parameter to retry specific tables.")
elif overall_status == "NO_WORK":
    logger.info(f"\nℹ️  No tables were processed in this run.")
else:
    logger.info(f"\n✓ All processing completed successfully!")

2025-12-03 15:07:38,059 [INFO] - 
2025-12-03 15:07:38,060 [INFO] - ORCHESTRATOR SUMMARY
2025-12-03 15:07:38,060 [INFO] - Run ID: 20251001T183103260_2dc7d8f3
2025-12-03 15:07:38,061 [INFO] - Source: anva_meeus
2025-12-03 15:07:38,061 [INFO] - Run TS: 20251001T183103260
2025-12-03 15:07:38,061 [INFO] - 
Timing:
2025-12-03 15:07:38,062 [INFO] -   Bronze: 25.792331s
2025-12-03 15:07:38,062 [INFO] -   Total:  28s
2025-12-03 15:07:38,063 [INFO] - 
Bronze Results:
2025-12-03 15:07:38,063 [INFO] -   ✓ Success: 49/58
2025-12-03 15:07:38,063 [INFO] - 
Silver Results:
2025-12-03 15:07:38,064 [INFO] -   (No processing)
2025-12-03 15:07:38,064 [INFO] - 
Overall Status: SUCCESS
2025-12-03 15:07:38,065 [INFO] - 
✓ All processing completed successfully!
