# Phase 1: Best Configuration Selection (Local, Google Colab & Kaggle)

This notebook automates the selection of the best model configuration from MLflow
based on metrics and benchmarking results, then performs final training and model conversion.


## Workflow

**Prerequisites**: Run `01_orchestrate_training_colab.ipynb` first to:
- Train models via HPO
- Run benchmarking on best trials (using `evaluation.benchmarking.benchmark_best_trials`)

Then this notebook:

1. **Best Model Selection**: Query MLflow benchmark runs, join to training runs via grouping tags (`code.study_key_hash`, `code.trial_key_hash`), select best using normalized composite scoring
2. **Artifact Acquisition**: Download the best model's checkpoint using a fallback strategy (local disk → drive restore → MLflow download)
3. **Final Training**: Optionally retrain with best config on full dataset (if not already final training)
4. **Model Conversion**: Convert the final model to ONNX format using canonical path structure
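
As an illustration of the "normalized composite scoring" mentioned in step 1, here is a minimal sketch. It assumes each candidate carries an F1 score and a latency in milliseconds, min-max normalizes both, and combines them with fixed weights. The field names, the weights, and the function names are assumptions made for this example; the actual scoring is configured in `config/best_model_selection.yaml` and may differ.

```python
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(
    candidates: List[Dict[str, float]],
    f1_weight: float = 0.7,       # hypothetical weight
    latency_weight: float = 0.3,  # hypothetical weight
) -> List[float]:
    """Higher is better: reward normalized F1, penalize normalized latency."""
    f1_norm = min_max([c["f1"] for c in candidates])
    lat_norm = min_max([c["latency_ms"] for c in candidates])
    return [
        f1_weight * f1 - latency_weight * lat
        for f1, lat in zip(f1_norm, lat_norm)
    ]


# Made-up candidates, only to exercise the functions
candidates = [
    {"name": "a", "f1": 0.80, "latency_ms": 10.0},
    {"name": "b", "f1": 0.90, "latency_ms": 30.0},
    {"name": "c", "f1": 0.85, "latency_ms": 20.0},
]
scores = composite_scores(candidates)
best = candidates[scores.index(max(scores))]
print(best["name"])
```

Normalizing first keeps metrics with different units (a ratio and a duration) comparable before they are weighted.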


## Important

- This notebook **executes on Local, Google Colab, or Kaggle** (not on Azure ML compute)
- Requires MLflow tracking to be set up (Azure ML workspace or local SQLite)
- All computation happens on the platform's GPU (if available) or CPU
- **Storage & Persistence**:
  - **Local**: Outputs saved to `outputs/` directory in repository root
  - **Google Colab**: Checkpoints are automatically saved to Google Drive for persistence across sessions
  - **Kaggle**: Outputs in `/kaggle/working/` are automatically persisted - no manual backup needed
- The notebook must be **re-runnable end-to-end**
- Uses the dataset path specified in the data config (from `config/data/*.yaml`), typically pointing to a local folder included in the repository
- **Session Management**:
  - **Local**: No session limits, outputs persist in repository
  - **Colab**: Sessions time out after 12-24 hours (depending on Colab plan). Checkpoints are saved to Drive automatically.
  - **Kaggle**: Sessions have time limits based on your plan. All outputs are automatically saved.


## Step 1: Environment Detection

The notebook automatically detects the execution environment (local, Google Colab, or Kaggle) and adapts its behavior accordingly.


In [8]:
import os
from pathlib import Path
# Detect execution environment
IN_COLAB = "COLAB_GPU" in os.environ or "COLAB_TPU" in os.environ
IN_KAGGLE = "KAGGLE_KERNEL_RUN_TYPE" in os.environ
IS_LOCAL = not IN_COLAB and not IN_KAGGLE
# Set platform-specific constants
if IN_COLAB:
    PLATFORM = "colab"
    BASE_DIR = Path("/content")
    BACKUP_ENABLED = True
elif IN_KAGGLE:
    PLATFORM = "kaggle"
    BASE_DIR = Path("/kaggle/working")
    BACKUP_ENABLED = False
else:
    PLATFORM = "local"
    BASE_DIR = None
    BACKUP_ENABLED = False
print(f"✓ Detected environment: {PLATFORM.upper()}")
print(f"Platform: {PLATFORM}")
print(
    f"Base directory: {BASE_DIR if BASE_DIR else 'Current working directory'}")
print(f"Backup enabled: {BACKUP_ENABLED}")


✓ Detected environment: LOCAL
Platform: local
Base directory: Current working directory
Backup enabled: False
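
The detection cell above reads `os.environ` directly, which makes it awkward to check outside each platform. A minimal sketch of the same checks wrapped in a function that takes the environment as a parameter follows; the name `detect_platform_from_env` is hypothetical and is not part of the repository.

```python
from typing import Mapping


def detect_platform_from_env(env: Mapping[str, str]) -> str:
    """Return "colab", "kaggle", or "local" using the same variables as the cell above."""
    if "COLAB_GPU" in env or "COLAB_TPU" in env:
        return "colab"
    if "KAGGLE_KERNEL_RUN_TYPE" in env:
        return "kaggle"
    return "local"


# Simulated environments; the values are placeholders
print(detect_platform_from_env({"KAGGLE_KERNEL_RUN_TYPE": "Interactive"}))
print(detect_platform_from_env({}))
```

In the notebook it would be called as `detect_platform_from_env(os.environ)`.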


### Install Required Packages

Install required packages based on the execution environment.


In [9]:
# Install required packages
if IS_LOCAL:
    print("For local environment, please:")
    print("1. Create conda environment: conda env create -f config/environment/conda.yaml")
    print("2. Activate: conda activate resume-ner-training")
    print("3. Restart kernel after activation")
    print("\nIf you've already done this, you can continue to the next cell.")
    print("\nInstalling Azure ML SDK (required for imports)...")
    # Install Azure ML packages even for local (in case conda env not activated)
    %pip install "azure-ai-ml>=1.0.0" --quiet
    %pip install "azure-identity>=1.12.0" --quiet
    %pip install azureml-defaults --quiet
    %pip install azureml-mlflow --quiet
else:
    # Core ML libraries
    %pip install "transformers>=4.35.0,<5.0.0" --quiet
    %pip install "safetensors>=0.4.0" --quiet
    %pip install "datasets>=2.12.0" --quiet

    # ML utilities
    %pip install "numpy>=1.24.0,<2.0.0" --quiet
    %pip install "pandas>=2.0.0" --quiet
    %pip install "scikit-learn>=1.3.0" --quiet

    # Utilities
    %pip install "pyyaml>=6.0" --quiet
    %pip install "tqdm>=4.65.0" --quiet
    %pip install "seqeval>=1.2.2" --quiet
    %pip install "sentencepiece>=0.1.99" --quiet

    # Experiment tracking
    %pip install mlflow --quiet
    %pip install optuna --quiet

    # Azure ML SDK (required for orchestration imports)
    %pip install "azure-ai-ml>=1.0.0" --quiet
    %pip install "azure-identity>=1.12.0" --quiet
    %pip install azureml-defaults --quiet
    %pip install azureml-mlflow --quiet

    # ONNX support
    %pip install onnxruntime --quiet
    %pip install "onnx>=1.16.0" --quiet
    %pip install "onnxscript>=0.1.0" --quiet

    print("✓ All dependencies installed")


For local environment, please:
1. Create conda environment: conda env create -f config/environment/conda.yaml
2. Activate: conda activate resume-ner-training
3. Restart kernel after activation

If you've already done this, you can continue to the next cell.

Installing Azure ML SDK (required for imports)...
Note: you may need to restart the kernel to use updated packages.


## Step 2: Repository Setup

**Note**: Repository setup is only needed for Colab/Kaggle environments. Local environments should already have the repository cloned.


In [10]:
# Repository setup - only needed for Colab/Kaggle
if not IS_LOCAL:
    if (BASE_DIR / "resume-ner-azureml").exists():
        # Skip the clone on re-runs so the cell stays re-runnable
        print("✓ Repository already present - skipping clone")
    elif IN_KAGGLE:
        !git clone -b gg_final_training_2 https://github.com/longdang193/resume-ner-azureml.git /kaggle/working/resume-ner-azureml
    elif IN_COLAB:
        !git clone -b gg_final_training_2 https://github.com/longdang193/resume-ner-azureml.git /content/resume-ner-azureml
else:
    print("✓ Local environment detected - detecting repository root...")

# Set up paths
if not IS_LOCAL:
    ROOT_DIR = BASE_DIR / "resume-ner-azureml"
else:
    # For local, detect repo root by searching for config/ and src/ directories
    # Start from current working directory and search up
    current_dir = Path.cwd()
    ROOT_DIR = None
    
    # Check current directory first
    if (current_dir / "config").exists() and (current_dir / "src").exists():
        ROOT_DIR = current_dir
    else:
        # Search up the directory tree
        for parent in current_dir.parents:
            if (parent / "config").exists() and (parent / "src").exists():
                ROOT_DIR = parent
                break
    
    if ROOT_DIR is None:
        raise ValueError(
            f"Could not find repository root. Searched from: {current_dir}\n"
            "Please ensure you're running from within the repository or a subdirectory."
        )

CONFIG_DIR = ROOT_DIR / "config"
SRC_DIR = ROOT_DIR / "src"

# Add src to path
import sys
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

print(f"✓ Repository: {ROOT_DIR} (config={CONFIG_DIR.name}, src={SRC_DIR.name})")

# Verify repository structure
required_dirs = [CONFIG_DIR, SRC_DIR]
for dir_path in required_dirs:
    if not dir_path.exists():
        raise ValueError(f"Required directory not found: {dir_path}")
print("✓ Repository structure verified")


✓ Local environment detected - detecting repository root...
✓ Repository: /workspaces/resume-ner-azureml (config=config, src=src)
✓ Repository structure verified
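
The local branch of the setup cell walks up from the working directory until it finds both `config/` and `src/`. The same search can be written as one loop over the starting directory and its parents. This is a sketch only; `find_repo_root` is a hypothetical helper name, and the temporary directory exists just to demonstrate it.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_repo_root(start: Path, markers: Iterable[str] = ("config", "src")) -> Optional[Path]:
    """Return the first directory at or above `start` that contains every marker directory."""
    markers = tuple(markers)
    for candidate in (start, *start.parents):
        if all((candidate / marker).is_dir() for marker in markers):
            return candidate
    return None


# Demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve()
    (repo / "config").mkdir()
    (repo / "src").mkdir()
    nested = repo / "notebooks" / "drafts"
    nested.mkdir(parents=True)
    found_matches = find_repo_root(nested) == repo
print(found_matches)
```

Returning `None` instead of raising leaves the choice of error message to the caller, as the cell above does with its `ValueError`.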


## Step 3: Load Configuration

Load experiment configuration and define experiment naming convention.


In [11]:
from infrastructure.config.loader import load_experiment_config
from common.constants import EXPERIMENT_NAME
from common.shared.yaml_utils import load_yaml
# Note: Still in orchestration.jobs.tracking for now
from orchestration.jobs.tracking.naming.tags_registry import load_tags_registry

# Load experiment config
experiment_config = load_experiment_config(CONFIG_DIR, EXPERIMENT_NAME)

# Load best model selection configs
tags_config = load_tags_registry(CONFIG_DIR)
selection_config = load_yaml(CONFIG_DIR / "best_model_selection.yaml")
conversion_config = load_yaml(CONFIG_DIR / "conversion.yaml")
acquisition_config = load_yaml(CONFIG_DIR / "artifact_acquisition.yaml")

print(f"✓ Loaded configs: experiment={experiment_config.name}, tags, selection, conversion, acquisition")

# Define experiment names (discovery happens after MLflow setup in Step 4)
experiment_name = experiment_config.name
benchmark_experiment_name = f"{experiment_name}-benchmark"
training_experiment_name = f"{experiment_name}-training"  # For final training runs
conversion_experiment_name = f"{experiment_name}-conversion"

print(f"✓ Experiment names: benchmark={benchmark_experiment_name}, training={training_experiment_name}, conversion={conversion_experiment_name}")


✓ Loaded configs: experiment=resume_ner_baseline, tags, selection, conversion, acquisition
✓ Experiment names: benchmark=resume_ner_baseline-benchmark, training=resume_ner_baseline-training, conversion=resume_ner_baseline-conversion
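
The naming convention used in this step is a fixed suffix per stage appended to the base experiment name. A small sketch that derives all three names in one place follows; `derive_experiment_names` is a hypothetical helper, not a function from the repository.

```python
from typing import Dict


def derive_experiment_names(base: str) -> Dict[str, str]:
    """Map each stage to its MLflow experiment name: `{base}-{stage}`."""
    return {stage: f"{base}-{stage}" for stage in ("benchmark", "training", "conversion")}


names = derive_experiment_names("resume_ner_baseline")
print(names["benchmark"])
```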


## Step 4: Setup MLflow

Set up MLflow tracking, falling back to local SQLite tracking if Azure ML is unavailable.


In [12]:
# Check if azureml.mlflow is available
try:
    import azureml.mlflow  # noqa: F401
    print("✓ azureml.mlflow is available - Azure ML tracking will be used if configured")
except ImportError:
    print("⚠ azureml.mlflow is not available - will fall back to local SQLite tracking")
    print("  To use Azure ML tracking, install: pip install azureml-mlflow")
    print("  Then restart the kernel and re-run this cell")

from common.shared.mlflow_setup import setup_mlflow_from_config
import mlflow

# Setup MLflow tracking (use training experiment for setup - actual queries use discovered experiments)
tracking_uri = setup_mlflow_from_config(
    experiment_name=training_experiment_name,
    config_dir=CONFIG_DIR,
    fallback_to_local=True,
)

print(f"✓ MLflow tracking URI: {tracking_uri}")
print(f"✓ MLflow experiment: {training_experiment_name}")

# Discover HPO and benchmark experiments from MLflow (after setup)
from mlflow.tracking import MlflowClient

client = MlflowClient()
all_experiments = client.search_experiments()

# Find HPO experiments (format: {experiment_name}-hpo-{backbone})
hpo_experiments = {}
for exp in all_experiments:
    if exp.name.startswith(f"{experiment_name}-hpo-"):
        backbone = exp.name.replace(f"{experiment_name}-hpo-", "")
        hpo_experiments[backbone] = {
            "name": exp.name,
            "id": exp.experiment_id
        }

# Find benchmark experiment
benchmark_experiment = None
for exp in all_experiments:
    if exp.name == benchmark_experiment_name:
        benchmark_experiment = {
            "name": exp.name,
            "id": exp.experiment_id
        }
        break

hpo_backbones = ", ".join(hpo_experiments.keys())
print(f"✓ Experiments: {len(hpo_experiments)} HPO ({hpo_backbones}), benchmark={'found' if benchmark_experiment else 'not found'}, training={training_experiment_name}, conversion={conversion_experiment_name}")


2026-01-13 16:54:48,248 - common.shared.mlflow_setup - INFO - Azure ML enabled in config, attempting to connect...
2026-01-13 16:54:48,253 - common.shared.mlflow_setup - INFO - Using Service Principal authentication (from config.env)


✓ azureml.mlflow is available - Azure ML tracking will be used if configured


2026-01-13 16:54:48,383 - common.shared.mlflow_setup - INFO - Successfully connected to Azure ML workspace: resume-ner-ws


KeyboardInterrupt: 
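
The discovery loop in the Step 4 cell recovers the backbone from experiment names shaped `{experiment_name}-hpo-{backbone}`. The sketch below does the same on a plain list of names so it can be run without an MLflow server. The helper name `backbones_from_names` and the sample names are assumptions made for illustration.

```python
from typing import Dict, Iterable


def backbones_from_names(experiment_name: str, names: Iterable[str]) -> Dict[str, str]:
    """Map backbone -> full experiment name for names shaped `{experiment_name}-hpo-{backbone}`."""
    prefix = f"{experiment_name}-hpo-"
    return {
        name[len(prefix):]: name
        for name in names
        if name.startswith(prefix) and len(name) > len(prefix)
    }


# Sample experiment names, made up for the example
sample_names = [
    "resume_ner_baseline-hpo-distilbert",
    "resume_ner_baseline-hpo-distilroberta",
    "resume_ner_baseline-benchmark",
]
discovered = backbones_from_names("resume_ner_baseline", sample_names)
print(sorted(discovered))
```

Slicing off the prefix removes only the leading occurrence, whereas `str.replace` would also remove the same substring if it appeared again later in the name.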

## Step 5: Drive Backup Setup (Colab Only)

Set up Google Drive backup/restore for Colab environments.


In [None]:
from pathlib import Path

# Fix numpy/pandas compatibility before importing orchestration modules
try:
    from infrastructure.storage.drive import create_colab_store
except (ValueError, ImportError) as e:
    if "numpy.dtype size changed" in str(e) or "numpy" in str(e).lower():
        print("⚠ Numpy/pandas compatibility issue detected. Fixing...")
        import subprocess
        import sys
        subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "--force-reinstall", "--no-cache-dir", "numpy>=1.24.0,<2.0.0", "pandas>=2.0.0", "--quiet"])
        print("✓ Numpy/pandas reinstalled. Please restart the kernel and re-run this cell.")
        raise RuntimeError("Please restart kernel after numpy/pandas fix")
    else:
        raise

# Mount Google Drive and create backup store (Colab only - Kaggle doesn't need this)
DRIVE_BACKUP_DIR = None
drive_store = None
restore_from_drive = None

if IN_COLAB:
    drive_store = create_colab_store(ROOT_DIR, CONFIG_DIR)
    if drive_store:
        BACKUP_ENABLED = True
        DRIVE_BACKUP_DIR = drive_store.backup_root
        # Create restore function wrapper
        def restore_from_drive(local_path: Path, is_directory: bool = False) -> bool:
            """Restore file/directory from Drive backup."""
            try:
                expect = "dir" if is_directory else "file"
                result = drive_store.restore(local_path, expect=expect)
                return result.ok
            except Exception as e:
                print(f"⚠ Drive restore failed: {e}")
                return False
        print("✓ Google Drive mounted")
        print(f"✓ Backup base directory: {DRIVE_BACKUP_DIR}")
        print(f"\nNote: All outputs/ will be mirrored to: {DRIVE_BACKUP_DIR / 'outputs'}")
    else:
        BACKUP_ENABLED = False
        print("⚠ Warning: Could not mount Google Drive. Backup to Google Drive will be disabled.")
elif IN_KAGGLE:
    print("✓ Kaggle environment detected - outputs are automatically persisted (no Drive mount needed)")
    BACKUP_ENABLED = False
else:
    # Local environment
    print("✓ Local environment detected - outputs will be saved to repository (no Drive backup needed)")
    BACKUP_ENABLED = False


✓ Local environment detected - outputs will be saved to repository (no Drive backup needed)
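
The restore wrapper defined in this step is one link in the checkpoint fallback chain named in the workflow (local disk → drive restore → MLflow download). The sketch below shows one way such an ordered chain could be structured. It is not the implementation behind `acquire_best_model_checkpoint`; `acquire_with_fallback` and the stand-in strategies are hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple

Strategy = Tuple[str, Callable[[], Optional[Path]]]


def acquire_with_fallback(strategies: Sequence[Strategy]) -> Tuple[Optional[str], Optional[Path]]:
    """Try each strategy in order; return the first (name, path) that yields a path."""
    for name, attempt in strategies:
        try:
            path = attempt()
        except Exception as exc:  # one failing source should not stop the chain
            print(f"{name} failed: {exc}")
            continue
        if path is not None:
            return name, path
    return None, None


def failing_download() -> Optional[Path]:
    """Stand-in for a download that errors out."""
    raise RuntimeError("simulated failure")


# Stand-in strategies: local misses, drive succeeds, so mlflow is never tried
strategies = [
    ("local", lambda: None),
    ("drive", lambda: Path("checkpoints/best")),
    ("mlflow", failing_download),
]
source, checkpoint = acquire_with_fallback(strategies)
print(source)
```

Ordering the strategies from cheapest to most expensive means a network download is attempted only when the local and Drive copies are both missing.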


## Step 6: Optional - Run Benchmarking on Champions

**Optional Step**: If you haven't run benchmarking in `01_orchestrate_training_colab.ipynb`, you can run it here before selecting the best model. This step will:
1. Select champions (best trials) from HPO runs using Phase 2 selection logic
2. Run benchmarking on each champion to measure inference performance
3. Save benchmark results to MLflow for use in Step 7

**Note**: If benchmark runs already exist in MLflow, you can skip this step and proceed directly to Step 7.
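
Skipping champions that already have benchmark runs amounts to comparing a stable key per champion against the keys already recorded. The sketch below illustrates that idea under the assumption that the key is a hash of the trial hash plus the data and evaluation fingerprints. The key layout and both function names are hypothetical; the repository's `filter_missing_benchmarks` may build its keys differently.

```python
import hashlib
from typing import Dict, Set


def benchmark_key(trial_key_hash: str, data_fp: str, eval_fp: str) -> str:
    """Hypothetical key: a short hash over the trial hash and both fingerprints."""
    payload = "|".join((trial_key_hash, data_fp, eval_fp))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def champions_missing_benchmarks(
    champions: Dict[str, Dict[str, str]],
    existing_keys: Set[str],
    data_fp: str,
    eval_fp: str,
) -> Dict[str, Dict[str, str]]:
    """Keep only champions whose benchmark key is not already recorded."""
    return {
        backbone: champion
        for backbone, champion in champions.items()
        if benchmark_key(champion["trial_key_hash"], data_fp, eval_fp) not in existing_keys
    }


# Sample data; hashes and fingerprints are made up for the example
champions = {
    "distilbert": {"trial_key_hash": "aaa111"},
    "deberta": {"trial_key_hash": "bbb222"},
}
already_done = {benchmark_key("aaa111", "data-v1", "eval-v1")}
todo = champions_missing_benchmarks(champions, already_done, "data-v1", "eval-v1")
print(sorted(todo))
```

Because the fingerprints are part of the key, a change to the data or evaluation config would make every champion count as unbenchmarked again.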


In [None]:
# Optional: Run benchmarking on champions if not already done
# Skip this cell if benchmark runs already exist in MLflow

RUN_BENCHMARKING = True  # Set to False to skip benchmarking

if RUN_BENCHMARKING:
    from evaluation.selection.trial_finder import select_champions_for_backbones
    from evaluation.benchmarking.orchestrator import (
        benchmark_champions,
        filter_missing_benchmarks,
    )
    from infrastructure.naming.mlflow.hpo_keys import (
        compute_data_fingerprint,
        compute_eval_fingerprint,
    )
    from common.shared.platform_detection import detect_platform
    from common.shared.yaml_utils import load_yaml
    from mlflow.tracking import MlflowClient
    from infrastructure.naming.experiments import build_mlflow_experiment_name
    from orchestration import STAGE_HPO
    from orchestration.jobs.tracking.mlflow_tracker import MLflowBenchmarkTracker
    
    print("🔄 Running benchmarking on champions...")
    
    # Step 1: Load configs and setup MLflow client
    from infrastructure.config.loader import load_experiment_config, load_all_configs
    
    selection_config = load_yaml(CONFIG_DIR / "best_model_selection.yaml")
    benchmark_config = load_yaml(CONFIG_DIR / "benchmark.yaml")
    
    # Load all configs using the standard loader (consistent with other notebooks)
    experiment_config = load_experiment_config(CONFIG_DIR, experiment_name)
    configs = load_all_configs(experiment_config)
    data_config = configs.get("data", {})
    hpo_config = configs.get("hpo", {})
    mlflow_client = MlflowClient()
    
    # Step 2: Build HPO experiments dict (backbone -> {name, id})
    hpo_experiments = {}
    for exp in mlflow_client.search_experiments():
        if exp.name.startswith(f"{experiment_name}-hpo-"):
            backbone = exp.name.replace(f"{experiment_name}-hpo-", "")
            hpo_experiments[backbone] = {
                "name": exp.name,
                "id": exp.experiment_id
            }
    
    if not hpo_experiments:
        print("⚠ No HPO experiments found. Skipping benchmarking.")
    else:
        # Step 3: Select champions per backbone (Phase 2)
        backbone_values = list(hpo_experiments.keys())
        print(f"✓ Found {len(hpo_experiments)} HPO experiment(s)")
        print("🏆 Selecting champions per backbone...")
        
        champions = select_champions_for_backbones(
            backbone_values=backbone_values,
            hpo_experiments=hpo_experiments,
            selection_config=selection_config,
            mlflow_client=mlflow_client,
        )
        
        if not champions:
            print("⚠ No champions found. Skipping benchmarking.")

            # Add diagnostics to help debug
            print("\n🔍 Diagnostics:")
            from infrastructure.naming.mlflow.tags_registry import load_tags_registry
            tags_registry = load_tags_registry(CONFIG_DIR)
            
            for backbone, exp_info in hpo_experiments.items():
                backbone_name = backbone.split("-")[0] if "-" in backbone else backbone
                runs = mlflow_client.search_runs(
                    experiment_ids=[exp_info["id"]],
                    filter_string="",
                    max_results=100,
                )
                finished_runs = [r for r in runs if r.info.status == "FINISHED"]
                print(f"\n  {backbone}: {len(finished_runs)} finished run(s)")
                
                # Check for required tags for champion selection
                if finished_runs:
                    # Separate parent and child runs
                    # Child runs: have mlflow.parentRunId tag
                    # Parent runs: don't have mlflow.parentRunId tag
                    child_runs = [r for r in finished_runs if r.data.tags.get("mlflow.parentRunId")]
                    parent_run_ids = {r.data.tags.get("mlflow.parentRunId") for r in child_runs if r.data.tags.get("mlflow.parentRunId")}
                    parent_runs = [r for r in finished_runs if r.info.run_id in parent_run_ids or not r.data.tags.get("mlflow.parentRunId")]
                    
                    print(f"    Parent runs: {len(parent_runs)}, Child runs: {len(child_runs)}")
                    
                    # Check child runs (what select_champion_per_backbone queries)
                    if child_runs:
                        sample_child = child_runs[0]
                        tags = sample_child.data.tags
                        stage_tag = tags_registry.key("process", "stage")
                        study_key_tag = tags_registry.key("grouping", "study_key_hash")
                        trial_key_tag = tags_registry.key("grouping", "trial_key_hash")
                        schema_tag = tags_registry.key("study", "key_schema_version")
                        
                        print(f"    Sample child run:")
                        print(f"      - stage: {tags.get(stage_tag, 'missing')}")
                        print(f"      - study_key_hash: {'present' if tags.get(study_key_tag) else 'missing'}")
                        print(f"      - trial_key_hash: {'present' if tags.get(trial_key_tag) else 'missing'}")
                        print(f"      - schema_version: {tags.get(schema_tag, 'missing')}")
                    
                    # Check parent runs (where Phase 2 tags should be)
                    if parent_runs:
                        sample_parent = parent_runs[0]
                        tags = sample_parent.data.tags
                        schema_tag = tags_registry.key("study", "key_schema_version")
                        data_fp_tag = tags_registry.key("fingerprint", "data")
                        eval_fp_tag = tags_registry.key("fingerprint", "eval")
                        study_key_tag = tags_registry.key("grouping", "study_key_hash")
                        
                        print(f"    Sample parent run:")
                        print(f"      - schema_version: {tags.get(schema_tag, 'missing')}")
                        print(f"      - data_fp: {'present' if tags.get(data_fp_tag) else 'missing'}")
                        print(f"      - eval_fp: {'present' if tags.get(eval_fp_tag) else 'missing'}")
                        print(f"      - study_key_hash: {'present' if tags.get(study_key_tag) else 'missing'}")
            
            print("\n💡 Troubleshooting tips:")
            print("  1. **Artifact filter issue**: If you see 'Artifact filter removed X runs',")
            print("     the runs don't have 'code.artifact.available' tag set to 'true'.")
            print("     Options:")
            print("     a) Set require_artifact_available: false in config/best_model_selection.yaml")
            print("     b) Set code.artifact.available='true' tag on the runs (if artifacts exist)")
            print("  2. Ensure HPO runs have Phase 2 tags set (schema_version, fingerprints, etc.)")
            print("  3. Check that runs meet minimum trial requirements (min_trials_per_group in selection config)")
            print("  4. Check selection config in config/best_model_selection.yaml")
            print("\n   You can still proceed to Step 7 if benchmark runs already exist from notebook 01.")
        else:
            # Step 3.2: Extract fingerprints for benchmark key building (Phase 3)
            from infrastructure.tracking.mlflow.hash_utils import derive_eval_config
            
            data_fp = compute_data_fingerprint(data_config)
            # Derive eval_config consistently using centralized utility
            train_config = configs.get("train", {})
            eval_config = derive_eval_config(train_config, hpo_config)
            eval_fp = compute_eval_fingerprint(eval_config)
            
            # Step 3.3: Filter missing benchmarks (Phase 3 idempotency)
            benchmark_experiment_name = f"{experiment_name}-benchmark"
            benchmark_experiment = None
            for exp in mlflow_client.search_experiments():
                if exp.name == benchmark_experiment_name:
                    benchmark_experiment = {
                        "name": exp.name,
                        "id": exp.experiment_id
                    }
                    break
            
            if not benchmark_experiment:
                # Create benchmark experiment if it doesn't exist
                benchmark_experiment_id = mlflow_client.create_experiment(benchmark_experiment_name)
                benchmark_experiment = {
                    "name": benchmark_experiment_name,
                    "id": benchmark_experiment_id
                }
            
            # Get run mode for idempotency check
            from evaluation.benchmarking.orchestrator import get_benchmark_run_mode
            run_mode = get_benchmark_run_mode(benchmark_config, hpo_config)
            
            champions_to_benchmark = filter_missing_benchmarks(
                champions=champions,
                benchmark_experiment=benchmark_experiment,
                benchmark_config=benchmark_config,
                data_fingerprint=data_fp,
                eval_fingerprint=eval_fp,
                root_dir=ROOT_DIR,
                environment=detect_platform(),
                mlflow_client=mlflow_client,
                run_mode=run_mode,
            )
            
            skipped_count = len(champions) - len(champions_to_benchmark)
            if skipped_count > 0:
                print(f"⏭️  Skipping {skipped_count} already-benchmarked champion(s)")
            
            # Step 3.4: Benchmark only missing champions (Phase 3)
            if champions_to_benchmark:
                print(f"\n📊 Benchmarking {len(champions_to_benchmark)} champion(s)...")
                
                # Setup test data path (matching notebook 01's logic)
                from pathlib import Path
                test_data_path = None
                
                # First: check benchmark config for explicit test_data path
                if benchmark_config.get("benchmarking", {}).get("test_data"):
                    test_data_path = Path(benchmark_config["benchmarking"]["test_data"])
                    if not test_data_path.is_absolute():
                        test_data_path = CONFIG_DIR / test_data_path
                else:
                    # Fallback: use data config's local_path (matching notebook 01)
                    if data_config.get("local_path"):
                        local_path_str = data_config.get("local_path", "../dataset")
                        dataset_path = (CONFIG_DIR / local_path_str).resolve()
                        
                        # Handle seed subdirectory for dataset_tiny (matching notebook 01)
                        seed = data_config.get("seed")
                        if seed is not None and "dataset_tiny" in str(dataset_path):
                            dataset_path = dataset_path / f"seed{seed}"
                        
                        # Try test.json in dataset directory
                        test_candidates = [
                            dataset_path / "test.json",
                            dataset_path / "validation.json",
                        ]
                        for path in test_candidates:
                            if path.exists():
                                test_data_path = path
                                break
                    
                    # Final fallback: try common locations relative to config
                    if not test_data_path:
                        possible_paths = [
                            CONFIG_DIR / "dataset" / "test.json",
                            CONFIG_DIR / "dataset" / "validation.json",
                        ]
                        for path in possible_paths:
                            if path.exists():
                                test_data_path = path
                                break
                
                if test_data_path and test_data_path.exists():
                    # Setup benchmark tracker
                    benchmark_tracker = MLflowBenchmarkTracker(benchmark_experiment_name)
                    
                    # Extract benchmark config parameters
                    benchmark_params = benchmark_config.get("benchmarking", {})
                    benchmark_batch_sizes = benchmark_params.get("batch_sizes", [1])
                    benchmark_iterations = benchmark_params.get("iterations", 10)
                    benchmark_warmup = benchmark_params.get("warmup_iterations", 10)
                    benchmark_max_length = benchmark_params.get("max_length", 512)
                    benchmark_device = benchmark_params.get("device")
                    
                    # Acquire checkpoints for champions (needed for benchmarking)
                    from evaluation.selection.artifact_acquisition import acquire_best_model_checkpoint
                    acquisition_config = load_yaml(CONFIG_DIR / "artifact_acquisition.yaml")
                    
                    # Acquire checkpoints for champions before benchmarking
                    # Phase 3: benchmark_champions() expects checkpoint_path to be set
                    # and uses all champion data (run_ids, hashes) directly (no redundant lookups)
                    for backbone, champion_data in champions_to_benchmark.items():
                        champion = champion_data["champion"]
                        run_id = champion.get("run_id")
                        refit_run_id = champion.get("refit_run_id")
                        trial_run_id = champion.get("trial_run_id")
                        
                        sweep_run_id = champion.get("sweep_run_id")  # Optional: parent HPO run_id
                        if not run_id:
                            continue
                        
                        # Acquire checkpoint using single source of truth
                        # Note: All champion data (run_ids, hashes) will be passed to benchmark_champions()
                        # which uses them directly without redundant MLflow lookups (Phase 3 optimization)
                        best_run_info = {
                            "run_id": refit_run_id or run_id,
                            "refit_run_id": refit_run_id,
                            "trial_run_id": trial_run_id,
                            "sweep_run_id": sweep_run_id,  # Optional: parent HPO run_id
                            "study_key_hash": champion.get("study_key_hash"),
                            "trial_key_hash": champion.get("trial_key_hash"),
                            "backbone": backbone,
                        }
                        
                        checkpoint_dir = acquire_best_model_checkpoint(
                            best_run_info=best_run_info,
                            root_dir=ROOT_DIR,
                            config_dir=CONFIG_DIR,
                            acquisition_config=acquisition_config,
                            selection_config=selection_config,
                            platform=PLATFORM,
                            restore_from_drive=restore_from_drive if "restore_from_drive" in locals() else None,
                            drive_store=drive_store if "drive_store" in locals() else None,
                            in_colab=IN_COLAB,
                        )
                        
                        # Update champion with checkpoint path
                        champion["checkpoint_path"] = Path(checkpoint_dir) if checkpoint_dir else None
                    
                    # Filter out champions without checkpoints
                    champions_to_benchmark = {
                        k: v for k, v in champions_to_benchmark.items()
                        if v["champion"].get("checkpoint_path")
                    }
                    
                    if champions_to_benchmark:
                        benchmark_results = benchmark_champions(
                            champions=champions_to_benchmark,
                            test_data_path=test_data_path,
                            root_dir=ROOT_DIR,
                            environment=detect_platform(),
                            data_config=data_config,
                            hpo_config=hpo_config,
                            benchmark_config=benchmark_config,
                            benchmark_experiment=benchmark_experiment,
                            benchmark_batch_sizes=benchmark_batch_sizes,
                            benchmark_iterations=benchmark_iterations,
                            benchmark_warmup=benchmark_warmup,
                            benchmark_max_length=benchmark_max_length,
                            benchmark_device=benchmark_device,
                            benchmark_tracker=benchmark_tracker,
                            backup_enabled=BACKUP_ENABLED,
                            backup_to_drive=restore_from_drive if "restore_from_drive" in locals() else None,
                            ensure_restored_from_drive=restore_from_drive if "restore_from_drive" in locals() else None,
                            mlflow_client=mlflow_client,
                        )
                        
                        print(f"\n✓ Benchmarking complete. Results saved to MLflow experiment: {benchmark_experiment_name}")
                    else:
                        print("⚠ No champions with checkpoints available for benchmarking.")
                else:
                    print("⚠ Test data not found. Skipping benchmarking.")
                    print(f"   Tried paths:")
                    if benchmark_config.get("benchmarking", {}).get("test_data"):
                        print(f"     - {Path(benchmark_config['benchmarking']['test_data'])}")
                    if data_config.get("local_path"):
                        local_path_str = data_config.get("local_path", "../dataset")
                        dataset_path = (CONFIG_DIR / local_path_str).resolve()
                        seed = data_config.get("seed")
                        if seed is not None and "dataset_tiny" in str(dataset_path):
                            dataset_path = dataset_path / f"seed{seed}"
                        print(f"     - {dataset_path / 'test.json'}")
                        print(f"     - {dataset_path / 'validation.json'}")
                    print(f"     - {CONFIG_DIR / 'dataset' / 'test.json'}")
                    print(f"     - {CONFIG_DIR / 'dataset' / 'validation.json'}")
                    print("   💡 Tip: Set 'benchmarking.test_data' in config/benchmark.yaml to specify exact path")
            else:
                print("✓ All champions already benchmarked - nothing to do!")
else:
    print("⏭ Skipping benchmarking (RUN_BENCHMARKING=False).")
    print("   If benchmark runs don't exist, set RUN_BENCHMARKING=True or run benchmarking in notebook 01.")


🔄 Running benchmarking on champions...
✓ Found 2 HPO experiment(s)
🏆 Selecting champions per backbone...


2026-01-13 16:47:58,559 - evaluation.selection.trial_finder - INFO - No runs found with stage='hpo_trial' for distilbert, trying legacy stage='hpo'
2026-01-13 16:47:58,766 - evaluation.selection.trial_finder - INFO - Found 3 runs with stage tag for distilbert (backbone=distilbert)
2026-01-13 16:47:58,766 - evaluation.selection.trial_finder - INFO - Filtered out 1 parent run(s) (only child/trial runs have metrics). 2 child runs remaining.
2026-01-13 16:47:58,767 - evaluation.selection.trial_finder - INFO - Grouped runs for distilbert: 1 v1 group(s), 0 v2 group(s)
2026-01-13 16:47:58,770 - evaluation.selection.trial_finder - INFO - Found 1 eligible group(s) for distilbert (0 skipped due to min_trials requirement)
2026-01-13 16:47:59,322 - evaluation.selection.trial_finder - INFO - Found refit run d2367a4f-eba... for champion trial dd68676b-776... (selected latest from 1 refit run(s))
2026-01-13 16:47:59,515 - evaluation.selection.trial_finder - INFO - No runs found with stage='hpo_trial'


📊 Benchmarking 1 champion(s)...


Downloading artifacts: 100%|██████████| 1/1 [00:44<00:00, 44.73s/it]
2026-01-13 16:48:45,653 - evaluation.selection.artifact_acquisition - INFO - [ACQUISITION] Successfully downloaded checkpoint from run d2367a4f-eba...
2026-01-13 16:48:48,886 - evaluation.benchmarking.orchestrator - INFO - Benchmarking distilbert (973c6e7b3da2a400)...
2026-01-13 16:48:48,887 - evaluation.benchmarking.orchestrator - INFO - [BENCHMARK] Final run IDs: trial=dd68676b-776..., refit=d2367a4f-eba..., sweep=None...
2026-01-13 16:48:48,888 - evaluation.benchmarking.utils - INFO - Running benchmark script: /opt/conda/envs/resume-ner-training/bin/python -u /workspaces/resume-ner-azureml/src/evaluation/benchmarking/cli.py --checkpoint /workspaces/resume-ner-azureml/outputs/best_model_selection/local/distilbert/sel_28b7f255_973c6e7b/best_trial_checkpoint.tar.gz/best_trial_checkpoint --test-data /workspaces/resume-ner-azureml/dataset_tiny/seed0/test.json --batch-sizes 1 --iterations 10 --warmup 

Loaded 2 test texts
Starting benchmark for checkpoint: /workspaces/resume-ner-azureml/outputs/best_model_selection/local/distilbert/sel_28b7f255_973c6e7b/best_trial_checkpoint.tar.gz/best_trial_checkpoint
Loading tokenizer from /workspaces/resume-ner-azureml/outputs/best_model_selection/local/distilbert/sel_28b7f255_973c6e7b/best_trial_checkpoint.tar.gz/best_trial_checkpoint...
Tokenizer loaded.
Loading model from /workspaces/resume-ner-azureml/outputs/best_model_selection/local/distilbert/sel_28b7f255_973c6e7b/best_trial_checkpoint.tar.gz/best_trial_checkpoint...
Moving model to cpu...
Model loaded and set to eval mode.
Model ready on device: cpu

Benchmarking batch size 1...
  Running 10 warmup iterations, then 10 measurement iterations...
    Warmup: 10 iterations... 10/10 done.
    Measurement: 10 iterations... 10/10 done.
  Mean latency: 169.87 ms
  P95 latency: 177.41 ms
  Throughput: 5.89 docs/sec

Saving results to /workspaces/resume-ner-azureml/outputs/benchmarking/local/disti

2026-01-13 16:49:11,505 - evaluation.benchmarking.utils - INFO - [Benchmark Run Name] Building run name: trial_id=973c6e7b3da2a400, root_dir=/workspaces/resume-ner-azureml, config_dir=/workspaces/resume-ner-azureml/config
2026-01-13 16:49:11,531 - infrastructure.naming.mlflow.config - INFO - [Auto-Increment Config] Loading from config_dir=/workspaces/resume-ner-azureml/config, raw_auto_inc_config={'enabled': True, 'processes': {'hpo': True, 'benchmarking': True}, 'format': '{base}.{version}'}
2026-01-13 16:49:11,531 - infrastructure.naming.mlflow.config - INFO - [Auto-Increment Config] Validated config: {'enabled': True, 'processes': {'hpo': True, 'benchmarking': True}, 'format': '{base}.{version}'}, process_type=benchmarking
2026-01-13 16:49:11,532 - orchestration.jobs.tracking.index.version_counter - INFO - [Reserve Version] Starting reservation: counter_key=resume-ner:benchmarking:47c10fbe6cb075fe409f85d70cdde9c16820..., root_dir=/workspaces/resume-ner-azureml, config_dir=/workspace

🏃 View run local_distilbert_benchmark_study-28b7f255_trial-973c6e7b_bench-8f19ff84_1 at: https://germanywestcentral.api.azureml.ms/mlflow/v2.0/subscriptions/50c06ef8-627b-46d5-b779-d07c9b398f75/resourceGroups/resume_ner_2026-01-02-16-47-05/providers/Microsoft.MachineLearningServices/workspaces/resume-ner-ws/#/experiments/29716cbc-2f1e-485a-87be-3ef5c2f931dd/runs/6e95634e-b833-45ff-b40a-9560bd5ca7d0
🧪 View experiment at: https://germanywestcentral.api.azureml.ms/mlflow/v2.0/subscriptions/50c06ef8-627b-46d5-b779-d07c9b398f75/resourceGroups/resume_ner_2026-01-02-16-47-05/providers/Microsoft.MachineLearningServices/workspaces/resume-ner-ws/#/experiments/29716cbc-2f1e-485a-87be-3ef5c2f931dd


2026-01-13 16:49:15,589 - evaluation.benchmarking.orchestrator - INFO - Benchmark completed: /workspaces/resume-ner-azureml/outputs/benchmarking/local/distilbert/study-28b7f255/trial-973c6e7b/bench-8f19ff84/benchmark.json
2026-01-13 16:49:15,589 - evaluation.benchmarking.orchestrator - INFO - Benchmarking complete. 1/1 trials benchmarked.



✓ Benchmarking complete. Results saved to MLflow experiment: resume_ner_baseline-benchmark


## Step 7: Best Model Selection

Query MLflow benchmark runs (created by `01_orchestrate_training_colab.ipynb` or Step 6 above using `evaluation.benchmarking.benchmark_best_trials`), join to training runs via grouping tags, and select the best model using normalized composite scoring.

**Note**: Benchmark runs must exist in MLflow before running this step. If no benchmark runs are found, either:
- Set `RUN_BENCHMARKING=True` in Step 6 above, or
- Go back to `01_orchestrate_training_colab.ipynb` and run the benchmarking step.
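
The join-and-score idea can be pictured with a small, self-contained sketch. It is not the implementation of `find_best_model_from_mlflow`: the metric names (`macro_f1`, `latency_ms`), the weights, and the min-max normalization are assumptions chosen for illustration. Only the join key, the `(study_key_hash, trial_key_hash)` pair, comes from the description above.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (study_key_hash, trial_key_hash)


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to 0.0 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_best(
    benchmark_runs: Dict[Key, Dict[str, float]],
    training_runs: Dict[Key, Dict[str, float]],
    f1_weight: float = 0.7,       # hypothetical weight
    latency_weight: float = 0.3,  # hypothetical weight
) -> Optional[Tuple[Key, float]]:
    """Join on the hash pair, then rank by a normalized composite score."""
    # Inner join: keep only trials present in both experiments.
    keys = sorted(benchmark_runs.keys() & training_runs.keys())
    if not keys:
        return None  # no overlapping hash pairs, so nothing can be selected

    f1 = min_max([training_runs[k]["macro_f1"] for k in keys])
    latency = min_max([benchmark_runs[k]["latency_ms"] for k in keys])

    # Higher F1 is better, lower latency is better, so latency is inverted.
    scores = [
        f1_weight * f + latency_weight * (1.0 - lat)
        for f, lat in zip(f1, latency)
    ]
    best_index = max(range(len(keys)), key=scores.__getitem__)
    return keys[best_index], scores[best_index]


benchmark = {
    ("s1", "t1"): {"latency_ms": 170.0},
    ("s1", "t2"): {"latency_ms": 90.0},
    ("s9", "t9"): {"latency_ms": 10.0},  # no matching training run, dropped by the join
}
training = {
    ("s1", "t1"): {"macro_f1": 0.80},
    ("s1", "t2"): {"macro_f1": 0.70},
}
best_key, best_score = select_best(benchmark, training)
print(best_key, round(best_score, 3))
```

Normalizing each metric before weighting keeps a metric with a large numeric range (latency in milliseconds) from dominating one bounded in [0, 1] (F1). An empty join returns `None`, which corresponds to the situation the diagnostics in the next cell report as a hash mismatch.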


In [None]:
from selection.mlflow_selection import find_best_model_from_mlflow
from selection.artifact_acquisition import acquire_best_model_checkpoint
from pathlib import Path
from typing import Optional, Callable, Dict, Any

# Validate experiments
if benchmark_experiment is None:
    raise ValueError(f"Benchmark experiment '{benchmark_experiment_name}' not found. Run benchmark jobs first.")
if not hpo_experiments:
    raise ValueError("No HPO experiments found. Run HPO jobs first.")

# Check if we should reuse cached selection
run_mode = selection_config.get("run", {}).get("mode", "reuse_if_exists")
best_model = None
cache_data = None

print(f"\n📋 Best Model Selection Mode: {run_mode}")

if run_mode == "reuse_if_exists":
    from selection.cache import load_cached_best_model

    tracking_uri = mlflow.get_tracking_uri()
    cache_data = load_cached_best_model(
        root_dir=ROOT_DIR,
        config_dir=CONFIG_DIR,
        experiment_name=experiment_name,
        selection_config=selection_config,
        tags_config=tags_config,
        benchmark_experiment_id=benchmark_experiment["id"],
        tracking_uri=tracking_uri,
    )

    if cache_data:
        best_model = cache_data["best_model"]
        # Success message already printed by load_cached_best_model
    else:
        print("\nℹ Cache not available or invalid - will query MLflow for fresh selection")
elif run_mode == "force_new":
    print("  Mode is 'force_new' - skipping cache, querying MLflow...")
else:
    print(f"  ⚠ Unknown run mode '{run_mode}', defaulting to querying MLflow...")

if best_model is None:
    # Find best model
    best_model = find_best_model_from_mlflow(
        benchmark_experiment=benchmark_experiment,
        hpo_experiments=hpo_experiments,
        tags_config=tags_config,
        selection_config=selection_config
    )

    if best_model is None:
        # Provide diagnostic information
        from mlflow.tracking import MlflowClient
        from infrastructure.naming.mlflow.tags_registry import load_tags_registry

        client = MlflowClient()
        tags_registry = load_tags_registry(CONFIG_DIR)
        study_key_tag = tags_registry.key("grouping", "study_key_hash")
        trial_key_tag = tags_registry.key("grouping", "trial_key_hash")

        # Check benchmark experiment
        benchmark_runs = client.search_runs(
            experiment_ids=[benchmark_experiment["id"]],
            filter_string="",
            max_results=100,
        )
        finished_benchmark_runs = [r for r in benchmark_runs if r.info.status == "FINISHED"]

        # Check HPO experiments
        hpo_run_counts = {}
        hpo_trial_runs = []
        hpo_refit_runs = []
        stage_tag = tags_registry.key("process", "stage")

        for backbone, exp_info in hpo_experiments.items():
            hpo_runs = client.search_runs(
                experiment_ids=[exp_info["id"]],
                filter_string="",
                max_results=100,
            )
            finished_hpo_runs = [r for r in hpo_runs if r.info.status == "FINISHED"]
            hpo_run_counts[backbone] = len(finished_hpo_runs)

            # Separate trial and refit runs
            for run in finished_hpo_runs:
                stage = run.data.tags.get(stage_tag, "")
                if stage in ("hpo", "hpo_trial"):
                    hpo_trial_runs.append(run)
                elif stage == "hpo_refit":
                    hpo_refit_runs.append(run)

        def _hash_pairs(runs):
            """Collect unique (study_hash, trial_hash) pairs from runs that carry both tags."""
            pairs = set()
            for run in runs:
                study_hash = run.data.tags.get(study_key_tag)
                trial_hash = run.data.tags.get(trial_key_tag)
                if study_hash and trial_hash:
                    pairs.add((study_hash, trial_hash))
            return pairs

        # Unique hash pairs per run category (benchmark, HPO trial, HPO refit)
        benchmark_pairs = _hash_pairs(finished_benchmark_runs)
        hpo_trial_pairs = _hash_pairs(hpo_trial_runs)
        hpo_refit_pairs = _hash_pairs(hpo_refit_runs)

        # Find matching pairs
        matching_pairs = benchmark_pairs & hpo_trial_pairs

        error_msg = (
            "Could not find best model from MLflow.\n\n"
            "Diagnostics:\n"
            f"  - Benchmark experiment '{benchmark_experiment['name']}': "
            f"{len(finished_benchmark_runs)} finished run(s) found\n"
            f"    - Unique (study_hash, trial_hash) pairs: {len(benchmark_pairs)}\n"
        )

        if hpo_run_counts:
            error_msg += "  - HPO experiments:\n"
            for backbone, count in hpo_run_counts.items():
                error_msg += f"    - {backbone}: {count} finished run(s) found\n"
            error_msg += (
                f"    - HPO trial runs: {len(hpo_trial_runs)} with {len(hpo_trial_pairs)} unique (study_hash, trial_hash) pairs\n"
                f"    - HPO refit runs: {len(hpo_refit_runs)} with {len(hpo_refit_pairs)} unique (study_hash, trial_hash) pairs\n"
            )

        error_msg += (
            f"\n  - Matching pairs: {len(matching_pairs)} out of {len(benchmark_pairs)} benchmark pairs\n"
        )

        if len(matching_pairs) == 0 and len(benchmark_pairs) > 0 and len(hpo_trial_pairs) > 0:
            # Show sample hashes for debugging
            error_msg += "\n  Sample benchmark (study_hash, trial_hash) pairs:\n"
            for i, (s, t) in enumerate(list(benchmark_pairs)[:3]):
                error_msg += f"    {i+1}. study={s[:16]}..., trial={t[:16]}...\n"

            error_msg += "\n  Sample HPO trial (study_hash, trial_hash) pairs:\n"
            for i, (s, t) in enumerate(list(hpo_trial_pairs)[:3]):
                error_msg += f"    {i+1}. study={s[:16]}..., trial={t[:16]}...\n"

            error_msg += (
                "\n  ⚠️  Hash mismatch detected! This usually means:\n"
                "     - Benchmark runs were created from different trials than current HPO runs\n"
                "     - Study or trial hashes changed between runs (e.g., Phase 2 migration)\n"
                "     - Solution: Re-run benchmarking on champions (Step 6) to create new benchmark runs\n"
            )

        error_msg += (
            "\nPossible causes:\n"
            "  1. No benchmark runs have been executed yet. Run benchmark jobs first.\n"
            "  2. Benchmark runs exist but are missing required metrics or grouping tags.\n"
            "  3. HPO runs exist but are missing required metrics or grouping tags.\n"
            "  4. No matching runs found between benchmark and HPO experiments (hash mismatch).\n"
            "\nCheck the logs above for detailed information about what was found."
        )

        raise ValueError(error_msg)

    # Save to cache
    from selection.cache import save_best_model_cache

    tracking_uri = mlflow.get_tracking_uri()
    inputs_summary = {}

    timestamped_file, latest_file, index_file = save_best_model_cache(
        root_dir=ROOT_DIR,
        config_dir=CONFIG_DIR,
        best_model=best_model,
        experiment_name=experiment_name,
        selection_config=selection_config,
        tags_config=tags_config,
        benchmark_experiment=benchmark_experiment,
        hpo_experiments=hpo_experiments,
        tracking_uri=tracking_uri,
        inputs_summary=inputs_summary,
    )
    print("✓ Saved best model selection to cache")

# Extract lineage information from best_model for final training tags
from training_exec import extract_lineage_from_best_model
lineage = extract_lineage_from_best_model(best_model)

# Acquire checkpoint
best_checkpoint_dir = acquire_best_model_checkpoint(
    best_run_info=best_model,
    root_dir=ROOT_DIR,
    config_dir=CONFIG_DIR,
    acquisition_config=acquisition_config,
    selection_config=selection_config,
    platform=PLATFORM,
    restore_from_drive=restore_from_drive if "restore_from_drive" in locals() else None,
    drive_store=drive_store if "drive_store" in locals() else None,
    in_colab=IN_COLAB,
)

print(f"\n✓ Best model checkpoint available at: {best_checkpoint_dir}")


In [None]:
# Check if selected run is already final training (skip retraining if so)
stage_tag = tags_config.key("process", "stage")
trained_on_full_data_tag = tags_config.key("training", "trained_on_full_data")

is_final_training = best_model["tags"].get(stage_tag) == "final_training"
used_full_data = (
    best_model["tags"].get(trained_on_full_data_tag) == "true" or
    best_model["params"].get("use_combined_data", "false").lower() == "true"
)

SKIP_FINAL_TRAINING = is_final_training and used_full_data

if SKIP_FINAL_TRAINING:
    final_checkpoint_dir = best_checkpoint_dir


## Step 8: Final Training

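Retrain with the best configuration on the full dataset, unless the selected run is already a final-training run that used the full data (in which case `SKIP_FINAL_TRAINING` is `True` and the selected checkpoint is reused).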
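
The skip decision made in the previous cell can be restated as a small pure function, which makes it easy to check in isolation. This is a sketch under assumptions: the tag keys `code.stage` and `code.trained_on_full_data` are hypothetical placeholders (the notebook resolves the real keys through `tags_config.key(...)`), and `needs_final_training` is not a function from the project.

```python
from typing import Any, Dict

# Hypothetical tag keys; the notebook looks up the real ones via tags_config.key(...).
STAGE_TAG = "code.stage"
FULL_DATA_TAG = "code.trained_on_full_data"


def needs_final_training(best_model: Dict[str, Any]) -> bool:
    """Return False only if the run is a final-training run that used the full data."""
    tags = best_model.get("tags", {})
    params = best_model.get("params", {})
    is_final = tags.get(STAGE_TAG) == "final_training"
    used_full_data = (
        tags.get(FULL_DATA_TAG) == "true"
        or str(params.get("use_combined_data", "false")).lower() == "true"
    )
    return not (is_final and used_full_data)


hpo_run = {"tags": {STAGE_TAG: "hpo_refit"}, "params": {}}
final_run = {
    "tags": {STAGE_TAG: "final_training", FULL_DATA_TAG: "true"},
    "params": {},
}
print(needs_final_training(hpo_run))    # True
print(needs_final_training(final_run))  # False
```

Both conditions must hold to skip retraining: a final-training run that was fit on a subset still gets retrained.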


In [None]:
if not SKIP_FINAL_TRAINING:
    print("🔄 Starting final training with best configuration...")
    from training_exec import execute_final_training
    # Execute final training (uses final_training.yaml via load_final_training_config)
    # Will automatically reuse existing complete runs if run.mode: reuse_if_exists in final_training.yaml
    final_checkpoint_dir = execute_final_training(
        root_dir=ROOT_DIR,
        config_dir=CONFIG_DIR,
        best_model=best_model,
        experiment_config=experiment_config,
        lineage=lineage,
        training_experiment_name=training_experiment_name,
        platform=PLATFORM,
    )
else:
    print("✓ Skipping final training - using selected checkpoint")

# Backup final checkpoint to Google Drive if in Colab
if IN_COLAB and drive_store and final_checkpoint_dir:
    checkpoint_path = Path(final_checkpoint_dir).resolve()
    # Check if checkpoint is already in Drive
    if str(checkpoint_path).startswith("/content/drive"):
        print("\n✓ Final training checkpoint is already in Google Drive")
        print(f"  Drive path: {checkpoint_path}")
    else:
        try:
            print("\n📦 Backing up final training checkpoint to Google Drive...")
            result = drive_store.backup(checkpoint_path, expect="dir")
            if result.ok:
                print("✓ Successfully backed up final checkpoint to Google Drive")
                print(f"  Drive path: {result.dst}")
            else:
                print(f"⚠ Drive backup failed: {result.reason}")
                if result.error:
                    print(f"  Error: {result.error}")
        except Exception as e:
            print(f"⚠ Drive backup error: {e}")
            print(f"  Checkpoint is still available locally at: {final_checkpoint_dir}")

## Step 9: Model Conversion & Optimization

Convert the final trained model to ONNX format with optimization.
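
After conversion, the notebook searches the output directory recursively for the `.onnx` file and reports its size. A minimal sketch of that lookup, using only the standard library, is given here. The helper names `find_onnx_model` and `size_mb` are hypothetical, and sorting the matches is an added assumption so that the pick is deterministic when several files exist.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_onnx_model(output_dir: Path) -> Optional[Path]:
    """Return the first .onnx file under output_dir in sorted order, or None."""
    candidates = sorted(output_dir.rglob("*.onnx"))
    return candidates[0] if candidates else None


def size_mb(path: Path) -> float:
    """File size in mebibytes."""
    return path.stat().st_size / (1024 * 1024)


# Demo on a throwaway directory that mimics an onnx_model/ subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "onnx_model").mkdir()
    (root / "onnx_model" / "model.onnx").write_bytes(b"\0" * (2 * 1024 * 1024))

    found = find_onnx_model(root)
    found_name = found.name
    found_size = size_mb(found)
    print(found_name, f"{found_size:.2f} MB")

    missing = find_onnx_model(root / "onnx_model" / "does_not_exist")
```

Searching a directory that does not exist yields no matches, so `missing` is `None` rather than an exception.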

In [None]:
# Extract parent training information for conversion
from common.shared.json_cache import load_json
from pathlib import Path

# Load metadata from final training output directory
# Wrap in Path so this also works if final_checkpoint_dir is a plain string
final_training_metadata_path = Path(final_checkpoint_dir).parent / "metadata.json"

if not final_training_metadata_path.exists():
    raise ValueError(
        f"Metadata file not found: {final_training_metadata_path}\n"
        "Please ensure final training completed successfully."
    )

metadata = load_json(final_training_metadata_path)
parent_spec_fp = metadata.get("spec_fp")
parent_exec_fp = metadata.get("exec_fp")
parent_training_run_id = metadata.get("mlflow", {}).get("run_id")

if not parent_spec_fp or not parent_exec_fp:
    raise ValueError(
        f"Missing required fingerprints in metadata: spec_fp={parent_spec_fp}, exec_fp={parent_exec_fp}\n"
        "Please ensure final training completed successfully."
    )

if parent_training_run_id:
    print(f"✓ Parent training: spec_fp={parent_spec_fp[:8]}..., exec_fp={parent_exec_fp[:8]}..., run_id={parent_training_run_id[:12]}...")
else:
    print(f"✓ Parent training: spec_fp={parent_spec_fp[:8]}..., exec_fp={parent_exec_fp[:8]}... (run_id not found)")

# Get parent training output directory (checkpoint parent)
parent_training_output_dir = Path(final_checkpoint_dir).parent

print("\n🔄 Starting model conversion...")
from conversion import execute_conversion

# Execute conversion (uses conversion.yaml via load_conversion_config)
conversion_output_dir = execute_conversion(
    root_dir=ROOT_DIR,
    config_dir=CONFIG_DIR,
    parent_training_output_dir=parent_training_output_dir,
    parent_spec_fp=parent_spec_fp,
    parent_exec_fp=parent_exec_fp,
    experiment_config=experiment_config,
    conversion_experiment_name=conversion_experiment_name,
    platform=PLATFORM,
    parent_training_run_id=parent_training_run_id,  # May be None, that's OK
)

# Find ONNX model file (search recursively, as model may be in onnx_model/ subdirectory)
# Sorted so the choice is deterministic if several .onnx files exist
onnx_files = sorted(Path(conversion_output_dir).rglob("*.onnx"))
if onnx_files:
    onnx_model_path = onnx_files[0]
    print("\n✓ Conversion completed successfully!")
    print(f"  ONNX model: {onnx_model_path}")
    print(f"  Model size: {onnx_model_path.stat().st_size / (1024 * 1024):.2f} MB")
else:
    print(f"\n⚠ Warning: No ONNX model file found in {conversion_output_dir} (searched recursively)")

# Backup conversion output to Google Drive if in Colab
if IN_COLAB and drive_store and conversion_output_dir:
    output_path = Path(conversion_output_dir).resolve()
    # Check if output is already in Drive
    if str(output_path).startswith("/content/drive"):
        print("\n✓ Conversion output is already in Google Drive")
        print(f"  Drive path: {output_path}")
    else:
        try:
            print("\n📦 Backing up conversion output to Google Drive...")
            result = drive_store.backup(output_path, expect="dir")
            if result.ok:
                print("✓ Successfully backed up conversion output to Google Drive")
                print(f"  Drive path: {result.dst}")
            else:
                print(f"⚠ Drive backup failed: {result.reason}")
                if result.error:
                    print(f"  Error: {result.error}")
        except Exception as e:
            print(f"⚠ Drive backup error: {e}")
            print(f"  Output is still available locally at: {conversion_output_dir}")
