# Infrastructure Setup

This notebook provisions and validates Azure infrastructure for the Resume NER training pipeline.

## Overview

- **Step 1**: Load Configuration
- **Step 2**: Validate Environment Variables
- **Step 3**: Create/Verify Azure ML Workspace
- **Step 4**: Create/Verify Storage Account and Containers
- **Step 5**: Create/Verify Compute Clusters
- **Step 6**: (Optional) Validate Infrastructure

## Prerequisites

1. **Authenticate with Azure** using any method that `DefaultAzureCredential` can pick up:
   - Azure CLI: `az login`
   - VS Code Azure extension
   - Managed Identity
   - Service Principal environment variables

2. **Install dependencies**:
   ```bash
   pip install -r setup/requirements.txt
   ```

3. **Configure environment variables**:
   ```bash
   cp config.env.example config.env
   # Edit config.env with your values
   ```
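The format of `config.env` is not shown in this notebook. Assuming it holds plain `KEY=VALUE` lines, a minimal sketch for spotting values that are still empty after editing could look like this. The helper names `read_env_file` and `empty_keys` are hypothetical and are not part of this repository.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (assumed format)."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and any surrounding quote characters
        values[key.strip()] = value.strip().strip("'\"")
    return values


def empty_keys(values: dict[str, str]) -> list[str]:
    """Return the keys whose value is still empty, in sorted order."""
    return sorted(key for key, value in values.items() if not value)
```

Calling `empty_keys(read_env_file(Path("config.env")))` before running the notebook would list any placeholders that were left blank.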

## Configuration

Edit `config/infrastructure.yaml` to customize resource names, VM sizes, and auto-scale settings.

## Notes

- Operations are idempotent (safe to run multiple times)
- Compute clusters scale down to 0 nodes when idle
- Infrastructure must exist before running the orchestration notebook
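To illustrate what idempotent means in the notes above, here is a minimal sketch of a generic get-or-create pattern against an in-memory registry. It is a stand-in for illustration only: `FakeRegistry`, `create_or_get`, the resource name `cpu-cluster`, and the `min_nodes` key are all hypothetical, and the real helpers in `infrastructure.setup` may be implemented differently.

```python
from typing import Optional


class FakeRegistry:
    """In-memory stand-in for a cloud resource API (hypothetical)."""

    def __init__(self) -> None:
        self.resources: dict = {}
        self.create_calls = 0

    def get(self, name: str) -> Optional[dict]:
        return self.resources.get(name)

    def create(self, name: str, spec: dict) -> dict:
        self.create_calls += 1
        self.resources[name] = {"name": name, **spec}
        return self.resources[name]


def create_or_get(registry: FakeRegistry, name: str, spec: dict) -> dict:
    """Return the existing resource if present; create it only when missing."""
    existing = registry.get(name)
    if existing is not None:
        return existing
    return registry.create(name, spec)


registry = FakeRegistry()
first = create_or_get(registry, "cpu-cluster", {"min_nodes": 0})
second = create_or_get(registry, "cpu-cluster", {"min_nodes": 0})
```

The second call returns the resource created by the first and does not call `create` again, which is the property that makes re-running a cell safe.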


## Step 1: Load Configuration

In [2]:
import sys
from pathlib import Path

# Bootstrap: Find repository root and add src/ to Python path
# This must happen before importing from common or infrastructure
def find_repo_root() -> Path:
    """Find repository root by searching for config/ and src/ directories."""
    current_dir = Path.cwd()
    # Check the current directory first, then walk up the directory tree
    for candidate in (current_dir, *current_dir.parents):
        if (candidate / "config").exists() and (candidate / "src").exists():
            return candidate
    raise ValueError(f"Could not find repository root. Searched from: {current_dir}")

# Find repo root and add src to path
ROOT_DIR = find_repo_root()
SRC_DIR = ROOT_DIR / "src"
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

# Now we can import from common and infrastructure
from common.shared.notebook_setup import setup_notebook_paths
from infrastructure.setup import (
    load_infrastructure_config,
    validate_environment_variables,
)

# Setup notebook paths (already have ROOT_DIR, but this ensures consistency)
paths = setup_notebook_paths(root_dir=ROOT_DIR, add_src_to_path=True)
ROOT_DIR = paths.root_dir
CONFIG_DIR = paths.config_dir

CONFIG_PATH = CONFIG_DIR / "infrastructure.yaml"
ENV_PATH = ROOT_DIR / "config.env"

print(f"✓ Repository root: {ROOT_DIR}")
print(f"✓ Config directory: {CONFIG_DIR}")
print(f"✓ Config path: {CONFIG_PATH} (exists: {CONFIG_PATH.exists()})")
print(f"✓ Env path: {ENV_PATH} (exists: {ENV_PATH.exists()})")
print()

# Load infrastructure configuration
config = load_infrastructure_config(CONFIG_PATH, ENV_PATH)

print("Configuration loaded:")
print(f"  Subscription ID: {config['azure']['subscription_id'][:8]}...")
print(f"  Resource Group: {config['azure']['resource_group']}")
print(f"  Location: {config['azure']['location']}")
print()




✓ Repository root: /workspaces/resume-ner-azureml
✓ Config directory: /workspaces/resume-ner-azureml/config
✓ Config path: /workspaces/resume-ner-azureml/config/infrastructure.yaml (exists: True)
✓ Env path: /workspaces/resume-ner-azureml/config.env (exists: False)

Configuration loaded:
  Subscription ID: ...
  Resource Group: 
  Location: southeastasia



## Step 2: Validate Environment Variables

Ensure required environment variables are set.
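A minimal sketch of this kind of check follows. The variable names in `REQUIRED_VARS` are assumptions for illustration, since the names that `validate_environment_variables` actually requires are not listed in this notebook; `missing_variables` and `check_environment` are likewise hypothetical helpers.

```python
import os
from typing import Iterable, Mapping

# Hypothetical variable names, used only for illustration
REQUIRED_VARS = ("AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP")


def missing_variables(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def check_environment(
    required: Iterable[str] = REQUIRED_VARS, env: Mapping[str, str] = os.environ
) -> None:
    """Raise a single error that names every missing variable."""
    missing = missing_variables(required, env)
    if missing:
        raise EnvironmentError(
            f"Missing environment variables: {', '.join(missing)}"
        )
```

Reporting every missing variable in one error avoids a fix-one-rerun-find-another loop.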


In [None]:
# Validate required environment variables are set
validate_environment_variables()
print("✓ Environment variables validated")


## Step 3: Create/Verify Azure ML Workspace

Create or retrieve the Azure ML Workspace.


In [None]:
from infrastructure.setup import create_or_get_workspace

# Create or retrieve Azure ML Workspace
ml_client = create_or_get_workspace(config)
print("✓ Azure ML Workspace ready")


## Step 4: Create/Verify Storage Account and Containers

Create or retrieve Azure Blob Storage account and required containers.


In [None]:
from infrastructure.setup import create_or_get_storage

# Create or retrieve Azure Blob Storage account and containers
blob_client = create_or_get_storage(config)
print("✓ Storage account and containers ready")


## Step 5: Create/Verify Compute Clusters

Create or retrieve GPU and CPU compute clusters.


In [None]:
from infrastructure.setup import create_or_get_compute_clusters

# Create or retrieve compute clusters
create_or_get_compute_clusters(ml_client, config)
print("✓ Compute clusters ready")

## Step 6: (Optional) Validate Infrastructure

Validate that all infrastructure components exist and are accessible.
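The validators in this step return a pair whose second element is a list of error messages. As a sketch of a check that follows the same apparent convention, the hypothetical `validate_azure_section` below verifies that the three `azure` settings printed in Step 1 are non-empty. Treating the first element as a boolean is an assumption; the notebook discards it.

```python
from typing import Any, Mapping


def validate_azure_section(config: Mapping[str, Any]) -> tuple[bool, list[str]]:
    """Hypothetical validator: check that basic Azure settings are non-empty."""
    azure = config.get("azure") or {}
    errors = [
        f"azure.{key} is empty"
        for key in ("subscription_id", "resource_group", "location")
        if not str(azure.get(key) or "").strip()
    ]
    # The first element mirrors the (result, errors) shape; bool is an assumption
    return (not errors, errors)
```

A check like this is cheap to run before any Azure call and can catch a configuration whose subscription ID or resource group was never filled in.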


In [None]:
from infrastructure.setup import validate_compute, validate_storage, validate_workspace

# Validate all infrastructure components, collecting errors from every check
# so that a single run reports all problems rather than only the first one
all_errors = []
for validate in (validate_workspace, validate_storage, validate_compute):
    _, errors = validate(config)
    all_errors.extend(errors)

if all_errors:
    raise ValueError(f"Validation failed: {', '.join(all_errors)}")
print("✓ All infrastructure components validated successfully")


