# Azure ML Registry Registration Debug Notebook

This notebook investigates why Azure ML model registration works in notebooks but fails in pipelines when using the same managed identity.

## Purpose
- Compare authentication contexts between notebook and pipeline environments
- Test model registration scenarios with managed identity
- Identify differences that cause pipeline failures

## 1. Import Required Libraries
Import Azure ML SDK, authentication libraries, and other required packages for model registration.

In [None]:
# Import Azure ML SDK and authentication libraries
import os
from pathlib import Path
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential, AzureCliCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Print environment info for debugging
print("Environment Variables:")
print(f"DEFAULT_IDENTITY_CLIENT_ID: {os.environ.get('DEFAULT_IDENTITY_CLIENT_ID', 'NOT SET')}")
print(f"MSI_ENDPOINT: {os.environ.get('MSI_ENDPOINT', 'NOT SET')}")
print(f"IDENTITY_ENDPOINT: {os.environ.get('IDENTITY_ENDPOINT', 'NOT SET')}")

## 2. Set Up Azure ML Client with Managed Identity
Configure `ManagedIdentityCredential` and create `MLClient` instances for workspace and registry access, authenticating as the compute's managed identity.

This section replicates the authentication approach used in the pipeline to identify context differences.

In [None]:
# Set up authentication - using same approach as pipeline
print("Setting up authentication...")

# Get managed identity client ID (same as pipeline)
msi_client_id = os.environ.get("DEFAULT_IDENTITY_CLIENT_ID")
print(f"Using MSI Client ID: {msi_client_id}")

# Create credential (same as pipeline)
credential = ManagedIdentityCredential(client_id=msi_client_id)

# Test credential access
try:
    token = credential.get_token("https://management.azure.com/.default")
    print("✅ Managed identity authentication successful")
except Exception as e:
    print(f"❌ Managed identity authentication failed: {e}")

# Set up workspace connection details
subscription_id = "5784b6a5-de3f-4fa4-8b8f-e5bb70ff6b25"
resource_group = "rgamlcc001"
workspace_name = "amldevcc001"
registry_name = "amlrdevcc001"

print(f"Workspace: {workspace_name}")
print(f"Registry: {registry_name}")

# Create MLClient for workspace
ml_client_workspace = MLClient(
    credential=credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name
)

# Create MLClient for registry
ml_client_registry = MLClient(
    credential=credential,
    registry_name=registry_name
)

print("✅ MLClients created successfully")
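To confirm that the notebook and the pipeline really authenticate as the same principal, it can help to inspect the claims inside the access token. The following is a minimal sketch, not part of the original pipeline: `decode_jwt_claims` and `summarize_identity` are hypothetical helper names, the signature is deliberately not verified (debugging only), and the claim names listed (`oid`, `appid`, `azp`, `tid`, `aud`, `xms_mirid`) are an assumption about what the token may contain.

```python
import base64
import json


def decode_jwt_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature.

    For debugging only: never use unverified claims for authorization.
    """
    payload_b64 = jwt.split(".")[1]
    # JWT segments are unpadded base64url; restore padding before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def summarize_identity(claims: dict) -> dict:
    """Keep only the claims that identify the principal (assumed claim names)."""
    keys = ("oid", "appid", "azp", "tid", "aud", "xms_mirid")
    return {key: claims[key] for key in keys if key in claims}


# Usage in the notebook or pipeline, assuming `token` was obtained above:
# print(summarize_identity(decode_jwt_claims(token.token)))
```

Running the same two lines in the pipeline step and comparing the printed object IDs would show whether both contexts resolve to the same identity.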

## 3. Test Registry Access
Test the same registry access patterns used in the pipeline to identify differences.

In [None]:
# Test registry access - same as pipeline code
print("Testing registry access...")

try:
    # Test registry listing (same as pipeline)
    models_list = list(ml_client_registry.models.list())
    print(f"✅ Registry access successful - found {len(models_list)} models")

    # Print first few models for verification
    for i, model in enumerate(models_list[:3]):
        print(f"  Model {i+1}: {model.name} (v{model.version})")

except Exception as e:
    print(f"❌ Registry access failed: {e}")

# Test workspace access for comparison
print("\nTesting workspace access...")
try:
    workspace_models = list(ml_client_workspace.models.list())
    print(f"✅ Workspace access successful - found {len(workspace_models)} models")
except Exception as e:
    print(f"❌ Workspace access failed: {e}")
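When the same checks run in two environments, a structured result is easier to compare than free-form print output. The sketch below wraps any call and records whether it succeeded, the exception type, and an HTTP status code if the exception carries one. `probe` is a hypothetical helper, and reading `status_code` via `getattr` assumes the Azure SDK raises an exception that exposes that attribute; other exceptions simply yield `None`.

```python
import time


def probe(label, fn):
    """Run fn() and return a dict describing the outcome instead of raising."""
    started = time.perf_counter()
    try:
        result = fn()
        outcome = {"label": label, "ok": True, "result": result}
    except Exception as exc:
        outcome = {
            "label": label,
            "ok": False,
            "error_type": type(exc).__name__,
            # HTTP-style errors may expose status_code; others will not
            "status_code": getattr(exc, "status_code", None),
            "message": str(exc)[:300],
        }
    outcome["seconds"] = round(time.perf_counter() - started, 3)
    return outcome


# Usage with the clients created above:
# print(probe("registry list", lambda: len(list(ml_client_registry.models.list()))))
# print(probe("workspace list", lambda: len(list(ml_client_workspace.models.list()))))
```

A 401 versus a 403 in `status_code` would point toward authentication versus authorization, which narrows down where the pipeline diverges.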

## 4. Test Model Registration to Registry
Create a simple test model and attempt to register it to the registry using the same logic as the pipeline.

**Key Question**: Does this work in the notebook context but fail in pipeline context?

In [None]:
# Create a test model registration using local model folder path
# This matches the pipeline scenario using local assets

print("🧪 Testing registry registration with local model folder...")

# Define the local model path - using your actual model location
model_path = "assets_sharing/artifacts/model"

print(f"Using local model path: {model_path}")

# Check if the model path exists (os is already imported in section 1)
if os.path.exists(model_path):
    print(f"✅ Model path exists: {model_path}")
    # List contents if it's a directory
    if os.path.isdir(model_path):
        contents = os.listdir(model_path)
        print(f"   Contents: {contents[:5]}...")  # Show first 5 items

        # Check for MLflow model files
        if "MLmodel" in contents:
            print("   ✅ MLflow model file found!")
        if any(f.endswith('.pkl') for f in contents):
            print("   ✅ Model pickle file found!")
        if "conda.yaml" in contents:
            print("   ✅ Conda environment file found!")
else:
    print(f"❌ Model path does not exist: {model_path}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Available files/folders:")
    try:
        for item in os.listdir("."):
            print(f"   {item}")
        
        # Check if assets_sharing folder exists
        if os.path.exists("assets_sharing"):
            print(f"\n   assets_sharing folder contents:")
            for item in os.listdir("assets_sharing"):
                print(f"     {item}")
                if item == "artifacts" and os.path.isdir("assets_sharing/artifacts"):
                    print(f"       artifacts folder contents:")
                    for subitem in os.listdir("assets_sharing/artifacts"):
                        print(f"         {subitem}")
    except Exception as e:
        print(f"   Could not list directories: {e}")

try:
    # Create a new Model object for registry registration (same as pipeline logic)
    model_for_registry = Model(
        path=model_path,  # Use your actual model folder
        name="notebook-test-assets-model",  # Test name
        description="Test model registered from notebook using assets_sharing folder with managed identity",
        type=AssetTypes.MLFLOW_MODEL
    )
    
    print(f"Attempting to register model: {model_for_registry.name}")
    print(f"Model path: {model_path}")
    
    # This is the exact same call that fails in the pipeline
    registered_model = ml_client_registry.models.create_or_update(model_for_registry)
    print("✅ SUCCESS: Model registered to registry from notebook!")
    print(f"   Registered model: {registered_model.name} (v{registered_model.version})")

except Exception as e:
    print(f"❌ Registry registration failed: {e}")
    print(f"Error type: {type(e).__name__}")
    
    # Print detailed error info
    if hasattr(e, 'error_code'):
        print(f"Error code: {e.error_code}")
    if hasattr(e, 'message'):
        print(f"Error message: {e.message}")
    
    # Additional debugging - check if it's a path issue
    print("\n🔍 Debugging model path access:")
    try:
        # Try to create the model object without registering
        test_model = Model(
            path=model_path,
            name="test-path-only",
            description="Test path access only",
            type=AssetTypes.MLFLOW_MODEL
        )
        print("✅ Model object creation successful")
    except Exception as path_error:
        print(f"❌ Model object creation failed: {path_error}")

# Test workspace registration for comparison (to verify the model itself is valid)
print("\n🔄 Testing workspace registration with same local model...")
try:
    model_for_workspace = Model(
        path=model_path,
        name="notebook-test-workspace-assets",
        description="Test model registered to workspace from notebook using assets_sharing folder",
        type=AssetTypes.MLFLOW_MODEL
    )

    workspace_registered = ml_client_workspace.models.create_or_update(model_for_workspace)
    print(f"✅ Workspace registration successful: {workspace_registered.name} (v{workspace_registered.version})")

except Exception as ws_error:
    print(f"❌ Workspace registration failed: {ws_error}")
    print("If registry registration failed too, this suggests a model path or format issue, not a registry-specific problem")

print(f"\n💡 Using your model path: {model_path}")
print("   This should match exactly what your pipeline uses!")
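Because path resolution is one of the suspected differences, it may be worth recording exactly which files the model folder contains in each environment. The sketch below builds a manifest of relative paths, sizes, and SHA-256 digests that can be printed in both the notebook and the pipeline step and then compared. `build_manifest` is a hypothetical helper and is not part of the pipeline code.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root):
    """Map each file under root (by relative POSIX path) to its size and SHA-256."""
    root_path = Path(root)
    manifest = {}
    for file_path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        relative = file_path.relative_to(root_path).as_posix()
        manifest[relative] = {
            "bytes": file_path.stat().st_size,
            "sha256": hashlib.sha256(file_path.read_bytes()).hexdigest(),
        }
    return manifest


# Usage, assuming model_path is defined as above:
# print(json.dumps(build_manifest(model_path), indent=2, sort_keys=True))
```

An empty manifest in the pipeline, or one with different digests, would indicate that the step is not looking at the same model folder as the notebook. Note that very large model files are read fully into memory by this sketch.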

## 5. Environment Context Analysis
Compare the execution environment between notebook and pipeline contexts to identify potential differences.

In [None]:
# Analyze environment differences between notebook and pipeline
import socket
import sys

print("=== ENVIRONMENT ANALYSIS ===")

print("\n📍 Execution Context:")
print(f"  Current working directory: {os.getcwd()}")
print(f"  Python executable: {sys.executable}")
print(f"  Platform: {sys.platform}")

print("\n🔐 Authentication Context:")
print(f"  MSI Client ID: {os.environ.get('DEFAULT_IDENTITY_CLIENT_ID', 'NOT SET')}")
print(f"  MSI Endpoint: {os.environ.get('MSI_ENDPOINT', 'NOT SET')}")
print(f"  Identity Endpoint: {os.environ.get('IDENTITY_ENDPOINT', 'NOT SET')}")
print(f"  Azure Client ID: {os.environ.get('AZURE_CLIENT_ID', 'NOT SET')}")

print("\n🌐 Network Context:")
hostname = socket.gethostname()
print(f"  Hostname: {hostname}")

# Check if we're in a pipeline context
print("\n🔍 Pipeline Context Detection:")
pipeline_vars = [
    'AZUREML_RUN_ID',
    'AZUREML_EXPERIMENT_ID', 
    'AZUREML_ROOT_RUN_ID',
    'AZUREML_RUN_TOKEN',
    'AZUREML_ARM_SUBSCRIPTION',
    'AZUREML_ARM_RESOURCEGROUP',
    'AZUREML_ARM_WORKSPACE_NAME'
]

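for var in pipeline_vars:
    value = os.environ.get(var)
    if value is None:
        value = 'NOT SET'
    elif 'TOKEN' in var:
        # AZUREML_RUN_TOKEN is a credential; report its presence without printing it
        value = 'SET (redacted)'
    print(f"  {var}: {value}")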

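# Check Azure ML context (requires the SDK v1 package azureml-core)
print("\n📊 Azure ML Context:")
try:
    from azureml.core import Run
    run = Run.get_context()
    # Outside a submitted run, get_context() returns an offline placeholder
    # whose id starts with "OfflineRun"
    if run.id.startswith("OfflineRun"):
        print("  Not in an Azure ML Run context")
    else:
        print(f"  Run ID: {run.id}")
        print(f"  Experiment: {run.experiment.name}")
        print(f"  Run Type: {type(run).__name__}")
except Exception as e:
    print(f"  Could not get Azure ML Run context: {e}")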
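The printed values above have to be compared by eye against the pipeline logs. As a possible alternative, the environment can be captured as a dictionary in each context and diffed programmatically. This is a sketch under assumptions: `snapshot_env` and `diff_snapshots` are hypothetical helpers, the prefix list is a guess at which variables matter, and any variable whose name looks like a secret is redacted rather than stored.

```python
import json
import os

# Substrings that mark a variable as sensitive (assumed list; extend as needed)
SENSITIVE_MARKERS = ("TOKEN", "SECRET", "KEY", "PASSWORD", "HEADER")
DEFAULT_PREFIXES = ("AZUREML_", "AZURE_", "MSI_", "IDENTITY_", "DEFAULT_IDENTITY_")


def snapshot_env(prefixes=DEFAULT_PREFIXES, environ=None):
    """Collect environment variables with the given prefixes, redacting secrets."""
    env = os.environ if environ is None else environ
    snapshot = {}
    for name, value in env.items():
        if name.startswith(tuple(prefixes)):
            is_secret = any(marker in name for marker in SENSITIVE_MARKERS)
            snapshot[name] = "<redacted>" if is_secret else value
    return snapshot


def diff_snapshots(first, second):
    """Report variables present in only one snapshot or with different values."""
    shared = set(first) & set(second)
    return {
        "only_in_first": sorted(set(first) - set(second)),
        "only_in_second": sorted(set(second) - set(first)),
        "changed": sorted(name for name in shared if first[name] != second[name]),
    }


# Usage: save a snapshot in each context, then diff the two files' contents.
# print(json.dumps(snapshot_env(), indent=2, sort_keys=True))
```

Redacted variables compare as equal whenever both are set, so the diff shows whether a secret is present in each context without exposing its value.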

In [None]:
# Check if compute cluster is configured for image builds
print("=== IMAGE BUILD CONFIGURATION CHECK ===")

try:
    # Get workspace details to check image build compute setting
    workspace_details = ml_client_workspace.workspaces.get(workspace_name)
    
    # Check if imageBuildCompute is configured
    if hasattr(workspace_details, 'image_build_compute') and workspace_details.image_build_compute:
        print(f"✅ Image build compute configured: {workspace_details.image_build_compute}")
    else:
        print("❌ No image build compute configured - will use serverless")

    # List available compute targets
    print("\n📊 Available Compute Targets:")
    compute_list = list(ml_client_workspace.compute.list())
    for compute in compute_list:
        print(f"  - {compute.name} ({compute.type}) - State: {compute.provisioning_state}")
        
    # Check specifically for cpu-cluster-uami
    cpu_cluster = None
    try:
        cpu_cluster = ml_client_workspace.compute.get("cpu-cluster-uami")
        print("\n🎯 cpu-cluster-uami Details:")
        print(f"  Type: {cpu_cluster.type}")
        print(f"  State: {cpu_cluster.provisioning_state}")
        print(f"  VM Size: {cpu_cluster.size}")
        # SDK v2 AmlCompute exposes min_instances/max_instances (not scale_settings)
        print(f"  Min Nodes: {cpu_cluster.min_instances}")
        print(f"  Max Nodes: {cpu_cluster.max_instances}")

        # Check if this cluster can be used for image builds
        if cpu_cluster.provisioning_state == "Succeeded":
            print("  ✅ Cluster is ready and can be used for image builds")
        else:
            print(f"  ⚠️ Cluster state: {cpu_cluster.provisioning_state}")

    except Exception as e:
        print(f"❌ Could not get cpu-cluster-uami details: {e}")

except Exception as e:
    print(f"❌ Error checking workspace configuration: {e}")

# Alternative check using Azure CLI if available
print("\n🔍 Alternative Check - Using Azure CLI:")
try:
    import subprocess
    result = subprocess.run([
        "az", "ml", "workspace", "show",
        "--name", workspace_name,
        "--resource-group", resource_group,
        "--query", "imageBuildCompute",
        "--output", "tsv"
    ], capture_output=True, text=True, timeout=30)

    if result.returncode == 0:
        image_build_compute = result.stdout.strip()
        if image_build_compute and image_build_compute != "None":
            print(f"✅ CLI confirms image build compute: {image_build_compute}")
        else:
            print("❌ CLI shows no image build compute configured")
    else:
        print(f"⚠️ CLI check failed: {result.stderr}")

except Exception as e:
    print(f"⚠️ Could not run Azure CLI check: {e}")

print("\n💡 Summary:")
print("- If image build compute is configured → Your cluster will handle image builds")
print("- If not configured → Azure ML uses serverless compute for image preparation")
print("- The 'prepare image' job you see is normal and expected!")

## 6. Conclusions and Next Steps

Use the results above to answer the following questions:

### Key Findings:
1. **Authentication**: Does the managed identity obtain a token in both contexts?
2. **Registry Access**: Can models be listed from the registry in both the notebook and the pipeline?
3. **Model Registration**: Does registry registration succeed in the notebook but fail in the pipeline?

### Potential Differences:
- **Network Context**: Different network routes or proxy settings
- **Environment Variables**: Missing or different environment variables in pipeline
- **Azure ML Context**: Different Azure ML Run contexts affecting model path resolution
- **File Path Resolution**: Different working directories or model path handling

### Next Actions:
If registry registration works in the notebook but fails in the pipeline with the same managed identity, the issue is likely:
1. **Path Resolution**: Pipeline and notebook resolve model paths differently
2. **Network Routing**: Different network paths to registry storage
3. **Azure ML Context**: Pipeline context affects how model URLs are constructed
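To test the network routing hypothesis, one option is to check from each environment whether a given endpoint resolves and accepts a TCP connection. The sketch below does only that; it does not authenticate or send any request. `check_endpoint` is a hypothetical helper, and the hostname in the usage comment is a placeholder, since the actual registry storage endpoints are not listed in this notebook.

```python
import socket


def check_endpoint(host, port=443, timeout=5.0):
    """Resolve host and attempt a TCP connection; return the findings as a dict."""
    result = {"host": host, "port": port, "addresses": [], "reachable": False, "error": None}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # Each entry's sockaddr starts with the resolved IP address
        result["addresses"] = sorted({info[4][0] for info in infos})
        with socket.create_connection((host, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result


# Usage with a placeholder hostname (replace with the endpoint from your error message):
# print(check_endpoint("<storage-account>.blob.core.windows.net"))
```

If the resolved addresses differ between the notebook and the pipeline (for example, a private IP in one and a public IP in the other), that would support the network routing explanation.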

Run this notebook on your compute instance to compare results with the pipeline execution!