# Iris Classification with Azure ML

This notebook demonstrates a complete Azure ML workflow:
1. Connect to Azure ML Workspace
2. Define and submit a training job
3. Monitor the job
4. Retrieve and analyze results

The actual training runs on Azure ML Compute using the `src/train.py` script.

## 1. Setup and Connect to Workspace

In [None]:
from azure.ai.ml import MLClient, command, Input
from azure.identity import DefaultAzureCredential

# TODO: Update with your workspace details
subscription_id = "<your-subscription-id>"
resource_group = "rg-ml-playground"
workspace_name = "mlw-playground"

# Connect to workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name
)

print(f"✓ Connected to workspace: {workspace_name}")
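
Hardcoding workspace details works for a playground, but it makes the notebook harder to share. A minimal sketch of reading the same three values from environment variables follows. The variable names (`AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, `AZURE_ML_WORKSPACE`) and the `load_workspace_config` helper are assumptions made for illustration; they are not part of the Azure ML SDK.

```python
import os

# Hypothetical variable names; any consistent naming scheme would do.
REQUIRED_VARS = ("AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP", "AZURE_ML_WORKSPACE")


def load_workspace_config(env=None):
    """Return the workspace keyword arguments for MLClient, read from `env`."""
    env = os.environ if env is None else env
    # Fail early with a clear message if anything is unset or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "subscription_id": env["AZURE_SUBSCRIPTION_ID"],
        "resource_group_name": env["AZURE_RESOURCE_GROUP"],
        "workspace_name": env["AZURE_ML_WORKSPACE"],
    }
```

With the variables set, the client could then be created as `MLClient(credential=DefaultAzureCredential(), **load_workspace_config())`.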

## 2. Define Training Job

We'll submit a job that:
- Runs `src/train.py` on Azure ML Compute
- Uses a pre-built scikit-learn environment
- Logs metrics with MLflow
- Registers the trained model

In [None]:
# Define the training job
job = command(
    code="../src",  # Local path to source code
    command="python train.py --n_estimators 100 --max_depth 5 --test_size 0.2",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",  # Created in 00_setup_connection.ipynb
    display_name="iris-random-forest",
    experiment_name="iris-classification",
    description="Train Random Forest classifier on Iris dataset",
)

print("Job configuration:")
print(f"  Experiment: {job.experiment_name}")
print(f"  Display name: {job.display_name}")
print(f"  Compute: {job.compute}")
print(f"  Command: {job.command}")
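
The `command` string above is written by hand. When the hyperparameters live in a dictionary, the string can be generated instead, which keeps the flags and the values in one place. The `build_train_command` helper below is a hypothetical sketch, and it assumes `train.py` accepts every key as a `--key value` flag.

```python
import shlex


def build_train_command(params, script="train.py"):
    """Render a dict of hyperparameters as a `python <script> --key value ...` string."""
    parts = ["python", script]
    for name, value in params.items():
        parts.extend([f"--{name}", str(value)])
    # shlex.join quotes any part that contains spaces or shell metacharacters.
    return shlex.join(parts)


train_command = build_train_command(
    {"n_estimators": 100, "max_depth": 5, "test_size": 0.2}
)
print(train_command)
```

The resulting string can be passed as the `command=` argument in the job definition above.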

## 3. Submit Job to Azure ML

In [None]:
# Submit the job
returned_job = ml_client.jobs.create_or_update(job)

print("\n✓ Job submitted successfully!")
print(f"  Job name: {returned_job.name}")
print(f"  Status: {returned_job.status}")
print("\nView job in Azure ML Studio:")
print(f"  {returned_job.studio_url}")

## 4. Monitor Job (Optional)

Stream logs from the running job. This will block until the job completes.

In [None]:
# Stream job logs
ml_client.jobs.stream(returned_job.name)
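
Because streaming blocks the notebook, polling the job status with a timeout is a possible alternative. The sketch below is not an SDK feature: `wait_for_job` is a hypothetical helper that takes any zero-argument callable returning the current status string. The set of terminal states is an assumption based on the status values checked elsewhere in this notebook, plus `"Canceled"`.

```python
import time

# Assumed terminal states; adjust if your jobs report other values.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, timeout_s=1800, poll_s=30, sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_s} seconds")
        sleep(poll_s)
```

In this notebook the callable could be `lambda: ml_client.jobs.get(returned_job.name).status`.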

## 5. Check Job Status

In [None]:
# Get job details
job_details = ml_client.jobs.get(returned_job.name)

print(f"Job status: {job_details.status}")
print(f"Created: {job_details.creation_context.created_at}")

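if job_details.status == "Completed":
    print("\n✓ Job completed successfully!")
elif job_details.status == "Failed":
    print("\n✗ Job failed. Check logs in Azure ML Studio.")
elif job_details.status == "Canceled":
    # A canceled job is finished too, so don't report it as still running
    print("\n✗ Job was canceled.")
else:
    print(f"\nJob is still {job_details.status.lower()}...")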

## 6. Retrieve Metrics from MLflow

After the job completes, we can retrieve logged metrics:

In [None]:
# Requires the azureml-mlflow package so MLflow can talk to the workspace
import mlflow

# Set MLflow tracking URI to Azure ML workspace
mlflow.set_tracking_uri(ml_client.workspaces.get(workspace_name).mlflow_tracking_uri)

# Get run details
run = mlflow.get_run(returned_job.name)

print("Logged Parameters:")
for key, value in run.data.params.items():
    print(f"  {key}: {value}")

print("\nLogged Metrics:")
for key, value in run.data.metrics.items():
    print(f"  {key}: {value}")
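
Once the experiment holds more than one run, the same `run.data.metrics` dictionaries can be compared side by side. The sketch below ranks runs by a single metric. The `rank_runs` helper is hypothetical, and the metric name `accuracy` is an assumption about what `src/train.py` logs. The numbers are placeholders, not results from a real job.

```python
def rank_runs(metrics_by_run, metric="accuracy", higher_is_better=True):
    """Return (run_name, value) pairs sorted best-first, skipping runs without the metric."""
    scored = [
        (name, metrics[metric])
        for name, metrics in metrics_by_run.items()
        if metric in metrics
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=higher_is_better)


# Placeholder values for illustration only.
example_metrics = {
    "run-a": {"accuracy": 0.93},
    "run-b": {"accuracy": 0.97},
    "run-c": {"f1_score": 0.90},  # no accuracy logged, so this run is skipped
}

for name, value in rank_runs(example_metrics):
    print(f"  {name}: {value}")
```

To feed it real data, build the input as `{name: mlflow.get_run(name).data.metrics for name in job_names}`, where `job_names` is your own list of job names.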

## 7. List Registered Models

In [None]:
# List all models in the workspace
models = ml_client.models.list()

print("Registered models:")
for model in models:
    print(f"  - {model.name} (version {model.version})")
    if model.name == "iris_random_forest":
        print(f"    Created: {model.creation_context.created_at}")
        print(f"    Tags: {model.tags}")
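
When a model has several versions, picking the newest one needs a numeric comparison, because version values may arrive as strings and `"10"` sorts before `"9"` as text. The sketch below shows one way to do that. `latest_version` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for entries that expose the `name` and `version` attributes used in the cell above.

```python
from types import SimpleNamespace


def latest_version(models, name):
    """Return the entry with the highest numeric version for `name`, or None."""
    candidates = [
        m for m in models
        if m.name == name and str(m.version).isdigit()  # skips missing versions
    ]
    return max(candidates, key=lambda m: int(m.version), default=None)


# Stand-in objects, not real workspace data.
fake_models = [
    SimpleNamespace(name="iris_random_forest", version="9"),
    SimpleNamespace(name="iris_random_forest", version="10"),
    SimpleNamespace(name="other_model", version="3"),
]

newest = latest_version(fake_models, "iris_random_forest")
print(f"Latest version: {newest.version}")
```

With the real client, passing `name="iris_random_forest"` to `ml_client.models.list` should restrict the listing to that model's versions, though this is worth confirming against the SDK version you have installed.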

## 8. Experiment with Different Hyperparameters

Let's run multiple experiments with different hyperparameters:

In [None]:
# Different hyperparameter configurations to try
experiments = [
    {"n_estimators": 50, "max_depth": 3},
    {"n_estimators": 100, "max_depth": 5},
    {"n_estimators": 200, "max_depth": 10},
]

submitted_jobs = []

for i, params in enumerate(experiments, 1):
    # Use loop-local names so `job` and `returned_job` from the earlier cells stay intact
    experiment_job = command(
        code="../src",
        command=f"python train.py --n_estimators {params['n_estimators']} --max_depth {params['max_depth']}",
        environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
        compute="cpu-cluster",
        display_name=f"iris-rf-experiment-{i}",
        experiment_name="iris-classification",
    )

    submitted = ml_client.jobs.create_or_update(experiment_job)
    submitted_jobs.append(submitted)
    print(f"✓ Submitted experiment {i}: n_estimators={params['n_estimators']}, max_depth={params['max_depth']}")

print(f"\n{len(submitted_jobs)} experiments submitted!")
print("View all runs in Azure ML Studio under the 'iris-classification' experiment.")
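
The configurations above are listed by hand. To try every combination of a few candidate values, the list can be generated instead. The `expand_grid` helper below is a hypothetical sketch, and the candidate values simply reuse the ones from this section.

```python
from itertools import product


def expand_grid(grid):
    """Turn {"name": [values, ...]} into a list with one dict per combination."""
    names = list(grid)
    return [
        dict(zip(names, combo))
        for combo in product(*(grid[name] for name in names))
    ]


configs = expand_grid({
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10],
})

print(f"{len(configs)} configurations generated")
for config in configs[:3]:
    print(f"  {config}")
```

Each configuration becomes a separate job on the compute cluster, so the grid size directly determines how many jobs are submitted.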

## Summary

In this notebook, we:
1. ✅ Connected to Azure ML Workspace
2. ✅ Defined a training job using `src/train.py`
3. ✅ Submitted the job to Azure ML Compute
4. ✅ Monitored job execution
5. ✅ Retrieved metrics from MLflow
6. ✅ Listed the registered models
7. ✅ Ran multiple experiments with different hyperparameters

## Next Steps

- Modify `src/train.py` to work with your own dataset
- Add more hyperparameters to experiment with
- Try different ML algorithms
- Deploy the best model as a web service
- Set up CI/CD pipelines for automated training

## Useful Links

- [Azure ML Studio](https://ml.azure.com)
- [Azure ML SDK Documentation](https://learn.microsoft.com/en-us/python/api/overview/azure/ai-ml-readme)
- [MLflow Tracking](https://mlflow.org/docs/latest/tracking.html)