# Deploy Best Model to an Online Endpoint

Use this notebook to convert the best trained model into an MLflow artifact, register it with Azure ML, and deploy it to a managed online endpoint.

## 0. Prerequisites

Make sure you have:

- Run `run_pipeline.py` (or the pipeline from `main.ipynb`) so that `outputs/model_output/<model_name>_model.pkl` exists locally.
- `azure-ai-ml>=1.14.0`, `mlflow`, and `azure-identity` installed in the current environment.
- `config.env` populated with your workspace and data asset settings.

If you are on an Azure ML compute instance, the package requirements are typically already satisfied; you still need the pipeline outputs and a populated `config.env`.


### Register Required Resource Providers

If Azure returns `ResourceOperationFailure: Resource provider [N/A] isn't registered with Subscription [N/A]`, register the missing providers before proceeding:

- Azure Portal → **Subscriptions** → pick the target subscription
- **Settings** → **Resource providers** → register anything marked `NotRegistered`
- Confirm at least `Microsoft.MachineLearningServices`, `Microsoft.PolicyInsights`, `Microsoft.Cdn`, `Microsoft.ContainerRegistry`, `Microsoft.Storage`, `Microsoft.KeyVault`, and `Microsoft.ManagedIdentity` are registered
- Wait a few minutes for propagation, then rerun the notebook



## Workflow at a Glance

1. Locate the most recent MLflow bundle produced by the training pipeline.
2. Connect to the Azure ML workspace using the credentials defined in `config.env`.
3. Register the model so deployments can reference a versioned asset.
4. Create a managed endpoint (deleting a failed one with the same name first), deploy the model, and route traffic.
5. Invoke the endpoint for a smoke test, then delete it if you only needed a temporary environment.

Use the numbered sections below to walk through each stage sequentially.


## 1. Set Up Environment

Initialize paths and ensure the notebook runs from the repo root so that source imports and artifacts resolve correctly.


In [2]:
from __future__ import annotations

import os
import sys
import time
from pathlib import Path

from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model, ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.ai.ml.constants import AssetTypes

NOTEBOOK_ROOT = Path.cwd().resolve()
PROJECT_ROOT = NOTEBOOK_ROOT if (NOTEBOOK_ROOT / "src").exists() else NOTEBOOK_ROOT.parent
os.chdir(PROJECT_ROOT)

if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

from src.utils import load_azure_config

print(f"Project root: {PROJECT_ROOT}")


Project root: /workspaces/customer-churn-prediction-azureml


## 2. Locate Model Artifacts

Pick the most recent MLflow bundle under `outputs/` (or honor `AML_MLFLOW_BUNDLE_PATH` if it is set) and define the Azure resource names used in later steps.


**Quick reference**

- `AML_MLFLOW_BUNDLE_PATH`: override to point at a specific exported bundle.
- `AML_DEPLOY_MODEL_NAME`: registered model name in Azure ML.
- `AML_ONLINE_ENDPOINT_NAME`: reuse an existing endpoint name instead of the timestamp default.
- `AML_ONLINE_DEPLOYMENT_NAME`: deployment slot (e.g., `blue`, `green`).

Set these before executing this notebook if you need deterministic values.
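
One way to pin these values from inside the notebook is to set them in a cell that runs before the "User Inputs" cell. This is a minimal sketch: the endpoint name `churn-endpoint-dev` is a placeholder chosen for illustration, and `setdefault` is used so that anything already exported in your shell takes precedence.

```python
import os

# Placeholder values for illustration; replace them with your own resource names.
overrides = {
    "AML_DEPLOY_MODEL_NAME": "bank-churn-best-model",
    "AML_ONLINE_ENDPOINT_NAME": "churn-endpoint-dev",
    "AML_ONLINE_DEPLOYMENT_NAME": "blue",
}

for key, value in overrides.items():
    # setdefault leaves a variable untouched if it is already set in the environment.
    os.environ.setdefault(key, value)

print({key: os.environ[key] for key in overrides})
```

With a fixed endpoint name, rerunning the notebook targets the same endpoint instead of creating a new timestamped one.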


In [3]:
# User Inputs
OUTPUTS_DIR = PROJECT_ROOT / "outputs"

MLFLOW_MODEL_DIR = None
env_mlflow_path = os.getenv("AML_MLFLOW_BUNDLE_PATH")
if env_mlflow_path:
    MLFLOW_MODEL_DIR = Path(env_mlflow_path)
elif OUTPUTS_DIR.exists():
    mlflow_bundles = sorted(
        [d for d in OUTPUTS_DIR.iterdir() if d.is_dir() and d.name.endswith("_mlflow")],
        key=lambda p: p.stat().st_mtime if p.exists() else 0,
        reverse=True,
    )
    if mlflow_bundles:
        MLFLOW_MODEL_DIR = mlflow_bundles[0]

if not MLFLOW_MODEL_DIR:
    available = "\n".join(str(p) for p in OUTPUTS_DIR.glob("*_mlflow")) or "(no *_mlflow bundles found)"
    raise FileNotFoundError(
        "No MLflow bundles (*_mlflow/) were found.\n"
        "Run run_pipeline.py locally or download the best-model artifacts from the pipeline run, "
        "then place them inside outputs/ or set AML_MLFLOW_BUNDLE_PATH.\n"
        f"Current outputs listing:\n{available}"
    )

# Names for Azure resources
MODEL_NAME = os.getenv("AML_DEPLOY_MODEL_NAME", "bank-churn-best-model")
ENDPOINT_NAME = os.getenv("AML_ONLINE_ENDPOINT_NAME", f"churn-endpoint-{int(time.time())}")
DEPLOYMENT_NAME = os.getenv("AML_ONLINE_DEPLOYMENT_NAME", "blue")

print(f"✓ Using MLflow bundle: {MLFLOW_MODEL_DIR}")
print(f"Model asset name: {MODEL_NAME}")
print(f"Endpoint name: {ENDPOINT_NAME}")
print(f"Deployment name: {DEPLOYMENT_NAME}")


✓ Using MLflow bundle: /workspaces/customer-churn-prediction-azureml/outputs/xgboost_mlflow
Model asset name: bank-churn-best-model
Endpoint name: churn-endpoint-1763549164
Deployment name: blue
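
Before uploading, it can help to confirm that the selected directory really is an MLflow model. A saved MLflow model has an `MLmodel` descriptor file at its root, so a quick check might look like the sketch below. The helper name `missing_bundle_files` is hypothetical, and the temporary directory only stands in for a real bundle.

```python
import tempfile
from pathlib import Path


def missing_bundle_files(bundle_dir: Path) -> list[str]:
    """Return the expected MLflow files that are absent from bundle_dir."""
    # MLflow writes an `MLmodel` descriptor at the root of a saved model.
    expected = ["MLmodel"]
    return [name for name in expected if not (bundle_dir / name).is_file()]


# Demonstration on a throwaway directory standing in for outputs/<name>_mlflow.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "xgboost_mlflow"
    bundle.mkdir()
    print(missing_bundle_files(bundle))  # ['MLmodel']
    (bundle / "MLmodel").write_text("flavors: {}\n")
    print(missing_bundle_files(bundle))  # []
```

In the notebook you would call `missing_bundle_files(MLFLOW_MODEL_DIR)` and stop if the returned list is not empty.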


## 3. Connect to Azure ML

Load workspace settings and create an MLClient with default credentials (falling back to interactive auth if needed).


In [4]:
# Connect to Azure ML workspace
load_dotenv(PROJECT_ROOT / "config.env")

azure_cfg = load_azure_config()

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception:
    credential = InteractiveBrowserCredential()

ml_client = MLClient(
    credential,
    subscription_id=azure_cfg["subscription_id"],
    resource_group_name=azure_cfg["resource_group"],
    workspace_name=azure_cfg["workspace_name"],
)
print(
    f"Connected to workspace: {ml_client.workspace_name} | "
    f"resource group: {ml_client.resource_group_name}"
)


Class DeploymentTemplateOperations: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


Connected to workspace: churn-ml-workspace | resource group: rg-churn-ml-project-2025-11-15
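
If the connection cell fails with a `KeyError` or an unclear authentication error, checking the loaded settings first can narrow the cause. The sketch below assumes `load_azure_config()` returns a dict-like object with the three keys the cell above reads; the helper `missing_config_keys` and the example values are hypothetical.

```python
# Keys read by the MLClient constructor in the cell above.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(cfg: dict) -> list[str]:
    """Return required workspace settings that are absent or blank in cfg."""
    return [key for key in REQUIRED_KEYS if not str(cfg.get(key) or "").strip()]


# Example with a blank resource group (placeholder values, not real identifiers).
example_cfg = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "",
    "workspace_name": "churn-ml-workspace",
}
print(missing_config_keys(example_cfg))  # ['resource_group']
```

Calling `missing_config_keys(azure_cfg)` right after `load_azure_config()` gives a readable list of what to fix in `config.env`.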


## 4. Register the MLflow Model

Upload the MLflow directory as a managed model asset so deployments can reference it.


**Endpoint hygiene tips**

- Azure may leave a previous endpoint in `Failed` or `Deleting`; delete it before recreating to avoid BadRequest errors.
- CLI fallback when SDK delete hangs:
  - `az ml online-endpoint delete --name <endpoint> --yes`
  - `az ml online-endpoint show --name <endpoint> --query provisioning_state`
- Generating a new endpoint name is cheap, so prefer that when the resource is stuck in an unrecoverable state.
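
When an endpoint is in a transitional state, polling its provisioning state until it settles is one way to decide whether to delete it or move on to a new name. The helper below is a sketch, not part of the SDK: `wait_for_terminal_state` is a hypothetical name, and the set of terminal states is an assumption based on the states this notebook checks plus `Succeeded`.

```python
import time
from typing import Callable

# Assumed terminal provisioning states; adjust if your workspace reports others.
TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
) -> str:
    """Poll get_state() until it returns a terminal state or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still in state {state!r} after {timeout_s} seconds")
        time.sleep(poll_s)


# Demonstration with a fake state source instead of a live endpoint.
fake_states = iter(["Updating", "Updating", "Succeeded"])
print(wait_for_terminal_state(lambda: next(fake_states), timeout_s=5, poll_s=0))
```

In the notebook, the callable could be `lambda: ml_client.online_endpoints.get(ENDPOINT_NAME).provisioning_state`. Note that the lookup itself raises once the endpoint has been fully deleted, so wrap the call accordingly if you poll during a delete.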

Run the cell below only after confirming the subscription resource providers are registered.


In [5]:
# Register MLflow model asset
model_asset = Model(
    name=MODEL_NAME,
    path=str(MLFLOW_MODEL_DIR),
    type=AssetTypes.MLFLOW_MODEL,
    description="Best churn model exported from training pipeline",
)
registered_model = ml_client.models.create_or_update(model_asset)
print(f"Registered model: {registered_model.name}:{registered_model.version}")


Registered model: bank-churn-best-model:3


## 5. Prepare the Managed Endpoint

Delete any failed endpoint with the same name, then create a fresh managed online endpoint.


**Deployment knobs**

| Env var | Default | Purpose |
| --- | --- | --- |
| `AML_ONLINE_INSTANCE_TYPE` | `Standard_D2as_v4` | VM SKU for scoring. Switch to a smaller size if you hit quota limits. |
| `AML_ONLINE_INSTANCE_COUNT` | `1` | Number of replicas. Scale out only after validating cost. |

If you see `ImageBuildFailure` or quota errors, revisit the environment definition or lower the SKU before re-running.
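
Because both knobs arrive as strings from the environment, validating them before starting a deployment avoids waiting on a request that was malformed from the start. The following is a sketch under the defaults listed in the table; the helper name `read_deployment_settings` is hypothetical.

```python
import os
from typing import Mapping


def read_deployment_settings(env: Mapping[str, str] = os.environ) -> tuple[str, int]:
    """Read and validate the instance type and count from environment variables."""
    instance_type = env.get("AML_ONLINE_INSTANCE_TYPE", "Standard_D2as_v4").strip()
    if not instance_type:
        raise ValueError("AML_ONLINE_INSTANCE_TYPE must not be empty")

    raw_count = env.get("AML_ONLINE_INSTANCE_COUNT", "1")
    try:
        instance_count = int(raw_count)
    except ValueError:
        raise ValueError(
            f"AML_ONLINE_INSTANCE_COUNT must be an integer, got {raw_count!r}"
        ) from None
    if instance_count < 1:
        raise ValueError("AML_ONLINE_INSTANCE_COUNT must be at least 1")

    return instance_type, instance_count


# With no overrides set, the defaults from the table apply.
print(read_deployment_settings({}))  # ('Standard_D2as_v4', 1)
```

The returned pair can be passed as `instance_type` and `instance_count` when building the deployment.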


In [6]:
# Create or update managed online endpoint
# Check if endpoint already exists and delete if in failed state
try:
    existing_endpoint = ml_client.online_endpoints.get(ENDPOINT_NAME)
    if existing_endpoint.provisioning_state in ["Failed", "Canceled"]:
        print(f"Endpoint {ENDPOINT_NAME} is in {existing_endpoint.provisioning_state} state. Deleting...")
        ml_client.online_endpoints.begin_delete(ENDPOINT_NAME).result()
        print(f"Deleted failed endpoint {ENDPOINT_NAME}")
        time.sleep(5)  # Wait for deletion to propagate
except Exception:
    # Endpoint doesn't exist or other error - proceed with creation
    pass

endpoint = ManagedOnlineEndpoint(
    name=ENDPOINT_NAME,
    auth_mode="key",
    description="Online endpoint serving the churn model",
)

endpoint = ml_client.begin_create_or_update(endpoint).result()
print(f"Endpoint ready: {endpoint.name}")


Endpoint ready: churn-endpoint-1763549164


## 6. Deploy the Model

Create an online deployment referencing the registered model and preferred compute size.


In [7]:
# Deploy the model
deployment = ManagedOnlineDeployment(
    name=DEPLOYMENT_NAME,
    endpoint_name=ENDPOINT_NAME,
    model=registered_model,
    instance_type=os.getenv("AML_ONLINE_INSTANCE_TYPE", "Standard_D2as_v4"),
    instance_count=int(os.getenv("AML_ONLINE_INSTANCE_COUNT", "1")),
)

ml_client.online_deployments.begin_create_or_update(deployment).result()
print(f"Deployment '{DEPLOYMENT_NAME}' is live")


Check: endpoint churn-endpoint-1763549164 exists


......................................................................................Deployment 'blue' is live


## 7. Route Traffic

Send 100% of endpoint traffic to the new deployment once provisioning succeeds.


In [8]:
# Route traffic to the deployment
endpoint.traffic = {DEPLOYMENT_NAME: 100}
ml_client.begin_create_or_update(endpoint).result()
print(f"Endpoint traffic updated: {endpoint.traffic}")


Readonly attribute principal_id will be ignored in class <class 'azure.ai.ml._restclient.v2022_05_01.models._models_py3.ManagedServiceIdentity'>
Readonly attribute tenant_id will be ignored in class <class 'azure.ai.ml._restclient.v2022_05_01.models._models_py3.ManagedServiceIdentity'>


Endpoint traffic updated: {'blue': 100}
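
If you later add a second deployment slot (for example `green`), the same `traffic` dictionary can split requests between the two. The sketch below builds such a dictionary while keeping the percentages summing to 100; the helper `build_traffic` is hypothetical, and a `green` deployment does not exist in this notebook.

```python
def build_traffic(new_deployment: str, old_deployment: str, new_percent: int) -> dict[str, int]:
    """Return a traffic map sending new_percent to the new slot and the rest to the old one."""
    if new_deployment == old_deployment:
        raise ValueError("The two deployment names must differ")
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")
    return {new_deployment: new_percent, old_deployment: 100 - new_percent}


# Send 10% of requests to a hypothetical 'green' slot and keep 90% on 'blue'.
print(build_traffic("green", "blue", 10))  # {'green': 10, 'blue': 90}
```

The result would be assigned to `endpoint.traffic` and applied with the same `begin_create_or_update` call used above.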


## 8. Invoke the Endpoint

Create a JSON file that matches the model schema (the cell below reads `sample-data.json` from the project root), then call the managed endpoint to verify predictions.


**Invocation checklist**

1. Encode categorical fields exactly as the training pipeline expects (see `sample-data.json`).
2. Request payload must follow the MLflow pandas structure: `{"input_data": {"columns": [...], "data": [[...]]}}`.
3. For quick manual tests, update the JSON file and re-run this cell; for automated smoke tests, use `az ml online-endpoint invoke` with the same payload.
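
The payload structure from item 2 can be assembled programmatically, which makes it easier to catch rows that do not line up with the column list. This is a sketch only: the column names and values are hypothetical and must be replaced with the features your model was trained on.

```python
import json


def build_payload(columns: list[str], rows: list[list]) -> dict:
    """Wrap tabular rows in the {"input_data": {"columns", "data"}} structure."""
    for index, row in enumerate(rows):
        # Every row must supply exactly one value per column.
        if len(row) != len(columns):
            raise ValueError(f"Row {index} has {len(row)} values, expected {len(columns)}")
    return {"input_data": {"columns": columns, "data": rows}}


# Hypothetical feature names and values, for illustration only.
payload = build_payload(
    ["CreditScore", "Age", "Balance"],
    [[619, 42, 0.0], [608, 41, 83807.86]],
)
print(json.dumps(payload, indent=2))
```

Writing the result with `Path("sample-data.json").write_text(json.dumps(payload))` produces a file the invocation cell can read.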

In [12]:
REQUEST_FILE = PROJECT_ROOT / "sample-data.json"

if not REQUEST_FILE.exists():
    raise FileNotFoundError(f"{REQUEST_FILE} not found.")

response = ml_client.online_endpoints.invoke(
    endpoint_name=ENDPOINT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    request_file=str(REQUEST_FILE),
)
print("Raw response:", response)


Raw response: [0, 0, 1, 1]
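
The raw response is JSON text, so parsing it gives a list you can check in a smoke test. The sketch below assumes the model returns a flat list of class labels, as in the output above, and that `1` means "churn"; both are assumptions about this model, and `parse_predictions` is a hypothetical helper.

```python
import json


def parse_predictions(raw: str) -> list[int]:
    """Parse a JSON list of class labels returned by the endpoint."""
    parsed = json.loads(raw)
    if not isinstance(parsed, list):
        raise ValueError(f"Expected a JSON list, got {type(parsed).__name__}")
    return [int(value) for value in parsed]


# Using the response shown above as sample input.
predictions = parse_predictions("[0, 0, 1, 1]")
churn_share = sum(predictions) / len(predictions)  # assumes label 1 means churn
print(predictions, churn_share)  # [0, 0, 1, 1] 0.5
```

A simple smoke test can then assert that the number of predictions equals the number of rows in the request file.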


## 9. Clean Up the Endpoint

Delete the managed online endpoint when you are done testing to avoid ongoing compute charges.


In [None]:
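# Uncomment to delete the managed endpoint when no longer needed.
# .result() blocks until the deletion finishes, so the message prints only after it completes.
# ml_client.online_endpoints.begin_delete(name=ENDPOINT_NAME).result()
# print(f"Deleted endpoint {ENDPOINT_NAME}")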