# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
!pip install kagglehub
!pip uninstall psutil -y
!pip install psutil==5.9.0
!pip uninstall -y ipywidgets azureml-widgets
!pip install ipywidgets==7.6.5 azureml-widgets --no-cache-dir

Found existing installation: psutil 5.2.2
Uninstalling psutil-5.2.2:
  Successfully uninstalled psutil-5.2.2
Collecting psutil==5.9.0
  Downloading psutil-5.9.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (281 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 281.4/281.4 kB 4.6 MB/s eta 0:00:00
Installing collected packages: psutil
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyterlab-nvdashboard 0.13.0 requires jupyterlab>=4, but you have jupyterlab 3.6.8 which is incompatible.
dask-sql 2024.5.0 requires dask[dataframe]>=2024.4.1, but you have dask 2023.2.0 which is incompatible.
dask-sql 2024.5.0 requires distributed>=2024.4.1, but you have distributed 2023.2.0 which is incompatible.
azureml-training-tabular 1.60.0 requires scipy<1.11.0,>=

In [13]:
import os
import shutil
import logging

import kagglehub
import pandas as pd

from azureml.core import Workspace, Experiment, Dataset, Environment, ScriptRunConfig
from azureml.core.authentication import AzureCliAuthentication
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig
from azureml.core.model import Model

## Dataset

### Overview
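This project uses the Telco Customer Churn dataset, downloaded from Kaggle (`blastchar/telco-customer-churn`) as `WA_Fn-UseC_-Telco-Customer-Churn.csv`. Each row describes one customer: demographics (`gender`, `SeniorCitizen`, `Partner`, `Dependents`), subscribed services (`PhoneService`, `MultipleLines`, `InternetService`, `OnlineSecurity`, `OnlineBackup`, `DeviceProtection`, `TechSupport`, `StreamingTV`, `StreamingMovies`), and account details (`tenure`, `Contract`, `PaperlessBilling`, `PaymentMethod`, `MonthlyCharges`, `TotalCharges`).

The task is binary classification: predict the `Churn` label for each customer. The `customerID` column is dropped before training because it identifies a customer rather than describing one.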


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [3]:
# Fetch data from Kaggle and move it to src folder of project
source_path = kagglehub.dataset_download("blastchar/telco-customer-churn")
print("Download Path to dataset files:", source_path)

shutil.move(source_path, os.path.join(os.getcwd(),'data'))
print("New Path to dataset files:", os.path.join(os.getcwd(),'data'))

data_file = 'WA_Fn-UseC_-Telco-Customer-Churn.csv'

ws = Workspace.from_config()


default_ds = ws.get_default_datastore()
default_ds.upload_files(files=[os.path.join(os.getcwd(),'data',data_file)],
                        target_path='data', # Directory on the datastore
                        overwrite=True, # Overwrite if a file with the same name exists
                        show_progress=True)

unregistered_tabular_data = Dataset.Tabular.from_delimited_files(
    path=[(default_ds, os.path.join('data',data_file))],
    validate=True, # Validates data schema during creation
    separator=','  # Specify your CSV separator
)


"datastore.upload_files" is deprecated after version 1.0.69. Please use "FileDatasetFactory.upload_directory" instead. See Dataset API change notice at https://aka.ms/dataset-deprecation.


Uploading an estimated of 1 files
Uploading /mnt/batch/tasks/shared/LS_root/mounts/clusters/irafayabdul1/code/Users/irafayabdul/Capstone/data/WA_Fn-UseC_-Telco-Customer-Churn.csv
Uploaded /mnt/batch/tasks/shared/LS_root/mounts/clusters/irafayabdul1/code/Users/irafayabdul/Capstone/data/WA_Fn-UseC_-Telco-Customer-Churn.csv, 1 files out of an estimated total of 1
Uploaded 1 files
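`shutil.move` places the source directory inside the destination when the destination already exists, so re-running the download cell can nest folders under `data/`. The sketch below shows one way to make that step repeatable by copying a single file instead. The helper name `stage_file` and the temporary folders that stand in for the kagglehub download path are assumptions for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def stage_file(source_dir, data_dir, file_name):
    """Copy one file from source_dir into data_dir, creating data_dir if needed."""
    source = Path(source_dir) / file_name
    target_dir = Path(data_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    shutil.copy2(source, target)  # overwrites an existing copy of the file
    return target


# Demo with temporary folders standing in for the real download and project paths.
with tempfile.TemporaryDirectory() as tmp:
    download_dir = Path(tmp) / "download"
    download_dir.mkdir()
    (download_dir / "churn.csv").write_text("customerID,Churn\n0001,No\n")

    # Calling the helper twice mimics re-running the notebook cell.
    first = stage_file(download_dir, Path(tmp) / "data", "churn.csv")
    second = stage_file(download_dir, Path(tmp) / "data", "churn.csv")
    staged_files = sorted(p.name for p in (Path(tmp) / "data").iterdir())

print(staged_files)
```

Copying rather than moving also leaves the downloaded files where the download step put them, so a later run can find them again.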


In [4]:
unregistered_tabular_data = unregistered_tabular_data.drop_columns(columns=['customerID'])

try:
    registered_dataset = unregistered_tabular_data.register(
        workspace=ws,
        name='ibm-telco-data',
        description="Cleaned data for churn prediction AutoML",
        create_new_version=True
    )
    print(f"Dataset '{registered_dataset.name}' registered successfully.")
    print(f"Version: {registered_dataset.version}")
    print(f"Registered Dataset ID: {registered_dataset.id}")

except Exception as e:
    print(f"Error registering dataset: {e}")
    print("Attempting to get existing dataset instead...")
    try:
        # `version` expects an int or 'latest'; fall back to the most recent registration
        registered_dataset = Dataset.get_by_name(ws, name='ibm-telco-data', version='latest')
        print(f"Using existing dataset '{registered_dataset.name}', version {registered_dataset.version}")
    except Exception as get_e:
        print(f"Failed to get existing dataset after registration error: {get_e}")
        raise  # later cells need `registered_dataset`, so stop here

Dataset 'ibm-telco-data' registered successfully.
Version: 1
Registered Dataset ID: 09ec2bcf-0943-4881-b2d6-25af8a02bad4


## AutoML Configuration

TODO: Explain why you chose the AutoML settings and configuration you used below.

In [5]:
amlcompute_cluster_name = "mycompute"

try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=6,
                                                           min_nodes=1)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count=0, timeout_in_minutes=10)

# choose a name for experiment
experiment_name = 'churn-auto-ml'

experiment = Experiment(ws, experiment_name)

# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 5,
    "primary_metric": 'accuracy',
    "n_cross_validations": 5
}
# TODO: Put your automl config here
automl_config = AutoMLConfig(
    compute_target=compute_target,  # reuse the cluster object retrieved above
    task="classification",
    training_data=registered_dataset,
    label_column_name='Churn',
    iterations=30,
    iteration_timeout_minutes=5,
    enable_early_stopping=True,
    featurization='auto',
    enable_onnx_compatible_models=True,
    verbosity=logging.INFO,
    debug_log="automl_errors.log",
    **automl_settings)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
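The limits in the configuration above interact: 30 iterations, at most 5 running at once, each capped at 5 minutes, inside a 30-minute experiment timeout. The rough check below is a sketch under a simplifying assumption that is not documented AutoML behavior: iterations run in full waves of `max_concurrent_iterations`, and each one uses its entire timeout.

```python
import math

# Values copied from the AutoML settings in this notebook.
iterations = 30
iteration_timeout_minutes = 5
max_concurrent_iterations = 5
experiment_timeout_minutes = 30

# Assumption: iterations are scheduled in full waves and each hits its timeout.
waves = math.ceil(iterations / max_concurrent_iterations)
worst_case_minutes = waves * iteration_timeout_minutes

print(f"waves: {waves}, worst-case training time: {worst_case_minutes} min")
print("fits in experiment timeout:", worst_case_minutes <= experiment_timeout_minutes)
```

Under that assumption the worst case lands exactly on the experiment timeout, so there is no slack if iterations run long. Raising the experiment timeout or lowering `iterations` would add headroom.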


In [6]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config, show_output=True)

Submitting remote run.
No run_configuration provided, running on mycompute with default configuration
Running on remote compute: mycompute


Experiment,Id,Type,Status,Details Page,Docs Page
churn-auto-ml,AutoML_ab84b96b-97cf-4bf0-9ca5-0a7111a78b78,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       DONE
DESCRIPTION:  If the missing values are expected, let the run complete. Otherwise cancel the current run and use a script to customize the handling of missing featu

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [7]:
from azureml.widgets import RunDetails

RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)


_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

Experiment,Id,Type,Status,Details Page,Docs Page
churn-auto-ml,AutoML_ab84b96b-97cf-4bf0-9ca5-0a7111a78b78,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation




********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       DONE
DESCRIPTION:  If the missing values are expected, let the run complete. Otherwise cancel the current run and use a script to customize the handling of missing feature values that may be more appropriate based on the data type and business requirement.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization
DETAILS:      
+------------------------------+------------------------------+------------------------------+
|Column name          

{'runId': 'AutoML_ab84b96b-97cf-4bf0-9ca5-0a7111a78b78',
 'target': 'mycompute',
 'status': 'Completed',
 'startTimeUtc': '2025-06-22T17:55:23.208687Z',
 'endTimeUtc': '2025-06-22T18:32:21.72999Z',
 'services': {},
 'properties': {'num_iterations': '30',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'mycompute',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"09ec2bcf-0943-4881-b2d6-25af8a02bad4\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-opendatasets": "1.60.0", "azureml-contrib-server": "1.60.0", "azureml-interpret": "1.60.0", "azureml-inference-server-http": "1.4.0", "azureml-train": "1.60.0", "azureml-contrib-fairness": "1.60.0", "azureml-training-tabular": "1.60.0", 
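The `startTimeUtc` and `endTimeUtc` fields in the run details give the wall-clock duration of the parent run. A minimal sketch of that arithmetic, using the two timestamps printed above:

```python
from datetime import datetime

# Timestamps copied from the run details printed above.
start_time_utc = "2025-06-22T17:55:23.208687Z"
end_time_utc = "2025-06-22T18:32:21.72999Z"

# %f accepts one to six fractional digits, so both strings parse.
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
elapsed = datetime.strptime(end_time_utc, fmt) - datetime.strptime(start_time_utc, fmt)

minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"Run took {int(minutes)} min {seconds:.1f} s")
```

The result is longer than the 30-minute `experiment_timeout_minutes`. One plausible explanation, not confirmed by the output, is that the parent run's timestamps also cover setup and featurization steps outside the training iterations.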

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [11]:
best_run, fitted_model = remote_run.get_output()

print(f"Best AutoML run ID: {best_run.id}")
print(f"Best model pipeline type: {type(fitted_model)}")

# Get metrics from the best run
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']

print('Best Run Id: ', best_run.id)
print('\n Accuracy:', best_run_metrics['accuracy'])
# print('\n Parameters:',best_run.get_details())

Best AutoML run ID: AutoML_ab84b96b-97cf-4bf0-9ca5-0a7111a78b78_29
Best model pipeline type: <class 'azureml.training.tabular.models.pipeline_with_ytransformations.PipelineWithYTransformations'>
Best Run Id:  AutoML_ab84b96b-97cf-4bf0-9ca5-0a7111a78b78_29

 Accuracy: 0.8088904929350281


In [16]:
# TODO: Save the best model
model_file_name = 'automlBestModel.pkl'
remote_model_path = 'outputs/model.pkl'
local_model_path = os.path.join('./training', model_file_name)

print(f"Downloading model from run {best_run.id} to {local_model_path}...")
best_run.download_file(name=remote_model_path, output_file_path=local_model_path)
print("Model downloaded successfully.")

print(f"Registering model '{model_file_name}' from run {best_run.id}...")
model = Model.register(workspace=ws,
                       model_path=local_model_path,
                       model_name="best_automl_model",
                       tags={'run_id': best_run.id, 'accuracy': best_run.get_metrics().get('accuracy', 'N/A')},
                       description="Best model tuned with automl")

print(f"Model registered successfully with name: {model.name}, version: {model.version}")

Downloading model from run AutoML_ab84b96b-97cf-4bf0-9ca5-0a7111a78b78_29 to ./training/automlBestModel.pkl...
Model downloaded successfully.
Registering model 'automlBestModel.pkl' from run AutoML_ab84b96b-97cf-4bf0-9ca5-0a7111a78b78_29...
Registering model best_automl_model
Model registered successfully with name: best_automl_model, version: 2


## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both. Perform the remaining steps in this notebook only if you choose to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [17]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.conda_dependencies import CondaDependencies
from azureml.exceptions import WebserviceException

inference_env = Environment.from_conda_specification(name="model-inference-env", file_path="conda_inf_dependencies.yml")

inference_config = InferenceConfig(
    entry_script="score.py",
    source_directory=".",
    environment=inference_env
)

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1.8,
    memory_gb=4,
    description='Web service for Telco Customer Churn Prediction',
    enable_app_insights=True,
    auth_enabled=True,
    ssl_enabled=False,
)

service = Model.deploy(
    workspace=ws,
    name="telco-churn-model-aci-service",
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
    overwrite=True
)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2025-06-22 21:10:40+00:00 Creating Container Registry if not exists.
2025-06-22 21:10:40+00:00 Registering the environment.
2025-06-22 21:10:41+00:00 Use the existing image.
2025-06-22 21:10:43+00:00 Submitting deployment to compute.
2025-06-22 21:10:46+00:00 Checking the status of deployment telco-churn-model-aci-service..
2025-06-22 21:14:32+00:00 Checking the status of inference endpoint telco-churn-model-aci-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"


TODO: In the cell below, send a request to the web service you deployed to test it.

In [22]:
import requests
import json

# Scoring URI of the deployed web service
scoring_uri = service.scoring_uri

# The service was deployed with auth_enabled=True, so fetch its primary key
# instead of hard-coding the secret in the notebook
key, _ = service.get_keys()

data = {
  "Inputs": {
    "data": [
      {
        "gender": "Female",
        "SeniorCitizen": 0,
        "Partner": 1,
        "Dependents": 1,
        "tenure": 2,
        "PhoneService": 0,
        "MultipleLines": "No phone service",
        "InternetService": "DSL",
        "OnlineSecurity": "No",
        "OnlineBackup": "No",
        "DeviceProtection": "No",
        "TechSupport": "No",
        "StreamingTV": "No",
        "StreamingMovies": "No",
        "Contract": "Month-to-month",
        "PaperlessBilling": 1,
        "PaymentMethod": "Electronic check",
        "MonthlyCharges": 200,
        "TotalCharges": 4000
      }
    ]
  },
  "GlobalParameters": {
    "method": "predict"
  }
}

# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {"Content-Type": "application/json"}
# If authentication is enabled, set the authorization header
headers["Authorization"] = f"Bearer {key}"

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())


{'Results': [True]}
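The request cell mixes payload construction, authentication, and the HTTP call. The sketch below separates the first two from the network call so they can be checked without a live endpoint. The helper names `build_scoring_request` and `churn_labels`, the `SCORING_KEY` environment variable, and the shortened record are assumptions for illustration. The real service expects the full set of columns used in the request above.

```python
import json
import os


def build_scoring_request(records, key=None):
    """Return (body, headers) in the shape used by the request cell above."""
    payload = {
        "Inputs": {"data": list(records)},
        "GlobalParameters": {"method": "predict"},
    }
    headers = {"Content-Type": "application/json"}
    if key:
        # Only authenticated services need the bearer key.
        headers["Authorization"] = f"Bearer {key}"
    return json.dumps(payload), headers


def churn_labels(response_json):
    """Map the boolean results returned by the service to readable labels."""
    return ["churn" if flag else "no churn" for flag in response_json["Results"]]


# Shortened record for illustration only; not a valid input for the deployed model.
record = {"gender": "Female", "tenure": 2, "Contract": "Month-to-month"}

# SCORING_KEY is a hypothetical environment variable holding the service key.
body, headers = build_scoring_request([record], key=os.environ.get("SCORING_KEY"))

# Response shape copied from the output above.
labels = churn_labels({"Results": [True]})
print(labels)
```

Reading the key from the environment keeps the secret out of the saved notebook, and the two helpers can be reused by a script that calls the endpoint.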


TODO: In the cell below, print the logs of the web service and delete the service

In [23]:
# Set with the deployment name
name = "telco-churn-model-aci-service"

# Load the existing web service
service = Webservice(name=name, workspace=ws)
logs = service.get_logs()

for line in logs.split('\n'):
    print(line)

# Delete the service once the logs have been captured
service.delete()

2025-06-22T21:12:32,751232850+00:00 - rsyslog/run 
2025-06-22T21:12:32,757774677+00:00 - gunicorn/run 
2025-06-22T21:12:32,759860522+00:00 | gunicorn/run | 
2025-06-22T21:12:32,763552651+00:00 | gunicorn/run | ###############################################
2025-06-22T21:12:32,765738412+00:00 - nginx/run 
2025-06-22T21:12:32,767470068+00:00 | gunicorn/run | AzureML Container Runtime Information
2025-06-22T21:12:32,771618066+00:00 | gunicorn/run | ###############################################
2025-06-22T21:12:32,776551837+00:00 | gunicorn/run | 
2025-06-22T21:12:32,778937561+00:00 | gunicorn/run | 
2025-06-22T21:12:32,783019998+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materializaton Build:20250224.v1
2025-06-22T21:12:32,785043858+00:00 | gunicorn/run | 
2025-06-22T21:12:32,786808355+00:00 | gunicorn/run | 
2025-06-22T21:12:32,788835311+00:00 | gunicorn/run | PATH environment variable: /azureml-envs/azureml_6f959dafbc85957c5bfe7fc508812c41/bin:/opt/mi

**Submission Checklist**
- Done. I have registered the model.
- Done. I have deployed the model with the best accuracy as a webservice.
- Done. I have tested the webservice by sending a request to the model endpoint.
- Done. I have deleted the webservice and shut down all the computes that I have used.
- Done. I have taken a screenshot showing the model endpoint as active.
- Done. The project includes a file containing the environment details.
