## Automated ML

Importing all needed dependencies to complete the project.

In [5]:
import requests
import json
import logging
import joblib
from pprint import pprint
import pandas as pd
from sklearn.model_selection import train_test_split

import azureml.core
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.train.automl import AutoMLConfig
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.automl.core.shared import constants
from azureml.core.environment import Environment

## Workspace and Experiment

In [6]:
ws = Workspace.from_config()
# choose a name for experiment
experiment_name = "automl_maternal_health_experiment"
experiment = Experiment(ws,experiment_name)

## AmlCompute cluster

In [7]:
# Choose a name for the cluster
cpu_cluster_name = 'cluster-yum'

try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    print('Creating a new compute cluster...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D2_V2', min_nodes=1, max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [8]:
ws.datasets.keys()

KeysView({})

In [25]:
# validate dataset is in store
key = "Churn"
description_text = "Churn"


if key in ws.datasets.keys():
    dataset = ws.datasets[key]
    print('The Dataset was found')
# else:
#     # Create AML Dataset and register it into Workspace
#     data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00592/Churn_Dateset.csv"
#     dataset = Dataset.Tabular.from_delimited_files(data_url)
#     #Register Dataset in Workspace
#     dataset = dataset.register(workspace = ws,name = key,description = description_text)

df = dataset.to_pandas_dataframe()
print(df.head())

   Anonymous Customer ID  Call  Failure  Complains  Subscription  Length  \
0                   1.00           8.00       0.00                 38.00   
1                   2.00           0.00       0.00                 39.00   
2                   3.00          10.00       0.00                 37.00   
3                   4.00          10.00       0.00                 38.00   
4                   5.00           3.00       0.00                 38.00   

   Charge  Amount  Seconds of Use  Frequency of use  Frequency of SMS  \
0            0.00         4370.00             71.00              5.00   
1            0.00          318.00              5.00              7.00   
2            0.00         2453.00             60.00            359.00   
3            0.00         4198.00             66.00              1.00   
4            0.00         2393.00             58.00              2.00   

   Distinct Called Numbers  Age Group  Tariff Plan  Status  Churn  \
0                    17.00       3.

## AutoML Configuration

In [11]:
# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes" : 30,
    "enable_early_stopping" : True,
    "primary_metric":'accuracy',
    "max_concurrent_iterations":4,
    "verbosity": logging.INFO
}
# TODO: Put your automl config here
automl_config = AutoMLConfig(compute_target = compute_target,
                            task = 'classification',
                            training_data=dataset,                         
                            label_column_name="Churn",
                            **automl_settings
                            )

In [12]:
# Submit your experiment
run = experiment.submit(automl_config, show_output=True)
#run.wait_for_completion()

Submitting remote run.
No run_configuration provided, running on cluster-yum with default configuration
Running on remote compute: cluster-yum


Experiment,Id,Type,Status,Details Page,Docs Page
automl_maternal_health_experiment,AutoML_a772be5d-e991-4976-b0af-c6a054c14d46,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, for your input data we leverage cross-validation with 10 folds, if the number of training samples are f

In [13]:
print("Run Status: ",run.get_status())

Run Status:  Completed


## Run Details

In the cell below, use the `RunDetails` widget to show the different experiments.

In [14]:
RunDetails(run).show()
run.wait_for_completion (show_output = True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

Experiment,Id,Type,Status,Details Page,Docs Page
automl_maternal_health_experiment,AutoML_a772be5d-e991-4976-b0af-c6a054c14d46,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation




********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, for your input data we leverage cross-validation with 10 folds, if the number of training samples are fewer than 1000, and 3 folds in all other cases.
              Learn more about cross validation: https://aka.ms/AutomatedMLCrossValidation
DETAILS:      
+------------------------------+
|Number of folds               |
|3                             |
+------------------------------+

******

{'runId': 'AutoML_a772be5d-e991-4976-b0af-c6a054c14d46',
 'target': 'cluster-yum',
 'status': 'Completed',
 'startTimeUtc': '2023-01-26T21:52:30.598407Z',
 'endTimeUtc': '2023-01-26T22:03:07.503483Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'cluster-yum',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"0632e8bb-979f-4deb-af23-04ab012fc328\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets"

## Best Model

In the cell below, get the best model from the automl experiments and display all the properties of the model.

In [15]:
best_run, fitted_model = run.get_output()

Package:azureml-automl-runtime, training version:1.48.0.post1, current version:1.47.0
Package:azureml-core, training version:1.48.0, current version:1.47.0
Package:azureml-dataprep, training version:4.8.3, current version:4.5.7
Package:azureml-dataprep-rslex, training version:2.15.1, current version:2.11.4
Package:azureml-dataset-runtime, training version:1.48.0, current version:1.47.0
Package:azureml-defaults, training version:1.48.0, current version:1.47.0
Package:azureml-interpret, training version:1.48.0, current version:1.47.0
Package:azureml-mlflow, training version:1.48.0, current version:1.47.0
Package:azureml-pipeline-core, training version:1.48.0, current version:1.47.0
Package:azureml-responsibleai, training version:1.48.0, current version:1.47.0
Package:azureml-telemetry, training version:1.48.0, current version:1.47.0
Package:azureml-train-automl-client, training version:1.48.0, current version:1.47.0
Package:azureml-train-automl-runtime, training version:1.48.0.post1, cur

In [16]:
# Save the best model
# best_run.register_model(model_name = 'automl-best-model-yum.pkl',model_path = './outputs/')
joblib.dump(fitted_model, filename="auto_model.joblib")

['auto_model.joblib']

In [17]:
# best_run.get_file_names()
# # Download the yaml file that includes the environment dependencies
# best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'env.yml')
# # Download the model file
# best_run.download_file('outputs/model.pkl', 'Automl_model.pkl')

model_name = best_run.properties['model_name']
model_name

'AutoMLa772be5de36'

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

In the cell below, register the model, create an inference config and deploy the model as a web service.

In [18]:
environment = best_run.get_environment().save_to_directory(path='environment')
entry_script='inference/scoring.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', entry_script)

model = run.register_model(model_name=model_name)

inference_config = InferenceConfig(entry_script = entry_script, environment = environment)

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                                    memory_gb = 1)

service = Model.deploy(ws, "churnservice", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-01-26 22:17:11+00:00 Creating Container Registry if not exists.
2023-01-26 22:17:13+00:00 Generating deployment configuration.
2023-01-26 22:17:14+00:00 Submitting deployment to compute.
2023-01-26 22:17:18+00:00 Checking the status of deployment churnservice..
2023-01-26 22:21:25+00:00 Checking the status of inference endpoint churnservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [20]:
# Getting the service state
# The scorig URI & the primary authentication key are copied to the endpoint.py file in order to test the deployed service.
# The Swagger URI can be used in Swagger UI: https://petstore.swagger.io/ For more info, please see the relevant part in the README file.

# # Authentication is enabled, so I use the get_keys method to retrieve the primary and secondary authentication keys:
# primary, secondary = service.get_keys()

print('Service state: ' + service.state)
print('Service scoring URI: ' + service.scoring_uri)
print('Service Swagger URI: ' + service.swagger_uri)


Service state: Healthy
Service scoring URI: http://05653210-0151-4d4c-9f09-01832b360570.southcentralus.azurecontainer.io/score
Service Swagger URI: http://05653210-0151-4d4c-9f09-01832b360570.southcentralus.azurecontainer.io/swagger.json


In the cell below, send a request to the web service you deployed to test it.

In [27]:
import requests
import json

data = {"data":
        [{"Anonymous Customer ID": 1.0, "Call  Failure": 8, "Complains": 0,  "Subscription  Length": 38, "Charge  Amount": 0, "Seconds of Use": 4370, "Frequency of use": 71, "Frequency of SMS": 5, "Distinct Called Numbers": 17,  "Age Group": 3, "Tariff Plan": 1, "Status": 1,  "Customer Value": 132.6}      
      ]
    }
# Convert to JSON string
input_data = json.dumps(data)
input_data
# Set the content type
headers = {'Content-Type': 'application/json'}

# Make the request and display the response
resp = requests.post(service.scoring_uri, input_data, headers=headers)
print(resp.json())

{"result": [0.0]}


TODO: In the cell below, print the logs of the web service and delete the service

In [28]:
# Printing the logs
print(service.get_logs())

2023-01-26T22:21:12,991932815+00:00 - iot-server/run 
2023-01-26T22:21:12,992822915+00:00 - rsyslog/run 
2023-01-26T22:21:12,996354615+00:00 - gunicorn/run 
2023-01-26T22:21:13,002108015+00:00 | gunicorn/run | 
2023-01-26T22:21:13,006818515+00:00 | gunicorn/run | ###############################################
2023-01-26T22:21:13,016402015+00:00 | gunicorn/run | AzureML Container Runtime Information
2023-01-26T22:21:13,025316115+00:00 | gunicorn/run | ###############################################
2023-01-26T22:21:13,026870515+00:00 | gunicorn/run | 
2023-01-26T22:21:13,028548015+00:00 | gunicorn/run | 
2023-01-26T22:21:13,105965215+00:00 | gunicorn/run | AzureML image information: openmpi3.1.2-ubuntu18.04, Materializaton Build:20230103.v4
2023-01-26T22:21:13,109076215+00:00 | gunicorn/run | 
2023-01-26T22:21:13,115448215+00:00 | gunicorn/run | 
2023-01-26T22:21:13,117070815+00:00 | gunicorn/run | PATH environment variable: /azureml-envs/azureml_cc96bffb210e59c36dafe3fe23d9a95e/bin:/o

## Deleting the service
Putting the deletion of the service in a separate cell to avoid accidentally running the cell before finishing the tasks

In [73]:
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
