# Automated ML



In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.environment import Environment
from azureml.core import Model 
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.compute import ComputeTarget, AmlCompute
from pprint import pprint 
import joblib
import os

## Dataset

### Overview
In this project, I build a model to help a bank manager predict which customers are likely to churn, so that the manager can engage those customers proactively and try to prevent it.

The data set contains details of a bank's credit card customers. There are 22 columns and 10000 rows. The data set provider advises discarding the last 2 columns, which I have done in this project.

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'credit_card_churn'

experiment=Experiment(ws, experiment_name)

In [3]:
# Fetch the data from my GitHub repo
from azureml.data.dataset_factory import TabularDatasetFactory

cc_data = "https://raw.githubusercontent.com/obinnaonyema/CreditCardChurn_UdacityAZMLCapstone/main/BankChurners.csv"
cc_cust = TabularDatasetFactory.from_delimited_files(path=cc_data, separator=',')

# Drop the two columns the data set provider advises discarding, plus the CLIENTNUM identifier
columns_not_needed = [
    'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1',
    'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2',
    'CLIENTNUM',
]
cc_final = cc_cust.drop_columns(columns_not_needed)
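
Since the goal is to predict churn, it can help to look at how the label is distributed before training. The following is a minimal sketch using only the standard library; `class_shares` is a hypothetical helper and the toy labels are made up for illustration, not taken from `BankChurners.csv`.

```python
from collections import Counter


def class_shares(labels):
    """Return each label's share of the total, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


# Toy labels for illustration only (assumed values, not the real data).
toy_labels = ["stayed"] * 8 + ["churned"] * 2
shares = class_shares(toy_labels)
print(shares)
```

With the real data, the same helper could presumably be applied to the `Attrition_Flag` column of `cc_final.to_pandas_dataframe()`.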


In [4]:
# Set up the compute cluster: reuse it if it already exists, otherwise create it
from azureml.core.compute_target import ComputeTargetException

cluster_name = "compute-obi-3"
try:
    cluster = ComputeTarget(ws, cluster_name)
    print(f"Found existing cluster {cluster_name}, using it.")
except ComputeTargetException:
    cluster_config = AmlCompute.provisioning_configuration(vm_size="Standard_D4_V2", min_nodes=1, max_nodes=5)
    cluster = ComputeTarget.create(ws, cluster_name, cluster_config)
cluster.wait_for_completion(show_output=True)

Creating
Succeeded.......................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## AutoML Configuration

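The timeout is set to 60 minutes so that the experiment doesn't run for too long. Similarly, the experiment is limited to a maximum of 100 iterations, with at most 5 iterations running concurrently. The primary metric is accuracy, and the exit score is set to 0.95 so that the experiment can stop early once that score is reached.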
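
As a rough illustration of how these limits combine, the sketch below stops at whichever limit is hit first. `should_stop` is a hypothetical helper written for this note; it is not part of the Azure ML SDK and does not claim to reproduce how AutoML schedules or checks its runs internally. The default values simply mirror the settings used in this notebook.

```python
def should_stop(elapsed_minutes, iterations_done, best_score,
                timeout_minutes=60, max_iterations=100, exit_score=0.95):
    """Return the reason to stop, or None to keep going (illustrative only)."""
    if best_score is not None and best_score >= exit_score:
        return "exit score reached"
    if iterations_done >= max_iterations:
        return "iteration limit reached"
    if elapsed_minutes >= timeout_minutes:
        return "timeout reached"
    return None


# Made-up progress snapshots: (elapsed minutes, iterations done, best score so far)
for snapshot in [(10, 5, 0.90), (60, 40, 0.93), (30, 100, 0.94), (5, 3, 0.96)]:
    print(snapshot, "->", should_stop(*snapshot))
```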

In [5]:
automl_settings = {
    "experiment_timeout_minutes": 60,
    "max_concurrent_iterations": 5,
    "experiment_exit_score": 0.95,
    "iterations":100,
    "primary_metric" : 'accuracy'
}


#auto ml config
automl_config = AutoMLConfig(
        task='classification',
        compute_target=cluster,
        training_data=cc_final,
        label_column_name='Attrition_Flag',
        n_cross_validations=5,
        **automl_settings    
)

In [6]:
#submit experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

The `RunDetails` widget in the cell below shows the progress of the individual runs in the experiment, and `wait_for_completion` blocks until the experiment has finished.

In [7]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+---------------------------------+--------------------------------------+
|Size of the smallest class       |Name/Label of the smallest class |Number of samples in the training data|
|1627                             |Attri

{'runId': 'AutoML_daf0de03-e926-4526-b480-78e79179b016',
 'target': 'compute-obi-3',
 'status': 'Completed',
 'startTimeUtc': '2021-03-06T18:06:32.798248Z',
 'endTimeUtc': '2021-03-06T18:51:18.97507Z',
 'properties': {'num_iterations': '100',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'compute-obi-3',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"7857c713-5787-4aa9-97fe-628ad5d997c1\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"isArchive\\\\\\": false, \\\\\\"path\\\\\\": {\\\\\\"target\\\\\\": 4, \\\\\\"resourceDetails\\\\\\": [{\\\\\\"path\\\\\\": \\\\\\"https://raw.githubusercontent.com/obinnaonyema/CreditCardChurn_UdacityAZMLCapstone/main/BankChurners.csv\\\\\\"}]}}, \\\\\\"localData\\\\\\": {}, \\\\\\"isEnabled\\\\\\": tr

## Best Model



In [8]:
# obtaining best run and fitted model
best_run, fitted_model = remote_run.get_output()

# Print the best run
print(best_run)

# Get all metrics of the best run model
best_run_metrics = best_run.get_metrics()

# Print all metrics of the best run model
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

Package:azureml-automl-runtime, training version:1.23.0, current version:1.22.0
Package:azureml-core, training version:1.23.0, current version:1.22.0
Package:azureml-dataprep, training version:2.10.1, current version:2.9.1
Package:azureml-dataprep-native, training version:30.0.0, current version:29.0.0
Package:azureml-dataprep-rslex, training version:1.8.0, current version:1.7.0
Package:azureml-dataset-runtime, training version:1.23.0, current version:1.22.0
Package:azureml-defaults, training version:1.23.0, current version:1.22.0
Package:azureml-interpret, training version:1.23.0, current version:1.22.0
Package:azureml-mlflow, training version:1.23.0, current version:1.22.0
Package:azureml-pipeline-core, training version:1.23.0, current version:1.22.0
Package:azureml-telemetry, training version:1.23.0, current version:1.22.0
Package:azureml-train-automl-client, training version:1.23.0, current version:1.22.0
Package:azureml-train-automl-runtime, training version:1.23.0, current versio

Run(Experiment: credit_card_churn,
Id: AutoML_daf0de03-e926-4526-b480-78e79179b016_0,
Type: azureml.scriptrun,
Status: Completed)
recall_score_macro 0.9373816189176543
AUC_weighted 0.9927565543404157
f1_score_macro 0.9446648454178048
precision_score_macro 0.9524679585847062
AUC_macro 0.9927565543404157
balanced_accuracy 0.9373816189176543
AUC_micro 0.9961979014514751
f1_score_weighted 0.9704187864550274
precision_score_micro 0.9706722240503091
recall_score_micro 0.9706722240503091
average_precision_score_weighted 0.9937302761047684
average_precision_score_micro 0.9962803112500429
matthews_correlation 0.8896876314833069
norm_macro_recall 0.8747632378353082
accuracy 0.9706722240503091
log_loss 0.07810623062866096
weighted_accuracy 0.9829847485935721
f1_score_micro 0.9706722240503091
precision_score_weighted 0.9703606281943566
recall_score_weighted 0.9706722240503091
average_precision_score_macro 0.9835230537427873
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_daf0de03-e926-
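
In the metrics above, `accuracy` is higher than `balanced_accuracy`, and `balanced_accuracy` matches `recall_score_macro`. The sketch below shows why that pattern is typical for imbalanced data: accuracy counts every sample equally, while balanced accuracy averages the per-class recall. The confusion matrices are hypothetical toy counts, not the ones from this run, and the helper functions are written for illustration.

```python
def accuracy(cm):
    """Fraction of all samples on the diagonal (rows = true class, columns = predicted)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def balanced_accuracy(cm):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm)]
    return sum(recalls) / len(recalls)


# Hypothetical counts: 80 majority-class samples and 20 minority-class samples.
toy_cm = [[78, 2],
          [6, 14]]
# A model that always predicts the majority class.
majority_only_cm = [[80, 0],
                    [20, 0]]

print("toy model:     ", accuracy(toy_cm), balanced_accuracy(toy_cm))
print("majority only: ", accuracy(majority_only_cm), balanced_accuracy(majority_only_cm))
```

The majority-only model still gets 80% accuracy on these toy counts while its balanced accuracy is only 50%, which is the effect the class balancing guardrail warns about.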

In [10]:
def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            pprint(step[1].get_params())
            print()

print_model(fitted_model)

datatransformer
{'enable_dnn': None,
 'enable_feature_sweeping': None,
 'feature_sweeping_config': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'force_text_dnn': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None,
 'working_dir': None}

MaxAbsScaler
{'copy': True}

LightGBMClassifier
{'boosting_type': 'gbdt',
 'class_weight': None,
 'colsample_bytree': 1.0,
 'importance_type': 'split',
 'learning_rate': 0.1,
 'max_depth': -1,
 'min_child_samples': 20,
 'min_child_weight': 0.001,
 'min_split_gain': 0.0,
 'n_estimators': 100,
 'n_jobs': 1,
 'num_leaves': 31,
 'objective': None,
 'random_state': None,
 'reg_alpha': 0.0,
 'reg_lambda': 0.0,
 'silent': True,
 'subsample': 1.0,
 'subsample_for_bin': 200000,
 'subsample_freq': 0,
 'verbose': -10}
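
The fitted pipeline applies `MaxAbsScaler` before the LightGBM classifier. As a minimal pure-Python sketch of what max-abs scaling does, each column is divided by its largest absolute value, so values end up in the range [-1, 1] and zeros stay zero. `max_abs_scale` is a hypothetical helper for illustration; the pipeline itself uses the library scaler, not this function.

```python
def max_abs_scale(rows):
    """Divide each column by its maximum absolute value (all-zero columns are left as is)."""
    n_cols = len(rows[0])
    # "or 1.0" avoids dividing by zero when a column contains only zeros.
    scales = [max(abs(row[j]) for row in rows) or 1.0 for j in range(n_cols)]
    return [[row[j] / scales[j] for j in range(n_cols)] for row in rows]


# Toy numeric features, made up for illustration.
toy_rows = [[10.0, -2.0],
            [5.0, 4.0],
            [-20.0, 1.0]]
scaled = max_abs_scale(toy_rows)
print(scaled)
```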



In [11]:
joblib.dump(value=fitted_model, filename='model.pkl')


['model.pkl']

In [12]:
# Register the best model in the workspace
automl_best = best_run.register_model(model_path='outputs/model.pkl', model_name='cc_best_model',
                        tags={'Method of execution':'Auto ML'},
                        properties={'Accuracy': best_run_metrics['accuracy']})

print(automl_best)

Model(workspace=Workspace.create(name='quick-starts-ws-139947', subscription_id='61c5c3f0-6dc7-4ed9-a7f3-c704b20e3b30', resource_group='aml-quickstarts-139947'), name=cc_best_model, id=cc_best_model:1, version=1, tags={'Method of execution': 'Auto ML'}, properties={'Accuracy': '0.9706722240503091'})


## Model Deployment

Only one of the two models trained in this project needs to be deployed, so the rest of this notebook applies only if this AutoML model is the one being deployed. The cells below download the scoring script and environment file from the best run, create an inference config, and deploy the registered model as a web service.

In [13]:
# Download scoring file 
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoring.py')

# Download environment file
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'enviro.yml')

In [14]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig

inf_config = InferenceConfig(
    entry_script='scoring.py',
    environment=Environment.from_conda_specification(name='myenv', file_path='enviro.yml'),
)

# deploying the model via WebService
from azureml.core.webservice import AciWebservice

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
webservice = Model.deploy(ws, "webcc", [automl_best], inf_config, dep_config)
webservice.wait_for_deployment(show_output = True)
print(webservice.state)

print(webservice.scoring_uri)

print(webservice.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running................................................................................................................................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://3fe89a3f-bbc6-4c7e-aff5-815f6104467c.southcentralus.azurecontainer.io/score
http://3fe89a3f-bbc6-4c7e-aff5-815f6104467c.southcentralus.azurecontainer.io/swagger.json


The cells below send a request to the deployed web service to test it.

In [None]:
import json

# Take the first five rows of the data set as test input
test_cc = cc_final.to_pandas_dataframe().head() 
print(test_cc)
cc_lbl = test_cc.pop('Attrition_Flag')

test_data_cc = json.dumps({'data': test_cc.to_dict(orient='records')})

print(test_data_cc)
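
To make the shape of the request body concrete, here is a small standalone sketch of the same `{'data': [...]}` structure built from a list of records. The field names and values are made up for illustration and are not columns from the data set; `build_payload` is a hypothetical helper.

```python
import json


def build_payload(records):
    """Wrap a list of row dictionaries in the {"data": [...]} structure and serialize it."""
    return json.dumps({"data": records})


# Made-up records standing in for rows of the test data frame.
toy_records = [
    {"feature_a": 45, "feature_b": "M"},
    {"feature_a": 49, "feature_b": "F"},
]
payload = build_payload(toy_records)
decoded = json.loads(payload)
print(payload)
```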

In [None]:
import requests

headers = {'Content-type': 'application/json'}

response = requests.post(webservice.scoring_uri, test_data_cc, headers=headers)

print(response.text)

In [None]:
# Compare the predictions with the actual labels
print(cc_lbl)

The cells below print the logs of the web service and then delete the service and the compute cluster.

In [None]:
print(webservice.get_logs())


In [32]:
#delete resources
webservice.delete()
cluster.delete()
