# Automated ML



In [1]:
from azureml.core import Workspace, Experiment
from azureml.core import Model 
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.compute import ComputeTarget, AmlCompute
from pprint import pprint 
import joblib
import os

## Dataset

### Overview
In this project, I build a model to help a bank manager predict which customers are likely to churn. The manager intends to engage these customers proactively in order to prevent churn.

The data set contains details of a bank's credit card customers. There are 22 columns and 10000 rows. The data set provider advises discarding the last 2 columns, which I have done in this project. I also drop the `CLIENTNUM` identifier column.

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'credit_card_churn'

experiment=Experiment(ws, experiment_name)

In [3]:
#I'm fetching the data from my github repo
from azureml.data.dataset_factory import TabularDatasetFactory

cc_data = "https://raw.githubusercontent.com/obinnaonyema/CreditCardChurn_UdacityAZMLCapstone/main/BankChurners.csv"
cc_cust = TabularDatasetFactory.from_delimited_files(path=cc_data, separator=',')
columns_not_needed = ['Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1','Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2','CLIENTNUM']
cc_final = cc_cust.drop_columns(columns_not_needed)


In [4]:
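#set up compute cluster, reusing it if it already exists
from azureml.core.compute_target import ComputeTargetException

cluster_name = "compute-obi-3"
try:
    # look up an existing cluster with this name in the workspace
    cluster = ComputeTarget(ws, cluster_name)
    print(f"Found existing cluster {cluster_name}, using it.")
except ComputeTargetException:
    # no cluster with this name yet, so provision a new one
    cluster_config = AmlCompute.provisioning_configuration(vm_size="Standard_D4_V2", min_nodes=1, max_nodes=5)
    cluster = ComputeTarget.create(ws, cluster_name, cluster_config)
cluster.wait_for_completion(show_output=True)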

Creating
Succeeded.......................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## AutoML Configuration

The timeout is set to 60 minutes so that the experiment doesn't run for too long. Similarly, the experiment is capped at 100 iterations, with at most 5 concurrent iterations. The primary metric is accuracy, and the experiment exits early if a score of 0.963 is reached. Training uses 5-fold cross-validation.
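
As a rough illustration of how these limits interact, the sketch below stops a sequence of trials as soon as any one limit is hit. It is a simplified, sequential model and not the AutoML service's actual scheduler. The function `simulate_stopping` and the trial durations are hypothetical, and the last score is made up so that the exit score triggers.

```python
def simulate_stopping(trials, timeout_minutes, max_iterations, exit_score):
    """Run trials in order and stop at the first limit that is reached.

    trials: list of (duration_in_minutes, score) pairs (hypothetical data).
    Returns (best_score, iterations_completed, reason_for_stopping).
    """
    elapsed = 0.0
    best = None
    completed = 0
    for minutes, score in trials:
        if completed >= max_iterations:
            return best, completed, "iterations"
        if elapsed + minutes > timeout_minutes:
            return best, completed, "timeout"
        elapsed += minutes
        completed += 1
        best = score if best is None else max(best, score)
        if best >= exit_score:
            return best, completed, "exit_score"
    return best, completed, "no more trials"


# Hypothetical trials: (duration in minutes, accuracy)
trials = [(0.73, 0.9389), (0.87, 0.9265), (0.95, 0.9499), (0.80, 0.9640)]

result = simulate_stopping(
    trials, timeout_minutes=60, max_iterations=100, exit_score=0.963
)
print(result)
```

With these made-up trials the loop ends on the fourth iteration because the exit score is reached before either the timeout or the iteration cap.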

In [5]:
automl_settings = {
    "experiment_timeout_minutes": 60,
    "max_concurrent_iterations": 5,
    "experiment_exit_score": 0.963,
    "iterations": 100,
    "primary_metric": 'accuracy'
}


#auto ml config
automl_config = AutoMLConfig(
        task='classification',
        compute_target=cluster,
        training_data=cc_final,
        label_column_name='Attrition_Flag',
        n_cross_validations=5,
        **automl_settings    
)

In [6]:
#submit experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

The `RunDetails` widget below shows the progress of the child runs while the experiment executes. The output lists each pipeline that was tried, its duration, its accuracy, and the best accuracy so far.

In [7]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+---------------------------------+--------------------------------------+
|Size of the smallest class       |Name/Label of the smallest class |Number of 

        60   StandardScalerWrapper LightGBM                 0:00:44       0.9389    0.9533
        61   StandardScalerWrapper XGBoostClassifier        0:00:52       0.9265    0.9533
        62   SparseNormalizer XGBoostClassifier             0:00:53       0.8805    0.9533
        63   SparseNormalizer XGBoostClassifier             0:01:02       0.8949    0.9533
        64   StandardScalerWrapper XGBoostClassifier        0:00:57       0.9499    0.9533
        65   StandardScalerWrapper XGBoostClassifier        0:00:55       0.9419    0.9533
        66   StandardScalerWrapper XGBoostClassifier        0:00:47       0.9234    0.9533
        67   MaxAbsScaler LightGBM                          0:00:43       0.9331    0.9533
        69   StandardScalerWrapper LightGBM                 0:00:47       0.9392    0.9533
        72   StandardScalerWrapper XGBoostClassifier        0:00:48       0.9392    0.9533
        68   SparseNormalizer XGBoostClassifier             0:01:12       0.8984    0.9533

{'runId': 'AutoML_2e6c69b0-1fa0-4548-823c-38194d3b44b6',
 'target': 'compute-obi-3',
 'status': 'Completed',
 'startTimeUtc': '2021-03-04T21:44:18.372516Z',
 'endTimeUtc': '2021-03-04T23:01:54.728361Z',
 'properties': {'num_iterations': '100',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'compute-obi-3',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"f7b4655c-2dd4-42f0-8441-12fe197c7b3b\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"isArchive\\\\\\": false, \\\\\\"path\\\\\\": {\\\\\\"target\\\\\\": 4, \\\\\\"resourceDetails\\\\\\": [{\\\\\\"path\\\\\\": \\\\\\"https://raw.githubusercontent.com/obinnaonyema/CreditCardChurn_UdacityAZMLCapstone/main/BankChurners.csv\\\\\\"}]}}, \\\\\\"localData\\\\\\": {}, \\\\\\"isEnabled\\\\\\": t
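
The class balancing alert in the output above is worth taking seriously because the primary metric here is accuracy. The sketch below uses hypothetical toy labels (not the real data set, and the label `Attrited Customer` is an assumption) to show how a model that always predicts the majority class can still score a high accuracy, while a per-class metric such as balanced accuracy exposes the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical 90/10 class split, for illustration only
y_true = ["Existing Customer"] * 9 + ["Attrited Customer"] * 1
# A trivial "model" that always predicts the majority class
y_pred = ["Existing Customer"] * 10

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(acc, bal_acc)
```

In this toy case the trivial model never identifies a churning customer, which is exactly the case the bank manager cares about. A metric that is less sensitive to class imbalance, for example `AUC_weighted`, may therefore be a more informative choice of primary metric.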

## Best Model



In [8]:
# obtaining best run and fitted model
best_run, fitted_model = remote_run.get_output()

# Print the best run
print(best_run)

# Get all metrics of the best run model
best_run_metrics = best_run.get_metrics()

# Print all metrics of the best run model
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

Package:azureml-automl-runtime, training version:1.23.0, current version:1.22.0
Package:azureml-core, training version:1.23.0, current version:1.22.0
Package:azureml-dataprep, training version:2.10.1, current version:2.9.1
Package:azureml-dataprep-native, training version:30.0.0, current version:29.0.0
Package:azureml-dataprep-rslex, training version:1.8.0, current version:1.7.0
Package:azureml-dataset-runtime, training version:1.23.0, current version:1.22.0
Package:azureml-defaults, training version:1.23.0, current version:1.22.0
Package:azureml-interpret, training version:1.23.0, current version:1.22.0
Package:azureml-mlflow, training version:1.23.0, current version:1.22.0
Package:azureml-pipeline-core, training version:1.23.0, current version:1.22.0
Package:azureml-telemetry, training version:1.23.0, current version:1.22.0
Package:azureml-train-automl-client, training version:1.23.0, current version:1.22.0
Package:azureml-train-automl-runtime, training version:1.23.0, current versio

AttributeError: /anaconda/envs/azureml_py36/lib/libxgboost.so: undefined symbol: XGBoosterUnserializeFromBuffer

In [9]:
def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            pprint(step[1].get_params())
            print()

print_model(fitted_model)

datatransformer
{'enable_dnn': None,
 'enable_feature_sweeping': None,
 'feature_sweeping_config': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'force_text_dnn': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None,
 'working_dir': None}

prefittedsoftvotingclassifier
{'estimators': ['190',
                '155',
                '110',
                '179',
                '109',
                '187',
                '125',
                '82',
                '194',
                '90'],
 'weights': [0.07142857142857142,
             0.07142857142857142,
             0.07142857142857142,
             0.14285714285714285,
             0.14285714285714285,
             0.07142857142857142,
             0.07142857142857142,
             0.07142857142857142,
             0.14285714285714285,
             0.14285714285714285]}

190 - maxabsscaler
{'copy': True}

190 - lightgbmclassifier
{'boosting
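
The best model is a soft voting ensemble of ten fitted pipelines, with the weights listed above. As a minimal sketch of the idea, assuming soft voting means a weighted average of each estimator's class probabilities, the code below combines hypothetical probabilities using those weights. The helper `soft_vote` and the probability values are illustrative and are not taken from the fitted model.

```python
def soft_vote(probabilities, weights):
    """Weighted average of per-estimator class probabilities."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


# Ensemble weights as printed above (1/14 or 2/14 per estimator)
weights = [1 / 14] * 3 + [2 / 14] * 2 + [1 / 14] * 3 + [2 / 14] * 2

# Hypothetical [P(class 0), P(class 1)] from each of the ten estimators
probabilities = [[0.9, 0.1]] * 6 + [[0.2, 0.8]] * 4

combined = soft_vote(probabilities, weights)
predicted_class = combined.index(max(combined))
print(combined, predicted_class)
```

Estimators with a weight of 2/14 pull the combined probability twice as hard as those with 1/14, so the ensemble's prediction can differ from a simple majority vote.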

In [10]:
joblib.dump(value=fitted_model, filename='model.pkl')


['model.pkl']

In [12]:
# Register the best model with the workspace
automl_best = best_run.register_model(model_path='outputs/model.pkl', model_name='cc_best_model',
                        tags={'Method of execution':'Auto ML'},
                        properties={'Accuracy': best_run_metrics['accuracy']})

print(automl_best)

Model(workspace=Workspace.create(name='quick-starts-ws-139586', subscription_id='610d6e37-4747-4a20-80eb-3aad70a55f43', resource_group='aml-quickstarts-139586'), name=cc_best_model, id=cc_best_model:2, version=2, tags={'Method of execution': 'Auto ML'}, properties={'Accuracy': '0.9601262503370677'})


## Model Deployment

With the best AutoML model registered, I create an inference configuration from the scoring script and environment file generated by the run, and deploy the model as a web service on Azure Container Instances.

In [13]:
# Download scoring file 
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoring.py')

# Download environment file
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'enviro.yml')

In [14]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig

# build the environment from the conda file downloaded above
env = Environment.from_conda_specification(name='myenv', file_path='enviro.yml')
inf_config = InferenceConfig(entry_script='scoring.py', environment=env)

# deploying the model via WebService
from azureml.core.webservice import AciWebservice

dep_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
webservice = Model.deploy(ws, "webcc", [automl_best], inf_config, dep_config)
webservice.wait_for_deployment(show_output = True)
print(webservice.state)

print(webservice.scoring_uri)

print(webservice.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.......................................................................................................................................................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://40820628-40be-4700-a529-bff791b775b4.southcentralus.azurecontainer.io/score
http://40820628-40be-4700-a529-bff791b775b4.southcentralus.azurecontainer.io/swagger.json


To test the deployed web service, I send it the first five rows of the data set with the label column removed.

In [20]:
import json

#Importing the data set for testing 
test_cc = cc_final.to_pandas_dataframe().head() 
print(test_cc)
cc_lbl = test_cc.pop('Attrition_Flag')

test_data_cc = json.dumps({'data': test_cc.to_dict(orient='records')})

print(test_data_cc)

      Attrition_Flag  Customer_Age Gender  Dependent_count Education_Level  \
0  Existing Customer            45      M                3     High School   
1  Existing Customer            49      F                5        Graduate   
2  Existing Customer            51      M                3        Graduate   
3  Existing Customer            40      F                4     High School   
4  Existing Customer            40      M                3      Uneducated   

  Marital_Status Income_Category Card_Category  Months_on_book  \
0        Married     $60K - $80K          Blue              39   
1         Single  Less than $40K          Blue              44   
2        Married    $80K - $120K          Blue              36   
3        Unknown  Less than $40K          Blue              34   
4        Married     $60K - $80K          Blue              21   

   Total_Relationship_Count  Months_Inactive_12_mon  Contacts_Count_12_mon  \
0                         5                       1     

KeyError: "['Attrition_Flag'] not found in axis"

In [22]:
print(test_data_cc)

{"data": [{"Customer_Age": 45, "Gender": "M", "Dependent_count": 3, "Education_Level": "High School", "Marital_Status": "Married", "Income_Category": "$60K - $80K", "Card_Category": "Blue", "Months_on_book": 39, "Total_Relationship_Count": 5, "Months_Inactive_12_mon": 1, "Contacts_Count_12_mon": 3, "Credit_Limit": 12691.0, "Total_Revolving_Bal": 777, "Avg_Open_To_Buy": 11914.0, "Total_Amt_Chng_Q4_Q1": 1.335, "Total_Trans_Amt": 1144, "Total_Trans_Ct": 42, "Total_Ct_Chng_Q4_Q1": 1.625, "Avg_Utilization_Ratio": 0.061}, {"Customer_Age": 49, "Gender": "F", "Dependent_count": 5, "Education_Level": "Graduate", "Marital_Status": "Single", "Income_Category": "Less than $40K", "Card_Category": "Blue", "Months_on_book": 44, "Total_Relationship_Count": 6, "Months_Inactive_12_mon": 1, "Contacts_Count_12_mon": 2, "Credit_Limit": 8256.0, "Total_Revolving_Bal": 864, "Avg_Open_To_Buy": 7392.0, "Total_Amt_Chng_Q4_Q1": 1.541, "Total_Trans_Amt": 1291, "Total_Trans_Ct": 33, "Total_Ct_Chng_Q4_Q1": 3.714, "A

In [23]:
import requests

headers = {'Content-type': 'application/json'}

response = requests.post(webservice.scoring_uri, test_data_cc, headers=headers)

print(response.text)

"{\"result\": [\"Existing Customer\", \"Existing Customer\", \"Existing Customer\", \"Existing Customer\", \"Existing Customer\"]}"


In [26]:
#check previous label
print(cc_lbl)

0    Existing Customer
1    Existing Customer
2    Existing Customer
3    Existing Customer
4    Existing Customer
Name: Attrition_Flag, dtype: object
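
The response text printed earlier is a JSON string that itself contains JSON, so it has to be decoded twice before the predictions can be compared with the labels programmatically. The sketch below does this with the response text copied from the output above; the variable names are illustrative.

```python
import json

# Response text copied verbatim from the web service output above
response_text = r'"{\"result\": [\"Existing Customer\", \"Existing Customer\", \"Existing Customer\", \"Existing Customer\", \"Existing Customer\"]}"'

# First decode yields a JSON-formatted string, second decode yields a dict
payload = json.loads(json.loads(response_text))
predictions = payload["result"]

# Labels of the five test rows, as printed by print(cc_lbl)
expected = ["Existing Customer"] * 5

matches = [p == e for p, e in zip(predictions, expected)]
print(f"{sum(matches)} of {len(matches)} predictions match the labels")
```

All five test rows belong to the majority class, so this check confirms that the endpoint works but says little about how well the model detects churn.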


Finally, I print the logs of the web service, then delete the service and the compute cluster.

In [27]:
print(webservice.get_logs())


2021-02-27T22:37:42,285329100+00:00 - gunicorn/run 
2021-02-27T22:37:42,305648600+00:00 - rsyslog/run 
2021-02-27T22:37:42,307382300+00:00 - iot-server/run 
2021-02-27T22:37:42,306466500+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_6f3791fe7434448b4ebe2b0fd691d644/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_6f3791fe7434448b4ebe2b0fd691d644/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_6f3791fe7434448b4ebe2b0fd691d644/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_6f3791fe7434448b4ebe2b0fd691d644/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_6f3791fe7434448b4ebe2b0fd691d644/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [32]:
#delete resources
webservice.delete()
cluster.delete()
