# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import numpy as np
import pandas as pd
import requests
import joblib
import json
import logging

from sklearn.metrics import confusion_matrix
from sklearn.utils import resample
from sklearn.model_selection import train_test_split


from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.model import Model
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.core.webservice import LocalWebservice
from azureml.core.webservice import Webservice, AciWebservice
from azureml.core.model import InferenceConfig
from azureml.automl.core.shared import constants


## Dataset

### Overview

The Credit Approval dataset is taken from the UCI Machine Learning Repository. We use it to train a model that predicts whether a credit application should be approved.

All attribute names and values have been changed to meaningless symbols to protect confidentiality of the data. 

This dataset is interesting because there is a good mix of attributes -- continuous, nominal with small numbers of values, and nominal with larger numbers of values. There are also a few missing values.

Attribute Information:


- A1: b, a.
- A2: continuous. 
- A3: continuous. 
- A4: u, y, l, t. 
- A5: g, p, gg. 
- A6: c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff. 
- A7: v, h, bb, j, n, z, dd, ff, o. 
- A8: continuous. 
- A9: t, f. 
- A10: t, f. 
- A11: continuous. 
- A12: t, f. 
- A13: g, p, s. 
- A14: continuous. 
- A15: continuous. 
- A16: +,- (class attribute)

After removing the rows with missing values, the data has 296 approved (`+`) and 357 rejected (`-`) applications, 653 in total.

This dataset is public and available for research.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
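The raw file stores one application per comma-separated line, with `?` marking a missing value and `+`/`-` as the class in A16. Below is a minimal standard-library sketch of dropping incomplete rows and counting the classes, assuming that layout. The sample rows are made up for illustration and are not taken from the dataset, and `load_complete_rows` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the raw layout: 16 comma-separated fields, class (A16) last.
# The values are illustrative only.
RAW = """b,30.00,1.0,u,g,w,v,1.00,t,t,1,f,g,100,0,+
a,40.00,2.0,u,g,q,h,2.00,t,t,3,f,g,200,50,+
b,?,0.5,u,g,c,bb,0.25,f,f,0,t,g,300,0,-
b,25.00,0.5,y,p,k,v,0.50,f,f,0,f,s,150,0,-
"""

def load_complete_rows(text):
    """Parse CSV text and keep only rows in which no field is the '?' placeholder."""
    rows = list(csv.reader(io.StringIO(text)))
    return [row for row in rows if row and '?' not in row]

rows = load_complete_rows(RAW)
# Index 15 is A16, the class attribute
class_counts = Counter(row[15] for row in rows)
print(len(rows), dict(class_counts))
```

Applied to the full file, the same filter is what reduces the data to the complete applications counted above.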


In [2]:
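def get_data():
    columns = ['A' + str(i) for i in range(1, 17)]
    # The raw UCI file marks missing values with '?', so parse them as NaN
    # and drop every row that contains one
    df = pd.read_csv('data.csv', header=None, names=columns, na_values='?')
    df.dropna(inplace=True)

    # Convert two-valued attributes (and the class label A16) to 0/1
    df['A16'] = df['A16'].map({'+': 1, '-': 0})
    df['A1']  = df['A1'].map({'b': 1, 'a': 0})
    df['A9']  = df['A9'].map({'t': 1, 'f': 0})
    df['A10'] = df['A10'].map({'t': 1, 'f': 0})
    df['A12'] = df['A12'].map({'t': 0, 'f': 1})
    # A13 shares the values 'g' and 'p' with A5, so rename them to keep the
    # one-hot columns distinct; 's' maps to NaN and is encoded as kk = mm = 0
    df['A13'] = df['A13'].map({'g': 'mm', 'p': 'kk'})

    # Convert the remaining categorical attributes into one-hot columns.
    # Note: A6 and A7 share the values 'j' and 'ff', so the A7 dummies
    # overwrite the A6 ones here. pd.get_dummies(df[column], prefix=column)
    # keeps them apart, but changes the feature names the deployed service expects.
    cat_columns = ['A4', 'A5', 'A6', 'A7', 'A13']
    for column in cat_columns:
        dummies = pd.get_dummies(df[column])
        df[dummies.columns] = dummies
        df.drop(columns=column, inplace=True)
    df = df.astype(float)
    df['A16'] = df['A16'].astype(int)
    return df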
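Because A6 and A7 both contain the values `j` and `ff`, one-hot columns named only after the value collide. The sketch below shows the prefixed alternative in plain Python, producing names such as `A6_ff`, which is the naming `pd.get_dummies(..., prefix=column)` uses with its default `_` separator. The helper `one_hot` and the sample records are hypothetical.

```python
def one_hot(records, cat_columns):
    """Replace each categorical field with prefixed 0/1 indicator fields.

    Column names take the form '<attribute>_<value>', so equal values in
    different attributes (e.g. 'ff' in A6 and in A7) stay in separate columns.
    """
    # Collect the sorted set of values observed for every categorical attribute
    levels = {c: sorted({r[c] for r in records}) for c in cat_columns}
    encoded = []
    for r in records:
        # Copy the non-categorical fields unchanged
        out = {k: v for k, v in r.items() if k not in cat_columns}
        for c in cat_columns:
            for value in levels[c]:
                out[f"{c}_{value}"] = 1 if r[c] == value else 0
        encoded.append(out)
    return encoded

records = [
    {"A2": 30.0, "A6": "ff", "A7": "v"},
    {"A2": 25.5, "A6": "w", "A7": "ff"},
]
encoded = one_hot(records, ["A6", "A7"])
print(encoded[0])
```

Switching the notebook to prefixed names would mean retraining the model and updating the keys in the scoring payloads.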

In [3]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'credit_automl'

experiment=Experiment(ws, experiment_name)

In [4]:
dataset_name = 'credit_approval_data'

# Check Registry
if dataset_name in ws.datasets.keys(): 
    dataset = ws.datasets[dataset_name] 

    print(f'Located {dataset_name} in Registry.')

# Read from UCI ML Repository
else:
    df = get_data()
    
    # Register the dataset
    datastore = ws.get_default_datastore()

    dataset = Dataset.Tabular.register_pandas_dataframe(
        dataframe=df, 
        name=dataset_name, 
        description='A dataset of credit approved and rejected applications',
        target=datastore
    )
    

Located credit_approval_data in Registry.


In [5]:
# Splitting the DataFrame into training and testing datasets
df = get_data()
train, test = train_test_split(df, random_state=99, shuffle=True, test_size=0.2)

# Register only the training split; `test` stays local as a held-out set
datastore = ws.get_default_datastore()
train = Dataset.Tabular.register_pandas_dataframe(
    dataframe=train,
    name='df_train',
    description='training dataset',
    target=datastore
)

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/fce9acc5-f4e2-4b79-a175-4d8587cd7136/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.
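With 653 complete rows and `test_size=0.2`, scikit-learn rounds the test size up (ceil(0.2 × 653) = 131) and gives the remaining 522 rows to training. A standard-library sketch of a seeded, shuffled split that reproduces those sizes is below; it uses Python's `random` rather than NumPy's generator, so it will not select the same rows as `train_test_split`. `shuffled_split` is a hypothetical helper.

```python
import math
import random

def shuffled_split(rows, test_size=0.2, seed=99):
    """Shuffle a copy of rows and split it into (train, test).

    The test size is rounded up and the remainder goes to training,
    matching how scikit-learn sizes a float test_size.
    """
    n_test = math.ceil(test_size * len(rows))
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # The first n_test shuffled rows form the test set
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = shuffled_split(range(653))
print(len(train_rows), len(test_rows))  # 522 131
```

AutoML only receives the 522 training rows, so the local `test` frame is data the selected model has never seen.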


## AutoML Configuration

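- `experiment_timeout_minutes` (20), `iterations` (50) and `enable_early_stopping` limit the time spent on model training.
- Early stopping ends the run when the `primary_metric` is no longer improving in the short term.
- `n_cross_validations: 5` evaluates each candidate model with 5-fold cross-validation. This gives a more reliable accuracy estimate than a single validation split and makes it less likely that a model which overfits one split is selected.
- `max_concurrent_iterations: 4` allows up to 4 iterations to run in parallel. On an AmlCompute cluster each iteration runs on its own node, so this value should not exceed the cluster's maximum node count.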
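For intuition about `n_cross_validations: 5`, the sketch below shows how k-fold splitting partitions the 522 training rows: every row is validated exactly once and used for training in the other four folds. It mirrors the fold sizes of scikit-learn's unshuffled `KFold`; how AutoML shuffles or orders its folds internally is not shown and may differ. `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, valid_idx) pairs for contiguous k-fold splits.

    The first n_samples % k folds get one extra row, as in scikit-learn's KFold.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        # Training indices are everything outside the validation block
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size

folds = list(k_fold_indices(522, k=5))
print([len(valid) for _, valid in folds])  # [105, 105, 104, 104, 104]
```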

In [6]:
automl_settings = {
    "experiment_timeout_minutes":20,
    "enable_early_stopping":True,    
    "primary_metric":'accuracy',
    "n_cross_validations":5,
    "iterations":50,
    "max_concurrent_iterations":4,
    "verbosity": logging.INFO
}

compute_target = 'ml-compute'

automl_config = AutoMLConfig(
    compute_target=compute_target, 
    task='classification', 
    training_data=train,
    label_column_name='A16',
    **automl_settings)


In [7]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config, show_output=True)

Submitting remote run.
No run_configuration provided, running on ml-compute with default configuration
Running on remote compute: ml-compute


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| credit_automl | AutoML_4c70625d-e9c1-483e-991e-d70b703d9005 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

**********************************************************************************
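The class-balancing guardrail compares how often each class occurs. A minimal sketch of such a check is below, using the counts from the Overview (296 approved, 357 rejected across the cleaned dataset, not the training split). The 20% minority threshold is an assumption for illustration, not AutoML's documented rule, and `minority_share` is a hypothetical helper.

```python
def minority_share(counts):
    """Return the fraction of samples that belong to the smallest class."""
    total = sum(counts.values())
    return min(counts.values()) / total

counts = {"approved": 296, "rejected": 357}
share = minority_share(counts)
# Hypothetical threshold: treat the data as imbalanced below a 20% minority share
is_balanced = share >= 0.20
print(f"{share:.3f}", is_balanced)  # 0.453 True
```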

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [24]:
RunDetails(remote_run).show()


## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [9]:
status = remote_run.wait_for_completion()


In [12]:
best_run, fitted_model = remote_run.get_output()


# Best Run
print("Best Run: ")
print("")

# Accuracy
print("Accuracy: " + str(best_run.get_metrics()['accuracy']))
print("")

# Metrics
print("Metrics: " + str(best_run.get_metrics()))
print("")

# Name
print("Name: " + str(best_run.properties['model_name']))
print("")

# Tags
print("Tags: " + str(best_run.get_tags()))
print("")

Best Run: 

Accuracy: 0.9023992673992675

Metrics: {'f1_score_micro': 0.9023992673992675, 'weighted_accuracy': 0.9036463020905654, 'matthews_correlation': 0.8009939870832745, 'precision_score_weighted': 0.9050706239371443, 'accuracy': 0.9023992673992675, 'precision_score_micro': 0.9023992673992675, 'recall_score_weighted': 0.9023992673992675, 'f1_score_macro': 0.8993724630951572, 'f1_score_weighted': 0.9026710543703869, 'average_precision_score_weighted': 0.9437113652458711, 'average_precision_score_micro': 0.9434021307495362, 'AUC_macro': 0.9443100281396278, 'AUC_micro': 0.9474915921990098, 'AUC_weighted': 0.9443100281396278, 'precision_score_macro': 0.8999629238159503, 'recall_score_micro': 0.9023992673992675, 'balanced_accuracy': 0.9011168380184564, 'norm_macro_recall': 0.8022336760369132, 'log_loss': 0.3487608996965151, 'average_precision_score_macro': 0.9419454119187088, 'recall_score_macro': 0.9011168380184564, 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_4c706
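For reference, accuracy, balanced accuracy and the Matthews correlation coefficient all follow from a binary confusion matrix. The sketch below computes them from hypothetical counts; the best run's actual matrix is stored as the `confusion_matrix` artifact listed among the run's files. `binary_metrics` is a hypothetical helper.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, balanced accuracy and MCC from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall_pos = tp / (tp + fn)   # true-positive rate (approved class)
    recall_neg = tn / (tn + fp)   # true-negative rate (rejected class)
    balanced_accuracy = (recall_pos + recall_neg) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is defined as 0 when any margin of the matrix is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, balanced_accuracy, mcc

# Hypothetical counts for a 131-row test set, not results from this experiment
acc, bal_acc, mcc = binary_metrics(tp=52, fn=8, fp=5, tn=66)
print(round(acc, 3), round(bal_acc, 3), round(mcc, 3))
```

Feeding in the counts obtained by scoring the held-out `test` frame would give an estimate on data AutoML never saw.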

In [15]:
# Save the best model locally
joblib.dump(fitted_model, filename='automl_best_model.joblib')

# Register the best model
model_name = best_run.properties['model_name']
model = remote_run.register_model(model_name=model_name)


## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [16]:
best_run.get_file_names()

['accuracy_table',
 'automl_driver.py',
 'confusion_matrix',
 'explanation/d46091fb/classes.interpret.json',
 'explanation/d46091fb/eval_data_viz.interpret.json',
 'explanation/d46091fb/expected_values.interpret.json',
 'explanation/d46091fb/features.interpret.json',
 'explanation/d46091fb/global_names/0.interpret.json',
 'explanation/d46091fb/global_rank/0.interpret.json',
 'explanation/d46091fb/global_values/0.interpret.json',
 'explanation/d46091fb/local_importance_values.interpret.json',
 'explanation/d46091fb/per_class_names/0.interpret.json',
 'explanation/d46091fb/per_class_rank/0.interpret.json',
 'explanation/d46091fb/per_class_values/0.interpret.json',
 'explanation/d46091fb/rich_metadata.interpret.json',
 'explanation/d46091fb/true_ys_viz.interpret.json',
 'explanation/d46091fb/visualization_dict.interpret.json',
 'explanation/d46091fb/ys_pred_proba_viz.interpret.json',
 'explanation/d46091fb/ys_pred_viz.interpret.json',
 'explanation/eabb39a8/classes.interpret.json',
 'expl

In [17]:
best_run.download_file('outputs/scoring_file_v_1_0_0.py','score.py')

In [18]:
# Keep the Environment object: save_to_directory() only writes the definition
# to ./environment (for the submission) and does not return the environment
environment = best_run.get_environment()
environment.save_to_directory(path='environment', overwrite=True)

entry_script = 'score.py'
inference_config = InferenceConfig(entry_script=entry_script, environment=environment)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(workspace=ws,
                       name="credit-webservice",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-02-09 00:57:15+00:00 Creating Container Registry if not exists..
2023-02-09 01:07:16+00:00 Use the existing image.
2023-02-09 01:07:17+00:00 Submitting deployment to compute..
2023-02-09 01:07:22+00:00 Checking the status of deployment credit-webservice..
2023-02-09 01:09:26+00:00 Checking the status of inference endpoint credit-webservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"


TODO: In the cell below, send a request to the web service you deployed to test it.

In [19]:
url = service.scoring_uri
print(url)

test_payloads = [
    {
            "A1": 1,
            "A2": 29.58,
            "A3": 4.5,
            "A8": 7.5,
            "A9": 1,
            "A10": 1,
            "A11": 2,
            "A12": 0,
            "A14": 330,
            "A15": 0,
            "l": 0,
            "u": 1,
            "y": 0,
            "g": 1,
            "gg": 0,
            "p": 0,
            "aa": 0,
            "c": 0,
            "cc": 0,
            "d": 0,
            "e": 0,
            "ff": 0,
            "i": 0,
            "j": 0,
            "k": 0,
            "m": 0,
            "q": 0,
            "r": 0,
            "w": 1,
            "x": 0,
            "bb": 0,
            "dd": 0,
            "h": 0,
            "n": 0,
            "o": 0,
            "v": 1,
            "z": 0,
            "kk": 0,
            "mm": 1
    },
    {
            "A1": 1,
            "A2": 27.42,
            "A3": 12.5,
            "A8": 0.25,
            "A9": 0,
            "A10": 0,
            "A11": 0,
            "A12": 0,
            "A14": 720,
            "A15": 0,
            "l": 0,
            "u": 1,
            "y": 0,
            "g": 1,
            "gg": 0,
            "p": 0,
            "aa": 1,
            "c": 0,
            "cc": 0,
            "d": 0,
            "e": 0,
            "ff": 0,
            "i": 0,
            "j": 0,
            "k": 0,
            "m": 0,
            "q": 0,
            "r": 0,
            "w": 0,
            "x": 0,
            "bb": 1,
            "dd": 0,
            "h": 0,
            "n": 0,
            "o": 0,
            "v": 0,
            "z": 0,
            "kk": 0,
            "mm": 1
    }
]

for payload in test_payloads:

    test_payload = {"data":[payload]}
    input_data = json.dumps(test_payload)
    print(input_data)

    output = requests.post(url, input_data, headers={'Content-Type': 'application/json'})
    print(output.json())

http://964115c1-61c1-4586-8597-376db82ab1ec.southcentralus.azurecontainer.io/score
{"data": [{"A1": 1, "A2": 29.58, "A3": 4.5, "A8": 7.5, "A9": 1, "A10": 1, "A11": 2, "A12": 0, "A14": 330, "A15": 0, "l": 0, "u": 1, "y": 0, "g": 1, "gg": 0, "p": 0, "aa": 0, "c": 0, "cc": 0, "d": 0, "e": 0, "ff": 0, "i": 0, "j": 0, "k": 0, "m": 0, "q": 0, "r": 0, "w": 1, "x": 0, "bb": 0, "dd": 0, "h": 0, "n": 0, "o": 0, "v": 1, "z": 0, "kk": 0, "mm": 1}]}
{"result": [1]}
{"data": [{"A1": 1, "A2": 27.42, "A3": 12.5, "A8": 0.25, "A9": 0, "A10": 0, "A11": 0, "A12": 0, "A14": 720, "A15": 0, "l": 0, "u": 1, "y": 0, "g": 1, "gg": 0, "p": 0, "aa": 1, "c": 0, "cc": 0, "d": 0, "e": 0, "ff": 0, "i": 0, "j": 0, "k": 0, "m": 0, "q": 0, "r": 0, "w": 0, "x": 0, "bb": 1, "dd": 0, "h": 0, "n": 0, "o": 0, "v": 0, "z": 0, "kk": 0, "mm": 1}]}
{"result": [0]}
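Each request must supply every feature column the model was trained on, which makes hand-written payloads long and error-prone. The sketch below assumes the column list copied from the payloads above and uses a hypothetical helper, `build_request`, that requires the numeric and binary attributes, fills unlisted one-hot indicators with 0, and prepares (but does not send) the POST request with `urllib.request`. The URL is a placeholder.

```python
import json
import urllib.request

# Feature names copied from the test payloads above
REQUIRED = ["A1", "A2", "A3", "A8", "A9", "A10", "A11", "A12", "A14", "A15"]
ONE_HOT = ["l", "u", "y", "g", "gg", "p", "aa", "c", "cc", "d", "e", "ff",
           "i", "j", "k", "m", "q", "r", "w", "x", "bb", "dd", "h", "n",
           "o", "v", "z", "kk", "mm"]

def build_request(url, features):
    """Build (but do not send) a scoring request for one application.

    Numeric and binary attributes must be given explicitly; one-hot
    indicators that are not listed default to 0.
    """
    missing = [name for name in REQUIRED if name not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    unknown = set(features) - set(REQUIRED) - set(ONE_HOT)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    row = {name: features[name] for name in REQUIRED}
    row.update({name: features.get(name, 0) for name in ONE_HOT})
    body = json.dumps({"data": [row]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

# Same values as the first test payload; unlisted one-hot columns become 0
req = build_request(
    "http://example.invalid/score",  # placeholder for service.scoring_uri
    {"A1": 1, "A2": 29.58, "A3": 4.5, "A8": 7.5, "A9": 1, "A10": 1, "A11": 2,
     "A12": 0, "A14": 330, "A15": 0, "u": 1, "g": 1, "w": 1, "v": 1, "mm": 1},
)
print(req.get_method(), json.loads(req.data)["data"][0]["w"])
```

Passing the request to `urllib.request.urlopen` with the real `service.scoring_uri` would send it to the deployed service.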


TODO: In the cell below, print the logs of the web service and delete the service

In [20]:
print(service.get_logs())



2023-02-09T01:09:15,393255500+00:00 - rsyslog/run 
2023-02-09T01:09:15,392776200+00:00 - iot-server/run 
2023-02-09T01:09:15,401935300+00:00 - gunicorn/run 
2023-02-09T01:09:15,404157400+00:00 | gunicorn/run | 
2023-02-09T01:09:15,410723300+00:00 | gunicorn/run | ###############################################
2023-02-09T01:09:15,413012900+00:00 | gunicorn/run | AzureML Container Runtime Information
2023-02-09T01:09:15,419517600+00:00 | gunicorn/run | ###############################################
2023-02-09T01:09:15,421969600+00:00 | gunicorn/run | 
2023-02-09T01:09:15,434857300+00:00 - nginx/run 
2023-02-09T01:09:15,437778100+00:00 | gunicorn/run | 
2023-02-09T01:09:15,452772600+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materializaton Build:20230120.v2
2023-02-09T01:09:15,458755900+00:00 | gunicorn/run | 
2023-02-09T01:09:15,469358300+00:00 | gunicorn/run | 
2023-02-09T01:09:15,474262800+00:00 | gunicorn/run | PATH environment variable: /azureml-env

In [26]:
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
