# Automated ML

## Overview

In this notebook we import the dataset, create an experiment, provision a compute cluster, run the AutoML experiment, and select the best AutoML model.

## Dependencies

Import all the dependencies needed to complete the project.

In [2]:
import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset

from azureml.pipeline.steps import AutoMLStep

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.27.0


## Workspace Configuration

In this cell we import the workspace configuration and create an experiment that we will use later.

In [3]:
ws = Workspace.from_config()

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

# choose a name for experiment
experiment_name = 'automlcovid'

experiment=Experiment(ws, experiment_name)
experiment

Workspace name: quick-starts-ws-144102
Azure region: southcentralus
Subscription id: a24a24d5-8d87-4c8a-99b6-91ed2d2df51f
Resource group: aml-quickstarts-144102


| Name | Workspace | Report Page | Docs Page |
| --- | --- | --- | --- |
| automlcovid | quick-starts-ws-144102 | Link to Azure Machine Learning studio | Link to Documentation |


## Compute Cluster creation
This cell provisions a CPU cluster for running our experiments. It first checks whether a compute cluster with the same name already exists; if it does, the cell reuses it, and otherwise it creates a new one.

In [4]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

compute_cluster_name = "cpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=compute_cluster_name)
    print("Found existing compute cluster...")
except ComputeTargetException:
    print("Creating new compute cluster...")
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    compute_target = ComputeTarget.create(ws, compute_cluster_name, compute_config)
    
compute_target.wait_for_completion(show_output=True)
print("Cluster details: ", compute_target.get_status().serialize())

Found existing compute cluster...
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
Cluster details:  {'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2021-05-06T16:14:42.705000+00:00', 'errors': None, 'creationTime': '2021-05-06T16:14:40.188357+00:00', 'modifiedTime': '2021-05-06T16:14:55.535988+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}


## Dataset

### Overview
We use the Mexican Government's COVID-19 data. Once it is uploaded to ML Studio, we consume it as a TabularDataset.

To consume the dataset, we read it from the workspace datastore (the Datastore tab in ML Studio) by specifying the location of the CSV file.

In [5]:
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core import Datastore

datastore = Datastore.get(ws,'workspaceblobstore')
ds = TabularDatasetFactory.from_delimited_files(path=(datastore, 'UI/05-06-2021_035443_UTC/210505COVID19MEXICO.csv'))
df = ds.to_pandas_dataframe()
print(df)

        FECHA_ACTUALIZACION ID_REGISTRO  ORIGEN  SECTOR  ENTIDAD_UM  SEXO  \
0                2021-05-05      z482b8       1      12           9     2   
1                2021-05-05      z49a69       1      12          23     1   
2                2021-05-05      z23d9d       1      12          22     2   
3                2021-05-05      z24953       1      12           9     1   
4                2021-05-05      zz8e77       1      12           9     2   
...                     ...         ...     ...     ...         ...   ...   
6697731          2021-05-05      977f22       2      12           9     2   
6697732          2021-05-05      9a7125       2       4          19     1   
6697733          2021-05-05      a86b4e       1       4          31     2   
6697734          2021-05-05      58438f       2       4          11     2   
6697735          2021-05-05      6c86ad       2       4          19     2   

         ENTIDAD_NAC  ENTIDAD_RES  MUNICIPIO_RES  TIPO_PACIENTE  ...  \
0  

### Train/test split

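We split the dataset into a training set (80%) and a test set (20%), using a fixed random seed so the split is reproducible. The label column is `UCI`.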

In [6]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(df,test_size = 0.2, random_state = 42)
label = 'UCI'
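
If the `UCI` label turns out to be imbalanced, a plain random split can leave the test set with a different class mix than the training set. The sketch below uses a made-up toy DataFrame, not the real data, to show how the `stratify` argument of `train_test_split` keeps the label proportions the same in both parts. Whether stratification is appropriate for this dataset is an assumption that the notebook does not verify.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for df; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "SEXO": [1, 2] * 25,
    "UCI": [1] * 10 + [2] * 40,  # 20% / 80% label mix
})

toy_train, toy_test = train_test_split(
    toy_df,
    test_size=0.2,
    random_state=42,
    stratify=toy_df["UCI"],  # keep the label proportions in both splits
)

# Both splits should show the same label mix as the full frame.
print(toy_train["UCI"].value_counts(normalize=True).to_dict())
print(toy_test["UCI"].value_counts(normalize=True).to_dict())
```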

### Converting Train Data to Tabular Dataset
Since the training data is a pandas DataFrame, we have to convert it to a TabularDataset before AutoML can use it.

In [7]:
# Convert train_data (a pandas DataFrame) to TabularDataset format.
try:
    os.makedirs('./data', exist_ok=True)
except OSError as error:
    print('New directory cannot be created')
    
path = 'data/train.csv'
train_data.to_csv(path, index=False)  # do not write the pandas index as an extra column

datastore = ws.get_default_datastore()
datastore.upload(src_dir='data', target_path='data')

train_data = TabularDatasetFactory.from_delimited_files(path=[(datastore, ('data/train.csv'))])
print("Successfully converted the dataset to TabularDataset format.")

Uploading an estimated of 1 files
Target already exists. Skipping upload for data/train.csv
Uploaded 0 files
Successfully converted the dataset to TabularDataset format.


In [8]:
print(type(train_data))

<class 'azureml.data.tabular_dataset.TabularDataset'>
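
One detail of the CSV round trip is worth checking: by default `DataFrame.to_csv` also writes the row index, which comes back as an extra unnamed column when the file is read again. The sketch below uses an in-memory buffer and a made-up two-row frame to show the difference that `index=False` makes. It does not touch the datastore, and how a TabularDataset names such a column is not shown here.

```python
import io
import pandas as pd

# Made-up rows; the index mimics the shuffled row labels left by a split.
toy = pd.DataFrame({"SEXO": [2, 1], "UCI": [2, 1]}, index=[7, 3])

def round_trip_columns(frame, **to_csv_kwargs):
    """Write frame to an in-memory CSV, read it back, return the column names."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return list(pd.read_csv(buffer).columns)

print(round_trip_columns(toy))               # the index appears as a third column
print(round_trip_columns(toy, index=False))  # only the two data columns
```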


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [9]:
# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 5,
    "primary_metric" : 'accuracy'
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(
    compute_target=compute_target,
    task = "classification",
    training_data=train_data,
    label_column_name=label,   
    enable_early_stopping= True,
    featurization= 'auto',
    debug_log = "automl_errors.log",
    **automl_settings
)
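
According to the Azure ML documentation, an AmlCompute cluster runs one AutoML iteration per node. With the `max_nodes=4` cluster created earlier, at most four of the five requested concurrent iterations can run at once, and the rest wait in the queue. A small guard like the following could keep the two values consistent; `cap_concurrency` is a hypothetical helper, not part of the SDK.

```python
def cap_concurrency(settings, max_nodes):
    """Return a copy of settings whose max_concurrent_iterations fits the cluster."""
    capped = dict(settings)  # leave the caller's dict untouched
    requested = capped.get("max_concurrent_iterations", 1)
    capped["max_concurrent_iterations"] = min(requested, max_nodes)
    return capped

example_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 5,
    "primary_metric": "accuracy",
}

# max_nodes=4 matches the provisioning configuration used for cpu-cluster.
print(cap_concurrency(example_settings, max_nodes=4))
```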

In [10]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config, show_output = True)

Submitting remote run.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster


| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| automlcovid | AutoML_a0888a02-8cec-4c9d-9962-5d41b4f0c373 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |



Something went wrong while printing the experiment progress but the run is still executing on the compute target. 
Please check portal for updated status: https://ml.azure.com/runs/AutoML_a0888a02-8cec-4c9d-9962-5d41b4f0c373?wsid=/subscriptions/a24a24d5-8d87-4c8a-99b6-91ed2d2df51f/resourcegroups/aml-quickstarts-144102/workspaces/quick-starts-ws-144102&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254
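
The message above only means that progress streaming failed; the run itself keeps going on the cluster. One way to block until it finishes is to poll the run's status. The sketch below relies only on `get_status()`, which azureml `Run` objects provide; `wait_for_run` and its polling defaults are assumptions made for illustration.

```python
import time

# Terminal status strings reported by Run.get_status().
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}

def wait_for_run(run, poll_seconds=60.0, max_polls=120):
    """Poll run.get_status() until it reaches a terminal state, then return it."""
    for _ in range(max_polls):
        status = run.get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"run still not finished after {max_polls} polls")

# Usage in the notebook (not executed here):
# final_status = wait_for_run(remote_run)
```

The SDK's own `remote_run.wait_for_completion(show_output=False)` should serve the same purpose without a custom loop.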


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.
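
As a starting point for this step, the sketch below wraps `get_output()`, which an AutoML run exposes to return the best child run together with its fitted model. `summarize_best_run` is a hypothetical helper, and reading `fitted_model.steps` assumes the fitted model is a scikit-learn style pipeline.

```python
def summarize_best_run(automl_run):
    """Return (metrics, step names) for the best model of a finished AutoML run."""
    best_run, fitted_model = automl_run.get_output()
    metrics = best_run.get_metrics()  # dict of metric name -> value
    step_names = [name for name, _ in fitted_model.steps]
    return metrics, step_names

# Usage in the notebook (not executed here):
# metrics, step_names = summarize_best_run(remote_run)
# print(metrics.get("accuracy"), step_names)
```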



In [None]:
#TODO: Save the best model

## Model Deployment

Remember you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.
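
For the test request, the scoring script that AutoML generates typically expects a JSON body with a top-level `data` list holding one record per row. The sketch below only builds such a payload. The record uses two columns visible in the DataFrame preview with made-up values, `build_scoring_payload` is a hypothetical helper, and the exact schema should be checked against the deployed service.

```python
import json

def build_scoring_payload(records):
    """Serialize a list of row dicts into a {"data": [...]} JSON request body."""
    return json.dumps({"data": list(records)})

# Made-up example row; a real request needs every feature column.
sample_rows = [{"SEXO": 2, "ENTIDAD_UM": 9}]
payload = build_scoring_payload(sample_rows)
print(payload)

# Sending it (not executed here; scoring_uri would come from the deployed service):
# import requests
# response = requests.post(scoring_uri, data=payload,
#                          headers={"Content-Type": "application/json"})
```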

TODO: In the cell below, print the logs of the web service and delete the service