# Using Automated Machine Learning

There are many kinds of machine learning algorithms that you can use to train a model, and sometimes it's not easy to determine the most effective algorithm for your particular data and prediction requirements. Additionally, you can significantly affect the predictive performance of a model by preprocessing the training data, using techniques such as normalization, missing feature imputation, and others. In your quest to find the *best* model for your requirements, you may need to try many combinations of algorithms and preprocessing transformations, which takes a lot of time and compute resources.
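
To get a feel for how quickly these combinations multiply, here is a minimal sketch that enumerates a small search space. The algorithm and preprocessing names are hypothetical placeholders chosen for illustration; they are not the candidates that automated machine learning actually tries.

```python
from itertools import product

# Hypothetical candidate lists, for illustration only.
algorithms = ["logistic_regression", "decision_tree", "gradient_boosting"]
scalers = ["none", "standard", "min_max"]
imputers = ["mean", "median"]

# Every pairing of algorithm, scaler and imputer is a separate model to train.
combinations = list(product(algorithms, scalers, imputers))
print(len(combinations), "combinations to train and evaluate")

# Show the first few combinations.
for algorithm, scaler, imputer in combinations[:3]:
    print(algorithm, scaler, imputer)
```

Even with only three short lists there are 18 models to train and compare, which is the kind of repetitive work that automation is meant to take over.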

Azure Machine Learning enables you to automate the comparison of models trained using different algorithms and preprocessing options. You can use the visual interface in [Azure Machine Learning studio](https://ml.azure.com) or the SDK to leverage this capability. The SDK gives you greater control over the settings for the automated machine learning experiment, but the visual interface is easier to use. In this lab, you'll explore automated machine learning using the SDK.

## Connect to Your Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK.

> **Note**: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [None]:
! conda env list

In [None]:
! pip install azureml-sdk
! pip install -r /Users/mariuszrokita/anaconda/envs/automl/lib/python3.6/site-packages/azureml/automl/core/validated_darwin_requirements.txt

In [None]:
#! pip uninstall numpy==1.19.3

In [None]:
# Fix: reinstall pinned package versions (-I ignores already-installed packages)
! pip install -I azure-mgmt-resource==10.2.0
! pip install -I cryptography==3.1.1
! pip install -I numpy==1.18.5

In [1]:
import azureml.core
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

# Load the workspace from the saved config file
auth = InteractiveLoginAuthentication(tenant_id='')
ws = Workspace.from_config(path='.', auth=auth)
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.17.0 to work with AMLWorkspace


## Prepare Data for Automated Machine Learning

You don't need to create a training script for automated machine learning, but you do need to load the training data. In this case, you'll create a dataset containing details of diabetes patients (just as you did in previous labs), and then split this into two datasets: one for training, and another for model validation.

In [None]:
import pandas as pd

df = pd.read_csv('./data/diabetes.csv')
df.head()

In [2]:
from azureml.core import Dataset

default_ds = ws.get_default_datastore()

if 'diabetes dataset' not in ws.datasets:
    default_ds.upload_files(
        files=['./data/diabetes.csv', './data/diabetes2.csv'], # Upload the diabetes csv files in /data
        target_path='diabetes-data/', # Put it in a folder path in the datastore
        overwrite=True, # Replace existing files of the same name
        show_progress=True)

    # Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws, 
                                name='diabetes dataset',
                                description='diabetes data',
                                tags = {'format':'CSV'},
                                create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')


# Split the dataset into training and validation subsets
diabetes_ds = ws.datasets.get("diabetes dataset")
train_ds, test_ds = diabetes_ds.random_split(percentage=0.7, seed=123)
print("Data ready!")

Dataset already registered.
Data ready!
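
The idea behind a seeded random split can be sketched in plain Python. This is an illustration of the concept only, not how the SDK implements `random_split`; the helper name `split_rows` and the stand-in row list are assumptions made for the example.

```python
import random

def split_rows(rows, percentage, seed):
    """Send each row to the first subset with probability `percentage`."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    first, second = [], []
    for row in rows:
        (first if rng.random() < percentage else second).append(row)
    return first, second

# Stand-in for dataset rows (hypothetical data).
rows = list(range(1000))
train_rows, test_rows = split_rows(rows, percentage=0.7, seed=123)
print(len(train_rows), len(test_rows))

# Splitting again with the same seed reproduces the same subsets.
repeat_train, repeat_test = split_rows(rows, percentage=0.7, seed=123)
print(train_rows == repeat_train)
```

Because each row is assigned independently, a split like this is only approximately 70/30, but every row ends up in exactly one subset and the same seed always gives the same result.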


## Configure Automated Machine Learning

Now you're ready to configure the automated machine learning experiment. To do this, you'll need an AutoML configuration that specifies options like the data to use, how many combinations to try, which metric to use when evaluating models, and so on.
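
To make the choice of metric more concrete, the following sketch computes an area under the ROC curve (AUC) score for a binary problem by comparing every positive example with every negative one. It is a simplified illustration rather than the SDK's implementation of `AUC_weighted`, and the labels and scores are made-up values.

```python
def binary_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up true labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = binary_auc(labels, scores)
print(auc)  # 0.75
```

A score of 1.0 means every positive example is ranked above every negative one, while 0.5 is what random guessing would be expected to achieve.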

> **Note**: In this example, you'll restrict the experiment to 6 iterations to reduce the amount of time taken. In reality, you'd likely try many more iterations.

In [3]:
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig

compute_target = 'local'
#compute_target = ComputeTarget(workspace=ws, name='cpu-comp-cluster')

automl_config = AutoMLConfig(
    name='Automated ML Experiment',
    task='classification',
    compute_target=compute_target,
    training_data = train_ds,
    validation_data = test_ds,
    label_column_name='Diabetic',
    iterations=6,
    primary_metric = 'AUC_weighted',
    featurization='auto',
    # Settings that keep time and compute costs reasonable
    max_cores_per_iteration=-1,  # use all cores to (theoretically) speed up computations
    max_concurrent_iterations=16, # works only on DSVM
    experiment_timeout_hours=1.0,  # default is 6 days = 144 hours
    experiment_exit_score=0.99,  # terminate the experiment when the target score is reached
    enable_early_stopping=True
)

print("Ready for Auto ML run.")

Ready for Auto ML run.
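
The way an iteration limit and an exit score bound the search can be sketched as a simple loop. This is a conceptual illustration under simplifying assumptions (candidates tried one at a time, no timeout), not the AutoML implementation; the `run_search` helper and the scores are hypothetical.

```python
def run_search(scores, iterations, exit_score):
    """Try candidates in order, stopping at the limit or once exit_score is reached."""
    best_index, best_score, tried = None, float("-inf"), 0
    for index, score in enumerate(scores[:iterations]):
        tried = index + 1
        if score > best_score:
            best_index, best_score = index, score
        if best_score >= exit_score:
            break  # target reached, so no further candidates are needed
    return best_index, best_score, tried

# Made-up validation scores standing in for trained models.
candidate_scores = [0.81, 0.88, 0.86, 0.92, 0.995, 0.90, 0.97, 0.91]

print(run_search(candidate_scores, iterations=6, exit_score=0.99))  # (4, 0.995, 5)
print(run_search(candidate_scores, iterations=4, exit_score=0.99))  # (3, 0.92, 4)
```

With six iterations allowed, the search stops early at the fifth candidate because it reaches the exit score. With only four, it never sees that candidate and settles for a weaker model, which is why a small iteration count trades result quality for time.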


## Run an Automated Machine Learning Experiment

OK, you're ready to go. Let's run the automated machine learning experiment.

> **Note**: This may take some time!

In [None]:
# The Python environment dedicated to AutoML (location: azureml/automl/core/validated_darwin_requirements.txt) 
# does not contain all packages required by this Jupyter Notebook.
#! pip install azureml-widgets==1.16.0

In [4]:
from azureml.core.experiment import Experiment
#from azureml.widgets import RunDetails

print('Submitting Auto ML experiment...')
automl_experiment = Experiment(ws, 'diabetes_automl')
automl_run = automl_experiment.submit(automl_config)
#RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

Submitting Auto ML experiment...




OSError: dlopen(/Users/mariuszrokita/anaconda/envs/automl/lib/python3.7/site-packages/lightgbm/lib_lightgbm.so, 6): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
  Referenced from: /Users/mariuszrokita/anaconda/envs/automl/lib/python3.7/site-packages/lightgbm/lib_lightgbm.so
  Reason: image not found

## Determine the Best Performing Model

When the experiment has completed, view the output in the widget (if you enabled `RunDetails` in the cell above), and click the run that produced the best result to see its details. Then click the link to view the experiment in the Azure portal, and review the overall experiment details before viewing the details for the individual run that produced the best result. There's lots of information here about the performance of the model generated.

Let's get the best run and the model that it produced.

In [None]:
best_run, fitted_model = automl_run.get_output()
print(best_run)
print(fitted_model)
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

Automated machine learning includes the option to try preprocessing the data, which is accomplished through the use of [Scikit-Learn transformation pipelines](https://scikit-learn.org/stable/modules/compose.html#combining-estimators) (not to be confused with Azure Machine Learning pipelines!). These produce models that include steps to transform the data before inferencing. You can view the steps in a model like this:

In [None]:
for step in fitted_model.named_steps:
    print(step)
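
To illustrate what it means for a model to carry its own preprocessing steps, here is a deliberately tiny stand-in written in plain Python. The classes `MinMaxScale`, `ThresholdClassifier`, and `TinyPipeline` are hypothetical and much simpler than Scikit-Learn's `Pipeline`; they only mimic the idea of named steps applied in order.

```python
class MinMaxScale:
    """Rescale values to the 0-1 range seen during fitting."""
    def fit(self, values):
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.high - self.low) or 1.0  # avoid dividing by zero
        return [(v - self.low) / span for v in values]

class ThresholdClassifier:
    """Predict 1 when the (already transformed) value reaches the threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        return [int(v >= self.threshold) for v in values]

class TinyPipeline:
    """Run every transformer in order, then hand the result to the final model."""
    def __init__(self, steps):
        self.named_steps = dict(steps)

    def fit(self, values):
        *transformers, _ = self.named_steps.values()
        for transformer in transformers:
            values = transformer.fit(values).transform(values)
        return self

    def predict(self, values):
        *transformers, model = self.named_steps.values()
        for transformer in transformers:
            values = transformer.transform(values)
        return model.predict(values)

pipeline = TinyPipeline([("scaler", MinMaxScale()),
                         ("classifier", ThresholdClassifier(0.5))])
pipeline.fit([80, 100, 120, 180])

for step in pipeline.named_steps:
    print(step)

predictions = pipeline.predict([90, 150])
print(predictions)  # [0, 1]
```

The caller passes raw values to `predict` and the scaling happens inside the pipeline, which is why a registered model of this kind can be used for inferencing without repeating the preprocessing by hand.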

Finally, having found the best performing model, you can register it.

In [None]:
from azureml.core import Model

# Register model
best_run.register_model(model_path='outputs/model.pkl', model_name='diabetes_model_automl',
                        tags={'Training context':'Auto ML'},
                        properties={'AUC': best_run_metrics['AUC_weighted'], 'Accuracy': best_run_metrics['accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print('\t', tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print('\t', prop_name, ':', prop)
    print('\n')

> **More Information**: For more information about automated machine learning, see the [Azure ML documentation](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train).