# Automate machine learning model selection with Azure Machine Learning

## Intro
*Automated machine learning* enables you to try multiple algorithms and preprocessing transformations with your data. Combined with scalable cloud-based compute, this makes it possible to find the best-performing model for your data without the time-consuming manual trial and error that would otherwise be required.

Azure ML supports automated machine learning through a visual interface in Azure Machine Learning studio, for *Enterprise* edition workspaces only. You can use the Azure ML SDK to run automated machine learning experiments in either *Basic* or *Enterprise* edition workspaces.

## Learning objectives

In this module, you will learn how to:

- Use Azure Machine Learning's automated machine learning capabilities to determine the best performing algorithm for your data.
- Use automated machine learning to preprocess data for training.
- Run an automated machine learning experiment.

# Automated machine learning tasks and algorithms

[Find the extensive list of Classification, Regression and Time Series Algorithms here](https://docs.microsoft.com/en-us/learn/modules/automate-model-selection-with-azure-automl/2-automl-algorithms)

## Restricting Algorithm Selection
By default, automated machine learning will randomly select from the full range of algorithms for the specified task. You can choose to block individual algorithms from being selected, which can be useful if you know that your data is not suited to a particular type of algorithm, or if you have to comply with a policy that restricts the types of machine learning algorithms you can use in your organization.

More detail can be found in the [documentation](https://aka.ms/AA70rrr).
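As a rough sketch of how blocking might look with the SDK, the settings below are collected in a plain dictionary and then unpacked into `AutoMLConfig`. The `blocked_models` parameter and the model names `KNN` and `LinearSVM` are assumptions based on the v1 SDK and should be checked against the supported-model list for your SDK version. `train_dataset` is a placeholder for a dataset defined elsewhere.

```python
# Hypothetical settings dictionary; verify parameter and model names
# against the documentation for the SDK version you are using.
automl_settings = {
    "task": "classification",
    "primary_metric": "AUC_weighted",
    "blocked_models": ["KNN", "LinearSVM"],  # algorithms to exclude from selection
}

# With the SDK installed, the settings would be unpacked into the config:
#   from azureml.train.automl import AutoMLConfig
#   automl_config = AutoMLConfig(training_data=train_dataset,
#                                label_column_name="Label",
#                                **automl_settings)
print(automl_settings["blocked_models"])
```

Keeping the settings in a dictionary makes it easy to review or version the restrictions separately from the code that submits the experiment.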

# Preprocessing and featurization

As well as trying a selection of algorithms, automated machine learning can apply preprocessing transformations to your data, which can improve the performance of the model.

## Scaling and Normalization
Automated machine learning applies scaling and normalization to numeric data automatically, helping prevent any large-scale features from dominating training. During an automated machine learning experiment, multiple scaling or normalization techniques will be applied.
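To illustrate why scaling matters, here is a minimal pure-Python sketch of two common techniques, min-max scaling and standardization. It is not the service's implementation, and the `incomes` and `ages` values are made up for the example.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

# Two features on very different scales (illustrative values only).
incomes = [20000.0, 40000.0, 60000.0, 100000.0]
ages = [20.0, 30.0, 40.0, 60.0]

scaled_incomes = min_max_scale(incomes)
scaled_ages = min_max_scale(ages)
standardized_ages = standardize(ages)

print(scaled_incomes)
print(scaled_ages)
```

After scaling, both features occupy the same range, so neither dominates training purely because of its units.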

## Optional Featurization
You can choose to have automated machine learning apply preprocessing transformations such as:

- Missing value imputation to eliminate nulls in the training dataset.
- Categorical encoding to convert categorical features to numeric indicators.
- Dropping high-cardinality features, such as record IDs.
- Feature engineering (for example, deriving individual date parts from DateTime features).
- Others...
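The transformations listed above can be sketched in plain Python. This is an illustration of the ideas only, not how automated machine learning implements them. The `rows` data and the `featurize` helper are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Illustrative records: "id" is unique per row and one "weight" is missing.
rows = [
    {"id": "r1", "color": "red", "weight": 2.0, "sold": "2021-03-15"},
    {"id": "r2", "color": "blue", "weight": None, "sold": "2021-07-01"},
    {"id": "r3", "color": "red", "weight": 4.0, "sold": "2022-01-20"},
]

def featurize(rows):
    # Missing value imputation: fill gaps with the mean of the observed values.
    observed = [r["weight"] for r in rows if r["weight"] is not None]
    fill = mean(observed)
    categories = sorted({r["color"] for r in rows})
    # High-cardinality check: an ID that is unique per row carries no general signal.
    id_is_unique = len({r["id"] for r in rows}) == len(rows)
    features = []
    for r in rows:
        out = {"weight": r["weight"] if r["weight"] is not None else fill}
        # Categorical encoding: one 0/1 indicator per category.
        for c in categories:
            out[f"color_{c}"] = 1 if r["color"] == c else 0
        # Feature engineering: derive date parts from the date string.
        sold = datetime.strptime(r["sold"], "%Y-%m-%d")
        out["sold_year"] = sold.year
        out["sold_month"] = sold.month
        # Keep the ID only if it is not unique per row.
        if not id_is_unique:
            out["id"] = r["id"]
        features.append(out)
    return features

features = featurize(rows)
print(features[1])
```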

For more information about the preprocessing support in automated machine learning, see [What is automated machine learning](https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml#preprocessing) in the documentation.

# Configuring an AutoML Experiment
Use the `AutoMLConfig` class as shown:

```python
from azureml.core.runconfig import RunConfiguration
from azureml.train.automl import AutoMLConfig

# aml_compute, train_dataset, and test_dataset are assumed to be defined earlier.
automl_run_config = RunConfiguration(framework='python')
automl_config = AutoMLConfig(name='Automated ML Experiment',
                             task='classification',
                             primary_metric='AUC_weighted',
                             compute_target=aml_compute,
                             training_data=train_dataset,
                             validation_data=test_dataset,
                             label_column_name='Label',
                             featurization='auto',
                             iterations=12,
                             max_concurrent_iterations=4)
```

## Specifying Data for Training
Automated machine learning is designed to enable you to simply bring your data, and have Azure Machine Learning figure out how best to train a model from it.

When using the Automated Machine Learning user interface in Azure Machine Learning studio, you can create or select an Azure Machine Learning dataset to be used as the input for your automated machine learning experiment.

When using the SDK to run an automated machine learning experiment, you can submit the data in the following ways:

- Specify a dataset or dataframe of training data that includes features and the label to be predicted.
- Optionally, specify a second validation dataset or dataframe that will be used to validate the trained model. If this is not provided, Azure Machine Learning will apply cross-validation using the training data.

Alternatively:

- Specify a dataset, dataframe, or numpy array of X values containing the training features, with a corresponding y array of label values.
- Optionally, specify `X_valid` and `y_valid` datasets, dataframes, or numpy arrays to be used for validation.
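When no validation data is supplied, cross-validation is applied to the training data. The sketch below shows the general idea of k-fold splitting in plain Python. It is an illustration, not the service's own splitting logic, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs so each row is validated exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder rows across the first folds.
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size

folds = list(k_fold_indices(n_rows=10, n_folds=3))
for train_idx, valid_idx in folds:
    print(len(train_idx), "training rows,", len(valid_idx), "validation rows")
```

Each model is trained once per fold and scored on the held-out rows, so every row contributes to validation without a separate validation dataset.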

## Specifying the Primary Metric
One of the most important settings you must specify is the `primary_metric`. This is the target performance metric for which the optimal model will be determined. Azure Machine Learning supports a set of named metrics for each type of task. To retrieve the list of metrics available for a particular task type, you can use the `get_primary_metrics` function as shown here:

```python
from azureml.train.automl.utilities import get_primary_metrics

get_primary_metrics('classification')
```

Find a full list of primary metrics and their definitions in [Understand Automated machine learning results](https://aka.ms/AA70rrw).
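To give a feel for what a metric such as AUC measures, here is a minimal sketch for the binary case: the fraction of positive/negative pairs in which the positive example receives the higher score. This is an illustration only and not the service's `AUC_weighted` implementation. The `labels` and `scores` values are made up.

```python
def binary_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = binary_auc(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```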

## Submitting an Automated Machine Learning Experiment
You can submit an automated machine learning experiment like any other SDK-based experiment:

```python
from azureml.core.experiment import Experiment

# ws is assumed to be an existing Workspace object.
automl_experiment = Experiment(ws, 'automl_experiment')
automl_run = automl_experiment.submit(automl_config)
```

## Retrieving the Best Run and its Model
You can easily identify the best run in Azure Machine Learning studio, and download or deploy the model it generated. To accomplish this programmatically with the SDK, you can use code like the following example:

```python
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)
```
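Conceptually, the best run is the child run whose primary metric is optimal. The sketch below shows that selection in plain Python. The `child_runs` records and the `best_by_metric` helper are hypothetical stand-ins, not SDK objects.

```python
# Hypothetical metrics for three child runs (illustrative values only).
child_runs = [
    {"run_id": "run_0", "algorithm": "LogisticRegression", "AUC_weighted": 0.81},
    {"run_id": "run_1", "algorithm": "RandomForest", "AUC_weighted": 0.86},
    {"run_id": "run_2", "algorithm": "LightGBM", "AUC_weighted": 0.89},
]

def best_by_metric(runs, metric, higher_is_better=True):
    """Return the run with the optimal value of the given metric."""
    pick = max if higher_is_better else min
    return pick(runs, key=lambda r: r[metric])

best = best_by_metric(child_runs, "AUC_weighted")
print(best["run_id"], best["algorithm"], best["AUC_weighted"])
```

For error-style metrics, where lower values are better, the same helper can be called with `higher_is_better=False`.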

## Exploring Preprocessing Steps
Automated machine learning uses scikit-learn pipelines to encapsulate preprocessing steps with the model. To view the steps in the fitted model obtained from the best run above, use code like this:

```python
# named_steps maps each step name to its transformer or estimator.
for step_name in fitted_model.named_steps:
    print(step_name)
```