# Using Automated Machine Learning
There are many kinds of machine learning algorithms that you can use to train a model, and sometimes it's not easy to determine the most effective algorithm for your particular data and prediction requirements. Additionally, you can significantly affect the predictive performance of a model by preprocessing the training data, using techniques such as normalization, missing feature imputation, and others. In your quest to find the best model for your requirements, you may need to try many combinations of algorithms and preprocessing transformations, which takes a lot of time and compute resources.
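
To get a feel for how quickly those combinations add up, here is a minimal sketch that enumerates a small search space. The algorithm, scaler, and imputer names are hypothetical placeholders chosen for illustration; they are not the set that Azure Machine Learning actually tries.

```python
from itertools import product

# Hypothetical candidate lists, chosen only for illustration.
algorithms = ["logistic_regression", "random_forest", "gradient_boosting", "svm"]
scalers = ["none", "standard", "min_max"]
imputers = ["mean", "median", "most_frequent"]

# Every (algorithm, scaler, imputer) triple is a separate model to train and evaluate.
combinations = list(product(algorithms, scalers, imputers))

print(len(combinations))  # 4 * 3 * 3 = 36
print(combinations[0])
```

Even this small grid requires 36 training runs before any hyperparameters are tuned, which is the kind of work automated machine learning takes off your hands.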

Azure Machine Learning enables you to automate the comparison of models trained using different algorithms and preprocessing options. You can use the visual interface in Azure Machine Learning studio or the SDK to leverage this capability. The SDK gives you greater control over the settings for the automated machine learning experiment, but the visual interface is easier to use. In this notebook, you'll explore automated machine learning using the SDK.

## Imports

In [None]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

from azureml.core import Workspace
from azureml.core.experiment import Experiment
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, recall_score, f1_score

## Initialize Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK.

Note: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [None]:
ws = Workspace.from_config("../notebooks-settings/config.json")
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

## Prepare Data for Automated Machine Learning
You don't need to create a training script for automated machine learning, but you do need to load the training data. In this case, you'll create a dataset containing details of heart-disease patients (just as you did in previous labs), and then split this into two datasets: one for training, and another for model validation.

In [None]:
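SPLIT_SEED = 42
SPLIT_PERCENTAGE = 0.7  # fraction of rows used for training
DATASET_PATH = '../../dataset/uci_dataset.csv'

df = pd.read_csv(DATASET_PATH)
# Train on 70% of the rows and hold out the remaining 30% for validation
train_ds, test_ds = train_test_split(
    df, train_size=SPLIT_PERCENTAGE, random_state=SPLIT_SEED)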
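
The two things that control a split like this are the fraction and the seed. The following is a minimal pure-Python sketch of that idea, not scikit-learn's implementation: `split_rows` is a hypothetical helper, and the list of integers stands in for patient records.

```python
import random

def split_rows(rows, train_fraction, seed):
    """Shuffle a copy of rows with a fixed seed, then cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 patient records
train_rows, test_rows = split_rows(rows, train_fraction=0.7, seed=42)

print(len(train_rows), len(test_rows))  # 70 30

# The same seed reproduces exactly the same partition.
repeat_train, repeat_test = split_rows(rows, train_fraction=0.7, seed=42)
print(repeat_train == train_rows and repeat_test == test_rows)  # True
```

Fixing the seed matters when you compare runs: any difference in metrics then comes from the model, not from a different partition of the data.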

## Configure Automated Machine Learning

Now you're ready to configure the automated machine learning experiment. To do this, you'll define a set of configuration settings that specifies the task, the training and validation data, how many combinations to try, which metric to use when evaluating models, and so on.

Note: In this example, you'll run the automated machine learning experiment on local compute to avoid waiting for a cluster to start. This will cause each iteration (child-run) to run serially rather than in parallel. For this reason, we're restricting the experiment to 3 iterations to reduce the amount of time taken. In reality, you'd likely try many more iterations on a compute cluster.

In [None]:
automl_config = AutoMLConfig(name='Automated ML Experiment',
                                 task='classification',
                                 training_data=train_ds,
                                 validation_data=test_ds,
                                 label_column_name='target',
                                 iterations=3,
                                 primary_metric='AUC_weighted',
                                 max_concurrent_iterations=3,
                                 featurization='auto',
                                 model_explainability=True
                                 )

## Run an Automated Machine Learning Experiment
OK, you're ready to go. Let's run the automated machine learning experiment.

In [None]:
automl_experiment = Experiment(ws, 'automl-classification')
automl_run = automl_experiment.submit(automl_config)
automl_run.wait_for_completion(show_output=True)
RunDetails(automl_run).show()

## Determine the Best Performing Model
When the experiment has completed, view the output in the widget, and click the run that produced the best result to see its details. Then click the link to open the experiment in the Azure portal, review the overall experiment details, and open the individual run that produced the best result. The portal shows a lot of information about the performance of the generated model.

Let's get the best run and the model that it produced.

In [None]:
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()

In [None]:
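def test_model(fitted_model, test_ds):
    """Score the fitted model on held-out data (assumes a binary 'target' column)."""
    y = test_ds['target']
    X = test_ds.drop(['target'], axis=1)

    # Hard class labels for the threshold-based metrics
    y_pred = fitted_model.predict(X)
    # AUC is a ranking metric, so score it with the positive-class probability
    y_score = fitted_model.predict_proba(X)[:, 1]

    return {
        'AUC': roc_auc_score(y, y_score),
        'Accuracy': accuracy_score(y, y_pred),
        'Recall': recall_score(y, y_pred),
        'F1': f1_score(y, y_pred),
    }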

In [None]:
local_metrics = test_model(fitted_model, test_ds)

In [None]:
print("AutoML metrics")
for metric_name in best_run_metrics:
    print(f'{metric_name}: {best_run_metrics[metric_name]}')

print("\nLocal test metrics")
for local_metric in local_metrics:
    print(f'{local_metric}: {local_metrics[local_metric]}')
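
When you compare the AUC values, keep in mind that AUC measures how well a model ranks positive cases above negative ones, so it depends on whether it is given probabilities or hard class labels. The sketch below illustrates this with a small pure-Python pairwise AUC. The `pairwise_auc` helper and the example probabilities are hypothetical and are not part of scikit-learn or the Azure ML SDK.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

y_true = [0, 0, 0, 1, 1, 1]
# Hypothetical predicted probabilities for the positive class
probabilities = [0.1, 0.3, 0.6, 0.55, 0.7, 0.9]
# Thresholding at 0.5 discards the ordering within each predicted class
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

auc_from_probabilities = pairwise_auc(y_true, probabilities)
auc_from_labels = pairwise_auc(y_true, hard_labels)

print(round(auc_from_probabilities, 3))  # 0.889
print(round(auc_from_labels, 3))         # 0.833
```

The same predictions give a lower AUC once they are thresholded, because every pair of examples with the same predicted label becomes a tie. This is one reason a locally computed AUC can differ from the value AutoML reports.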