---
title: Use automated ML in ML pipelines 
titleSuffix: Azure Machine Learning
description: The AutoMLStep allows you to use automated machine learning in your pipelines.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: conceptual
ms.author: laobri
author: lobrien
manager: cgronlun
ms.date: 04/28/2020

---

# Use automated ML in an Azure Machine Learning pipeline in Python
[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-basic-enterprise-sku.md)]

Azure Machine Learning's automated ML capability helps you discover high-performing models without reimplementing every possible approach. Combined with Azure Machine Learning pipelines, it lets you create deployable workflows that quickly discover the algorithm that works best for your data. This article shows you how to efficiently join a data preparation step to an automated ML step, putting you on the road to MLOps and model lifecycle operationalization with pipelines.

## Prerequisites

* An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree) today.

* An Azure Machine Learning workspace. See [Create an Azure Machine Learning workspace](how-to-manage-workspace.md).  

* Basic familiarity with Azure's [automated machine learning](concept-automated-ml.md) and [machine learning pipelines](concept-ml-pipelines.md) facilities and SDK.

## Review automated ML's central classes

Automated ML in a pipeline is represented by an `AutoMLStep` object. The `AutoMLStep` class is a subclass of `PipelineStep`. A graph of `PipelineStep` objects defines a `Pipeline`.

There are several subclasses of `PipelineStep`. In addition to the `AutoMLStep`, this article will show a `PythonScriptStep` for data preparation and another for registering the model.

The preferred way to initially move data _into_ an ML pipeline is with `Dataset` objects. To move data _between_ steps, the preferred way is with `PipelineData` objects. To be used with `AutoMLStep`, the `PipelineData` object must be transformed into a `PipelineOutputTabularDataset` object. For more information, see [Input and output data from ML pipelines](how-to-move-data-in-out-of-pipelines.md).

The `AutoMLStep` is configured via an `AutoMLConfig` object. `AutoMLConfig` is a flexible class, as discussed in [Configure automated ML experiments in Python](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#configure-your-experiment-settings). 

A `Pipeline` runs in an `Experiment`. The pipeline `Run` has, for each step, a child `StepRun`. The outputs of the automated ML `StepRun` are the training metrics and highest-performing model.

To make things concrete, this article creates a simple pipeline for a classification task. The task is predicting Titanic survival, but we won't be discussing the data or task except in passing.

## Get started

### Retrieve initial dataset

Often, an ML workflow starts with pre-existing baseline data. This is a good scenario for a registered dataset. Datasets are visible across the workspace, support versioning, and can be interactively explored. There are many ways to create and populate a dataset, as discussed in [Create Azure Machine Learning datasets](how-to-create-register-datasets.md). Since we'll be using the Python SDK to create our pipeline, use the SDK to download baseline data and register it with the name 'titanic_ds'.

In [2]:
from azureml.core import Workspace, Dataset
from azureml.core.authentication import InteractiveLoginAuthentication
import os

ws = Workspace.from_config(auth=InteractiveLoginAuthentication(tenant_id=os.environ["AML_TENANT_ID"]))
#ws = Workspace.from_config()
if 'titanic_ds' not in ws.datasets:
    # create a TabularDataset from Titanic training data
    web_paths = ['https://dprepdata.blob.core.windows.net/demo/Titanic.csv',
                 'https://dprepdata.blob.core.windows.net/demo/Titanic2.csv']
    titanic_ds = Dataset.Tabular.from_delimited_files(path=web_paths)

    titanic_ds.register(workspace=ws,
                        name='titanic_ds',
                        description='Titanic baseline data',
                        create_new_version=True)

titanic_ds = Dataset.get_by_name(ws, 'titanic_ds')

The code first logs in to the Azure Machine Learning workspace defined in **config.json** (for an explanation, see [Tutorial: Get started creating your first ML experiment with the Python SDK](tutorial-1st-experiment-sdk-setup.md)). If a dataset named `'titanic_ds'` isn't already registered, the code creates one: it downloads CSV data from the web, uses the data to instantiate a `TabularDataset`, and registers the dataset with the workspace. Finally, the function `Dataset.get_by_name()` assigns the `Dataset` to `titanic_ds`.

### Configure your storage and compute target

Additional resources that the pipeline will need are storage and, generally, Azure Machine Learning compute resources.

In [10]:
from azureml.core import Datastore
from azureml.core.compute import AmlCompute, ComputeTarget

datastore = ws.get_default_datastore()

compute_name = 'cpu-compute3'
if compute_name not in ws.compute_targets:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                                min_nodes=0,
                                                                max_nodes=1)
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=20)

    # Show the result
    print(compute_target.get_status().serialize())

compute_target = ws.compute_targets[compute_name]

The intermediate data between the data preparation and the automated ML step can be stored in the workspace's default datastore, so we don't need to do more than call `get_default_datastore()` on the `Workspace` object. 

After that, the code checks if the AML compute target `'cpu-compute3'` already exists. If not, the code requests a small CPU-based compute target. If you plan to use automated ML's deep learning features (for instance, text featurization with DNN support), you should choose a compute with strong GPU support, as described in [GPU optimized virtual machine sizes](https://docs.microsoft.com/azure/virtual-machines/sizes-gpu).

The code blocks until the target is provisioned and then prints some details of the just-created compute target. Finally, the named compute target is retrieved from the workspace and assigned to `compute_target`. 

### Configure the training run

The next step is making sure that the remote training run has all the dependencies that are required by the training steps. Dependencies and the runtime context are set by creating and configuring a `RunConfiguration` object.

In [11]:
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

aml_run_config = RunConfiguration()
# Use just-specified compute target ("cpu-compute3")
aml_run_config.target = compute_target
aml_run_config.environment.python.user_managed_dependencies = False

# Add some packages relied on by data prep step
aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['pandas','scikit-learn'], 
    pip_packages=['azureml-sdk[automl,explain]', 'azureml-dataprep[fuse,pandas]'], 
    pin_sdk_version=False)

## Prepare data for automated machine learning

### Write the data preparation code

The baseline Titanic dataset consists of mixed numerical and text data, with some values missing. To prepare it for automated machine learning, the data preparation pipeline step will:

- Fill missing data with either random data or a category corresponding to "Unknown"
- Transform categorical data to integers
- Drop columns that we don't intend to use
- Write the transformed data to the `PipelineData` output path

In [12]:
%%writefile dataprep.py
from azureml.core import Run

import pandas as pd 
import numpy as np 
import pyarrow as pa
import pyarrow.parquet as pq
import argparse
import os

RANDOM_SEED = 42
# Seed NumPy so the random age fill is reproducible
np.random.seed(RANDOM_SEED)

def prepare_age(df):
    # Fill in missing Age values from distribution of present Age values 
    mean = df["Age"].mean()
    std = df["Age"].std()
    is_null = df["Age"].isnull().sum()
    # generate one random integer per missing value, within one standard deviation of the mean
    rand_age = np.random.randint(int(mean - std), int(mean + std), size=is_null)
    # fill NaN values in Age column with random values generated
    age_slice = df["Age"].copy()
    age_slice[np.isnan(age_slice)] = rand_age
    df["Age"] = age_slice
    df["Age"] = df["Age"].astype(int)
    
    # Quantize age into 5 classes
    df['Age_Group'] = pd.qcut(df['Age'],5, labels=False)
    df.drop(['Age'], axis=1, inplace=True)
    return df

def prepare_fare(df):
    df['Fare'].fillna(0, inplace=True)
    df['Fare_Group'] = pd.qcut(df['Fare'],5,labels=False)
    df.drop(['Fare'], axis=1, inplace=True)
    return df 

def prepare_genders(df):
    genders = {"male": 0, "female": 1, "unknown": 2}
    df['Sex'] = df['Sex'].map(genders)
    df['Sex'].fillna(2, inplace=True)
    df['Sex'] = df['Sex'].astype(int)
    return df

def prepare_embarked(df):
    df['Embarked'].replace('', 'U', inplace=True)
    df['Embarked'].fillna('U', inplace=True)
    ports = {"S": 0, "C": 1, "Q": 2, "U": 3}
    df['Embarked'] = df['Embarked'].map(ports)
    return df
    
parser = argparse.ArgumentParser()
parser.add_argument('--output_path', dest='output_path', required=True)
args = parser.parse_args()
    
titanic_ds = Run.get_context().input_datasets['titanic_ds']
df = titanic_ds.to_pandas_dataframe().drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1)
df = prepare_embarked(prepare_genders(prepare_fare(prepare_age(df))))

os.makedirs(os.path.dirname(args.output_path), exist_ok=True)
pq.write_table(pa.Table.from_pandas(df), args.output_path)

print(f"Wrote prepared data to {args.output_path}")



The above code snippet is a complete, but minimal, example of data preparation for the Titanic data. The snippet starts with a Jupyter "magic command" to output the code to a file. If you aren't using a Jupyter notebook, remove that line and create the file manually.

The various `prepare_` functions in the above snippet modify the relevant column in the input dataset. These functions work on the data once it has been changed into a Pandas `DataFrame` object. In each case, missing data is either filled with representative random data or categorical data indicating "Unknown." Text-based categorical data is mapped to integers. No-longer-needed columns are overwritten or dropped. 
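
The fill-and-encode pattern used by the `prepare_` functions can be illustrated without pandas. The following is a minimal sketch that uses only the Python standard library. The helper names `fill_missing_ages` and `encode_category` are hypothetical and aren't part of `dataprep.py`, and the sample values are made up for illustration. Unlike `np.random.randint()`, the standard library's `randint()` includes the upper bound.

```python
import random
import statistics

def fill_missing_ages(ages, seed=42):
    """Replace None entries with random integers within one standard
    deviation of the mean of the known ages (hypothetical helper)."""
    known = [a for a in ages if a is not None]
    mean = statistics.mean(known)
    std = statistics.stdev(known)
    low, high = int(mean - std), int(mean + std)
    rng = random.Random(seed)  # seeded so repeated runs give the same fill values
    return [a if a is not None else rng.randint(low, high) for a in ages]

def encode_category(value, mapping, unknown_code):
    """Map a text category to an integer, falling back to an 'unknown' code."""
    return mapping.get(value, unknown_code)

# Illustrative values only; None marks a missing entry
ages = [22, 38, None, 35, None, 54]
filled = fill_missing_ages(ages)

ports = {"S": 0, "C": 1, "Q": 2}
embarked = [encode_category(p, ports, unknown_code=3) for p in ["S", None, "Q", ""]]

print(filled)
print(embarked)
```

Known values pass through unchanged, while missing or unrecognized categories all collapse to a single "unknown" code.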

After the code defines the data preparation functions, it parses the input argument, which is the path to which we want to write our data. (This value will be determined by the `PipelineData` object discussed in the next step.) The code retrieves the registered `'titanic_ds'` `Dataset`, converts it to a Pandas `DataFrame`, and calls the various data preparation functions.

Since the `output_path` is fully qualified, the function `os.makedirs()` is used to prepare the directory structure. At this point, you could use `DataFrame.to_csv()` to write the output data, but Parquet files are more efficient. This efficiency would probably be irrelevant with such a small dataset, but using the **PyArrow** package's `from_pandas()` and `write_table()` functions takes only a few more keystrokes than `to_csv()`.
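
The argument parsing and directory preparation can be tried in isolation. The following is a minimal sketch, assuming a temporary directory stands in for the location that the pipeline would supply. The `parse_output_path` helper and the `titanic.parquet` file name are hypothetical.

```python
import argparse
import os
import tempfile

def parse_output_path(argv):
    """Parse the --output_path argument the same way dataprep.py does."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_path", dest="output_path", required=True)
    return parser.parse_args(argv).output_path

# A temporary directory stands in for the datastore location (assumption)
base_dir = tempfile.mkdtemp()
requested = os.path.join(base_dir, "prepped", "titanic.parquet")
output_path = parse_output_path(["--output_path", requested])

# Create the parent directory; exist_ok=True makes reruns harmless
os.makedirs(os.path.dirname(output_path), exist_ok=True)
print(os.path.isdir(os.path.dirname(output_path)))
```

Because the argument is declared with `required=True`, `argparse` exits with a usage message if the step is launched without `--output_path`.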

Parquet files are natively supported by the automated ML step discussed below, so no special processing is required to consume them. 

### Write the data preparation pipeline step (`PythonScriptStep`)

The data preparation code described above must be associated with a `PythonScriptStep` object to be used with a pipeline. The path to which the Parquet data-preparation output is written is generated by a `PipelineData` object. The resources prepared earlier, such as the `ComputeTarget`, the `RunConfiguration`, and the `'titanic_ds'` `Dataset`, are used to complete the specification.

In [13]:
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

prepped_data_path = PipelineData("titanic_train", datastore).as_dataset()

dataprep_step = PythonScriptStep(
    name="dataprep", 
    script_name="dataprep.py", 
    compute_target=compute_target, 
    runconfig=aml_run_config,
    arguments=["--output_path", prepped_data_path],
    inputs=[titanic_ds.as_named_input("titanic_ds")],
    outputs=[prepped_data_path],
    allow_reuse=True
)

The `prepped_data_path` object is of type `PipelineOutputFileDataset`. Notice that it's specified in both the `arguments` and `outputs` arguments. If you review the previous step, you'll see that within the data preparation code, the value of the argument `'--output_path'` is the file path to which the Parquet file was written. 

## Train with AutoMLStep

Configuring an automated ML pipeline step is done with the `AutoMLConfig` class. This flexible class is described in [Configure automated ML experiments in Python](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train). Data input and output are the only aspects of configuration that require special attention in an ML pipeline. Input and output for `AutoMLConfig` in pipelines are discussed in detail below. Beyond data, an advantage of ML pipelines is the ability to use different compute targets for different steps. You might choose to use a more powerful `ComputeTarget` only for the automated ML process. Doing so is as straightforward as assigning a more powerful `RunConfiguration` to the `AutoMLConfig` object's `run_configuration` parameter.

### Send data to `AutoMLStep`

In an ML pipeline, the input data must be a `Dataset` object. The highest-performing way is to provide the input data in the form of `PipelineOutputTabularDataset` objects. You create an object of that type by calling `parse_parquet_files()` or `parse_delimited_files()` on a `PipelineOutputFileDataset`, such as the `prepped_data_path` object.

In [14]:
# type(prepped_data_path) == PipelineOutputFileDataset
# type(prepped_data) == PipelineOutputTabularDataset
prepped_data = prepped_data_path.parse_parquet_files(file_extension=None)

The snippet above creates a high-performing `PipelineOutputTabularDataset` from the `PipelineOutputFileDataset` output of the data preparation step.

Another option is to use `Dataset` objects registered in the workspace:

In [16]:
#prepped_data = Dataset.get_by_name(ws, 'Data_prepared')

Comparing the two techniques:

| Technique | Benefits and drawbacks |
|-|-|
|`PipelineOutputTabularDataset`| Higher performance |
|| Natural route from `PipelineData` |
|| Data isn't persisted after pipeline run |
|| [Notebook showing `PipelineOutputTabularDataset` technique](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb) |
| Registered `Dataset` | Lower performance |
| | Can be generated in many ways |
| | Data persists and is visible throughout workspace |
| | [Notebook showing registered `Dataset` technique](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.ipynb) |

### Specify automated ML outputs

The outputs of the `AutoMLStep` are the final metric scores of the highest-performing model and that model itself. To use these outputs in further pipeline steps, prepare `PipelineData` objects to receive them.

In [17]:
from azureml.pipeline.core import TrainingOutput

metrics_data = PipelineData(name='metrics_data',
                           datastore=datastore,
                           pipeline_output_name='metrics_output',
                           training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='best_model_data',
                           datastore=datastore,
                           pipeline_output_name='model_output',
                           training_output=TrainingOutput(type='Model'))

The snippet above creates the two `PipelineData` objects for the metrics and model output. Each is named, assigned to the default datastore retrieved earlier, and associated with the particular `type` of `TrainingOutput` from the `AutoMLStep`. 

### Configure and create the automated ML pipeline step

Once the inputs and outputs are defined, it's time to create the `AutoMLConfig` and `AutoMLStep`. The details of the configuration will depend on your task, as described in [Configure automated ML experiments in Python](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train). For the Titanic survival classification task, the following snippet demonstrates a simple configuration.

In [18]:
from azureml.train.automl import AutoMLConfig
from azureml.pipeline.steps import AutoMLStep

# Change iterations to a reasonable number (50) to get better accuracy
automl_settings = {
    "iteration_timeout_minutes" : 10,
    "iterations" : 2,
    "experiment_timeout_hours" : 0.25,
    "primary_metric" : 'AUC_weighted'
}

automl_config = AutoMLConfig(task = 'classification',
                             path = '.',
                             debug_log = 'automated_ml_errors.log',
                             compute_target = compute_target,
                             run_configuration = aml_run_config,
                             featurization = 'auto',
                             training_data = prepped_data,
                             label_column_name = 'Survived',
                             **automl_settings)

train_step = AutoMLStep(name='AutoML_Classification',
    automl_config=automl_config,
    passthru_automl_config=False,
    outputs=[metrics_data,model_data],
    allow_reuse=True)

The snippet shows an idiom commonly used with `AutoMLConfig`. Arguments that are more fluid (hyperparameter-ish) are specified in a separate dictionary while the values less likely to change are specified directly in the `AutoMLConfig` constructor. In this case, the `automl_settings` specify a brief run: the run will stop after only 2 iterations or 15 minutes, whichever comes first.
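
The dictionary-unpacking idiom can be tried without the SDK. The following is a minimal sketch in which `make_config` is a hypothetical stand-in for a constructor that accepts extra keyword settings. It isn't the `AutoMLConfig` class and performs no validation.

```python
def make_config(task, label_column_name, **settings):
    """Hypothetical stand-in for a constructor that accepts extra keyword settings."""
    config = {"task": task, "label_column_name": label_column_name}
    config.update(settings)
    return config

# Values likely to be tuned between runs live in their own dictionary
automl_settings = {
    "iteration_timeout_minutes": 10,
    "iterations": 2,
    "experiment_timeout_hours": 0.25,
    "primary_metric": "AUC_weighted",
}

# The ** operator expands the dictionary into individual keyword arguments
config = make_config(task="classification",
                     label_column_name="Survived",
                     **automl_settings)

# 0.25 hours corresponds to the 15-minute limit mentioned above
timeout_minutes = config["experiment_timeout_hours"] * 60
print(config["iterations"], timeout_minutes)
```

Keeping the tunable values in one dictionary means a longer run only requires editing that dictionary, not the constructor call.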

The `automl_settings` dictionary is passed to the `AutoMLConfig` constructor as kwargs. The other parameters aren't complex:

- `task` is set to `classification` for this example. Other valid values are `regression` and `forecasting`
- `path` and `debug_log` describe the path to the project and a local file to which debug information will be written 
- `compute_target` is the previously defined `compute_target` that, in this example, is an inexpensive CPU-based machine. If you're using AutoML's Deep Learning facilities, you would want to change the compute target to be GPU-based
- `featurization` is set to `auto`. More details can be found in the [Data Featurization](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#data-featurization) section of the automated ML configuration document 
- `training_data` is set to the `PipelineOutputTabularDataset` objects made from the outputs of the data preparation step 
- `label_column_name` indicates which column we are interested in predicting 

The `AutoMLStep` itself takes the `AutoMLConfig` and has, as outputs, the `PipelineData` objects created to hold the metrics and model data. 

>[!Important]
> You must set `passthru_automl_config` to `False` if your `AutoMLStep` is using `PipelineOutputTabularDataset` objects for input.

In this example, the automated ML process will perform cross-validations on the `training_data`. You can control the number of cross-validations with the `n_cross_validations` argument. If you've already split your training data as part of your data preparation steps, you can set `validation_data` to its own `Dataset`.
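
To make the idea of cross-validation folds concrete, the following minimal sketch partitions row indices into folds. It's an illustration of the general k-fold technique only, not a description of how automated ML partitions data internally. The `k_fold_indices` helper is hypothetical.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_indices, validation_indices) pairs for simple k-fold cross-validation."""
    indices = list(range(n_rows))
    # Distribute rows as evenly as possible; the first (n_rows % n_folds) folds get one extra row
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_rows=10, n_folds=3))
for train, validation in folds:
    print(len(train), len(validation))
```

Each row appears in exactly one validation fold, so every row is used for both training and validation across the folds.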

You might occasionally see the use of `X` for data features and `y` for data labels. This technique is deprecated and you should use `training_data` for input.

## Register the model generated by automated ML 

The last step in a basic ML pipeline is registering the created model. Adding the model to the workspace's model registry makes it available in the portal and allows it to be versioned. To register the model, write another `PythonScriptStep` that takes the `model_data` output of the `AutoMLStep`.

### Write the code to register the model

A model is registered in a `Workspace`. You're probably familiar with using `Workspace.from_config()` to log on to your workspace on your local machine, but there's another way to get the workspace from within a running ML pipeline. The `Run.get_context()` function retrieves the active `Run`. This `run` object provides access to many important objects, including the `Workspace` used here.

In [19]:
%%writefile register_model.py
from azureml.core.model import Model
from azureml.core.run import Run, _OfflineRun
from azureml.core import Workspace
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", required=True)
parser.add_argument("--model_path", required=True)
args = parser.parse_args()

print(f"model_name : {args.model_name}")
print(f"model_path: {args.model_path}")

run = Run.get_context()
ws = Workspace.from_config() if type(run) == _OfflineRun else run.experiment.workspace

model = Model.register(workspace=ws,
                       model_path=args.model_path,
                       model_name=args.model_name)

print("Registered version {0} of model {1}".format(model.version, model.name))



### Write the PythonScriptStep code

The model-registering `PythonScriptStep` uses a `PipelineParameter` for one of its arguments. Pipeline parameters are arguments to pipelines that can be easily set at run-submission time. Once declared, they're passed as normal arguments.

In [20]:

from azureml.pipeline.core.graph import PipelineParameter

# The model name with which to register the trained model in the workspace.
model_name = PipelineParameter("model_name", default_value="TitanicSurvivalInitial")

register_step = PythonScriptStep(script_name="register_model.py",
                                       name="register_model",
                                       allow_reuse=False,
                                       arguments=["--model_name", model_name, "--model_path", model_data],
                                       inputs=[model_data],
                                       compute_target=compute_target,
                                       runconfig=aml_run_config)
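
The default-and-override behavior of a pipeline parameter can be sketched in plain Python. In the following minimal sketch, `ParameterSketch` and `resolve_arguments` are hypothetical stand-ins that only mimic the idea. They aren't the SDK's `PipelineParameter` class or its resolution logic, and the override value `"TitanicSurvivalRetrained"` is made up for illustration.

```python
class ParameterSketch:
    """Hypothetical stand-in for a named parameter with a default value."""
    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value

def resolve_arguments(arguments, overrides=None):
    """Replace each ParameterSketch in an argument list with its override or default."""
    overrides = overrides or {}
    return [
        overrides.get(arg.name, arg.default_value) if isinstance(arg, ParameterSketch) else arg
        for arg in arguments
    ]

model_name = ParameterSketch("model_name", default_value="TitanicSurvivalInitial")
arguments = ["--model_name", model_name]

# Without an override, the declared default is used
default_args = resolve_arguments(arguments)
# A value supplied at submission time takes precedence over the default
override_args = resolve_arguments(arguments, {"model_name": "TitanicSurvivalRetrained"})

print(default_args)
print(override_args)
```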

## Create and run your automated ML pipeline

Creating and running a pipeline that contains an `AutoMLStep` is no different from creating and running any other pipeline.

In [None]:
from azureml.pipeline.core import Pipeline
from azureml.core import Experiment

pipeline = Pipeline(ws, [dataprep_step, train_step, register_step])

experiment = Experiment(workspace=ws, name='titanic_automl')

run = experiment.submit(pipeline, show_output=True)
run.wait_for_completion()

Created step dataprep [535fdcb1][6acdb23b-bba1-442d-bc41-99a37d43a304], (This step will run and generate new outputs)
Created step AutoML_Classification [c29ef81e][371e2f74-f181-495f-8f12-2cee917ba054], (This step will run and generate new outputs)
Created step register_model [35832e40][aaee60ca-1f20-4d11-8588-1d04101ed48f], (This step will run and generate new outputs)
Submitted PipelineRun 3b6e6d42-a2bb-41bb-9a44-aba546e3acb4
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/titanic_automl/runs/3b6e6d42-a2bb-41bb-9a44-aba546e3acb4?wsid=/subscriptions/65a1016d-0f67-45d2-b838-b8f373d6d52e/resourcegroups/laobri-ml/workspaces/pipelines
PipelineRunId: 3b6e6d42-a2bb-41bb-9a44-aba546e3acb4
PipelineRun Status: NotStarted
PipelineRun Status: R

[91m
mkl-2019.4           | 204.1 MB  | ########4  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########4  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########4  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########4  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  85% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  86% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  86% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  86% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  86% [0m[91m
mkl-2019.4           | 204.1 MB  | ########5  |  

[91m
mkl-2019.4           | 204.1 MB  | #########6 |  96% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########6 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########7 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########7 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########7 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########7 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########7 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########7 |  97% [0m[91m
mkl-2019.4           | 204.1 MB  | #########7 |  

Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting azureml-sdk[automl,explain]
  Downloading azureml_sdk-1.4.0-py3-none-any.whl (4.6 kB)
Collecting azureml-dataprep[fuse,pandas]
  Downloading azureml_dataprep-1.4.6-py3-none-any.whl (26.7 MB)
Collecting azureml-train-automl-client~=1.4.0
  Downloading azureml_train_automl_client-1.4.0-py3-none-any.whl (81 kB)
Collecting azureml-train~=1.4.0
  Downloading azureml_train-1.4.0-py3-none-any.whl (3.2 kB)
Collecting azureml-pipeline~=1.4.0
  Downloading azureml_pipeline-1.4.0-py3-none-any.whl (3.7 kB)
Collecting azureml-core~=1.4.0
  Downloading azureml_core-1.4.0.post1-py3-none-any.whl (1.3 MB)
Collecting azureml-train-automl~=1.4.0; extra == "automl"
  Downloading azureml_train_automl-1.4.0-py3-none-any.whl (3.4 kB)
Collecting azureml-explain-model~=1.4.0; extra == "explain"
  Downloading azureml_explain_model-1.4.0-py3-none-any.whl (22 kB)
Collecting azureml-dataprep-native<15.0.0,>=14.1.0
  Down

  Downloading scikit_learn-0.20.3-cp36-cp36m-manylinux1_x86_64.whl (5.4 MB)
Collecting onnxmltools==1.4.1
  Downloading onnxmltools-1.4.1-py2.py3-none-any.whl (371 kB)
Collecting onnxruntime==1.0.0
  Downloading onnxruntime-1.0.0-cp36-cp36m-manylinux1_x86_64.whl (3.4 MB)
Collecting lightgbm<=2.3.0,>=2.0.11
  Downloading lightgbm-2.3.0-py2.py3-none-manylinux1_x86_64.whl (1.3 MB)
Collecting onnx<=1.6.0,>=1.5.0
  Downloading onnx-1.6.0-cp36-cp36m-manylinux1_x86_64.whl (4.8 MB)
Collecting patsy>=0.5.1
  Downloading patsy-0.5.1-py2.py3-none-any.whl (231 kB)
Collecting wheel==0.30.0
  Downloading wheel-0.30.0-py2.py3-none-any.whl (49 kB)
Collecting sklearn-pandas<=1.7.0,>=1.4.0
  Downloading sklearn_pandas-1.7.0-py2.py3-none-any.whl (10 kB)
Collecting onnxconverter-common<=1.6.0,>=1.4.2
  Downloading onnxconverter_common-1.6.0-py2.py3-none-any.whl (43 kB)
Collecting skl2onnx==1.4.9
  Downloading skl2onnx-1.4.9-py2.py3-none-any.whl (114 kB)
Collecting azureml-defaults~=1.4.0
  Downloading azu

  Created wheel for pyrsistent: filename=pyrsistent-0.16.0-cp36-cp36m-linux_x86_64.whl size=113431 sha256=733135b68b1cf3b7a6c8833ff12e2694b791226351c0418e5d51ab2a57b5627b
  Stored in directory: /root/.cache/pip/wheels/d1/8a/1c/32ab9017418a2c64e4fbaf503c08648bed2f8eb311b869a464
Successfully built fusepy smart-open dill psutil py-cpuinfo json-logging-py JsonForm JsonSir shap fire liac-arff PyYAML termcolor pyrsistent
[91mERROR: azureml-automl-runtime 1.4.0.post1 has requirement numpy<=1.16.2,>=1.16.0, but you'll have numpy 1.18.1 which is incompatible.
[0m[91mERROR: azureml-automl-runtime 1.4.0.post1 has requirement pandas<=0.23.4,>=0.21.0, but you'll have pandas 1.0.3 which is incompatible.
ERROR: azureml-train-automl-runtime 1.4.0.post1 has requirement numpy<=1.16.2,>=1.16.0, but you'll have numpy 1.18.1 which is incompatible.
[0m[91mERROR: azureml-train-automl-runtime 1.4.0.post1 has requirement pandas<=0.23.4,>=0.21.0, but you'll have pandas 1.0.3 which is incompatible.
[0mInst

6ef1a8ae63b7: Pushed

340dc32eb998: Pushed
0e259b09e5f4: Pushed
f2608f66a0e3: Pushed
ccdb13a20bf2: Pushed
85389f9ead9e: Pushed
7f083f9454c0: Pushed
9513cdf4e497: Pushed

df18b66efaa6: Pushed
29f36b5893dc: Pushed

2521d52a0016: Pushed
latest: digest: sha256:5b004cd97a92a3162bda4fa8ed9e229951c729356e3cb18932f1056d7c5b76b9 size: 3883
2020/04/30 19:40:01 Successfully pushed image: pipelines5112a485.azurecr.io/azureml/azureml_9e6164fd71967baaa1312ba8f229b985:latest
2020/04/30 19:40:01 Step ID: acb_step_0 marked as successful (elapsed time in seconds: 368.984865)
2020/04/30 19:40:01 Populating digests for step ID: acb_step_0...
2020/04/30 19:40:07 Successfully populated digests for step ID: acb_step_0
2020/04/30 19:40:07 Step ID: acb_step_1 marked as successful (elapsed time in seconds: 210.356309)
2020/04/30 19:40:07 The following dependencies were found:
2020/04/30 19:40:07 
- image:
    registry: pipelines5112a485.azurecr.io
    repository: azureml/azureml_9e6164fd71967baaa1312ba8f229b985




StepRunId: 82f527a1-02b0-490c-8bbe-19541159a73c
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/titanic_automl/runs/82f527a1-02b0-490c-8bbe-19541159a73c?wsid=/subscriptions/65a1016d-0f67-45d2-b838-b8f373d6d52e/resourcegroups/laobri-ml/workspaces/pipelines
StepRun( AutoML_Classification ) Status: Running

StepRun(AutoML_Classification) Execution Summary
StepRun( AutoML_Classification ) Status: Finished
{'runId': '82f527a1-02b0-490c-8bbe-19541159a73c', 'target': 'cpu-compute3', 'status': 'Completed', 'startTimeUtc': '2020-04-30T20:12:24.249965Z', 'endTimeUtc': '2020-04-30T20:16:42.292654Z', 'properties': {'azureml.runsource': 'azureml.StepRun', 'ContentSnapshotId': '8aa37e12-63a8-4e4f-a3cd-6233ed3c9285', 'StepType': 'AutoMLStep', 'azureml.pipelinerunid': '3b6e6d42-a2bb-41bb-9a44-aba546e3acb4', 'num_iterations': '2', 'training_type': 'TrainFull', 'acquisition_function': 'EI', 'metrics': 'accuracy', 'primary_metric': 'AUC_weighted', 'train_split': '0', 'MaxTimeSe




StepRunId: e575afeb-aff4-4b37-8b7d-6bd8fbbdcc5f
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/titanic_automl/runs/e575afeb-aff4-4b37-8b7d-6bd8fbbdcc5f?wsid=/subscriptions/65a1016d-0f67-45d2-b838-b8f373d6d52e/resourcegroups/laobri-ml/workspaces/pipelines
StepRun( register_model ) Status: Running

Streaming azureml-logs/20_image_build_log.txt
2020/04/30 20:17:10 Downloading source code...
2020/04/30 20:17:11 Finished downloading source code
2020/04/30 20:17:12 Creating Docker network: acb_default_network, driver: 'bridge'
2020/04/30 20:17:13 Successfully set up Docker network: acb_default_network
2020/04/30 20:17:13 Setting up Docker configuration...
2020/04/30 20:17:14 Successfully set up Docker configuration
2020/04/30 20:17:14 Logging in to registry: pipelines5112a485.azurecr.io
2020/04/30 20:17:15 Successfully logged into pipelines5112a485.azurecr.io
2020/04/30 20:17:15 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'ac


  Downloading tqdm-4.45.0-py2.py3-none-any.whl (60 kB)
Collecting joblib>=0.11; extra == "required"
  Downloading joblib-0.14.1-py2.py3-none-any.whl (294 kB)
Collecting pyrsistent>=0.14.0
  Downloading pyrsistent-0.16.0.tar.gz (108 kB)
Collecting attrs>=17.4.0
  Downloading attrs-19.3.0-py2.py3-none-any.whl (39 kB)
Collecting termcolor
  Downloading termcolor-1.1.0.tar.gz (3.9 kB)
Collecting docutils<0.16,>=0.10
  Downloading docutils-0.15.2-py3-none-any.whl (547 kB)
Collecting MarkupSafe>=0.23
  Downloading MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl (27 kB)
Building wheels for collected packages: fusepy, wrapt, dill, smart-open, psutil, py-cpuinfo, JsonSir, JsonForm, json-logging-py, shap, PyYAML, fire, liac-arff, pyrsistent, termcolor
  Building wheel for fusepy (setup.py): started
  Building wheel for fusepy (setup.py): finished with status 'done'
  Created wheel for fusepy: filename=fusepy-3.0.1-py3-none-any.whl size=10503 sha256=368d988ca3995b1186af75a6eafa59861ff1b800360c3

    Found existing installation: wheel 0.34.2
    Uninstalling wheel-0.34.2:
      Successfully uninstalled wheel-0.34.2
  Attempting uninstall: scipy
    Found existing installation: scipy 1.4.1
    Uninstalling scipy-1.4.1:
      Successfully uninstalled scipy-1.4.1

Successfully installed Jinja2-2.11.2 JsonForm-0.0.2 JsonSir-0.0.2 MarkupSafe-1.1.1 PyJWT-1.7.1 PyYAML-5.3.1 SecretStorage-3.1.2 adal-1.2.2 applicationinsights-0.11.9 attrs-19.3.0 azure-common-1.1.25 azure-core-1.4.0 azure-graphrbac-0.61.1 azure-identity-1.2.0 azure-mgmt-authorization-0.60.0 azure-mgmt-containerregistry-2.8.0 azure-mgmt-keyvault-2.2.0 azure-mgmt-resource-9.0.0 azure-mgmt-storage-9.0.0 azureml-automl-core-1.4.0 azureml-automl-runtime-1.4.0.post1 azureml-core-1.4.0.post1 azureml-dataprep-1.4.6 azureml-dataprep-native-14.1.0 azureml-defaults-1.4.0 azureml-explain-model-1.4.0 azureml-interpret-1.4.0 azureml-model-management-sdk-1.0.1b6.post1 azureml-pipeline-1.4.0 azureml-pipeline-core-1.4.0 azureml-pipeline-

The code above combines the data preparation, automated ML, and model-registering steps into a `Pipeline` object. It then creates an `Experiment` object; the `Experiment` constructor retrieves the named experiment if it exists or creates it if necessary. The code submits the `Pipeline` to the `Experiment`, creating a `Run` object that runs the pipeline asynchronously. The `wait_for_completion()` method blocks until the run completes.

### Download the results of an automated ML run 

While the `run` object in the code above is from the actively running context, you can also retrieve completed `Run` objects from the `Workspace` by way of an `Experiment` object.

The workspace contains a complete record of all your experiments and runs. You can either use the portal to find and download the outputs of experiments or use code.

In [None]:
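# Run on local machine
from azureml.core import Workspace

# Connect to the workspace described by the local config.json
ws = Workspace.from_config()
experiment = ws.experiments['titanic_automl']
# Replace the placeholder with the ID of the pipeline run you want
run = next(run for run in experiment.get_runs() if run.id == 'aaaaaaaa-bbbb-cccc-dddd-0123456789AB')
automl_run = next(r for r in run.get_children() if r.name == 'AutoML_Classification')
outputs = automl_run.get_outputs()
metrics = outputs['default_metrics_AutoML_Classification']
model = outputs['default_model_AutoML_Classification']

# Download the metrics and model files to the current directory
metrics.get_port_data_reference().download('.')
model.get_port_data_reference().download('.')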

The above snippet would run on your local machine. First, it logs on to the workspace. It retrieves the `Experiment` named `titanic_automl` and, from that `Experiment`, the `Run` in which you're interested. Replace the placeholder value compared to `run.id` with the ID of that run.

Each `Run` object contains `StepRun` objects, each of which holds information about an individual pipeline step run. The code searches `run` for the `StepRun` object that corresponds to the `AutoMLStep`. The outputs are retrieved using their default names, which are available even if you don't pass `PipelineData` objects to the `outputs` parameter of the `AutoMLStep`.
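Both lookups in the snippet use `next()` over a generator expression, which raises a bare `StopIteration` when nothing matches (for example, when the run ID is mistyped). The following is a minimal sketch of a lookup that fails with a readable message instead. The helper name `find_first` is hypothetical and not part of the Azure Machine Learning SDK, and the `SimpleNamespace` objects are stand-ins for real run objects, assuming only that they expose the `id` and `name` attributes used in the snippet.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel so a missing attribute never equals a real value


def find_first(items, description, **attrs):
    """Return the first item whose attributes equal every value in attrs.

    Raises LookupError with a readable message instead of the bare
    StopIteration that next() raises on an exhausted generator.
    """
    for item in items:
        if all(getattr(item, key, _MISSING) == value for key, value in attrs.items()):
            return item
    raise LookupError(f"No {description} found matching {attrs}")


# Stand-ins for the objects returned by get_runs() and get_children().
runs = [SimpleNamespace(id="run-1"), SimpleNamespace(id="run-2")]
children = [
    SimpleNamespace(name="register_model"),
    SimpleNamespace(name="AutoML_Classification"),
]

run = find_first(runs, "pipeline run", id="run-2")
automl_run = find_first(children, "step run", name="AutoML_Classification")
```

With real SDK objects, you would pass `experiment.get_runs()` and `run.get_children()` in place of the stand-in lists.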

Finally, the actual metrics and model are downloaded to your local machine for further processing.
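As one possible next step, the sketch below locates a downloaded metrics file and parses it. It rests on two assumptions that you should check against the files actually written to your machine: that the metrics file is named `metrics_data` and that it contains JSON. The function name `load_downloaded_metrics` is hypothetical and not part of the SDK.

```python
import json
from pathlib import Path


def load_downloaded_metrics(download_dir=".", filename="metrics_data"):
    """Find the first file called `filename` under download_dir and parse it as JSON.

    Both the file name and the JSON format are assumptions about what the
    download produced; adjust them to match the files you actually see.
    """
    matches = sorted(p for p in Path(download_dir).rglob(filename) if p.is_file())
    if not matches:
        raise FileNotFoundError(f"No file named {filename!r} under {download_dir!r}")
    with matches[0].open(encoding="utf-8") as f:
        return json.load(f)
```

Searching recursively avoids hard-coding the subdirectory that the download creates, at the cost of picking the first match if several runs have been downloaded to the same directory.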


## Next steps

- Run this Jupyter notebook showing a [complete example of automated ML in a pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb) that uses regression to predict taxi fares
- [Create automated ML experiments without writing code](how-to-use-automated-ml-for-ml-models.md)
- Explore a variety of [Jupyter notebooks demonstrating automated ML](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning)
- Read about integrating your pipeline into [End-to-end MLOps](https://docs.microsoft.com/azure/machine-learning/concept-model-management-and-deployment#automate-the-ml-lifecycle) or investigate the [MLOps GitHub repository](https://github.com/Microsoft/MLOpspython)