<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Using Pipelines and AutoML for Predicting Sentence Similarity

This notebook demonstrates how to use AzureML pipelines and AutoML to streamline the creation of a machine learning workflow for predicting sentence similarity. The pipeline contains two steps:   
1. PythonScriptStep: uses a popular sentence embedding model from Google, Universal Sentence Encoder, to convert our sentence data into numerical data
2. AutoMLStep: demonstrates how to use AutoML to automate model selection for predicting sentence similarity scores (regression)

An AmlCompute target runs the pipeline, Azure Datastores store our data, and logging is enabled for the experiment.

### What are AzureML Pipelines?

AzureML Pipelines "define reusable machine learning workflows that can be used as a template for your machine learning scenarios" (https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Pipelines allow you to optimize your workflow and spend time on machine learning rather than infrastructure. A Pipeline is defined by a series of steps; the following steps are available: AdlaStep, AutoMLStep, AzureBatchStep, DataTransferStep, DatabricksStep, EstimatorStep, HyperDriveStep, ModuleStep, MpiStep, and PythonScriptStep (see [here](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/?view=azure-ml-py) for details of each step). When the pipeline is run, cached results are used for all steps that have not changed, optimizing the run time. Data sources and intermediate data can be used across multiple steps in a pipeline, saving time and resources. Below we see an example of an AzureML pipeline.

![](pipelines.png)
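To make the idea of reusing cached step results concrete, here is a minimal, purely illustrative sketch. It is not how AzureML implements reuse: `fingerprint`, `cache`, and `run_step` are hypothetical names, and the only point is that a step is re-executed when its script text or its inputs change, and skipped otherwise.

```python
import hashlib

def fingerprint(script_source, inputs):
    """Hash a step's script text together with its input values."""
    digest = hashlib.sha256()
    digest.update(script_source.encode("utf-8"))
    for value in inputs:
        digest.update(repr(value).encode("utf-8"))
    return digest.hexdigest()

cache = {}  # hypothetical store: fingerprint -> previously computed output

def run_step(script_source, inputs, compute):
    """Run `compute` only if this (script, inputs) pair has not been seen before."""
    key = fingerprint(script_source, inputs)
    if key in cache:
        return cache[key], True    # output reused from an earlier run
    output = compute(inputs)
    cache[key] = output
    return output, False           # output freshly computed

first, reused_first = run_step("sum(inputs)", [1, 2, 3], sum)
second, reused_second = run_step("sum(inputs)", [1, 2, 3], sum)
third, reused_third = run_step("sum(inputs)", [1, 2, 4], sum)
print(reused_first, reused_second, reused_third)
```

The second call has the same script and inputs as the first, so it is served from the cache; the third call changes an input and is therefore recomputed.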

### What is Azure AutoML?

Automated machine learning (AutoML) is a capability of Microsoft's Azure Machine Learning service. The goal of AutoML is to "improve the productivity of data scientists and democratize AI" [1] by allowing for the rapid development and deployment of machine learning models. To achieve this goal, AutoML automates the process of selecting and tuning an ML model. All the user needs to provide is a dataset (suitable for a classification, regression, or time-series forecasting problem) and a metric to optimize when choosing the model and hyperparameters. The user can also set time and cost constraints for model selection and tuning.

[1] https://azure.microsoft.com/en-us/blog/new-automated-machine-learning-capabilities-in-azure-machine-learning-service/

![](automl.png)

The AutoML model selection and tuning process can be tracked through the Azure portal or directly in Python notebooks through the use of widgets. AutoML selects a high-quality machine learning model tailored to your prediction problem. In this notebook, we walk through the steps of preparing data, setting up an AutoML experiment, and evaluating the results of our best model. More information about running AutoML experiments in Python can be found [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train). 
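The core idea of "a dataset, a metric, and a budget" can be illustrated with a toy selection loop. This is only a conceptual sketch under stated assumptions, not the search strategy AutoML actually uses: the candidate models, the validation data, and the `select_model` helper are all hypothetical, and the budget is modeled as a simple cap on the number of candidates tried.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical validation data following y = 2x
x_valid = [0.0, 1.0, 2.0, 3.0]
y_valid = [0.0, 2.0, 4.0, 6.0]

# Hypothetical candidates: (name, prediction function)
candidates = [
    ("constant_3", lambda x: 3.0),
    ("slope_1", lambda x: 1.0 * x),
    ("slope_2", lambda x: 2.0 * x),
    ("slope_3", lambda x: 3.0 * x),
]

def select_model(candidates, x, y, max_iterations):
    """Try at most `max_iterations` candidates and keep the lowest-error one."""
    best_name, best_error = None, float("inf")
    for name, predict in candidates[:max_iterations]:
        error = mean_squared_error(y, [predict(v) for v in x])
        if error < best_error:
            best_name, best_error = name, error
    return best_name, best_error

best_name, best_error = select_model(candidates, x_valid, y_valid, max_iterations=3)
print(best_name, best_error)
```

A smaller budget can change the outcome: with `max_iterations=2` the loop never reaches the exact model and returns the best of the first two candidates instead.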

### Modeling Problem

The regression problem we will demonstrate is predicting sentence similarity scores on the STS Benchmark dataset. The [STS Benchmark dataset](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark#STS_benchmark_dataset_and_companion_dataset) contains a selection of English datasets that were used in Semantic Textual Similarity (STS) tasks from 2012 to 2017. The dataset contains 8,628 sentence pairs, each with a human-annotated score representing the sentences' similarity (ranging from 0, for no meaning overlap, to 5, for meaning equivalence). The scores are real-valued rather than integers, as the sample rows printed later in this notebook show (for example, 3.8 and 4.25).

For each sentence in the sentence pair, we will use Google's pretrained Universal Sentence Encoder (details provided below) to generate a $512$-dimensional embedding. Both embeddings in the sentence pair will be concatenated and the resulting $1024$-dimensional vector will be used as features in our regression problem. Our target variable is the sentence similarity score.
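The feature construction described above can be sketched in a few lines. This is a minimal illustration with placeholder vectors standing in for real encoder output; `pair_features` and `EMBEDDING_DIM` are hypothetical names used only here.

```python
EMBEDDING_DIM = 512

def pair_features(emb1, emb2):
    """Concatenate two sentence embeddings into a single feature vector."""
    if len(emb1) != EMBEDDING_DIM or len(emb2) != EMBEDDING_DIM:
        raise ValueError("each embedding must have 512 dimensions")
    return list(emb1) + list(emb2)

# Placeholder vectors standing in for real Universal Sentence Encoder output
emb_sentence1 = [0.1] * EMBEDDING_DIM
emb_sentence2 = [0.2] * EMBEDDING_DIM

features = pair_features(emb_sentence1, emb_sentence2)
print(len(features))
```

Note that concatenation is order-sensitive: swapping the two sentences in a pair produces a different feature vector.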

In [4]:
# Set the environment path to find NLP
import sys
sys.path.append("../../")
import time
import logging
import csv
import os
import pandas as pd
import shutil
import numpy as np
import torch
from scipy.stats import pearsonr
from scipy.spatial import distance
from sklearn.externals import joblib

# Import utils
from utils_nlp.azureml import azureml_utils
from utils_nlp.dataset import stsbenchmark
from utils_nlp.dataset.preprocess import (
    to_lowercase,
    to_spacy_tokens,
    rm_spacy_stopwords,
)

# Tensorflow dependencies for Google Universal Sentence Encoder
import tensorflow as tf
import tensorflow_hub as hub
tf.logging.set_verbosity(tf.logging.ERROR) # reduce logging output

# AzureML packages
import azureml as aml
from azureml.telemetry import set_diagnostics_collection
set_diagnostics_collection(send_diagnostics=True)
from azureml.train.automl import AutoMLConfig
from azureml.core import Datastore, Experiment
from azureml.data.data_reference import DataReference  
from azureml.widgets import RunDetails
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.automl import AutoMLStep
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import PythonScriptStep

print("System version: {}".format(sys.version))
print("Azure ML SDK Version:", aml.core.VERSION)
print("Pandas version: {}".format(pd.__version__))
print("Tensorflow Version:", tf.VERSION)

W0618 22:40:59.654601 10096 __init__.py:56] Some hub symbols are not available because TensorFlow version is less than 1.14


Turning diagnostics collection on. 
System version: 3.6.7 |Anaconda, Inc.| (default, Dec 10 2018, 20:35:02) [MSC v.1915 64 bit (AMD64)]
Azure ML SDK Version: 1.0.41
Pandas version: 0.23.4
Tensorflow Version: 1.13.1


In [5]:
BASE_DATA_PATH = '../../data'

# 1. Data Preparation

**STS Benchmark Dataset**

As described above, the STS Benchmark dataset contains 8.6K sentence pairs along with a human-annotated score for how similar the two sentences are. We will load the training, development (validation), and test sets provided by STS Benchmark and preprocess the data (lowercase the text, drop irrelevant columns, and rename the remaining columns) using the utils contained in this repo. Each dataset will ultimately have three columns: _sentence1_ and _sentence2_, which contain the text of the sentences in the sentence pair, and _score_, which contains the human-annotated similarity score of the sentence pair.

In [6]:
# Load in the raw datasets as pandas dataframes
train_raw = stsbenchmark.load_pandas_df(BASE_DATA_PATH, file_split="train")
dev_raw = stsbenchmark.load_pandas_df(BASE_DATA_PATH, file_split="dev")
test_raw = stsbenchmark.load_pandas_df(BASE_DATA_PATH, file_split="test")

100%|████████████████████████████████████████████████| 401/401 [00:01<00:00, 232KB/s]


Data downloaded to ../../data\raw\stsbenchmark


100%|████████████████████████████████████████████████| 401/401 [00:02<00:00, 140KB/s]


Data downloaded to ../../data\raw\stsbenchmark


100%|████████████████████████████████████████████████| 401/401 [00:02<00:00, 165KB/s]


Data downloaded to ../../data\raw\stsbenchmark


In [7]:
# Clean each dataset by removing irrelevant columns and renaming
# the remaining columns (lowercasing happens in the next cell)
train_clean = stsbenchmark.clean_sts(train_raw)
dev_clean = stsbenchmark.clean_sts(dev_raw)
test_clean = stsbenchmark.clean_sts(test_raw)

In [8]:
# Convert all text to lowercase
train = to_lowercase(train_clean)
dev = to_lowercase(dev_clean)
test = to_lowercase(test_clean)

In [9]:
print("Training set has {} sentences".format(len(train)))
print("Development set has {} sentences".format(len(dev)))
print("Testing set has {} sentences".format(len(test)))

Training set has 5749 sentences
Development set has 1500 sentences
Testing set has 1379 sentences


In [10]:
train.head(5)

| | score | sentence1 | sentence2 |
|---|---|---|---|
| 0 | 5.0 | a plane is taking off. | an air plane is taking off. |
| 1 | 3.8 | a man is playing a large flute. | a man is playing a flute. |
| 2 | 3.8 | a man is spreading shreded cheese on a pizza. | a man is spreading shredded cheese on an uncoo... |
| 3 | 2.6 | three men are playing chess. | two men are playing chess. |
| 4 | 4.25 | a man is playing the cello. | a man seated is playing the cello. |


In [14]:
# Save the cleaned data
os.makedirs('data', exist_ok=True)
    
train.to_csv("data/train.csv", index=False)
test.to_csv("data/test.csv", index=False)
dev.to_csv("data/dev.csv", index=False)

# 2. Set up AzureML Workspace, Experiment, Compute & Datastore

## 2a. Link to or create a workspace

In [11]:
ws = azureml_utils.get_or_create_workspace(
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
    workspace_region="<WORKSPACE_REGION>"
)
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


Performing interactive authentication. Please follow the instructions on the terminal.


W0618 22:55:15.432929 37988 _profile.py:1082] Note, we have launched a browser for you to login. For old experience with device code, use "az login --use-device-code"
W0618 22:55:30.586771 10096 _profile.py:774] You have logged in. Now let us find all the subscriptions to which you have access...


Interactive authentication successfully completed.
Workspace name: MAIDAPTest
Azure region: eastus2
Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324
Resource group: nlprg


## 2b. Set up an experiment and logging

In [15]:
# Make a folder for the project
project_folder = './automl-sentence-similarity'
if not os.path.exists(project_folder):
    os.makedirs(project_folder)

# Set up an experiment
experiment_name = 'automl-sentence-similarity'
experiment = Experiment(ws, experiment_name)
run = experiment.start_logging()

## 2c. Link compute target

In [13]:
# choose a name for your cluster
cluster_name = "gpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current AmlCompute. 
print(compute_target.get_status().serialize())

Found existing compute target.
{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-06-19T02:52:52.599000+00:00', 'errors': None, 'creationTime': '2019-05-20T22:09:40.142683+00:00', 'modifiedTime': '2019-05-20T22:10:11.888950+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


## 2d. Upload data to datastore

In [16]:
# Select a specific datastore by name, or call ws.get_default_datastore() instead
datastore_name = 'workspacefilestore'
ds = ws.datastores[datastore_name]

# Upload files in data folder
ds.upload(src_dir='./data', target_path='stsbenchmark_data', overwrite=True, show_progress=True)

Uploading ./data\dev.csv
Uploading ./data\test.csv
Uploading ./data\train.csv
Uploaded ./data\dev.csv, 1 files out of an estimated total of 3
Uploaded ./data\train.csv, 2 files out of an estimated total of 3
Uploaded ./data\test.csv, 3 files out of an estimated total of 3


$AZUREML_DATAREFERENCE_e806155bf4c3452596bd2c3ffa76743d

Set up a **DataReference** object that points to the data we just uploaded into the stsbenchmark_data folder. DataReference objects point to data that is accessible from a datastore.

In [17]:
input_data = DataReference(datastore=ds, 
                           data_reference_name="stsbenchmark",
                           path_on_datastore='stsbenchmark_data/',
                           overwrite=False)

# 3. Create Pipeline

## 3a. Set up run configuration file

In [18]:
# create a new RunConfig object
conda_run_config = RunConfiguration(framework="python")

# Set compute target to AmlCompute
conda_run_config.target = compute_target

conda_run_config.environment.docker.enabled = True
conda_run_config.environment.docker.base_image = aml.core.runconfig.DEFAULT_CPU_IMAGE

# Use conda_dependencies.yml to create a conda environment in the Docker image for execution
conda_run_config.environment.python.user_managed_dependencies = False

conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', 'azureml-sdk', 'azureml-dataprep', 'azureml-train-automl==1.0.33'], 
                              conda_packages=['numpy', 'py-xgboost', 'pandas', 'tensorflow', 'tensorflow-hub', 'scikit-learn'], 
                              pin_sdk_version=False)

print('run config is ready')

run config is ready


## 3b. PythonScriptStep

In this pipeline step, we will convert our sentences into a numerical representation in order to use them in our machine learning model. We will embed both sentences using the Google Universal Sentence Encoder and concatenate their representations into a $1024$-dimensional vector to use as features for AutoML.

**Google Universal Sentence Encoder: Overview**
We'll use a popular sentence encoder called Google Universal Sentence Encoder (see [original paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46808.pdf)). Google provides two pretrained models based on different design goals: a Transformer model (targets high accuracy at the cost of greater model complexity and resource consumption) and a Deep Averaging Network model (DAN; targets efficient inference). Both models are trained on a variety of web sources (Wikipedia, news, question-answer pages, and discussion forums) and produce 512-dimensional embeddings. This notebook uses the Transformer-based encoding model, which can be downloaded [here](https://tfhub.dev/google/universal-sentence-encoder-large/3), because it performs better than the DAN model on the STS Benchmark dataset (see Table 2 in Google Research's [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46808.pdf)). 

**Google Universal Sentence Encoder: Transformer Model** The Transformer model produces sentence embeddings using the "encoding sub-graph of the transformer architecture" (original architecture introduced [here](https://arxiv.org/abs/1706.03762)). "This sub-graph uses attention to compute context aware representations of words in a sentence that take into account both the ordering and identity of all the other words. The context aware word representations are converted to a fixed length sentence encoding vector by computing the element-wise sum of the representations at each word position." The input to the model is lowercase PTB-tokenized strings, and the model is designed to be useful for multiple tasks through multi-task learning. More details about the model can be found in the [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46808.pdf) by Google Research.
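The pooling step quoted above, an element-wise sum over word positions, can be illustrated with a small sketch. The tiny 3-dimensional vectors and the `sum_pool` helper are hypothetical stand-ins for the model's context-aware word representations; the sketch covers only the summation and none of the attention computation.

```python
def sum_pool(word_vectors):
    """Element-wise sum of equally sized word vectors."""
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same length")
    return [sum(v[i] for v in word_vectors) for i in range(dim)]

# Hypothetical context-aware vectors for a 2-word and a 4-word sentence
short_sentence = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
long_sentence = short_sentence + [[0.0, 2.0, 0.0], [1.5, -1.0, 1.0]]

short_encoding = sum_pool(short_sentence)
long_encoding = sum_pool(long_sentence)
print(len(short_encoding), len(long_encoding))
```

Whatever the number of words, the result has the same length as a single word vector, which is what makes the sentence encoding fixed-length.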

**Using the Pretrained Model**

TensorFlow Hub makes the pretrained model publicly available. We import the model from its URL and then feed it our sentences to encode.

In [19]:
%%writefile $project_folder/embed.py
import argparse
import os
import azureml.core
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
tf.logging.set_verbosity(tf.logging.ERROR) # reduce logging output

def google_encoder(dataset):
    """ Function that embeds sentences using the Google Universal
    Sentence Encoder pretrained model
    
    Parameters:
    ----------
    dataset: pandas dataframe with sentences and scores
    
    Returns:
    -------
    emb1: 512-dimensional representation of sentence1
    emb2: 512-dimensional representation of sentence2
    """
    sts_input1 = tf.placeholder(tf.string, shape=(None))
    sts_input2 = tf.placeholder(tf.string, shape=(None))

    # Apply the embedding model and L2-normalize the resulting embeddings
    sts_encode1 = tf.nn.l2_normalize(embedding_model(sts_input1), axis=1)
    sts_encode2 = tf.nn.l2_normalize(embedding_model(sts_input2), axis=1)
    
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        session.run(tf.tables_initializer())
        emb1, emb2 = session.run(
          [sts_encode1, sts_encode2],
          feed_dict={
              sts_input1: dataset['sentence1'],
              sts_input2: dataset['sentence2']
          })
    return emb1, emb2

def feature_engineering(dataset):
    """Extracts embedding features from the dataset and returns
    features and target in a dataframe
    
    Parameters:
    ----------
    dataset: pandas dataframe with sentences and scores
    
    Returns:
    -------
    df: pandas dataframe with embedding features
    scores: pandas Series of target values
    """
    google_USE_emb1, google_USE_emb2 = google_encoder(dataset)
    n_google = google_USE_emb1.shape[1] #length of the embeddings 
    df = np.concatenate((google_USE_emb1, google_USE_emb2), axis=1)
    names = ['USEEmb1_'+str(i) for i in range(n_google)]+['USEEmb2_'+str(i) for i in range(n_google)]
    df = pd.DataFrame(df, columns=names)
    return df, dataset['score']

def write_output(df, path, name):
    os.makedirs(path, exist_ok=True)
    print("%s created" % path)
    df.to_csv(path + "/" + name, index=False)

parser = argparse.ArgumentParser()
parser.add_argument("--sentence_data", type=str)
parser.add_argument("--embedded_data", type=str)
args = parser.parse_args()

# Import the Universal Sentence Encoder's TF Hub module
module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/3"
embedding_model = hub.Module(module_url)

train = pd.read_csv(args.sentence_data + "/train.csv")
dev = pd.read_csv(args.sentence_data + "/dev.csv")

training_data, training_scores = feature_engineering(train)
validation_data, validation_scores = feature_engineering(dev)

write_output(training_data, args.embedded_data, "X_train.csv")
write_output(pd.DataFrame(training_scores, columns=['score']), args.embedded_data, "y_train.csv")

write_output(validation_data, args.embedded_data, "X_dev.csv")
write_output(pd.DataFrame(validation_scores, columns=['score']), args.embedded_data, "y_dev.csv")

Overwriting ./automl-sentence-similarity/embed.py
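The `tf.nn.l2_normalize` calls in `embed.py` scale each embedding to unit length. The following plain-Python sketch shows what that means for a single vector; `l2_normalize` and `dot` are hypothetical helpers written for this illustration, and leaving an all-zero vector unchanged is an assumption made here rather than a description of TensorFlow's behavior.

```python
import math

def l2_normalize(vector):
    """Scale a vector so that its Euclidean norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # assumption: leave an all-zero vector unchanged
    return [x / norm for x in vector]

def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))

u = l2_normalize([3.0, 4.0])
v = l2_normalize([6.0, 8.0])
print(dot(u, u), dot(u, v))
```

After normalization, vectors that point in the same direction become identical regardless of their original magnitude, and the dot product of two normalized vectors equals their cosine similarity.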


**PipelineData** objects represent a piece of intermediate data in a pipeline. Generally they are produced by one step (as an output) and then consumed by the next step (as an input), introducing an implicit order between steps in a pipeline. We create a PipelineData object that can represent the data produced by our first pipeline step that will be consumed by our second pipeline step.
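The implicit ordering can be pictured as a dependency graph: a step must run after whichever step produces its inputs. The sketch below is a conceptual illustration only, not AzureML's scheduler. The `execution_order` helper and the dictionary layout are hypothetical, while the step and data names mirror the ones used in this notebook.

```python
def execution_order(steps):
    """Order steps so that each one runs after the producers of its inputs.

    `steps` maps a step name to a dict with "inputs" and "outputs" lists.
    """
    producer = {}
    for name, spec in steps.items():
        for data in spec["outputs"]:
            producer[data] = name

    ordered, visiting = [], set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError("cyclic dependency involving " + name)
        visiting.add(name)
        for data in steps[name]["inputs"]:
            if data in producer:  # external inputs have no producing step
                visit(producer[data])
        visiting.discard(name)
        ordered.append(name)

    for name in steps:
        visit(name)
    return ordered

steps = {
    "AutoML": {"inputs": ["embedded_data"], "outputs": ["metrics_data", "model_data"]},
    "Embed": {"inputs": ["stsbenchmark"], "outputs": ["embedded_data"]},
}
order = execution_order(steps)
print(order)
```

Even though `AutoML` is listed first, it is scheduled after `Embed` because it consumes `embedded_data`, which `Embed` produces.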

In [20]:
embedded_data = PipelineData("embedded_data", datastore=ds)

In [21]:
embedStep = PythonScriptStep(
    name="Embed",
    script_name="embed.py", 
    arguments=["--embedded_data", embedded_data,
               "--sentence_data", input_data],
    inputs=[input_data],
    outputs=[embedded_data],
    compute_target=compute_target,
    runconfig = conda_run_config,
    hash_paths=["embed.py"],
    source_directory=project_folder,
    allow_reuse=True
)
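The `arguments` list passed to the step lines up with the `argparse` options declared in `embed.py`. As a rough local illustration, the same parser can be exercised with hand-written paths. The paths below are hypothetical placeholders (at run time AzureML substitutes the actual data locations), and `parse_embed_args` is a helper introduced only for this sketch.

```python
import argparse

def parse_embed_args(argv):
    """Parse the two options that embed.py declares."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--sentence_data", type=str)
    parser.add_argument("--embedded_data", type=str)
    return parser.parse_args(argv)

# Hypothetical local paths standing in for the locations resolved at run time
args = parse_embed_args([
    "--embedded_data", "/tmp/embedded",
    "--sentence_data", "/tmp/stsbenchmark_data",
])
print(args.sentence_data, args.embedded_data)
```

Because the options are named, the order in which they appear in the `arguments` list does not matter.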

## 3c. AutoMLStep

In [22]:
%%writefile $project_folder/get_data.py

import os
import pandas as pd

def get_data():
    X_train  = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_embedded_data'] + "/X_train.csv")
    y_train  = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_embedded_data'] + "/y_train.csv")
    X_dev  = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_embedded_data'] + "/X_dev.csv")
    y_dev  = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_embedded_data'] + "/y_dev.csv")
    return { "X" : X_train.values, "y" : y_train.values.flatten(), "X_valid": X_dev.values, "y_valid": y_dev.values.flatten()}

Overwriting ./automl-sentence-similarity/get_data.py
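`get_data.py` locates the embedded data through the `AZUREML_DATAREFERENCE_embedded_data` environment variable. The sketch below simulates that lookup locally with a temporary folder and the standard `csv` module. Setting the variable by hand, the sample scores, and the `read_column` helper are assumptions made purely for illustration; in the pipeline the variable is read inside the AutoML step, not set by user code.

```python
import csv
import os
import tempfile

ENV_NAME = "AZUREML_DATAREFERENCE_embedded_data"

def read_column(path, column):
    """Read one numeric column from a CSV file with a header row."""
    with open(path, newline="") as handle:
        return [float(row[column]) for row in csv.DictReader(handle)]

with tempfile.TemporaryDirectory() as folder:
    # Stand-in for the y_train.csv file that the Embed step writes
    with open(os.path.join(folder, "y_train.csv"), "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["score"])
        writer.writerows([[5.0], [3.8], [2.6]])

    # Simulate locally the variable that get_data.py reads
    os.environ[ENV_NAME] = folder
    base = os.environ[ENV_NAME]
    y_train = read_column(os.path.join(base, "y_train.csv"), "score")
    del os.environ[ENV_NAME]

print(y_train)
```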


In [23]:
# Create PipelineData objects for tracking AutoML metrics 
metrics_output_name = 'metrics_output'
best_model_output_name = 'best_model_output'

metrics_data = PipelineData(name='metrics_data',
                           datastore=ds,
                           pipeline_output_name=metrics_output_name,
                           training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='model_data',
                           datastore=ds,
                           pipeline_output_name=best_model_output_name,
                           training_output=TrainingOutput(type='Model'))

In [24]:
automl_settings = {
    "iteration_timeout_minutes": 5,
    "iterations": 5,
    "primary_metric": 'spearman_correlation',
    "preprocess": True,
    "verbosity": logging.INFO,
}
automl_config = AutoMLConfig(task = 'regression',
                 debug_log = 'automl_errors.log',
                 path = project_folder,
                 compute_target=compute_target,
                 run_configuration=conda_run_config,
                 data_script = project_folder + "/get_data.py",
                 **automl_settings
                )
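The primary metric chosen above, `spearman_correlation`, measures how well the predicted ordering of sentence pairs agrees with the gold ordering. The sketch below uses the standard definition (Pearson correlation computed on ranks, with tied values sharing their average rank) and is not a reproduction of AutoML's internal code. The gold scores are taken from the first rows of the training set shown earlier; the predictions are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

gold = [5.0, 3.8, 3.8, 2.6, 4.25]       # scores from train.head(5)
predicted = [4.1, 3.9, 3.0, 1.2, 4.0]   # made-up predictions
print(round(spearman(gold, predicted), 4))
```

Because only ranks matter, any prediction that preserves the gold ordering scores 1.0 even if its absolute values are far from the gold scores, which is worth keeping in mind when interpreting this metric for a regression model.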

In [25]:
automl_step = AutoMLStep(
    name='AutoML',
    automl_config=automl_config,
    inputs=[embedded_data],
    outputs=[metrics_data, model_data],
    hash_paths=["get_data.py"],
    allow_reuse=True)

# 4. Run Pipeline

In [26]:
automl_step.run_after(embedStep)
pipeline = Pipeline(
    description="pipeline_embed_automl",
    workspace=ws,    
    steps=[automl_step])

In [27]:
pipeline_run = experiment.submit(pipeline)

Created step AutoML [320f0121][5913af95-9ebb-42c0-a650-7725b7fe0b54], (This step will run and generate new outputs)
Created step Embed [81087fb9][d271deed-bd3b-4e41-9814-29fc11e585b4], (This step is eligible to reuse a previous run's output)
Using data reference stsbenchmark for StepId [8ca56eac][e3340790-c54f-4147-8dd0-bcb80a9b7b46], (Consumers of this data are eligible to reuse prior runs.)
Submitted pipeline run: 5549c561-26e2-4979-9f3f-0379e38de86a


In [28]:
RunDetails(pipeline_run).show()

_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': True, 'log_level': 'INFO', '…

In [29]:
pipeline_run.wait_for_completion(show_output=True)

PipelineRunId: 5549c561-26e2-4979-9f3f-0379e38de86a
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/15ae9cb6-95c1-483d-a0e3-b1a1a3b06324/resourceGroups/nlprg/providers/Microsoft.MachineLearningServices/workspaces/MAIDAPTest/experiments/automl-sentence-similarity/runs/5549c561-26e2-4979-9f3f-0379e38de86a
PipelineRun Status: Running


StepRunId: 3fffcae0-74f3-49c5-bb7d-f877bda582f7
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/15ae9cb6-95c1-483d-a0e3-b1a1a3b06324/resourceGroups/nlprg/providers/Microsoft.MachineLearningServices/workspaces/MAIDAPTest/experiments/automl-sentence-similarity/runs/3fffcae0-74f3-49c5-bb7d-f877bda582f7

StepRun(Embed) Execution Summary
StepRun( Embed ) Status: Finished
{'runId': '3fffcae0-74f3-49c5-bb7d-f877bda582f7', 'target': 'gpucluster', 'status': 'Completed', 'startTimeUtc': '2019-06-19T03:30:21.798384Z', 'endTimeUtc': '2019-06-19T03:30:21.864304Z', 'properties': {'azureml.reusedrunid': 'f78cb325-802a-4779-ada8-05db82c9

Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/15ae9cb6-95c1-483d-a0e3-b1a1a3b06324/resourceGroups/nlprg/providers/Microsoft.MachineLearningServices/workspaces/MAIDAPTest/experiments/automl-sentence-similarity/runs/297207dd-e830-4133-af2b-1efff54ee11a
StepRun( AutoML ) Status: NotStarted
StepRun( AutoML ) Status: Running

StepRun(AutoML) Execution Summary
StepRun( AutoML ) Status: Finished
{'runId': '297207dd-e830-4133-af2b-1efff54ee11a', 'target': 'gpucluster', 'status': 'Completed', 'startTimeUtc': '2019-06-19T03:36:02.737145Z', 'endTimeUtc': '2019-06-19T03:44:50.348314Z', 'properties': {'azureml.runsource': 'azureml.StepRun', 'ContentSnapshotId': '81120654-2a16-4013-96f7-922eda5e4e1e', 'StepType': 'AutoMLStep', 'azureml.pipelinerunid': '5549c561-26e2-4979-9f3f-0379e38de86a', 'num_iterations': '5', 'training_type': 'TrainFull', 'acquisition_function': 'EI', 'metrics': 'accuracy', 'primary_metric': 'spearman_correlation', 'train_split': '0', 'MaxTimeSeconds': '300',

'Finished'

In [31]:
published_pipeline = pipeline.publish(
    name="Sentence_Similarity_Pipeline", 
    description="Sentence Similarity with Google USE Features")