# Unified AI Platform: AutoML Tabular 'end-to-end' workflow

## Introduction

This example shows how to build an [AutoML Tabular](https://cloud.google.com/ai-platform-unified/docs/training/training) 'end-to-end' workflow for Managed Pipelines, using the [Unified AI Platform](https://cloud.google.com/ai-platform-unified/docs)'s SDK.

The pipeline creates a *Dataset* from a [BigQuery](https://cloud.google.com/bigquery/) table, uses it to train a tabular regression model, gets evaluation information about the trained model, and if the model is sufficiently accurate, deploys the model for prediction.

<a href="https://storage.googleapis.com/amy-jo/images/automl/ucaip_automl_tabular_dag.png" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/automl/ucaip_automl_tabular_dag.png" width="90%"/></a>

The training can take a few hours, so this example pipeline will take a while to run.

### About the dataset and modeling task

The [Cloud Public Datasets Program](https://cloud.google.com/bigquery/public-data/) makes available public datasets that are useful for experimenting with machine learning. As in the “[Explaining model predictions on structured data](https://cloud.google.com/blog/products/ai-machine-learning/explaining-model-predictions-structured-data)” post, we’ll use data that is essentially a join of two public datasets stored in [BigQuery](https://cloud.google.com/bigquery/): [London Bike rentals](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=london_bicycles&page=dataset) and [NOAA weather data](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=noaa_gsod&page=dataset), with some additional processing to clean up outliers and derive additional GIS and day-of-week fields.

We’ll use this dataset to build a *regression* model to predict the duration of a bike rental based on information about the start and end stations, the day of the week, the weather on that day, and other data. If we were running a bike rental company, for example, these predictions—and their explanations—could help us anticipate demand and even plan how to stock each location.

## Setup

Before you run this notebook, ensure that your Google Cloud user account and project have been granted access to Managed Pipelines Experimental. To request access, fill out this [form](http://go/cloud-mlpipelines-signup) and let your account representative know that you have requested access.

This notebook is intended to be run on one of the following:
* [AI Platform Notebooks](https://cloud.google.com/ai-platform-notebooks). See the "AI Platform Notebooks" section in the Experimental [User Guide](https://docs.google.com/document/d/1JXtowHwppgyghnj1N1CT73hwD1caKtWkLcm2_0qGBoI/edit?usp=sharing) for more detail on creating a notebook server instance.
* [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb)

If you haven't already enabled the AI Platform API, on the [AI Platform (Unified) Dashboard](https://console.cloud.google.com/ai/platform) page in the Google Cloud Console, click **Enable the AI Platform API**.


We'll first install some libraries and set up some variables.


Set `gcloud` to use your project.  **Edit the following cell before running it**.

In [None]:
PROJECT_ID = 'laah-pipeline'  # <---CHANGE THIS

In [None]:
!gcloud config set project {PROJECT_ID}

If you're running this notebook on Colab, authenticate with your user account:

In [None]:
import sys
if 'google.colab' in sys.modules:
  from google.colab import auth
  auth.authenticate_user()

### Install the KFP SDK and AI Platform Pipelines client library

For Managed Pipelines Experimental, you'll need to download a special version of the AI Platform client library.

In [None]:
!gsutil cp gs://cloud-aiplatform-pipelines/releases/20201123/aiplatform_pipelines_client-0.1.0.caip20201123-py3-none-any.whl .

Then, install the libraries and restart the kernel.

In [None]:
if 'google.colab' in sys.modules:
  USER_FLAG = ''
else:
  USER_FLAG = '--user'

In [None]:
!python3 -m pip install {USER_FLAG} kfp==1.1.2 --upgrade
!python3 -m pip install {USER_FLAG} aiplatform_pipelines_client-0.1.0.caip20201123-py3-none-any.whl --upgrade

In [None]:
!python3 -m pip install {USER_FLAG} google-cloud-aiplatform

In [None]:
# Automatically restart kernel after installs 
# (for this notebook, seems this might be necessary for colab too)
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

After the restart, check that the installed KFP version is 1.1.2.



In [None]:
# Check the KFP version
!python3 -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"

If you're on Colab, re-authenticate after the kernel restart. **Edit the following cell for your project ID before running it.**

In [None]:
import sys
if 'google.colab' in sys.modules:
  PROJECT_ID = 'laah-pipeline'  # <---CHANGE THIS
  !gcloud config set project {PROJECT_ID}
  from google.colab import auth
  auth.authenticate_user()
  USER_FLAG = ''

### Set some variables

**Before you run the next cell**, **edit it** to set variables for your project.  See the "Before you begin" section of the User Guide for information on creating your API key.  For `BUCKET_NAME`, enter the name of a Cloud Storage (GCS) bucket in your project.  Don't include the `gs://` prefix.

In [None]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

# Required Parameters
USER = 'laah' # <---CHANGE THIS
BUCKET_NAME = 'laah-mp-lab'  # <---CHANGE THIS
PIPELINE_ROOT = 'gs://{}/pipeline_root/{}'.format(BUCKET_NAME, USER)

PROJECT_ID = 'laah-pipeline'  # <---CHANGE THIS
REGION = 'us-central1'
API_KEY = 'YOUR_API_KEY'  # <---CHANGE THIS (don't commit a real key)

print('PIPELINE_ROOT: {}'.format(PIPELINE_ROOT))
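
The note about omitting the `gs://` prefix can be enforced with a small check. The following is a minimal sketch, not part of the original notebook: the helper name `build_pipeline_root` is hypothetical, and it builds the same path format as the `PIPELINE_ROOT` expression above.

```python
def build_pipeline_root(bucket_name: str, user: str) -> str:
    """Return the GCS pipeline root for a bucket name given without 'gs://'."""
    if bucket_name.startswith("gs://"):
        # The prefix is added below, so a prefixed name would be doubled.
        raise ValueError("BUCKET_NAME should not include the 'gs://' prefix")
    return "gs://{}/pipeline_root/{}".format(bucket_name, user)


# 'my-bucket' and 'me' are placeholder values.
print(build_pipeline_root("my-bucket", "me"))
```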

## Create the underlying container image used for the pipeline steps



We'll use Cloud Build to generate the container image used by all the pipeline components in this example. 

In [None]:
%%writefile Dockerfile

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-3:latest

RUN pip install -U google-cloud-aiplatform
RUN pip install -U google-cloud-storage

In [None]:
!gcloud builds submit --tag gcr.io/{PROJECT_ID}/custom-container-ucaip:{USER} .

## Create the pipeline components

In this section we'll define our pipeline components. We'll build Python function-based components: first we'll define some Python functions, then generate component `yaml` files based on those function definitions.

### Create the dataset

Create a *Dataset* and ingest data into it from a BigQuery table. This function assumes a BigQuery table as the source, and would be a bit different if the source were a set of GCS files.

In [None]:
from kfp.v2.components import OutputPath

def create_dataset_tabular_bigquery_sample(
    project: str,
    display_name: str,
    bigquery_uri: str, # 'bq://aju-dev-demos.london_bikes_weather.bikes_weather',
    location: str, # "us-central1",
    api_endpoint: str, # "us-central1-aiplatform.googleapis.com",
    timeout: int, # 500,
    dataset_id: OutputPath('String'),
):

  import logging
  import subprocess
  import time

  from google.cloud import aiplatform
  from google.protobuf import json_format
  from google.protobuf.struct_pb2 import Value

  logging.getLogger().setLevel(logging.INFO)

  client_options = {"api_endpoint": api_endpoint}
  # Initialize client that will be used to create and send requests.
  client = aiplatform.gapic.DatasetServiceClient(client_options=client_options)
  metadata_dict = {"input_config": {"bigquery_source": {"uri": bigquery_uri}}}
  metadata = json_format.ParseDict(metadata_dict, Value())

  dataset = {
      "display_name": display_name,
      "metadata_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml",
      "metadata": metadata,
  }
  parent = f"projects/{project}/locations/{location}"
  response = client.create_dataset(parent=parent, dataset=dataset)
  print("Long running operation:", response.operation.name)
  create_dataset_response = response.result(timeout=timeout)
  logging.info("create_dataset_response: %s", create_dataset_response)
  path_components = create_dataset_response.name.split('/')
  logging.info('got dataset id: %s', path_components[-1])
  # write the dataset id as output
  with open('temp.txt', "w") as outfile:
    outfile.write(path_components[-1])
  subprocess.run(['gsutil', 'cp', 'temp.txt', dataset_id])
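
The function above builds a `parent` path and takes the last segment of the returned resource name as the dataset ID. As a rough illustration of that string handling only, here is a sketch with two hypothetical helpers. The `datasets/` segment in the example name is an assumption, chosen by analogy with the endpoint path format used later in this notebook.

```python
def location_parent(project: str, location: str) -> str:
    """Build the parent path, as in the component above."""
    return f"projects/{project}/locations/{location}"


def resource_id_from_name(resource_name: str) -> str:
    """Return the last path segment of a resource name."""
    return resource_name.rstrip("/").split("/")[-1]


# Placeholder project and ID values, for illustration only.
name = location_parent("my-project", "us-central1") + "/datasets/1234567890"
print(resource_id_from_name(name))
```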


### Train an AutoML tabular regression model

We'll train an AutoML tabular model on the dataset.  To set up the training job, we need to provide the target (label) column and a dict of the input transformations: which fields to use as model inputs, and the input types.  These will be set later on in the notebook, when we run the pipeline.   
The target column will be set to `duration`, a numeric field. This function definition assumes a *regression* model, and would be a bit different if we were training a classification model instead.

In [None]:
from kfp.v2.components import OutputPath

def training_tabular_regression(
    project: str,
    display_name: str,
    dataset_id: str,
    model_prefix: str,
    target_column: str,
    transformations_str: str,
    location: str, # "us-central1",
    api_endpoint: str, # "us-central1-aiplatform.googleapis.com"
    model_id: OutputPath('String'),
    model_dispname: OutputPath('String')
):
  import json
  import logging
  import subprocess
  import time
  from google.cloud import aiplatform
  from google.protobuf import json_format
  from google.protobuf.struct_pb2 import Value
  from google.cloud.aiplatform_v1beta1.types import pipeline_state

  SLEEP_INTERVAL = 100

  logging.getLogger().setLevel(logging.INFO)
  logging.info('using dataset id: %s', dataset_id)
  client_options = {"api_endpoint": api_endpoint}
  # Initialize client that will be used to create and send requests.
  client = aiplatform.gapic.PipelineServiceClient(client_options=client_options)
  # set the columns used for training and their data types
  transformations = json.loads(transformations_str)
  logging.info('using transformations: %s', transformations)

  training_task_inputs_dict = {
        # required inputs
        "targetColumn": target_column,
        "predictionType": "regression",
        "transformations": transformations,
        "trainBudgetMilliNodeHours": 2000,
        "disableEarlyStopping": False,
        "optimizationObjective": "minimize-rmse",
  }
  training_task_inputs = json_format.ParseDict(training_task_inputs_dict, Value())
  model_display_name = '{}_{}'.format(model_prefix, str(int(time.time())))

  training_pipeline = {
        "display_name": display_name,
        "training_task_definition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_tabular_1.0.0.yaml",
        "training_task_inputs": training_task_inputs,
        "input_data_config": {
            "dataset_id": dataset_id,
            "fraction_split": {
                "training_fraction": 0.8,
                "validation_fraction": 0.1,
                "test_fraction": 0.1,
            },
        },
        "model_to_upload": {"display_name": model_display_name},
  }
  parent = f"projects/{project}/locations/{location}"
  response = client.create_training_pipeline(
        parent=parent, training_pipeline=training_pipeline
  )
  training_pipeline_name = response.name
  logging.info("pipeline name: %s", training_pipeline_name)
  # Poll periodically until training completes
  while True:
    mresponse = client.get_training_pipeline(name=training_pipeline_name)
    logging.info('mresponse: %s', mresponse)
    logging.info('job state: %s', mresponse.state)
    if mresponse.state == pipeline_state.PipelineState.PIPELINE_STATE_SUCCEEDED:
      logging.info('training finished')
      # write some outputs once finished
      model_name = mresponse.model_to_upload.name 
      logging.info('got model name: %s', model_name)
      with open('temp.txt', "w") as outfile:
        outfile.write(model_name)
      subprocess.run(['gsutil', 'cp', 'temp.txt', model_id])
      with open('temp2.txt', "w") as outfile:
        outfile.write(model_display_name)
      subprocess.run(['gsutil', 'cp', 'temp2.txt', model_dispname])      
      break
    else:
      time.sleep(SLEEP_INTERVAL)
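
The polling pattern used in this component can be tried out without calling the service. The sketch below is a simplified stand-in: the string state names, the `wait_for_terminal_state` helper, and the `max_polls` bound are all assumptions made for illustration, not part of the AI Platform client library.

```python
import time
from typing import Callable

# Simplified stand-ins for the service's pipeline states (assumed names).
SUCCEEDED, FAILED, CANCELLED = "SUCCEEDED", "FAILED", "CANCELLED"


def wait_for_terminal_state(
    get_state: Callable[[], str],
    interval_s: float = 100,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until success; raise on failure or too many polls."""
    for _ in range(max_polls):
        state = get_state()
        if state == SUCCEEDED:
            return state
        if state in (FAILED, CANCELLED):
            raise RuntimeError(f"job ended in state {state}")
        sleep(interval_s)
    raise TimeoutError(f"no terminal state after {max_polls} polls")


# Usage with a fake state source and a no-op sleep, so it returns at once.
fake_states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_terminal_state(lambda: next(fake_states), sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable; in a real component you would leave the default.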


### Get evaluation information for the trained model

...

In [None]:

from kfp.v2.components import OutputPath

def get_model_evaluation_tabular(
    project: str,
    model_id: str,
    location: str, #"us-central1",
    api_endpoint: str, # "us-central1-aiplatform.googleapis.com",
    eval_info: OutputPath('String'),
):
  import json
  import logging
  import subprocess  
  from google.cloud import aiplatform

  def get_eval_id(client, model_name):
    from google.protobuf.json_format import MessageToDict
    response = client.list_model_evaluations(parent=model_name)
    eval_name, metrics_str = None, None
    for evaluation in response:
      print("model_evaluation")
      print(" name:", evaluation.name)
      print(" metrics_schema_uri:", evaluation.metrics_schema_uri)
      metrics = MessageToDict(evaluation._pb.metrics)
      for metric in metrics.keys():
        logging.info('metric: %s, value: %s', metric, metrics[metric])
      eval_name, metrics_str = evaluation.name, json.dumps(metrics)
    if eval_name is None:
      # avoid a NameError below if the model has no evaluations
      raise RuntimeError('no evaluations found for {}'.format(model_name))
    return (eval_name, metrics_str)  # for regression, only one slice

  logging.getLogger().setLevel(logging.INFO)

  client_options = {"api_endpoint": api_endpoint}
  # Initialize client that will be used to create and send requests.
  client = aiplatform.gapic.ModelServiceClient(client_options=client_options)
  eval_name, metrics_str = get_eval_id(client, model_id)
  logging.info('got evaluation name: %s', eval_name)
  logging.info('got metrics dict string: %s', metrics_str)
  with open('temp.txt', "w") as outfile:
    outfile.write(metrics_str)
  subprocess.run(['gsutil', 'cp', 'temp.txt', eval_info])  


### Create a serving endpoint

Here we're creating a serving endpoint to which we'll deploy the trained model.  If an existing endpoint path is given as an arg, we'll use that instead of creating a new endpoint. 

In [None]:
from kfp.v2.components import OutputPath

def create_endpoint(
    project: str,
    display_name: str,
    endpoint_path: str,    
    location: str, # "us-central1",
    api_endpoint: str, # "us-central1-aiplatform.googleapis.com",
    timeout: int,
    endpoint_id: OutputPath('String'),

):
  import logging
  import subprocess  
  from google.cloud import aiplatform

  logging.getLogger().setLevel(logging.INFO)
  if endpoint_path == 'new':  # then create new endpoint, using given display name
    logging.info('creating new endpoint with display name: %s', display_name)
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    client = aiplatform.gapic.EndpointServiceClient(client_options=client_options)
    endpoint = {"display_name": display_name}
    parent = f"projects/{project}/locations/{location}"
    response = client.create_endpoint(parent=parent, endpoint=endpoint)
    logging.info("Long running operation: %s", response.operation.name)
    create_endpoint_response = response.result(timeout=timeout)
    logging.info("create_endpoint_response: %s", create_endpoint_response)
    endpoint_name = create_endpoint_response.name 
    logging.info('endpoint name: %s', endpoint_name)
  else:  # otherwise, use given endpoint path expression (TODO: add error checking)
    logging.info('using existing endpoint: %s', endpoint_path)
    endpoint_name = endpoint_path
  # write the endpoint name (path expression) as output
  with open('temp.txt', "w") as outfile:
    outfile.write(endpoint_name)
  subprocess.run(['gsutil', 'cp', 'temp.txt', endpoint_id])
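
The `TODO` above mentions error checking for a user-supplied endpoint path. One possible check is sketched below. It assumes endpoint paths follow the `projects/.../locations/.../endpoints/...` shape used in the prediction example later in this notebook; the helper name `resolve_endpoint_choice` is hypothetical.

```python
import re

# Assumed shape of an endpoint resource path.
ENDPOINT_PATH_RE = re.compile(r"^projects/[^/]+/locations/[^/]+/endpoints/[^/]+$")


def resolve_endpoint_choice(endpoint_path: str) -> str:
    """Return 'create' for 'new', 'reuse' for a well-formed path, else raise."""
    if endpoint_path == "new":
        return "create"
    if ENDPOINT_PATH_RE.match(endpoint_path):
        return "reuse"
    raise ValueError("not 'new' or an endpoint path: {!r}".format(endpoint_path))


print(resolve_endpoint_choice("new"))
print(resolve_endpoint_choice("projects/123/locations/us-central1/endpoints/456"))
```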


### Deploy the trained model for serving

This function deploys the trained model for serving. It supports simple 'gating' on model quality: we pass in the model evaluation metrics and compare them with a dict of metric threshold values. If the model doesn't meet the given threshold value(s), it won't be deployed.

In [None]:
def deploy_automl_tabular_model(
    project: str,
    endpoint_name: str,
    model_name: str,
    deployed_model_display_name: str,
    eval_info: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
    timeout: int = 7200,
    thresholds_dict_str: str = '{"meanAbsoluteError": 470}'
):
  import json
  import logging
  from google.cloud import aiplatform

  # check the model metrics against the given thresholds dict
  def regression_thresholds_check(metrics_dict, thresholds_dict):
    for k, v in thresholds_dict.items():
      logging.info('k {}, v {}'.format(k, v))
      if k in ['rootMeanSquaredError', 'meanAbsoluteError']:  # lower is better
        if metrics_dict[k] > v:  # if over threshold
          logging.info('{} > {}; returning False'.format(
              metrics_dict[k], v))
          return False
      elif k in ['rSquared']:  # higher is better
        if metrics_dict[k] < v:  # if under threshold
          logging.info('{} < {}; returning False'.format(
              metrics_dict[k], v))
          return False
      else:  # unhandled key in thresholds dict
        # TODO: should the default instead be to deploy?
        logging.info('unhandled threshold key %s; not deploying', k)
        return False
    logging.info('threshold checks passed.')
    return True  

  logging.getLogger().setLevel(logging.INFO)
  metrics_dict = json.loads(eval_info)
  thresholds_dict = json.loads(thresholds_dict_str)
  logging.info('got metrics dict: %s', metrics_dict)
  logging.info('got thresholds dict: %s', thresholds_dict)
  deploy = regression_thresholds_check(metrics_dict, thresholds_dict)
  if not deploy:
    # then don't deploy the model
    logging.warning('model is not accurate enough to deploy')
    return 

  client_options = {"api_endpoint": api_endpoint}
  # Initialize client that will be used to create and send requests.
  client = aiplatform.gapic.EndpointServiceClient(client_options=client_options)
  deployed_model = {
      # format: 'projects/{project}/locations/{location}/models/{model}'
      "model": model_name,
      "display_name": deployed_model_display_name,
      "dedicated_resources": {
          "min_replica_count": 1,
          "machine_spec": {
              "machine_type": "n1-standard-8",
              # Accelerators can be used only if the model specifies a GPU image.
              # 'accelerator_type': aiplatform.AcceleratorType.NVIDIA_TESLA_K80,
              # 'accelerator_count': 1,
          },
      }        
  }
  # key '0' assigns traffic for the newly deployed model
  # Traffic percentage values must add up to 100
  # Leave dictionary empty if endpoint should not accept any traffic
  traffic_split = {"0": 100}
  response = client.deploy_model(
      endpoint=endpoint_name, deployed_model=deployed_model, traffic_split=traffic_split
  )
  print("Long running operation:", response.operation.name)
  deploy_model_response = response.result(timeout=timeout)
  print("deploy_model_response:", deploy_model_response)
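
To see how the gating behaves, the threshold check can be run on its own. This is a condensed sketch of the same logic as `regression_thresholds_check` above, without logging; the metric values are made up for illustration and are not results from a real training run.

```python
import json

LOWER_IS_BETTER = ("rootMeanSquaredError", "meanAbsoluteError")
HIGHER_IS_BETTER = ("rSquared",)


def passes_thresholds(metrics: dict, thresholds: dict) -> bool:
    """Return True only if every threshold is handled and satisfied."""
    for key, limit in thresholds.items():
        if key in LOWER_IS_BETTER:
            if metrics[key] > limit:
                return False
        elif key in HIGHER_IS_BETTER:
            if metrics[key] < limit:
                return False
        else:
            # Unhandled threshold key: don't deploy, as in the component.
            return False
    return True


# Made-up metrics, parsed from a JSON string like the eval_info input.
metrics = json.loads('{"meanAbsoluteError": 450.0, "rSquared": 0.4}')
print(passes_thresholds(metrics, {"meanAbsoluteError": 470}))  # True
print(passes_thresholds(metrics, {"meanAbsoluteError": 400}))  # False
```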


### Create components from the python functions

Now we'll use the `func_to_container_op` method to create `yaml` component files for the functions above.  In all cases, we'll use the container image we built above as the `base_image`.

The `yaml` component files make it easy to version-track and share component definitions. 

In [None]:
from kfp.v2 import components
from kfp.v2 import dsl
from kfp.v2 import compiler

components.func_to_container_op(create_dataset_tabular_bigquery_sample,
      output_component_file='tables_create_dataset_component.yaml', 
      base_image='gcr.io/{}/custom-container-ucaip:{}'.format(PROJECT_ID, USER))

components.func_to_container_op(training_tabular_regression,
      output_component_file='tables_train_component.yaml', 
      base_image='gcr.io/{}/custom-container-ucaip:{}'.format(PROJECT_ID, USER))

components.func_to_container_op(get_model_evaluation_tabular,
      output_component_file='tables_eval_component.yaml', 
      base_image='gcr.io/{}/custom-container-ucaip:{}'.format(PROJECT_ID, USER))

components.func_to_container_op(create_endpoint,
      output_component_file='tables_endpoint_component.yaml', 
      base_image='gcr.io/{}/custom-container-ucaip:{}'.format(PROJECT_ID, USER))

components.func_to_container_op(deploy_automl_tabular_model,
      output_component_file='tables_deploy_component.yaml', 
      base_image='gcr.io/{}/custom-container-ucaip:{}'.format(PROJECT_ID, USER))


## Define and run a pipeline using the components



First, we'll create pipeline ops from the components, using the `load_component_from_file` method.

While we're not using it here, there is also a `load_component_from_url` method, which is handy if your component files are checked into a repo or otherwise stored online. (For GitHub files, use the 'raw' URL).

In [None]:
import json
from kfp.v2 import dsl
from kfp.v2 import compiler
from kfp.v2 import components

create_dataset_op = components.load_component_from_file(
  './tables_create_dataset_component.yaml'
  )

train_op = components.load_component_from_file(
  './tables_train_component.yaml'
  )

eval_op = components.load_component_from_file(
  './tables_eval_component.yaml'
  )

create_endpoint_op = components.load_component_from_file(
  './tables_endpoint_component.yaml'
  )

deploy_op = components.load_component_from_file(
  './tables_deploy_component.yaml'
  )



Next, we'll define the pipeline, using the ops defined above.

In [None]:
import json
# We'll use this transformation specification as an arg for the training step.
TRANSFORMATIONS = [
    {"auto": {"column_name": "bike_id"}},
    {"auto": {"column_name": "day_of_week"}},
    {"auto": {"column_name": "dewp"}},
    {"auto": {"column_name": "duration"}},
    {"auto": {"column_name": "end_latitude"}},
    {"auto": {"column_name": "end_longitude"}},
    {"categorical": {"column_name": "end_station_id"}},
    {"auto": {"column_name": "euclidean"}},
    {"categorical": {"column_name": "loc_cross"}},
    {"auto": {"column_name": "max"}},
    {"auto": {"column_name": "min"}},
    {"auto": {"column_name": "prcp"}},
    {"auto": {"column_name": "start_latitude"}},
    {"auto": {"column_name": "start_longitude"}},
    {"categorical": {"column_name": "start_station_id"}},
    {"auto": {"column_name": "temp"}},
    {"timestamp": {"column_name": "ts"}}
]
TRANSFORMATIONS_STR = json.dumps(TRANSFORMATIONS)
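
If you adapt this to another table, the same list structure can be generated from a column-to-type mapping rather than written by hand. A minimal sketch, assuming the `{type: {"column_name": ...}}` entry shape shown above; `build_transformations` is a hypothetical helper and only three of the columns are included.

```python
import json


def build_transformations(column_types: dict) -> list:
    """Turn {column_name: transformation_type} into the list format above."""
    return [{kind: {"column_name": name}} for name, kind in column_types.items()]


# A small subset of the columns, for illustration.
columns = {"bike_id": "auto", "end_station_id": "categorical", "ts": "timestamp"}
transformations_str = json.dumps(build_transformations(columns))
print(transformations_str)
```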

Some of the pipeline steps take as inputs the outputs from other ops.

Note that the `create_endpoint` step has no ordering constraints and can run right away, so it can run concurrently with the steps that create a dataset and train a model.  

In [None]:
@dsl.pipeline(
  name='ucaip-automl-tables',
  description='Demonstrate a ucaip AutoML Tables workflow'
)
def automl_tables( 
  gcp_project_id: str = 'laah-pipeline',
  gcp_region: str = 'us-central1',
  dataset_display_name: str = 'mptest1612815613',
  api_endpoint: str = 'us-central1-aiplatform.googleapis.com',
  timeout: int = 2000,
  bigquery_uri: str = 'bq://aju-dev-demos.london_bikes_weather.bikes_weather',
  target_col_name: str = 'duration',
  time_col_name: str = 'none',    
  transformations: str = TRANSFORMATIONS_STR,
  train_budget_milli_node_hours: int = 1000,
  model_prefix: str = 'bwmodel',    
  # optimization_objective: str = 'minimize-rmse', 
  training_display_name: str = 'laah-training',
  endpoint_display_name: str = 'laah-endpoint',
  # if set to other than 'new', use the given endpoint path rather than create new endpoint.  
  endpoint_path: str = 'new',
  thresholds_dict_str: str = '{"meanAbsoluteError": 470}'
  ):

  create_dataset = create_dataset_op(
    gcp_project_id,
    dataset_display_name,
    bigquery_uri,
    gcp_region,
    api_endpoint,
    timeout
    )
  
  train = train_op(
    gcp_project_id,
    training_display_name,
    create_dataset.outputs['dataset_id'],
    model_prefix,
    target_col_name,
    transformations,
    gcp_region,
    api_endpoint
    )
  
  eval = eval_op(
    gcp_project_id,
    train.outputs['model_id'],
    gcp_region,
    api_endpoint
    )
  
  create_endpoint = create_endpoint_op(
    gcp_project_id,
    endpoint_display_name,
    endpoint_path,
    gcp_region,
    api_endpoint,
    timeout
  )

  deploy = deploy_op(
    gcp_project_id,
    create_endpoint.outputs['endpoint_id'],
    train.outputs['model_id'],
    train.outputs['model_dispname'],
    eval.outputs['eval_info'],
    gcp_region,
    api_endpoint,
    timeout,
    thresholds_dict_str
  )
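
The ordering that results from this wiring can be illustrated with the standard library. The sketch below is not how Managed Pipelines schedules work; it just groups the steps into "waves" by their data dependencies, using step names taken from the pipeline definition above. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps whose outputs it consumes.
DEPENDENCIES = {
    "create_dataset": set(),
    "create_endpoint": set(),
    "train": {"create_dataset"},
    "eval": {"train"},
    "deploy": {"create_endpoint", "train", "eval"},
}


def execution_waves(dependencies: dict) -> list:
    """Group steps into waves; steps in one wave don't depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in execution_waves(DEPENDENCIES):
    print(wave)
```

The first wave contains both `create_dataset` and `create_endpoint`, which matches the note that the endpoint step can start right away.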

Compile the pipeline...

In [None]:
compiler.Compiler().compile(pipeline_func=automl_tables,
                            pipeline_root=PIPELINE_ROOT,
                            output_path='automl_pipeline_spec.json')

... then run it.

In [None]:
import time
from aiplatform.pipelines import client

api_client = client.Client(project_id=PROJECT_ID, region=REGION, api_key=API_KEY)
display_name = 'mptest{}'.format(str(int(time.time())))
print(display_name)


Note that we can define pipeline input values via the `parameter_values` arg.

In [None]:
result = api_client.create_run_from_job_spec(
          job_spec_path='automl_pipeline_spec.json',
#           pipeline_root=PIPELINE_ROOT,  # you can add this arg if you want to override the compiled value
          parameter_values={'gcp_project_id': '{}'.format(PROJECT_ID),
                           'dataset_display_name': display_name,
                            'endpoint_display_name': display_name,
                            'training_display_name': display_name,
                            'thresholds_dict_str': '{"meanAbsoluteError": 470}'
                           })
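
Because `thresholds_dict_str` has to be a JSON string, it can be less error-prone to build it with `json.dumps` than to type it by hand. A small sketch, using the parameter names from the cell above; the `make_parameter_values` helper is hypothetical.

```python
import json
import time


def make_parameter_values(project_id: str, display_name: str, thresholds: dict) -> dict:
    """Build the parameter_values dict, serializing thresholds to JSON."""
    return {
        "gcp_project_id": project_id,
        "dataset_display_name": display_name,
        "endpoint_display_name": display_name,
        "training_display_name": display_name,
        "thresholds_dict_str": json.dumps(thresholds),
    }


# 'my-project' is a placeholder project ID.
params = make_parameter_values(
    "my-project",
    "mptest{}".format(int(time.time())),
    {"meanAbsoluteError": 470},
)
print(params["thresholds_dict_str"])
```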

Visit the running pipeline job in the Cloud Console by clicking the link above. As it runs, you should see a graph like the following.  

<a href="https://storage.googleapis.com/amy-jo/images/automl/ucaip_automl_tabular_pipeline_in_progress.png" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/automl/ucaip_automl_tabular_pipeline_in_progress.png" width="90%"/></a>

You can view and manage information about your dataset, model, and endpoint in the [Cloud Console](https://console.cloud.google.com/ai/platform/models) as well.


## (TODO) Using your deployed model for prediction

...

In [None]:
from google.cloud import aiplatform

def predict_custom_model_sample(endpoint: str, instance: dict, parameters_dict: dict):
    client_options = dict(api_endpoint="us-central1-prediction-aiplatform.googleapis.com")
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

    from google.protobuf import json_format
    from google.protobuf.struct_pb2 import Value

    # The format of the parameters must be consistent with what the model expects.
    parameters = json_format.ParseDict(parameters_dict, Value())

    # The format of the instances must be consistent with what the model expects.
    instances_list = [instance]
    instances = [json_format.ParseDict(s, Value()) for s in instances_list]
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )

    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    predictions = response.predictions
    print("predictions")
    for prediction in predictions:
        print(" prediction:", dict(prediction))

In [None]:
endpoint_path = "projects/467744782358/locations/us-central1/endpoints/6770352799193497600"  # <---CHANGE THIS to your endpoint's path
instance1 =  {
      "bike_id": "5373",
      "day_of_week": "3",
      "end_latitude": 51.52059681,
      "end_longitude": -0.116688468,
      "end_station_id": "68",
      "euclidean": 3589.5146210024977,
      "loc_cross": "POINT(-0.07 51.52)POINT(-0.12 51.52)",
      "max": 44.6,
      "min": 34.0,
      "prcp": 0,
      "ts": "1480407420",
      "start_latitude": 51.52388,
      "start_longitude": -0.065076,
      "start_station_id": "445",
      "temp": 38.2,
      "dewp": 28.6
    }

predict_custom_model_sample(
    endpoint_path,
    instance1, {}
)
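
Before sending a request, it can help to compare the instance's keys with the columns used for training. The sketch below assumes you want every non-target training column present in the instance, which may be stricter than the service requires; `missing_columns` is a hypothetical helper, and only a few columns are shown.

```python
def missing_columns(instance: dict, transformations: list, target_column: str) -> list:
    """Return training input columns (excluding the target) absent from instance."""
    expected = {next(iter(t.values()))["column_name"] for t in transformations}
    expected.discard(target_column)
    return sorted(expected - set(instance))


# A subset of the transformation spec and a deliberately incomplete instance.
transformations = [
    {"auto": {"column_name": "bike_id"}},
    {"auto": {"column_name": "duration"}},
    {"categorical": {"column_name": "end_station_id"}},
    {"timestamp": {"column_name": "ts"}},
]
instance = {"bike_id": "5373", "ts": "1480407420"}
print(missing_columns(instance, transformations, "duration"))
```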

-----------------------------
Copyright 2020 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.