# 03 How to use a linear regression model to predict penguin weight using BigQuery ML

### Summary

This project uses a linear regression model in BigQuery ML to **predict** the body mass (weight) of penguins from the BigQuery public `penguins` dataset.

## Objective 
1. Create a linear regression model 
1. Evaluate the linear regression model 
1. Predict penguin weights with the linear regression model

## Key Concepts
1. Linear regression 
1. Explainable AI
1. ML.EVALUATE
1. ML.PREDICT

## Steps
1. Create the BigQuery dataset
1. Use the SELECT statement to examine the data
1. Select the training data (the rows with a non-null `body_mass_g`)
1. Use the CREATE MODEL statement to create your linear regression model
1. Use the ML.EVALUATE function to evaluate the model
1. Use the ML.PREDICT function to predict the penguin weight for a given set of data
1. Use the ML.EXPLAIN_PREDICT function to explain prediction results with Explainable AI methods
1. Use the ML.GLOBAL_EXPLAIN function to find which features are the most important for determining the weight

 


### Dataset
The data comes from the BigQuery public [`penguins`](https://pantheon.corp.google.com/bigquery?p=bigquery-public-data&d=census_bureau_usa&page=dataset) table (`bigquery-public-data.ml_datasets.penguins`), which has about 342 rows of data.


### Install additional packages

Install the following packages required to execute this notebook. 

In [None]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q google-cloud-bigquery db-dtypes

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [2]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Project Variables 

In [11]:
# Project variables 
#
# These are the project variables used in this ML model:
#

# ---- MODIFY THIS ----

PROJECT_ID = "" # @param {type:"string"}
bqml_type = "linear_reg" # @param {type:"string"}
BQML_MODEL_NAME = "bqml_lin_reg_penguin_weight_predict_model"
job_display_name = BQML_MODEL_NAME + "_job"
ENDPOINT_NAME = BQML_MODEL_NAME + "_endpoint"

# datasets
BQ_DATASET_NAME = BQML_MODEL_NAME + "_dataset"
sql_create_dataset = f"""CREATE SCHEMA IF NOT EXISTS {BQ_DATASET_NAME}"""
BQ_PUBLIC_DATASET = "bigquery-public-data.ml_datasets.penguins" 

# bucket details
BUCKET_NAME = "bqml_tutorials"
BUCKET_URI = f"gs://{BUCKET_NAME}/{bqml_type}/"
OUTPUTBUCKET = f"gs://bqml_datasets_predictions/{bqml_type}/"

# Region 
REGION = "" # @param {type: "string"} 

Before running the cell above, set `PROJECT_ID` to your Google Cloud project ID and `REGION` to a supported region (see below).

#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook. Below are some of the regions that Vertex AI supports. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You might not be able to use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about <a href="https://cloud.google.com/vertex-ai/docs/general/locations" target="_blank">Vertex AI regions</a>.
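`PROJECT_ID` and `REGION` both default to empty strings. If you forget to set them, later cells fail with less obvious errors. The sketch below is a small guard cell that stops early instead. The helper name `check_required_settings` is hypothetical and is not part of any Google Cloud library.

```python
from typing import Optional


def check_required_settings(**settings: Optional[str]) -> None:
    """Raise ValueError listing every setting that is None or blank."""
    # Collect the names of all settings that were left empty.
    missing = [
        name
        for name, value in settings.items()
        if value is None or not str(value).strip()
    ]
    if missing:
        raise ValueError(
            "Set these variables before continuing: " + ", ".join(missing)
        )
```

In the notebook, you would call it as `check_required_settings(PROJECT_ID=PROJECT_ID, REGION=REGION)` after the project variables cell.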

In [2]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = "google.colab" in sys.modules
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

In [3]:
SERVICE_ACCOUNT = ""  # @param {type:"string"}
print(SERVICE_ACCOUNT)




In [None]:
if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get your service account from gcloud
    if not IS_COLAB:
        shell_output = !gcloud auth list 2>/dev/null
        SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()

    else:  # IS_COLAB:
        shell_output = ! gcloud projects describe  $PROJECT_ID
        project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
        SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"

    print("Service Account:", SERVICE_ACCOUNT)

### Import libraries

In [5]:
from typing import Union

import google.cloud.aiplatform as vertex_ai
import pandas as pd
from google.cloud import bigquery

### Initialize Vertex AI and BigQuery SDKs for Python

Initialize the Vertex AI SDK for Python for your project and region.

In [6]:
vertex_ai.init(project=PROJECT_ID, location=REGION)

Create the BigQuery client.

In [7]:
bq_client = bigquery.Client(project=PROJECT_ID)

Use a helper function for sending queries to BigQuery.

In [8]:
# Wrapper that uses the BigQuery client to run a query or job and return the result as a DataFrame
def run_bq_query(sql: str) -> Union[str, pd.DataFrame]:
    """
    Input: SQL query, as a string, to execute in BigQuery
    Returns the query results as a pandas DataFrame; raises an exception if the
    dry run or the query fails
    """

    # Try a dry run before executing the query to catch any errors
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    bq_client.query(sql, job_config=job_config)

    # If the dry run succeeds without errors, proceed to run the query
    job_config = bigquery.QueryJobConfig()
    client_result = bq_client.query(sql, job_config=job_config)

    job_id = client_result.job_id

    # Wait for the query/job to finish running, then get and return the DataFrame
    df = client_result.result().to_arrow().to_pandas()
    print(f"Finished job_id: {job_id}")
    return df

## BigQuery ML Model Training & Validation

BigQuery ML (BQML) lets you train tabular ML models, such as classification, regression, forecasting, and matrix factorization models, directly in BigQuery using SQL syntax. BigQuery ML uses the scalable infrastructure of BigQuery, so you don't need to set up additional infrastructure for training or batch serving.

In [9]:
sql_create_dataset = f"""CREATE SCHEMA IF NOT EXISTS {BQ_DATASET_NAME}"""

print(sql_create_dataset)

run_bq_query(sql_create_dataset)

CREATE SCHEMA IF NOT EXISTS bqml_lin_reg_penguin_weight_predict_model_dataset
Finished job_id: ef175140-840c-4947-9d16-933a25a6612c


### Use the SELECT statement to examine the data


Inspect the data in the `penguins` table, which has already been pre-processed so that it can be used for regression analysis.

The results show that `body_mass_g` holds continuous numeric values, which makes it a suitable label for linear regression.



In [15]:
sql_data_inspect = f"""

SELECT
 species,
 island,
 culmen_length_mm,
 culmen_depth_mm,
 Flipper_length_mm,
 body_mass_g,
 sex
FROM
 {BQ_PUBLIC_DATASET}
LIMIT
 100;
"""
run_bq_query(sql_data_inspect)



Finished job_id: 9e4533b6-0a4d-4cf9-bd8f-906544e51e20


,species,island,culmen_length_mm,culmen_depth_mm,Flipper_length_mm,body_mass_g,sex
0,Adelie Penguin (Pygoscelis adeliae),Dream,36.6,18.4,184.0,3475.0,FEMALE
1,Adelie Penguin (Pygoscelis adeliae),Dream,39.8,19.1,184.0,4650.0,MALE
2,Adelie Penguin (Pygoscelis adeliae),Dream,40.9,18.9,184.0,3900.0,MALE
3,Chinstrap penguin (Pygoscelis antarctica),Dream,46.5,17.9,192.0,3500.0,FEMALE
4,Adelie Penguin (Pygoscelis adeliae),Dream,37.3,16.8,192.0,3000.0,FEMALE
...,...,...,...,...,...,...,...
95,Adelie Penguin (Pygoscelis adeliae),Dream,36.5,18.0,182.0,3150.0,FEMALE
96,Adelie Penguin (Pygoscelis adeliae),Dream,41.1,19.0,182.0,3425.0,MALE
97,Adelie Penguin (Pygoscelis adeliae),Dream,36.0,17.9,190.0,3450.0,FEMALE
98,Adelie Penguin (Pygoscelis adeliae),Dream,41.1,17.5,190.0,3900.0,MALE
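The training and prediction queries filter on `body_mass_g IS NOT NULL`. Before relying on that filter, it helps to count how many rows are missing values. The sketch below builds such a profiling query for a list of columns. The helper `build_profile_sql` is hypothetical, and the query uses only standard GoogleSQL (`COUNT`, `COUNTIF`).

```python
from typing import List


def build_profile_sql(table: str, columns: List[str]) -> str:
    """Return a query that counts all rows and the NULLs in each given column."""
    select_items = ["COUNT(*) AS total_rows"]
    # COUNTIF counts the rows where the condition is TRUE, i.e. the NULLs.
    select_items += [f"COUNTIF({col} IS NULL) AS null_{col}" for col in columns]
    select_list = ",\n  ".join(select_items)
    # Quoting the table ID with backticks is the safest option for project IDs
    # that contain dashes, such as bigquery-public-data.
    return f"SELECT\n  {select_list}\nFROM\n  `{table}`"
```

In the notebook, you could run it with `run_bq_query(build_profile_sql(BQ_PUBLIC_DATASET, ["body_mass_g", "sex", "culmen_length_mm"]))`.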


### Use the CREATE MODEL statement to create your linear regression model


Next, use the CREATE MODEL statement with `model_type='linear_reg'` to train a linear regression model on the rows of the public penguins table that have a non-null `body_mass_g`. All other columns are used as features.


In the `OPTIONS` parameter:
* With `model_registry="vertex_ai"`, the BigQuery ML model is automatically <a href="https://cloud.google.com/vertex-ai/docs/model-registry/model-registry-bqml" target="_blank">registered to Vertex AI Model Registry</a>, which lets you view all of your registered models and their versions on Google Cloud in one place.

* `vertex_ai_model_version_aliases` lets you set aliases that help you keep track of your model versions (<a href="https://cloud.google.com/vertex-ai/docs/model-registry/model-alias" target="_blank">documentation</a>).

In [17]:
# this cell may take ~1 min to run

sql_train_model_bqml = f"""
CREATE OR REPLACE MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME}
OPTIONS
 (
   model_type='linear_reg',
   input_label_cols=['body_mass_g'], -- prediction column
   model_registry = "vertex_ai",
   vertex_ai_model_version_aliases = ['linear_reg', 'experimental']
 ) AS 
SELECT
 *
FROM
 `{BQ_PUBLIC_DATASET}`
WHERE
 body_mass_g IS NOT NULL
"""

print(sql_train_model_bqml)

run_bq_query(sql_train_model_bqml)


CREATE OR REPLACE MODEL bqml_lin_reg_penguin_weight_predict_model_dataset.bqml_lin_reg_penguin_weight_predict_model
OPTIONS
 (
   model_type='linear_reg',
   input_label_cols=['body_mass_g'], -- prediction column
   model_registry = "vertex_ai",
    vertex_ai_model_version_aliases = ['logistic_reg', 'experimental']
 ) AS 
SELECT
 *
FROM
 `bigquery-public-data.ml_datasets.penguins`
WHERE
 body_mass_g IS NOT NULL

Finished job_id: ddc8f217-7172-4f74-86e9-6e4d843a04b4
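For reference, `model_type='linear_reg'` fits the standard linear regression model below. Here $y_i$ is `body_mass_g` for row $i$, and $x_{ij}$ is that row's $j$-th feature. BigQuery ML one-hot encodes STRING columns such as `species`, `island`, and `sex` into indicator features before fitting. This is the textbook formulation. It leaves out the optional `l1_reg`/`l2_reg` regularization terms that CREATE MODEL also accepts.

$$
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$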


### Use the ML.EVALUATE function to evaluate the model data

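With the model created, you can now evaluate the linear regression model. BigQuery ML can automatically <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create#data_split_method" target="_blank">split the data</a> into training and evaluation sets. However, with the default `AUTO_SPLIT` method, inputs with fewer than 500 rows, like this one, are used entirely for training. The query below also passes the full table to ML.EVALUATE, so the resulting metrics describe how well the model fits rows it was trained on.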

In [18]:
sql_evaluate_model = f"""

SELECT
 *
FROM
 ML.EVALUATE(MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME},
   (
   SELECT
     *
   FROM
     {BQ_PUBLIC_DATASET}
   WHERE
     body_mass_g IS NOT NULL) )
"""

print(sql_evaluate_model)

run_bq_query(sql_evaluate_model)




SELECT
 *
FROM
 ML.EVALUATE(MODEL bqml_lin_reg_penguin_weight_predict_model_dataset.bqml_lin_reg_penguin_weight_predict_model,
   (
   SELECT
     *
   FROM
     bigquery-public-data.ml_datasets.penguins
   WHERE
     body_mass_g IS NOT NULL) )

Finished job_id: 8d5dcf0a-0a87-480e-9c5c-2770677915b0




,mean_absolute_error,mean_squared_error,mean_squared_log_error,median_absolute_error,r2_score,explained_variance
0,227.012237,81838.159892,0.00507,173.080816,0.872377,0.872377


These metrics help you understand the performance of the model. 

**Mean Absolute Error**: The average absolute difference between the actual and predicted values across all evaluated examples.

**Mean Squared Error**: The average of the squared differences between the actual and predicted values.

**Mean Squared Log Error**: The mean squared error computed on log(1 + value); it can be interpreted as a measure of the ratio between the true and predicted values.

**Median Absolute Error**: The median of all absolute differences between the actual and predicted values.

**R2 Score**: A statistical measure of how well the linear regression predictions approximate the actual data. A score of ***0*** indicates that the model explains none of the variability of the response data around the mean; ***1*** indicates that it explains all of it.

**Explained Variance**: The fraction of the label's variance that the predictions account for. It equals the R2 score when the residuals have a mean of zero.
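To make these definitions concrete, the sketch below computes the same six metrics in plain Python for a list of labels and a list of predictions. It uses the standard textbook formulas, with `log1p` for the squared log error. BigQuery ML's exact implementation may differ in edge cases, such as negative values or a constant label. The function name `regression_metrics` is hypothetical.

```python
import math
from statistics import median
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute the regression metrics listed above using textbook formulas."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]

    # Total sum of squares around the label mean, and residual sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)

    # Population variance of the residuals (divided by n).
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n

    return {
        "mean_absolute_error": sum(abs_errors) / n,
        "mean_squared_error": ss_res / n,
        # Squared difference of log(1 + value); requires values greater than -1.
        "mean_squared_log_error": sum(
            (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
        ) / n,
        "median_absolute_error": median(abs_errors),
        "r2_score": 1 - ss_res / ss_tot,
        "explained_variance": 1 - var_err / (ss_tot / n),
    }
```

For example, you could call `regression_metrics(df["body_mass_g"].tolist(), df["predicted_body_mass_g"].tolist())` on a DataFrame returned by an ML.PREDICT query.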


There are various metrics for linear regression and other model types (full list of metrics can be found in the <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-evaluate#mlevaluate_output" target="_blank">documentation</a>).
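To measure the model on rows it did not see during training, you can request an explicit random split when you train. You then call ML.EVALUATE without an input query, and it uses the reserved evaluation rows. The sketch below builds both statements and assumes the `data_split_method` and `data_split_eval_fraction` CREATE MODEL options. The helper names and any model name you pass in (for example, one ending in `_holdout`) are hypothetical. The Vertex AI registry options are left out for brevity.

```python
def build_split_training_sql(
    dataset: str, model: str, source_table: str, eval_fraction: float = 0.2
) -> str:
    """CREATE MODEL statement that reserves a random fraction of rows for evaluation."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    return f"""
CREATE OR REPLACE MODEL `{dataset}.{model}`
OPTIONS (
  model_type = 'linear_reg',
  input_label_cols = ['body_mass_g'],
  data_split_method = 'RANDOM',
  data_split_eval_fraction = {eval_fraction}
) AS
SELECT *
FROM `{source_table}`
WHERE body_mass_g IS NOT NULL
"""


def build_holdout_eval_sql(dataset: str, model: str) -> str:
    """With no input query, ML.EVALUATE uses the evaluation rows reserved during training."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{dataset}.{model}`)"
```

In the notebook, you would pass each string to `run_bq_query`, for example with `BQ_DATASET_NAME`, `BQML_MODEL_NAME + "_holdout"`, and `BQ_PUBLIC_DATASET`. Because the split is random, the metrics can vary between training runs on a dataset this small.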

### Use the ML.PREDICT function to predict the penguin weight for a given set of data

Next, use the ML.PREDICT function with the trained model to predict the weight of the penguins on Biscoe island.

In [19]:
sql_ml_predict = f"""
SELECT
 *
FROM
 ML.PREDICT(MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME},
   (
   SELECT
     *
   FROM
     {BQ_PUBLIC_DATASET}
   WHERE
     body_mass_g IS NOT NULL
     AND island = "Biscoe"))
"""

print(sql_ml_predict)

run_bq_query(sql_ml_predict)


SELECT
 *
FROM
 ML.PREDICT(MODEL bqml_lin_reg_penguin_weight_predict_model_dataset.bqml_lin_reg_penguin_weight_predict_model,
   (
   SELECT
     *
   FROM
     bigquery-public-data.ml_datasets.penguins
   WHERE
     body_mass_g IS NOT NULL
     AND island = "Biscoe"))





Finished job_id: f2793713-44fa-4f37-b018-de36c9ce33d1


,predicted_body_mass_g,species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
0,3875.224470,Adelie Penguin (Pygoscelis adeliae),Biscoe,39.7,18.9,184.0,3550.0,MALE
1,3303.096891,Adelie Penguin (Pygoscelis adeliae),Biscoe,36.4,17.1,184.0,2850.0,FEMALE
2,3976.529009,Adelie Penguin (Pygoscelis adeliae),Biscoe,41.6,18.0,192.0,3950.0,MALE
3,3457.923587,Adelie Penguin (Pygoscelis adeliae),Biscoe,35.0,17.9,192.0,3725.0,FEMALE
4,3980.584958,Adelie Penguin (Pygoscelis adeliae),Biscoe,41.1,18.2,192.0,4050.0,MALE
...,...,...,...,...,...,...,...,...
162,4791.928703,Gentoo penguin (Pygoscelis papua),Biscoe,46.8,14.3,215.0,4850.0,FEMALE
163,4884.431154,Gentoo penguin (Pygoscelis papua),Biscoe,47.2,15.5,215.0,4975.0,FEMALE
164,5425.216588,Gentoo penguin (Pygoscelis papua),Biscoe,50.7,15.0,223.0,5550.0,MALE
165,5413.510134,Gentoo penguin (Pygoscelis papua),Biscoe,45.2,16.4,223.0,5950.0,MALE


### Results
ML.PREDICT returns the input columns together with a new `predicted_body_mass_g` column.

You can compare these predictions with the original `body_mass_g` column, which is present in the input data.
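The sketch below makes that comparison with pandas. It adds residual columns to the DataFrame returned by `run_bq_query(sql_ml_predict)` and summarizes the mean absolute residual per group, such as species or sex. The helper names `add_residuals` and `mean_abs_residual_by` are hypothetical.

```python
import pandas as pd


def add_residuals(
    df: pd.DataFrame,
    label: str = "body_mass_g",
    prediction: str = "predicted_body_mass_g",
) -> pd.DataFrame:
    """Return a copy of an ML.PREDICT result with residual columns added."""
    out = df.copy()
    # A positive residual means the model under-predicted the actual weight.
    out["residual"] = out[label] - out[prediction]
    out["abs_residual"] = out["residual"].abs()
    return out


def mean_abs_residual_by(df: pd.DataFrame, group: str) -> pd.Series:
    """Mean absolute residual per group, largest first."""
    return (
        add_residuals(df)
        .groupby(group)["abs_residual"]
        .mean()
        .sort_values(ascending=False)
    )
```

For example, `mean_abs_residual_by(df, "species")` shows which species the model predicts least accurately on this subset of the data.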


### Use the ML.EXPLAIN_PREDICT function to explain individual predictions

To understand why the model is generating these prediction results, you can use the ML.EXPLAIN_PREDICT function.


<a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-explain-predict" target="_blank">ML.EXPLAIN_PREDICT</a> has built-in <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-xai-overview" target="_blank">Explainable AI</a>. This allows you to see the top contributing features to each prediction and interpret how it was computed.

In [20]:
sql_explain_predict = f"""

SELECT
 *
FROM
 ML.EXPLAIN_PREDICT (MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME}, (
 SELECT
   *
 FROM
   {BQ_PUBLIC_DATASET}
 WHERE
   body_mass_g IS NOT NULL
   AND island = "Biscoe"),
 STRUCT(3 AS top_k_features));
"""

print(sql_explain_predict)

run_bq_query(sql_explain_predict)



SELECT
 *
FROM
 ML.EXPLAIN_PREDICT (MODEL bqml_lin_reg_penguin_weight_predict_model_dataset.bqml_lin_reg_penguin_weight_predict_model, (
 SELECT
   *
 FROM
   bigquery-public-data.ml_datasets.penguins
 WHERE
   body_mass_g IS NOT NULL
   AND island = "Biscoe"),
 STRUCT(3 AS top_k_features));





Finished job_id: 2319da2c-4907-414c-be37-b3fa394d2a3d


,predicted_body_mass_g,top_feature_attributions,baseline_prediction_value,prediction_value,approximation_error,species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
0,3875.224470,"[{'feature': 'species', 'attribution': 18611.0...",-11113.255583,3875.224470,0.0,Adelie Penguin (Pygoscelis adeliae),Biscoe,39.7,18.9,184.0,3550.0,MALE
1,3303.096891,"[{'feature': 'species', 'attribution': 18611.0...",-11113.255583,3303.096891,0.0,Adelie Penguin (Pygoscelis adeliae),Biscoe,36.4,17.1,184.0,2850.0,FEMALE
2,3976.529009,"[{'feature': 'species', 'attribution': 18611.0...",-11113.255583,3976.529009,0.0,Adelie Penguin (Pygoscelis adeliae),Biscoe,41.6,18.0,192.0,3950.0,MALE
3,3457.923587,"[{'feature': 'species', 'attribution': 18611.0...",-11113.255583,3457.923587,0.0,Adelie Penguin (Pygoscelis adeliae),Biscoe,35.0,17.9,192.0,3725.0,FEMALE
4,3980.584958,"[{'feature': 'species', 'attribution': 18611.0...",-11113.255583,3980.584958,0.0,Adelie Penguin (Pygoscelis adeliae),Biscoe,41.1,18.2,192.0,4050.0,MALE
...,...,...,...,...,...,...,...,...,...,...,...,...
162,4791.928703,"[{'feature': 'species', 'attribution': 19598.0...",-11113.255583,4791.928703,0.0,Gentoo penguin (Pygoscelis papua),Biscoe,46.8,14.3,215.0,4850.0,FEMALE
163,4884.431154,"[{'feature': 'species', 'attribution': 19598.0...",-11113.255583,4884.431154,0.0,Gentoo penguin (Pygoscelis papua),Biscoe,47.2,15.5,215.0,4975.0,FEMALE
164,5425.216588,"[{'feature': 'species', 'attribution': 19598.0...",-11113.255583,5425.216588,0.0,Gentoo penguin (Pygoscelis papua),Biscoe,50.7,15.0,223.0,5550.0,MALE
165,5413.510134,"[{'feature': 'species', 'attribution': 19598.0...",-11113.255583,5413.510134,0.0,Gentoo penguin (Pygoscelis papua),Biscoe,45.2,16.4,223.0,5950.0,MALE
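In the DataFrame that `run_bq_query(sql_explain_predict)` returns, each cell of `top_feature_attributions` holds a sequence of `{'feature': ..., 'attribution': ...}` items, as the output above suggests. The sketch below flattens that column into one row per (prediction, feature) pair. It then counts which feature ranks first most often. The helper names are hypothetical, and the code assumes the items can be indexed by the keys `feature` and `attribution`.

```python
import pandas as pd


def flatten_attributions(
    df: pd.DataFrame, column: str = "top_feature_attributions"
) -> pd.DataFrame:
    """One output row per (prediction row, feature), keeping the returned rank."""
    records = []
    for row_id, attributions in df[column].items():
        # Items are already sorted by absolute attribution, largest first.
        for rank, item in enumerate(attributions, start=1):
            records.append(
                {
                    "row_id": row_id,
                    "rank": rank,
                    "feature": item["feature"],
                    "attribution": item["attribution"],
                }
            )
    return pd.DataFrame.from_records(
        records, columns=["row_id", "rank", "feature", "attribution"]
    )


def top_feature_counts(df: pd.DataFrame) -> pd.Series:
    """How often each feature is the top-ranked attribution."""
    flat = flatten_attributions(df)
    return flat.loc[flat["rank"] == 1, "feature"].value_counts()
```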


### Results
ML.EXPLAIN_PREDICT returns the prediction columns plus `top_feature_attributions`, an array of structs with `feature` and `attribution` fields. It also returns `baseline_prediction_value`, `prediction_value`, and `approximation_error`. The attributions are sorted by absolute value in descending order. In the rows shown above, **species** is the top feature contributing to the **body weight prediction**.
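The steps at the top of this notebook also list ML.GLOBAL_EXPLAIN. It aggregates attributions into one importance score per feature across the data instead of explaining each prediction separately. ML.GLOBAL_EXPLAIN only works on a model trained with `enable_global_explain = TRUE`, and the model above does not set that option, so using it requires retraining. The sketch below builds both statements. The helper names and any model name you pass in (for example, one ending in `_global`) are hypothetical, and the Vertex AI registry options are left out.

```python
def build_global_explain_training_sql(dataset: str, model: str, source_table: str) -> str:
    """CREATE MODEL statement with global explanations enabled."""
    return f"""
CREATE OR REPLACE MODEL `{dataset}.{model}`
OPTIONS (
  model_type = 'linear_reg',
  input_label_cols = ['body_mass_g'],
  enable_global_explain = TRUE
) AS
SELECT *
FROM `{source_table}`
WHERE body_mass_g IS NOT NULL
"""


def build_global_explain_sql(dataset: str, model: str) -> str:
    """Query the per-feature global attributions of a trained model."""
    return f"SELECT * FROM ML.GLOBAL_EXPLAIN(MODEL `{dataset}.{model}`)"
```

You would run the first string with `run_bq_query` to retrain, then run the second to get one row per feature with its attribution.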

### Inspect the model on Vertex AI Model Registry

When the model was trained in BigQuery ML, the line `model_registry="vertex_ai"` registered the model to Vertex AI Model Registry automatically upon completion.

You can view the model on the <a href="https://console.cloud.google.com/vertex-ai/models" target="_blank">Vertex AI Model Registry page</a>, or use the code below to check that it was successfully registered:

In [21]:
model = vertex_ai.Model(model_name=BQML_MODEL_NAME)

print(model.gca_resource)

name: "projects/993987777814/locations/us-central1/models/bqml_lin_reg_penguin_weight_predict_model"
display_name: "bqml_lin_reg_penguin_weight_predict_model"
supported_deployment_resources_types: DEDICATED_RESOURCES
supported_input_storage_formats: "jsonl"
supported_input_storage_formats: "bigquery"
supported_input_storage_formats: "csv"
supported_input_storage_formats: "tf-record"
supported_input_storage_formats: "tf-record-gzip"
supported_input_storage_formats: "file-list"
supported_output_storage_formats: "jsonl"
supported_output_storage_formats: "bigquery"
create_time {
  seconds: 1667769612
  nanos: 103444000
}
update_time {
  seconds: 1667769619
  nanos: 271291000
}
etag: "AMEw9yNtsiflZeeFjB6O1Cx_mtrzNHX5wML_XH-iKp76u9_eLVlj5qL0mPKzaR0CKgSI"
version_id: "1"
version_aliases: "logistic_reg"
version_aliases: "experimental"
version_aliases: "default"
version_create_time {
  seconds: 1667769612
  nanos: 103444000
}
version_update_time {
  seconds: 1667769619
  nanos: 155571000
}
mode

### Deploy the model to an endpoint

While BigQuery ML supports batch prediction with <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-predict" target="_blank">ML.PREDICT</a> and <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-explain-predict" target="_blank">ML.EXPLAIN_PREDICT</a>, BigQuery ML is not suitable for real-time predictions where you need low latency predictions with potentially high frequency of requests.

Deploying the BigQuery ML model to a Vertex AI endpoint lets you serve such low-latency online predictions.

#### Create a Vertex AI endpoint

Before you can deploy the model, you need to create an endpoint.

In [22]:

endpoint = vertex_ai.Endpoint.create(
    display_name=ENDPOINT_NAME,
    project=PROJECT_ID,
    location=REGION,
)

print(endpoint.display_name)
print(endpoint.resource_name)

Creating Endpoint

Create Endpoint backing LRO: projects/993987777814/locations/us-central1/endpoints/1351576867466903552/operations/9036940288358088704

Endpoint created. Resource name: projects/993987777814/locations/us-central1/endpoints/1351576867466903552

To use this Endpoint in another session:

endpoint = aiplatform.Endpoint('projects/993987777814/locations/us-central1/endpoints/1351576867466903552')


bqml_lin_reg_penguin_weight_predict_model_endpoint
projects/993987777814/locations/us-central1/endpoints/1351576867466903552


#### List endpoints

List the endpoints in your project and region to confirm that the new endpoint was created. (You can also view your endpoints on the <a href="https://console.cloud.google.com/vertex-ai/endpoints" target="_blank">Vertex AI Endpoints page</a>.)

In [23]:
endpoint.list()

[<google.cloud.aiplatform.models.Endpoint object at 0x7f4157b67350> 
 resource name: projects/993987777814/locations/us-central1/endpoints/1351576867466903552,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f4157966c10> 
 resource name: projects/993987777814/locations/us-central1/endpoints/1139626210003779584,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f4156e68f10> 
 resource name: projects/993987777814/locations/us-central1/endpoints/4502689231742697472,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f4156edb650> 
 resource name: projects/993987777814/locations/us-central1/endpoints/369919692048957440,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f4156edb2d0> 
 resource name: projects/993987777814/locations/us-central1/endpoints/1236862620317777920,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f4156ed5c50> 
 resource name: projects/993987777814/locations/us-central1/endpoints/5420706674144968704,
 <google.cloud.aiplatform.models.Endpoint

#### Deploy model to Vertex endpoint

With the new endpoint, you can now deploy your model.

In [24]:
# deploying the model to the endpoint may take 10-15 minutes
model.deploy(endpoint=endpoint)

Deploying model to Endpoint : projects/993987777814/locations/us-central1/endpoints/1351576867466903552

Using default machine_type: n1-standard-2

Deploy Endpoint model backing LRO: projects/993987777814/locations/us-central1/endpoints/1351576867466903552/operations/8136783312837410816

Endpoint model deployed. Resource name: projects/993987777814/locations/us-central1/endpoints/1351576867466903552


<google.cloud.aiplatform.models.Endpoint object at 0x7f41582ab350> 
resource name: projects/993987777814/locations/us-central1/endpoints/1351576867466903552

You can also check on the status of your model by visiting the <a href="https://console.cloud.google.com/vertex-ai/endpoints" target="_blank">Vertex AI Endpoints page</a>.

### Make online predictions to the endpoint

Using a sample based on the training data, you can test the endpoint with online predictions. The model predicts `body_mass_g`, so the request instances leave that column out.

**SCHEMA DETAILS**:

| Column | Type | Mode |
|---|---|---|
| species | STRING | REQUIRED |
| island | STRING | NULLABLE |
| culmen_length_mm | FLOAT | NULLABLE |
| culmen_depth_mm | FLOAT | NULLABLE |
| flipper_length_mm | FLOAT | NULLABLE |
| body_mass_g | FLOAT | NULLABLE |
| sex | STRING | NULLABLE |
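Before sending requests, you can check each instance against the feature columns in the schema above. The sketch below reports keys that are unknown, missing, or of an unexpected Python type. It compares key names exactly, so a key whose capitalization differs from the schema, such as `Flipper_length_mm`, is reported as unexpected. It also assumes that the endpoint expects every feature column. The names `FEATURE_TYPES` and `validate_instance` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Feature columns from the schema above and the Python types accepted for each.
# body_mass_g is the label, so it is not part of a prediction request.
FEATURE_TYPES: Dict[str, Tuple[type, ...]] = {
    "species": (str,),
    "island": (str,),
    "culmen_length_mm": (int, float),
    "culmen_depth_mm": (int, float),
    "flipper_length_mm": (int, float),
    "sex": (str,),
}


def validate_instance(instance: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the instance looks valid."""
    problems = []
    # Keys are compared exactly, so a differently capitalized key is flagged.
    for key in instance:
        if key not in FEATURE_TYPES:
            problems.append(f"unexpected key: {key!r}")
    for key, allowed in FEATURE_TYPES.items():
        if key not in instance:
            problems.append(f"missing key: {key!r}")
        elif instance[key] is not None and not isinstance(instance[key], allowed):
            problems.append(f"{key!r} has type {type(instance[key]).__name__}")
    return problems
```

For example, run `[validate_instance(i) for i in df_sample_requests_list]` before calling `endpoint.predict`.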


In [28]:
df_sample_requests_list = [
    {
        "species": "Adelie Penguin (Pygoscelis adeliae)",
        "island": "Dream",
        "culmen_length_mm": 39.8,
        "culmen_depth_mm": 17.9,
        "flipper_length_mm": 192.0,
        "sex": "FEMALE",
    },
    {
        "species": "Adelie Penguin (Pygoscelis adeliae)",
        "island": "Dream",
        "culmen_length_mm": 39.8,
        "culmen_depth_mm": 17.9,
        "flipper_length_mm": 192.0,
        "sex": "MALE",
    }
]

In [None]:
prediction = endpoint.predict(df_sample_requests_list)
print(prediction)

You can then extract the predictions from the prediction response.

In [None]:
prediction.predictions

## Cleaning up

To clean up all Google Cloud resources used in this project, you can <a href="https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects" target="_blank">delete the Google Cloud
project</a> you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# Undeploy model from endpoint and delete endpoint
endpoint.undeploy_all()
endpoint.delete()

# Delete BigQuery dataset, including the BigQuery ML model
! bq rm -r -f $PROJECT_ID:$BQ_DATASET_NAME