
### How to use a logistic regression model to predict income bracket on census data using BigQuery ML


### Summary

This project uses a logistic regression classification model to predict the income bracket of census respondents and serve the predictions in real time. The data lives in BigQuery, and the model is trained using BigQuery ML, registered to Vertex AI Model Registry, and deployed to an endpoint on Vertex AI for online predictions.

This tutorial uses the following Google Cloud data analytics and ML services:

- BigQuery
- BigQuery ML
- Vertex AI Model Registry
- Vertex AI endpoints

## Key Concepts
- Logistic regression
- Explainable AI
- ML.EVALUATE
- ML.PREDICT

## Steps
1. Create the dataset.
1. Use the SELECT statement to examine the data.
1. Use the CREATE VIEW statement to compile your training data.
1. Use the CREATE MODEL statement to create your logistic regression model.
1. Use the ML.EVALUATE function to evaluate the model.
1. Use the ML.PREDICT function to predict the income bracket for a given set of census participants.
1. Use the ML.EXPLAIN_PREDICT function to explain prediction results with Explainable AI methods.
1. Use the ML.GLOBAL_EXPLAIN function to see which features are most important in determining the income bracket.

#### Execute notebook in Colab
<a href="https://colab.research.google.com/github/paulycloud/ml_portfolio/blob/main/AutoML/01_image_classification/02_salad_categories/index.ipynb">
    <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
</a>


### Dataset

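The dataset [`census_adult_income`](https://cloud.google.com/bigquery?p=bigquery-public-data&d=census_bureau_usa&page=dataset) has about 32,561 rows of data. The notebook reads it from the public table `bigquery-public-data.ml_datasets.census_adult_income`.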

### Install additional packages

Install the following packages required to execute this notebook. 

In [1]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q google-cloud-bigquery db-dtypes

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.9.2 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.
tensorboard 2.9.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.
pandas-gbq 0.

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [2]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Project Variables 

In [18]:
# Project variables
#
# These are the project variables used in this notebook:
#

PROJECT_ID = "" # @param {type:"string"}
bqml_type = "log_reg" # @param {type:"string"}
BQML_MODEL_NAME = "bqml_log_reg_census_income_predict_model"
job_display_name = BQML_MODEL_NAME + "_job"
ENDPOINT_NAME = BQML_MODEL_NAME + "_endpoint"

# datasets
BQ_DATASET_NAME = BQML_MODEL_NAME + "_dataset"
sql_create_dataset = f"""CREATE SCHEMA IF NOT EXISTS {BQ_DATASET_NAME}"""
BQ_PUBLIC_DATASET = "bigquery-public-data.ml_datasets.census_adult_income"

# bucket details
BUCKET_NAME = "bqml_tutorials"
BUCKET_URI = f"gs://{BUCKET_NAME}/{bqml_type}/"
OUTPUTBUCKET = f"gs://bqml_datasets_predictions/{bqml_type}/"

# Region 
REGION = "us-central1" # @param {type: "string"} 

Set your project ID in the `PROJECT_ID` variable above before running the rest of the notebook.

#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook.  Below are regions supported for Vertex AI. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You might not be able to use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about <a href="https://cloud.google.com/vertex-ai/docs/general/locations" target="_blank">Vertex AI regions</a>.

In [2]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = "google.colab" in sys.modules
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

In [4]:
SERVICE_ACCOUNT = ""  # @param {type:"string"}
print(SERVICE_ACCOUNT)




In [None]:
if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get your service account from gcloud
    if not IS_COLAB:
        shell_output = !gcloud auth list 2>/dev/null
        SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()

    else:  # IS_COLAB:
        shell_output = ! gcloud projects describe  $PROJECT_ID
        project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
        SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"

    print("Service Account:", SERVICE_ACCOUNT)

### Import libraries

In [6]:
from typing import Union

import google.cloud.aiplatform as vertex_ai
import pandas as pd
from google.cloud import bigquery

### Initialize Vertex AI and BigQuery SDKs for Python

Initialize the Vertex AI SDK for Python for your project and region.

In [7]:
vertex_ai.init(project=PROJECT_ID, location=REGION)

Create the BigQuery client.

In [8]:
bq_client = bigquery.Client(project=PROJECT_ID)

Use a helper function for sending queries to BigQuery.

In [9]:
# Wrapper that uses the BigQuery client to run a query and return the result as a DataFrame
def run_bq_query(sql: str) -> pd.DataFrame:
    """
    Input: SQL query, as a string, to execute in BigQuery
    Returns the query results as a pandas DataFrame; raises if the query fails
    """

    # Try dry run before executing query to catch any errors
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    bq_client.query(sql, job_config=job_config)

    # If dry run succeeds without errors, proceed to run query
    job_config = bigquery.QueryJobConfig()
    client_result = bq_client.query(sql, job_config=job_config)

    job_id = client_result.job_id

    # Wait for the query job to finish, then fetch the result as a DataFrame
    df = client_result.result().to_arrow().to_pandas()
    print(f"Finished job_id: {job_id}")
    return df

## BigQuery ML Model Training & Validation

BigQuery ML (BQML) lets you train tabular ML models, such as classification, regression, forecasting, and matrix factorization, directly in BigQuery using SQL syntax. BigQuery ML uses the scalable infrastructure of BigQuery, so you don't need to set up additional infrastructure for training or batch serving.

In [10]:
sql_create_dataset = f"""CREATE SCHEMA IF NOT EXISTS {BQ_DATASET_NAME}"""

print(sql_create_dataset)

run_bq_query(sql_create_dataset)

CREATE SCHEMA IF NOT EXISTS bqml_log_reg_census_income_predict_model_dataset
Finished job_id: c50a7b25-251c-41ae-95e8-11ab5cfd59e1


### Using logistic regression model for income bucket classification on census data 

Inspect the data, which has been pre-processed from [`census_adult_income`](https://cloud.google.com/bigquery?p=bigquery-public-data&d=census_bureau_usa&page=dataset) so that it can be used for classification.

The query results show that the **``income_bracket``** column in the `census_adult_income` table has only one of two values: `<=50K` or `>50K`.


In [11]:
# f-string so that {BQ_PUBLIC_DATASET} is replaced with the table name
sql_inspect = f"""
SELECT
 age,
 workclass,
 marital_status,
 education_num,
 occupation,
 hours_per_week,
 income_bracket
FROM
 {BQ_PUBLIC_DATASET}
LIMIT
 100;
"""
run_bq_query(sql_inspect)



Finished job_id: b69ba246-a320-40da-be26-05a42a1cb877


| | age | workclass | marital_status | education_num | occupation | hours_per_week | income_bracket |
|---|---|---|---|---|---|---|---|
| 0 | 34 | ? | Married-civ-spouse | 7 | ? | 8 | <=50K |
| 1 | 21 | ? | Married-civ-spouse | 7 | ? | 56 | <=50K |
| 2 | 28 | ? | Married-civ-spouse | 9 | ? | 17 | <=50K |
| 3 | 47 | ? | Married-civ-spouse | 9 | ? | 8 | >50K |
| 4 | 22 | ? | Married-civ-spouse | 9 | ? | 22 | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 63 | ? | Married-civ-spouse | 13 | ? | 54 | >50K |
| 96 | 47 | ? | Married-civ-spouse | 13 | ? | 18 | <=50K |
| 97 | 73 | ? | Married-civ-spouse | 13 | ? | 5 | <=50K |
| 98 | 66 | ? | Married-civ-spouse | 13 | ? | 6 | <=50K |
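
A common pitfall with templated queries like this one is leaving the `f` prefix off the string, in which case the `{BQ_PUBLIC_DATASET}` placeholder reaches BigQuery unexpanded. The following is a minimal sketch of a guard you could run before submitting a query. The helper name `find_unfilled_placeholders` is hypothetical and not part of the notebook.

```python
import re
from typing import List

BQ_PUBLIC_DATASET = "bigquery-public-data.ml_datasets.census_adult_income"


def find_unfilled_placeholders(sql: str) -> List[str]:
    """Return the names of any {NAME}-style placeholders left in a query string."""
    return re.findall(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", sql)


# Plain string: the placeholder is never substituted.
plain_sql = "SELECT age FROM {BQ_PUBLIC_DATASET} LIMIT 100"
# f-string: the placeholder is replaced with the table name.
templated_sql = f"SELECT age FROM {BQ_PUBLIC_DATASET} LIMIT 100"

print(find_unfilled_placeholders(plain_sql))
print(find_unfilled_placeholders(templated_sql))
```

The first call reports the leftover placeholder name, and the second returns an empty list.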


### Use the CREATE VIEW statement to compile the training data

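Next, create a view that compiles the training data by selecting the columns used to train your logistic regression model. The income prediction is based on the following attributes of each census respondent: `age`, `workclass`, `marital_status`, `education_num`, `occupation`, and `hours_per_week`. The view also adds a `dataframe` column, derived from `functional_weight`, that assigns each row to a training, evaluation, or prediction split.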

In [13]:

sql_create_view_bqml = f"""
CREATE OR REPLACE VIEW {BQ_DATASET_NAME}.input_view AS
SELECT
 age,
 workclass,
 marital_status,
 education_num,
 occupation,
 hours_per_week,
 income_bracket,
 CASE
   WHEN MOD(functional_weight, 10) < 8 THEN 'training'
   WHEN MOD(functional_weight, 10) = 8 THEN 'evaluation'
   WHEN MOD(functional_weight, 10) = 9 THEN 'prediction'
 END AS dataframe
FROM
{BQ_PUBLIC_DATASET}
"""

print(sql_create_view_bqml)

run_bq_query(sql_create_view_bqml)


CREATE OR REPLACE VIEW bqml_log_reg_census_income_predict_model_dataset.input_view AS
SELECT
 age,
 workclass,
 marital_status,
 education_num,
 occupation,
 hours_per_week,
 income_bracket,
 CASE
   WHEN MOD(functional_weight, 10) < 8 THEN 'training'
   WHEN MOD(functional_weight, 10) = 8 THEN 'evaluation'
   WHEN MOD(functional_weight, 10) = 9 THEN 'prediction'
 END AS dataframe
FROM
bigquery-public-data.ml_datasets.census_adult_income

Finished job_id: 16598f86-d037-4978-a440-914e7b1b632e
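
The `CASE` expression in the view splits rows by the last digit of `functional_weight`: digits 0 to 7 go to training, 8 to evaluation, and 9 to prediction. The sketch below mirrors that logic in plain Python to show the resulting proportions. It assumes non-negative integer weights, and the input range is made up for illustration.

```python
from collections import Counter


def split_label(functional_weight: int) -> str:
    """Mirror the view's CASE expression: the last digit decides the split."""
    bucket = functional_weight % 10
    if bucket < 8:
        return "training"
    if bucket == 8:
        return "evaluation"
    return "prediction"


# Hypothetical weights 0..99, so every last digit appears equally often.
counts = Counter(split_label(w) for w in range(100))
print(counts)
```

If the last digits are spread evenly, this gives roughly an 80/10/10 split. That is consistent with the prediction split holding 3,141 of the roughly 32,561 rows.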


### Use the CREATE MODEL statement to create your logistic regression model.

The query below trains a logistic regression model using BigQuery ML. BigQuery resources are used to train the model.

In the `OPTIONS` parameter:
* With `model_registry="vertex_ai"`, the BigQuery ML model is automatically <a href="https://cloud.google.com/vertex-ai/docs/model-registry/model-registry-bqml" target="_blank">registered to Vertex AI Model Registry</a>, which lets you view all of your registered models and their versions on Google Cloud in one place.

* `vertex_ai_model_version_aliases` lets you set aliases to help you keep track of your model version (<a href="https://cloud.google.com/vertex-ai/docs/model-registry/model-alias" target="_blank">documentation</a>).

In [16]:
# this cell may take ~1 min to run

sql_train_model_bqml = f"""
CREATE OR REPLACE MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME} OPTIONS (
  model_type = 'LOGISTIC_REG',
  auto_class_weights = TRUE, -- Balances the class labels in the training data
  input_label_cols = ['income_bracket'], -- prediction column
  model_registry = "vertex_ai",
  vertex_ai_model_version_aliases = ['logistic_reg', 'experimental']
) AS
SELECT --The SELECT statement queries the view from Step 3.
  *
EXCEPT
(dataframe)
FROM
  {BQ_DATASET_NAME}.input_view
WHERE
  dataframe = 'training'
"""

print(sql_train_model_bqml)

run_bq_query(sql_train_model_bqml)


CREATE OR REPLACE MODEL bqml_log_reg_census_income_predict_model_dataset.bqml_log_reg_census_income_predict_model OPTIONS (
  model_type = 'LOGISTIC_REG',
  auto_class_weights = TRUE, -- Balances the class labels in the training data
  input_label_cols = ['income_bracket'], -- prediction column
  model_registry = "vertex_ai",
  vertex_ai_model_version_aliases = ['logistic_reg', 'experimental']
) AS
SELECT --The SELECT statement queries the view from Step 3.
  *
EXCEPT
(dataframe)
FROM
  bqml_log_reg_census_income_predict_model_dataset.input_view
WHERE
  dataframe = 'training'

Finished job_id: 2c888196-948c-40cd-a2fb-e63a7e699dc3
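
The steps at the top mention `ML.GLOBAL_EXPLAIN`, but the training statement above does not set the `enable_global_explain` option that BigQuery ML expects for that function. The following is a sketch of how the two statements might look, assuming the same dataset, view, and model names as above. The helper functions `build_train_sql` and `build_global_explain_sql` are hypothetical and only build the SQL strings.

```python
BQ_DATASET_NAME = "bqml_log_reg_census_income_predict_model_dataset"
BQML_MODEL_NAME = "bqml_log_reg_census_income_predict_model"


def build_train_sql(dataset: str, model: str) -> str:
    """Same CREATE MODEL statement as above, plus enable_global_explain."""
    return f"""
CREATE OR REPLACE MODEL {dataset}.{model} OPTIONS (
  model_type = 'LOGISTIC_REG',
  auto_class_weights = TRUE,
  enable_global_explain = TRUE, -- assumption: added so ML.GLOBAL_EXPLAIN can be used
  input_label_cols = ['income_bracket'],
  model_registry = "vertex_ai",
  vertex_ai_model_version_aliases = ['logistic_reg', 'experimental']
) AS
SELECT * EXCEPT (dataframe)
FROM {dataset}.input_view
WHERE dataframe = 'training'
"""


def build_global_explain_sql(dataset: str, model: str) -> str:
    """Query the global feature attributions of the trained model."""
    return f"""
SELECT *
FROM ML.GLOBAL_EXPLAIN(MODEL {dataset}.{model})
"""


sql_train_with_explain = build_train_sql(BQ_DATASET_NAME, BQML_MODEL_NAME)
sql_global_explain = build_global_explain_sql(BQ_DATASET_NAME, BQML_MODEL_NAME)
print(sql_global_explain)
```

In the notebook, both strings could be passed to `run_bq_query`, training first and then explaining.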


### Use the ML.EVALUATE function to evaluate the model data

With the model created, you can now evaluate it. Behind the scenes, BigQuery ML automatically <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create#data_split_method" target="_blank">splits the data</a>, which makes it easier to quickly train and evaluate models.

In [17]:
sql_evaluate_model = f"""
SELECT
  *
FROM
  ML.EVALUATE(MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME})
"""

print(sql_evaluate_model)

run_bq_query(sql_evaluate_model)


SELECT
  *
FROM
  ML.EVALUATE(MODEL bqml_log_reg_census_income_predict_model_dataset.bqml_log_reg_census_income_predict_model)





Finished job_id: 1e7a424c-86d2-479a-b7f3-47cd7c1f2ac2


| | precision | recall | accuracy | f1_score | log_loss | roc_auc |
|---|---|---|---|---|---|---|
| 0 | 0.581818 | 0.750586 | 0.808175 | 0.655514 | 0.402187 | 0.881006 |


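These metrics summarize how well the model separates the two income brackets. Here accuracy is about 0.81 and ROC AUC about 0.88, while precision (about 0.58) is noticeably lower than recall (about 0.75), meaning false positives outnumber false negatives.

Other metrics are available for logistic regression and other model types; the full list is in the <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-evaluate#mlevaluate_output" target="_blank">documentation</a>.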
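
As a quick sanity check on how the reported metrics relate to each other, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the two values in the `ML.EVALUATE` output above.

```python
# Values copied from the ML.EVALUATE output above
precision = 0.581818
recall = 0.750586

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```

The result agrees with the reported `f1_score` of 0.655514 to within rounding.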

### Use the ML.PREDICT function to predict the income bracket for a given set of census participants.

Next, use the ML.PREDICT function to predict the income bracket for the census participants in the `prediction` split of the view.


In [19]:
sql_ml_predict = f"""

SELECT
 *
FROM
 ML.PREDICT (MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME},
   (
   SELECT
     *
   FROM
     {BQ_DATASET_NAME}.input_view
   WHERE
     dataframe = 'prediction'
    )
 )
"""

print(sql_ml_predict)

run_bq_query(sql_ml_predict)



SELECT
 *
FROM
 ML.PREDICT (MODEL bqml_log_reg_census_income_predict_model_dataset.bqml_log_reg_census_income_predict_model,
   (
   SELECT
     *
   FROM
     bqml_log_reg_census_income_predict_model_dataset.input_view
   WHERE
     dataframe = 'prediction'
    )
 )





Finished job_id: 44996c10-a439-4d35-9d1f-e39d463c3ef9


| | predicted_income_bracket | predicted_income_bracket_probs | age | workclass | marital_status | education_num | occupation | hours_per_week | income_bracket | dataframe |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <=50K | `[{'label': ' >50K', 'prob': 0.0546065818652878...` | 34 | ? | Married-civ-spouse | 7 | ? | 8 | <=50K | prediction |
| 1 | <=50K | `[{'label': ' >50K', 'prob': 0.0630589207946214...` | 25 | ? | Married-civ-spouse | 9 | ? | 4 | <=50K | prediction |
| 2 | <=50K | `[{'label': ' >50K', 'prob': 0.1069501440592713...` | 75 | ? | Married-civ-spouse | 5 | ? | 5 | <=50K | prediction |
| 3 | <=50K | `[{'label': ' >50K', 'prob': 0.0994746477046556...` | 67 | ? | Married-civ-spouse | 6 | ? | 2 | <=50K | prediction |
| 4 | <=50K | `[{'label': ' >50K', 'prob': 0.4422743244662300...` | 59 | ? | Married-civ-spouse | 9 | ? | 41 | >50K | prediction |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3137 | >50K | `[{'label': ' >50K', 'prob': 0.8335553388676905...` | 51 | Local-gov | Married-civ-spouse | 9 | Protective-serv | 70 | <=50K | prediction |
| 3138 | >50K | `[{'label': ' >50K', 'prob': 0.8505311791530946...` | 46 | Local-gov | Married-civ-spouse | 10 | Protective-serv | 70 | >50K | prediction |
| 3139 | >50K | `[{'label': ' >50K', 'prob': 0.6321767600301973...` | 36 | Private | Married-civ-spouse | 9 | Transport-moving | 70 | >50K | prediction |
| 3140 | >50K | `[{'label': ' >50K', 'prob': 0.5648379516096959...` | 28 | Private | Married-civ-spouse | 9 | Transport-moving | 70 | <=50K | prediction |
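
Because the prediction split keeps the true `income_bracket` alongside `predicted_income_bracket`, the two columns can be compared directly. The sketch below does this for only the nine rows printed above, so the numbers are illustrative rather than a measure of the full 3,141-row result. Treating `>50K` as the positive class is an assumption made for this example.

```python
# (predicted_income_bracket, income_bracket) for the nine rows shown above
visible_rows = [
    ("<=50K", "<=50K"),
    ("<=50K", "<=50K"),
    ("<=50K", "<=50K"),
    ("<=50K", "<=50K"),
    ("<=50K", ">50K"),
    (">50K", "<=50K"),
    (">50K", ">50K"),
    (">50K", ">50K"),
    (">50K", "<=50K"),
]

POSITIVE = ">50K"  # assumption: the higher bracket is the positive class

# Count true positives, false positives, false negatives, and exact matches
tp = sum(p == POSITIVE and a == POSITIVE for p, a in visible_rows)
fp = sum(p == POSITIVE and a != POSITIVE for p, a in visible_rows)
fn = sum(p != POSITIVE and a == POSITIVE for p, a in visible_rows)
correct = sum(p == a for p, a in visible_rows)

accuracy = correct / len(visible_rows)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

The same comparison could be run on the whole DataFrame returned by `run_bq_query(sql_ml_predict)`.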


### Use the ML.EXPLAIN_PREDICT function to explain prediction results


To understand why the model is generating these prediction results, you can use the ML.EXPLAIN_PREDICT function.


<a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-explain-predict" target="_blank">ML.EXPLAIN_PREDICT</a> has built-in <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-xai-overview" target="_blank">Explainable AI</a>. This allows you to see the top contributing features to each prediction and interpret how it was computed.

In [23]:
sql_explain_predict = f"""

SELECT * FROM
ML.EXPLAIN_PREDICT(MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME},
 (
  SELECT * FROM {BQ_DATASET_NAME}.input_view
  WHERE dataframe = 'prediction'),
  STRUCT(3 as top_k_features)  
 )
"""

print(sql_explain_predict)

run_bq_query(sql_explain_predict)



SELECT * FROM
ML.EXPLAIN_PREDICT(MODEL bqml_log_reg_census_income_predict_model_dataset.bqml_log_reg_census_income_predict_model,
 (
  SELECT * FROM bqml_log_reg_census_income_predict_model_dataset.input_view
  WHERE dataframe = 'prediction'),
  STRUCT(3 as top_k_features)  
 )





Finished job_id: 704ddb1f-6574-42e9-ad2d-28f655d3e8a6


| | predicted_income_bracket | probability | top_feature_attributions | baseline_prediction_value | prediction_value | approximation_error | age | workclass | marital_status | education_num | occupation | hours_per_week | income_bracket | dataframe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <=50K | 0.945393 | `[{'feature': 'hours_per_week', 'attribution': ...` | -0.298787 | -2.851447 | 0.0 | 34 | ? | Married-civ-spouse | 7 | ? | 8 | <=50K | prediction |
| 1 | <=50K | 0.936941 | `[{'feature': 'hours_per_week', 'attribution': ...` | -0.298787 | -2.698551 | 0.0 | 25 | ? | Married-civ-spouse | 9 | ? | 4 | <=50K | prediction |
| 2 | <=50K | 0.893050 | `[{'feature': 'education_num', 'attribution': -...` | -0.298787 | -2.122280 | 0.0 | 75 | ? | Married-civ-spouse | 5 | ? | 5 | <=50K | prediction |
| 3 | <=50K | 0.900525 | `[{'feature': 'hours_per_week', 'attribution': ...` | -0.298787 | -2.203076 | 0.0 | 67 | ? | Married-civ-spouse | 6 | ? | 2 | <=50K | prediction |
| 4 | <=50K | 0.557726 | `[{'feature': 'marital_status', 'attribution': ...` | -0.298787 | -0.231937 | 0.0 | 59 | ? | Married-civ-spouse | 9 | ? | 41 | >50K | prediction |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3137 | >50K | 0.833555 | `[{'feature': 'hours_per_week', 'attribution': ...` | -0.298787 | 1.611037 | 0.0 | 51 | Local-gov | Married-civ-spouse | 9 | Protective-serv | 70 | <=50K | prediction |
| 3138 | >50K | 0.850531 | `[{'feature': 'hours_per_week', 'attribution': ...` | -0.298787 | 1.738773 | 0.0 | 46 | Local-gov | Married-civ-spouse | 10 | Protective-serv | 70 | >50K | prediction |
| 3139 | >50K | 0.632177 | `[{'feature': 'hours_per_week', 'attribution': ...` | -0.298787 | 0.541567 | 0.0 | 36 | Private | Married-civ-spouse | 9 | Transport-moving | 70 | >50K | prediction |
| 3140 | >50K | 0.564838 | `[{'feature': 'hours_per_week', 'attribution': ...` | -0.298787 | 0.260820 | 0.0 | 28 | Private | Married-civ-spouse | 9 | Transport-moving | 70 | <=50K | prediction |
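
In the output above, `prediction_value` appears to be the model's log-odds for the `>50K` label: applying the logistic (sigmoid) function to it reproduces the reported `probability` once you account for which label was predicted. The sketch below checks this on two rows copied from the table. This interpretation is inferred from those numbers rather than taken from the BigQuery ML documentation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# (prediction_value, reported probability, predicted label) from rows 0 and 3137
rows = [
    (-2.851447, 0.945393, "<=50K"),
    (1.611037, 0.833555, ">50K"),
]

recomputed = []
for value, reported, label in rows:
    p_high = sigmoid(value)  # probability of the >50K label
    # The probability column refers to the predicted label
    p_predicted = p_high if label == ">50K" else 1.0 - p_high
    recomputed.append(p_predicted)
    print(label, round(p_predicted, 6), reported)
```

For row 0, the sigmoid of -2.851447 is about 0.0546, which matches the `>50K` entry in `predicted_income_bracket_probs` from `ML.PREDICT`, and its complement matches the reported 0.945393.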


### Inspect the model on Vertex AI Model Registry

When the model was trained in BigQuery ML, the line `model_registry="vertex_ai"` registered the model to Vertex AI Model Registry automatically upon completion.

You can view the model on the <a href="https://console.cloud.google.com/vertex-ai/models" target="_blank">Vertex AI Model Registry page</a>, or use the code below to check that it was successfully registered:

In [30]:
model = vertex_ai.Model(model_name=BQML_MODEL_NAME)

print(model.gca_resource)

name: "projects/993987777814/locations/us-central1/models/bqml_log_reg_census_income_predict_model"
display_name: "bqml_log_reg_census_income_predict_model"
supported_deployment_resources_types: DEDICATED_RESOURCES
supported_input_storage_formats: "jsonl"
supported_input_storage_formats: "bigquery"
supported_input_storage_formats: "csv"
supported_input_storage_formats: "tf-record"
supported_input_storage_formats: "tf-record-gzip"
supported_input_storage_formats: "file-list"
supported_output_storage_formats: "jsonl"
supported_output_storage_formats: "bigquery"
create_time {
  seconds: 1667581691
  nanos: 345408000
}
update_time {
  seconds: 1667581732
  nanos: 587274000
}
etag: "AMEw9yPdtq_-FN0lSvQuTl-899Cy8rMjvdHpxlKZTUgRF-9495hQftyvPmjSLQKtvvBD"
version_id: "1"
version_aliases: "logistic_reg"
version_aliases: "experimental"
version_aliases: "default"
version_create_time {
  seconds: 1667581691
  nanos: 345408000
}
version_update_time {
  seconds: 1667581732
  nanos: 465828000
}
model_

### Deploy the model to an endpoint

BigQuery ML supports batch prediction with <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-predict" target="_blank">ML.PREDICT</a> and <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-explain-predict" target="_blank">ML.EXPLAIN_PREDICT</a>, but it is not suitable for real-time predictions that need low latency and a potentially high frequency of requests.

Deploying the BigQuery ML model to a Vertex AI endpoint fills that gap by enabling online predictions.

#### Create a Vertex AI endpoint

Before you can deploy your model, you need to create an endpoint to deploy it to.

In [31]:

endpoint = vertex_ai.Endpoint.create(
    display_name=ENDPOINT_NAME,
    project=PROJECT_ID,
    location=REGION,
)

print(endpoint.display_name)
print(endpoint.resource_name)

Creating Endpoint
Create Endpoint backing LRO: projects/993987777814/locations/us-central1/endpoints/1139626210003779584/operations/3098416414088757248
Endpoint created. Resource name: projects/993987777814/locations/us-central1/endpoints/1139626210003779584
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/993987777814/locations/us-central1/endpoints/1139626210003779584')
bqml_log_reg_census_income_predict_model_endpoint
projects/993987777814/locations/us-central1/endpoints/1139626210003779584


#### List endpoints

List the endpoints to make sure it has successfully been created. (You can also view your endpoints on the <a href="https://console.cloud.google.com/vertex-ai/endpoints" target="_blank">Vertex AI Endpoints page</a>).

In [32]:
endpoint.list()

[<google.cloud.aiplatform.models.Endpoint object at 0x7f564a7e1850> 
 resource name: projects/993987777814/locations/us-central1/endpoints/1139626210003779584,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f564a7da850> 
 resource name: projects/993987777814/locations/us-central1/endpoints/4502689231742697472,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f564a7e6d90> 
 resource name: projects/993987777814/locations/us-central1/endpoints/369919692048957440,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f564a7f39d0> 
 resource name: projects/993987777814/locations/us-central1/endpoints/1236862620317777920,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f564a782550> 
 resource name: projects/993987777814/locations/us-central1/endpoints/5420706674144968704,
 <google.cloud.aiplatform.models.Endpoint object at 0x7f564a7efc50> 
 resource name: projects/993987777814/locations/us-central1/endpoints/7994667792815095808]

#### Deploy model to Vertex endpoint

With the new endpoint, you can now deploy your model.

In [33]:
# deploying the model to the endpoint may take 10-15 minutes
model.deploy(endpoint=endpoint)

Deploying model to Endpoint : projects/993987777814/locations/us-central1/endpoints/1139626210003779584
Using default machine_type: n1-standard-2
Deploy Endpoint model backing LRO: projects/993987777814/locations/us-central1/endpoints/1139626210003779584/operations/1002096340664057856
Endpoint model deployed. Resource name: projects/993987777814/locations/us-central1/endpoints/1139626210003779584
<google.cloud.aiplatform.models.Endpoint object at 0x7f565837e190> 
resource name: projects/993987777814/locations/us-central1/endpoints/1139626210003779584

You can also check on the status of your model by visiting the <a href="https://console.cloud.google.com/vertex-ai/endpoints" target="_blank">Vertex AI Endpoints page</a>.

### Make online predictions to the endpoint

Using a sample of the training data, you can test the endpoint to make online predictions.

In [40]:
df_sample_requests_list = [
    {
        "age": 45,
        "workclass": "Private",
        "marital_status": "Single",
        "education_num": 6,
        "occupation": "Exec-managerial",
        "hours_per_week": 40,
    },
    {
        "age": 30,
        "workclass": "Private",
        "marital_status": "Married",
        "education_num": 2,
        "occupation": "Machine-op-inspct",
        "hours_per_week": 50,
    }
]

In [41]:
prediction = endpoint.predict(df_sample_requests_list)
print(prediction)

Prediction(predictions=[{'income_bracket_probs': [0.2098744173305852, 0.7901255826694148], 'predicted_income_bracket': [' <=50K'], 'income_bracket_values': [' >50K', ' <=50K']}, {'income_bracket_values': [' >50K', ' <=50K'], 'predicted_income_bracket': [' <=50K'], 'income_bracket_probs': [0.06176187987532602, 0.938238120124674]}], deployed_model_id='1547676965304008704', model_version_id='1', model_resource_name='projects/993987777814/locations/us-central1/models/bqml_log_reg_census_income_predict_model', explanations=None)


You can then extract the predictions from the prediction response:

In [42]:
prediction.predictions

[{'income_bracket_probs': [0.2098744173305852, 0.7901255826694148],
  'predicted_income_bracket': [' <=50K'],
  'income_bracket_values': [' >50K', ' <=50K']},
 {'income_bracket_values': [' >50K', ' <=50K'],
  'predicted_income_bracket': [' <=50K'],
  'income_bracket_probs': [0.06176187987532602, 0.938238120124674]}]
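
Each prediction carries the class labels and their probabilities as two parallel lists, and the labels include a leading space. The sketch below pairs them up and returns the predicted label with its probability. It uses the response shown above as literal data, and the helper name `summarize` is hypothetical.

```python
from typing import Dict, List, Tuple

# Literal copy of prediction.predictions from the output above
predictions: List[Dict[str, list]] = [
    {
        "income_bracket_probs": [0.2098744173305852, 0.7901255826694148],
        "predicted_income_bracket": [" <=50K"],
        "income_bracket_values": [" >50K", " <=50K"],
    },
    {
        "income_bracket_values": [" >50K", " <=50K"],
        "predicted_income_bracket": [" <=50K"],
        "income_bracket_probs": [0.06176187987532602, 0.938238120124674],
    },
]


def summarize(pred: Dict[str, list]) -> Tuple[str, float]:
    """Return (predicted label, probability of that label) with whitespace stripped."""
    labels = [value.strip() for value in pred["income_bracket_values"]]
    probs_by_label = dict(zip(labels, pred["income_bracket_probs"]))
    label = pred["predicted_income_bracket"][0].strip()
    return label, probs_by_label[label]


summaries = [summarize(p) for p in predictions]
for label, prob in summaries:
    print(f"{label}: {prob:.3f}")
```

In the notebook, the same helper could be applied to `prediction.predictions` directly.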

## Cleaning up

To clean up all Google Cloud resources used in this project, you can <a href="https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects" target="_blank">delete the Google Cloud
project</a> you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# Undeploy model from endpoint and delete endpoint
endpoint.undeploy_all()
endpoint.delete()

# Delete BigQuery dataset, including the BigQuery ML model
! bq rm -r -f $PROJECT_ID:$BQ_DATASET_NAME