## Set-up

In [1]:
# Upgrade pip
!pip install --upgrade pip



In [30]:
!pip install --user google-cloud-bigquery==3.4.1
!pip install --user pandas
!pip install google-cloud-bigquery-storage
!pip install google-cloud-storage
!pip install pyarrow
!pip install db-dtypes
!pip install tqdm
!pip install matplotlib
!pip install ipywidgets
!pip install google-cloud-aiplatform
!pip install numpy
!pip install xgboost==1.3.3
!jupyter nbextension enable --py widgetsnbextension

Collecting xgboost==1.3.3
  Using cached xgboost-1.3.3-py3-none-manylinux2010_x86_64.whl (157.5 MB)
Installing collected packages: xgboost
Successfully installed xgboost-1.3.3

In [44]:
from google.cloud import bigquery
from google.cloud import storage
from tqdm import tqdm
import matplotlib.pyplot as plt
import ipywidgets
import numpy as np
from google.cloud import aiplatform

import xgboost as xgb
import pandas as pd
from pandas import MultiIndex, Int16Dtype
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


In [12]:
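# Note: `!PROJECT_ID=$(gcloud config get-value project)` runs in a subshell and
# does not set a Python variable, so the project ID is set explicitly here.
PROJECT_ID = "bqml-sandbox-396011"
VERTEX_AI_LOCATION = "europe-west4"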

In [12]:
%load_ext google.cloud.bigquery

The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery


In [13]:
aiplatform.init(project=PROJECT_ID, location=VERTEX_AI_LOCATION)

## Overview

There are four ways to export BigQuery ML models:
1. using the Google Cloud console,
2. using the `EXPORT MODEL` statement,
3. using the `bq extract` command,
4. using the API or a client library.

Most model types are exported in the `TensorFlow SavedModel` format by default.
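
The notebook only demonstrates options 2 and 3 below. As a rough sketch of option 4, the Python client library can start the same export job through `extract_table` with `source_type="Model"`. The helper names `model_id` and `export_model` are illustrative and not part of the original notebook.

```python
def model_id(project: str, dataset: str, model: str) -> str:
    """Return the fully qualified model ID in `project.dataset.model` form."""
    return f"{project}.{dataset}.{model}"


def export_model(project: str, dataset: str, model: str, destination_uri: str):
    """Export a BigQuery ML model to Cloud Storage and wait for the job."""
    # Imported lazily so that model_id() works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job = client.extract_table(
        model_id(project, dataset, model),
        destination_uri,
        source_type="Model",  # the default source type is "Table"
    )
    return job.result()  # blocks until the export job finishes
```

With the names used in this notebook, the call would look like `export_model(PROJECT_ID, "BQ_ML_ID", "DNN", "gs://bq-ml-store/dnn")`.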

In [5]:
# list all models
!bq ls -m --format=pretty $PROJECT_ID:BQ_ML_ID


Welcome to BigQuery! This script will walk you through the 
process of initializing your .bigqueryrc configuration file.

First, we need to set up your credentials if they do not 
already exist.

Setting project_id bqml-sandbox-396011 as the default.

BigQuery configuration complete! Type "bq" to get started.

+--------------------------+--------------------------------+--------+-----------------+
|            Id            |           Model Type           | Labels |  Creation Time  |
+--------------------------+--------------------------------+--------+-----------------+
| BASE_LOGISTIC_REGRESSION | LOGISTIC_REGRESSION            |        | 10 Sep 08:38:52 |
| DNN                      | DNN_LINEAR_COMBINED_CLASSIFIER |        | 10 Sep 11:43:48 |
+--------------------------+--------------------------------+--------+-----------------+


In [18]:
# Create a GCS bucket to store the exported models

project_id = PROJECT_ID  # defined in the set-up cells
bucket_name = "bq-ml-store"
default_storage_class = "STANDARD"

# Initialize the client
client = storage.Client(project=project_id)

# Configure the bucket with the specified default storage class
bucket = client.bucket(bucket_name)
bucket.storage_class = default_storage_class

# Try to create the bucket (this raises an error if it already exists)
try:
    # Pass the location to create(); assigning bucket.location is deprecated
    bucket.create(location="EU")
    print(f"Bucket '{bucket_name}' created with default storage class '{default_storage_class}'.")
except Exception as e:
    print(f"Error creating bucket: {e}")


Bucket 'bq-ml-store' created with default storage class 'STANDARD'.


In [19]:
!bq extract --model 'BQ_ML_ID.BASE_LOGISTIC_REGRESSION' gs://bq-ml-store/base-logistic-regression

Waiting on bqjob_r53868589930e1046_0000018a7f0057a2_1 ... (33s) Current status: DONE   
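
To confirm that the export actually wrote files, one option is to list the objects under the destination prefix. This is a minimal sketch; `split_gcs_uri` and `list_exported_files` are hypothetical helper names introduced here, and the listing relies on `storage.Client.list_blobs`.

```python
from typing import List, Optional, Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"Not a Cloud Storage URI: {uri}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    return bucket, prefix


def list_exported_files(uri: str, project: Optional[str] = None) -> List[str]:
    """Return the object names stored under the given export URI."""
    # Imported lazily so that split_gcs_uri() works without the client installed.
    from google.cloud import storage

    bucket_name, prefix = split_gcs_uri(uri)
    client = storage.Client(project=project)
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

For the export above, this would be called as `list_exported_files("gs://bq-ml-store/base-logistic-regression", PROJECT_ID)`.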


In [21]:
%%bigquery

 EXPORT MODEL `BQ_ML_ID.DNN`
 OPTIONS(URI = 'gs://bq-ml-store/dnn')
 


## Register the model in Vertex AI

In [25]:
%%bigquery
ALTER MODEL BQ_ML_ID.BASE_LOGISTIC_REGRESSION SET OPTIONS (vertex_ai_model_id="base_logistic_regression");


## Deploying Model in Vertex AI

In [15]:
# create an endpoint

endpoint = aiplatform.Endpoint.create(
    display_name="base_logistic_regression",
    project=PROJECT_ID,
    location=VERTEX_AI_LOCATION,
)


Creating Endpoint
Create Endpoint backing LRO: projects/115333740492/locations/europe-west4/endpoints/8224984692508590080/operations/4462031765648703488
Endpoint created. Resource name: projects/115333740492/locations/europe-west4/endpoints/8224984692508590080
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/115333740492/locations/europe-west4/endpoints/8224984692508590080')


In [18]:
# deploy the model to the endpoint
model = aiplatform.Model(model_name="base_logistic_regression")
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="base_logistic_regression",
    traffic_percentage=100,  # only one model in the endpoint, so it must be 100%
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=4,
    accelerator_type=None,
    accelerator_count=None,
    sync=True,
)

model.wait()

Deploying model to Endpoint : projects/115333740492/locations/europe-west4/endpoints/8224984692508590080
Deploy Endpoint model backing LRO: projects/115333740492/locations/europe-west4/endpoints/8224984692508590080/operations/3963821056870842368


Endpoint model deployed. Resource name: projects/115333740492/locations/europe-west4/endpoints/8224984692508590080


<google.cloud.aiplatform.models.Endpoint object at 0x7f66b34ce710> 
resource name: projects/115333740492/locations/europe-west4/endpoints/8224984692508590080

In [None]:
## Sample prediction
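
The notebook leaves this cell empty. A possible sketch of an online prediction against the deployed endpoint is shown here. The feature names (`feature_a`, `feature_b`) are placeholders, since the model's input schema does not appear in this notebook, and `to_instances` and `predict` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable, List


def to_instances(rows: Iterable[Dict[str, Any]], feature_names: List[str]) -> List[Dict[str, Any]]:
    """Keep only the expected feature columns of each row, in a fixed order."""
    return [{name: row[name] for name in feature_names} for row in rows]


def predict(endpoint_name: str, instances: List[Dict[str, Any]], project: str, location: str):
    """Send the instances to a deployed Vertex AI endpoint and return predictions."""
    # Imported lazily so that to_instances() works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    endpoint = aiplatform.Endpoint(endpoint_name)
    response = endpoint.predict(instances=instances)
    return response.predictions


# Placeholder rows: replace the feature names and values with the model's real inputs.
rows = [{"feature_a": 1.0, "feature_b": "x", "unused": 0}]
instances = to_instances(rows, ["feature_a", "feature_b"])
```

The `endpoint_name` argument would be the resource name printed when the endpoint was created, passed together with `PROJECT_ID` and `VERTEX_AI_LOCATION`.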


## Importing a Model to BigQuery ML

Models defined and trained outside of BigQuery ML can also be imported into the service.
The supported formats are:
1. XGBoost,
2. TensorFlow,
3. TensorFlow Lite,
4. Open Neural Network Exchange (ONNX).

I tried importing XGBoost models but ran into many errors, mainly because BigQuery ML does not currently support the latest XGBoost releases, only versions below 1.5.1.
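
As a rough illustration of the version constraint, the sketch below checks a version string against the limit stated above and builds a `CREATE MODEL` statement with `MODEL_TYPE = 'XGBOOST'` and `MODEL_PATH`. The helper names, the model name `IMPORTED_XGBOOST`, and the Cloud Storage path are assumptions made for the example, and the version check assumes plain numeric versions such as `1.3.3`.

```python
from typing import Tuple


def xgboost_version_supported(version: str, limit: Tuple[int, ...] = (1, 5, 1)) -> bool:
    """Return True if `version` is strictly below `limit` (the bound given in the text)."""
    parts = tuple(int(part) for part in version.split(".")[:3])
    return parts < limit


def import_xgboost_sql(dataset: str, model: str, model_path: str) -> str:
    """Build the statement that imports a saved XGBoost model from Cloud Storage."""
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.{model}`\n"
        f"OPTIONS (MODEL_TYPE = 'XGBOOST', MODEL_PATH = '{model_path}')"
    )


def import_xgboost_model(project: str, dataset: str, model: str, model_path: str):
    """Run the import statement with the BigQuery client and wait for it."""
    # Imported lazily so the helpers above work without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return client.query(import_xgboost_sql(dataset, model, model_path)).result()


# Hypothetical model name and path, used only to show the generated statement.
sql = import_xgboost_sql("BQ_ML_ID", "IMPORTED_XGBOOST", "gs://bq-ml-store/xgboost/*")
```

In the notebook, `xgboost_version_supported(xgb.__version__)` could be called before saving the model, so that an unsupported version is caught before the import is attempted.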