# Overview
These notebooks demonstrate different ways to train a model with various services (AutoML, BQML, Databricks), export the model to the Vertex AI Model Registry, and deploy it to a Vertex AI endpoint for online prediction.

* Setup notebook - the required infrastructure and datasets used in the subsequent notebooks
* AutoML notebook - train a Vertex AI AutoML model, then retrieve that model's parameters from the logs after training
* BQML notebook - train a model with BigQuery ML, inspect its artifacts, and export it to the Vertex AI Model Registry
* Vertex Workbench managed notebook - train the model in a Vertex AI Workbench managed notebook
* Deployment & cleanup notebook - create an endpoint for each type of model, deploy each model to its endpoint, and test the prediction service. Finally, delete the running resources to avoid incurring extra costs.

A possible next step is to show how to use Vertex AI Pipelines for end-to-end MLOps orchestration.

## Create the required datasets 
We'll use a publicly available Google Analytics dataset for the exercises. You can read more about it here:
* https://support.google.com/analytics/answer/7586738?hl=en&ref_topic=3416089#zippy=%2Cin-this-article
* It is also used in the BigQuery ML tutorial: https://cloud.google.com/bigquery-ml/docs/create-machine-learning-model

### Before you begin
* Make sure that billing is enabled for your Cloud project. The Google Cloud documentation explains how to check whether billing is enabled on a project.

* BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, enable the BigQuery API.

### Create a BQ Dataset
The first step is to create a BigQuery dataset to store your ML model. To create your dataset:

1. In the Google Cloud console, go to the BigQuery page.
2. In the navigation panel, in the Resources section, click your project name.
3. On the right side, in the details panel, click Create dataset.
4. On the Create dataset page:
   * For Dataset ID, enter a unique name (this lab uses `bq_databricks_vertex`).
   * For Data location, choose United States (US). The public datasets are currently stored in the US multi-region location, and because the training and testing tables are created by querying a public dataset, your dataset must be in the same location.

   ![](./create_dataset.png)
5. Leave all of the other default settings in place and click Create dataset.

Make sure to keep a note of the name you choose for your dataset, as you'll be using it throughout the remaining exercises. 
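If you prefer the query editor to the console form, the same dataset can be created with a BigQuery DDL statement. This is a minimal sketch: `your-project-id` is a placeholder for your own project ID, and the description text is illustrative only.

```sql
-- Create the lab dataset (DDL calls a dataset a "schema") in the US multi-region,
-- the same location as the public Google Analytics sample data.
-- `your-project-id` is a placeholder: replace it with your own project ID.
CREATE SCHEMA IF NOT EXISTS `your-project-id.bq_databricks_vertex`
OPTIONS (
  location = "US",
  description = "Training and testing tables for the AutoML, BQML and Vertex AI exercises"
);
```

`IF NOT EXISTS` makes the statement safe to rerun if the dataset was already created through the console.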

Next, create the training and testing tables in your new dataset. In the BigQuery console, open a new editor tab and run the queries below. They use the project `leedeb-experimentation` and the dataset `bq_databricks_vertex`; before running them, replace both with your own project ID and dataset name.

1. First, create the training table:

```sql
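-- Training table: sessions from 2016-08-01 through 2017-06-30
CREATE OR REPLACE TABLE `leedeb-experimentation.bq_databricks_vertex.training_data` AS
SELECT
  -- label = 1 if totals.transactions is non-NULL (the session had a transaction), otherwise 0
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  -- Wildcard table over the daily ga_sessions_YYYYMMDD tables
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
```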

2. Next, create the testing table:
```sql
-- Testing table: sessions from 2017-07-01 through 2017-08-01, after the training period
CREATE OR REPLACE TABLE `leedeb-experimentation.bq_databricks_vertex.testing_data` AS
SELECT
  -- Same label and features as the training table
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'
```

You should now see the two tables created under your dataset. 

![](./tables_created.png) 
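To sanity-check the two tables, a query along these lines returns the number of rows for each label in each split. It is a sketch that assumes the table names above; `your-project-id` is a placeholder, so replace it (and the dataset name, if yours differs) before running.

```sql
-- Row counts per label for the training and testing tables.
-- `your-project-id` is a placeholder for your own project ID.
SELECT 'training' AS split, label, COUNT(*) AS num_rows
FROM `your-project-id.bq_databricks_vertex.training_data`
GROUP BY label
UNION ALL
SELECT 'testing' AS split, label, COUNT(*) AS num_rows
FROM `your-project-id.bq_databricks_vertex.testing_data`
GROUP BY label
-- ORDER BY after UNION ALL sorts the combined result
ORDER BY split, label;
```

Seeing both splits side by side makes it easy to spot an empty table or a split that is missing one of the two labels.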

## Create a Vertex AI Managed Notebook
Next, create the Vertex AI Workbench managed notebook so it is ready to use for the exercises.
1. In the Google Cloud console, navigate to Vertex AI.
2. In the menu, click Workbench, then click Managed Notebooks.
3. Make sure you are in a US region, since the sample dataset is stored in the US multi-region, then click New Notebook.
4. Give your notebook a name and, under Permission, select Service Account. If desired, customize other settings under Advanced, such as the idle shutdown time or the instance size.

   ![](./managed_notebook.png)


With the notebook created, complete the remaining setup steps below before moving on to the exercises.

## Create a GCS Bucket
If you don't have an existing Google Cloud Storage bucket, create one in the same region as your Vertex AI managed notebook, using the default storage settings.
![](./bucket_models.png)
Once the bucket is created, create a `models` folder within it.

## Service Account Permissions
Depending on how the Compute Engine default service account was set up, you may need to grant it some extra roles:
* Vertex AI Administrator (`roles/aiplatform.admin`)
* Storage Admin (`roles/storage.admin`)
* BigQuery Admin (`roles/bigquery.admin`)
* Notebooks Admin (`roles/notebooks.admin`)