# AutoML Hello Tabular - Bank Marketing


## Summary

This notebook walks through an example of creating a model with Vertex AI AutoML Tabular with low code. 

We will use the publicly available [Bank marketing](https://datahub.io/machine-learning/bank-marketing) open-source dataset, which is available through a Creative Commons CC0: Public Domain license. The column names have been updated for clarity.

Source: [Hello tabular data  |  Vertex AI  |  Google Cloud](https://cloud.google.com/vertex-ai/docs/tutorials/tabular-automl/overview)

Other helpful links:

* [Introduction to tabular data  |  Vertex AI  |  Google Cloud](https://cloud.google.com/vertex-ai/docs/tabular-data/tabular101)

## Summary 

### Objective 

Predict whether a customer will buy a term deposit (investment) using features such as age, income, and profession.

### Challenge(s)

* Data: too much data for a rules-based system (e.g., users who sign up have X attributes and then do Y with us)
* Time & resource constraints: limited programming and SQL knowledge, limited time to spend building ML models

### Outcome(s)

* A list of customers with their probability of buying a term deposit, for outreach via phone call, email, or website personalization
* Additional revenue from customers as a result of outreach

### Solution 

Train an ML model using Vertex AI AutoML (Tabular).


## Steps

We will perform the following steps: 

1. Create a dataset (tabular) 
2. Analyze the dataset 
3. Train an AutoML classification model
5. Deploy the model and request online prediction (UI and SDK)
6. Batch prediction job 


### 0. Copy raw data to bucket 

Copy the publicly available dataset to a bucket you own.

First, we create a bucket:

In [None]:
! gsutil mb -l us-central1 gs://demos-vertex-ai-hello-tabular-bank-marketing

Then we copy the data file:

In [None]:
! gsutil cp gs://cloud-ml-tables-data/bank-marketing.csv \
    gs://demos-vertex-ai-hello-tabular-bank-marketing/bank-marketing.csv

### 1. Create dataset 

Open the Google Cloud console, navigate to [datasets](https://console.cloud.google.com/vertex-ai/datasets), and click "Create" at the top.

Then add the following:

* dataset name: `hello_tabular_bank_marketing`
* data type and objective: `regression/classification`
* region: `us-central1`

Next, on the "source" screen that appears we'll add our dataset by doing the following:

* Select a data source: `Select CSV from Cloud Storage` 
* Cloud storage import file path: `demos-vertex-ai-hello-tabular-bank-marketing/bank-marketing.csv`

Then click "Continue".
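
The console steps above can probably be reproduced with the Python SDK as well. The following is a minimal sketch, assuming the `google-cloud-aiplatform` package is installed and the environment is authenticated. The helper names `gcs_uri` and `create_dataset` are hypothetical. The project, region, bucket, and display name are the ones used elsewhere in this notebook.

```python
def gcs_uri(bucket: str, blob: str) -> str:
    """Build a gs:// URI from a bucket name and an object path."""
    return f"gs://{bucket.strip('/')}/{blob.lstrip('/')}"


def create_dataset(project: str, location: str, display_name: str, source_uri: str):
    """Create a Vertex AI tabular dataset from a CSV file in Cloud Storage."""
    # Imported lazily so gcs_uri() can be used without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    return aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=[source_uri],
    )


source_uri = gcs_uri(
    "demos-vertex-ai-hello-tabular-bank-marketing", "bank-marketing.csv"
)
print(source_uri)

# Uncomment to create the dataset (requires credentials for the project):
# dataset = create_dataset(
#     "demos-vertex-ai", "us-central1", "hello_tabular_bank_marketing", source_uri
# )
```

Unlike the console field, the SDK expects the full `gs://` URI, which is why the path is built by a small helper.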

### 2. Analyze dataset 

Optionally, generate statistics to sanity check the data before training a model (e.g., to spot missing or NULL values).

Because our dataset is formatted correctly for this tutorial, you don't need to do anything on this page and can skip this section.

1.  **Optional**. Click **Generate statistics** to view the number of missing or NULL values in the dataset. This can take 10 minutes or longer.
    
2.  **Optional**. Click on one of the feature columns to learn more about the data values.
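
If waiting for the console statistics is inconvenient, a rough local check for missing values is possible with the standard library. This is only a sketch: the three-row sample and its reduced column set are hypothetical stand-ins for `bank-marketing.csv`, and treating empty strings and the literal `NULL` as missing is an assumption.

```python
import csv
import io
from collections import Counter

# Hypothetical sample shaped like a slice of bank-marketing.csv; replace with
# open("bank-marketing.csv", newline="") to scan a local copy of the real file.
sample = io.StringIO(
    "Age,Job,Balance,Deposit\n"
    "39,blue-collar,450,1\n"
    "41,,1200,2\n"
    "52,retired,NULL,1\n"
)


def count_missing(handle, missing_tokens=("", "NULL")):
    """Return {column: number of rows whose value is absent or a missing token}."""
    missing = Counter()
    for row in csv.DictReader(handle):
        for column, value in row.items():
            if value is None or (
                isinstance(value, str) and value.strip() in missing_tokens
            ):
                missing[column] += 1
    return dict(missing)


missing_counts = count_missing(sample)
print(missing_counts)
```

Columns that never have a missing value are simply absent from the result.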

### 3. Train an AutoML classification model

* Click "train new model"
* Select "Other"

Then fill out the following:

* Objective: "Classification" 
* Model training method: AutoML

Next, fill out the Model details page:

* name: `hello_tabular_bank_marketing`
* Target column: `Deposit` - The target column is what we're training the model to predict. For the bank-marketing.csv dataset, the Deposit column indicates whether the client purchased a term deposit (2 = yes, 1 = no).

Then click "Continue".

On the "Compute and pricing" page, set the following:

* Budget: `1` node hour - The training budget determines actual training time, but the time to complete training includes other activities, so the entire process can take longer than one hour. When the model finishes training, it is displayed in the model tab as a live link, with a green checkmark status icon.
* Enable early stopping: turned on

Finally, click "Start training". **NOTE: this will take an hour or more to complete.**
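
The same training settings can be sketched with the Python SDK. This assumes `google-cloud-aiplatform` is installed and that `dataset` is a tabular dataset object created earlier. The helper names `node_hours_to_milli` and `train_model` are hypothetical. The SDK expresses the budget in milli node hours, so a budget of `1` becomes `1000`.

```python
def node_hours_to_milli(node_hours: float) -> int:
    """Convert a budget in node hours to milli node hours (1 node hour = 1000)."""
    return int(round(node_hours * 1000))


def train_model(dataset, display_name: str, target_column: str, budget_node_hours: float):
    """Run an AutoML tabular classification training job and return the model."""
    # Imported lazily so the budget helper works without the SDK installed.
    from google.cloud import aiplatform

    job = aiplatform.AutoMLTabularTrainingJob(
        display_name=display_name,
        optimization_prediction_type="classification",
    )
    return job.run(
        dataset=dataset,
        target_column=target_column,
        budget_milli_node_hours=node_hours_to_milli(budget_node_hours),
        model_display_name=display_name,
        disable_early_stopping=False,  # keep early stopping turned on
    )


budget = node_hours_to_milli(1)
print(budget)

# Uncomment to train (takes an hour or more and incurs cost):
# model = train_model(dataset, "hello_tabular_bank_marketing", "Deposit", 1)
```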

### 5.1 Deploy model and request a prediction - UI

When the model has finished training, it can be viewed on the [models page](https://console.cloud.google.com/vertex-ai/models).

#### Evaluate model 

Click on the recently trained model and then the version number `1` to view the evaluation page, which shows:

* evaluation metrics
* confusion matrix 
* feature importance
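
To make the link between the confusion matrix and the evaluation metrics concrete, the sketch below derives precision, recall, and F1 from raw counts. The counts are hypothetical and are not results from this model. Class `2` (yes) is treated as the positive class.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision, recall, and F1 for the positive class from counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For an outreach list, precision indicates how many contacted customers would actually buy, while recall indicates how many likely buyers the list captures.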


#### Deploy model to an endpoint

To test the model or make online predictions, we must first deploy it to an endpoint.

Click "Deploy to endpoint" and fill out the following:

* define your endpoint
    * Endpoint name: `hello_tabular_bank_marketing`
* model settings:
    * explainability: enable
    * leave rest as default
* model monitoring: disable for this tutorial

Then click "Deploy". This operation will take roughly 5 minutes to complete.

#### Request a prediction 

Once the model is deployed, you can test it within the UI: the "Test your model" section is prefilled with random values.

See the next section for how to use the Python SDK to perform the same workflow.

### 5.2 Deploy model and request prediction - SDK

The code below will perform the following steps:

1. Initialize the Python SDK
2. Create an endpoint
3. List models and get the model to deploy
4. Deploy the model to the endpoint
5. Submit a sample online prediction
6. Clean up: undeploy the model and delete the endpoint resource

In [None]:
from google.cloud import aiplatform

aiplatform.init(project = "demos-vertex-ai", location = "us-central1")

In [None]:
# create endpoint 
endpoint = aiplatform.Endpoint.create(display_name='hello_tabular_bank_marketing')

In [None]:
# list models by display name and pick the one to deploy
models = aiplatform.Model.list(filter='display_name="hello_tabular_bank_marketing"')
model = models[0]  # assumes at least one model with this display name exists
model

In [None]:
# deploy model to endpoint 
endpoint.deploy(model,
                min_replica_count=1,
                max_replica_count=5,
                machine_type='n1-standard-4')

In [None]:
## To reuse this endpoint in another session (already created / resuming work):
### look it up by display name
endpoint_display_name = "hello_tabular_bank_marketing"
endpoint_filter = f'display_name="{endpoint_display_name}"'

endpoints = aiplatform.Endpoint.list(filter=endpoint_filter)
for endpoint_info in endpoints:
    print(
        f"Endpoint display name = {endpoint_info.display_name} resource id = {endpoint_info.resource_name}"
    )
# use the last match; raises IndexError if no endpoint has this display name
endpoint = aiplatform.Endpoint(endpoints[-1].resource_name)

### manually set
# endpoint = aiplatform.Endpoint(
#     endpoint_name="projects/746038361521/locations/us-central1/endpoints/123456")
# endpoint

In [None]:
# create a sample instance for online prediction
test_instance={
    "Age": "39.0", 
    "Job": "blue-collar", 
    "MaritalStatus": "married", 
    "Education": "secondary", 
    "Default": "no",
    "Balance": "450.0", 
    "Housing": "yes", 
    "Loan": "no", 
    "Contact": "cellular", 
    "Day": "16.0", 
    "Month": "may", 
    "Duration": "180.0", 
    "Campaign": "2.0", 
    "PDays": "-1.0", 
    "Previous": "0.0", 
    "POutcome": "unknown", 
    "Deposit": "1"
}

response = endpoint.predict([test_instance])

print('API response: ', response)
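
The printed response is easier to use once the score for a specific class is extracted. The sketch below assumes each prediction is a mapping with parallel `classes` and `scores` lists, which is how AutoML tabular classification results are typically shaped. The sample values and the helper name `score_for_class` are hypothetical.

```python
def score_for_class(prediction, label: str) -> float:
    """Return the score paired with `label` in a classes/scores prediction."""
    classes = list(prediction["classes"])
    return float(prediction["scores"][classes.index(label)])


# Hypothetical prediction; with a live endpoint use response.predictions[0].
sample_prediction = {"classes": ["1", "2"], "scores": [0.92, 0.08]}

# Class "2" means the customer purchased a term deposit (2 = yes, 1 = no).
deposit_probability = score_for_class(sample_prediction, "2")
print(f"Probability of buying a term deposit: {deposit_probability:.2f}")
```

Looking the class up by label avoids depending on the order in which the classes are returned.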

In [None]:
# cleanup
endpoint.undeploy_all()
endpoint.delete()

### 6. Batch predictions (optional)

You can also perform a batch prediction. Follow the steps below to create a sample file and upload it to GCS so that your model can run a batch prediction job.

#### Create a sample batch prediction file

In [None]:
# download locally for slicing by rows
! gsutil cp gs://cloud-ml-tables-data/bank-marketing.csv \
    ./bank-marketing.csv

In [None]:
# visually check for sanity 
! head ./bank-marketing.csv

In [None]:
# create sample batch file from csv
! head -n 25  ./bank-marketing.csv > ./bank-marketing-batch01.csv

In [None]:
# upload to GCS 
! gsutil cp ./bank-marketing-batch01.csv gs://demos-vertex-ai-hello-tabular-bank-marketing/bank-marketing-batch01.csv
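
The `head` command keeps the header plus the first 24 data rows, including the `Deposit` target column. If the batch file should look like unlabeled data, a standard-library sketch along the following lines could build it. Dropping the target column is an assumption for illustration, not a requirement stated in this notebook. The helper name `make_batch_csv` and the small in-memory sample are hypothetical.

```python
import csv
import io


def make_batch_csv(source, destination, n_rows=24, drop_columns=("Deposit",)):
    """Copy the header and the first n_rows data rows, omitting drop_columns."""
    reader = csv.DictReader(source)
    kept = [name for name in reader.fieldnames if name not in drop_columns]
    writer = csv.DictWriter(
        destination, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    written = 0
    for row in reader:
        if written >= n_rows:
            break
        writer.writerow(row)
        written += 1
    return written


# Hypothetical sample; for the real file, pass open("bank-marketing.csv", newline="")
# as the source and an open output file as the destination.
source = io.StringIO(
    "Age,Job,Deposit\n39,blue-collar,1\n41,services,2\n52,retired,1\n"
)
destination = io.StringIO()
rows_written = make_batch_csv(source, destination, n_rows=2)
print(destination.getvalue())
```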

#### 6.1 Submit batch prediction job

Go to the model > batch prediction job and specify:

* input: `gs://demos-vertex-ai-hello-tabular-bank-marketing/bank-marketing-batch01.csv`
* output (BigQuery): `demos-vertex-ai.hello_tabular_bank_marketing`
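
The job can likely be submitted from the SDK as well. The sketch below only assembles keyword arguments for `Model.batch_predict`, so it runs without credentials. The job display name and the helper name `batch_predict_kwargs` are hypothetical, and `model` in the commented call is assumed to be an `aiplatform.Model` from an initialized SDK session.

```python
def batch_predict_kwargs(input_uri: str, bq_project: str, bq_dataset: str) -> dict:
    """Assemble arguments for a CSV-in, BigQuery-out batch prediction job."""
    return {
        "job_display_name": "hello_tabular_bank_marketing_batch01",  # hypothetical
        "gcs_source": input_uri,
        "instances_format": "csv",
        "bigquery_destination_prefix": f"bq://{bq_project}.{bq_dataset}",
        "predictions_format": "bigquery",
    }


kwargs = batch_predict_kwargs(
    "gs://demos-vertex-ai-hello-tabular-bank-marketing/bank-marketing-batch01.csv",
    "demos-vertex-ai",
    "hello_tabular_bank_marketing",
)
print(kwargs)

# Uncomment to submit the job (waits for completion by default):
# batch_job = model.batch_predict(**kwargs)
```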



#### 6.2 View batch prediction results  

Once the job is complete, view the results in BigQuery using the queries below.

As a reminder, the target column is what we're training the model to predict. For the bank-marketing.csv dataset, the Deposit column indicates whether the client purchased a term deposit (2 = yes, 1 = no).

Query to show all prediction results (predicted class and their class probabilities): 

```sql
SELECT predicted_Deposit.classes AS classes,
predicted_Deposit.scores AS scores
FROM hello_tabular_bank_marketing.predictions_2023_03_03T13_10_29_819Z_030
```
Source: [Retrieve batch prediction results](https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions#retrieve-batch-results)


Query to show customers and their likelihood of purchasing a term deposit (the higher the score, the more likely). `OFFSET(1)` selects the second class, which is assumed here to be `2` (yes):
 
```sql
SELECT
  predicted_Deposit.classes[OFFSET(1)] AS depositYN,
  predicted_Deposit.scores[OFFSET(1)] AS depositPropensity
FROM `demos-vertex-ai.hello_tabular_bank_marketing.predictions_2023_03_03T13_10_29_819Z_030`
ORDER BY
  CAST(predicted_Deposit.scores[OFFSET(1)] AS FLOAT64) DESC

```
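
The ranking logic of the query above can also be sketched in plain Python, for example after exporting the results. The rows below are hypothetical: the `customer` field is invented for illustration, and the `predicted_Deposit` structure mirrors the `classes` and `scores` arrays used in the queries. The positive class is looked up by label rather than by position.

```python
def propensity(row: dict, positive_label: str = "2") -> float:
    """Return the score of the positive class (2 = yes) for one result row."""
    predicted = row["predicted_Deposit"]
    index = list(predicted["classes"]).index(positive_label)
    return float(predicted["scores"][index])


# Hypothetical result rows; "customer" is an illustrative identifier only.
rows = [
    {"customer": "a", "predicted_Deposit": {"classes": ["1", "2"], "scores": [0.90, 0.10]}},
    {"customer": "b", "predicted_Deposit": {"classes": ["1", "2"], "scores": [0.35, 0.65]}},
    {"customer": "c", "predicted_Deposit": {"classes": ["2", "1"], "scores": [0.40, 0.60]}},
]

# Highest propensity first, matching ORDER BY ... DESC in the query.
ranked = sorted(rows, key=propensity, reverse=True)
for row in ranked:
    print(row["customer"], propensity(row))
```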


## Appendix 

### Example set of online predictions 

Copy and paste the following into the "Test your model" section of the Google Cloud console, or send it via REST (gcloud SDK + curl) or via Python:

```json
{
  "instances": [{   
    "Age": "39.0", 
    "Job": "blue-collar", 
    "MaritalStatus": "married", 
    "Education": "secondary", 
    "Default": "no",
    "Balance": "450.0", 
    "Housing": "yes", 
    "Loan": "no", 
    "Contact": "cellular", 
    "Day": "16.0", 
    "Month": "may", 
    "Duration": "180.0", 
    "Campaign": "2.0", 
    "PDays": "-1.0", 
    "Previous": "0.0", 
    "POutcome": "unknown", 
    "Deposit": "1"
    }]
  }
```
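
For the REST route, the request body above is posted to the endpoint's `:predict` URL. The sketch below builds that URL and body without sending anything. The endpoint ID `123456` is the placeholder used earlier in this notebook, and the helper names `predict_url` and `predict_body` are hypothetical.

```python
import json


def predict_url(project: str, location: str, endpoint_id: str) -> str:
    """Build the regional Vertex AI REST URL for online prediction."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{location}/endpoints/{endpoint_id}:predict"
    )


def predict_body(instances: list) -> str:
    """Serialize instances into the JSON body expected by the predict call."""
    return json.dumps({"instances": instances})


# Same instance as in the JSON example above.
instance = {
    "Age": "39.0", "Job": "blue-collar", "MaritalStatus": "married",
    "Education": "secondary", "Default": "no", "Balance": "450.0",
    "Housing": "yes", "Loan": "no", "Contact": "cellular", "Day": "16.0",
    "Month": "may", "Duration": "180.0", "Campaign": "2.0", "PDays": "-1.0",
    "Previous": "0.0", "POutcome": "unknown", "Deposit": "1",
}

url = predict_url("demos-vertex-ai", "us-central1", "123456")
body = predict_body([instance])
print(url)
print(body)
```

With curl, the body would be sent as a POST with a `Content-Type: application/json` header and a bearer token, for example one obtained from `gcloud auth print-access-token`.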