# Predict whether a bank client will purchase a term deposit (1=no, 2=yes) using AutoML binary classification

###### Copyright 
**GUI Author:** [ Google Cloud](https://cloud.google.com/vertex-ai/docs/tutorials/tabular-automl/overview)<br>

**Notebook Author:** [ Kamau](https://bio.link/paulkamau)<br>

**Project Type:** AutoML Tabular<br>

**Date created:** 2022/04/16<br>

**Last modified:** 2022/04/16<br>

**Description:** The goal of the trained model is to predict whether a bank client will buy a term deposit (a type of investment) using features like age, account balance, and job. This type of model can help a bank decide which clients to focus its marketing resources on.

**Training Dataset:** This project uses the [Bank marketing](https://datahub.io/machine-learning/bank-marketing) open-source dataset.


### Objective

The steps performed include the following:

- Create a Vertex AI model training job.
- Train an AutoML Tabular model.
- Deploy the `Model` resource to a serving `Endpoint` resource.
- Make a prediction by sending data.
- Undeploy the `Model` resource.

### Dataset

This project uses the [Bank marketing](https://datahub.io/machine-learning/bank-marketing) open-source dataset. 

```https://storage.googleapis.com/cloud-ml-tables-data/bank-marketing.csv```


# Project Variables

In [41]:
# Project variables used throughout this notebook

PROJECT_ID = "paulkamau" # @param {type:"string"}
REGION = "us-central1" 
automl_type = "tabular" #@param ["image", "text", "tabular", "video"]
model_display_name = "automl_tabular_bank_purchase_model"
job_display_name = model_display_name + "_job"
batch_display_name = model_display_name + "_batch_prediction"
endpoint_display_name = model_display_name + "_endpoint"

## datasets 
dataset_display_name = model_display_name + "_dataset"
dataset_source_uri = "gs://cloud-ml-tables-data/bank-marketing.csv"
dataset_local_source_uri = "gs://auto-ml-tutorials/tabular/bank-marketing.csv"
dataset_source_public = "https://storage.googleapis.com/cloud-ml-tables-data/bank-marketing.csv"

# bucket details
BUCKET_NAME = "auto-ml-tutorials" #auto-ml bucket 
BUCKET_URI = f"gs://{BUCKET_NAME}/{automl_type}/" # automl bucket uri
BUCKET_PREDICTION_OUTPUT = f"gs://auto_ml_datasets_predictions/{automl_type}/" 
BUCKET_INPUT_BATCHPREDICT = f"gs://{BUCKET_NAME}/{automl_type}/{model_display_name}" # contains the files to be used for batch prediction

print("All project variables set. Lets go")


All project variables set. Lets go


## Installation

In [42]:
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

Install the latest version of the Vertex AI client library.

Run the following command in your virtual environment to install the Vertex SDK for Python:

In [None]:
! pip install {USER_FLAG} --upgrade google-cloud-aiplatform

Install the Cloud Storage library:

In [None]:
!pip install {USER_FLAG} --upgrade google-cloud-storage

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [5]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

1. [Enable the Vertex AI API and Compute Engine API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component).

1. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).

1. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

### Authenticate your Google Cloud account

**If you are using Notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. Click **Create service account**.

3. In the **Service account name** field, enter a name, and
   click **Create**.

4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI"
   into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and
   select **Storage Object Admin**.

5. Click *Create*. A JSON file that contains your key downloads to your
local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [45]:
import os
import sys
import pandas as pd

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# If on Google Cloud Notebooks, then don't execute this code
if not IS_GOOGLE_CLOUD_NOTEBOOK:
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''
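
When running locally, it can help to confirm that the service account key is actually visible to the Python process before calling the SDK. The following is a minimal sketch of such a check; the helper name `check_credentials` is hypothetical and not part of the Vertex AI SDK.

```python
import os
from pathlib import Path


def check_credentials(env=os.environ):
    """Return (ok, message) describing the service account key setup."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not key_path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not Path(key_path).is_file():
        return False, f"key file not found: {key_path}"
    return True, f"using key file: {key_path}"


# Check the current environment and report the result.
ok, message = check_credentials()
print(message)
```

The check only verifies that the variable is set and points at an existing file; it does not validate the key itself.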

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

This notebook demonstrates how to use Vertex AI SDK for Python to create an AutoML model based on a tabular dataset. You will need to provide a Cloud Storage bucket where the dataset will be stored.

Set the name of your Cloud Storage bucket below. Bucket names share a single global namespace, so the name must be unique across all Cloud Storage buckets, not only your own.

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION gs://$BUCKET_NAME
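
Before creating a bucket, a quick local check of the name can catch obvious mistakes. The sketch below covers only a simplified subset of the Cloud Storage naming rules (length, allowed characters, first and last character, and the reserved "goog"/"google" strings); the function names are hypothetical, and the service remains the final authority on whether a name is accepted.

```python
import re

# Simplified subset of the naming rules: 3-63 characters, lowercase letters,
# digits, dashes, underscores and dots only, starting and ending with a
# letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if the name passes the simplified local checks."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False
    # Names may not begin with "goog" or contain "google".
    return not name.startswith("goog") and "google" not in name


def bucket_uri(name):
    """Build the gs:// URI that gsutil expects from a bare bucket name."""
    return f"gs://{name}"


print(looks_like_valid_bucket_name("auto-ml-tutorials"), bucket_uri("auto-ml-tutorials"))
```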

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [46]:
! gsutil ls -al $BUCKET_URI

         0  2022-07-27T17:51:07Z  gs://auto-ml-tutorials/tabular/#1658944267893591  metageneration=1
   3700818  2022-07-27T20:05:40Z  gs://auto-ml-tutorials/tabular/bank-marketing.csv#1658952340977817  metageneration=1
TOTAL: 2 objects, 3700818 bytes (3.53 MiB)


### Import Vertex SDK for Python

Import the Vertex SDK into your Python environment and initialize it.

In [47]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

## Build the Bank Deposit AutoML Model 

The model built in this section is a binary classification model.

## Read The Dataset

This code will display a sample of the data. 

In [48]:
# Read the first five rows of the dataset
dataset_sample = pd.read_csv(dataset_source_public, nrows=5)

print("\n5 rows sample")
print(dataset_sample.to_string())

# List the column names
print("\nThese are the columns")
cols = list(pd.read_csv(dataset_source_public, nrows=1))
for i in cols:
    print(i)


5 rows sample
   Age           Job MaritalStatus  Education Default  Balance Housing Loan  Contact  Day Month  Duration  Campaign  PDays  Previous POutcome  Deposit
0   58    management       married   tertiary      no     2143     yes   no  unknown    5   may       261         1     -1         0  unknown        1
1   44    technician        single  secondary      no       29     yes   no  unknown    5   may       151         1     -1         0  unknown        1
2   33  entrepreneur       married  secondary      no        2     yes  yes  unknown    5   may        76         1     -1         0  unknown        1
3   47   blue-collar       married    unknown      no     1506     yes   no  unknown    5   may        92         1     -1         0  unknown        1
4   33       unknown        single    unknown      no        1      no   no  unknown    5   may       198         1     -1         0  unknown        1

These are the columns
Age
Job
MaritalStatus
Education
Default
Balance
Housing


### Create a Managed Tabular Dataset from a CSV

This section will create a dataset from a CSV file stored on your GCS bucket.

In [49]:
# Create the dataset if it doesn't exist
datasets = aiplatform.TabularDataset.list(filter=f"display_name={dataset_display_name}")

if datasets:
    dataset = datasets[0]
    print(f"Dataset Exists: {dataset.display_name}")
else:
    # Create the new dataset
    dataset = aiplatform.TabularDataset.create(
        display_name=dataset_display_name,
        gcs_source=dataset_local_source_uri,
    )
    print(f"Dataset Created: {dataset.display_name}")

    # dataset.name is the short resource ID, which is what the console URL expects
    print(f'Review the Dataset in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/datasets/{dataset.name}?project={PROJECT_ID}')


Dataset Exists: automl_tabular_bank_purchase_model_dataset
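
The list-then-create pattern in the cell above recurs later for the training job, the model, and the endpoint. A minimal sketch of how it could be factored into one helper is shown below. The helper `get_or_create` is hypothetical, it assumes the `list` callable accepts a `filter` keyword and the `create` callable accepts `display_name`, and it quotes the display name in the filter, which is an assumption about the filter syntax rather than something the notebook relies on.

```python
def get_or_create(list_fn, create_fn, display_name):
    """Return (resource, created) for the given display name."""
    # Quoting the value keeps the filter unambiguous if the name has spaces.
    matches = list_fn(filter=f'display_name="{display_name}"')
    if matches:
        return matches[0], False
    return create_fn(display_name=display_name), True


# Hypothetical usage with the SDK classes from this notebook:
# endpoint, created = get_or_create(
#     aiplatform.Endpoint.list, aiplatform.Endpoint.create, endpoint_display_name
# )
```

Passing the two callables in keeps the helper independent of any particular resource class, so it can be exercised with stand-in functions before touching real resources.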


### Launch a Training Job to Create a Model

You create a custom model by training it using a prepared dataset. AutoML Tables uses the items from the dataset to train the model, test it, and evaluate its performance. You can review the results, adjust the training dataset as needed and train a new model using the improved dataset.

[SDK docs reference](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)

In [50]:
# Define the training job, creating one if it doesn't exist
jobs = aiplatform.AutoMLTabularTrainingJob.list(filter=f"display_name={job_display_name}")

if jobs:
    job = jobs[0]
    print(f"Jobs Exists: {job.display_name}")
else:
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name=job_display_name,
        optimization_prediction_type="classification",
    )
    print("Training job prepared")


Jobs Exists: automl_tabular_bank_purchase_model_job


### Launch a Model run job

Once the training job is defined, you can use it to create a model. The `run` function creates a training pipeline that trains and creates a `Model` object. After the training pipeline completes, the `run` function returns the `Model` object.

In [51]:
# Train the model if it doesn't exist
models = aiplatform.Model.list(filter=f"display_name={model_display_name}")

if models:
    model = models[0]
    print(f"Model Exists: {model.display_name}")
else:
    model = job.run(
        dataset=dataset,
        target_column="Deposit",
        model_display_name=model_display_name,
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        sync=True,
    )
    print(f"Model Created: {model.display_name}")

Model Exists: automl_tabular_bank_purchase_model
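
To get a feel for what the 0.8 / 0.1 / 0.1 fractions passed to `run` mean in practice, the sketch below estimates how many rows would land in each split. The function name is hypothetical, the row count of 1000 is an illustrative value rather than the size of the bank marketing dataset, and the rounding is an assumption; Vertex AI performs the actual assignment.

```python
def approximate_split_counts(n_rows, train=0.8, validation=0.1, test=0.1):
    """Estimate how many rows land in each split for the given fractions."""
    # This sketch expects fractions that sum to 1, as in the call above.
    if abs(train + validation + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    n_validation = round(n_rows * validation)
    n_test = round(n_rows * test)
    # Whatever is left after validation and test goes to training.
    n_train = n_rows - n_validation - n_test
    return {"training": n_train, "validation": n_validation, "test": n_test}


print(approximate_split_counts(1000))
```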


# Review model evaluation scores
After model training has finished, you can review its evaluation scores using the `list_model_evaluations()` method. The method returns a collection of evaluations that you can iterate over, one per evaluation slice.



In [86]:
model_evaluations = model.list_model_evaluations()
get_model_evaluations = model.get_model_evaluation()


for model_evaluation in model_evaluations:
    print(model_evaluation.to_dict())

{'name': 'projects/993987777814/locations/us-central1/models/3243156880683433984@1/evaluations/8518542272604391807', 'metricsSchemaUri': 'gs://google-cloud-aiplatform/schema/modelevaluation/classification_metrics_1.0.0.yaml', 'metrics': {'auPrc': 0.9785546, 'confusionMatrix': {'rows': [[3806.0, 152.0], [277.0, 245.0]], 'annotationSpecs': [{'id': '0', 'displayName': '1'}, {'id': '1', 'displayName': '2'}]}, 'logLoss': 0.19009551, 'confidenceMetrics': [{'confidenceThreshold': -0.005, 'precisionAt1': 0.5, 'falsePositiveRateAt1': 1.0, 'precision': 0.5, 'recall': 1.0, 'falsePositiveCount': '4480', 'falsePositiveRate': 1.0, 'f1Score': 0.6666667, 'f1ScoreAt1': 0.6666667, 'recallAt1': 1.0, 'truePositiveCount': '4480', 'confusionMatrix': {'rows': [[3806.0, 152.0], [277.0, 245.0]], 'annotationSpecs': [{'id': '0', 'displayName': '1'}, {'id': '1', 'displayName': '2'}]}}, {'f1Score': 0.6666667, 'recallAt1': 0.9042411, 'recall': 1.0, 'f1ScoreAt1': 0.9042411, 'falsePositiveCount': '4480', 'truePositiv
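
The confusion matrix in the output above can be turned into a few headline numbers by hand. The sketch below assumes that rows are the actual classes and columns are the predicted classes, in the order given by `annotationSpecs` (`'1'` = no, `'2'` = yes); the helper name `class_metrics` is hypothetical.

```python
# Values copied from the 'confusionMatrix' entry in the output above.
rows = [[3806.0, 152.0], [277.0, 245.0]]
labels = ["1", "2"]  # displayName of each annotationSpec: 1 = no, 2 = yes


def class_metrics(rows, positive_index):
    """Accuracy plus precision and recall for one class of a square matrix."""
    total = sum(sum(r) for r in rows)
    correct = sum(rows[i][i] for i in range(len(rows)))
    true_pos = rows[positive_index][positive_index]
    actual_pos = sum(rows[positive_index])                # row sum
    predicted_pos = sum(r[positive_index] for r in rows)  # column sum
    return {
        "accuracy": correct / total,
        "precision": true_pos / predicted_pos,
        "recall": true_pos / actual_pos,
    }


metrics = class_metrics(rows, positive_index=labels.index("2"))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, overall accuracy is about 0.904, while precision and recall for the "yes" class are about 0.617 and 0.469, so the model misses roughly half of the clients who actually purchased a deposit in this evaluation set.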

# Endpoints (Create Endpoint, Deploy Model, Make Prediction)
Endpoints are machine learning models made available for online prediction requests. Endpoints are useful for timely predictions from many users (for example, in response to an application request). You can also request batch predictions if you don't need immediate results.

To create an endpoint, you need at least one machine learning model. [SDK Docs](https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-console)

### Deploy your model

Before you use your model to make predictions, you need to deploy it to an `Endpoint`. This notebook does that in two steps:

1. Create an `Endpoint` resource to which the `Model` resource will be deployed.

2. Deploy the `Model` resource to the `Endpoint` resource by calling the endpoint's `deploy` method.

### NOTE: Wait until the model **FINISHES** deployment before proceeding to prediction.

## Create the Endpoint

In [103]:
# Check if Endpoint Exists
endpoints = aiplatform.Endpoint.list(filter = f"display_name={endpoint_display_name}")

if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].display_name}")
else:
    endpoint = aiplatform.Endpoint.create(
    display_name=endpoint_display_name,
    )
    print(f"Endpoint Created: {endpoint.display_name}")

print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Endpoint Exists: automl_tabular_bank_purchase_model_endpoint
Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/5738047720153677824?project=paulkamau


## Deploy the Model

In [None]:
# Deploying the model to the endpoint may take 10-15 minutes

# Check whether the model is already deployed to this endpoint.
# By default the deployment takes the model's display name.
deployments = [
    d for d in endpoint.list_models() if d.display_name == model_display_name
]

if deployments:
    print(f"Deployment Exists: {deployments[0].display_name}")
else:
    # AutoML tabular models are served from CPU machine types,
    # so the accelerator arguments are omitted.
    endpoint.deploy(
        model=model,
        min_replica_count=1,
        max_replica_count=1,
        machine_type="n1-standard-4",
    )
    print(f"Model deployed to endpoint: {endpoint.display_name}")

## Predict on the endpoint


* This sample instance is taken from an observation in which `Deposit` = **1**
* Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your `AutoMLTabularTrainingJob` inform Vertex AI to transform the inputs to their defined types.
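
As a minimal sketch of the point about string values, the snippet below builds a prediction instance from CSV text using only the standard library, which reads every field as a string. The CSV text is reconstructed from the printed sample shown earlier (header plus row 0), and the helper name `rows_to_instances` is hypothetical.

```python
import csv
import io

# Header and first data row, reconstructed from the printed sample.
csv_text = (
    "Age,Job,MaritalStatus,Education,Default,Balance,Housing,Loan,Contact,"
    "Day,Month,Duration,Campaign,PDays,Previous,POutcome,Deposit\n"
    "58,management,married,tertiary,no,2143,yes,no,unknown,5,may,261,1,-1,0,unknown,1\n"
)


def rows_to_instances(text, target_column="Deposit"):
    """Turn CSV rows into prediction instances: string values, no target."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: value for key, value in row.items() if key != target_column}
        for row in reader
    ]


instances = rows_to_instances(csv_text)
print(instances[0])
```

The target column is dropped because it is the value the model is asked to predict.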


#### Quick look at what the training dataset looks like

In [87]:
# Read the first three rows of the dataset
dataset_sample = pd.read_csv(dataset_source_public, nrows=3)

print("\n3 rows sample")
print(dataset_sample.to_string())

# List the column names
print("\nThese are the columns")
cols = list(pd.read_csv(dataset_source_public, nrows=1))
for i in cols:
    print(i)


3 rows sample
   Age           Job MaritalStatus  Education Default  Balance Housing Loan  Contact  Day Month  Duration  Campaign  PDays  Previous POutcome  Deposit
0   58    management       married   tertiary      no     2143     yes   no  unknown    5   may       261         1     -1         0  unknown        1
1   44    technician        single  secondary      no       29     yes   no  unknown    5   may       151         1     -1         0  unknown        1
2   33  entrepreneur       married  secondary      no        2     yes  yes  unknown    5   may        76         1     -1         0  unknown        1

These are the columns
Age
Job
MaritalStatus
Education
Default
Balance
Housing
Loan
Contact
Day
Month
Duration
Campaign
PDays
Previous
POutcome
Deposit


### Create new predictions with the model

We are attempting to predict the `Deposit` value, which is either 1 (no) or 2 (yes).

In [114]:
# Categorical values are written the way they appear in the training data
# (lowercase, three-letter month abbreviations).
df_sample_requests_list = [
    {
        "Age": "33",
        "Job": "technician",
        "MaritalStatus": "single",
        "Education": "secondary",
        "Default": "no",
        "Balance": "3000",
        "Housing": "yes",
        "Loan": "no",
        "Contact": "unknown",
        "Day": "7",
        "Month": "jun",
        "Duration": "200",
        "Campaign": "1",
        "PDays": "-1",
        "Previous": "0",
        "POutcome": "unknown",
    }
]

In [None]:
prediction = endpoint.predict(df_sample_requests_list)

print(prediction.predictions)
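
To read the result, it helps to pick the highest-scoring class from each prediction. The sketch below assumes each entry in `prediction.predictions` is a mapping with parallel `classes` and `scores` lists, which is the shape typically returned for AutoML tabular classification; the example row is hypothetical and is not real model output.

```python
def top_class(prediction_row):
    """Return (label, score) for the highest-scoring class in one prediction."""
    pairs = zip(prediction_row["classes"], prediction_row["scores"])
    return max(pairs, key=lambda pair: pair[1])


# Hypothetical response row, not real model output.
example_row = {"classes": ["1", "2"], "scores": [0.93, 0.07]}
label, score = top_class(example_row)
print(f"predicted Deposit={label} with score {score:.2f}")
```

With real output, the same helper could be applied to each element of `prediction.predictions`.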

### Undeploy the model

To undeploy your `Model` resource from the serving `Endpoint` resource, use the endpoint's `undeploy` method with the following parameter:

- `deployed_model_id`: The model deployment identifier returned by the prediction service when the `Model` resource is deployed. You can retrieve the `deployed_model_id` using the prediction object's `deployed_model_id` property.

In [None]:
endpoint.undeploy(deployed_model_id=prediction.deployed_model_id)

# Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Training Job
- Model
- Endpoint
- Cloud Storage Bucket

**Note**: You must delete any `Model` resources deployed to the `Endpoint` resource before deleting the `Endpoint` resource.

In [None]:
delete_training_job = True
delete_model = True
delete_endpoint = True

# Warning: Setting this to true will delete everything in your bucket
delete_bucket = False

# Delete the endpoint first; force=True undeploys any models still deployed
if delete_endpoint:
    endpoint.delete(force=True)

# Delete the model
if delete_model:
    model.delete()

# Delete the training job
if delete_training_job:
    job.delete()

if delete_bucket and "BUCKET_NAME" in globals():
    ! gsutil -m rm -r gs://$BUCKET_NAME