By deploying or using this software you agree to comply with the [AI Hub Terms of Service](https://aihub.cloud.google.com/u/0/aihub-tos) and the [Google APIs Terms of Service](https://developers.google.com/terms/). To the extent of a direct conflict of terms, the AI Hub Terms of Service will control.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/aihub/tabular_data_inspection/tabular_data_inspection.ipynb)

# Overview

This notebook shows an example workflow that uses the [Tabular data inspection](https://aihub.cloud.google.com/u/0/p/products%2F19b6a156-3ede-47de-9aa4-ace6b351849b) container to visually inspect a dataset.

### Dataset

The notebook uses the [Boston housing price regression dataset](https://www.kaggle.com/vikrishnan/boston-house-prices). It contains 506 observations, each with 13 features describing a house in Boston and the corresponding house price, stored in a 506x14 table.


### Objective

The goal of this notebook is to walk through a common data inspection workflow:
- Create a dataset
- Use the [AI Platform Training](https://cloud.google.com/ai-platform/training/docs) service to create a visual "Run Report" from the dataset
- Inspect the dataset by looking at the generated "Run Report"

### Costs 

This tutorial uses billable components of Google Cloud Platform (GCP):

* Cloud AI Platform
* Cloud Storage

Learn about [Cloud AI Platform
pricing](https://cloud.google.com/ml-engine/docs/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Set up your local development environment

**If you are using Colab or AI Platform Notebooks**, your environment already meets
all the requirements to run this notebook. You can skip this step.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Git
* Python 3
* virtualenv
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)

2. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)

3. [Install
   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)
   and create a virtual environment that uses Python 3.

4. Activate that environment and run `pip install jupyter` in a shell to install
   Jupyter.

5. Run `jupyter notebook` in a shell to launch Jupyter.

6. Open this notebook in the Jupyter Notebook Dashboard.

### Set up your GCP project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a GCP project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit toward your compute and storage costs.

2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)

3. [Enable the AI Platform APIs and Compute Engine APIs.](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component)

4. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

In [None]:
PROJECT_ID = "[your-project-id]" #@param {type:"string"}
! gcloud config set project $PROJECT_ID
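
Outside a notebook the `!` shorthand is not available. As a rough sketch, the same command can be issued from plain Python with the standard `subprocess` module. The helper names `set_project_command` and `run`, and the project ID `my-sample-project`, are hypothetical and only illustrate how the `$PROJECT_ID` interpolation maps to an argument list.

```python
import shlex
import subprocess


def set_project_command(project_id):
    """Build the argument list for `gcloud config set project <project_id>`."""
    if not project_id or project_id.startswith("["):
        # The placeholder "[your-project-id]" was never replaced.
        raise ValueError("Set PROJECT_ID to a real project ID first.")
    return ["gcloud", "config", "set", "project", project_id]


def run(cmd):
    """Print the command as a shell line, then execute it (needs the Cloud SDK)."""
    print(shlex.join(cmd))
    return subprocess.run(cmd, check=True)


# Hypothetical project ID, used here only to show the rendered command.
cmd = set_project_command("my-sample-project")
print(shlex.join(cmd))
```

Calling `run(cmd)` would execute the command; it is left out of the sketch so that the block also works on a machine without the Cloud SDK installed.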

### Authenticate your GCP account

**If you are using AI Platform Notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the GCP Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. From the **Service account** drop-down list, select **New service account**.

3. In the **Service account name** field, enter a name.

4. From the **Role** drop-down list, select
   **Machine Learning Engine > AI Platform Admin** and
   **Storage > Storage Object Admin**.

5. Click **Create**. A JSON file that contains your key downloads to your
local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [None]:
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

if 'google.colab' in sys.modules:
  from google.colab import auth as google_auth
  google_auth.authenticate_user()

# If you are running this notebook locally, replace the string below with the
# path to your service account key and run this cell to authenticate your GCP
# account.
else:
  %env GOOGLE_APPLICATION_CREDENTIALS ''
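
For a local run, a quick sanity check can catch a wrong key path before any job is submitted. The following is a minimal sketch; the helper `credentials_status` is hypothetical and only inspects the environment variable set above. In Colab or AI Platform Notebooks the variable is normally unset, which is expected there.

```python
import os


def credentials_status(environ=os.environ):
    """Describe whether GOOGLE_APPLICATION_CREDENTIALS points at an existing file."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        return "unset"
    if not os.path.isfile(path):
        return "missing file"
    return "ok"


print(credentials_status())
```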

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

You need to have a "workspace" bucket that will hold the dataset and the output from the ML Container. Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets. 

You may also change the `REGION` variable, which is used for operations
throughout the rest of this notebook. Make sure to [choose a region where Cloud AI Platform services are available](https://cloud.google.com/ml-engine/docs/tensorflow/regions). You may not use a Multi-Regional Storage bucket for training with AI Platform.

In [None]:
BUCKET_NAME = "[your-bucket-name]" #@param {type:"string"}
REGION = 'us-central1' #@param {type:"string"}
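
Because `BUCKET_NAME` holds only the bare bucket name, it helps to validate it and to build full `gs://` paths in one place. The sketch below is illustrative: the helpers `is_plausible_bucket_name` and `gcs_uri` are hypothetical, and the regular expression covers only a simplified subset of the Cloud Storage naming rules.

```python
import posixpath
import re

# Simplified subset of the bucket naming rules (an assumption of this sketch):
# 3-63 characters, lowercase letters, digits, '-', '_' and '.', starting and
# ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name):
    """Return True if the name passes the simplified check above."""
    return _BUCKET_RE.fullmatch(name) is not None


def gcs_uri(bucket, *parts):
    """Join a bare bucket name and an object path into a gs:// URI."""
    return "gs://" + posixpath.join(bucket, *parts)


print(is_plausible_bucket_name("[your-bucket-name]"))
print(gcs_uri("my-sample-bucket", "data.csv"))  # hypothetical bucket name
```

The unreplaced placeholder fails the check because of its brackets, which makes a forgotten edit easy to spot.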

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION gs://$BUCKET_NAME

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al gs://$BUCKET_NAME

### Import libraries and define constants

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import time
import pandas as pd
import tensorflow as tf
from IPython.display import HTML, display

## Create a dataset

In [2]:
bh = tf.keras.datasets.boston_housing
(X_train, y_train), (X_eval, y_eval) = bh.load_data()

training = pd.DataFrame(X_train)
training['target'] = y_train

validation = pd.DataFrame(X_eval)
validation['target'] = y_eval

print('Data head:')
display(training.head(2))

# BUCKET_NAME holds the bare bucket name, so add the gs:// scheme explicitly.
data = 'gs://{}/data.csv'.format(BUCKET_NAME)

print('Copy the data in bucket ...')
with tf.io.gfile.GFile(data, 'w') as f:
  # DataFrame.append was removed in pandas 2.0; pd.concat works across versions.
  pd.concat([training, validation]).to_csv(f, index=False)

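Data head:

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.23247 | 0.0 | 8.14 | 0.0 | 0.538 | 6.142 | 91.7 | 3.9769 | 4.0 | 307.0 | 21.0 | 396.9 | 18.72 | 15.2 |
| 1 | 0.02177 | 82.5 | 2.03 | 0.0 | 0.415 | 7.61 | 15.7 | 6.27 | 2.0 | 348.0 | 14.7 | 395.38 | 3.11 | 42.3 |

Copy the data in bucket ...
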

## Cloud Run

### Accelerator and distribution support

| GPU | Multi-GPU Node | TPU | Workers | Parameter Server |
|---|---|---|---|---|
| No | No | No | No | No |


### AI Platform training

- [Tabular data inspection](https://aihub.cloud.google.com/u/0/p/products%2F19b6a156-3ede-47de-9aa4-ace6b351849b).
- [AI Platform training documentation](https://cloud.google.com/sdk/gcloud/reference/ai-platform/jobs/submit/training).

Submit the inspection job to AI Platform Training:

In [10]:
# BUCKET_NAME holds the bare bucket name, so add the gs:// scheme explicitly.
output_location = 'gs://{}/job_dir'.format(BUCKET_NAME)

job_name = "data_inspection_{}".format(time.strftime("%Y%m%d%H%M%S"))
!gcloud ai-platform jobs submit training $job_name \
    --master-image-uri gcr.io/aihub-c2t-containers/kfp-components/oob_algorithm/tabular_data_inspection:latest \
    --region $REGION \
    --scale-tier CUSTOM \
    --master-machine-type standard \
    -- \
    --output-location {output_location} \
    --data {data} \
    --data-type csv

Job [data_inspection_20200206152006] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe data_inspection_20200206152006

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs data_inspection_20200206152006
jobId: data_inspection_20200206152006
state: QUEUED
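
The job submitted above starts in the `QUEUED` state, and the report only exists once the job has finished. A minimal polling sketch follows; the helpers `describe_state` and `wait_for_job` and their parameters are hypothetical. `describe_state` shells out to `gcloud ai-platform jobs describe` with a `--format=value(state)` projection, so it needs the Cloud SDK. The state lookup is injectable so the loop can be tried without a real job.

```python
import subprocess
import time

# States after which an AI Platform Training job no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def describe_state(job_name):
    """Return the current job state using the gcloud CLI (needs the Cloud SDK)."""
    result = subprocess.run(
        ["gcloud", "ai-platform", "jobs", "describe", job_name,
         "--format=value(state)"],
        check=True, capture_output=True, text=True)
    return result.stdout.strip()


def wait_for_job(job_name, get_state=describe_state, poll_seconds=30,
                 max_polls=120):
    """Poll until the job reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = get_state(job_name)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("Job {} did not finish in time.".format(job_name))
```

In the notebook this could be called as `wait_for_job(job_name)` before the report is opened. The polling interval and the poll limit are arbitrary defaults.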


### Local training snippet

Note that the same container can also be run locally with Docker:
```bash
docker run \
    -v /tmp:/tmp \
    -it gcr.io/aihub-c2t-containers/kfp-components/oob_algorithm/tabular_data_inspection:latest \
    --output-location /tmp/tabular_data_inspection \
    --data /tmp/data.csv \
    --data-type csv
```

## Inspect the Run Report

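The "Run Report" summarizes the dataset: the number of variables and observations, the share of missing values, and per-variable statistics such as quantiles and the most frequent values. The cell below loads the report once the job has finished.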

In [11]:
if not tf.io.gfile.exists(os.path.join(output_location, 'report.html')):
  raise RuntimeError('The file report.html was not found. Did the training job finish?')

with tf.io.gfile.GFile(os.path.join(output_location, 'report.html')) as f:
  display(HTML(f.read()))

0,1
Number of variables,14
Number of observations,506
Total Missing (%),0.0%
Total size in memory,55.4 KiB
Average record size in memory,112.2 B

0,1
Numeric,12
Categorical,0
Boolean,1
Date,0
Text (Unique),0
Rejected,1
Unsupported,0

0,1
Distinct count,504
Unique (%),99.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.6135
Minimum,0.00632
Maximum,88.976
Zeros (%),0.0%

0,1
Minimum,0.00632
5-th percentile,0.02791
Q1,0.082045
Median,0.25651
Q3,3.6771
95-th percentile,15.789
Maximum,88.976
Range,88.97
Interquartile range,3.595

0,1
Standard deviation,8.6015
Coef of variation,2.3804
Kurtosis,37.131
Mean,3.6135
MAD,4.7841
Skewness,5.2231
Sum,1828.4
Variance,73.987
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
14.3337,2,0.4%,
0.015009999999999999,2,0.4%,
0.08265,1,0.2%,
9.96654,1,0.2%,
0.537,1,0.2%,
0.9761700000000001,1,0.2%,
1.3547200000000001,1,0.2%,
0.035019999999999996,1,0.2%,
0.29819,1,0.2%,
0.03615,1,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0.00632,1,0.2%,
0.0090599999999999,1,0.2%,
0.01096,1,0.2%,
0.0130099999999999,1,0.2%,
0.01311,1,0.2%,

Value,Count,Frequency (%),Unnamed: 3
45.7461,1,0.2%,
51.1358,1,0.2%,
67.9208,1,0.2%,
73.5341,1,0.2%,
88.9762,1,0.2%,

0,1
Distinct count,26
Unique (%),5.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,11.364
Minimum,0
Maximum,100
Zeros (%),73.5%

0,1
Minimum,0.0
5-th percentile,0.0
Q1,0.0
Median,0.0
Q3,12.5
95-th percentile,80.0
Maximum,100.0
Range,100.0
Interquartile range,12.5

0,1
Standard deviation,23.322
Coef of variation,2.0524
Kurtosis,4.0315
Mean,11.364
MAD,16.709
Skewness,2.2257
Sum,5750
Variance,543.94
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,372,73.5%,
20.0,21,4.2%,
80.0,15,3.0%,
12.5,10,2.0%,
22.0,10,2.0%,
25.0,10,2.0%,
40.0,7,1.4%,
30.0,6,1.2%,
45.0,6,1.2%,
90.0,5,1.0%,

Value,Count,Frequency (%),Unnamed: 3
0.0,372,73.5%,
12.5,10,2.0%,
17.5,1,0.2%,
18.0,1,0.2%,
20.0,21,4.2%,

Value,Count,Frequency (%),Unnamed: 3
82.5,2,0.4%,
85.0,2,0.4%,
90.0,5,1.0%,
95.0,4,0.8%,
100.0,1,0.2%,

0,1
Distinct count,46
Unique (%),9.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,18.456
Minimum,12.6
Maximum,22
Zeros (%),0.0%

0,1
Minimum,12.6
5-th percentile,14.7
Q1,17.4
Median,19.05
Q3,20.2
95-th percentile,21.0
Maximum,22.0
Range,9.4
Interquartile range,2.8

0,1
Standard deviation,2.1649
Coef of variation,0.11731
Kurtosis,-0.28509
Mean,18.456
MAD,1.7873
Skewness,-0.80232
Sum,9338.5
Variance,4.687
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
20.2,140,27.7%,
14.7,34,6.7%,
21.0,27,5.3%,
17.8,23,4.5%,
19.2,19,3.8%,
17.4,18,3.6%,
18.6,17,3.4%,
19.1,17,3.4%,
18.4,16,3.2%,
16.6,16,3.2%,

Value,Count,Frequency (%),Unnamed: 3
12.6,3,0.6%,
13.0,12,2.4%,
13.6,1,0.2%,
14.4,1,0.2%,
14.7,34,6.7%,

Value,Count,Frequency (%),Unnamed: 3
20.9,11,2.2%,
21.0,27,5.3%,
21.1,1,0.2%,
21.2,15,3.0%,
22.0,2,0.4%,

0,1
Distinct count,357
Unique (%),70.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,356.67
Minimum,0.32
Maximum,396.9
Zeros (%),0.0%

0,1
Minimum,0.32
5-th percentile,84.59
Q1,375.38
Median,391.44
Q3,396.23
95-th percentile,396.9
Maximum,396.9
Range,396.58
Interquartile range,20.848

0,1
Standard deviation,91.295
Coef of variation,0.25596
Kurtosis,7.2268
Mean,356.67
MAD,54.629
Skewness,-2.8904
Sum,180480
Variance,8334.8
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
396.9,121,23.9%,
393.74,3,0.6%,
395.24,3,0.6%,
395.11,2,0.4%,
395.63,2,0.4%,
392.78,2,0.4%,
394.12,2,0.4%,
391.34,2,0.4%,
390.94,2,0.4%,
395.62,2,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0.32,1,0.2%,
2.52,1,0.2%,
2.6,1,0.2%,
3.5,1,0.2%,
3.65,1,0.2%,

Value,Count,Frequency (%),Unnamed: 3
396.28,1,0.2%,
396.3,1,0.2%,
396.33,1,0.2%,
396.42,1,0.2%,
396.9,121,23.9%,

0,1
Distinct count,455
Unique (%),89.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,12.653
Minimum,1.73
Maximum,37.97
Zeros (%),0.0%

0,1
Minimum,1.73
5-th percentile,3.7075
Q1,6.95
Median,11.36
Q3,16.955
95-th percentile,26.808
Maximum,37.97
Range,36.24
Interquartile range,10.005

0,1
Standard deviation,7.1411
Coef of variation,0.56437
Kurtosis,0.49324
Mean,12.653
MAD,5.7153
Skewness,0.90646
Sum,6402.4
Variance,50.995
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
8.05,3,0.6%,
7.79,3,0.6%,
6.36,3,0.6%,
18.13,3,0.6%,
14.1,3,0.6%,
12.43,2,0.4%,
10.11,2,0.4%,
21.32,2,0.4%,
9.97,2,0.4%,
15.02,2,0.4%,

Value,Count,Frequency (%),Unnamed: 3
1.73,1,0.2%,
1.92,1,0.2%,
1.98,1,0.2%,
2.47,1,0.2%,
2.87,1,0.2%,

Value,Count,Frequency (%),Unnamed: 3
34.37,1,0.2%,
34.41,1,0.2%,
34.77,1,0.2%,
36.98,1,0.2%,
37.97,1,0.2%,

0,1
Distinct count,76
Unique (%),15.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,11.137
Minimum,0.46
Maximum,27.74
Zeros (%),0.0%

0,1
Minimum,0.46
5-th percentile,2.18
Q1,5.19
Median,9.69
Q3,18.1
95-th percentile,21.89
Maximum,27.74
Range,27.28
Interquartile range,12.91

0,1
Standard deviation,6.8604
Coef of variation,0.61601
Kurtosis,-1.2335
Mean,11.137
MAD,6.202
Skewness,0.29502
Sum,5635.2
Variance,47.064
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
18.1,132,26.1%,
19.58,30,5.9%,
8.14,22,4.3%,
6.2,18,3.6%,
21.89,15,3.0%,
3.97,12,2.4%,
9.9,12,2.4%,
10.59,11,2.2%,
8.56,11,2.2%,
5.86,10,2.0%,

Value,Count,Frequency (%),Unnamed: 3
0.46,1,0.2%,
0.74,1,0.2%,
1.21,1,0.2%,
1.22,1,0.2%,
1.25,2,0.4%,

Value,Count,Frequency (%),Unnamed: 3
18.1,132,26.1%,
19.58,30,5.9%,
21.89,15,3.0%,
25.65,7,1.4%,
27.74,5,1.0%,

0,1
Distinct count,2
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
Mean,0.06917

0,1
0.0,471
1.0,35

Value,Count,Frequency (%),Unnamed: 3
0.0,471,93.1%,
1.0,35,6.9%,

0,1
Distinct count,81
Unique (%),16.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.5547
Minimum,0.385
Maximum,0.871
Zeros (%),0.0%

0,1
Minimum,0.385
5-th percentile,0.40925
Q1,0.449
Median,0.538
Q3,0.624
95-th percentile,0.74
Maximum,0.871
Range,0.486
Interquartile range,0.175

0,1
Standard deviation,0.11588
Coef of variation,0.2089
Kurtosis,-0.064667
Mean,0.5547
MAD,0.095695
Skewness,0.72931
Sum,280.68
Variance,0.013428
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
0.5379999999999999,23,4.5%,
0.713,18,3.6%,
0.43700000000000006,17,3.4%,
0.871,16,3.2%,
0.489,15,3.0%,
0.624,15,3.0%,
0.605,14,2.8%,
0.693,14,2.8%,
0.74,13,2.6%,
0.544,12,2.4%,

Value,Count,Frequency (%),Unnamed: 3
0.385,1,0.2%,
0.389,1,0.2%,
0.392,2,0.4%,
0.3939999999999999,1,0.2%,
0.3979999999999999,2,0.4%,

Value,Count,Frequency (%),Unnamed: 3
0.713,18,3.6%,
0.718,6,1.2%,
0.74,13,2.6%,
0.77,8,1.6%,
0.871,16,3.2%,

0,1
Distinct count,446
Unique (%),88.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.2846
Minimum,3.561
Maximum,8.78
Zeros (%),0.0%

0,1
Minimum,3.561
5-th percentile,5.314
Q1,5.8855
Median,6.2085
Q3,6.6235
95-th percentile,7.5875
Maximum,8.78
Range,5.219
Interquartile range,0.738

0,1
Standard deviation,0.70262
Coef of variation,0.1118
Kurtosis,1.8915
Mean,6.2846
MAD,0.51329
Skewness,0.40361
Sum,3180
Variance,0.49367
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
6.167000000000001,3,0.6%,
6.127000000000001,3,0.6%,
6.417000000000001,3,0.6%,
5.712999999999999,3,0.6%,
6.405,3,0.6%,
6.229,3,0.6%,
6.431,2,0.4%,
5.926,2,0.4%,
6.376,2,0.4%,
6.0089999999999995,2,0.4%,

Value,Count,Frequency (%),Unnamed: 3
3.5610000000000004,1,0.2%,
3.863,1,0.2%,
4.138,2,0.4%,
4.368,1,0.2%,
4.519,1,0.2%,

Value,Count,Frequency (%),Unnamed: 3
8.375,1,0.2%,
8.398,1,0.2%,
8.704,1,0.2%,
8.725,1,0.2%,
8.78,1,0.2%,

0,1
Distinct count,356
Unique (%),70.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,68.575
Minimum,2.9
Maximum,100
Zeros (%),0.0%

0,1
Minimum,2.9
5-th percentile,17.725
Q1,45.025
Median,77.5
Q3,94.075
95-th percentile,100.0
Maximum,100.0
Range,97.1
Interquartile range,49.05

0,1
Standard deviation,28.149
Coef of variation,0.41048
Kurtosis,-0.96772
Mean,68.575
MAD,24.611
Skewness,-0.59896
Sum,34699
Variance,792.36
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
100.0,43,8.5%,
98.8,4,0.8%,
95.4,4,0.8%,
98.2,4,0.8%,
97.9,4,0.8%,
87.9,4,0.8%,
96.0,4,0.8%,
76.5,3,0.6%,
88.0,3,0.6%,
98.9,3,0.6%,

Value,Count,Frequency (%),Unnamed: 3
2.9,1,0.2%,
6.0,1,0.2%,
6.2,1,0.2%,
6.5,1,0.2%,
6.6,2,0.4%,

Value,Count,Frequency (%),Unnamed: 3
98.8,4,0.8%,
98.9,3,0.6%,
99.1,1,0.2%,
99.3,1,0.2%,
100.0,43,8.5%,

0,1
Distinct count,412
Unique (%),81.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.795
Minimum,1.1296
Maximum,12.127
Zeros (%),0.0%

0,1
Minimum,1.1296
5-th percentile,1.462
Q1,2.1002
Median,3.2074
Q3,5.1884
95-th percentile,7.8278
Maximum,12.127
Range,10.997
Interquartile range,3.0883

0,1
Standard deviation,2.1057
Coef of variation,0.55486
Kurtosis,0.48794
Mean,3.795
MAD,1.7194
Skewness,1.0118
Sum,1920.3
Variance,4.434
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
3.4952,5,1.0%,
6.8147,4,0.8%,
5.7209,4,0.8%,
5.2873,4,0.8%,
5.4007,4,0.8%,
3.9454,3,0.6%,
7.309,3,0.6%,
6.4798,3,0.6%,
4.8122,3,0.6%,
5.1167,3,0.6%,

Value,Count,Frequency (%),Unnamed: 3
1.1296,1,0.2%,
1.137,1,0.2%,
1.1691,1,0.2%,
1.1742,1,0.2%,
1.1781,1,0.2%,

Value,Count,Frequency (%),Unnamed: 3
9.2203,2,0.4%,
9.2229,1,0.2%,
10.5857,2,0.4%,
10.7103,2,0.4%,
12.1265,1,0.2%,

0,1
Distinct count,9
Unique (%),1.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,9.5494
Minimum,1
Maximum,24
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,2
Q1,4
Median,5
Q3,24
95-th percentile,24
Maximum,24
Range,23
Interquartile range,20

0,1
Standard deviation,8.7073
Coef of variation,0.91181
Kurtosis,-0.86723
Mean,9.5494
MAD,7.5394
Skewness,1.0048
Sum,4832
Variance,75.816
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
24.0,132,26.1%,
5.0,115,22.7%,
4.0,110,21.7%,
3.0,38,7.5%,
6.0,26,5.1%,
8.0,24,4.7%,
2.0,24,4.7%,
1.0,20,4.0%,
7.0,17,3.4%,

Value,Count,Frequency (%),Unnamed: 3
1.0,20,4.0%,
2.0,24,4.7%,
3.0,38,7.5%,
4.0,110,21.7%,
5.0,115,22.7%,

Value,Count,Frequency (%),Unnamed: 3
5.0,115,22.7%,
6.0,26,5.1%,
7.0,17,3.4%,
8.0,24,4.7%,
24.0,132,26.1%,

0,1
Correlation,0.91023

0,1
Distinct count,229
Unique (%),45.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,22.533
Minimum,5
Maximum,50
Zeros (%),0.0%

0,1
Minimum,5.0
5-th percentile,10.2
Q1,17.025
Median,21.2
Q3,25.0
95-th percentile,43.4
Maximum,50.0
Range,45.0
Interquartile range,7.975

0,1
Standard deviation,9.1971
Coef of variation,0.40817
Kurtosis,1.4952
Mean,22.533
MAD,6.6472
Skewness,1.1081
Sum,11402
Variance,84.587
Memory size,4.0 KiB

Value,Count,Frequency (%),Unnamed: 3
50.0,16,3.2%,
25.0,8,1.6%,
21.7,7,1.4%,
23.1,7,1.4%,
22.0,7,1.4%,
19.4,6,1.2%,
20.6,6,1.2%,
19.3,5,1.0%,
21.4,5,1.0%,
15.6,5,1.0%,

Value,Count,Frequency (%),Unnamed: 3
5.0,2,0.4%,
5.6,1,0.2%,
6.3,1,0.2%,
7.0,2,0.4%,
7.2,3,0.6%,

Value,Count,Frequency (%),Unnamed: 3
46.7,1,0.2%,
48.3,1,0.2%,
48.5,1,0.2%,
48.8,1,0.2%,
50.0,16,3.2%,

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,target
0,1.23247,0.0,8.14,0.0,0.538,6.142,91.7,3.9769,4.0,307.0,21.0,396.9,18.72,15.2
1,0.02177,82.5,2.03,0.0,0.415,7.61,15.7,6.27,2.0,348.0,14.7,395.38,3.11,42.3
2,4.89822,0.0,18.1,0.0,0.631,4.97,100.0,1.3325,24.0,666.0,20.2,375.52,3.26,50.0
3,0.03961,0.0,5.19,0.0,0.515,6.037,34.5,5.9853,5.0,224.0,20.2,396.9,8.01,21.1
4,3.69311,0.0,18.1,0.0,0.713,6.376,88.4,2.5671,24.0,666.0,20.2,391.43,14.65,17.7
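
To cross-check a few of the per-variable figures in the report, the statistics can be recomputed by hand. The sketch below uses only the standard library. The helper `summarize` is hypothetical, and the definitions are assumptions rather than the report's documented method: sample standard deviation, linearly interpolated quartiles, and coefficient of variation as standard deviation divided by mean.

```python
import statistics


def summarize(values):
    """Compute a few of the per-variable statistics listed in the report."""
    present = [v for v in values if v is not None]
    # method="inclusive" interpolates linearly between the observed data points.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    mean = statistics.mean(present)
    std = statistics.stdev(present)  # sample standard deviation (assumed)
    return {
        "distinct_count": len(set(present)),
        "missing_n": len(values) - len(present),
        "mean": mean,
        "minimum": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": max(present),
        "range": max(present) - min(present),
        "interquartile_range": q3 - q1,
        "standard_deviation": std,
        "coef_of_variation": std / mean,
    }


# Toy column with one missing value, not taken from the dataset.
stats = summarize([1.0, 2.0, 3.0, 4.0, 5.0, None])
print(stats["median"], stats["interquartile_range"], stats["missing_n"])
```

As a sanity check against the report itself, the first variable lists a standard deviation of 8.6015 and a mean of 3.6135, and their ratio matches the listed coefficient of variation of 2.3804.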


# Cleaning up

To clean up all GCP resources used in this project, you can [delete the GCP
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

In [None]:
# If training job is still running, cancel it
! gcloud ai-platform jobs cancel $job_name --quiet

# Delete Cloud Storage objects that were created
! gsutil -m rm -r gs://$BUCKET_NAME