# California Housing Regression Model Building - Abbreviated

This demo shows how you can use SageMaker Studio notebooks to build machine learning models. We'll cover Jupyter extensions, local model building, scaled SageMaker training jobs, hyperparameter optimization, and model deployment.

We demonstrate these capabilities through a `California Housing` regression example. The experiment is organized as follows:

- Setup
- Exploratory Data Analysis
- Build Model Locally
- Perform Automatic Model Tuning with SageMaker

Make sure you have selected the `Python 3 (TensorFlow 2.3 Python 3.7 CPU Optimized)` kernel.

## Setup

In [None]:
# Installed Libraries
import os
import time
import boto3
import itertools
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sagemaker.tensorflow import TensorFlow
import sagemaker
from sagemaker import get_execution_role

# Project Imports
from california_housing_tf2 import get_model, train_model

## Exploratory Data Analysis

### Download California Housing dataset

In [None]:
data_dir = os.path.join(os.getcwd(), "data")
os.makedirs(data_dir, exist_ok=True)

train_dir = os.path.join(data_dir, "train")
os.makedirs(train_dir, exist_ok=True)

test_dir = os.path.join(data_dir, "test")
os.makedirs(test_dir, exist_ok=True)

data_set = fetch_california_housing(as_frame=True)

In [None]:
data_set.frame.head()

#### Objective
The target is the median house value for each district, expressed in hundreds of thousands of dollars ($100,000). Because the target is a continuous value, this is a regression problem.
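Because the model predicts a continuous value, its quality is measured with a regression loss rather than classification accuracy, so an accuracy-style metric is not very informative here. Mean squared error is a common choice. This notebook does not show which loss `california_housing_tf2.py` uses, so treat MSE here as an assumption. The minimal sketch below computes it in plain Python on a few illustrative values. scikit-learn offers the same computation as `sklearn.metrics.mean_squared_error`.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative targets (in $100,000 units) and made-up predictions.
example_true = [4.526, 3.585, 3.521]
example_pred = [4.0, 3.5, 3.9]
print(mean_squared_error(example_true, example_pred))
```

Squaring the residuals penalizes large misses heavily. A prediction that is off by $100,000 contributes 1.0 to the sum, and one that is off by $200,000 contributes 4.0.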

### Install Plotting Libraries

In [None]:
%pip install -q plotly nbformat matplotlib

### Visualize Data with Matplotlib

In [None]:
import matplotlib.pyplot as plt

data_set.frame.hist(figsize=(12, 10), bins=15, edgecolor="black")
plt.subplots_adjust(hspace=0.7, wspace=0.4)

### Interactively Visualize Data with Plotly

In [None]:
import plotly.express as px

fig = px.histogram(data_set.frame, x="HouseAge", nbins=15)
fig.show()

### Data Transformations

In [None]:
X = pd.DataFrame(data_set.data, columns=data_set.feature_names)
Y = pd.DataFrame(data_set.target)

# We partition the dataset into 2/3 training and 1/3 test set.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33)

scaler = StandardScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)

np.save(os.path.join(train_dir, "x_train.npy"), x_train)
np.save(os.path.join(test_dir, "x_test.npy"), x_test)
np.save(os.path.join(train_dir, "y_train.npy"), y_train)
np.save(os.path.join(test_dir, "y_test.npy"), y_test)
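The cell above does two things worth making concrete: it splits the data with `test_size=0.33`, and it fits the scaler on the training rows only, so no test-set statistics leak into training. The plain-Python sketch below illustrates both steps. The split rule is our reading of how `train_test_split` handles a float `test_size` with no `train_size`: the test count is rounded up and the training set gets the rest. The standardization uses the population standard deviation, which is what `StandardScaler` uses. The helper names `split_sizes`, `fit_standardizer`, and `standardize` are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    # Assumed rule: round the test count up, give the remainder to training.
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def fit_standardizer(train_col: List[float]) -> Tuple[float, float]:
    # Mean and population standard deviation (ddof=0) from training rows only.
    return statistics.mean(train_col), statistics.pstdev(train_col)


def standardize(col: List[float], mean: float, std: float) -> List[float]:
    return [(x - mean) / std for x in col]


# The California housing data has 20,640 rows.
print(split_sizes(20640, 0.33))

# One feature column with hypothetical values.
train_col = [1.0, 2.0, 3.0, 4.0]
test_col = [5.0]
col_mean, col_std = fit_standardizer(train_col)
print(standardize(train_col, col_mean, col_std))
print(standardize(test_col, col_mean, col_std))  # scaled with training statistics
```

The test value ends up outside the range seen in training, and that is expected. Refitting the scaler on the test data would hide this and make the evaluation optimistic.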

## Build Model Locally

In [None]:
my_model = get_model()
# summary() prints the layer table itself and returns None, so it is not wrapped in print().
my_model.summary()

In [None]:
learning_rate = 0.1
epochs = 10
batch_size = 64
train_model(model=my_model, learning_rate=learning_rate, epochs=epochs,
            batch_size=batch_size,
            x_train=x_train, y_train=y_train, x_test=x_test,
            y_test=y_test, output_dir=os.getcwd())

## Perform Automatic Model Tuning with SageMaker

In [None]:
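# Create the AWS and SageMaker sessions, look up the execution role, and choose the S3 location for this experiment.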
sess = boto3.Session()
sm = sess.client("sagemaker")
role = get_execution_role()
sagemaker_session = sagemaker.Session(boto_session=sess)
bucket = sagemaker_session.default_bucket()
prefix = "tf2-california-housing-experiment"

### Upload Data to S3

In [None]:
# Reuse the SageMaker session created above instead of opening new ones.
s3_inputs_train = sagemaker_session.upload_data(
    path="data/train", bucket=bucket, key_prefix=prefix + "/train"
)
s3_inputs_test = sagemaker_session.upload_data(
    path="data/test", bucket=bucket, key_prefix=prefix + "/test"
)
inputs = {"train": s3_inputs_train, "test": s3_inputs_test}
print(inputs)
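The `inputs` dictionary maps channel names (`train`, `test`) to S3 prefixes. When a training job starts, SageMaker copies each channel into the container. By the convention of the SageMaker training toolkit, a channel lands under `/opt/ml/input/data/<channel>`, and its path is also exposed as an `SM_CHANNEL_<CHANNEL>` environment variable. This notebook does not show whether `california_housing_tf2.py` reads these variables, so treat that as an assumption. In the sketch below, the helpers `expected_s3_uri` and `channel_dir` and the bucket name are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

CONTAINER_DATA_ROOT = "/opt/ml/input/data"


def expected_s3_uri(bucket: str, key_prefix: str) -> str:
    # Uploading a directory yields a single URI for the whole prefix.
    return "s3://{}/{}".format(bucket, key_prefix)


def channel_dir(channel: str, env: Optional[Mapping[str, str]] = None) -> str:
    # Prefer SM_CHANNEL_<NAME>; fall back to the conventional mount point.
    env = os.environ if env is None else env
    return env.get("SM_CHANNEL_" + channel.upper(), CONTAINER_DATA_ROOT + "/" + channel)


example_bucket = "example-bucket"  # hypothetical; the notebook uses default_bucket()
example_prefix = "tf2-california-housing-experiment"
example_inputs: Dict[str, str] = {
    channel: expected_s3_uri(example_bucket, example_prefix + "/" + channel)
    for channel in ("train", "test")
}
for channel, uri in example_inputs.items():
    print(channel, uri, "->", channel_dir(channel, env={}))
```

Falling back to the conventional path lets the same script run inside the container even when the environment variable is missing. Passing an explicit `env` mapping keeps the helper easy to test outside SageMaker.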

In [None]:
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

objective_metric_name = "loss"
objective_type = "Minimize"
metric_definitions = [
    {"Name": "loss", "Regex": "loss: ([0-9\\.]+)"},
    {"Name": "accuracy", "Regex": "accuracy: ([0-9\\.]+)"},
    {"Name": "val_loss", "Regex": "val_loss: ([0-9\\.]+)"},
    {"Name": "val_accuracy", "Regex": "val_accuracy: ([0-9\\.]+)"},
]

static_hyperparameters = {"epochs": 100}
hyperparameter_ranges = {"learning_rate": ContinuousParameter(1e-4, 1e-3)}

tf2_california_housing_estimator = TensorFlow(
    entry_point="california_housing_tf2.py",
    role=role,  # execution role looked up in the session cell above
    hyperparameters=static_hyperparameters,
    instance_count=1,
    instance_type="ml.m5.large",
    framework_version="2.4.1",
    py_version="py37",
)

tuner = HyperparameterTuner(
    tf2_california_housing_estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    base_tuning_job_name="housing-hpo",
    strategy="Bayesian",
    max_jobs=3,
    max_parallel_jobs=3,
    objective_type=objective_type,
)

tuner.fit(inputs)
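SageMaker extracts each entry in `metric_definitions` by applying its regex to the training job's log output and reading the captured group. The snippet below approximates this with Python's `re` module. It applies the `loss` and `val_loss` patterns to a hypothetical Keras log line. It also shows one caveat: `[0-9\.]+` stops at an exponent, and Keras typically prints very small values in scientific notation, so such a value would be truncated.

```python
import re

# The loss and val_loss patterns from metric_definitions, written as raw strings.
metric_patterns = {
    "loss": r"loss: ([0-9\.]+)",
    "val_loss": r"val_loss: ([0-9\.]+)",
}

# Hypothetical log line; the real format depends on the training script.
log_line = "181/181 - 1s - loss: 0.7215 - val_loss: 0.4852"

for metric_name, pattern in metric_patterns.items():
    match = re.search(pattern, log_line)
    print(metric_name, float(match.group(1)) if match else None)

# The character class stops at "e", so 1.2345e-04 is captured as "1.2345".
truncated = re.search(metric_patterns["loss"], "loss: 1.2345e-04").group(1)
print(truncated)
```

The `loss` pattern picks up the training loss here only because it appears first on the line. `re.search` returns the leftmost match, and `val_loss: ...` also contains the text `loss: ...`.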

In [None]:
# Summarize the training jobs launched by the tuning job.
results = tuner.analytics()
results.training_job_summaries()
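Because `objective_type` is `"Minimize"`, the best training job is the completed job with the lowest final objective value. The SageMaker Python SDK already returns that job's name via `tuner.best_training_job()`, and `results.dataframe()` gives a tabular view of all jobs. The sketch below only illustrates the selection logic. It runs on placeholder dictionaries that mimic the `HyperParameterTrainingJobSummary` fields returned by the API. The helper `best_job`, the job names, and the metric values are all invented.

```python
from typing import Any, Dict, List, Optional


def best_job(summaries: List[Dict[str, Any]], minimize: bool = True) -> Optional[Dict[str, Any]]:
    # Only completed jobs that reported the objective metric are candidates.
    scored = [
        s for s in summaries
        if s.get("TrainingJobStatus") == "Completed"
        and "FinalHyperParameterTuningJobObjectiveMetric" in s
    ]
    if not scored:
        return None

    def objective(s: Dict[str, Any]) -> float:
        return s["FinalHyperParameterTuningJobObjectiveMetric"]["Value"]

    return min(scored, key=objective) if minimize else max(scored, key=objective)


# Placeholder summaries shaped like the API response; names and values are invented.
example_summaries = [
    {"TrainingJobName": "job-a", "TrainingJobStatus": "Completed",
     "TunedHyperParameters": {"learning_rate": "0.0002"},
     "FinalHyperParameterTuningJobObjectiveMetric": {"MetricName": "loss", "Value": 0.52}},
    {"TrainingJobName": "job-b", "TrainingJobStatus": "Completed",
     "TunedHyperParameters": {"learning_rate": "0.0008"},
     "FinalHyperParameterTuningJobObjectiveMetric": {"MetricName": "loss", "Value": 0.47}},
    {"TrainingJobName": "job-c", "TrainingJobStatus": "Failed",
     "TunedHyperParameters": {"learning_rate": "0.0005"}},
]
winner = best_job(example_summaries)
print(winner["TrainingJobName"], winner["TunedHyperParameters"])
```

The API reports tuned hyperparameter values as strings. Convert them, for example with `float(...)`, before reusing them to train a final model.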