
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 1200px">
</div>

# Keras

In this notebook, we will build upon the concepts introduced in the previous lab to build a neural network that is more powerful than a simple linear regression model!

We will use the California Housing Dataset.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you:<br>
  - Modify these parameters to improve model performance:
    - Activation functions
    - Loss functions
    - Optimizer
    - Batch size
  - Save and load models

In [3]:
%run "./Includes/Classroom-Setup"

In [4]:
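# Import from the public sklearn.datasets namespace (the private submodule path was removed in newer scikit-learn releases)
from sklearn.datasets import fetch_california_housing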
from sklearn.model_selection import train_test_split

cal_housing = fetch_california_housing()

# split 80/20 train-test
X_train, X_test, y_train, y_test = train_test_split(cal_housing.data,
                                                    cal_housing.target,
                                                    test_size=0.2,
                                                    random_state=1)

print(cal_housing.DESCR)


## Recall from Last Lab

##### Steps to build a Keras model
<img style="width:20%" src="https://files.training.databricks.com/images/5_cycle.jpg" >

## Define a Network

Rather than reinvent linear regression, let's build a model with multiple layers using the [Sequential model](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) from Keras.

![](https://files.training.databricks.com/images/Neural_network.svg)

## 1. Activation Function

If we keep every activation linear, we aren't using the power of neural networks: a stack of linear layers is still just one linear function. The power of neural networks comes from combining linear functions with non-linear activations.

**RECAP:** So what are our options for [activation functions](http://cs231n.github.io/neural-networks-1/#actfun)?
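
As a rough illustration of why the activation matters, here is a minimal NumPy sketch. The random inputs and weights are hypothetical (they are not the housing data), and biases are left out to keep it short. It shows that two stacked linear layers collapse into a single linear layer, while putting a ReLU between them does not.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and zeroes out negatives
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
x = rng.normal(size=(5, 8))    # 5 hypothetical samples with 8 features each
W1 = rng.normal(size=(8, 20))  # hypothetical weights of a 20-unit hidden layer
W2 = rng.normal(size=(20, 1))  # hypothetical weights of a 1-unit output layer

# Two linear layers in a row are equivalent to one linear layer with weights W1 @ W2
two_linear_layers = (x @ W1) @ W2
one_linear_layer = x @ (W1 @ W2)
collapses = np.allclose(two_linear_layers, one_linear_layer)

# With a ReLU in between, the network is no longer that single linear map
with_relu = relu(x @ W1) @ W2
differs = not np.allclose(with_relu, one_linear_layer)

print(collapses, differs)
```

`np.tanh` is another common choice and can be swapped in for `relu` in the same way.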

In [8]:
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
import tensorflow as tf
tf.random.set_seed(42)

model = Sequential()

# Input layer
model.add(Dense(20, input_dim=8, activation="relu")) 

# Automatically infers the input_dim based on the layer before it
model.add(Dense(20, activation="relu")) 

# Output layer
model.add(Dense(1, activation="linear")) 

#### Alternative Keras Model Syntax

In [10]:
def build_model():
  return Sequential([Dense(20, input_dim=8, activation="relu"),
                     Dense(20, activation="relu"),
                     Dense(1, activation="linear")]) # Keep the last layer as linear because this is a regression problem

We can check the model definition by calling `.summary()`.

In [12]:
model = build_model()
model.summary()
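
The parameter counts reported by `.summary()` can be checked by hand. The sketch below assumes the architecture defined in `build_model()` (8 inputs, two hidden layers of 20 units, 1 output) and that every `Dense` layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a hypothetical name used only for this illustration.

```python
def dense_params(n_inputs, n_units):
    # A Dense layer has an (n_inputs x n_units) weight matrix plus n_units biases
    return n_inputs * n_units + n_units

# Layer widths from input to output, matching build_model()
layer_sizes = [8, 20, 20, 1]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
```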

## 2. Loss Functions + Metrics

In Keras, the *loss function* is the function for our optimizer to minimize. *[Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)* are similar to a loss function, except that the results from evaluating a metric are not used when training the model.

**RECAP:** Which loss functions should we use for regression? For classification?
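
As a quick reminder of what these quantities measure, here is a minimal NumPy sketch of two common regression losses (MSE and MAE) and one common classification loss (binary cross-entropy). The arrays are made-up values for illustration, not predictions from the model in this notebook.

```python
import numpy as np

# Hypothetical regression targets and predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 8.0])

errors = y_pred - y_true
mse = np.mean(errors ** 2)     # squaring penalizes the large error heavily
mae = np.mean(np.abs(errors))  # absolute value is less sensitive to outliers

# Hypothetical binary labels and predicted probabilities
labels = np.array([1.0, 0.0])
probs = np.array([0.9, 0.1])
bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print(mse, mae, bce)
```

The single large error (8 versus 4) dominates the MSE but contributes only linearly to the MAE, which is one reason to track both as metrics.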

In [14]:
from tensorflow.keras import losses
# Alias the module so the `metrics` list below does not shadow it
from tensorflow.keras import metrics as keras_metrics

loss = "mse" # Or loss = losses.mse
metrics = ["mae", "mse"] # Or metrics = [keras_metrics.mae, keras_metrics.mse]

model.compile(optimizer="sgd", loss=loss, metrics=metrics)
model.fit(X_train, y_train, epochs=10)

## 3. Optimizer

Wow! The SGD run produced a lot of NaNs! Let's try this again, but using the Adam optimizer. There are many optimizers to choose from, and here is a [great blog post](http://ruder.io/optimizing-gradient-descent/) comparing them.

When in doubt, the Adam optimizer is a solid default. If you want to adjust any of its hyperparameters, you need to import the optimizer from `optimizers` instead of passing in its name as a string.
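
To give some intuition for why the choice of optimizer can matter this much, here is a pure-Python sketch comparing a plain SGD update with the standard Adam update on a toy one-parameter loss, f(w) = 0.5 * a * w**2 with a steep curvature a = 1000. The toy loss and the function names are assumptions made for illustration; this is not a diagnosis of the NaNs above, only an example of how a fixed SGD step can diverge when gradients are large while Adam's normalized step stays small.

```python
import math

def grad(w, a=1000.0):
    # Gradient of the toy loss f(w) = 0.5 * a * w**2
    return a * w

def run_sgd(w, lr=0.01, steps=20):
    # Plain gradient descent: the step scales directly with the gradient
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adam(w, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7, steps=20):
    # Standard Adam: the step is the bias-corrected mean gradient divided by
    # the root of the bias-corrected mean squared gradient
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = run_sgd(1.0)
w_adam = run_adam(1.0)
print(w_sgd, w_adam)
```

Here each SGD step multiplies `w` by `1 - 0.01 * 1000 = -9`, so it blows up, while each Adam step moves `w` by roughly the learning rate regardless of the gradient's scale.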

In [16]:
from tensorflow.keras import optimizers

model = build_model()
optimizer = optimizers.Adam(learning_rate=0.001) # `lr` is a deprecated alias for `learning_rate`

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
history = model.fit(X_train, y_train, epochs=20)

In [17]:
import matplotlib.pyplot as plt

def viewModelLoss():
  plt.clf()
  plt.plot(history.history["loss"])
  plt.title("Model Loss")
  plt.ylabel("Loss")
  plt.xlabel("Epoch")
  plt.show()
  
viewModelLoss()

## 4. Batch Size

Let's set our `batch_size` (the number of samples the model processes per gradient update) to 64 and train for 20 `epochs`. Mini-batch sizes are often powers of 2 (typically between 16 and 512) to facilitate memory allocation on GPUs.
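
To make the effect of `batch_size` concrete, here is a small pure-Python sketch of how a training set is cut into mini-batches. It assumes 16,512 training rows (80% of the 20,640 rows in the California Housing Dataset), and `batch_slices` is a hypothetical helper written for this illustration, not a Keras function.

```python
import math

def batch_slices(n_samples, batch_size):
    # Yield one slice per mini-batch; the last batch may be smaller
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))

n_train = 16512  # assumed size of X_train after the 80/20 split

steps_64 = math.ceil(n_train / 64)  # gradient updates per epoch with batch_size=64
steps_32 = math.ceil(n_train / 32)  # gradient updates per epoch with the Keras default of 32

slices = list(batch_slices(n_train, 64))
print(steps_64, steps_32, slices[0], slices[-1])
```

Doubling the batch size halves the number of gradient updates per epoch, which is why changing `batch_size` affects both speed and how the loss evolves.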


Also, if you don't want to see all of the intermediate values printed, you can set the `verbose` parameter: 0 = silent, 1 = progress bar, 2 = one line per epoch (defaults to 1).

In [19]:
model = build_model()
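# Create a fresh optimizer so no state carries over from the previous model
model.compile(optimizer=optimizers.Adam(learning_rate=0.001), loss=loss, metrics=metrics)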
history = model.fit(X_train, y_train, epochs=20, batch_size=64, verbose=2)

## 5. Evaluate

In [21]:
model.evaluate(X_test, y_test)

## 6. Save Model, Load Model and Train More

Whenever you train neural networks, you want to save them. This way, you can reuse them later! 

In our case, we need to save both the architecture and the weights, so we will use `model.save`. If you only want to save the weights, you can use `model.save_weights`.

In [23]:
filepath = f"{working_dir}/keras_checkpoint_weights.ckpt"

model.save(filepath)

You can load both the weights and the architecture together using `load_model()`.

In [25]:
from tensorflow.keras.models import load_model

new_model = load_model(filepath)

Check that the model architecture is the same.

In [27]:
new_model.summary()

Let's train it for one more epoch and then save those weights. We recompile first, which sets the optimizer and loss explicitly. This is a *warm start*.

In [29]:
new_model.compile(optimizer="adam", loss="mse")
new_model.fit(X_train, y_train, validation_split=.2, epochs=1, verbose=2)
new_model.save_weights(filepath)


&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>