## Learning Objectives:

After completing this Colab, you'll know how to do the following:

  * Create a simple deep neural network.
  * Tune the hyperparameters for a simple deep neural network.

## The Dataset
  
This Colab uses the California Housing Dataset.

## Step 1: import relevant modules and load the dataset


In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
from matplotlib import pyplot as plt
import seaborn as sns

# Mount Google Drive so the dataset files can be read from Colab.
from google.colab import drive
drive.mount('/content/drive')

## Load the dataset

This exercise uses the California Housing Dataset. The following code cell loads the separate training and test .csv files from Google Drive and creates the following two pandas DataFrames:

* `df_train`, which contains the training set
* `df_test`, which contains the test set
   

In [None]:
df_train = pd.read_csv("drive/My Drive/Colab Notebooks/Lab5_dataset_train.csv")
df_test = pd.read_csv("drive/My Drive/Colab Notebooks/Lab5_dataset_test.csv")

In [None]:
# Shuffle the examples.
df_train = df_train.reindex(np.random.permutation(df_train.index))


# Calculate the Z-scores of each column in the training set:
df_train_mean = df_train.mean()
df_train_std = df_train.std()
df_train_norm = (df_train - df_train_mean)/df_train_std

# Calculate the Z-scores of each column in the test set, using the training
# set's mean and standard deviation so that both sets share the same scale.
df_test_norm = (df_test - df_train_mean)/df_train_std


df_train_norm.head()
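
The normalization above can be checked by hand on a tiny example. The sketch below uses made-up numbers and hypothetical names (`toy_train`, `toy_test`); it is only meant to show that the test set is scaled with the training set's statistics, and that pandas' `std()` is the sample standard deviation.

```python
import pandas as pd

# Toy data standing in for the real training and test sets (made-up values).
toy_train = pd.DataFrame({"median_income": [1.0, 2.0, 3.0]})
toy_test = pd.DataFrame({"median_income": [4.0]})

# The statistics come from the training set only.
toy_mean = toy_train.mean()  # median_income: 2.0
toy_std = toy_train.std()    # sample standard deviation (ddof=1): 1.0

# Apply the same mean and standard deviation to both sets.
toy_train_norm = (toy_train - toy_mean) / toy_std
toy_test_norm = (toy_test - toy_mean) / toy_std

print(toy_train_norm["median_income"].tolist())  # [-1.0, 0.0, 1.0]
print(toy_test_norm["median_income"].tolist())   # [2.0]
```

A test value can land outside the range seen in training (here, 2.0 standard deviations above the training mean), which is expected.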


## Create binary labels

In [None]:
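# Create a binary label: 1.0 if the median house value exceeds the threshold,
# 0.0 otherwise.
# Print the 75th percentile of median house value for reference.
print(df_train["median_house_value"].quantile(q=0.75))
# The threshold is in the units of the raw column, so the comparisons below use
# df_train and df_test rather than the normalized DataFrames.
threshold = 265000.0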
df_train_norm["median_house_value_is_high"] = (df_train["median_house_value"] > threshold).astype(float)
df_test_norm["median_house_value_is_high"] = (df_test["median_house_value"] > threshold).astype(float)
df_train_norm["median_house_value_is_high"].head()
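
As a rough illustration of the labeling step, the sketch below builds a binary label from a quantile on made-up values. The names `toy`, `toy_threshold`, and `is_high` are hypothetical, and unlike the cell above it uses the computed quantile directly as the threshold.

```python
import pandas as pd

# Made-up house values standing in for the real column.
toy = pd.DataFrame({"median_house_value": [100.0, 200.0, 300.0, 400.0, 500.0]})

# 75th percentile with pandas' default linear interpolation.
toy_threshold = toy["median_house_value"].quantile(q=0.75)

# Strictly greater than the threshold maps to 1.0; everything else maps to 0.0.
toy["is_high"] = (toy["median_house_value"] > toy_threshold).astype(float)

print(toy_threshold)            # 400.0
print(toy["is_high"].tolist())  # [0.0, 0.0, 0.0, 0.0, 1.0]
```

Because the comparison is strict, a value exactly equal to the threshold is labeled 0.0.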


## Represent data

The following code cell creates a feature layer containing three features:

* `latitude` X `longitude` (a feature cross)
* `median_income`
* `population`

This code cell specifies the features that you'll ultimately train the model on and how each of those features will be represented. The transformations (collected in `my_feature_layer`) aren't applied until data is passed through the layer, which happens when you train the model.

In [None]:
# Create an empty list that will eventually hold all created feature columns.
feature_columns = []

# We scaled all the columns, including latitude and longitude, into their
# Z scores. So, instead of picking a resolution in degrees, we're going
# to use resolution_in_Zs.  A resolution_in_Zs of 1 corresponds to 
# a full standard deviation. 
resolution_in_Zs = 0.3  # 3/10 of a standard deviation.

# Create a bucket feature column for latitude.
latitude_as_a_numeric_column = tf.feature_column.numeric_column("latitude")
latitude_boundaries = list(np.arange(int(min(df_train_norm['latitude'])), 
                                     int(max(df_train_norm['latitude'])), 
                                     resolution_in_Zs))
latitude = tf.feature_column.bucketized_column(latitude_as_a_numeric_column, latitude_boundaries)

# Create a bucket feature column for longitude.
longitude_as_a_numeric_column = tf.feature_column.numeric_column("longitude")
longitude_boundaries = list(np.arange(int(min(df_train_norm['longitude'])), 
                                      int(max(df_train_norm['longitude'])), 
                                      resolution_in_Zs))
longitude = tf.feature_column.bucketized_column(longitude_as_a_numeric_column, 
                                                longitude_boundaries)

# Create a feature cross of latitude and longitude.
latitude_x_longitude = tf.feature_column.crossed_column([latitude, longitude], hash_bucket_size=100)
crossed_feature = tf.feature_column.indicator_column(latitude_x_longitude)
feature_columns.append(crossed_feature)  

# Represent median_income as a floating-point value.
median_income = tf.feature_column.numeric_column("median_income")
feature_columns.append(median_income)

# Represent population as a floating-point value.
population = tf.feature_column.numeric_column("population")
feature_columns.append(population)

# Convert the list of feature columns into a layer that will later be fed into
# the model. 
my_feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
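
To make bucketizing and crossing more concrete, here is a minimal NumPy sketch of the idea. It is not how TensorFlow implements these columns: `crossed_column` hashes each bucket pair into `hash_bucket_size` slots, whereas this sketch assigns every pair its own index. The boundaries and Z-score values are made up.

```python
import numpy as np

# Made-up bucket boundaries in Z-score units; 3 boundaries give 4 buckets.
boundaries = np.array([-1.0, 0.0, 1.0])
n_buckets = len(boundaries) + 1

# Made-up normalized coordinates for four examples.
lat_z = np.array([-1.5, -0.2, 0.4, 2.0])
lon_z = np.array([0.5, 0.5, -3.0, 1.0])

# np.digitize returns i such that boundaries[i-1] <= x < boundaries[i].
lat_bucket = np.digitize(lat_z, boundaries)  # [0, 1, 2, 3]
lon_bucket = np.digitize(lon_z, boundaries)  # [2, 2, 0, 3]

# Cross the two bucketized features: one index per (lat, lon) bucket pair.
# Assumption: direct indexing instead of TensorFlow's hashing.
cross_id = lat_bucket * n_buckets + lon_bucket  # [2, 6, 8, 15]

# One-hot encode the crossed feature, similar in spirit to indicator_column.
one_hot = np.eye(n_buckets * n_buckets)[cross_id]

print(cross_id.tolist())  # [2, 6, 8, 15]
print(one_hot.shape)      # (4, 16)
```

With hashing, two different bucket pairs can collide in the same slot; the direct index used here avoids collisions at the cost of a larger one-hot vector.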

## Step 2: build your NN




In [None]:

# Define the plotting function
def plot_curve(epochs, hist, list_of_metrics):
    """Plot a curve of one or more classification metrics vs epoch"""
    plt.figure()
    plt.xlabel("Epoch")
    plt.ylabel("Value")
    
    for m in list_of_metrics:
        x = hist[m]
        plt.plot(epochs[1:], x[1:], label=m)
        
    plt.legend()

In [None]:
def create_model(my_learning_rate, feature_layer, my_metrics):
  """Create and compile a simple neural network model."""
  # Most simple tf.keras models are sequential.
  model = tf.keras.models.Sequential()

  # Add the layer containing the feature columns to the model.
  model.add(feature_layer)

  # Add a single unit with a sigmoid activation, so the model outputs a
  # probability between 0 and 1.
  model.add(tf.keras.layers.Dense(units=1, activation=tf.sigmoid))

  # Compile the model with an optimizer, a loss function, and metrics.
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=my_learning_rate),
                loss=tf.keras.losses.BinaryCrossentropy(),
                metrics=my_metrics)

  return model           


def train_model(model, dataset, epochs, batch_size, label_name):
  """Feed a dataset into the model in order to train it."""

  # Split the dataset into features and label.
  features = {name:np.array(value) for name, value in dataset.items()}
  label = np.array(features.pop(label_name))
  # Store your model.fit results in a 'history' variable.
  history = model.fit(x=features, y=label, batch_size=batch_size,
                      epochs=epochs, shuffle=True)

  # Get details that will be useful for plotting the loss curve.
  epochs = history.epoch
  # Convert the history.history dictionary to a pandas dataframe.
  hist = pd.DataFrame(history.history)


  return epochs, hist   

print("Defined the create_model and train_model functions.")
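
For intuition about what the single sigmoid unit and the binary cross-entropy loss compute, here is a small NumPy sketch. The inputs, weights, and helper names (`sigmoid`, `binary_cross_entropy`) are made up for illustration; Keras additionally clips probabilities to avoid taking the log of zero, which this sketch omits.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy over a batch (no clipping)."""
    losses = labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)
    return float(-np.mean(losses))

# Made-up batch: three examples with two features each.
x = np.array([[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]])
y = np.array([1.0, 0.0, 1.0])

# With all-zero weights and bias, every prediction is exactly 0.5.
w = np.zeros(2)
b = 0.0
p = sigmoid(x @ w + b)

loss = binary_cross_entropy(y, p)
print(p.tolist())      # [0.5, 0.5, 0.5]
print(round(loss, 4))  # 0.6931, which is ln(2)
```

A loss near 0.69 at the start of training therefore means the model is doing no better than predicting 0.5 for every example.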

Run the following code cell to invoke the functions defined in the preceding two code cells.

**Note:** Depending on the version of TensorFlow, running this cell might generate WARNING messages. Please ignore these warnings.

In [None]:
# Hyperparameters
learning_rate = 0.001
epochs = 20
batch_size = 100
label_name = "median_house_value_is_high"
classification_threshold = 0.50


# Establish the metrics the model will measure
METRICS = [
           tf.keras.metrics.BinaryAccuracy(name='accuracy', threshold=classification_threshold),
           tf.keras.metrics.Precision(thresholds=classification_threshold, name='precision'),
           tf.keras.metrics.Recall(thresholds=classification_threshold, name='recall'),
          ]

my_model = create_model(learning_rate, my_feature_layer, METRICS)

epochs, hist = train_model(my_model, df_train_norm, epochs, batch_size, label_name)


list_of_metrics_to_plot = ['accuracy','precision','recall']
plot_curve(epochs, hist, list_of_metrics_to_plot)
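
The three plotted metrics depend on `classification_threshold`. The sketch below computes them by hand in NumPy for made-up predictions, assuming (as the Keras metrics do) that a probability strictly greater than the threshold counts as a positive prediction. The variable names are illustrative.

```python
import numpy as np

# Made-up predicted probabilities and true labels for five examples.
probs = np.array([0.9, 0.6, 0.4, 0.2, 0.7])
labels = np.array([1, 0, 1, 0, 1])
toy_classification_threshold = 0.5

# Probabilities above the threshold become positive predictions.
preds = (probs > toy_classification_threshold).astype(int)  # [1, 1, 0, 0, 1]

tp = int(np.sum((preds == 1) & (labels == 1)))  # true positives: 2
fp = int(np.sum((preds == 1) & (labels == 0)))  # false positives: 1
fn = int(np.sum((preds == 0) & (labels == 1)))  # false negatives: 1
tn = int(np.sum((preds == 0) & (labels == 0)))  # true negatives: 1

accuracy = (tp + tn) / len(labels)  # 3 / 5
precision = tp / (tp + fp)          # 2 / 3
recall = tp / (tp + fn)             # 2 / 3

print(accuracy, round(precision, 3), round(recall, 3))  # 0.6 0.667 0.667
```

Raising the threshold generally trades recall for precision, which is one reason to treat `classification_threshold` as a hyperparameter.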



## Step 3: evaluate your model

In [None]:
test_features = {name:np.array(value) for name, value in df_test_norm.items()}
# Isolate the label.
test_label = np.array(test_features.pop(label_name)) 
print("\n Evaluate the NN model against the test set:")
my_model.evaluate(x = test_features, y = test_label, batch_size=batch_size)