<a href="https://colab.research.google.com/github/rkrissada/google_ml_training/blob/master/c_neuralnetwork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Network

**Learning Objectives:**
  * Use the `DNNRegressor` class in TensorFlow to predict median housing price

The data is based on 1990 census data from California. It is aggregated at the city block level, so features such as `total_rooms` and `population` are totals for a whole block, not for a single house.

Let's use a set of features to predict house value.

## Set Up
In this first cell, we'll load the necessary libraries.

In [0]:
import math
import shutil
import numpy as np
import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format



Next, we'll load our data set.

In [0]:
df = pd.read_csv("https://storage.googleapis.com/ml_universities/california_housing_train.csv", sep=",")

## Examine the data

It's a good idea to get to know your data a little before you work with it.

We'll look at the first few rows, then print a quick summary of useful statistics for each column: count, mean, standard deviation, min, max, and quartiles.

In [0]:
df.head()

index,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.3,34.2,15.0,5612.0,1283.0,1015.0,472.0,1.5,66900.0
1,-114.5,34.4,19.0,7650.0,1901.0,1129.0,463.0,1.8,80100.0
2,-114.6,33.7,17.0,720.0,174.0,333.0,117.0,1.7,85700.0
3,-114.6,33.6,14.0,1501.0,337.0,515.0,226.0,3.2,73400.0
4,-114.6,33.6,20.0,1454.0,326.0,624.0,262.0,1.9,65500.0


In [0]:
df.describe()

statistic,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
count,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0
mean,-119.6,35.6,28.6,2643.7,539.4,1429.6,501.2,3.9,207300.9
std,2.0,2.1,12.6,2179.9,421.5,1147.9,384.5,1.9,115983.8
min,-124.3,32.5,1.0,2.0,1.0,3.0,1.0,0.5,14999.0
25%,-121.8,33.9,18.0,1462.0,297.0,790.0,282.0,2.6,119400.0
50%,-118.5,34.2,29.0,2127.0,434.0,1167.0,409.0,3.5,180400.0
75%,-118.0,37.7,37.0,3151.2,648.2,1721.0,605.2,4.8,265000.0
max,-114.3,42.0,52.0,37937.0,6445.0,35682.0,6082.0,15.0,500001.0


As noted earlier, this data is at the city block level, so features such as `total_rooms` and `population` are totals for the whole block. Because we are predicting the price of a single house, we should make our features correspond to a single house as well. Let's create more appropriate features by dividing the block totals by the number of households.

In [0]:
df['num_rooms'] = df['total_rooms'] / df['households']
df['num_bedrooms'] = df['total_bedrooms'] / df['households']
df['persons_per_house'] = df['population'] / df['households']
df.describe()

statistic,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,num_rooms,num_bedrooms,persons_per_house
count,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0
mean,-119.6,35.6,28.6,2643.7,539.4,1429.6,501.2,3.9,207300.9,5.4,1.1,3.0
std,2.0,2.1,12.6,2179.9,421.5,1147.9,384.5,1.9,115983.8,2.5,0.5,4.0
min,-124.3,32.5,1.0,2.0,1.0,3.0,1.0,0.5,14999.0,0.8,0.3,0.7
25%,-121.8,33.9,18.0,1462.0,297.0,790.0,282.0,2.6,119400.0,4.4,1.0,2.4
50%,-118.5,34.2,29.0,2127.0,434.0,1167.0,409.0,3.5,180400.0,5.2,1.0,2.8
75%,-118.0,37.7,37.0,3151.2,648.2,1721.0,605.2,4.8,265000.0,6.1,1.1,3.3
max,-114.3,42.0,52.0,37937.0,6445.0,35682.0,6082.0,15.0,500001.0,141.9,34.1,502.5
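
To make the per-household conversion concrete, here is a minimal sketch on a two-row toy frame. The values are copied from rows 0 and 2 of the `df.head()` output above; the name `toy` is only for illustration and is not part of the original notebook.

```python
import pandas as pd

# Two block-level rows copied from the df.head() output (rows 0 and 2).
toy = pd.DataFrame({
    'total_rooms': [5612.0, 720.0],
    'total_bedrooms': [1283.0, 174.0],
    'population': [1015.0, 333.0],
    'households': [472.0, 117.0],
})

# Divide each block total by the number of households in that block.
toy['num_rooms'] = toy['total_rooms'] / toy['households']
toy['num_bedrooms'] = toy['total_bedrooms'] / toy['households']
toy['persons_per_house'] = toy['population'] / toy['households']

print(toy[['num_rooms', 'num_bedrooms', 'persons_per_house']].round(2))
```

The first block works out to roughly 11.9 rooms per household, well above the dataset median of 5.2, which hints at the long tail visible in the `max` row.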


In [0]:
df.drop(['total_rooms', 'total_bedrooms', 'population', 'households'], axis = 1, inplace = True)
df.describe()

statistic,longitude,latitude,housing_median_age,median_income,median_house_value,num_rooms,num_bedrooms,persons_per_house
count,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0
mean,-119.6,35.6,28.6,3.9,207300.9,5.4,1.1,3.0
std,2.0,2.1,12.6,1.9,115983.8,2.5,0.5,4.0
min,-124.3,32.5,1.0,0.5,14999.0,0.8,0.3,0.7
25%,-121.8,33.9,18.0,2.6,119400.0,4.4,1.0,2.4
50%,-118.5,34.2,29.0,3.5,180400.0,5.2,1.0,2.8
75%,-118.0,37.7,37.0,4.8,265000.0,6.1,1.1,3.3
max,-114.3,42.0,52.0,15.0,500001.0,141.9,34.1,502.5


## Build a neural network model

In this exercise, we'll be trying to predict `median_house_value`. It will be our label (sometimes also called a target). We'll use the remaining columns as our input features.

To train our model, we'll first use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LinearRegressor) interface. Then we'll switch to `DNNRegressor`.


In [0]:
featcols = {
  colname : tf.feature_column.numeric_column(colname) \
    for colname in 'housing_median_age,median_income,num_rooms,num_bedrooms,persons_per_house'.split(',')
}
# Bucketize lat, lon so it's not so high-res; California is mostly N-S, so more lats than lons
featcols['longitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('longitude'),
                                                   np.linspace(-124.3, -114.3, 5).tolist())
featcols['latitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('latitude'),
                                                  np.linspace(32.5, 42, 10).tolist())
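
To see what the bucket boundaries look like, the sketch below rebuilds them with NumPy alone. It uses `np.digitize` as a stand-in for the bucket lookup, on the assumption that both use left-closed intervals `[b_i, b_{i+1})`; the variable names are illustrative only.

```python
import numpy as np

# Same boundary definitions as in the cell above.
lon_boundaries = np.linspace(-124.3, -114.3, 5).tolist()
lat_boundaries = np.linspace(32.5, 42, 10).tolist()

# N boundaries split the number line into N + 1 buckets
# (one below the first boundary, one at or above the last).
num_lon_buckets = len(lon_boundaries) + 1
num_lat_buckets = len(lat_boundaries) + 1

# Look up the bucket index for the median coordinates from df.describe().
median_lon, median_lat = -118.5, 34.2
lon_bucket = int(np.digitize(median_lon, lon_boundaries))
lat_bucket = int(np.digitize(median_lat, lat_boundaries))

print(num_lon_buckets, num_lat_buckets)
print(lon_bucket, lat_bucket)
```

Because the outermost boundaries sit at the displayed minimum and maximum of each coordinate, the first and last buckets are likely to hold very few rows.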

In [0]:
featcols.keys()

['median_income',
 'persons_per_house',
 'num_rooms',
 'housing_median_age',
 'longitude',
 'num_bedrooms',
 'latitude']

In [0]:
# Split into train and eval
msk = np.random.rand(len(df)) < 0.8
traindf = df[msk]
evaldf = df[~msk]

SCALE = 100000
BATCH_SIZE = 100
OUTDIR = './housing_trained'
train_input_fn = tf.estimator.inputs.pandas_input_fn(x = traindf[list(featcols.keys())],
                                                    y = traindf["median_house_value"] / SCALE,
                                                    num_epochs = None,
                                                    batch_size = BATCH_SIZE,
                                                    shuffle = True)
eval_input_fn = tf.estimator.inputs.pandas_input_fn(x = evaldf[list(featcols.keys())],
                                                    y = evaldf["median_house_value"] / SCALE,  # note the scaling
                                                    num_epochs = 1, 
                                                    batch_size = len(evaldf), 
                                                    shuffle=False)
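
The random mask gives only an approximate 80/20 split, and the two parts never overlap. The sketch below checks both properties on a toy frame. The fixed seed and the repeated toy labels are assumptions made for reproducibility; the cell above is unseeded.

```python
import numpy as np
import pandas as pd

# Toy labels: five values from df.head(), repeated to get 1000 rows.
toy = pd.DataFrame(
    {'median_house_value': [66900.0, 80100.0, 85700.0, 73400.0, 65500.0] * 200})

# Seeded generator so the split is repeatable (assumption, not in the notebook).
rng = np.random.RandomState(42)
msk = rng.rand(len(toy)) < 0.8
train_toy, eval_toy = toy[msk], toy[~msk]

# Labels are divided by SCALE so the model trains on values near 1.
SCALE = 100000
scaled_labels = train_toy['median_house_value'] / SCALE

print(len(train_toy), len(eval_toy))
print(scaled_labels.min(), scaled_labels.max())
```

Every row lands in exactly one of the two frames, so their lengths always add up to the original row count.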

In [0]:
# Linear Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.LinearRegressor(
                       model_dir = output_dir, 
                       feature_columns = featcols.values(),
                       optimizer = myopt)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training    
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f44a2293b90>, '_evaluation_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './housing_trained', '_global_id_in_cluster': 0, '_save_summary_steps': 100}
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f44a2293bd0>, '_evaluation_master': '', '_save_checkpoints_steps': None, '

INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-4263
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 4264 into ./housing_trained/model.ckpt.
INFO:tensorflow:loss = 64.494576, step = 4264
INFO:tensorflow:global_step/sec: 189.55
INFO:tensorflow:loss = 39.13523, step = 4364 (0.531 sec)
INFO:tensorflow:global_step/sec: 273.27
INFO:tensorflow:loss = 28.833477, step = 4464 (0.367 sec)
INFO:tensorflow:global_step/sec: 249.415
INFO:tensorflow:loss = 32.985798, step = 4564 (0.402 sec)
INFO:tensorflow:global_step/sec: 223.674
INFO:tensorflow:loss = 33.826633, step = 4664 (0.446 sec)
INFO:tensorflow:global_step/sec: 238.027
INFO:tensorflow:loss = 155.46606, step = 4764 (0.421 sec)
INFO:tensorflow:global_step/sec: 240.804
INFO:tensorflow:loss = 34.319412, step = 4864 (0.415 sec)
INFO:tensorflow:global_step/sec: 273.676
INFO:tensorflow:loss = 105.47386, step = 4964 (0.364 sec)
INFO:tensorflow:global_ste

INFO:tensorflow:loss = 51.75376, step = 9933 (0.373 sec)
INFO:tensorflow:global_step/sec: 270.753
INFO:tensorflow:loss = 38.03592, step = 10033 (0.369 sec)
INFO:tensorflow:global_step/sec: 256.451
INFO:tensorflow:loss = 87.416374, step = 10133 (0.390 sec)
INFO:tensorflow:global_step/sec: 290.835
INFO:tensorflow:loss = 52.78492, step = 10233 (0.343 sec)
INFO:tensorflow:global_step/sec: 305.912
INFO:tensorflow:loss = 58.572643, step = 10333 (0.327 sec)
INFO:tensorflow:global_step/sec: 282.423
INFO:tensorflow:loss = 79.86046, step = 10433 (0.357 sec)
INFO:tensorflow:global_step/sec: 240.878
INFO:tensorflow:loss = 27.952656, step = 10533 (0.414 sec)
INFO:tensorflow:global_step/sec: 229.878
INFO:tensorflow:loss = 65.3454, step = 10633 (0.436 sec)
INFO:tensorflow:global_step/sec: 247.302
INFO:tensorflow:loss = 118.04323, step = 10733 (0.403 sec)
INFO:tensorflow:global_step/sec: 307.46
INFO:tensorflow:loss = 46.9958, step = 10833 (0.326 sec)
INFO:tensorflow:Saving checkpoints for 10913 into .
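
The `num_train_steps` expression amounts to about 100 passes over the training set: each step consumes `BATCH_SIZE` rows, so 100 epochs need `100 * len(traindf) / BATCH_SIZE` steps. A small sketch of that arithmetic follows. Here `steps_for_epochs` is a hypothetical helper, and 13,600 is simply 80% of the 17,000 rows; the real count varies with the random mask.

```python
def steps_for_epochs(num_examples, batch_size, num_epochs):
    """Return the number of steps needed to see the data num_epochs times."""
    # Integer division keeps the step count a whole number.
    return (num_epochs * num_examples) // batch_size


BATCH_SIZE = 100
NUM_EPOCHS = 100

steps = steps_for_epochs(13600, BATCH_SIZE, NUM_EPOCHS)
print(steps)
```

With a batch size of 100 and 100 epochs, the two factors cancel, so the step count equals the number of training rows.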

In [0]:
# DNN Regressor
def train_and_evaluate(output_dir, num_train_steps):
  myopt = tf.train.FtrlOptimizer(learning_rate = 0.01) # note the learning rate
  estimator = tf.estimator.DNNRegressor(
                       model_dir = output_dir, 
                       hidden_units=[7, 7, 7],
                       feature_columns = featcols.values(),
                       optimizer = myopt)
  
  #Add rmse evaluation metric
  def rmse(labels, predictions):
    pred_values = tf.cast(predictions['predictions'],tf.float64)
    return {'rmse': tf.metrics.root_mean_squared_error(labels*SCALE, pred_values*SCALE)}
  estimator = tf.contrib.estimator.add_metrics(estimator,rmse)
  
  train_spec=tf.estimator.TrainSpec(
                       input_fn = train_input_fn,
                       max_steps = num_train_steps)
  eval_spec=tf.estimator.EvalSpec(
                       input_fn = eval_input_fn,
                       steps = None,
                       start_delay_secs = 1, # start evaluating after N seconds
                       throttle_secs = 10,  # evaluate every N seconds
                       )
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

# Run training    
shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time
tf.summary.FileWriterCache.clear() # ensure filewriter cache is clear for TensorBoard events file
train_and_evaluate(OUTDIR, num_train_steps = (100 * len(traindf)) // BATCH_SIZE)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4496f3df90>, '_evaluation_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './housing_trained', '_global_id_in_cluster': 0, '_save_summary_steps': 100}
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4496f3dbd0>, '_evaluation_master': '', '_save_checkpoints_steps': None, '

INFO:tensorflow:global_step/sec: 95.2391
INFO:tensorflow:loss = 32.254303, step = 4143 (1.048 sec)
INFO:tensorflow:global_step/sec: 97.7765
INFO:tensorflow:loss = 41.006035, step = 4243 (1.024 sec)
INFO:tensorflow:global_step/sec: 186.901
INFO:tensorflow:loss = 127.63993, step = 4343 (0.534 sec)
INFO:tensorflow:global_step/sec: 261.438
INFO:tensorflow:loss = 61.32528, step = 4443 (0.382 sec)
INFO:tensorflow:global_step/sec: 108.977
INFO:tensorflow:loss = 99.807686, step = 4543 (0.917 sec)
INFO:tensorflow:Saving checkpoints for 4577 into ./housing_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 47.9686.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-05-02-02:24:26
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-4577
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done run

INFO:tensorflow:loss = 77.770386, step = 7637 (0.339 sec)
INFO:tensorflow:global_step/sec: 284.724
INFO:tensorflow:loss = 36.561245, step = 7737 (0.352 sec)
INFO:tensorflow:global_step/sec: 279.919
INFO:tensorflow:loss = 68.21945, step = 7837 (0.357 sec)
INFO:tensorflow:global_step/sec: 268.239
INFO:tensorflow:loss = 32.258266, step = 7937 (0.371 sec)
INFO:tensorflow:global_step/sec: 304.381
INFO:tensorflow:loss = 26.853617, step = 8037 (0.328 sec)
INFO:tensorflow:global_step/sec: 287.781
INFO:tensorflow:loss = 31.573317, step = 8137 (0.348 sec)
INFO:tensorflow:global_step/sec: 272.543
INFO:tensorflow:loss = 125.668686, step = 8237 (0.370 sec)
INFO:tensorflow:global_step/sec: 311.227
INFO:tensorflow:loss = 34.386436, step = 8337 (0.318 sec)
INFO:tensorflow:global_step/sec: 278.247
INFO:tensorflow:loss = 68.667816, step = 8437 (0.360 sec)
INFO:tensorflow:global_step/sec: 275.623
INFO:tensorflow:loss = 16.684097, step = 8537 (0.362 sec)
INFO:tensorflow:global_step/sec: 285.855
INFO:tenso

INFO:tensorflow:loss = 100.95195, step = 13516 (0.327 sec)
INFO:tensorflow:global_step/sec: 286.554
INFO:tensorflow:loss = 30.216936, step = 13616 (0.349 sec)
INFO:tensorflow:Saving checkpoints for 13651 into ./housing_trained/model.ckpt.
INFO:tensorflow:Loss for final step: 75.59654.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-05-02-02:25:50
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./housing_trained/model.ckpt-13651
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-05-02-02:25:50
INFO:tensorflow:Saving dict for global step 13651: average_loss = 0.5133354, global_step = 13651, loss = 1719.1603, rmse = 71647.43
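
The numbers in the final evaluation line can be cross-checked by hand. The sketch below assumes that `average_loss` is the mean squared error on the scaled labels and that `loss` is the sum over the single evaluation batch; all three constants are copied from the log line above.

```python
import math

SCALE = 100000
average_loss = 0.5133354   # assumed: mean squared error in scaled units
total_loss = 1719.1603     # assumed: squared error summed over the eval batch
reported_rmse = 71647.43   # custom 'rmse' metric, in dollars

# Undo the label scaling: RMSE in dollars = sqrt(MSE in scaled units) * SCALE.
rmse_dollars = math.sqrt(average_loss) * SCALE

# The ratio of summed loss to mean loss gives the number of evaluation rows.
eval_rows = round(total_loss / average_loss)

print(round(rmse_dollars, 1), eval_rows)
```

Under these assumptions the evaluation set holds 3,349 rows, which together with the 13,651 training steps (one per training row here) accounts for all 17,000 rows in the dataset.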


In [0]:
from google.datalab.ml import TensorBoard
pid = TensorBoard().start(OUTDIR)

In [0]:
TensorBoard().stop(pid)