
## Predicting house price using linear regression

Now that we have the basics covered, let us apply these concepts to a real dataset.
We will consider the Boston housing price dataset ([http://lib.stat.cmu.edu/datasets/boston](http://lib.stat.cmu.edu/datasets/boston)), collected by Harrison and Rubinfeld in 1978. The dataset
contains 506 sample cases, each described by 14 attributes:


• CRIM – per capita crime rate by town

• ZN – proportion of residential land zoned for lots over 25,000 sq.ft.

• INDUS – proportion of non-retail business acres per town

• CHAS – Charles River dummy variable (1 if tract bounds river; 0 otherwise)

• NOX – nitric oxide concentration (parts per 10 million)

• RM – average number of rooms per dwelling

• AGE – proportion of owner-occupied units built prior to 1940

• DIS – weighted distances to five Boston employment centers

• RAD – index of accessibility to radial highways

• TAX – full-value property-tax rate per $10,000

• PTRATIO – pupil-teacher ratio by town

• B – 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

• LSTAT – percentage of lower status citizens in the population

• MEDV – median value of owner-occupied homes in $1,000s

In [2]:
# Use the TensorFlow Estimator to build the Linear Regression model
# Import the modules

import tensorflow as tf
import pandas as pd
import tensorflow.feature_column as fc
from tensorflow.keras.datasets import boston_housing

In [3]:
# Download the dataset
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz


In [5]:
# Define the feature names and convert the arrays into pandas DataFrames

features = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
            'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

x_train_df = pd.DataFrame(x_train, columns=features)
x_test_df = pd.DataFrame(x_test, columns=features)
y_train_df = pd.DataFrame(y_train, columns=['MEDV'])
y_test_df = pd.DataFrame(y_test, columns=['MEDV'])


In [6]:
x_train_df.head()

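| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.23247 | 0.0 | 8.14 | 0.0 | 0.538 | 6.142 | 91.7 | 3.9769 | 4.0 | 307.0 | 21.0 | 396.9 | 18.72 |
| 1 | 0.02177 | 82.5 | 2.03 | 0.0 | 0.415 | 7.61 | 15.7 | 6.27 | 2.0 | 348.0 | 14.7 | 395.38 | 3.11 |
| 2 | 4.89822 | 0.0 | 18.1 | 0.0 | 0.631 | 4.97 | 100.0 | 1.3325 | 24.0 | 666.0 | 20.2 | 375.52 | 3.26 |
| 3 | 0.03961 | 0.0 | 5.19 | 0.0 | 0.515 | 6.037 | 34.5 | 5.9853 | 5.0 | 224.0 | 20.2 | 396.9 | 8.01 |
| 4 | 3.69311 | 0.0 | 18.1 | 0.0 | 0.713 | 6.376 | 88.4 | 2.5671 | 24.0 | 666.0 | 20.2 | 391.43 | 14.65 |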


In [7]:
x_test_df.head()

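| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0846 | 0.0 | 18.1 | 0.0 | 0.679 | 6.434 | 100.0 | 1.8347 | 24.0 | 666.0 | 20.2 | 27.25 | 29.05 |
| 1 | 0.12329 | 0.0 | 10.01 | 0.0 | 0.547 | 5.913 | 92.9 | 2.3534 | 6.0 | 432.0 | 17.8 | 394.95 | 16.21 |
| 2 | 0.05497 | 0.0 | 5.19 | 0.0 | 0.515 | 5.985 | 45.4 | 4.8122 | 5.0 | 224.0 | 20.2 | 396.9 | 9.74 |
| 3 | 1.27346 | 0.0 | 19.58 | 1.0 | 0.605 | 6.25 | 92.6 | 1.7984 | 5.0 | 403.0 | 14.7 | 338.92 | 5.5 |
| 4 | 0.07151 | 0.0 | 4.49 | 0.0 | 0.449 | 6.121 | 56.8 | 3.7476 | 3.0 | 247.0 | 18.5 | 395.15 | 8.44 |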


In [8]:
y_train_df.head()

| | MEDV |
|---|---|
| 0 | 15.2 |
| 1 | 42.3 |
| 2 | 50.0 |
| 3 | 21.1 |
| 4 | 17.7 |


In [9]:
y_test_df.head()

| | MEDV |
|---|---|
| 0 | 7.2 |
| 1 | 18.8 |
| 2 | 19.0 |
| 3 | 27.0 |
| 4 | 22.2 |


In [10]:
# NOTE: We use all of the features here; later we could keep only the ones that matter.

feature_columns = []
for feature_name in features:
  feature_columns.append(fc.numeric_column(feature_name, dtype=tf.float32))

In [11]:
feature_columns

[NumericColumn(key='CRIM', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='ZN', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='INDUS', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='CHAS', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='NOX', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='RM', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='AGE', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='DIS', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='RAD', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='TAX', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PTRATIO'

In [15]:
# Create the input function for the estimator. The returned function builds a tf.data.Dataset
# that yields (features, labels) tuples in batches; it is used to create "train_input_fn" and "val_input_fn".

def estimator_input_fn(df_data, df_label, epochs=10, shuffle=True, batch_size=32):
  def input_function():
    # Map each column name to its values so the keys match the feature columns
    ds = tf.data.Dataset.from_tensor_slices((dict(df_data), df_label))
    if shuffle:
      ds = ds.shuffle(100)  # shuffle buffer of 100 elements
    ds = ds.batch(batch_size).repeat(epochs)
    return ds

  return input_function

train_input_fn = estimator_input_fn(x_train_df, y_train_df)
val_input_fn = estimator_input_fn(x_test_df, y_test_df, epochs=1, shuffle=False)
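
As a rough check of how much data these input functions supply, the number of batches can be worked out by hand. The sketch below assumes the default split of `boston_housing.load_data()` (404 training and 102 test rows out of the 506 samples). `batches_supplied` is a hypothetical helper introduced only for this illustration and is not part of TensorFlow.

```python
import math

def batches_supplied(n_rows, batch_size=32, epochs=10):
    """Count the batches yielded by ds.batch(batch_size).repeat(epochs).

    Hypothetical helper for illustration. batch() keeps the final partial
    batch by default, so one epoch yields ceil(n_rows / batch_size) batches.
    """
    return math.ceil(n_rows / batch_size) * epochs

# Row counts assume the default test_split=0.2 of load_data().
train_batches = batches_supplied(404)           # defaults used by train_input_fn
val_batches = batches_supplied(102, epochs=1)   # settings used by val_input_fn

print(train_batches, val_batches)
```

Under these assumptions the training input function can supply 130 batches and the validation input function 4, so a `train` call limited to `steps=100` stops before the training data runs out.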


In [17]:
# Instantiate a LinearRegressor estimator, train it using "train_input_fn", and get the
# validation results by evaluating the trained model using "val_input_fn"

linear_est = tf.estimator.LinearRegressor(feature_columns=feature_columns)
linear_est.train(train_input_fn, steps=100)
result = linear_est.evaluate(val_input_fn)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpsn5ifiuc', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and 

In [19]:
# Make a prediction

result = linear_est.predict(val_input_fn)
for pred, exp in zip(result, y_test[:32]):
  print("Predicted Value: ", pred['predictions'][0], "Expected: ", exp)

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpsn5ifiuc/model.ckpt-100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Predicted Value:  6.012524 Expected:  7.2
Predicted Value:  25.9571 Expected:  18.8
Predicted Value:  23.776726 Expected:  19.0
Predicted Value:  24.282825 Expected:  27.0
Predicted Value:  24.420626 Expected:  22.2
Predicted Value:  24.00523 Expected:  24.5
Predicted Value:  31.369152 Expected:  31.2
Predicted Value:  27.383957 Expected:  22.9
Predicted Value:  22.906439 Expected:  20.5
Predicted Value:  26.68546 Expected:  23.
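
To put a number on how close these predictions are, the printed pairs can be summarized with the mean absolute error (MAE) and the root mean squared error (RMSE). This is a minimal sketch in plain Python: the values are copied from the first nine output lines (the tenth expected value is cut off in the output, so that pair is left out), and the variable names are chosen for illustration.

```python
import math

# First nine (predicted, expected) pairs copied from the output above.
predicted = [6.012524, 25.9571, 23.776726, 24.282825, 24.420626,
             24.00523, 31.369152, 27.383957, 22.906439]
expected = [7.2, 18.8, 19.0, 27.0, 22.2, 24.5, 31.2, 22.9, 20.5]

# Per-sample error: prediction minus ground truth.
errors = [p - e for p, e in zip(predicted, expected)]

mae = sum(abs(err) for err in errors) / len(errors)
rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Because MEDV is expressed in $1,000s, both figures are in thousands of dollars. Nine samples are too few to judge the model, so the dictionary returned by `linear_est.evaluate(val_input_fn)` remains the better summary of the whole validation set.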

The TensorBoard graph for the estimator shows the flow of data, ops, and nodes used in the whole process. To get this graph, you just need to define model_dir while instantiating the Estimator class:

In [21]:
linear_est = tf.estimator.LinearRegressor(feature_columns=feature_columns, model_dir = 'logs/func/')

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'logs/func/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
