## Regression using TF-Regression model

California Housing Data

This data set contains information about all the block groups in California from the 1990 Census. In this sample, a block group includes 1425.5 individuals on average, living in a geographically compact area.

The task is to approximate the median house value of each block from the values of the remaining variables.

The data set was obtained from the LIACC repository. The original page where it can be found is: http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.


In [1]:
import tensorflow as tf
import pandas as pd

In [2]:
hd=pd.read_csv('cal_housing_clean.csv')

In [3]:
hd.head()

| Unnamed: 0 | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|
| 0 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |
| 1 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |
| 2 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |
| 3 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |
| 4 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |


In [4]:
hd['housingMedianAge'].hist(bins=20)

<matplotlib.axes._subplots.AxesSubplot at 0x181135da7b8>

# Normalise the data

In [5]:
#hd.columns

In [6]:
colmns_to_norm=['housingMedianAge', 'totalRooms', 'totalBedrooms', 'population', 'households', 'medianIncome']

In [7]:
hd[colmns_to_norm]=hd[colmns_to_norm].apply(lambda x: (x-x.min())/(x.max()-x.min()))
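
As a minimal sketch of what the lambda above does, the following applies the same min-max formula to a small made-up frame. The `toy` values and the helper name `min_max` are invented for illustration and are not taken from the housing data. Each column ends up in the range [0, 1].

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
toy = pd.DataFrame({
    "housingMedianAge": [10.0, 30.0, 50.0],
    "medianIncome": [2.0, 4.0, 10.0],
})

def min_max(col: pd.Series) -> pd.Series:
    """Rescale a column linearly so its minimum maps to 0 and its maximum to 1."""
    return (col - col.min()) / (col.max() - col.min())

# Same operation as the notebook cell, applied column by column.
scaled = toy.apply(min_max)
print(scaled)
```

Note that a constant column would make `col.max() - col.min()` zero, so the result for that column would be NaN rather than a usable feature.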

In [8]:
x_data=hd.drop('medianHouseValue', axis=1)

In [9]:
y_val=hd['medianHouseValue']

In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x_data,y_val,test_size=0.3,random_state=101)

In [11]:
#from sklearn.preprocessing import MinMaxScaler
#scaler = MinMaxScaler()
#scaler.fit(X_train)
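
The commented-out cell above hints at an alternative: fitting the scaler on the training split only, so that minima and maxima from the test split do not influence the scaling. A minimal pandas-only sketch of that idea, using invented values and hypothetical names (`train`, `test`, `fit_min_max`, `apply_min_max`), might look like this:

```python
import pandas as pd

# Hypothetical splits; the values are illustrative only.
train = pd.DataFrame({"population": [100.0, 300.0, 500.0]})
test = pd.DataFrame({"population": [200.0, 700.0]})

def fit_min_max(df: pd.DataFrame):
    """Learn per-column minimum and maximum from the training data."""
    return df.min(), df.max()

def apply_min_max(df: pd.DataFrame, lo: pd.Series, hi: pd.Series) -> pd.DataFrame:
    """Scale any frame with statistics learned elsewhere."""
    return (df - lo) / (hi - lo)

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
print(test_scaled)
```

With this approach, test values can fall outside [0, 1] when they lie beyond the range seen in training, which is expected behaviour rather than an error.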

In [12]:
X_train.head()

| Unnamed: 0 | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome |
|---|---|---|---|---|---|---|
| 6761 | 0.352941 | 0.069688 | 0.117163 | 0.039043 | 0.115442 | 0.142508 |
| 3010 | 0.607843 | 0.011242 | 0.015673 | 0.006699 | 0.014142 | 0.045027 |
| 7812 | 0.666667 | 0.02523 | 0.031347 | 0.016789 | 0.030258 | 0.212866 |
| 8480 | 0.666667 | 0.03253 | 0.03383 | 0.019816 | 0.030094 | 0.298651 |
| 1051 | 0.294118 | 0.031919 | 0.035692 | 0.015583 | 0.034863 | 0.272631 |


# Assign the Feature columns

In [13]:
hma=tf.feature_column.numeric_column('housingMedianAge')
tr=tf.feature_column.numeric_column('totalRooms')
tbdr=tf.feature_column.numeric_column('totalBedrooms')
popln=tf.feature_column.numeric_column('population')
hholds=tf.feature_column.numeric_column('households')
mi=tf.feature_column.numeric_column('medianIncome')


In [14]:
feat_cols=[hma,tr,tbdr,popln,hholds,mi]


In [15]:
## Build the input function

In [16]:
input_func=tf.estimator.inputs.pandas_input_fn(X_train,y_train,batch_size=10,num_epochs=1000,shuffle=True)


In [17]:
model = tf.estimator.DNNRegressor(hidden_units=[6,6,6],feature_columns=feat_cols)


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\raprabhu\\AppData\\Local\\Temp\\tmpfd8yec0o', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001811AB93FD0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
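
To get a feel for how small the `hidden_units=[6,6,6]` network is, the sketch below counts its trainable parameters. It assumes plain fully connected layers with one bias per unit, six input features, and a single output; the helper `dense_param_count` is hypothetical and not part of TensorFlow.

```python
def dense_param_count(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [n_inputs, *hidden_units, n_outputs]
    # Each layer has fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

# Six numeric feature columns feed three hidden layers of six units each.
total = dense_param_count(6, [6, 6, 6])
print(total)  # 133
```

Under these assumptions the model has only 133 parameters, which helps explain why each training step is fast.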


In [18]:
model.train(input_fn=input_func,steps=25000)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\raprabhu\AppData\Local\Temp\tmpfd8yec0o\model.ckpt.
INFO:tensorflow:loss = 709571900000.0, step = 1
INFO:tensorflow:global_step/sec: 272.022
INFO:tensorflow:loss = 358656500000.0, step = 101 (0.381 sec)
INFO:tensorflow:global_step/sec: 367.888
INFO:tensorflow:loss = 426357700000.0, step = 201 (0.260 sec)
INFO:tensorflow:global_step/sec: 

INFO:tensorflow:global_step/sec: 364.658
INFO:tensorflow:loss = 66728610000.0, step = 6001 (0.269 sec)
INFO:tensorflow:global_step/sec: 354.271
INFO:tensorflow:loss = 93127960000.0, step = 6101 (0.283 sec)
INFO:tensorflow:global_step/sec: 394.89
INFO:tensorflow:loss = 58089914000.0, step = 6201 (0.270 sec)
INFO:tensorflow:global_step/sec: 269.103
INFO:tensorflow:loss = 23833584000.0, step = 6301 (0.355 sec)
INFO:tensorflow:global_step/sec: 471.74
INFO:tensorflow:loss = 146701480000.0, step = 6401 (0.225 sec)
INFO:tensorflow:global_step/sec: 374.445
INFO:tensorflow:loss = 48553615000.0, step = 6501 (0.252 sec)
INFO:tensorflow:global_step/sec: 377.012
INFO:tensorflow:loss = 84700594000.0, step = 6601 (0.283 sec)
INFO:tensorflow:global_step/sec: 355.065
INFO:tensorflow:loss = 106794210000.0, step = 6701 (0.265 sec)
INFO:tensorflow:global_step/sec: 350.278
INFO:tensorflow:loss = 69176690000.0, step = 6801 (0.285 sec)
INFO:tensorflow:global_step/sec: 402.477
INFO:tensorflow:loss = 669912500

INFO:tensorflow:global_step/sec: 402.806
INFO:tensorflow:loss = 124720280000.0, step = 13901 (0.249 sec)
INFO:tensorflow:global_step/sec: 400.579
INFO:tensorflow:loss = 70453790000.0, step = 14001 (0.265 sec)
INFO:tensorflow:global_step/sec: 373.37
INFO:tensorflow:loss = 101294270000.0, step = 14101 (0.251 sec)
INFO:tensorflow:global_step/sec: 375.808
INFO:tensorflow:loss = 66258900000.0, step = 14201 (0.268 sec)
INFO:tensorflow:global_step/sec: 414.457
INFO:tensorflow:loss = 57907750000.0, step = 14301 (0.256 sec)
INFO:tensorflow:global_step/sec: 382.175
INFO:tensorflow:loss = 58923934000.0, step = 14401 (0.249 sec)
INFO:tensorflow:global_step/sec: 358.078
INFO:tensorflow:loss = 88214020000.0, step = 14501 (0.275 sec)
INFO:tensorflow:global_step/sec: 399.691
INFO:tensorflow:loss = 29235804000.0, step = 14601 (0.250 sec)
INFO:tensorflow:global_step/sec: 397.859
INFO:tensorflow:loss = 82348590000.0, step = 14701 (0.251 sec)
INFO:tensorflow:global_step/sec: 402.595
INFO:tensorflow:loss =

INFO:tensorflow:global_step/sec: 411.726
INFO:tensorflow:loss = 64568430000.0, step = 21801 (0.241 sec)
INFO:tensorflow:global_step/sec: 385.38
INFO:tensorflow:loss = 55225836000.0, step = 21901 (0.277 sec)
INFO:tensorflow:global_step/sec: 383.711
INFO:tensorflow:loss = 80297290000.0, step = 22001 (0.256 sec)
INFO:tensorflow:global_step/sec: 345.846
INFO:tensorflow:loss = 82651460000.0, step = 22101 (0.276 sec)
INFO:tensorflow:global_step/sec: 375.349
INFO:tensorflow:loss = 46054863000.0, step = 22201 (0.279 sec)
INFO:tensorflow:global_step/sec: 369.879
INFO:tensorflow:loss = 40531260000.0, step = 22301 (0.265 sec)
INFO:tensorflow:global_step/sec: 366.226
INFO:tensorflow:loss = 108961210000.0, step = 22401 (0.267 sec)
INFO:tensorflow:global_step/sec: 384.877
INFO:tensorflow:loss = 134784720000.0, step = 22501 (0.260 sec)
INFO:tensorflow:global_step/sec: 384.1
INFO:tensorflow:loss = 86715270000.0, step = 22601 (0.260 sec)
INFO:tensorflow:global_step/sec: 395.381
INFO:tensorflow:loss = 4

<tensorflow_estimator.python.estimator.canned.dnn.DNNRegressor at 0x1811ab93c50>
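
Training stops at whichever limit is reached first: the `steps` argument or the end of the data supplied by the input function. The sketch below estimates how many passes over the training data 25000 steps of batch size 10 amount to. The row count of 14448 is an assumption (70% of the 20640 rows in the standard California housing data; the cleaned CSV used here may differ), and both helper functions are hypothetical.

```python
import math

def steps_run(n_rows, batch_size, num_epochs, max_steps):
    """Steps actually executed: limited by the data or by max_steps."""
    available = math.ceil(n_rows * num_epochs / batch_size)
    return min(available, max_steps)

def epochs_seen(n_rows, batch_size, steps):
    """Approximate number of passes over the training data."""
    return steps * batch_size / n_rows

N_TRAIN = 14448  # assumed training-set size, see the note above

executed = steps_run(N_TRAIN, batch_size=10, num_epochs=1000, max_steps=25000)
passes = epochs_seen(N_TRAIN, batch_size=10, steps=executed)
print(executed, round(passes, 1))
```

Under that assumption the step limit is the binding one, and the model sees each training row roughly 17 times, far fewer than the 1000 epochs the input function would allow.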

# Model Evaluation

In [19]:
pred_input_func = tf.estimator.inputs.pandas_input_fn(
      x=X_test,
      batch_size=10,
      num_epochs=1,
      shuffle=False)

In [20]:
results=model.predict(pred_input_func)

In [21]:
predictions = list(results)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from C:\Users\raprabhu\AppData\Local\Temp\tmpfd8yec0o\model.ckpt-25000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [25]:
predictions

[{'predictions': array([244515.84], dtype=float32)},
 {'predictions': array([337564.22], dtype=float32)},
 {'predictions': array([209822.61], dtype=float32)},
 {'predictions': array([192088.72], dtype=float32)},
 {'predictions': array([301004.03], dtype=float32)},
 {'predictions': array([198079.25], dtype=float32)},
 {'predictions': array([223456.16], dtype=float32)},
 {'predictions': array([206814.52], dtype=float32)},
 {'predictions': array([227207.75], dtype=float32)},
 {'predictions': array([213698.61], dtype=float32)},
 {'predictions': array([205372.47], dtype=float32)},
 {'predictions': array([220907.38], dtype=float32)},
 {'predictions': array([193364.78], dtype=float32)},
 {'predictions': array([181072.67], dtype=float32)},
 {'predictions': array([263386.5], dtype=float32)},
 {'predictions': array([185005.64], dtype=float32)},
 {'predictions': array([198551.17], dtype=float32)},
 {'predictions': array([193652.45], dtype=float32)},
 {'predictions': array([186372.3], dtype=float3

In [22]:
final_preds = []
for pred in predictions:
    final_preds.append(pred['predictions'])
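
Each element of `predictions` is a dict holding a length-1 array, so the loop above produces a list of arrays rather than plain numbers. As a sketch of an alternative, the snippet below pulls out the scalar from each dict and builds a flat 1-D array. It uses the first three values from the output shown earlier; the names `sample_predictions` and `flat_preds` are introduced here for illustration.

```python
import numpy as np

# First three entries copied from the prediction output above.
sample_predictions = [
    {"predictions": np.array([244515.84], dtype=np.float32)},
    {"predictions": np.array([337564.22], dtype=np.float32)},
    {"predictions": np.array([209822.61], dtype=np.float32)},
]

# Take element 0 of each length-1 array to get one number per row.
flat_preds = np.array([p["predictions"][0] for p in sample_predictions])
print(flat_preds.shape)  # (3,)
```

A flat array has the same shape as the target column, which avoids surprises when the two are compared element by element.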

In [23]:
from sklearn.metrics import mean_squared_error


In [24]:
mean_squared_error(y_test,final_preds)**0.5

95791.14880209537
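
The expression above is the root mean squared error (RMSE): the square root of the average squared difference between true and predicted values. The sketch below computes it by hand with NumPy and compares it with a naive baseline that always predicts the mean. The numbers are invented for illustration and the `rmse` helper is hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative values only, not taken from the housing data.
y_true = [100000.0, 200000.0, 300000.0]
y_pred = [110000.0, 190000.0, 330000.0]

model_rmse = rmse(y_true, y_pred)
# Baseline: predict the mean of the targets for every row.
baseline_rmse = rmse(y_true, np.full(len(y_true), np.mean(y_true)))
print(model_rmse, baseline_rmse)
```

Comparing the model against such a baseline on the real test split is one way to judge whether an RMSE in the tens of thousands is good or poor for this target.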

# The RMSE of the predicted median house value (California, 1990 Census) is about 96k, i.e. roughly 100k