## Predicting Median House Value Using Regression

California Housing Data

This data set contains information about all the block groups in California from the 1990 Census. In this sample, a block group includes 1425.5 individuals on average, living in a geographically compact area.

The task is to approximate the median house value of each block group from the values of the other variables.

The data set was obtained from the LIACC repository. The original page where it can be found is: http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.


In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np

In [2]:
hd=pd.read_csv('cal_housing_clean.csv')

In [3]:
hd.head()

|   | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|
| 0 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |
| 1 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |
| 2 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |
| 3 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |
| 4 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |


In [4]:
hd['housingMedianAge'].hist(bins=20)

<matplotlib.axes._subplots.AxesSubplot at 0x16242f29a90>

# Normalise the data

In [5]:
colmns_to_norm=['housingMedianAge', 'totalRooms', 'totalBedrooms', 'population', 'households', 'medianIncome']

In [6]:
hd[colmns_to_norm].head()

|   | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome |
|---|---|---|---|---|---|---|
| 0 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 |
| 1 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 |
| 2 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 |
| 3 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 |
| 4 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 |


In [7]:
hd[colmns_to_norm]=hd[colmns_to_norm].apply(lambda x: (x-x.min())/(x.max()-x.min()))
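Cell [7] applies min-max scaling, x' = (x - min) / (max - min), to each feature column, mapping it onto [0, 1]. Because it runs on the full table before `train_test_split`, the test rows also influence each column's min and max. The dependency-free sketch below shows the same formula split into a "fit" step (learn the range from training values only) and an "apply" step. The function names and sample values are hypothetical, and this is an illustration, not the notebook's own code:

```python
def fit_min_max(train_values):
    """Learn the (min, max) range from training values only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with a previously learned range.

    Values inside [lo, hi] map to [0, 1]; test values outside the
    training range map slightly below 0 or above 1.
    """
    span = hi - lo
    if span == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical training and test values for one column (not from the data set).
train_ages = [1.0, 21.0, 41.0, 52.0]
test_ages = [30.0, 53.0]

lo, hi = fit_min_max(train_ages)  # range learned from training rows only
print([round(v, 6) for v in apply_min_max(train_ages, lo, hi)])  # [0.0, 0.392157, 0.784314, 1.0]
print([round(v, 6) for v in apply_min_max(test_ages, lo, hi)])   # [0.568627, 1.019608]
```

With a range of 1 to 52 (an assumption, but one that reproduces the scaled `housingMedianAge` values printed in cell [9]), 41.0 maps to 0.784314 and 21.0 to 0.392157.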

In [8]:
x_data=hd.drop('medianHouseValue', axis=1)

In [9]:
x_data.head()

|   | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome |
|---|---|---|---|---|---|---|
| 0 | 0.784314 | 0.022331 | 0.019863 | 0.008941 | 0.020556 | 0.539668 |
| 1 | 0.392157 | 0.180503 | 0.171477 | 0.06721 | 0.186976 | 0.538027 |
| 2 | 1.0 | 0.03726 | 0.02933 | 0.013818 | 0.028943 | 0.466028 |
| 3 | 1.0 | 0.032352 | 0.036313 | 0.015555 | 0.035849 | 0.354699 |
| 4 | 1.0 | 0.04133 | 0.043296 | 0.015752 | 0.042427 | 0.230776 |


In [10]:
y_val=hd['medianHouseValue']

In [11]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x_data,y_val,test_size=0.3,random_state=101)

In [12]:
np.median(y_val)

179700.0
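The median target value gives a natural baseline: predict that one constant for every block group and measure the error. A useful regression model should beat it. A small standard-library sketch of the comparison follows; the five targets are the `medianHouseValue` entries shown by `hd.head()` in cell [3], used only to illustrate the arithmetic, so the result says nothing about the full data set. The helper names are hypothetical:

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def constant_baseline_rmse(y_true, constant):
    """RMSE obtained by predicting the same constant for every row."""
    return rmse(y_true, [constant] * len(y_true))


# medianHouseValue of the first five rows (cell [3]); illustration only.
y = [452600.0, 358500.0, 352100.0, 341300.0, 342200.0]
baseline = statistics.median(y)
print(baseline)  # 352100.0
print(constant_baseline_rmse(y, baseline))
```

On the notebook's own data, the analogous check would be `mean_squared_error(y_test, [np.median(y_train)] * len(y_test)) ** 0.5`, which can be compared directly with the model's RMSE computed in cell [25].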

# Define the Feature Columns

In [13]:
hma=tf.feature_column.numeric_column('housingMedianAge')
tr=tf.feature_column.numeric_column('totalRooms')
tbdr=tf.feature_column.numeric_column('totalBedrooms')
popln=tf.feature_column.numeric_column('population')
hholds=tf.feature_column.numeric_column('households')
mi=tf.feature_column.numeric_column('medianIncome')


In [14]:
feat_cols=[hma,tr,tbdr,popln,hholds,mi]


In [15]:
## Build the input function

In [16]:
input_func=tf.estimator.inputs.pandas_input_fn(X_train,y_train,batch_size=10,num_epochs=1000,shuffle=True)


In [17]:
model = tf.estimator.DNNRegressor(hidden_units=[6,6,6],feature_columns=feat_cols)


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\raprabhu\\AppData\\Local\\Temp\\tmpy8gl4ffv', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001624B460588>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
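`hidden_units=[6,6,6]` requests three fully connected hidden layers of six units each. With six numeric feature columns and a single regression output, a dense layer with `n_in` inputs and `n_out` units holds `n_in * n_out` weights plus `n_out` biases, so the whole network is very small. The sketch below does that count; the helper is hypothetical, and it assumes each `numeric_column` contributes exactly one input and counts only dense-layer weights and biases:

```python
def dense_param_count(layer_sizes):
    """Count weights + biases of a fully connected stack.

    layer_sizes lists the width of every layer from input to output,
    e.g. [6, 6, 6, 6, 1] for 6 inputs, three hidden layers of 6, 1 output.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


print(dense_param_count([6, 6, 6, 6, 1]))  # 3 * (6*6 + 6) + (6*1 + 1) = 133
```

About 133 trainable values is a modest capacity for this task, which is one factor to keep in mind when reading the final error.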


In [18]:
model.train(input_fn=input_func,steps=25000)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\raprabhu\AppData\Local\Temp\tmpy8gl4ffv\model.ckpt.
INFO:tensorflow:loss = 476129260000.0, step = 1
INFO:tensorflow:global_step/sec: 249.433
INFO:tensorflow:loss = 430207400000.0, step = 101 (0.406 sec)
INFO:tensorflow:global_step/sec: 333.802
INFO:tensorflow:loss = 386817030000.0, step = 201 (0.295 sec)
...
INFO:tensorflow:global_step/sec: 376.339
INFO:tensorflow:loss = 115922650000.0, step = 6001 (0.266 sec)
INFO:tensorflow:global_step/sec: 334.731
INFO:tensorflow:loss = 178327390000.0, step = 6101 (0.301 sec)
INFO:tensorflow:global_step/sec: 359.1
INFO:tensorflow:loss = 134497080000.0, step = 6201 (0.277 sec)
INFO:tensorflow:global_step/sec: 403.295
INFO:tensorflow:loss = 170343460000.0, step = 6301 (0.265 sec)
INFO:tensorflow:global_step/sec: 391.239
INFO:tensorflow:loss = 120395825000.0, step = 6401 (0.239 sec)
INFO:tensorflow:global_step/sec: 362.084
INFO:tensorflow:loss = 142894960000.0, step = 6501 (0.276 sec)
INFO:tensorflow:global_step/sec: 422.45
INFO:tensorflow:loss = 63773900000.0, step = 6601 (0.252 sec)
INFO:tensorflow:global_step/sec: 377.792
INFO:tensorflow:loss = 89161940000.0, step = 6701 (0.254 sec)
INFO:tensorflow:global_step/sec: 429.346
INFO:tensorflow:loss = 166271830000.0, step = 6801 (0.244 sec)
INFO:tensorflow:global_step/sec: 398.717
...
INFO:tensorflow:global_step/sec: 393.686
INFO:tensorflow:loss = 113376264000.0, step = 13901 (0.271 sec)
INFO:tensorflow:global_step/sec: 386.751
INFO:tensorflow:loss = 117818510000.0, step = 14001 (0.257 sec)
INFO:tensorflow:global_step/sec: 370.46
INFO:tensorflow:loss = 132329865000.0, step = 14101 (0.254 sec)
INFO:tensorflow:global_step/sec: 313.284
INFO:tensorflow:loss = 82097550000.0, step = 14201 (0.338 sec)
INFO:tensorflow:global_step/sec: 335.727
INFO:tensorflow:loss = 51399180000.0, step = 14301 (0.279 sec)
INFO:tensorflow:global_step/sec: 385.384
INFO:tensorflow:loss = 94380270000.0, step = 14401 (0.259 sec)
INFO:tensorflow:global_step/sec: 412.387
INFO:tensorflow:loss = 73296400000.0, step = 14501 (0.258 sec)
INFO:tensorflow:global_step/sec: 374.987
INFO:tensorflow:loss = 119344820000.0, step = 14601 (0.267 sec)
INFO:tensorflow:global_step/sec: 404.922
INFO:tensorflow:loss = 110933830000.0, step = 14701 (0.268 sec)
INFO:tensorflow:global_step/sec: 352.27
...
INFO:tensorflow:global_step/sec: 376.445
INFO:tensorflow:loss = 222605250000.0, step = 21801 (0.250 sec)
INFO:tensorflow:global_step/sec: 398.767
INFO:tensorflow:loss = 75303180000.0, step = 21901 (0.251 sec)
INFO:tensorflow:global_step/sec: 406.127
INFO:tensorflow:loss = 160643650000.0, step = 22001 (0.263 sec)
INFO:tensorflow:global_step/sec: 398.241
INFO:tensorflow:loss = 31958082000.0, step = 22101 (0.250 sec)
INFO:tensorflow:global_step/sec: 393.091
INFO:tensorflow:loss = 93599220000.0, step = 22201 (0.254 sec)
INFO:tensorflow:global_step/sec: 395.419
INFO:tensorflow:loss = 128828670000.0, step = 22301 (0.265 sec)
INFO:tensorflow:global_step/sec: 337.928
INFO:tensorflow:loss = 188706010000.0, step = 22401 (0.269 sec)
INFO:tensorflow:global_step/sec: 379.83
INFO:tensorflow:loss = 181994550000.0, step = 22501 (0.263 sec)
INFO:tensorflow:global_step/sec: 380.44
INFO:tensorflow:loss = 137349530000.0, step = 22601 (0.263 sec)
INFO:tensorflow:global_step/sec: 385.222
...

<tensorflow_estimator.python.estimator.canned.dnn.DNNRegressor at 0x1624b460198>
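Two settings bound training: `num_epochs=1000` in `input_func` and `steps=25000` in `model.train`. Training stops at whichever limit is reached first, and the checkpoint restored later for prediction is `model.ckpt-25000`. Each step consumes one batch of 10 rows, so 25000 steps see 250,000 rows. A sketch of that arithmetic, ignoring a possibly smaller final batch; the helper names are hypothetical and `n_train` is a placeholder, not a value reported in this notebook:

```python
def epochs_completed(steps, batch_size, n_train):
    """Number of passes over the training set covered by `steps` batches."""
    return steps * batch_size / n_train


def epoch_limit_binds(steps, batch_size, n_train, num_epochs):
    """True if the input would be exhausted (num_epochs reached) before `steps`."""
    return num_epochs * n_train < steps * batch_size


# Settings from cells [16] and [18]; n_train is a hypothetical placeholder,
# substitute len(X_train) from the notebook.
steps, batch_size, num_epochs = 25000, 10, 1000
n_train = 14000

print(epochs_completed(steps, batch_size, n_train))  # about 17.9 passes
print(epoch_limit_binds(steps, batch_size, n_train, num_epochs))  # False
```

Unless the training set has fewer than about 250 rows, `steps` is the binding limit and `num_epochs=1000` is never reached.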

# Model Evaluation

In [19]:
pred_input_func = tf.estimator.inputs.pandas_input_fn(
      x=X_test,
      batch_size=10,
      num_epochs=1,
      shuffle=False)

In [20]:
results=model.predict(pred_input_func)

In [21]:
predictions = list(results)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from C:\Users\raprabhu\AppData\Local\Temp\tmpy8gl4ffv\model.ckpt-25000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [22]:
predictions

[{'predictions': array([233443.67], dtype=float32)},
 {'predictions': array([293636.56], dtype=float32)},
 {'predictions': array([217560.64], dtype=float32)},
 {'predictions': array([185508.08], dtype=float32)},
 {'predictions': array([265491.6], dtype=float32)},
 {'predictions': array([201729.33], dtype=float32)},
 {'predictions': array([228409.11], dtype=float32)},
 {'predictions': array([207482.31], dtype=float32)},
 {'predictions': array([215216.83], dtype=float32)},
 {'predictions': array([185370.], dtype=float32)},
 {'predictions': array([207139.14], dtype=float32)},
 {'predictions': array([226489.25], dtype=float32)},
 {'predictions': array([194422.94], dtype=float32)},
 {'predictions': array([179554.83], dtype=float32)},
 {'predictions': array([259554.39], dtype=float32)},
 {'predictions': array([177357.97], dtype=float32)},
 {'predictions': array([203315.58], dtype=float32)},
 {'predictions': array([187924.16], dtype=float32)},
 {'predictions': array([182379.33], dtype=float32

In [23]:
final_preds = []
for pred in predictions:
    final_preds.append(pred['predictions'][0])
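Each element yielded by `model.predict` is a dict whose `'predictions'` entry is a one-element array, so cell [23] keeps element `[0]`. The same extraction can be written as a list comprehension. The later comparison with `y_test` pairs values by position, which is why `pred_input_func` uses `shuffle=False`. In this stand-alone sketch, plain lists stand in for the NumPy arrays (an assumption so it runs without TensorFlow), using the first three values printed in cell [22]:

```python
# Plain lists stand in for the float32 NumPy arrays that model.predict yields.
predictions = [
    {"predictions": [233443.67]},
    {"predictions": [293636.56]},
    {"predictions": [217560.64]},
]

# Keep the single scalar from each prediction dict, preserving input order.
final_preds = [pred["predictions"][0] for pred in predictions]
print(final_preds)  # [233443.67, 293636.56, 217560.64]
```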

In [24]:
from sklearn.metrics import mean_squared_error

In [25]:
mean_squared_error(y_test,final_preds)**0.5

101618.6954195526
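Cell [25] takes the square root of scikit-learn's mean squared error, which gives the root-mean-squared error (RMSE), expressed in the same units as `medianHouseValue`:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

Here $y_i$ are the entries of `y_test`, $\hat{y}_i$ the matching entries of `final_preds`, and $n$ the number of test rows. Because errors are squared before averaging, a few badly mispredicted block groups raise the RMSE more than the same total error spread evenly across many rows.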

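# Result: test RMSE ≈ 101.6k

The model's root-mean-squared error on the test set is about 101,619. This is the typical size of a prediction error, not the median house value. For comparison, the median `medianHouseValue` in the 1990 Census data is 179,700 (cell [12]), so the error is more than half of that median.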