# Regression of House Prices in California with TensorFlow

Code Created by Luis Enrique Acevedo Galicia

Date: 2019-10-04

Here I present a simple way to build a regression model with TensorFlow. The data comes from the file cal_housing_clean.csv. The target is the median house value, and the inputs are the median house age, total rooms, total bedrooms, population, households, and median income.

For more information about the data: http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.

# The Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import tensorflow as tf

# The data 

In [2]:
data = pd.read_csv('cal_housing_clean.csv')
data.head()

Unnamed: 0,housingMedianAge,totalRooms,totalBedrooms,population,households,medianIncome,medianHouseValue
0,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0
1,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0
2,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0
3,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0
4,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0


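Verifying that the data is complete: the cell below selects the rows where housingMedianAge is missing, so an empty result means that column has no missing values.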

In [3]:
data[pd.isnull(data['housingMedianAge'])]

Unnamed: 0,housingMedianAge,totalRooms,totalBedrooms,population,households,medianIncome,medianHouseValue
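
The check above inspects only the housingMedianAge column. A minimal sketch of the same idea applied to every column at once follows; it assumes a small stand-in DataFrame (`sample`, a hypothetical name, built from three of the rows shown by data.head()) in place of the full file.

```python
import pandas as pd

# Stand-in for the full data set: three rows copied from the data.head() output.
# `sample` is a hypothetical name used only for this illustration.
sample = pd.DataFrame({
    'housingMedianAge': [41.0, 21.0, 52.0],
    'totalRooms': [880.0, 7099.0, 1467.0],
    'totalBedrooms': [129.0, 1106.0, 190.0],
    'population': [322.0, 2401.0, 496.0],
    'households': [126.0, 1138.0, 177.0],
    'medianIncome': [8.3252, 8.3014, 7.2574],
    'medianHouseValue': [452600.0, 358500.0, 352100.0],
})

# Count missing values per column; all zeros means nothing is missing.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# A single yes/no answer for the whole frame.
has_missing = bool(sample.isnull().values.any())
print('Any missing values:', has_missing)
```

Running the same two expressions on `data` instead of `sample` would cover the whole file.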


Summary statistics of the data set

In [4]:
data.describe()

Unnamed: 0,housingMedianAge,totalRooms,totalBedrooms,population,households,medianIncome,medianHouseValue
count,20640.0,20640.0,20640.0,20640.0,20640.0,20640.0,20640.0
mean,28.639486,2635.763081,537.898014,1425.476744,499.53968,3.870671,206855.816909
std,12.585558,2181.615252,421.247906,1132.462122,382.329753,1.899822,115395.615874
min,1.0,2.0,1.0,3.0,1.0,0.4999,14999.0
25%,18.0,1447.75,295.0,787.0,280.0,2.5634,119600.0
50%,29.0,2127.0,435.0,1166.0,409.0,3.5348,179700.0
75%,37.0,3148.0,647.0,1725.0,605.0,4.74325,264725.0
max,52.0,39320.0,6445.0,35682.0,6082.0,15.0001,500001.0


In [5]:
#Inputs

Inputs_data = data.drop(['medianHouseValue'], axis=1)

#Targets

Targets_data = data['medianHouseValue']

#Create train and test data (70% train, 30% test)

Inputs_train, Inputs_test, Targets_train, Targets_test = train_test_split(Inputs_data,Targets_data,test_size=0.3,random_state=101)

## Preprocessing data

In [6]:
scale_data = MinMaxScaler()
scale_data.fit(Inputs_train)

MinMaxScaler(copy=True, feature_range=(0, 1))

In [7]:
Inputs_train_scl = pd.DataFrame(data=scale_data.transform(Inputs_train), columns= Inputs_train.columns, index=Inputs_train.index)
Inputs_test_scl = pd.DataFrame(data=scale_data.transform(Inputs_test), columns= Inputs_test.columns, index=Inputs_test.index)
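
MinMaxScaler rescales each column using the minimum and maximum seen during `fit`, which is why it is fitted on the training inputs only and then applied to both sets. A rough NumPy sketch of that arithmetic is shown here, using made-up toy arrays (`train` and `test` are hypothetical) rather than the housing data.

```python
import numpy as np

# Toy data (hypothetical values): two features, three training rows, one test row.
train = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
test = np.array([[2.0, 60.0]])

# Statistics are taken from the training set only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def min_max_scale(x):
    """Scale columns so the training minimum maps to 0 and the maximum to 1."""
    return (x - col_min) / (col_max - col_min)

train_scaled = min_max_scale(train)
test_scaled = min_max_scale(test)

print(train_scaled)
print(test_scaled)
```

Test values that lie outside the training range land outside [0, 1], as the second feature of the toy test row does here.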

## Features

In [8]:
age = tf.feature_column.numeric_column('housingMedianAge')
rooms = tf.feature_column.numeric_column('totalRooms')
bedrooms = tf.feature_column.numeric_column('totalBedrooms')
pop = tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
income = tf.feature_column.numeric_column('medianIncome')
feat_cols = [ age,rooms,bedrooms,pop,households,income]

# The model (DNN)

In [9]:
#The input function (feeds shuffled batches of training data to the estimator)

Input_function = tf.estimator.inputs.pandas_input_fn(x=Inputs_train_scl, y=Targets_train, batch_size= 10, num_epochs=1000, shuffle=True)

#The model

DNN_model = tf.estimator.DNNRegressor(hidden_units=[15,15,15],feature_columns=feat_cols)


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpoif72s8w', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc76bb631d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
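
The argument hidden_units=[15,15,15] describes three fully connected hidden layers of 15 units each, fed by the six feature columns and followed by a single output for the predicted house value. The NumPy sketch below illustrates that shape with random, untrained weights and a ReLU activation (the default for DNNRegressor). It only illustrates the architecture and is not the estimator's internal code; all names in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# 6 input features -> 15 -> 15 -> 15 hidden units -> 1 output
layer_sizes = [6, 15, 15, 15, 1]

# One (weights, bias) pair per layer, randomly initialised (untrained).
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Forward pass: ReLU after each hidden layer, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

batch = rng.random((10, 6))   # one batch of 10 scaled rows
output = forward(batch)       # one prediction per row
n_params = sum(w.size + b.size for w, b in params)

print(output.shape, n_params)
```

With these layer sizes the network has 601 trainable parameters, which is small relative to the number of rows available for training.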


In [10]:
DNN_model.train(input_fn=Input_function, steps=35000)

Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpoif72s8w/model.ckpt.
INFO:tensorflow:loss = 501110870000.0, step = 1
INFO:tensorflow:global_step/sec: 260.083
INFO:tensorflow:loss = 483307450000.0, step = 101 (0.388 sec)
INFO:tensorflow:global_step/sec: 377.323
INFO:tensorflow:loss = 289554300000.0, step = 201 (0.265 sec)
INFO:tensorflow:global_step/sec: 327.723
INFO:tensorflow:loss = 120488510000.0, step = 301 (0.305 sec)
INFO:tensorflow:global_step/sec: 364.996
INFO:tensorflow:loss = 277802780000.0, st

INFO:tensorflow:global_step/sec: 399.427
INFO:tensorflow:loss = 121745440000.0, step = 6701 (0.250 sec)
INFO:tensorflow:global_step/sec: 437.801
INFO:tensorflow:loss = 59010920000.0, step = 6801 (0.229 sec)
INFO:tensorflow:global_step/sec: 395.173
INFO:tensorflow:loss = 214187400000.0, step = 6901 (0.252 sec)
INFO:tensorflow:global_step/sec: 387.372
INFO:tensorflow:loss = 75020830000.0, step = 7001 (0.257 sec)
INFO:tensorflow:global_step/sec: 357.464
INFO:tensorflow:loss = 58744807000.0, step = 7101 (0.282 sec)
INFO:tensorflow:global_step/sec: 413.941
INFO:tensorflow:loss = 67464364000.0, step = 7201 (0.241 sec)
INFO:tensorflow:global_step/sec: 381.762
INFO:tensorflow:loss = 87581770000.0, step = 7301 (0.261 sec)
INFO:tensorflow:global_step/sec: 408.896
INFO:tensorflow:loss = 121848726000.0, step = 7401 (0.246 sec)
INFO:tensorflow:global_step/sec: 373.239
INFO:tensorflow:loss = 173411070000.0, step = 7501 (0.267 sec)
INFO:tensorflow:global_step/sec: 382.179
INFO:tensorflow:loss = 39883

INFO:tensorflow:global_step/sec: 335.238
INFO:tensorflow:loss = 108751470000.0, step = 14601 (0.298 sec)
INFO:tensorflow:global_step/sec: 352.959
INFO:tensorflow:loss = 57662130000.0, step = 14701 (0.285 sec)
INFO:tensorflow:global_step/sec: 325.896
INFO:tensorflow:loss = 42986850000.0, step = 14801 (0.307 sec)
INFO:tensorflow:global_step/sec: 400.03
INFO:tensorflow:loss = 90999110000.0, step = 14901 (0.249 sec)
INFO:tensorflow:global_step/sec: 323.78
INFO:tensorflow:loss = 43122434000.0, step = 15001 (0.309 sec)
INFO:tensorflow:global_step/sec: 339.332
INFO:tensorflow:loss = 78022164000.0, step = 15101 (0.295 sec)
INFO:tensorflow:global_step/sec: 352.65
INFO:tensorflow:loss = 82482586000.0, step = 15201 (0.282 sec)
INFO:tensorflow:global_step/sec: 388.293
INFO:tensorflow:loss = 35799618000.0, step = 15301 (0.258 sec)
INFO:tensorflow:global_step/sec: 389.155
INFO:tensorflow:loss = 192831460000.0, step = 15401 (0.257 sec)
INFO:tensorflow:global_step/sec: 395.523
INFO:tensorflow:loss = 9

INFO:tensorflow:global_step/sec: 376.418
INFO:tensorflow:loss = 71643685000.0, step = 22501 (0.265 sec)
INFO:tensorflow:global_step/sec: 364.211
INFO:tensorflow:loss = 32805327000.0, step = 22601 (0.274 sec)
INFO:tensorflow:global_step/sec: 390.852
INFO:tensorflow:loss = 45775740000.0, step = 22701 (0.256 sec)
INFO:tensorflow:global_step/sec: 393.329
INFO:tensorflow:loss = 203409380000.0, step = 22801 (0.254 sec)
INFO:tensorflow:global_step/sec: 325.827
INFO:tensorflow:loss = 70482610000.0, step = 22901 (0.308 sec)
INFO:tensorflow:global_step/sec: 391.133
INFO:tensorflow:loss = 36596140000.0, step = 23001 (0.255 sec)
INFO:tensorflow:global_step/sec: 355.514
INFO:tensorflow:loss = 132102280000.0, step = 23101 (0.281 sec)
INFO:tensorflow:global_step/sec: 404.944
INFO:tensorflow:loss = 78805450000.0, step = 23201 (0.248 sec)
INFO:tensorflow:global_step/sec: 344.742
INFO:tensorflow:loss = 120925250000.0, step = 23301 (0.290 sec)
INFO:tensorflow:global_step/sec: 369.35
INFO:tensorflow:loss 

INFO:tensorflow:global_step/sec: 369.481
INFO:tensorflow:loss = 46601310000.0, step = 30401 (0.271 sec)
INFO:tensorflow:global_step/sec: 297.138
INFO:tensorflow:loss = 28282638000.0, step = 30501 (0.338 sec)
INFO:tensorflow:global_step/sec: 259.24
INFO:tensorflow:loss = 106464660000.0, step = 30601 (0.385 sec)
INFO:tensorflow:global_step/sec: 273.042
INFO:tensorflow:loss = 42290885000.0, step = 30701 (0.364 sec)
INFO:tensorflow:global_step/sec: 337.155
INFO:tensorflow:loss = 63800783000.0, step = 30801 (0.297 sec)
INFO:tensorflow:global_step/sec: 356.045
INFO:tensorflow:loss = 95479240000.0, step = 30901 (0.281 sec)
INFO:tensorflow:global_step/sec: 422.315
INFO:tensorflow:loss = 51569037000.0, step = 31001 (0.235 sec)
INFO:tensorflow:global_step/sec: 351.468
INFO:tensorflow:loss = 57755296000.0, step = 31101 (0.288 sec)
INFO:tensorflow:global_step/sec: 375.86
INFO:tensorflow:loss = 39094256000.0, step = 31201 (0.264 sec)
INFO:tensorflow:global_step/sec: 432.441
INFO:tensorflow:loss = 3

<tensorflow.python.estimator.canned.dnn.DNNRegressor at 0x7fc76bbf9588>
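
Each loss value logged above is computed on a single batch of 10 rows, which is one reason the values fluctuate from step to step. As a rough guide to how much data the 35,000 steps cover, the arithmetic below assumes the 70/30 split leaves about 70% of the 20,640 rows for training; the exact count depends on how train_test_split rounds.

```python
import math

n_rows = 20640       # row count from data.describe()
test_size = 0.3      # from train_test_split
batch_size = 10      # from pandas_input_fn
steps = 35000        # from DNN_model.train
num_epochs = 1000    # from pandas_input_fn

# Approximate training-set size (assumption: test rows are rounded up).
n_train = n_rows - math.ceil(test_size * n_rows)

steps_per_epoch = math.ceil(n_train / batch_size)
rows_seen = steps * batch_size
epochs_covered = rows_seen / n_train

print('training rows (approx.):', n_train)
print('steps per epoch (approx.):', steps_per_epoch)
print('rows seen during training:', rows_seen)
print('epochs covered (approx.):', round(epochs_covered, 1))
```

Under this assumption training passes over the data roughly 24 times, so the steps limit, rather than num_epochs=1000, is what ends training.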

## Prediction

In [11]:
Input_function_prediction = tf.estimator.inputs.pandas_input_fn(x=Inputs_test_scl, batch_size=10, num_epochs=1,shuffle=False)
Prediction = DNN_model.predict(Input_function_prediction)
Predictions = list(Prediction)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpoif72s8w/model.ckpt-35000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [12]:
#Convert predictions list to array
Array_pred = []
for pred in Predictions:
    Array_pred.append(pred['predictions'])


In [15]:
#Compute the root mean squared error (RMSE) on the test set
mean_squared_error(Targets_test,Array_pred)**.5

82581.78489312639
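
The value above is a root mean squared error, expressed in the same units as medianHouseValue. A small sketch of the formula on toy numbers follows, together with a rough comparison against the target's standard deviation from data.describe(). That standard deviation was computed on the full data set rather than the test split, so the ratio is only indicative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy check (hypothetical values): errors of 3 and 4 give sqrt((9 + 16) / 2).
toy_rmse = rmse([0.0, 0.0], [3.0, 4.0])
print('toy RMSE:', toy_rmse)

# Numbers taken from the notebook outputs.
reported_rmse = 82581.78489312639   # test-set error reported above
target_std = 115395.615874          # std of medianHouseValue from data.describe()

ratio = reported_rmse / target_std
print('RMSE / target std:', round(ratio, 3))
```

A model that always predicted the mean house value would have an RMSE close to the target's standard deviation, so a ratio below 1 suggests the network has learned something, although a sizeable error remains.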