# TensorFlow Regression  

California Housing Data

This data set contains information about all the block groups in California from the 1990 Census. In this sample, a block group includes on average 1425.5 individuals living in a geographically compact area.

The task is to approximate the median house value of each block group from the values of the other variables.

The data set was obtained from the LIACC repository. The original page where it can be found is: http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.
 

The Features:
 
* housingMedianAge: continuous. 
* totalRooms: continuous. 
* totalBedrooms: continuous. 
* population: continuous. 
* households: continuous. 
* medianIncome: continuous. 
* medianHouseValue: continuous (the target to predict).

## The Data

**Import the cal_housing_clean.csv file with pandas. Separate it into a training set (70%) and a testing set (30%).**

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('cal_housing_clean.csv')

In [3]:
data.head()

| | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|
| 0 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |
| 1 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |
| 2 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |
| 3 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |
| 4 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |


In [4]:
data.describe()

| | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|
| count | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 |
| mean | 28.639486 | 2635.763081 | 537.898014 | 1425.476744 | 499.53968 | 3.870671 | 206855.816909 |
| std | 12.585558 | 2181.615252 | 421.247906 | 1132.462122 | 382.329753 | 1.899822 | 115395.615874 |
| min | 1.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.4999 | 14999.0 |
| 25% | 18.0 | 1447.75 | 295.0 | 787.0 | 280.0 | 2.5634 | 119600.0 |
| 50% | 29.0 | 2127.0 | 435.0 | 1166.0 | 409.0 | 3.5348 | 179700.0 |
| 75% | 37.0 | 3148.0 | 647.0 | 1725.0 | 605.0 | 4.74325 | 264725.0 |
| max | 52.0 | 39320.0 | 6445.0 | 35682.0 | 6082.0 | 15.0001 | 500001.0 |


In [5]:
from sklearn.model_selection import train_test_split

In [6]:
data.columns

Index(['housingMedianAge', 'totalRooms', 'totalBedrooms', 'population',
       'households', 'medianIncome', 'medianHouseValue'],
      dtype='object')

In [7]:
X_data = data.drop(['medianHouseValue'], axis=1)

In [8]:
X_data.head()

| | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome |
|---|---|---|---|---|---|---|
| 0 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 |
| 1 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 |
| 2 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 |
| 3 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 |
| 4 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 |


In [9]:
y = data['medianHouseValue']

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X_data, y, test_size=0.3, random_state=101)
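
As a rough illustration of what a 70/30 shuffle-and-split does, the sketch below cuts a shuffled set of row positions into two disjoint parts using NumPy alone. It is not scikit-learn's implementation and will not select the same rows as `train_test_split(random_state=101)`; the helper name `simple_split` and the 20-row toy size are assumptions made for the example.

```python
import math

import numpy as np


def simple_split(n_rows, test_size=0.3, seed=101):
    """Return (train_positions, test_positions) for n_rows shuffled rows.

    Hypothetical helper for illustration only; scikit-learn's
    train_test_split uses its own shuffling and picks different rows.
    """
    rng = np.random.RandomState(seed)
    order = rng.permutation(n_rows)           # shuffled row positions
    n_test = math.ceil(test_size * n_rows)    # rows held out for testing
    return order[n_test:], order[:n_test]


# Toy example with 20 rows: 14 go to training, 6 to testing.
train_pos, test_pos = simple_split(20)
print(len(train_pos), len(test_pos))
```

Every row lands in exactly one of the two parts, so the test set stays unseen during training.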

### Scale the Feature Data



In [11]:
from sklearn.preprocessing import MinMaxScaler

In [12]:
scaler = MinMaxScaler()

In [13]:
scaler.fit(X_train)

MinMaxScaler(copy=True, feature_range=(0, 1))

In [14]:
X_train = pd.DataFrame(data=scaler.transform(X_train), columns=X_train.columns, index=X_train.index)

In [15]:
X_test = pd.DataFrame(data=scaler.transform(X_test), columns=X_test.columns, index=X_test.index)
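
With the default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column as `(x - min) / (max - min)`, where `min` and `max` come from the data passed to `fit` (here, the training set only). The sketch below reproduces that arithmetic with NumPy on a small made-up array; the numbers and the helper name `min_max` are assumptions for illustration, not values from the housing data.

```python
import numpy as np

# Made-up training and test rows with two feature columns.
train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 1000.0]])
test = np.array([[2.0, 1200.0]])

# Statistics are taken from the training rows only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)


def min_max(values):
    """Rescale columns using the training minimum and maximum."""
    return (values - col_min) / (col_max - col_min)


train_scaled = min_max(train)
test_scaled = min_max(test)
print(train_scaled)
print(test_scaled)
```

Because the test rows are scaled with the training statistics, a test value can fall outside the 0 to 1 range, as the second column of `test_scaled` does here.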

### Create Feature Columns



In [16]:
data.columns

Index(['housingMedianAge', 'totalRooms', 'totalBedrooms', 'population',
       'households', 'medianIncome', 'medianHouseValue'],
      dtype='object')

In [17]:
import tensorflow as tf



In [18]:
age = tf.feature_column.numeric_column('housingMedianAge')
rooms = tf.feature_column.numeric_column('totalRooms')
bedrooms = tf.feature_column.numeric_column('totalBedrooms')
pop = tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
income = tf.feature_column.numeric_column('medianIncome')

In [19]:
feat_cols = [age, rooms, bedrooms, pop, households, income]

**Create the input function for the estimator object.**

In [20]:
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train, y=y_train, batch_size=10, num_epochs=1000, shuffle=True)

**Create the estimator model.**

In [21]:
model = tf.estimator.DNNRegressor(hidden_units=[6,10,10,6,6], feature_columns=feat_cols)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/2v/xzpgsprx4dgft47fp23gnl_r0000gn/T/tmpgplg3vf4', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x10edb39b0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


##### **Train the model**

In [22]:
model.train(input_fn=input_func, steps=10000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/2v/xzpgsprx4dgft47fp23gnl_r0000gn/T/tmpgplg3vf4/model.ckpt.
INFO:tensorflow:loss = 545948270000.0, step = 1
INFO:tensorflow:global_step/sec: 409.157
INFO:tensorflow:loss = 222199450000.0, step = 101 (0.249 sec)
INFO:tensorflow:global_step/sec: 495.871
INFO:tensorflow:loss = 125040710000.0, step = 201 (0.204 sec)
INFO:tensorflow:global_step/sec: 464.009
INFO:tensorflow:loss = 95329040000.0, step = 301 (0.214 sec)
INFO:tensorflow:global_step/sec: 494.028
INFO:tensorflow:loss = 126492490000.0, step = 401 (0.202 sec)
INFO:tensorflow:global_step/sec: 497.383
INFO:tensorflow:loss = 41093320000.0, step = 501 (0.197 sec)
INFO:tensorflow:global_step/sec: 441.88
INFO:tensorflow:loss = 176599040000.0, step

INFO:tensorflow:global_step/sec: 379.458
INFO:tensorflow:loss = 47721406000.0, step = 7701 (0.264 sec)
INFO:tensorflow:global_step/sec: 381.58
INFO:tensorflow:loss = 92333425000.0, step = 7801 (0.265 sec)
INFO:tensorflow:global_step/sec: 401.154
INFO:tensorflow:loss = 75333190000.0, step = 7901 (0.245 sec)
INFO:tensorflow:global_step/sec: 450.483
INFO:tensorflow:loss = 55630643000.0, step = 8001 (0.223 sec)
INFO:tensorflow:global_step/sec: 354.051
INFO:tensorflow:loss = 91412775000.0, step = 8101 (0.285 sec)
INFO:tensorflow:global_step/sec: 432.882
INFO:tensorflow:loss = 119803340000.0, step = 8201 (0.229 sec)
INFO:tensorflow:global_step/sec: 440.468
INFO:tensorflow:loss = 108067500000.0, step = 8301 (0.229 sec)
INFO:tensorflow:global_step/sec: 424.946
INFO:tensorflow:loss = 59911350000.0, step = 8401 (0.235 sec)
INFO:tensorflow:global_step/sec: 404.604
INFO:tensorflow:loss = 71687550000.0, step = 8501 (0.243 sec)
INFO:tensorflow:global_step/sec: 382.605
INFO:tensorflow:loss = 52353100

<tensorflow.python.estimator.canned.dnn.DNNRegressor at 0x1a205dca20>
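
The input function allows up to 1000 epochs, but training is also capped at 10000 steps, and whichever limit is hit first ends the run. The back-of-the-envelope sketch below shows how much of the training data those 10000 steps consume. The training-set size is derived from the 20640 rows reported by `data.describe()` and assumes the 30% test share is rounded up; the variable names are chosen for this example.

```python
import math

n_rows = 20640        # total rows, from data.describe()
test_size = 0.3       # share held out for testing
batch_size = 10       # rows per training step
num_epochs = 1000     # upper bound set in the input function
max_steps = 10000     # upper bound passed to model.train

# Assumption: the test share is rounded up and the rest is used for training.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

rows_consumed = max_steps * batch_size   # rows seen in 10000 steps
rows_available = n_train * num_epochs    # rows the input function could supply
epochs_seen = rows_consumed / n_train    # passes over the training set

print(n_train, rows_consumed, rows_available, round(epochs_seen, 2))
```

Under these assumptions the step limit is reached after roughly seven passes over the training set, long before the 1000-epoch limit would apply.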

**Create a prediction input function.**

In [23]:
pred_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test, batch_size=10, num_epochs=1, shuffle=False)

In [24]:
pred_gen = model.predict(pred_input_func)

In [25]:
predictions = list(pred_gen)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/2v/xzpgsprx4dgft47fp23gnl_r0000gn/T/tmpgplg3vf4/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


**Calculate the RMSE.**

In [26]:
final_preds = []
for pred in predictions:
    final_preds.append(pred['predictions'])

In [27]:
results = pd.DataFrame({'House Value':final_preds, 'House Index_No':X_test.index})
results

| | House Index_No | House Value |
|---|---|---|
| 0 | 16086 | [283343.84] |
| 1 | 8816 | [512924.88] |
| 2 | 7175 | [174961.89] |
| 3 | 16714 | [204410.06] |
| 4 | 14491 | [436907.25] |
| 5 | 11807 | [172013.12] |
| 6 | 19109 | [199990.0] |
| 7 | 6926 | [195731.6] |
| 8 | 11649 | [265420.66] |
| 9 | 11961 | [309582.06] |


In [28]:
from sklearn.metrics import mean_squared_error

In [29]:
mean_squared_error(y_test, final_preds)**0.5

81949.14521144531
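
The RMSE is the square root of the mean squared difference between true and predicted values, so it is expressed in the same units as `medianHouseValue`. The sketch below computes it directly with NumPy, as a cross-check on the `mean_squared_error(...)**0.5` expression. The helper name `rmse` is an assumption, and the toy predictions are made up for illustration (only the three true values come from `data.head()`).

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error; inputs are flattened to 1-D first."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# True values from the first rows of the data; predictions are invented.
toy_true = [452600.0, 358500.0, 352100.0]
# One-element lists mimic the shape of each pred['predictions'] entry.
toy_pred = [[450000.0], [360000.0], [350000.0]]

print(rmse(toy_true, toy_pred))
```

For context, the RMSE of about 81,949 obtained above compares with a standard deviation of about 115,396 for `medianHouseValue` in the `data.describe()` output.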