# Regression Project

The dataset contains information about the house prices in California for different house blocks.

The task is to approximate the median house value of each block from the values of the following features:

* housingMedianAge: continuous.
* totalRooms: continuous.
* totalBedrooms: continuous.
* population: continuous.
* households: continuous.
* medianIncome: continuous.

The target to predict is:

* medianHouseValue: continuous.


In [1]:
import pandas as pd

In [2]:
housing = pd.read_csv('cal_housing_clean.csv')

In [3]:
housing.head()

,housingMedianAge,totalRooms,totalBedrooms,population,households,medianIncome,medianHouseValue
0,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0
1,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0
2,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0
3,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0
4,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0


In [4]:
housing.describe().transpose()

,count,mean,std,min,25%,50%,75%,max
housingMedianAge,20640.0,28.639486,12.585558,1.0,18.0,29.0,37.0,52.0
totalRooms,20640.0,2635.763081,2181.615252,2.0,1447.75,2127.0,3148.0,39320.0
totalBedrooms,20640.0,537.898014,421.247906,1.0,295.0,435.0,647.0,6445.0
population,20640.0,1425.476744,1132.462122,3.0,787.0,1166.0,1725.0,35682.0
households,20640.0,499.53968,382.329753,1.0,280.0,409.0,605.0,6082.0
medianIncome,20640.0,3.870671,1.899822,0.4999,2.5634,3.5348,4.74325,15.0001
medianHouseValue,20640.0,206855.816909,115395.615874,14999.0,119600.0,179700.0,264725.0,500001.0


In [5]:
x_data = housing.drop(['medianHouseValue'],axis=1)

In [6]:
y_val = housing['medianHouseValue']

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(x_data, y_val, test_size=0.3, random_state=101)

### Scale the Feature Data using MinMaxScaler

In [9]:
from sklearn.preprocessing import MinMaxScaler

In [10]:
scaler = MinMaxScaler()

In [11]:
# Fit on the training set only, so no test-set statistics leak into the scaler
scaler.fit(X_train)

MinMaxScaler(copy=True, feature_range=(0, 1))

In [12]:
X_train = pd.DataFrame(data=scaler.transform(X_train),columns = X_train.columns,index=X_train.index)

In [13]:
X_test = pd.DataFrame(data=scaler.transform(X_test),columns = X_test.columns,index=X_test.index)
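
The scaling step above can be sketched in plain Python. This is a minimal illustration of the min-max formula, (x - min) / (max - min), not the scikit-learn implementation. The helper names `fit_min_max` and `apply_min_max` and the toy values are assumptions made for the example. It shows why the scaler is fitted on the training data only: test values are rescaled with the training minimum and maximum, so they can land outside [0, 1].

```python
def fit_min_max(column):
    """Return the (min, max) of a training column, as a fit step would."""
    return min(column), max(column)


def apply_min_max(column, col_min, col_max):
    """Rescale with (x - min) / (max - min); assumes col_max > col_min."""
    span = col_max - col_min
    return [(x - col_min) / span for x in column]


# Toy values for illustration only, not rows from the dataset.
train_ages = [1.0, 18.0, 29.0, 52.0]
test_ages = [10.0, 60.0]

# Statistics come from the training column only.
age_min, age_max = fit_min_max(train_ages)

scaled_train = apply_min_max(train_ages, age_min, age_max)
scaled_test = apply_min_max(test_ages, age_min, age_max)

print(scaled_train)  # training values span exactly [0, 1]
print(scaled_test)   # a test value above the training max maps above 1
```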

In [14]:
housing.columns

Index(['housingMedianAge', 'totalRooms', 'totalBedrooms', 'population',
       'households', 'medianIncome', 'medianHouseValue'],
      dtype='object')

In [15]:
import tensorflow as tf


**Create the feature columns**

In [16]:
age = tf.feature_column.numeric_column('housingMedianAge')
rooms = tf.feature_column.numeric_column('totalRooms')
bedrooms = tf.feature_column.numeric_column('totalBedrooms')
pop = tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
income = tf.feature_column.numeric_column('medianIncome')

In [17]:
feat_cols = [ age,rooms,bedrooms,pop,households,income]

**Create the input function for the estimator object**

In [18]:
# Feeds shuffled batches of 10 rows; num_epochs caps the number of passes over the training data
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train, y=y_train, batch_size=10,
                                                 num_epochs=1000, shuffle=True)
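
To see how `batch_size` and `num_epochs` interact with the number of training steps, the arithmetic can be worked through directly. This is a rough sketch: the row count comes from the `describe()` output, the 30% test share from the split, and `train_steps` is the `steps=25000` value passed to `model.train` further down. The variable names are illustrative only.

```python
n_rows = 20640          # row count reported by housing.describe()
test_fraction = 0.3     # test_size passed to train_test_split
batch_size = 10         # batch_size passed to pandas_input_fn
train_steps = 25000     # steps passed to model.train
max_epochs = 1000       # num_epochs passed to pandas_input_fn

# Approximate size of the training split.
n_test = round(n_rows * test_fraction)
n_train = n_rows - n_test

# Each training step consumes one batch.
examples_seen = train_steps * batch_size
epochs_used = examples_seen / n_train

print(n_train, examples_seen, round(epochs_used, 1))
print(epochs_used < max_epochs)
```

Under these assumptions the training run passes over the data roughly 17 times, so the step limit ends training long before the `num_epochs=1000` cap is reached.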

**Create the estimator model using DNNRegressor.**

In [19]:
model = tf.estimator.DNNRegressor(hidden_units=[6,6,6],feature_columns=feat_cols)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/sq/pvvqtq413t55v2xzwlp73j9m0000gn/T/tmpe4zydix9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x10f057c50>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
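
For a sense of how small this network is, the number of trainable parameters can be estimated by hand. The sketch below assumes plain fully connected layers with one bias per unit and a single output unit. The helper `count_dense_parameters` is hypothetical and is not part of TensorFlow.

```python
def count_dense_parameters(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases of a stack of fully connected layers."""
    layer_sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        # fan_in * fan_out weights plus one bias per output unit
        total += fan_in * fan_out + fan_out
    return total


# Six numeric feature columns feeding hidden_units=[6, 6, 6].
n_params = count_dense_parameters(n_inputs=6, hidden_units=[6, 6, 6])
print(n_params)
```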


**Train the model**

In [20]:
model.train(input_fn=input_func,steps=25000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /var/folders/sq/pvvqtq413t55v2xzwlp73j9m0000gn/T/tmpe4zydix9/model.ckpt.
INFO:tensorflow:loss = 558713860000.0, step = 1
INFO:tensorflow:global_step/sec: 714.659
INFO:tensorflow:loss = 322871100000.0, step = 101 (0.141 sec)
INFO:tensorflow:global_step/sec: 983.408
INFO:tensorflow:loss = 579079050000.0, step = 201 (0.101 sec)
INFO:tensorflow:global_step/sec: 1003.93
INFO:tensorflow:loss = 446302800000.0, step = 301 (0.101 sec)
INFO:tensorflow:global_step/sec: 961.555
INFO:tensorflow:loss = 449926330000.0, step = 401 (0.104 sec)
INFO:tensorflow:global_step/sec: 1008.21
INFO:tensorflow:loss = 243319420000.0, step = 501 (0.100 sec)
INFO:tensorflow:global_step/sec: 947.266
INFO:tensorflow:loss = 337564270000.0, ...

INFO:tensorflow:loss = 70582125000.0, step = 7601 (0.102 sec)
INFO:tensorflow:global_step/sec: 981.941
INFO:tensorflow:loss = 79129190000.0, step = 7701 (0.102 sec)
INFO:tensorflow:global_step/sec: 1007.67
INFO:tensorflow:loss = 62433030000.0, step = 7801 (0.099 sec)
INFO:tensorflow:global_step/sec: 990.337
INFO:tensorflow:loss = 79237480000.0, step = 7901 (0.102 sec)
INFO:tensorflow:global_step/sec: 1005.04
INFO:tensorflow:loss = 108901210000.0, step = 8001 (0.098 sec)
INFO:tensorflow:global_step/sec: 1001.68
INFO:tensorflow:loss = 134534780000.0, step = 8101 (0.100 sec)
INFO:tensorflow:global_step/sec: 966.071
INFO:tensorflow:loss = 118951990000.0, step = 8201 (0.104 sec)
INFO:tensorflow:global_step/sec: 999.11
INFO:tensorflow:loss = 71741210000.0, step = 8301 (0.100 sec)
INFO:tensorflow:global_step/sec: 993.671
INFO:tensorflow:loss = 100524384000.0, step = 8401 (0.101 sec)
INFO:tensorflow:global_step/sec: 970.317
INFO:tensorflow:loss = 112495930000.0, step = 8501 (0.103 sec)
...

INFO:tensorflow:loss = 48052187000.0, step = 15501 (0.100 sec)
INFO:tensorflow:global_step/sec: 979.719
INFO:tensorflow:loss = 86720120000.0, step = 15601 (0.103 sec)
INFO:tensorflow:global_step/sec: 994.818
INFO:tensorflow:loss = 67667407000.0, step = 15701 (0.099 sec)
INFO:tensorflow:global_step/sec: 1016.04
INFO:tensorflow:loss = 171242600000.0, step = 15801 (0.098 sec)
INFO:tensorflow:global_step/sec: 1030.43
INFO:tensorflow:loss = 70470476000.0, step = 15901 (0.099 sec)
INFO:tensorflow:global_step/sec: 999.553
INFO:tensorflow:loss = 45846040000.0, step = 16001 (0.099 sec)
INFO:tensorflow:global_step/sec: 983.399
INFO:tensorflow:loss = 98197594000.0, step = 16101 (0.102 sec)
INFO:tensorflow:global_step/sec: 974.004
INFO:tensorflow:loss = 222634250000.0, step = 16201 (0.103 sec)
INFO:tensorflow:global_step/sec: 964.35
INFO:tensorflow:loss = 59362900000.0, step = 16301 (0.104 sec)
INFO:tensorflow:global_step/sec: 939.645
INFO:tensorflow:loss = 201313850000.0, step = 16401 (0.108 sec)

INFO:tensorflow:global_step/sec: 1006.55
INFO:tensorflow:loss = 32918098000.0, step = 23401 (0.099 sec)
INFO:tensorflow:global_step/sec: 986.962
INFO:tensorflow:loss = 54711587000.0, step = 23501 (0.101 sec)
INFO:tensorflow:global_step/sec: 1000.33
INFO:tensorflow:loss = 46817604000.0, step = 23601 (0.100 sec)
INFO:tensorflow:global_step/sec: 989.972
INFO:tensorflow:loss = 308327840000.0, step = 23701 (0.099 sec)
INFO:tensorflow:global_step/sec: 1016.14
INFO:tensorflow:loss = 52262384000.0, step = 23801 (0.099 sec)
INFO:tensorflow:global_step/sec: 981.335
INFO:tensorflow:loss = 61411017000.0, step = 23901 (0.103 sec)
INFO:tensorflow:global_step/sec: 1004.41
INFO:tensorflow:loss = 186028150000.0, step = 24001 (0.099 sec)
INFO:tensorflow:global_step/sec: 1004.89
INFO:tensorflow:loss = 127054496000.0, step = 24101 (0.100 sec)
INFO:tensorflow:global_step/sec: 986.289
INFO:tensorflow:loss = 122121870000.0, step = 24201 (0.099 sec)
INFO:tensorflow:global_step/sec: 1018.04
...

<tensorflow.python.estimator.canned.dnn.DNNRegressor at 0x1a20b8c240>

**Create a predict input function using Pandas**

In [21]:
predict_input_func = tf.estimator.inputs.pandas_input_fn(
      x=X_test,
      batch_size=10,
      num_epochs=1,
      shuffle=False)

In [22]:
pred_gen = model.predict(predict_input_func)

In [23]:
predictions = list(pred_gen)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/sq/pvvqtq413t55v2xzwlp73j9m0000gn/T/tmpe4zydix9/model.ckpt-25000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


**Calculate the RMSE**

In [24]:
# Each prediction is a dict; its 'predictions' entry holds a one-element array
final_preds = []
for pred in predictions:
    final_preds.append(pred['predictions'])
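
If a flat list of floats is more convenient than a list of one-element arrays, the extraction can be written as a comprehension. This sketch uses a hypothetical stand-in, `mock_predictions`, with made-up values, so that it runs without a trained model. Plain lists take the place of the arrays the estimator yields.

```python
# Hypothetical stand-in for the dicts yielded by model.predict;
# the values are invented for illustration.
mock_predictions = [
    {'predictions': [200000.0]},
    {'predictions': [150000.0]},
]

# Take the single element of each 'predictions' entry.
flat_preds = [float(pred['predictions'][0]) for pred in mock_predictions]
print(flat_preds)
```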

In [25]:
from sklearn.metrics import mean_squared_error

In [26]:
# RMSE is the square root of the MSE, in the same units as medianHouseValue
mean_squared_error(y_test, final_preds)**0.5

100965.91946018438
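
The RMSE is expressed in the same units as medianHouseValue. For context, the `describe()` output reports a standard deviation of about 115,396 for medianHouseValue over the full dataset. A model that always predicted the mean would score an RMSE close to that figure, so it gives a rough baseline. The sketch below computes RMSE by hand and compares a model against such a mean-only baseline. The `rmse` helper and the toy values are assumptions for illustration and are not results from this notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only.
y_true = [100000.0, 200000.0, 300000.0]
y_pred = [110000.0, 190000.0, 330000.0]

model_rmse = rmse(y_true, y_pred)

# Baseline: always predict the mean of the true values.
mean_value = sum(y_true) / len(y_true)
baseline_rmse = rmse(y_true, [mean_value] * len(y_true))

print(model_rmse, baseline_rmse)
print(model_rmse < baseline_rmse)  # a useful model should beat the baseline
```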