# Regression Exercise - Solutions

California Housing Data

This data set contains information about all the block groups in California from the 1990 Census. In this sample, a block group includes 1425.5 individuals on average, living in a geographically compact area.

The task is to approximate the median house value of each block group from the values of the other variables.

The data was obtained from the LIACC repository. The original page where the data set can be found is: http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.
 

The Features:
 
* housingMedianAge: continuous. 
* totalRooms: continuous. 
* totalBedrooms: continuous. 
* population: continuous. 
* households: continuous. 
* medianIncome: continuous. 
* medianHouseValue: continuous (the target to predict).

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.api as sm
import seaborn as sns
sns.set()

In [None]:
housing = pd.read_csv('../data/cal_housing_clean.csv')

In [None]:
housing.head()

In [None]:
y_val = housing['medianHouseValue']

In [None]:
x_data = housing.drop('medianHouseValue', axis=1)

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(x_data, y_val, test_size=0.33, random_state=101)

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
scaler = MinMaxScaler()

In [None]:
scaler.fit(X_train)
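
The scaler is fit on the training data only, and the same fitted minimum and maximum are then reused to transform the test data. The following is a minimal pure-Python sketch of that idea, not scikit-learn's implementation. The helper names `fit_min_max` and `transform_min_max` and the toy rows are hypothetical, and mapping constant columns to 0.0 is a simplification.

```python
def fit_min_max(rows):
    """Return a (min, max) pair per column, learned from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(rows, stats):
    """Scale each value to (x - min) / (max - min) using the fitted stats."""
    scaled = []
    for row in rows:
        scaled.append([
            # Simplification: a constant column (max == min) is mapped to 0.0.
            (x - lo) / (hi - lo) if hi != lo else 0.0
            for x, (lo, hi) in zip(row, stats)
        ])
    return scaled


# Hypothetical toy data: two features, three training rows, one test row.
train_rows = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
test_rows = [[25.0, 100.0]]

stats = fit_min_max(train_rows)                      # learned from train only
train_scaled = transform_min_max(train_rows, stats)  # values lie in [0, 1]
test_scaled = transform_min_max(test_rows, stats)    # may fall outside [0, 1]

print(train_scaled)
print(test_scaled)
```

Because the test row is scaled with the training statistics, a test value below the training minimum comes out negative. That is expected and avoids leaking information from the test set into the preprocessing step.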

In [None]:
X_train = pd.DataFrame(data=scaler.transform(X_train),
                      columns=X_train.columns,
                      index=X_train.index)

In [None]:
X_test = pd.DataFrame(data=scaler.transform(X_test),
                      columns=X_test.columns,
                      index=X_test.index)

In [None]:
housing.columns

In [None]:
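# Note: the cells below use the TensorFlow 1.x Estimator API
# (tf.feature_column, tf.estimator.inputs.pandas_input_fn).
import tensorflow as tf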

In [None]:
age = tf.feature_column.numeric_column('housingMedianAge')
rooms = tf.feature_column.numeric_column('totalRooms')
bedrooms = tf.feature_column.numeric_column('totalBedrooms')
population = tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
income = tf.feature_column.numeric_column('medianIncome')

In [None]:
feature_cols = [age, rooms, bedrooms, population, households, income]

In [None]:
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,
                                                y=y_train,
                                                batch_size=10,
                                                num_epochs=1000,
                                                shuffle=True)

In [None]:
model = tf.estimator.DNNRegressor(hidden_units=[20,12,12], feature_columns=feature_cols)

In [None]:
model.train(input_fn=input_func, steps=20000)
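
Training stops when either `steps` batches have been consumed or the input function runs out of data, whichever comes first. The arithmetic below is a rough sketch of how `batch_size=10`, `num_epochs=1000`, and `steps=20000` interact. The training-set size of 13,828 rows is an assumption (a 67% split of 20,640 rows, the size commonly reported for this data set; the cleaned CSV may differ), and the helper names are hypothetical.

```python
import math


def batches_available(n_rows, batch_size, num_epochs):
    """Approximate number of batches the input function can supply."""
    return math.ceil(n_rows * num_epochs / batch_size)


def epochs_consumed(n_rows, batch_size, steps):
    """Number of passes over the training set made by `steps` batches."""
    return steps * batch_size / n_rows


# Assumption: about 13,828 training rows; check len(X_train) for the real value.
n_train = 13828

available = batches_available(n_train, batch_size=10, num_epochs=1000)
steps_run = min(20000, available)  # training ends at whichever limit hits first
passes = epochs_consumed(n_train, batch_size=10, steps=steps_run)

print(available, steps_run, round(passes, 1))
```

Under this assumption the `steps` argument is the binding limit: the model sees roughly 14.5 passes over the training data, far fewer than the 1000 epochs the input function could provide.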

In [None]:
predict_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test,
                                                        batch_size=10,
                                                        num_epochs=1,
                                                        shuffle=False)

In [None]:
pred_gen = model.predict(predict_input_func)

In [None]:
predictions = list(pred_gen)

In [None]:
predictions[:10]

In [None]:
# Each prediction is a dict; collect the array stored under 'predictions'.
final_preds = [pred['predictions'] for pred in predictions]

In [None]:
from sklearn.metrics import mean_squared_error

In [None]:
# Root mean squared error (RMSE), in the same units as medianHouseValue.
mean_squared_error(y_test, final_preds)**0.5
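
Taking the square root of the mean squared error gives the root mean squared error (RMSE), which is expressed in the same units as the target. As a minimal sketch of what that last cell computes, here is a pure-Python version; the function name `rmse` and the toy values are hypothetical and are not results from this notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy values, for illustration only.
actual = [100000.0, 200000.0, 300000.0]
predicted = [110000.0, 190000.0, 330000.0]

error = rmse(actual, predicted)
print(round(error, 2))
```

Because the errors are squared before averaging, the single 30,000 miss dominates the two 10,000 misses, so RMSE penalizes large errors more heavily than a mean absolute error would.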