
Import TensorFlow libraries

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense


Import data handling libraries

In [2]:
import numpy as np
import pandas as pd

Import graphing libraries

In [4]:
import matplotlib.pyplot as plt
import seaborn as sns

Load house price data

In [6]:
(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz


Let's explore the shape of the data

We can see that:
1. There is a roughly 80/20 split between training and test data (404 and 102 rows)
1. Each row has 13 features


In [8]:
train_features.shape, train_labels.shape, test_features.shape, test_labels.shape

((404, 13), (404,), (102, 13), (102,))
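
The split ratio can be checked directly from these shapes. A minimal sketch, using the row counts from the output above rather than reloading the data (the variable names are introduced here for illustration):

```python
# Row counts taken from the shapes printed above.
n_train, n_test = 404, 102

train_fraction = n_train / (n_train + n_test)
test_fraction = n_test / (n_train + n_test)

print(f"train: {train_fraction:.1%}, test: {test_fraction:.1%}")
# train: 79.8%, test: 20.2%
```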

The features are:

 1. CRIM     per capita crime rate by town
 1. ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
 1. INDUS    proportion of non-retail business acres per town
 1. CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
 1. NOX      nitric oxides concentration (parts per 10 million)
 1. RM       average number of rooms per dwelling
 1. AGE      proportion of owner-occupied units built prior to 1940
 1. DIS      weighted distances to five Boston employment centres
 1. RAD      index of accessibility to radial highways
 1. TAX      full-value property-tax rate per \$10,000
 1. PTRATIO  pupil-teacher ratio by town
 1. B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
 1. LSTAT    % lower status of the population

The label (target) is MEDV: the median value of owner-occupied homes in \$1000's
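
To make the columns easier to read, the names listed above can be attached to the array with pandas. This is only a sketch: `feature_names` is a name introduced here, the column order is assumed to follow the list above, and the single row is the first training example printed further down in this notebook rather than the full dataset.

```python
import numpy as np
import pandas as pd

# Assumed column order: the same order as the feature list above.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# First training example, copied from the notebook output.
sample = np.array([[1.23247, 0.0, 8.14, 0.0, 0.538, 6.142, 91.7,
                    3.9769, 4.0, 307.0, 21.0, 396.9, 18.72]])

df = pd.DataFrame(sample, columns=feature_names)
df["MEDV"] = [15.2]  # the label for this row, added as an extra column

# Transpose so each name sits next to its value.
print(df.T)
```

With the full dataset, the same idea would be `pd.DataFrame(train_features, columns=feature_names)`.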




In [9]:
train_features[0], train_labels[0]

(array([  1.23247,   0.     ,   8.14   ,   0.     ,   0.538  ,   6.142  ,
         91.7    ,   3.9769 ,   4.     , 307.     ,  21.     , 396.9    ,
         18.72   ]), 15.2)

The features in this row are on very different scales. We can confirm that this holds across the data by looking at the maximum value in each column

In [15]:
np.amax(train_features, axis=0)

array([ 88.9762, 100.    ,  27.74  ,   1.    ,   0.871 ,   8.725 ,
       100.    ,  10.7103,  24.    , 711.    ,  22.    , 396.9   ,
        37.97  ])

Let's also check the minimum. Again, the values vary widely from column to column

In [16]:
np.amin(train_features, axis=0)

array([6.3200e-03, 0.0000e+00, 4.6000e-01, 0.0000e+00, 3.8500e-01,
       3.5610e+00, 2.9000e+00, 1.1296e+00, 1.0000e+00, 1.8800e+02,
       1.2600e+01, 3.2000e-01, 1.7300e+00])
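
Putting the two outputs side by side makes the difference in scale explicit. A small sketch using the maxima and minima printed above (the `col_` names are introduced here for illustration):

```python
import numpy as np

# Column-wise maxima and minima, copied from the outputs above.
col_max = np.array([88.9762, 100.0, 27.74, 1.0, 0.871, 8.725,
                    100.0, 10.7103, 24.0, 711.0, 22.0, 396.9, 37.97])
col_min = np.array([6.32e-03, 0.0, 0.46, 0.0, 0.385, 3.561,
                    2.9, 1.1296, 1.0, 188.0, 12.6, 0.32, 1.73])

# Spread of each feature (max minus min).
col_range = col_max - col_min
print(col_range.round(3))

# Index of the widest and narrowest feature.
print("widest:", col_range.argmax(), "narrowest:", col_range.argmin())
```

On the real array, `np.ptp(train_features, axis=0)` should give the same per-column ranges in one call.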

To help training, let's normalise the data so that each feature has zero mean and unit standard deviation

In [17]:
train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features = (train_features - train_mean) / train_std
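
The test set is not normalised in the cells shown here. A common approach, sketched below on toy arrays, is to reuse the training mean and standard deviation so that no test-set statistics are used. The `toy_` names and values are made up for illustration.

```python
import numpy as np

# Toy stand-ins for train_features / test_features (made-up values).
toy_train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
toy_test = np.array([[3.0, 400.0], [7.0, 800.0]])

# Statistics come from the training data only.
toy_mean = np.mean(toy_train, axis=0)
toy_std = np.std(toy_train, axis=0)

toy_train_norm = (toy_train - toy_mean) / toy_std
toy_test_norm = (toy_test - toy_mean) / toy_std  # same mean/std as training

print(toy_test_norm)
```

In this notebook the equivalent line would be `test_features = (test_features - train_mean) / train_std`.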

In [18]:
np.amax(train_features, axis=0)

array([9.23484718, 3.72899018, 2.44537425, 3.89358447, 2.67733525,
       3.46718635, 1.11048828, 3.43740568, 1.67588577, 1.83609694,
       1.60353052, 0.44807713, 3.48201936])

In [19]:
np.amin(train_features, axis=0)

array([-0.40510053, -0.48361547, -1.56469648, -0.25683275, -1.47126853,
       -3.81725032, -2.36904226, -1.28750316, -0.97156928, -1.31131055,
       -2.67375227, -3.77110135, -1.51966384])
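
The maxima and minima are now on comparable scales, but they are not bounded to a fixed interval: this kind of normalisation fixes each column's mean at 0 and standard deviation at 1, not its range. A quick check of that property on made-up data (the `toy` arrays are illustrative, not the housing data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: three columns on very different scales.
toy = rng.normal(loc=[0.5, 50.0, 400.0], scale=[0.1, 20.0, 150.0], size=(404, 3))

# Same normalisation as above: subtract the column mean, divide by the column std.
toy_norm = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(toy_norm.mean(axis=0).round(6))  # approximately 0 for every column
print(toy_norm.std(axis=0).round(6))   # approximately 1 for every column
```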