# Deep Neural Networks: Second Example (TensorFlow)
## Chapter 6
### Predictive Analytics for the Modern Enterprise

The notebook has been tested with the following prerequisites:

* Python V3.9.13 - https://www.python.org/
* TensorFlow V2.11.0
* Keras V2.11.0
* Jupyter - V6.4.12 - https://jupyter.org/
* Desktop computer - macOS Ventura V13.1

In [1]:
import numpy as np #Pre-processing                 
import pandas as pd #Pre-processing
import seaborn as sns #Visualization
import matplotlib.pyplot as plt #Visualization

import tensorflow as tflow #Predictive Analytics
from tensorflow import keras #Modeling and Predicting
from tensorflow.keras import layers #Model Building

from pandas import options, read_csv #Data import



Dataset sourced from: https://www.kaggle.com/datasets/quantbruce/real-estate-price-prediction

The notebook expects the file `Real estate.csv` in a local `Datasets` folder. Copies of the Chapter 6 datasets are available at: https://github.com/paforme/predictiveanalytics/tree/main/Chapter6/Datasets

In [2]:
local_path = './Datasets/Real estate.csv' #Replace this with the path where you saved the dataset
local_ds = read_csv(local_path) #Load the CSV file into a pandas DataFrame
local_ds

| | No | X1 transaction date | X2 house age | X3 distance to the nearest MRT station | X4 number of convenience stores | X5 latitude | X6 longitude | Y house price of unit area |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2012.917 | 32.0 | 84.87882 | 10 | 24.98298 | 121.54024 | 37.9 |
| 1 | 2 | 2012.917 | 19.5 | 306.59470 | 9 | 24.98034 | 121.53951 | 42.2 |
| 2 | 3 | 2013.583 | 13.3 | 561.98450 | 5 | 24.98746 | 121.54391 | 47.3 |
| 3 | 4 | 2013.500 | 13.3 | 561.98450 | 5 | 24.98746 | 121.54391 | 54.8 |
| 4 | 5 | 2012.833 | 5.0 | 390.56840 | 5 | 24.97937 | 121.54245 | 43.1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 409 | 410 | 2013.000 | 13.7 | 4082.01500 | 0 | 24.94155 | 121.50381 | 15.4 |
| 410 | 411 | 2012.667 | 5.6 | 90.45606 | 9 | 24.97433 | 121.54310 | 50.0 |
| 411 | 412 | 2013.250 | 18.8 | 390.96960 | 7 | 24.97923 | 121.53986 | 40.6 |
| 412 | 413 | 2013.000 | 8.1 | 104.81010 | 5 | 24.96674 | 121.54067 | 52.5 |


In [3]:
k_dataset = local_ds.copy()
k_dataset.pop('No') #Drop the 'No' column, a 1-based row number with no predictive value
k_dataset

| | X1 transaction date | X2 house age | X3 distance to the nearest MRT station | X4 number of convenience stores | X5 latitude | X6 longitude | Y house price of unit area |
|---|---|---|---|---|---|---|---|
| 0 | 2012.917 | 32.0 | 84.87882 | 10 | 24.98298 | 121.54024 | 37.9 |
| 1 | 2012.917 | 19.5 | 306.59470 | 9 | 24.98034 | 121.53951 | 42.2 |
| 2 | 2013.583 | 13.3 | 561.98450 | 5 | 24.98746 | 121.54391 | 47.3 |
| 3 | 2013.500 | 13.3 | 561.98450 | 5 | 24.98746 | 121.54391 | 54.8 |
| 4 | 2012.833 | 5.0 | 390.56840 | 5 | 24.97937 | 121.54245 | 43.1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 409 | 2013.000 | 13.7 | 4082.01500 | 0 | 24.94155 | 121.50381 | 15.4 |
| 410 | 2012.667 | 5.6 | 90.45606 | 9 | 24.97433 | 121.54310 | 50.0 |
| 411 | 2013.250 | 18.8 | 390.96960 | 7 | 24.97923 | 121.53986 | 40.6 |
| 412 | 2013.000 | 8.1 | 104.81010 | 5 | 24.96674 | 121.54067 | 52.5 |


In [4]:
k_training_ds = k_dataset.sample(frac=0.8, random_state=0) #Create training dataset as 80% of whole dataset
k_testing_ds = k_dataset.drop(k_training_ds.index) #Create test dataset by dropping training dataset indexes from whole dataset
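`DataFrame.sample(frac=0.8, random_state=0)` picks 80% of the rows at random, and `drop(k_training_ds.index)` keeps every row that was not picked. The sketch below illustrates the same idea with plain numpy on row positions. It assumes the file has 414 rows; the training count of 331 reported by `describe()` below is 80% of 414, rounded. The function name `split_indices` is hypothetical, and because it uses numpy's generator rather than pandas', it does not select the same rows as the cell above.

```python
import numpy as np

def split_indices(n_rows, frac=0.8, seed=0):
    """Hypothetical helper: return (train_idx, test_idx), where train_idx is a
    random `frac` share of the row positions and test_idx is everything else."""
    rng = np.random.default_rng(seed)
    n_train = round(n_rows * frac)           # 0.8 * 414 = 331.2 -> 331
    shuffled = rng.permutation(n_rows)       # random order of all row positions
    train_idx = np.sort(shuffled[:n_train])  # first 80% of the shuffle
    test_idx = np.setdiff1d(np.arange(n_rows), train_idx)  # the remaining rows
    return train_idx, test_idx

train_idx, test_idx = split_indices(414)
print(len(train_idx), len(test_idx))
```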

In [5]:
k_training_ds.describe()

| | X1 transaction date | X2 house age | X3 distance to the nearest MRT station | X4 number of convenience stores | X5 latitude | X6 longitude | Y house price of unit area |
|---|---|---|---|---|---|---|---|
| count | 331.0 | 331.0 | 331.0 | 331.0 | 331.0 | 331.0 | 331.0 |
| mean | 2013.150571 | 17.652266 | 1154.274016 | 3.996979 | 24.968588 | 121.532759 | 37.361329 |
| std | 0.278091 | 11.305017 | 1319.156009 | 2.964945 | 0.012823 | 0.015975 | 13.763673 |
| min | 2012.667 | 0.0 | 23.38284 | 0.0 | 24.93207 | 121.47353 | 7.6 |
| 25% | 2012.917 | 9.95 | 289.3248 | 1.0 | 24.961965 | 121.52462 | 26.95 |
| 50% | 2013.167 | 16.1 | 515.1122 | 4.0 | 24.9703 | 121.53844 | 37.9 |
| 75% | 2013.417 | 27.55 | 1604.9025 | 6.0 | 24.97745 | 121.543245 | 45.5 |
| max | 2013.583 | 42.7 | 6488.021 | 10.0 | 25.01459 | 121.56627 | 117.5 |


In [6]:
k_training_ds = k_training_ds.copy() #Copy of the data
k_testing_ds = k_testing_ds.copy() #Copy of the data 

k_train_predict = k_training_ds.pop('Y house price of unit area') #Separate the features and labels (predictors and predicted)
k_test_predict = k_testing_ds.pop('Y house price of unit area') #Separate the features and labels (predictors and predicted)

In [7]:
k_training_ds.describe().transpose()[['mean', 'std']]

| | mean | std |
|---|---|---|
| X1 transaction date | 2013.150571 | 0.278091 |
| X2 house age | 17.652266 | 11.305017 |
| X3 distance to the nearest MRT station | 1154.274016 | 1319.156009 |
| X4 number of convenience stores | 3.996979 | 2.964945 |
| X5 latitude | 24.968588 | 0.012823 |
| X6 longitude | 121.532759 | 0.015975 |


### What is normalization?

Normalization is a data preparation technique frequently used in machine learning. It transforms the columns of a dataset so that they share a common scale. Not every dataset needs to be normalized; it is only required when the features have different ranges.

Consider a dataset with two features, age and income. Age spans 0 to 80 years, while income spans 0 to 80,000 dollars or more, so income values are roughly 1,000 times larger than age values. The ranges of these two features are therefore vastly different.

Because its values are larger, income will naturally have more influence on the result when we perform further analysis, such as a multivariate linear regression trained with gradient descent. That does not mean it is a better predictor. We therefore normalize the data so that all of the variables lie in the same range.

We normalize the training data to make the learning problem easier for the model: giving the features similar value ranges (feature scaling) helps gradient descent converge faster.
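The age/income example can be made concrete with a few hypothetical rows. This sketch uses standardization (subtract each column's mean, divide by its standard deviation), which is the transformation the Keras `Normalization` layer in this notebook applies. The values are invented for illustration only.

```python
import numpy as np

# Hypothetical toy data: column 0 = age (years), column 1 = income (dollars)
data = np.array([
    [25.0, 32_000.0],
    [40.0, 58_000.0],
    [58.0, 80_000.0],
    [33.0, 45_000.0],
])

# Standardize each column on its own: subtract its mean, divide by its std
scaled = (data - data.mean(axis=0)) / data.std(axis=0)

raw_ranges = data.max(axis=0) - data.min(axis=0)
scaled_ranges = scaled.max(axis=0) - scaled.min(axis=0)
print("raw ranges (age, income):   ", raw_ranges)
print("scaled ranges (age, income):", scaled_ranges)
```

Before scaling, the income range is more than 1,000 times the age range; afterwards both columns have mean 0, standard deviation 1, and comparable ranges.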


---


It is good practice to normalize features that use different scales and ranges.

One reason this matters is that the features are multiplied by the model weights, so the scale of the inputs affects the scale of both the outputs and the gradients.

Although a model might converge without feature normalization, normalization makes training much more stable.
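To see why the input scale leaks into the gradients, take a linear model $\hat{y} = Xw + b$ with mean squared error. The gradient for weight $w_j$ is $\frac{2}{n}\sum_i (\hat{y}_i - y_i)\,x_{ij}$, so it grows with the size of feature $j$. The sketch below uses hypothetical age/income data and a hypothetical target in which both features matter about equally. MSE is used here only for simplicity; the notebook's model trains with mean absolute error, whose gradient is likewise proportional to $x_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical toy features on very different scales
age = rng.uniform(0, 80, n)           # years
income = rng.uniform(0, 80_000, n)    # dollars
X = np.column_stack([age, income])

# Hypothetical target in which both features contribute about equally
y = 0.5 * age + 0.0005 * income + rng.normal(0, 1, n)

def mse_weight_gradient(X, y, w, b=0.0):
    """Gradient of mean squared error w.r.t. w for the linear model y_hat = X @ w + b."""
    residual = X @ w + b - y
    return 2.0 * X.T @ residual / len(y)

w0 = np.zeros(X.shape[1])  # start from zero weights

g_raw = mse_weight_gradient(X, y, w0)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized copy of the features
g_std = mse_weight_gradient(X_std, y, w0)

print("raw |gradient| (age, income):         ", np.abs(g_raw))
print("standardized |gradient| (age, income):", np.abs(g_std))
```

On the raw features the income gradient is roughly 1,000 times the age gradient, so a single learning rate cannot suit both weights; after standardization the two gradients are of similar size.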

Note: every feature in this dataset is numeric, so the same normalization is applied to all six feature columns. For more details on how to use the preprocessing layers, refer to the TensorFlow guide "Working with preprocessing layers" and the tutorial "Classify structured data using Keras preprocessing layers".

In [8]:
normalizer = tflow.keras.layers.Normalization(axis=-1) #A preprocessing layer which normalizes continuous features 

This layer will shift and scale inputs into a distribution centered around 0 with standard deviation 1. It accomplishes this by precomputing the mean and variance of the data, and calling (input - mean) / sqrt(var) at runtime.
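The formula can be checked by hand against this notebook's own numbers. The sketch below takes the "X2 house age" mean and standard deviation from the training-set `describe()` table above and standardizes the house age of the first training row (10.3) that the cell below prints. One assumption is needed: pandas reports the sample standard deviation (ddof=1), while the layer is assumed to use the population variance (ddof=0), so the statistic is converted first. The result agrees with the `-0.6513389` that the normalizer produces for that value in the printout below.

```python
import math

# "X2 house age" statistics from the training-set describe() output
n = 331                  # count
mean = 17.652266         # mean
sample_std = 11.305017   # pandas' std is the sample standard deviation (ddof=1)

# Assumption: the layer's variance is the population variance (ddof=0),
# so convert the sample statistic before using it
population_var = sample_std ** 2 * (n - 1) / n

x = 10.3                 # house age of the first training row
z = (x - mean) / math.sqrt(population_var)  # (input - mean) / sqrt(var)
print(round(z, 4))
```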

axis - Integer, tuple of integers, or None. The axis or axes that should have a separate mean and variance for each index in the shape. For example, if shape is (None, 5) and axis=1, the layer will track 5 separate mean and variance values for the last axis. If axis is set to None, the layer will normalize all elements in the input by a scalar mean and variance. Defaults to -1, where the last axis of the input is assumed to be a feature dimension and is normalized per index. Note that in the specific case of batched scalar inputs where the only axis is the batch axis, the default will normalize each index in the batch separately. In this case, consider passing axis=None.
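The difference between the default `axis=-1` and `axis=None` can be illustrated with a numpy analogue. This is not the Keras code itself, only a sketch of the two behaviours described above: per-feature statistics computed across the batch, versus one scalar mean and variance for every element.

```python
import numpy as np

# Toy batch: 3 rows (batch axis) x 2 features (last axis)
batch = np.array([[1.0, 100.0],
                  [3.0, 300.0],
                  [5.0, 500.0]])

# Analogue of axis=-1: a separate mean and variance for each feature column,
# computed across the rows of the batch
per_feature = (batch - batch.mean(axis=0)) / batch.std(axis=0)

# Analogue of axis=None: one scalar mean and variance over every element
scalar = (batch - batch.mean()) / batch.std()

print(per_feature)
print(scalar)
```

With per-feature statistics both columns end up on the same scale; with a single scalar mean and variance the large second column still dominates, which is why the default suits tabular data like this dataset.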

In [9]:
normalizer.adapt(np.array(k_training_ds))

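`adapt()` computes the mean and variance of each feature in the training data and stores them in the layer, so the same statistics are reused every time the layer is called afterwards.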
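The adapt-then-call pattern can be sketched with a small hypothetical class, `ToyNormalizer`. It is not the Keras implementation: it assumes population variance and omits the epsilon guard against zero variance. The point it shows is that statistics are learned once from the training data and then applied unchanged to new data, so test rows are scaled with the training mean and variance rather than their own.

```python
import numpy as np

class ToyNormalizer:
    """Hypothetical stand-in for the adapt-then-call pattern (not the Keras class)."""

    def adapt(self, data):
        data = np.asarray(data, dtype=float)
        self.mean = data.mean(axis=0)   # one mean per feature column
        self.var = data.var(axis=0)     # population variance (ddof=0)

    def __call__(self, x):
        # Reuse the stored training statistics: (x - mean) / sqrt(var)
        return (np.asarray(x, dtype=float) - self.mean) / np.sqrt(self.var)

train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
test = np.array([[20.0, 400.0], [40.0, 800.0]])

norm = ToyNormalizer()
norm.adapt(train)      # statistics come from the training rows only
print(norm(train))
print(norm(test))      # scaled with the training statistics, not re-centred
```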

In [10]:
print(normalizer.mean)

tf.Tensor(
[[2013.1505      17.652266  1154.2739       3.9969788   24.968586
   121.532745 ]], shape=(1, 6), dtype=float32)


In [11]:
print('Original: ', np.array(k_training_ds[0:9]))
print('\nNormalized: ', normalizer(np.array(k_training_ds[0:9])).numpy())

Original:  [[2.0128330e+03 1.0300000e+01 2.1144730e+02 1.0000000e+00 2.4974170e+01
  1.2152999e+02]
 [2.0133330e+03 2.4000000e+01 4.5276870e+03 0.0000000e+00 2.4947410e+01
  1.2149628e+02]
 [2.0133330e+03 3.4500000e+01 3.2494190e+02 6.0000000e+00 2.4978140e+01
  1.2154170e+02]
 [2.0133330e+03 2.5600000e+01 4.5196900e+03 0.0000000e+00 2.4948260e+01
  1.2149587e+02]
 [2.0135000e+03 1.4400000e+01 1.6998030e+02 1.0000000e+00 2.4973690e+01
  1.2152979e+02]
 [2.0130830e+03 3.6600000e+01 4.8881930e+02 8.0000000e+00 2.4970150e+01
  1.2154494e+02]
 [2.0132500e+03 3.5800000e+01 1.7073110e+02 7.0000000e+00 2.4967190e+01
  1.2154269e+02]
 [2.0130830e+03 3.4800000e+01 4.0521340e+02 1.0000000e+00 2.4973490e+01
  1.2153372e+02]
 [2.0134170e+03 1.0500000e+01 2.7917260e+02 7.0000000e+00 2.4975280e+01
  1.2154541e+02]]

Normalized:  [[-1.1434238  -0.6513389  -0.71580166 -1.0123346   0.4361661  -0.1726767 ]
 [ 0.65721595  0.56234723  2.5611227  -1.3501196  -1.6537966  -2.2864118 ]
 [ 0.65721595  1.492544

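No `input_shape` is passed to the model below: as the model summary shows, it works on 6 input features, matching the data the normalization layer was adapted on.

`units=1` in the last `Dense` layer: a positive integer giving the dimensionality of the output space. A single unit produces one predicted value per row (the house price per unit area).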

In [12]:
dnn_model = tflow.keras.Sequential() #Define the model
dnn_model.add(normalizer) #Add the adapted pre-processing (normalization) layer
dnn_model.add(layers.Dense(64, activation='relu')) #Non-linear hidden layer 1 (64 units)
dnn_model.add(layers.Dense(64, activation='relu')) #Non-linear hidden layer 2 (64 units)
dnn_model.add(layers.Dense(1)) #Linear output layer that produces 1 value
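Each `Dense` layer computes `activation(dot(input, kernel) + bias)`. The numpy sketch below mirrors the shapes of the model defined above (6 inputs, two ReLU layers of 64 units, one linear output unit). The weights are hypothetical random values, not the trained model's, and the inputs stand in for rows that have already been normalized.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, activation=None):
    """One fully connected layer: activation(x @ w + b)."""
    z = x @ w + b
    return np.maximum(z, 0.0) if activation == "relu" else z

# Hypothetical random weights with the same shapes as the model above
sizes = [6, 64, 64, 1]
params = [(rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(9, 6))                     # stand-in for 9 normalized rows
h = dense(x, *params[0], activation="relu")     # hidden layer 1
h = dense(h, *params[1], activation="relu")     # hidden layer 2
y_hat = dense(h, *params[2])                    # linear output, units=1
print(y_hat.shape)  # (9, 1)
```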

In [13]:
dnn_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization (Normalizatio  (None, 6)                13        
 n)                                                              
                                                                 
 dense (Dense)               (None, 64)                448       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 1)                 65        
                                                                 
Total params: 4,686
Trainable params: 4,673
Non-trainable params: 13
_________________________________________________________________
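The `Param #` column can be checked by hand: a `Dense` layer has `n_in * n_out` weights plus `n_out` biases. For the normalization layer, the 13 non-trainable parameters are consistent with a mean and a variance for each of the 6 features plus one sample count; treat that breakdown as an assumption.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

n_features = 6
# Assumption: the Normalization layer stores a mean and a variance per feature
# plus one sample count, all non-trainable
norm_params = 2 * n_features + 1

dense_counts = [dense_params(6, 64), dense_params(64, 64), dense_params(64, 1)]
print(norm_params, dense_counts)                            # 13 [448, 4160, 65]
print(sum(dense_counts), norm_params + sum(dense_counts))   # 4673 4686
```

The totals match the summary: 4,673 trainable parameters, 13 non-trainable, 4,686 in all.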


In [14]:
dnn_model.compile(   #compile the model to define the optimizer and the loss 
    optimizer=tflow.keras.optimizers.Adam(learning_rate=0.001),
    loss='mean_absolute_error')
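The `'mean_absolute_error'` loss is the average absolute difference between labels and predictions, expressed in the label's own units (price per unit area). Compared with squared error it penalizes large misses less heavily, which makes it less sensitive to outliers. A minimal numpy version, using the first three labels from the dataset and hypothetical predictions:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

y_true = [37.9, 42.2, 47.3]   # first three labels from the dataset
y_pred = [40.0, 40.0, 50.0]   # hypothetical predictions
print(mean_absolute_error(y_true, y_pred))  # (2.1 + 2.2 + 2.7) / 3
```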

In [15]:
%load_ext tensorboard 

In [16]:
import tensorflow as tf
import datetime

In [17]:
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

In [18]:
dnn_model.fit(
    k_training_ds,
    k_train_predict,
    epochs=100,
    verbose=0,  # Train silently, without per-epoch logging
    validation_split=0.2,  # Calculate validation results on 20% of the training data
    callbacks=[tensorboard_callback])  # Write training logs for TensorBoard

<keras.callbacks.History at 0x7fe0186f6280>
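According to the Keras documentation for `fit()`, `validation_split` holds out the *last* fraction of the provided samples, selected before any shuffling. Here that is harmless because `k_training_ds` was drawn with `sample()`, so its rows are already in random order. The sketch below mimics the split on the 331 training rows; the helper name is hypothetical, and rounding the split point down is an assumption about how Keras computes it.

```python
import math
import numpy as np

def split_for_validation(x, y, validation_split=0.2):
    """Hypothetical helper: hold out the last `validation_split` fraction of rows.

    Assumption: the split point is rounded down.
    """
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(331).reshape(-1, 1)   # stand-in for the 331 training rows
y = np.arange(331, dtype=float)     # stand-in labels
(x_fit, y_fit), (x_val, y_val) = split_for_validation(x, y)
print(len(x_fit), len(x_val))
```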

In [19]:
%tensorboard --logdir logs/fit

Reusing TensorBoard on port 6006 (pid 26380), started 0:02:11 ago. (Use '!kill 26380' to kill it.)