# Modelling with TensorFlow

Based on the findings from the data exploration, the next step is to train a regression model that predicts housing prices.

In [1]:
# Importing the libraries and checking TF version for compatibility
import pandas as pd
import numpy as np
import tensorflow as tf
print(tf.__version__)

2.5.0


## Checking Hardware Configuration

Check whether TensorFlow has detected the local GPU that will be used to train the model.

In [2]:
tf.config.list_physical_devices()

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [3]:
!nvidia-smi

Mon May 31 22:26:41 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 466.47       Driver Version: 466.47       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:04:00.0  On |                  N/A |
|  0%   51C    P8    15W / 125W |   1170MiB /  6144MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

## Loading the dataset from TF

In [9]:
# Load the data with an 80/20 train/test split (fixed seed for reproducibility)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data(
    path='boston_housing.npz', test_split=0.2, seed=113
)
# Feature names in dataset order; the last entry (MEDV) is the target
COLUMNS = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
len(x_train), len(x_test)

(404, 102)
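
The 404/102 sizes above are consistent with a seeded shuffle followed by a slice at `int(n * (1 - test_split))`. The sketch below reproduces only that arithmetic with NumPy. The helper name `shuffle_and_split` is hypothetical, and the total of 506 rows is inferred from 404 + 102. It is an illustration of the idea, not the Keras implementation itself.

```python
import numpy as np

def shuffle_and_split(n_samples, test_split=0.2, seed=113):
    """Return (train_idx, test_idx) from a seeded shuffle followed by a slice."""
    rng = np.random.RandomState(seed)
    indices = np.arange(n_samples)
    rng.shuffle(indices)
    # int() truncates, so any fractional row goes to the test split
    n_train = int(n_samples * (1 - test_split))
    return indices[:n_train], indices[n_train:]

# 506 rows in total is an assumption inferred from 404 + 102
train_idx, test_idx = shuffle_and_split(506)
print(len(train_idx), len(test_idx))
```

Because the seed is fixed, rerunning the cell yields the same partition, which keeps later comparisons between runs meaningful.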

In [10]:
# Combine the features and the target into one DataFrame per split for analysis
x = pd.DataFrame(x_train,columns=COLUMNS[:-1])
x2 = pd.DataFrame(x_test,columns=COLUMNS[:-1])
y = pd.DataFrame(y_train,columns=COLUMNS[-1:])
y2 = pd.DataFrame(y_test,columns=COLUMNS[-1:])
x['MEDV'] = y
x2['MEDV'] = y2
# Getting the basic statistical info from DataFrame
x.describe()

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 | 404.0 |
| mean | 3.745111 | 11.480198 | 11.104431 | 0.061881 | 0.557356 | 6.267082 | 69.010644 | 3.740271 | 9.440594 | 405.898515 | 18.47599 | 354.783168 | 12.740817 | 22.39505 |
| std | 9.240734 | 23.767711 | 6.811308 | 0.241238 | 0.117293 | 0.709788 | 27.940665 | 2.030215 | 8.69836 | 166.374543 | 2.200382 | 94.111148 | 7.254545 | 9.210442 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 188.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.081437 | 0.0 | 5.13 | 0.0 | 0.453 | 5.87475 | 45.475 | 2.0771 | 4.0 | 279.0 | 17.225 | 374.6725 | 6.89 | 16.675 |
| 50% | 0.26888 | 0.0 | 9.69 | 0.0 | 0.538 | 6.1985 | 78.5 | 3.1423 | 5.0 | 330.0 | 19.1 | 391.25 | 11.395 | 20.75 |
| 75% | 3.674808 | 12.5 | 18.1 | 0.0 | 0.631 | 6.609 | 94.1 | 5.118 | 24.0 | 666.0 | 20.2 | 396.1575 | 17.0925 | 24.8 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.725 | 100.0 | 10.7103 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 | 50.0 |


## Checking Train/Test Split

SweetViz is used here to check that the train and test datasets are statistically similar, so that differences between the two splits do not bias the training and evaluation of the model.

In [12]:
import sweetviz as sv

compare_report = sv.compare([x,'Train'],[x2,'Test'],target_feat='MEDV')


In [13]:
compare_report.show_html(filepath='.\\Analysis\\Compare_report.html')
compare_report.show_notebook()

Report .\Analysis\Compare_report.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
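
As a lightweight numeric complement to the visual report, the two splits can also be compared column by column with pandas. The sketch below is one possible check, not part of SweetViz. The helper `compare_splits` and the `std_mean_diff` column are hypothetical names, and the random stand-in frames exist only so the snippet runs on its own. In the notebook, `x` and `x2` would be passed instead.

```python
import numpy as np
import pandas as pd

def compare_splits(train, test):
    """Per-column mean/std for both splits plus a standardized mean difference."""
    summary = pd.DataFrame({
        "train_mean": train.mean(),
        "test_mean": test.mean(),
        "train_std": train.std(),
        "test_std": test.std(),
    })
    # Gap between the means, in units of the training standard deviation
    summary["std_mean_diff"] = (
        summary["test_mean"] - summary["train_mean"]
    ) / summary["train_std"]
    return summary

# Stand-in data (assumption): random frames with the same row counts as the splits
rng = np.random.default_rng(0)
demo_train = pd.DataFrame(rng.normal(size=(404, 2)), columns=["A", "B"])
demo_test = pd.DataFrame(rng.normal(size=(102, 2)), columns=["A", "B"])

report = compare_splits(demo_train, demo_test)
print(report.round(3))
```

Columns whose `std_mean_diff` is far from zero are the ones worth inspecting more closely in the HTML report. What counts as "far" is a judgment call that depends on the sample size.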
