<a href="https://colab.research.google.com/github/rcarrata/deeplearning_tf_examples/blob/master/1_Neural_Network_StepByStep.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# THEORY 1 - WARMING UP ####

**Classification**: given data and true labels or categories for each data point, train a model that predicts for each data example what its label should be.

**Regression**: given data and true continuous value for each data point, train a model that can predict values for each data example.


In [None]:
### Exercise 0 - LOADING DATASET

import pandas as pd
from google.colab import drive
drive.mount('/content/drive')

!ls "/content/drive/My Drive/Colab/Insurance/insurance.csv"

root_folder = "/content/drive/My Drive/Colab/"
project_folder = "Insurance/"
csv_file = "insurance.csv"

csv_data = root_folder + project_folder + csv_file
print(csv_data)

dataset = pd.read_csv(csv_data)

from google.colab.data_table import DataTable
DataTable.max_columns = 60

dataset.head()

Mounted at /content/drive
'/content/drive/My Drive/Colab/Insurance/insurance.csv'
/content/drive/My Drive/Colab/Insurance/insurance.csv


Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


In [None]:
#### EXERCISE 1 - WARMING UP ####

## Selecting columns by position with iloc (all rows, a slice of columns).
# Choose the first 6 columns as features
# DataFrame slicing using iloc
features = dataset.iloc[:,0:6]

# Choose the final column for prediction 
# We select the last column with -1 
labels = dataset.iloc[:,-1] 

# features.shape[]
# The pandas shape property tells us the shape of our data 
# — a vector of two values: the number of samples and the number of features.

# Print the number of features in the dataset
print("Number of features: ", features.shape[1])
# Print the number of samples in the dataset
print("Number of samples: ", features.shape[0])

# See useful summary statistics for numeric features
print("\n")
print("## Describe features")
print(features.describe())

# features.describe() -> Descriptive statistics include those that summarize 
# the central tendency, dispersion and shape of a dataset's distribution

# Print the number of samples of the labels Series.
print(labels.shape[0])



Number of features:  6
Number of samples:  1338


## Describe features
               age          bmi     children
count  1338.000000  1338.000000  1338.000000
mean     39.207025    30.663397     1.094918
std      14.049960     6.098187     1.205493
min      18.000000    15.960000     0.000000
25%      27.000000    26.296250     0.000000
50%      39.000000    30.400000     1.000000
75%      51.000000    34.693750     2.000000
max      64.000000    53.130000     5.000000
1338


# THEORY 2 - DATA PREPROCESSING ####

### **Data preprocessing**: one-hot encoding and standardization

* **One-hot encoding** of categorical features:

 *Since neural networks cannot work with string data directly, we need to convert our categorical features (“sex”, “smoker”, “region”) into numerical ones*. 

 One-hot encoding **creates a binary column for each category**.

```python
features  = pd.get_dummies(features)
```

Example: https://interactivechaos.com/es/manual/tutorial-de-machine-learning/la-funcion-getdummies
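
As a minimal sketch of what `pd.get_dummies` produces, the snippet below encodes a toy DataFrame. The values are hypothetical and only mirror two columns of the insurance data; the second call shows the optional `drop_first` argument.

```python
import pandas as pd

# Toy data with hypothetical values, mirroring two columns of the insurance dataset.
toy = pd.DataFrame({
    "age": [19, 18, 28],
    "region": ["southwest", "southeast", "southeast"],
})

# Numeric columns pass through unchanged; each category of a string column
# becomes its own indicator column named <column>_<category>.
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())

# drop_first=True keeps one indicator column fewer per encoded feature.
encoded_reduced = pd.get_dummies(toy, drop_first=True)
print(encoded_reduced.columns.tolist())
```

With two distinct regions, the first call yields two indicator columns and the second only one.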

### Split data into train and test sets:

```python
from sklearn.model_selection import train_test_split
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42)
```
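
To get a feel for what `test_size=0.33` means for this dataset, the following sketch works out the split sizes by hand. It assumes the test count is rounded up and the remaining rows go to training, which is consistent with the 896 training rows reported in the exercise output.

```python
import math

n_samples = 1338   # rows in the insurance dataset
test_size = 0.33   # fraction passed to train_test_split

# Assumption: the test count is rounded up, the rest is used for training.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

print("train:", n_train, "test:", n_test)
```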

### Standardize/normalize numerical features:

* A common **preprocessing** step for numerical variables is standardization, which rescales each feature to zero mean and unit variance.

* **Normalization** is another way of preprocessing numerical data: it scales the numerical features to a fixed range - usually between 0 and 1. 
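
The two rescaling schemes described above can be sketched with plain NumPy. This is only an illustration on five ages taken from the dataset preview, not the scikit-learn implementation.

```python
import numpy as np

# First five ages from the dataset preview.
ages = np.array([19.0, 18.0, 28.0, 33.0, 32.0])

# Standardization: subtract the mean, divide by the standard deviation.
standardized = (ages - ages.mean()) / ages.std()

# Min-max normalization: map the smallest value to 0 and the largest to 1.
normalized = (ages - ages.min()) / (ages.max() - ages.min())

print("standardized mean/std:", standardized.mean(), standardized.std())
print("normalized min/max:", normalized.min(), normalized.max())
```

Note that `ndarray.std()` divides by n, while pandas' `describe()` reports the sample standard deviation (dividing by n - 1), so a standardized column can show a std slightly above 1 in a pandas summary.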

To normalize the numerical features we use an exciting addition to scikit-learn, ColumnTransformer, in the following way:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Normalizer
from sklearn.compose import ColumnTransformer
  
ct = ColumnTransformer([('normalize', Normalizer(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train_norm = ct.fit_transform(features_train)
features_test_norm = ct.transform(features_test)
```

The name of the transformer step is “normalize”; it applies a Normalizer() to the ‘age’, ‘bmi’, and ‘children’ columns, and the rest of the columns just pass through. Note that scikit-learn's Normalizer() rescales each sample (row) to unit norm; to scale each feature to a range such as 0 to 1, MinMaxScaler() is the usual choice. ColumnTransformer() returns NumPy arrays and we convert them back to a pandas DataFrame so we can see some useful summaries of the scaled data.

* To convert a NumPy array back into a pandas DataFrame, we can do:

```python
features_train_norm = pd.DataFrame(features_train_norm, columns = features_train.columns)
```
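
As a side note on the `Normalizer` used above: with its default settings it works row by row, dividing each sample by its own Euclidean (L2) length. The NumPy sketch below reproduces that computation on two hypothetical rows of (age, bmi, children); it is an illustration, not the scikit-learn code.

```python
import numpy as np

# Two hypothetical rows of (age, bmi, children).
rows = np.array([[19.0, 27.9, 0.0],
                 [28.0, 33.0, 3.0]])

# Row-wise L2 normalization: divide each row by its own Euclidean length.
row_norms = np.linalg.norm(rows, axis=1, keepdims=True)
unit_rows = rows / row_norms

# Every row now has length 1; the columns are not mapped to a fixed range.
print(np.linalg.norm(unit_rows, axis=1))
```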

In [None]:
#### EXERCISE 2 - DATA PRE-PROCESSING ####
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Normalizer

## Load the dataset
# dataset = pd.read_csv('insurance.csv') # Dataset loaded in early step

# Choose first 6 columns as features
features = dataset.iloc[:,0:6] 

# Choose the final column for prediction
labels = dataset.iloc[:,-1] 

# One-hot encoding for categorical variables
# Convert categorical variable into dummy/indicator variables.
# The get_dummies function can also drop the first of the columns
# generated for each encoded feature (drop_first=True) to avoid so-called
# collinearity (one feature being a linear combination of the others),
# which makes it difficult for some algorithms to work correctly.
# We keep all columns here, so drop_first is left at its default.
features = pd.get_dummies(features) 

# Split the data into training and test data
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
 
# Normalize the numeric columns using ColumnTransformer
ct = ColumnTransformer([('normalize', Normalizer(), ['age', 'bmi', 'children'])], remainder='passthrough')

# Fit the normalizer to the training data and transform it (returns a NumPy array)
features_train_norm = ct.fit_transform(features_train) 

# Apply the fitted normalizer to the test data (returns a NumPy array)
features_test_norm = ct.transform(features_test) 

# ColumnTransformer returns numpy arrays. Convert the features to dataframes
features_train_norm = pd.DataFrame(features_train_norm, columns = features_train.columns)
features_test_norm = pd.DataFrame(features_test_norm, columns = features_test.columns)

my_ct = ColumnTransformer([('scale', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')

# Use the .fit_transform() method of my_ct to fit the column transformer to the 
# features_train DataFrame and at the same time transform it. Assign the result 
# to a variable called features_train_scale.
features_train_scale = my_ct.fit_transform(features_train)

# Use the .transform() method to apply the trained column transformer my_ct 
# to the features_test DataFrame. Assign the result to a variable called features_test_scale.
features_test_scale = my_ct.transform(features_test)

# Transform the features_train_scale NumPy array back to a DataFrame using 
# pd.DataFrame() and assign the result back to a variable called features_train_scale. 
# For the columns attribute use the .columns property of features_train.
features_train_scale = pd.DataFrame(features_train_scale, columns = features_train.columns)

# Transform the features_test_scale NumPy array back to DataFrame using 
# pd.DataFrame() and assign the result back to a variable called features_test_scale. 
# For the columns attribute use the .columns property of features_test.
features_test_scale = pd.DataFrame(features_test_scale, columns = features_test.columns)

# Print the statistics summary of the resulting train and test DataFrames, 
# features_train_scale and features_test_scale.
# Observe the statistics of the numeric columns (mean, variance).

print("## features_train_scale\n")
print(features_train_scale.describe())

print("\n")
print("## features_test_scale")
print(features_test_scale.describe())

## features_train_scale

                age           bmi      children  sex_female    sex_male  \
count  8.960000e+02  8.960000e+02  8.960000e+02  896.000000  896.000000   
mean   9.417070e-18  6.835275e-16 -1.069333e-16    0.487723    0.512277   
std    1.000559e+00  1.000559e+00  1.000559e+00    0.500128    0.500128   
min   -1.494934e+00 -2.438281e+00 -9.126072e-01    0.000000    0.000000   
25%   -8.613199e-01 -7.139833e-01 -9.126072e-01    0.000000    0.000000   
50%   -1.650038e-02 -5.227104e-02 -8.245892e-02    0.000000    1.000000   
75%    8.987207e-01  6.598116e-01  7.476894e-01    1.000000    1.000000   
max    1.743540e+00  3.776715e+00  3.238134e+00    1.000000    1.000000   

        smoker_no  smoker_yes  region_northeast  region_northwest  \
count  896.000000  896.000000        896.000000        896.000000   
mean     0.790179    0.209821          0.256696          0.252232   
std      0.407408    0.407408          0.437054          0.434536   
min      0.000000    0.

# THEORY 3 - NEURAL NETWORK MODEL BASICS ####

## Neural network model: tf.keras.Sequential

Now that we have our data preprocessed we can start building the neural network model. The most frequently used model in TensorFlow is Keras Sequential.

* A **sequential model**, as the name suggests, **allows us to create models 
layer-by-layer in a step-by-step fashion**. This model can have only one input tensor and only one output tensor.

* To design a sequential model, we first need to import Sequential from keras.models:

```python
from tensorflow.keras.models import Sequential
```

* To improve readability, we will design the model in a separate Python function called design_model(). The following command initializes a Sequential model instance my_model:

```python
my_model = Sequential(name="my first model")
```

NOTE: name is an optional argument to any model in Keras.

* Finally, we invoke our function in the main program with:

```python
my_model = design_model(features_train)
```

* The model’s layers are accessed via the layers attribute:

```python
print(my_model.layers)
```

As expected, the list of layers is empty. In the next exercise, we will start adding layers to our model.


In [None]:
#### EXERCISE 3 - NEURAL NETWORK MODEL BASICS ####
#### Neural network model: tf.keras.Sequential Exercise
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

# initialize an instance of Sequential() and assign it to a variable called model
def design_model(features):
  model = Sequential(name="my first model")
  return model
  
# dataset = pd.read_csv('insurance.csv') #load the dataset
features = dataset.iloc[:,0:6] #choose first 6 columns as features
labels = dataset.iloc[:,-1] #choose the final column for prediction

features = pd.get_dummies(features) #one-hot encoding for categorical variables

# split the data into training and test data
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
 
# standardize
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')

# ct.fit_transform => Fit all transformers, transform the data and concatenate results.
# ct.transform => Transform X separately by each transformer, concatenate results.
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)

# Invoke the function for our model design
model = design_model(features_train)

# In the main program, using the layers attribute, print the layers of the model instance model.
print(model.layers)

# As expected, the list of layers is empty. In the next exercise, we will start adding layers to our model.

[]


In [None]:
#### THEORY 4 - NEURAL NETWORK MODEL - LAYERS ####

###### Neural network model: layers - 
# Layers are the building blocks of neural networks and can contain 1 or more 
# neurons. 

#### !!! Each layer is associated with parameters: weights and bias !!!
# These are tuned during learning. A fully-connected layer, in which all neurons
# connect to all neurons in the next layer, is created in Keras as shown below.

# PARAMETERS: weights and bias
import tensorflow as tf
from tensorflow.keras import layers

# we chose 3 neurons here
layer = layers.Dense(3)

# Pay attention to the dimensions of the weight and bias parameter arrays. 
# Since we chose to create a layer with three neurons, the number of outputs of 
# this layer is 3. Hence, the bias parameter is a vector of shape (3,).
# The layer has not seen any input yet, so this prints an empty list.
print(layer.weights)

# 1338 samples, 11 features as in our dataset
input = tf.ones((1338, 11))
# tf.ones => Creates a tensor with all elements set to one (1).

# a fully-connected layer with 3 neurons
layer = layers.Dense(3) 

# calculate the outputs
output = layer(input) 

# print the weights
print(layer.weights)

# we get that the weight matrix has shape = (11, 3) and the bias matrix has 
# shape=(3,). Compare these weights with the diagram above to make sure you 
# can associate the resulting shapes to it.

# Fortunately, we don’t have to worry about this. 
# TensorFlow will determine the shapes of the weight matrix and bias matrix 
# automatically the moment it encounters the first input.



[]
[<tf.Variable 'dense_1/kernel:0' shape=(11, 3) dtype=float32, numpy=
array([[ 0.32592195,  0.31091326,  0.44234145],
       [-0.5610312 , -0.29185963,  0.42624545],
       [ 0.45342636,  0.6380725 , -0.07350582],
       [-0.31981993, -0.02801275,  0.62853265],
       [ 0.15368855, -0.24774379, -0.14063168],
       [-0.5377222 , -0.10184151, -0.42027795],
       [-0.34179795,  0.4084469 ,  0.2444675 ],
       [-0.45660064, -0.5320892 , -0.12887317],
       [-0.61708075, -0.31729406, -0.48610944],
       [-0.5544605 , -0.21437901, -0.33785877],
       [ 0.5529678 , -0.26492321, -0.22141564]], dtype=float32)>, <tf.Variable 'dense_1/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
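
The shapes printed above follow directly from the matrix product a dense layer computes. A minimal NumPy sketch, assuming no activation function and randomly drawn weights in place of Keras' own initializer:

```python
import numpy as np

rng = np.random.default_rng(0)

inputs = np.ones((1338, 11))        # 1338 samples, 11 features
kernel = rng.normal(size=(11, 3))   # one weight per (feature, neuron) pair
bias = np.zeros(3)                  # one bias per neuron

# Dense layer without activation: matrix product plus bias, broadcast over rows.
outputs = inputs @ kernel + bias

print(kernel.shape, bias.shape, outputs.shape)
```

The number of samples only affects the output shape; the kernel and bias shapes depend solely on the number of features and the number of neurons.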


In [None]:
#### EXERCISE 4 - NEURAL NETWORK MODEL - LAYERS ####

import tensorflow as tf
from tensorflow.keras import layers

# 3 is the number we chose
layer = layers.Dense(3) 

# we get empty weight and bias arrays because tensorflow 
# doesn't know what the shape is of the input to this layer
print(layer.weights) 

# Change the number of samples in the input tensor from 1338 to 5000.
#### tf.ones => Creates a tensor with all elements set to one (1).
###### 5000 samples, 21 features
input = tf.ones((5000, 21))

# a fully-connected layer with 10 neurons

#### Dense => implements a regular densely-connected NN layer.
## Dense implements the operation:
## output = activation(dot(input, kernel) + bias)
## where activation is the element-wise activation function
## passed as the activation argument, kernel is a weights matrix
## created by the layer, and bias is a bias vector created by the layer
## (only applicable if use_bias is True). These are all attributes of Dense.
layer = layers.Dense(10) 

# calculate the outputs
output = layer(input)

# print the weights
print(layer.weights)

## we get that the weight matrix has shape = (21, 10) and the bias vector has shape=(10,).
## Compare these weights with the diagram above to make sure you can associate 
## the resulting shapes to it.

## Fortunately, we don’t have to worry about this. TensorFlow will 
## determine the shapes of the weight matrix and bias matrix automatically 
## the moment it encounters the first input.

[]
[<tf.Variable 'dense_3/kernel:0' shape=(21, 10) dtype=float32, numpy=
array([[-0.2790981 ,  0.12694335, -0.42396972, -0.25296703, -0.22709335,
        -0.43407306, -0.00993699, -0.36362052,  0.05261731, -0.33542633],
       [ 0.3547274 , -0.19319305, -0.25650918,  0.06288075, -0.0245699 ,
         0.04370227,  0.10043794, -0.43751943, -0.3696518 ,  0.02677017],
       [-0.2283514 ,  0.24852931,  0.32857513,  0.3089441 ,  0.32831532,
        -0.3924433 ,  0.09203118, -0.01002866,  0.06180149, -0.18082145],
       [ 0.36311412, -0.06443456,  0.15578699, -0.24180126,  0.3103785 ,
        -0.27212167, -0.34078792,  0.18396866, -0.34347448, -0.34802   ],
       [ 0.08576816, -0.3811952 , -0.06821838,  0.11085957, -0.21277793,
         0.1770981 ,  0.34906322, -0.34313777, -0.13162178,  0.07922721],
       [ 0.4193967 ,  0.11826366,  0.43793678, -0.33148384,  0.3668936 ,
        -0.17717412, -0.39358744, -0.4258464 , -0.05565211, -0.38142458],
       [ 0.34537584,  0.260648  , -0.15222472

# THEORY 5 - Neural network model - Input Layer ####

## Neural network model: input layer Example

* Inputs to a neural network are usually not considered the actual transformative layers. They are merely placeholders for data. 

* In Keras, an input for a neural network can be specified with a tf.keras.layers.InputLayer object. 

* The following code initializes an input layer for a DataFrame my_data that has 15 columns:

```python
from tensorflow.keras.layers import InputLayer
my_input = InputLayer(input_shape=(15,))
```

* **IMPORTANT**: Notice that the input_shape parameter must have its first dimension equal to the number of features in the data. You don’t need to specify the number of samples or batch size: Keras leaves that dimension open.

* The following code avoids hard-coding by using the .shape property of the my_data DataFrame:

```python
#get the number of features/dimensions in the data
num_features = my_data.shape[1] 

# without hard-coding
my_input = tf.keras.layers.InputLayer(input_shape=(num_features,)) 
```

* The following code adds this input layer to a model instance my_model:

```python
my_model.add(my_input)
```

* The following code prints a useful summary of a model instance my_model:

```python
print(my_model.summary())
```

As you can see, the summary shows that the total number of parameters is 0. 
This shows you that the input layer has no trainable parameters and is just a placeholder for data.

In [None]:
#### EXERCISE 5 - Neural network model - Input Layer ####

##### Neural network model: input layer Example

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers


def design_model(features):
  model = Sequential(name = "my_first_model")
  #your code here
  # In the design_model() function, create a variable called num_features and 
  # assign it the number of columns in the features DataFrame using the .shape property

  ## get the number of features/dimensions in the data
  num_features = features.shape[1]

  # In the design_model() function:
  # create a variable called input. assign input an instance of InputLayer. 
  # set the first dimension of the input_shape parameter equal to num_features

  ## Notice that the input_shape parameter has to have its first dimension equal
  ## to the number of features in the data. You don’t need to specify the second
  #  dimension: the number of samples or batch size.
  input = layers.InputLayer(input_shape=(num_features,))

  # The following code adds this input layer to a model instance my_model:
  model.add(input)

  return model


# dataset = pd.read_csv('insurance.csv') #load the dataset
features = dataset.iloc[:,0:6] #choose first 6 columns as features
labels = dataset.iloc[:,-1] #choose the final column for prediction

# one-hot encoding for categorical variables
features = pd.get_dummies(features)

# split the data into training and test data
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 

# standardize
## ColumnTransformer Applies transformers to columns of an array or pandas DataFrame.
## This estimator allows different columns or column subsets of the input
## to be transformed separately and the features generated by each transformer
## will be concatenated to form a single feature space.
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)

# Invoke the function for our model design
model = design_model(features_train)

## Use the .summary() method to print the summary of the model instance model.
print(model.summary())

# As you can see, the summary shows that the total number of parameters is 0. 
# This shows you that the input layer has no trainable parameters and is 
# just a placeholder for data.

Model: "my_first_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
None


# THEORY 6 - Neural network model: output layer ####

## Neural network model: output layer

* The **output layer** shape depends on your task. In the case of regression, we need one output for each sample. For example, if your data has 100 samples, you would expect your output to be a vector with 100 entries - a numerical prediction for each sample.

* In our case, we are doing regression and wish to predict one number for each data point: the medical cost billed by health insurance indicated in the charges column in our data. Hence, our output layer has only one neuron.

* The following command adds a layer with one neuron to a model instance my_model:

```python
from tensorflow.keras.layers import Dense
my_model.add(Dense(1))
```

* Notice that you don’t need to specify the input shape of this layer since TensorFlow with Keras can automatically infer its shape from the previous layer.


In [None]:
#### EXERCISE 6 - Neural network model: output layer ####
##### Neural network model: output layer Example

# create and add an output layer to the model instance model as an instance of Dense with one neuron
from tensorflow.keras.layers import Dense
model.add(Dense(1))

# THEORY 7 - Neural network model: hidden layers ####

So far we have added one input layer and one output layer to our model. If you think about it, our model currently represents a linear regression. 

\

To capture more complex or non-linear interactions between the inputs and outputs, neural networks need hidden layers. The following command adds a hidden layer with 64 neurons to a model instance my_model:

```python
from tensorflow.keras.layers import Dense
my_model.add(Dense(64, activation='relu'))
```

We chose 64 (2^6) as the number of neurons. Powers of two are a common 
convention for layer sizes; they can map well onto hardware, but any efficiency gain depends on the setup.

\

### ACTIVATION FUNCTION

With the activation parameter, we specify which activation function to apply to the output of our hidden layer. There are a number of activation functions, such as softmax and sigmoid, but ReLU (relu, Rectified Linear Unit) is very effective in many applications and we’ll use it here.
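
For reference, the three activation functions named above can be written in a few lines of NumPy. These are the standard textbook definitions, shown as a sketch rather than the Keras implementations.

```python
import numpy as np

def relu(x):
    # Keep positive values, replace negative values with 0.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squash each value into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Turn a vector into positive values that sum to 1.
    # Subtracting the maximum first avoids overflow in exp.
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()

z = np.array([-2.0, 0.0, 3.0])
print("relu:   ", relu(z))
print("sigmoid:", sigmoid(z))
print("softmax:", softmax(z))
```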

\

Adding more layers to a neural network naturally increases the number of parameters to be tuned: every layer has its own weight matrix and bias vector.

\

The following diagram shows the size of the parameter arrays for each 
layer. In our case, the 1st layer’s weight matrix (red) has shape (11, 64) 
because we feed 11 features to 64 hidden neurons. The output layer (purple) 
has a weight matrix of shape (64, 1) because we have 64 input units and 1 neuron in the final layer.
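
The parameter count of a dense layer follows from these shapes: one weight per (input, neuron) pair plus one bias per neuron. The helper `dense_param_count` below is a hypothetical name used only for this sketch.

```python
def dense_param_count(n_inputs, n_units):
    # Weight matrix of shape (n_inputs, n_units) plus a bias vector of length n_units.
    return n_inputs * n_units + n_units

n_features = 11

# Architecture from the text: 64 hidden neurons, 1 output neuron.
hidden_64 = dense_param_count(n_features, 64)
output_64 = dense_param_count(64, 1)
print("64 hidden units: ", hidden_64, "+", output_64, "=", hidden_64 + output_64)

# Architecture used in the exercise: 128 hidden neurons, 1 output neuron.
hidden_128 = dense_param_count(n_features, 128)
output_128 = dense_param_count(128, 1)
print("128 hidden units:", hidden_128, "+", output_128, "=", hidden_128 + output_128)
```

For 128 hidden units this gives 1536 and 129 parameters, the same numbers that `model.summary()` reports in the exercise.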

In [None]:
#### EXERCISE 7 - Neural network model: hidden layers ####

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer
from tensorflow.keras.layers import Dense


def design_model(features):
  model = Sequential(name = "my_first_model")

  # input_shape parameter has to have its first dimension equal
  # to the number of features in the data.
  input = InputLayer(input_shape=(features.shape[1],))

  #add the input layer
  model.add(input)

  # add a new hidden layer to the model instance model with the following parameters:
  # 128 hidden units and a relu activation function
  model.add(Dense(128, activation='relu'))

  #adding an output layer to our model
  model.add(Dense(1)) 
  return model

#dataset = pd.read_csv('insurance.csv') #load the dataset
features = dataset.iloc[:,0:6] #choose first 6 columns as features
labels = dataset.iloc[:,-1] #choose the final column for prediction

features = pd.get_dummies(features) #one-hot encoding for categorical variables

#split the data into training and test data
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) 
 
#standardize
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)

#invoke the function for our model design
model = design_model(features_train)

#print the model summary here
print(model.summary())

Model: "my_first_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_5 (Dense)             (None, 128)               1536      
                                                                 
 dense_6 (Dense)             (None, 1)                 129       
                                                                 
Total params: 1,665
Trainable params: 1,665
Non-trainable params: 0
_________________________________________________________________
None


# THEORY 8 - OPTIMIZERS ####

As we mentioned, our goal is for the network to effectively adjust its weights
or parameters in order to reach the best performance. Keras offers a variety of optimizers such as SGD (Stochastic Gradient Descent optimizer), Adam, RMSprop, and others.

\

We’ll start by introducing the Adam optimizer:

```python
from tensorflow.keras.optimizers import Adam
opt = Adam(learning_rate=0.01)
```

The **learning rate determines how large the steps are that the optimizer takes in the 
parameter space** (weights and bias), and it is a hyperparameter that can also be tuned. While model parameters are the ones that the model uses to make predictions, hyperparameters determine the learning process (learning rate, number of iterations, optimizer type).

\

If the learning rate is set too high, the optimizer will make large jumps and 
possibly miss the solution. On the other hand, if it is set too low, the learning 
process is too slow and might not converge to a desirable solution within the 
allotted time. Here we’ll use a value of 0.01, which is often used.
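
The effect of the learning rate can be illustrated on a toy problem. The sketch below uses plain gradient descent (not Adam) to minimize f(w) = w², whose minimum is at w = 0; the function name, starting point, and the three learning rates are arbitrary choices for illustration.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w              # derivative of w**2
        w -= learning_rate * gradient   # step against the gradient
    return w

w_too_high = gradient_descent(1.1)      # overshoots further on every step
w_reasonable = gradient_descent(0.1)    # ends close to the minimum at 0
w_too_low = gradient_descent(0.001)     # has barely moved after 50 steps

print("too high:  ", w_too_high)
print("reasonable:", w_reasonable)
print("too low:   ", w_too_low)
```

Which values count as too high or too low depends on the problem; the thresholds here apply only to this toy function.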

\

Once the optimizer algorithm is chosen, a model instance my_model is compiled 
with the following code:

```python
my_model.compile(loss='mse',  metrics=['mae'], optimizer=opt)
```

**loss** denotes the measure of learning success and the lower the loss the better the performance. In the case of regression, the most often used loss function is the **Mean Squared Error** mse (the average squared difference between the estimated values and the actual value).

\

Additionally, we want to observe the progress of the **Mean Absolute Error** (mae) while training the model because MAE can give us a better idea than mse on how far off we are from the true values in the units we are predicting. 

In our case, we are predicting charge in dollars and MAE will tell us how many dollars we’re off, on average, from the actual values as the network is being trained.
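
To make the difference between the two measures concrete, here is a small NumPy sketch with three hypothetical charges and predictions (the numbers are invented for illustration).

```python
import numpy as np

# Hypothetical true charges and predictions, in dollars.
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

errors = y_pred - y_true

mse = np.mean(errors ** 2)     # average squared error, in squared dollars
mae = np.mean(np.abs(errors))  # average absolute error, in dollars

print("MSE:", mse)
print("MAE:", mae)
```

The single 30-dollar miss dominates the MSE because errors are squared, while the MAE stays in the same units as the charges.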

In [None]:
#### EXERCISE 8 - OPTIMIZERS ####

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam


def design_model(features):
  model = Sequential(name = "my_first_model")
  input = InputLayer(input_shape=(features.shape[1],))
   #add an input layer
  model.add(input)
  #add a hidden layer with 128 neurons
  model.add(Dense(128, activation='relu')) 
  #add an output layer
  model.add(Dense(1)) 
  #your code here
  opt = Adam(learning_rate=0.01)
  model.compile(loss='mse', metrics=['mae'], optimizer=opt)
  return model


#dataset = pd.read_csv('insurance.csv') #load the dataset
features = dataset.iloc[:,0:6] #choose first 6 columns as features
labels = dataset.iloc[:,-1] #choose the final column for prediction

features = pd.get_dummies(features) #one-hot encoding for categorical variables
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) #split the data into training and test data
 
#standardize
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)

#invoke the function for our model design
model = design_model(features_train)
print(model.summary())


Model: "my_first_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_7 (Dense)             (None, 128)               1536      
                                                                 
 dense_8 (Dense)             (None, 1)                 129       
                                                                 
Total params: 1,665
Trainable params: 1,665
Non-trainable params: 0
_________________________________________________________________
None


# THEORY 9 - Training and evaluating the model ####

Now that we have built the model, we are ready to train it using the training data.

\

The following command trains a model instance my_model using training data my_data and training labels my_labels :

```python
my_model.fit(my_data, my_labels, epochs=50, batch_size=3, verbose=1)
```

model.fit() takes the following parameters:
* my_data is the training data set.
* my_labels are true labels for the training data points.
* epochs refers to the number of cycles through the full training dataset. Since training of neural networks is an iterative process, you need multiple passes through data. Here we chose 50 epochs, but how do you pick a number of epochs? Well, it is hard to give one answer since it depends on your dataset. Amongst others, this is a hyperparameter that can be tuned — which we’ll cover later.
* batch_size is the number of data points to work through before updating the model parameters. It is also a hyperparameter that can be tuned.
* verbose = 1 will show you the progress bar of the training.
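
How epochs and batch_size interact can be sketched with simple arithmetic. The snippet assumes the 896 training rows from our split and that a final, smaller batch is still used for an update; `updates_per_epoch` is a hypothetical helper name.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # Assumption: a final partial batch still triggers a parameter update.
    return math.ceil(n_samples / batch_size)

n_train = 896  # training rows after the 67/33 split

# Settings from the text: 50 epochs, batch size 3.
total_text = updates_per_epoch(n_train, 3) * 50

# Settings from the exercise: 40 epochs, batch size 1.
total_exercise = updates_per_epoch(n_train, 1) * 40

print("updates (epochs=50, batch_size=3):", total_text)
print("updates (epochs=40, batch_size=1):", total_exercise)
```

Smaller batches mean more parameter updates per epoch, which usually makes each epoch slower but the updates more frequent.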


When the training is finalized, we use the trained model to predict values for samples that the training procedure hasn’t seen: the test set.

\

The following command evaluates the model instance my_model using the test data my_data and test labels my_labels:

```python
val_mse, val_mae = my_model.evaluate(my_data, my_labels, verbose = 0)
```

So what is the final result? We should get an MAE of ~$3884.21. This means that on average our predictions are off by around 3800 dollars. Is that a good result or a bad result?

\

Often you need an expert or domain knowledge to decide this. What is an acceptable error for the application? Is $3800 a big error when deciding on insurance charges? Can you do better and how? As you see, the process doesn’t stop here.

In [None]:
#### EXERCISE 9 - Training and evaluating the model ####

import pandas as pd
import tensorflow
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

tensorflow.random.set_seed(35) #for the reproducibility of results

def design_model(features):
  model = Sequential(name = "my_first_model")
  #without hard-coding
  input = InputLayer(input_shape=(features.shape[1],)) 
  #add the input layer
  model.add(input) 
  #add a hidden layer with 128 neurons
  model.add(Dense(128, activation='relu')) 
  #add an output layer to our model
  model.add(Dense(1)) 
  opt = Adam(learning_rate=0.1)
  model.compile(loss='mse',  metrics=['mae'], optimizer=opt)
  return model

#dataset = pd.read_csv('insurance.csv') #load the dataset
features = dataset.iloc[:,0:6] #choose first 6 columns as features
labels = dataset.iloc[:,-1] #choose the final column for prediction

features = pd.get_dummies(features) #one-hot encoding for categorical variables
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42) #split the data into training and test data
 
# standardize
ct = ColumnTransformer([('standardize', StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)

# invoke the function for our model design
model = design_model(features_train)
print(model.summary())

# fit the model using 40 epochs and batch size 1
model.fit(features_train, labels_train, epochs=40, batch_size=1, verbose=0)

# evaluate the model on the test data
val_mse, val_mae = model.evaluate(features_test, labels_test, verbose=1)

print("MAE: ", val_mae)

Model: "my_first_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 128)               1536      
                                                                 
 dense_10 (Dense)            (None, 1)                 129       
                                                                 
Total params: 1,665
Trainable params: 1,665
Non-trainable params: 0
_________________________________________________________________
None
MAE:  2490.15869140625
