In this episode, we'll demonstrate how to process numerical data that we'll later use to train our very first artificial neural network. 


## Samples and Labels

To train any neural network in a supervised learning task, we first need a data set of samples and the corresponding labels for those samples.

When referring to samples, we're just referring to the underlying data set, where each individual item or data point within that set is called a sample. Labels are the corresponding labels for the samples.

**Note that in deep learning, samples are also commonly referred to as input data or inputs, and labels are also commonly referred to as target data or targets.**

###  Expected data format

When preparing data, we first need to understand the format that the data need to be in for the end goal we have in mind. In our case, we want our data to be in a format that we can pass to a neural network model.

The first model we'll build in an upcoming episode will be a **Sequential model** from the Keras API integrated within TensorFlow.

The Sequential model receives data during training, which occurs when we call the ***fit()*** function on the model.

[Documentation of fit() function](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#fit)

In the ***fit()*** function: **x** is the input data and **y** are the labels for that input data in the same format or data structure.

## Process data in code

We'll start out with a very simple classification task using a simple numerical data set.

We first need to import the libraries we'll be working with. 

In [16]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

Next, we create two empty lists. One will hold the **input data**, the other will hold the **target data or labels**. 

In [17]:
train_labels = []
train_samples = []

### Data Creation

For this simple task, we'll be creating our own example data set.

As motivation for this data, let's suppose that an experimental drug was tested on individuals ranging from age 13 to 100 in a clinical trial. The trial had **2100** participants. Half of the participants were under 65 years old, and the other half was 65 years of age or older.

The trial showed that around 95% of patients 65 or older experienced side effects from the drug, and around 95% of patients under 65 experienced no side effects, generally showing that elderly individuals were more likely to experience side effects.

Ultimately, we want to build a model to tell us whether or not a patient will experience side effects solely based on the patient's age. The judgement of the model will be based on the training data.

**Labels:**
- 1: patient did experience side effects
- 0: patient didn´t experience side effects

In [18]:
for i in range(50):
    # The ~5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)

    # The ~5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    # The ~95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The ~95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)

This is what the train_samples data looks like.

In [19]:
for i in train_samples:
    print(i)

38
89
18
78
42
75
19
95
55
82
34
78
50
66
52
93
63
81
22
83
32
77
59
75
34
88
43
97
39
81
38
90
62
70
14
87
20
95
48
84
24
76
40
97
60
87
39
79
27
91
13
76
31
91
26
78
17
66
33
83
35
94
38
99
34
89
49
85
38
94
49
69
33
74
45
67
60
94
32
89
38
93
60
84
60
82
44
69
43
77
45
87
45
88
50
87
31
93
60
85
64
83
18
92
46
82
34
93
51
65
46
78
54
77
56
98
53
94
27
68
17
87
23
76
48
77
29
70
62
85
59
93
56
72
26
99
13
95
20
65
48
95
40
85
63
87
46
96
16
71
14
65
44
81
30
76
16
86
25
79
36
91
55
87
46
65
43
77
14
83
21
96
29
90
29
98
59
95
44
88
17
94
43
84
23
99
50
95
25
72
40
83
27
91
49
97
30
89
40
67
51
100
31
88
56
78
31
86
45
82
33
87
31
92
64
91
13
67
53
95
42
89
62
85
52
83
64
86
26
80
35
88
20
78
14
67
55
65
13
84
19
78
55
67
58
82
63
77
62
100
35
67
17
65
20
70
21
75
24
89
52
77
63
98
26
90
19
100
36
73
46
93
42
72
61
87
25
93
47
77
58
79
39
76
57
79
41
85
24
86
43
98
22
95
47
96
44
84
14
94
59
75
37
67
16
95
37
65
19
98
21
91
59
100
33
65
16
93
19
81
39
94
40
67
17
74
45
70
39
66
49
77


This is what the train_labels look like.

In [20]:
for i in train_labels:
    print(i)

1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1


### Data Processing

We now convert both lists into numpy arrays due to what we discussed the fit() function expects, and we then shuffle the arrays to remove any order that was imposed on the data during the creation process.

In [21]:
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
train_labels, train_samples = shuffle(train_labels, train_samples)

In this form, we now have the ability to pass the data to the model because it is now in the required format, however, before doing that, we'll first scale the data down to a range from 0 to 1.

We'll use **scikit-learn's MinMaxScaler class** to scale all of the data down from a scale ranging from 13 to 100 to be on a scale from 0 to 1.

We reshape the data as a technical requirement just since the **fit_transform()** function doesn't accept 1D data by default. 

In [22]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))

Now that the data has been scaled, let's iterate over the scaled data to see what it looks like now. 

In [23]:
for i in scaled_train_samples:
    print(i)

[0.97701149]
[0.22988506]
[0.85057471]
[0.71264368]
[0.90804598]
[0.98850575]
[0.26436782]
[1.]
[0.59770115]
[0.52873563]
[0.5862069]
[0.90804598]
[1.]
[0.90804598]
[0.37931034]
[0.42528736]
[0.66666667]
[0.11494253]
[0.16091954]
[0.50574713]
[0.93103448]
[0.71264368]
[0.52873563]
[0.03448276]
[0.8045977]
[0.3908046]
[0.31034483]
[0.35632184]
[0.72413793]
[0.70114943]
[0.82758621]
[0.26436782]
[0.72413793]
[0.10344828]
[0.93103448]
[0.05747126]
[0.83908046]
[0.16091954]
[0.10344828]
[0.55172414]
[0.96551724]
[1.]
[0.22988506]
[1.]
[0.06896552]
[0.10344828]
[0.33333333]
[0.16091954]
[0.89655172]
[0.6091954]
[0.81609195]
[0.44827586]
[0.8045977]
[0.03448276]
[0.27586207]
[0.83908046]
[0.33333333]
[1.]
[0.03448276]
[0.56321839]
[0.97701149]
[0.67816092]
[0.45977011]
[0.4137931]
[0.98850575]
[0.32183908]
[0.36781609]
[0.]
[0.09195402]
[0.36781609]
[0.13793103]
[0.09195402]
[0.04597701]
[0.43678161]
[0.]
[0.49425287]
[0.62068966]
[0.89655172]
[0.63218391]
[0.16091954]
[0.56321839]
[0.977011

[0.06896552]
[0.05747126]
[0.68965517]
[0.91954023]
[0.79310345]
[1.]
[0.65517241]
[0.5862069]
[0.42528736]
[0.72413793]
[0.16091954]
[0.03448276]
[0.72413793]
[0.03448276]
[0.7816092]
[0.94252874]
[0.54022989]
[0.74712644]
[0.66666667]
[0.62068966]
[1.]
[0.05747126]
[0.7816092]
[0.35632184]
[1.]
[0.89655172]
[0.67816092]
[0.75862069]
[0.25287356]
[0.87356322]
[0.5862069]
[0.86206897]
[0.95402299]
[0.71264368]
[0.52873563]
[0.04597701]
[0.48275862]
[0.57471264]
[0.47126437]
[0.94252874]
[0.36781609]
[0.37931034]
[0.65517241]
[0.50574713]
[0.03448276]
[0.90804598]
[0.68965517]
[0.67816092]
[0.97701149]
[0.43678161]
[0.94252874]
[0.25287356]
[0.51724138]
[0.3908046]
[0.11494253]
[0.2183908]
[0.20689655]
[0.44827586]
[0.93103448]
[0.63218391]
[0.12643678]
[0.40229885]
[0.94252874]
[0.36781609]
[0.96551724]
[0.42528736]
[0.35632184]
[0.62068966]
[0.31034483]
[0.97701149]
[0.09195402]
[0.31034483]
[0.6091954]
[0.8045977]
[0.77011494]
[0.94252874]
[0.85057471]
[0.98850575]
[0.14942529]
[0.98

[0.10344828]
[0.04597701]
[0.74712644]
[0.44827586]
[0.27586207]
[0.05747126]
[0.31034483]
[0.32183908]
[0.74712644]
[0.71264368]
[0.74712644]
[0.73563218]
[0.88505747]
[0.73563218]
[0.72413793]
[0.90804598]
[0.73563218]
[0.81609195]
[0.95402299]
[0.12643678]
[0.62068966]
[0.05747126]
[0.87356322]
[0.27586207]
[0.09195402]
[0.44827586]
[0.52873563]
[0.08045977]
[0.20689655]
[0.05747126]
[0.14942529]
[0.42528736]
[0.91954023]
[0.34482759]
[0.24137931]
[0.13793103]
[0.18390805]
[0.3908046]
[0.01149425]
[0.65517241]
[0.13793103]
[0.24137931]
[0.85057471]
[0.74712644]
[0.71264368]
[0.87356322]
[0.50574713]
[0.29885057]
[0.20689655]
[0.85057471]
[0.4137931]
[0.67816092]
[0.6091954]
[0.97701149]
[0.35632184]
[0.64367816]
[0.37931034]
[0.35632184]
[0.5862069]
[0.45977011]
[0.96551724]
[0.90804598]
[0.42528736]
[0.08045977]
[0.09195402]
[0.68965517]
[0.49425287]
[0.28735632]
[0.85057471]
[0.89655172]
[0.40229885]
[0.66666667]
[0.98850575]
[0.10344828]
[0.31034483]
[0.48275862]
[0.64367816]
[0.

In [24]:
print(scaled_train_samples.shape)

(2100, 1)


At this point, we've generated some sample raw data, put it into the numpy format that our model will require, and rescaled it to a scale ranging from 0 to 1.

In an upcoming episode, we'll use this data to train a neural network and see what kind of results we can get. 

## Create an artificial neural network with TensorFlow's Keras API

In this episode, we'll demonstrate how to create a simple artificial neural network using a **Sequential model** from the Keras API integrated within TensorFlow.

https://deeplizard.com/images/png/deep%20neural%20network%20with%204%20layers.png

In the last episode, we generated some data from an imagined clinical trial, and now we'll build a simple model for which we can train on this data. 

## Code Setup

First, we need to import all the libraries we'll be making use of.

In [25]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

We'll use all of these modules, except for the last two, to **build our neural network**. Note that we'll make use of the last two modules in the next episode when we **train** the model.

A GPU is not required to follow this course, but if you are using one, you'll need to first follow the GPU setup we covered in a previous episode. We can then check to be sure that TensorFlow is able to identify the GPU using the code below. It's also useful to enable memory growth on the GPU. 

In [26]:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("Num GPUs Available: ", len(physical_devices))

Num GPUs Available:  0


##  Build a Sequential Model

Let's now create our model. We first create a variable named model and define it as follows. 

In [27]:
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])

***model*** is an instance of a Sequential object. A tf.keras.Sequential model is a linear stack of layers. It accepts a list, and each element in the list should be a layer.

As you can see, we have passed a list of layers to the Sequential constructor. Let's go through each of the layers in this list now. 

### First Hidden Layer

Our first layer is a **Dense** layer. This type of layer is our standard **fully-connected or densely-connected** neural network layer. The first required parameter that the Dense layer expects is the number of neurons or units the layer has, and we're arbitrarily setting this to 16.

Additionally, the model needs to know the shape of the input data. For this reason, we specify the shape of the input data in the first hidden layer in the model (and only this layer). The parameter called input_shape is how we specify this.

As discussed, we'll be training our network on the data that we generated and processed in the previous episode, and recall, this data is one-dimensional. The input_shape parameter expects a tuple of integers that matches the shape of the input data, so we correspondingly specify (1,) as the input_shape of our one-dimensional data.

You can think of the way we specify the input_shape here as acting as an implicit input layer. The input layer of a neural network is the underlying raw data itself, therefore we don't create an explicit input layer. This first Dense layer that we're working with now is actually the first hidden layer.

Lastly, an optional parameter that we'll set for the Dense layer is the activation function to use after this layer. We'll use the popular choice of **relu**. Note, if you don't explicitly set an activation function, then Keras will use the linear activation function. 

### Second Hidden Layer

Our next layer will also be a Dense layer, and this one will have 32 nodes. The choice of how many neurons this node has is also arbitrary, as the idea is to create a simple model, and then test and experiment with it. If we notice that it is insufficient, then at that time, we can troubleshoot the issue and begin experimenting with changing parameters, like number of layers, nodes, etc.

This Dense layer will also use relu as its activation function.

### Output layer

Lastly, we specify the output layer. This layer is also a Dense layer, and it will have **2 neurons**. This is because we have two possible outputs: either a patient experienced side effects, or the patient did not experience side effects.

This time, the activation function we'll use is softmax, which will give us a probability distribution among the possible outputs. 

Note that we can call summary() on our model to get a quick visualization of it.

In [28]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 16)                32        
_________________________________________________________________
dense_4 (Dense)              (None, 32)                544       
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 66        
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________


Now we've created our very first model using the intuitive tf.keras.Sequential model type. In the next episode we'll train this model on the data we created last time. 

## Train an Artificial Neural Network with TensorFlow's Keras API

In this episode, we'll demonstrate how to train an artificial neural network using the Keras API integrated within TensorFlow.

In the previous episode, we went through the steps to build a simple network, and now we'll focus on training it using data we generated in an even earlier episode.

##  Compiling the model

The first thing we need to do to get the model ready for training is call the **compile()** function on it. 

In [29]:
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

This function configures the model for training and expects a number of parameters. First, we specify the **optimizer Adam**. Adam accepts an optional parameter **learning_rate**, which we'll set to 0.0001. Adam optimization is a **stochastic gradient descent (SGD) method**.

The next parameter we specify is **loss**. We'll be using **sparse_categorical_crossentropy**, given that our labels are in integer format.

Note that when we have only two classes, we could instead configure our output layer to have only one output, rather than two, and use **binary_crossentropy as our loss**, rather than categorical_crossentropy. Both options work equally well and achieve the exact same result.

With **binary_crossentropy**, however, the last layer would need to use **sigmoid**, rather than softmax, as its activation function.

Moving on, the last parameter we specify in **compile()** is metrics. This parameter expects a list of metrics that we'd like to be evaluated by the model during training and testing. We'll set this to a list that contains the string **‘accuracy'**. 

## Training the Model

 Now that the model is compiled, we can train it using the fit() function.
 
The first item that we pass in to the fit() function is the training set **x**. Recall from a previous episode, we created the training set and gave it the name scaled_train_samples.

The next parameter that we set is the labels for the training set **y**, which we previously gave the name train_labels. We then specify the **batch_size**.

Next, we specify how many **epochs** we want to run. We set this to 30. Note that an epoch is a single pass of all the data to the network.

Lastly, we specify **verbose=2**. This just specifies how much output to the console we want to see during each epoch of training. The verbosity levels range from 0 to 2, so we're getting the most verbose output. 

In [30]:
# model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=30, verbose=2)

We can see corresponding output for each of the 30 epochs. Judging by the loss and accuracy, we can see that both metrics steadily improve over time with accuracy reaching almost 94% and loss steadily decreasing until we reach 0.25.

Note that although this is a very simple model trained on simple data, without much effort, we were able to reach pretty good results in a relatively quick manner of time. In subsequent episodes, we'll demo more complex models as well as more complex data, but hopefully you've become encouraged by how easily we were able to get started with tf.keras. 

##  Build a validation set with TensorFlow's Keras API

In this episode, we'll demonstrate how to use TensorFlow's Keras API to create a validation set on-the-fly during training.

We'll continue working with the same model we built and trained in the previous episode, but first, let's discuss what exactly a validation set is.

## What is a validation set?

Recall that we previously built a training set on which we trained our model. With each epoch that our model is trained, the model will continue to learn the features and characteristics of the data in this training set.

The hope is that later we can take this model, apply it to new data, and have the model accurately predict on data that it hasn't seen before based solely on what it learned from the training set.

Now, let's discuss where the addition of a validation set comes into play.

Before training begins, we can choose to remove a portion of the training set and place it in a validation set. Then, during training, the model will train only on the training set, and it will validate by evaluating the data in the validation set.

Essentially, the model is learning the features of the data in the training set, taking what it's learned from this data, and then predicting on the validation set. During each epoch, we will see not only the loss and accuracy results for the training set, but also for the validation set.

This allows us to see how well the model is generalizing on data it wasn't trained on because, recall, the validation data should not be part of the training data.

This also helps us see whether or not the model is **overfitting**. Overfitting occurs when the model only learns the specifics of the training data and is unable to generalize well on data that it wasn't trained on. Now let's discuss how we can create a validation set. 

## Creating A Validation Set

There are two ways to create a validation set to use with a **tf.keras.Sequential** model. 

###  Manually create validation set

The first way is to create a data structure to hold a validation set, and place data directly in that structure in the same nature we did for the training set.

This data structure should be a tuple **valid_set = (x_val, y_val)** of Numpy arrays or tensors, where x_val is a numpy array or tensor containing validation samples, and y_val is a numpy array or tensor containing validation labels.

When we call **model.fit()**, we would pass in the validation set in addition to the training set. We pass the validation set by specifying the **validation_data** parameter.

model.fit(
      x=scaled_train_samples
    , y=train_labels
    , validation_data=valid_set
    , batch_size=10
    , epochs=30
    , verbose=2
)

When the model trains, it would continue to train only on the training set, but additionally, it would also be evaluating the validation set.

###  Create Validation Set With Keras

There is another way to create a validation set, and it saves a step!

If we don't already have a specified validation set created, then when we call **model.fit()**, we can set a value for the **validation_split** parameter. It expects a fractional number between 0 and 1. Suppose that we set this parameter to 0.1.

model.fit(
      x=scaled_train_samples
    , y=train_labels
    , validation_split=0.1
    , batch_size=10
    , epochs=30
    , verbose=2
)

With this parameter specified, Keras will split apart a fraction (10% in this example) of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.

Note that the **fit()** function shuffles the data before each epoch by default. When specifying the **validation_split** parameter, however, the validation data is selected from the last samples in the x and y data before shuffling.

**Therefore, in the case we're using validation_split in this way to create our validation data, we need to be sure that our data has been shuffled ahead of time, like we previously did in an earlier episode**. 


##  Interpret Validation Metrics

Now, regardless of which method we use to create validation data, when we call **model.fit()**, then in addition to loss and accuracy being displayed for each epoch as we saw last time, we will now also see **val_loss** and **val_acc** to track the loss and accuracy on the validation set. 

In [31]:
model.fit(x=scaled_train_samples,
          y=train_labels,
          validation_split=0.1,
          batch_size=10,
          epochs=30,
          verbose=2)

Epoch 1/30
189/189 - 0s - loss: 0.7027 - accuracy: 0.5254 - val_loss: 0.6823 - val_accuracy: 0.6714
Epoch 2/30
189/189 - 0s - loss: 0.6670 - accuracy: 0.6788 - val_loss: 0.6459 - val_accuracy: 0.7095
Epoch 3/30
189/189 - 0s - loss: 0.6335 - accuracy: 0.7122 - val_loss: 0.6094 - val_accuracy: 0.7190
Epoch 4/30
189/189 - 0s - loss: 0.5990 - accuracy: 0.7550 - val_loss: 0.5744 - val_accuracy: 0.7619
Epoch 5/30
189/189 - 0s - loss: 0.5661 - accuracy: 0.7899 - val_loss: 0.5409 - val_accuracy: 0.7714
Epoch 6/30
189/189 - 0s - loss: 0.5338 - accuracy: 0.8138 - val_loss: 0.5082 - val_accuracy: 0.8000
Epoch 7/30
189/189 - 0s - loss: 0.5020 - accuracy: 0.8302 - val_loss: 0.4769 - val_accuracy: 0.8429
Epoch 8/30
189/189 - 0s - loss: 0.4709 - accuracy: 0.8455 - val_loss: 0.4473 - val_accuracy: 0.8476
Epoch 9/30
189/189 - 0s - loss: 0.4420 - accuracy: 0.8587 - val_loss: 0.4204 - val_accuracy: 0.8857
Epoch 10/30
189/189 - 0s - loss: 0.4161 - accuracy: 0.8778 - val_loss: 0.3965 - val_accuracy: 0.8952

<tensorflow.python.keras.callbacks.History at 0x20656343a00>

We can now see not only how well our model is learning the features of the training data, but also how well the model is generalizing to new, unseen data from the validation set. Next, we'll see how to use our model for inference. 