In this episode, we'll demonstrate how to process numerical data that we'll later use to train our very first artificial neural network. 


## Samples and Labels

To train any neural network in a supervised learning task, we first need a data set of samples and the corresponding labels for those samples.

Samples are the individual items, or data points, in the underlying data set. A label is the known outcome or category for a given sample, which is what we want the network to learn to predict.

**Note that in deep learning, samples are also commonly referred to as input data or inputs, and labels are also commonly referred to as target data or targets.**

### Expected data format

When preparing data, we first need to understand the format that the data need to be in for the end goal we have in mind. In our case, we want our data to be in a format that we can pass to a neural network model.

The first model we'll build in an upcoming episode will be a **Sequential model** from the Keras API integrated within TensorFlow.

The Sequential model receives data during training, which occurs when we call the ***fit()*** function on the model.

[Documentation of fit() function](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#fit)

In the ***fit()*** function, **x** is the input data and **y** is the label data for that input, supplied in the same format or data structure as **x**.
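
As a minimal sketch of what this pairing looks like, the snippet below builds a tiny, made-up pair of NumPy arrays in which position i of `y` holds the label for position i of `x`. The values are hypothetical and are not the data we create later.

```python
import numpy as np

# Hypothetical toy data: three samples (ages) and their labels.
x = np.array([25, 70, 41])   # input data (samples)
y = np.array([0, 1, 0])      # target data (labels), one per sample

# Both are NumPy arrays of the same length, so sample i pairs with label i.
print(type(x).__name__, type(y).__name__)
print(len(x) == len(y))
```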

## Process data in code

We'll start out with a very simple classification task using a simple numerical data set.

We first need to import the libraries we'll be working with. 

In [1]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

Next, we create two empty lists. One will hold the **input data**, the other will hold the **target data or labels**. 

In [2]:
train_labels = []
train_samples = []

### Data Creation

For this simple task, we'll be creating our own example data set.

As motivation for this data, let's suppose that an experimental drug was tested on individuals ranging from age 13 to 100 in a clinical trial. The trial had **2100** participants. Half of the participants were under 65 years old, and the other half were 65 years of age or older.

The trial showed that around 95% of patients 65 or older experienced side effects from the drug, and around 95% of patients under 65 experienced no side effects, generally showing that elderly individuals were more likely to experience side effects.

Ultimately, we want to build a model to tell us whether or not a patient will experience side effects solely based on the patient's age. The judgement of the model will be based on the training data.

**Labels:**
- 1: patient did experience side effects
- 0: patient didn't experience side effects

In [3]:
for i in range(50):
    # The ~5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)

    # The ~5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    # The ~95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The ~95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)
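
As a quick sanity check, the sketch below rebuilds the data set with the same loop sizes and age ranges, then measures the split described above. The names `samples`, `labels`, `older` and `younger` are introduced here for illustration only.

```python
from random import randint

samples, labels = [], []

# Rebuild the data set with the same counts and age ranges as the cell above,
# so that this check runs on its own.
for n, young_label, old_label in [(50, 1, 0), (1000, 0, 1)]:
    for _ in range(n):
        samples.append(randint(13, 64))    # younger participant
        labels.append(young_label)
        samples.append(randint(65, 100))   # older participant
        labels.append(old_label)

older = [lab for age, lab in zip(samples, labels) if age >= 65]
younger = [lab for age, lab in zip(samples, labels) if age < 65]

print(len(samples))                      # total number of participants
print(sum(older) / len(older))           # share of older participants with side effects
print(1 - sum(younger) / len(younger))   # share of younger participants without side effects
```

Both shares come out to 1000/1050, or roughly 95%, regardless of which random ages are drawn.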

This is what the train_samples data looks like.

In [4]:
for i in train_samples:
    print(i)

52
70
38
73
36
83
16
83
37
69
56
68
30
74
26
70
45
65
43
96
43
91
38
81
52
89
45
65
13
78
35
86
21
65
51
67
39
90
46
65
14
82
13
81
58
86
20
83
37
74
60
88
30
82
56
84
50
97
25
77
45
74
62
74
51
86
19
98
47
81
43
91
35
85
43
85
22
89
27
67
16
72
22
83
64
73
36
67
62
69
23
69
17
79
53
99
49
84
56
85
41
93
22
96
58
75
26
86
33
88
30
65
49
85
49
98
60
74
44
77
24
79
63
66
63
74
25
95
47
84
38
76
23
93
62
100
30
89
41
77
18
89
41
93
23
96
64
82
35
94
23
74
37
92
16
97
61
88
17
90
52
81
52
80
19
79
39
66
23
88
38
73
61
75
27
86
64
92
38
86
40
100
18
96
52
90
53
93
57
83
49
100
61
92
43
67
51
76
24
89
19
67
48
87
63
78
16
71
40
67
39
98
50
78
61
98
35
97
47
74
56
91
42
66
39
87
46
97
41
98
29
95
46
67
29
71
60
75
57
73
40
82
25
70
38
66
14
66
27
69
44
80
39
73
63
96
23
95
37
81
18
90
22
80
19
86
44
97
44
90
44
97
21
80
29
84
52
91
57
87
36
84
34
86
14
80
64
95
50
85
35
67
17
78
38
86
46
69
45
78
15
69
21
98
60
69
36
71
25
95
23
98
50
81
44
86
34
76
27
85
43
78
28
100
40
66
24
67
36
99
30
96


This is what the train_labels look like.

In [5]:
for i in train_labels:
    print(i)

1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1


### Data Processing

We now convert both lists into NumPy arrays because, as discussed, this is a format that the fit() function accepts. We then shuffle the arrays to remove any order that was imposed on the data during the creation process.

In [6]:
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
train_labels, train_samples = shuffle(train_labels, train_samples)
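
When given several arrays, scikit-learn's shuffle() reorders them consistently, so each sample keeps its label. The sketch below illustrates the same idea with a shared NumPy permutation as a stand-in rather than shuffle() itself. The toy data and the seed are assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data where each label is recoverable from its sample:
# ages of 65 or more are labelled 1, everything else 0.
samples = np.array([20, 70, 35, 90, 50, 66])
labels = (samples >= 65).astype(int)

# Apply one shared random permutation to both arrays.
rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily for repeatability
order = rng.permutation(len(samples))
shuffled_samples = samples[order]
shuffled_labels = labels[order]

# Every label still matches its sample after shuffling.
print(np.array_equal(shuffled_labels, (shuffled_samples >= 65).astype(int)))  # True
```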

In this form, the data is in a format that we can pass to the model. Before doing that, however, we'll first scale the data down to a range from 0 to 1.

We'll use **scikit-learn's MinMaxScaler class** to rescale the data from its original range of 13 to 100 to a range of 0 to 1.

We reshape the data only as a technical requirement: the **fit_transform()** function expects a 2D array and doesn't accept 1D data, so we reshape the array into a single column with reshape(-1,1).

In [7]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))
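
For a feature range of 0 to 1, min-max scaling subtracts the column minimum from each value and divides by the column range. The sketch below reproduces that arithmetic by hand in NumPy on three hypothetical ages, assuming that 13 and 100 are the smallest and largest ages present in the data.

```python
import numpy as np

# Hypothetical ages; 13 and 100 are assumed to be the minimum and maximum present.
ages = np.array([13, 65, 100], dtype=float).reshape(-1, 1)  # shape (3, 1): one column

# Min-max scaling to [0, 1]: subtract the column minimum, divide by the column range.
scaled = (ages - ages.min(axis=0)) / (ages.max(axis=0) - ages.min(axis=0))

print(ages.shape)
print(scaled.ravel())
```

Under that assumption, an age of 65 maps to 52/87, or about 0.5977.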

Now that the data has been scaled, let's iterate over the scaled data to see what it looks like now. 

In [8]:
for i in scaled_train_samples:
    print(i)

[0.87356322]
[0.59770115]
[0.68965517]
[0.95402299]
[0.3908046]
[0.71264368]
[0.40229885]
[0.75862069]
[0.72413793]
[0.27586207]
[0.43678161]
[0.96551724]
[0.50574713]
[0.88505747]
[0.11494253]
[0.79310345]
[0.64367816]
[0.20689655]
[0.70114943]
[0.13793103]
[0.3908046]
[0.73563218]
[0.]
[0.16091954]
[0.71264368]
[0.96551724]
[0.49425287]
[0.7816092]
[0.24137931]
[0.91954023]
[0.57471264]
[0.91954023]
[0.93103448]
[0.96551724]
[0.50574713]
[0.43678161]
[0.09195402]
[0.75862069]
[0.71264368]
[0.91954023]
[0.96551724]
[0.88505747]
[0.03448276]
[0.66666667]
[0.5862069]
[0.64367816]
[0.98850575]
[0.64367816]
[0.45977011]
[0.34482759]
[0.5862069]
[0.47126437]
[0.85057471]
[0.14942529]
[0.3908046]
[0.5862069]
[0.22988506]
[0.73563218]
[0.48275862]
[0.64367816]
[0.48275862]
[0.45977011]
[0.31034483]
[0.35632184]
[0.72413793]
[0.2183908]
[0.81609195]
[0.5862069]
[0.63218391]
[0.96551724]
[0.62068966]
[0.51724138]
[0.98850575]
[0.81609195]
[0.25287356]
[0.96551724]
[0.27586207]
[0.72413793]
[0.

[0.71264368]
[0.55172414]
[0.24137931]
[0.]
[0.75862069]
[0.18390805]
[0.83908046]
[0.44827586]
[0.82758621]
[0.91954023]
[0.22988506]
[0.56321839]
[0.29885057]
[0.01149425]
[0.26436782]
[0.50574713]
[0.79310345]
[0.93103448]
[0.43678161]
[0.82758621]
[0.22988506]
[0.25287356]
[0.17241379]
[0.82758621]
[0.96551724]
[0.65517241]
[0.72413793]
[0.67816092]
[0.68965517]
[0.88505747]
[0.66666667]
[0.5862069]
[0.47126437]
[0.56321839]
[0.13793103]
[0.88505747]
[0.81609195]
[0.65517241]
[0.85057471]
[0.72413793]
[0.98850575]
[0.02298851]
[0.89655172]
[0.44827586]
[0.05747126]
[0.88505747]
[0.75862069]
[0.71264368]
[0.56321839]
[0.95402299]
[0.6091954]
[0.81609195]
[0.2183908]
[0.43678161]
[0.70114943]
[0.75862069]
[0.98850575]
[0.02298851]
[0.3908046]
[0.98850575]
[0.51724138]
[0.55172414]
[0.36781609]
[0.16091954]
[0.73563218]
[0.83908046]
[0.06896552]
[0.81609195]
[0.63218391]
[0.17241379]
[0.88505747]
[0.89655172]
[0.89655172]
[0.47126437]
[0.97701149]
[0.75862069]
[0.02298851]
[0.97701149

[0.77011494]
[0.28735632]
[0.02298851]
[0.]
[0.26436782]
[0.14942529]
[0.95402299]
[0.75862069]
[0.37931034]
[0.35632184]
[0.62068966]
[0.79310345]
[0.09195402]
[0.91954023]
[0.68965517]
[0.24137931]
[0.24137931]
[0.71264368]
[0.63218391]
[0.33333333]
[0.5862069]
[0.96551724]
[0.85057471]
[0.93103448]
[0.79310345]
[0.68965517]
[0.66666667]
[0.10344828]
[0.90804598]
[0.62068966]
[0.82758621]
[0.24137931]
[0.65517241]
[0.83908046]
[0.10344828]
[0.8045977]
[0.85057471]
[0.28735632]
[0.94252874]
[0.81609195]
[0.88505747]
[0.70114943]
[0.66666667]
[0.25287356]
[0.95402299]
[0.96551724]
[0.97701149]
[0.81609195]
[1.]
[0.57471264]
[0.04597701]
[0.51724138]
[0.28735632]
[0.81609195]
[0.82758621]
[0.98850575]
[0.49425287]
[0.08045977]
[0.91954023]
[0.09195402]
[0.52873563]
[0.12643678]
[0.22988506]
[0.82758621]
[0.2183908]
[0.55172414]
[0.85057471]
[0.72413793]
[0.04597701]
[0.79310345]
[0.59770115]
[0.14942529]
[0.64367816]
[0.59770115]
[0.81609195]
[0.94252874]
[0.90804598]
[0.28735632]
[0.95

In [9]:
print(scaled_train_samples.shape)

(2100, 1)


At this point, we've generated some sample raw data, put it into the numpy format that our model will require, and rescaled it to a scale ranging from 0 to 1.

In an upcoming episode, we'll use this data to train a neural network and see what kind of results we can get. 

## Create an artificial neural network with TensorFlow's Keras API

In this episode, we'll demonstrate how to create a simple artificial neural network using a **Sequential model** from the Keras API integrated within TensorFlow.

![Deep neural network with 4 layers](https://deeplizard.com/images/png/deep%20neural%20network%20with%204%20layers.png)

In the last episode, we generated some data from an imagined clinical trial, and now we'll build a simple model that we can train on this data.

## Code Setup

First, we need to import all the libraries we'll be making use of.

In [10]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

We'll use all of these modules, except for the last two, to **build our neural network**. Note that we'll make use of the last two modules in the next episode when we **train** the model.

A GPU is not required to follow this course, but if you are using one, you'll need to first follow the GPU setup we covered in a previous episode. We can then check to be sure that TensorFlow is able to identify the GPU using the code below. It's also useful to enable memory growth on the GPU. 

In [11]:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("Num GPUs Available: ", len(physical_devices))

Num GPUs Available:  0
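
The cell above only counts the GPUs. One possible way to also enable memory growth is sketched below, using `tf.config.experimental.set_memory_growth`. The loop over every detected GPU is an assumption made here so that the snippet also runs on a CPU-only machine, where the list is empty.

```python
import tensorflow as tf

# List the GPUs that TensorFlow can see (an empty list on a CPU-only machine).
gpus = tf.config.experimental.list_physical_devices('GPU')

# Ask TensorFlow to allocate GPU memory as needed rather than all at once.
# This has to run before the GPUs have been initialized.
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```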


## Build a Sequential Model

Let's now create our model. We first create a variable named model and define it as follows. 

In [12]:
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])

***model*** is an instance of a Sequential object. A tf.keras.Sequential model is a linear stack of layers. It accepts a list, and each element in the list should be a layer.

As you can see, we have passed a list of layers to the Sequential constructor. Let's go through each of the layers in this list now. 

### First Hidden Layer

Our first layer is a **Dense** layer. This type of layer is our standard **fully-connected or densely-connected** neural network layer. The first required parameter that the Dense layer expects is the number of neurons or units the layer has, and we're arbitrarily setting this to 16.

Additionally, the model needs to know the shape of the input data. For this reason, we specify the shape of the input data in the first hidden layer in the model (and only this layer). The parameter called input_shape is how we specify this.

As discussed, we'll be training our network on the data that we generated and processed in the previous episode, and recall, this data is one-dimensional. The input_shape parameter expects a tuple of integers that matches the shape of the input data, so we correspondingly specify (1,) as the input_shape of our one-dimensional data.
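
To make the link between the data and input_shape concrete, the sketch below uses a small hypothetical array with the same layout as our processed data (one column, one row per sample). It shows that the shape of a single sample is (1,).

```python
import numpy as np

# Hypothetical stand-in for the processed training data: 5 samples, 1 feature each.
data = np.arange(5, dtype=float).reshape(-1, 1)

print(data.shape)       # (number of samples, number of features)
print(data.shape[1:])   # shape of a single sample, which is what input_shape describes
print(data[0].shape)    # the same thing, read off the first sample
```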

You can think of the way we specify the input_shape here as acting as an implicit input layer. The input layer of a neural network is the underlying raw data itself, therefore we don't create an explicit input layer. This first Dense layer that we're working with now is actually the first hidden layer.

Lastly, an optional parameter that we'll set for the Dense layer is the activation function to use after this layer. We'll use the popular choice of **relu**. Note, if you don't explicitly set an activation function, then Keras will use the linear activation function. 
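
As a rough illustration of what this layer computes, the NumPy sketch below applies a weighted sum, a bias and relu to a small batch. The weights here are random placeholders, not values Keras would learn, and the names `weights`, `bias` and `batch` are introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, for repeatability only

# Hypothetical weights for a layer with 1 input feature and 16 units.
# Keras learns these during training; here they are just random numbers.
weights = rng.normal(size=(1, 16))
bias = np.zeros(16)

def relu(z):
    """Rectified linear unit: negative values become 0, positive values pass through."""
    return np.maximum(z, 0.0)

# A batch of 3 scaled samples, shape (3, 1).
batch = np.array([[0.0], [0.6], [1.0]])

# Dense layer: weighted sum plus bias, followed by the activation.
output = relu(batch @ weights + bias)

print(output.shape)           # one row per sample, one column per unit
print((output >= 0).all())    # relu never produces negative values
```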

### Second Hidden Layer

Our next layer will also be a Dense layer, and this one will have 32 nodes. The choice of how many neurons this layer has is also arbitrary, as the idea is to create a simple model, and then test and experiment with it. If we notice that it is insufficient, then at that time, we can troubleshoot the issue and begin experimenting with changing parameters, like the number of layers, nodes, etc.

This Dense layer will also use relu as its activation function.

### Output layer

Lastly, we specify the output layer. This layer is also a Dense layer, and it will have **2 neurons**. This is because we have two possible outputs: either a patient experienced side effects, or the patient did not experience side effects.

This time, the activation function we'll use is softmax, which will give us a probability distribution among the possible outputs. 
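
The following NumPy sketch shows what softmax does to the raw scores of two output neurons. The scores are made up for illustration, and the function is a simplified stand-in rather than the Keras implementation.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)    # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores from the two output neurons for one patient.
scores = np.array([0.5, 2.0])
probs = softmax(scores)

print(probs)            # one probability per class
print(probs.sum())      # the probabilities sum to 1
print(probs.argmax())   # index of the most likely class
```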

Note that we can call summary() on our model to get a quick visualization of it.

In [13]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 16)                32        
_________________________________________________________________
dense_1 (Dense)              (None, 32)                544       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 66        
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________
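
The Param # column can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical function written for this check, not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers defined above.
layers = [(1, 16), (16, 32), (32, 2)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]

print(counts)        # [32, 544, 66]
print(sum(counts))   # 642
```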


Now we've created our very first model using the intuitive tf.keras.Sequential model type. In the next episode we'll train this model on the data we created last time. 

## Train an Artificial Neural Network with TensorFlow's Keras API

In this episode, we'll demonstrate how to train an artificial neural network using the Keras API integrated within TensorFlow.

In the previous episode, we went through the steps to build a simple network, and now we'll focus on training it using data we generated in an even earlier episode.

## Compiling the model

The first thing we need to do to get the model ready for training is call the **compile()** function on it. 

In [14]:
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

This function configures the model for training and expects a number of parameters. First, we specify the **optimizer Adam**. Adam accepts an optional parameter **learning_rate**, which we'll set to 0.0001. Adam optimization is a **stochastic gradient descent (SGD) method**.

The next parameter we specify is **loss**. We'll be using **sparse_categorical_crossentropy**, given that our labels are in integer format.
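
To give a feel for what this loss measures, the simplified NumPy sketch below averages the negative log of the probability the model assigned to each sample's true class. It is not the Keras implementation (which, for example, also guards against taking the log of 0), and the probabilities are invented for the example.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class), with integer labels."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.log(true_class_probs).mean()

# Hypothetical softmax outputs for three patients: columns are class 0 and class 1.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])   # integer labels, not one-hot vectors

print(sparse_categorical_crossentropy(probs, labels))
```

The third patient contributes the most to the loss, because the model gave only 0.4 to the correct class.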

Note that when we have only two classes, we could instead configure our output layer to have only one output, rather than two, and use **binary_crossentropy as our loss**, rather than sparse_categorical_crossentropy. Both setups can represent the same predictions, so either option works well for this task.

With **binary_crossentropy**, however, the last layer would need to use **sigmoid**, rather than softmax, as its activation function.
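
The reason the two setups are interchangeable for two classes can be checked numerically: the softmax probability of class 1 from two raw scores equals the sigmoid of the difference between those scores. The sketch below uses made-up scores `z0` and `z1` to show this.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores for class 0 and class 1.
z0, z1 = 0.5, 2.0

p_softmax = softmax(np.array([z0, z1]))[1]   # probability of class 1 from two outputs
p_sigmoid = sigmoid(z1 - z0)                 # probability of class 1 from one output

print(np.isclose(p_softmax, p_sigmoid))      # True
```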

Moving on, the last parameter we specify in **compile()** is metrics. This parameter expects a list of metrics that we'd like to be evaluated by the model during training and testing. We'll set this to a list that contains the string **'accuracy'**.
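
For a classifier like ours, accuracy is the fraction of samples whose most probable predicted class matches the true label. The sketch below computes it by hand on invented predictions, as an illustration of the idea rather than the Keras internals.

```python
import numpy as np

# Hypothetical softmax outputs for four patients and their true labels.
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7],
                  [0.6, 0.4],
                  [0.2, 0.8]])
labels = np.array([0, 1, 1, 1])

predicted = probs.argmax(axis=1)          # most probable class per patient
accuracy = (predicted == labels).mean()   # fraction of correct predictions

print(predicted)   # [0 1 0 1]
print(accuracy)    # 0.75
```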

## Training the Model

Now that the model is compiled, we can train it using the fit() function.

The first item that we pass in to the fit() function is the training set **x**. Recall from a previous episode, we created the training set and gave it the name scaled_train_samples.

The next parameter that we set is the labels for the training set **y**, which we previously gave the name train_labels. We then specify the **batch_size**, which we set to 10, so the network processes 10 samples at a time.

Next, we specify how many **epochs** we want to run. We set this to 30. Note that an epoch is a single pass of all the data to the network.
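
Batch size and epochs together determine how many batches the network sees. The arithmetic below uses the figures from this episode (2100 samples, a batch size of 10, 30 epochs), and the result lines up with the `210/210` counter in the training log.

```python
import math

n_samples = 2100    # size of the training set created earlier
batch_size = 10     # value passed to fit() in this episode
epochs = 30

steps_per_epoch = math.ceil(n_samples / batch_size)   # batches per epoch
total_steps = steps_per_epoch * epochs                # batches over the whole run

print(steps_per_epoch)   # 210
print(total_steps)       # 6300
```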

Lastly, we specify **verbose=2**. This controls how much output is printed to the console during training: 0 is silent, 1 shows a progress bar, and 2 prints one line per epoch.

In [15]:
model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=30, verbose=2)

Epoch 1/30
210/210 - 0s - loss: 0.6903 - accuracy: 0.4700
Epoch 2/30
210/210 - 0s - loss: 0.6539 - accuracy: 0.6186
Epoch 3/30
210/210 - 0s - loss: 0.6190 - accuracy: 0.7052
Epoch 4/30
210/210 - 0s - loss: 0.5840 - accuracy: 0.7590
Epoch 5/30
210/210 - 0s - loss: 0.5489 - accuracy: 0.7862
Epoch 6/30
210/210 - 0s - loss: 0.5133 - accuracy: 0.8071
Epoch 7/30
210/210 - 0s - loss: 0.4782 - accuracy: 0.8352
Epoch 8/30
210/210 - 0s - loss: 0.4445 - accuracy: 0.8652
Epoch 9/30
210/210 - 0s - loss: 0.4146 - accuracy: 0.8633
Epoch 10/30
210/210 - 0s - loss: 0.3877 - accuracy: 0.8829
Epoch 11/30
210/210 - 0s - loss: 0.3643 - accuracy: 0.8929
Epoch 12/30
210/210 - 0s - loss: 0.3445 - accuracy: 0.9057
Epoch 13/30
210/210 - 0s - loss: 0.3278 - accuracy: 0.9052
Epoch 14/30
210/210 - 0s - loss: 0.3138 - accuracy: 0.9133
Epoch 15/30
210/210 - 0s - loss: 0.3023 - accuracy: 0.9210
Epoch 16/30
210/210 - 0s - loss: 0.2927 - accuracy: 0.9210
Epoch 17/30
210/210 - 0s - loss: 0.2849 - accuracy: 0.9243
Epoch 

<tensorflow.python.keras.callbacks.History at 0x20654cbf8e0>

We can see corresponding output for each of the 30 epochs. Judging by the loss and accuracy, we can see that both metrics steadily improve over time with accuracy reaching almost 94% and loss steadily decreasing until we reach 0.25.

Note that although this is a very simple model trained on simple data, we were able to reach pretty good results relatively quickly and without much effort. In subsequent episodes, we'll demo more complex models as well as more complex data, but hopefully you're encouraged by how easily we were able to get started with tf.keras.