In this episode, we'll demonstrate how to process numerical data that we'll later use to train our very first artificial neural network. 


## Samples and Labels

To train any neural network in a supervised learning task, we first need a data set of samples and the corresponding labels for those samples.

When we refer to samples, we mean the underlying data set, where each individual item or data point within that set is called a sample. A label is the known, correct output that corresponds to a given sample.

**Note that in deep learning, samples are also commonly referred to as input data or inputs, and labels are also commonly referred to as target data or targets.**

###  Expected data format

When preparing data, we first need to understand the format that the data need to be in for the end goal we have in mind. In our case, we want our data to be in a format that we can pass to a neural network model.

The first model we'll build in an upcoming episode will be a **Sequential model** from the Keras API integrated within TensorFlow.

The Sequential model receives data during training, which occurs when we call the ***fit()*** function on the model.

[Documentation of fit() function](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#fit)

In the ***fit()*** function, **x** is the input data and **y** holds the labels for that input data, supplied in the same format or data structure as **x**.

## Process data in code

We'll start out with a very simple classification task using a simple numerical data set.

We first need to import the libraries we'll be working with. 

In [1]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

Next, we create two empty lists. One will hold the **input data**, the other will hold the **target data or labels**. 

In [2]:
train_labels = []
train_samples = []

### Data Creation

For this simple task, we'll be creating our own example data set.

As motivation for this data, let's suppose that an experimental drug was tested on individuals ranging from age 13 to 100 in a clinical trial. The trial had **2100** participants. Half of the participants were under 65 years old, and the other half was 65 years of age or older.

The trial showed that around 95% of patients 65 or older experienced side effects from the drug, and around 95% of patients under 65 experienced no side effects, generally showing that elderly individuals were more likely to experience side effects.

Ultimately, we want to build a model to tell us whether or not a patient will experience side effects solely based on the patient's age. The judgement of the model will be based on the training data.

**Labels:**
- 1: patient did experience side effects
- 0: patient didn't experience side effects

In [3]:
for i in range(50):
    # The ~5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(1)

    # The ~5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    # The ~95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The ~95% of older individuals who did experience side effects
    random_older = randint(65,100)
    train_samples.append(random_older)
    train_labels.append(1)
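
As a quick sanity check on the scheme above, the following sketch regenerates the data inside a function and counts the results. The function name `make_trial_data` and the seeded `Random` instance are assumptions added here for reproducibility; they are not part of the original cell.

```python
from collections import Counter
from random import Random

def make_trial_data(n_outliers=50, n_typical=1000, seed=0):
    """Generate (ages, labels) following the same scheme as the loops above."""
    rng = Random(seed)  # seeded generator: an addition for reproducibility
    samples, labels = [], []
    # First the outliers (young=1, old=0), then the typical cases (young=0, old=1).
    for count, young_label, old_label in [(n_outliers, 1, 0), (n_typical, 0, 1)]:
        for _ in range(count):
            samples.append(rng.randint(13, 64))    # younger participant
            labels.append(young_label)
            samples.append(rng.randint(65, 100))   # older participant
            labels.append(old_label)
    return samples, labels

samples, labels = make_trial_data()
print(len(samples))             # total number of participants
print(Counter(labels))          # how many samples carry each label
print(f"{50 / 1050:.1%}")       # share of each age group that goes against the trend
```

Each age group has 1050 participants, 50 of which are outliers, which is where the "around 5%" figure comes from.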

This is what the train_samples data looks like.

In [4]:
for i in train_samples:
    print(i)

42
97
49
85
48
85
57
90
34
69
36
96
45
77
24
96
15
71
15
83
53
69
40
96
30
72
27
99
13
97
18
89
36
66
36
76
60
65
20
73
32
78
33
83
22
78
41
94
21
86
39
98
37
93
58
93
63
77
25
88
39
66
58
79
24
92
36
69
36
69
18
69
20
71
44
89
55
98
55
89
49
84
13
70
56
79
47
98
58
71
55
100
55
89
37
97
25
75
42
99
58
83
51
71
64
79
64
84
29
66
64
83
60
83
24
68
53
90
54
87
26
97
55
89
34
98
15
79
30
87
27
92
28
84
30
69
56
69
52
77
22
79
13
90
13
97
47
92
61
94
64
66
59
89
32
70
55
91
59
65
64
68
23
94
43
68
25
75
32
94
62
99
51
82
35
85
48
65
14
91
19
99
42
67
60
93
60
92
17
80
29
84
55
91
57
74
39
67
18
98
45
94
62
81
58
76
47
76
34
88
55
98
14
88
15
87
54
91
29
65
25
79
33
100
63
97
63
71
20
77
44
91
27
66
24
94
51
96
22
76
26
98
31
79
22
100
21
77
21
93
63
92
42
93
31
76
49
83
20
90
14
97
34
73
29
86
44
99
63
73
64
71
46
72
47
77
54
92
45
83
15
71
61
77
43
86
53
87
16
74
17
65
35
70
50
98
32
86
49
73
25
97
28
84
59
99
58
76
44
83
30
65
49
91
35
97
55
76
47
100
38
70
41
73
36
81
56
70
29
85
22
66


This is what the train_labels look like.

In [5]:
for i in train_labels:
    print(i)

1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1


### Data Processing

We now convert both lists into NumPy arrays, since that is a format the fit() function accepts, and we then shuffle the arrays to remove any order that was imposed on the data during the creation process.

In [6]:
train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
train_labels, train_samples = shuffle(train_labels, train_samples)
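
What matters when shuffling is that samples and labels are reordered together, so every age keeps its own label. The sketch below illustrates that idea with plain NumPy on a tiny made-up data set. It is an illustration of the concept, not a description of how scikit-learn's `shuffle` is implemented.

```python
import numpy as np

labels = np.array([0, 0, 0, 1, 1, 1])
samples = np.array([20, 30, 40, 70, 80, 90])

rng = np.random.default_rng(seed=0)        # seed chosen arbitrarily
order = rng.permutation(len(samples))      # one random ordering of the indices

# Apply the same ordering to both arrays.
shuffled_labels = labels[order]
shuffled_samples = samples[order]

# Every age is still paired with its original label.
for age, label in zip(shuffled_samples, shuffled_labels):
    print(age, label)
```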

The data is now in a format that we can pass to the model. Before doing that, however, we'll first scale the data down to a range from 0 to 1.

We'll use **scikit-learn's MinMaxScaler class** to scale all of the data down from a scale ranging from 13 to 100 to be on a scale from 0 to 1.

We reshape the data as a technical requirement, since the **fit_transform()** function doesn't accept 1D data by default.

In [7]:
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))
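
For intuition, min-max scaling to the range 0 to 1 amounts to the formula `(x - min) / (max - min)`. The sketch below applies it by hand, assuming the smallest and largest ages in the training data are 13 and 100. The helper name `min_max_scale` is hypothetical.

```python
import numpy as np

def min_max_scale(values, data_min, data_max):
    """Map values linearly so that data_min -> 0 and data_max -> 1."""
    return (values - data_min) / (data_max - data_min)

ages = np.array([13, 83, 100]).reshape(-1, 1)    # column vector: one feature per row
scaled = min_max_scale(ages, data_min=13, data_max=100)

print(scaled.shape)
print(scaled.ravel())
```

Under these assumptions, an age of 83 maps to 70/87, which is roughly 0.8046.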

Now that the data has been scaled, let's iterate over the scaled data to see what it looks like now. 

In [8]:
for i in scaled_train_samples:
    print(i)

[0.8045977]
[0.02298851]
[0.66666667]
[0.63218391]
[0.2183908]
[0.75862069]
[0.37931034]
[0.24137931]
[0.93103448]
[0.54022989]
[0.22988506]
[0.86206897]
[0.70114943]
[0.34482759]
[0.72413793]
[0.7816092]
[0.37931034]
[0.51724138]
[0.59770115]
[0.56321839]
[0.02298851]
[0.59770115]
[0.55172414]
[0.66666667]
[0.89655172]
[0.02298851]
[0.6091954]
[0.03448276]
[0.8045977]
[0.77011494]
[0.63218391]
[0.68965517]
[0.14942529]
[0.64367816]
[0.45977011]
[0.71264368]
[0.62068966]
[0.12643678]
[0.89655172]
[0.87356322]
[0.6091954]
[0.81609195]
[0.35632184]
[0.56321839]
[0.96551724]
[0.93103448]
[0.1954023]
[0.20689655]
[0.79310345]
[0.85057471]
[0.63218391]
[0.44827586]
[0.62068966]
[0.2183908]
[0.33333333]
[0.24137931]
[0.85057471]
[0.96551724]
[0.2183908]
[0.82758621]
[0.]
[0.50574713]
[0.49425287]
[0.98850575]
[0.91954023]
[0.94252874]
[0.1954023]
[0.75862069]
[0.63218391]
[0.17241379]
[0.6091954]
[0.04597701]
[0.05747126]
[1.]
[0.08045977]
[0.96551724]
[0.5862069]
[0.74712644]
[0.91954023]
[

[0.12643678]
[0.08045977]
[0.34482759]
[0.32183908]
[0.36781609]
[0.75862069]
[0.7816092]
[0.88505747]
[0.75862069]
[0.83908046]
[0.43678161]
[0.70114943]
[0.4137931]
[0.2183908]
[0.08045977]
[1.]
[0.50574713]
[0.27586207]
[0.1954023]
[0.74712644]
[0.86206897]
[0.67816092]
[0.62068966]
[0.37931034]
[0.66666667]
[0.98850575]
[0.74712644]
[0.73563218]
[0.17241379]
[0.52873563]
[0.63218391]
[0.02298851]
[0.73563218]
[0.11494253]
[0.04597701]
[0.65517241]
[0.]
[0.56321839]
[0.28735632]
[0.14942529]
[0.87356322]
[0.31034483]
[0.45977011]
[0.66666667]
[0.83908046]
[0.83908046]
[0.98850575]
[0.64367816]
[0.64367816]
[0.5862069]
[0.59770115]
[0.34482759]
[0.54022989]
[0.96551724]
[0.48275862]
[0.75862069]
[0.94252874]
[0.40229885]
[0.54022989]
[0.91954023]
[0.66666667]
[0.03448276]
[0.56321839]
[0.17241379]
[0.28735632]
[0.18390805]
[0.88505747]
[0.97701149]
[0.54022989]
[0.85057471]
[0.65517241]
[0.96551724]
[0.14942529]
[0.09195402]
[0.42528736]
[0.1954023]
[0.33333333]
[0.6091954]
[0.988505

[0.18390805]
[0.14942529]
[0.74712644]
[0.73563218]
[0.04597701]
[0.12643678]
[0.34482759]
[0.90804598]
[0.03448276]
[0.14942529]
[0.1954023]
[0.12643678]
[0.11494253]
[0.06896552]
[0.37931034]
[1.]
[0.17241379]
[0.75862069]
[0.55172414]
[0.90804598]
[0.6091954]
[0.03448276]
[0.63218391]
[0.85057471]
[0.54022989]
[0.45977011]
[0.89655172]
[0.91954023]
[0.50574713]
[0.68965517]
[0.20689655]
[0.]
[0.59770115]
[0.17241379]
[0.71264368]
[0.94252874]
[0.96551724]
[0.49425287]
[0.57471264]
[0.40229885]
[0.87356322]
[0.79310345]
[0.08045977]
[0.83908046]
[0.85057471]
[0.7816092]
[0.65517241]
[0.89655172]
[0.43678161]
[0.54022989]
[0.28735632]
[0.65517241]
[0.52873563]
[0.85057471]
[0.]
[0.12643678]
[0.68965517]
[0.11494253]
[0.2183908]
[0.44827586]
[0.7816092]
[0.06896552]
[0.81609195]
[0.94252874]
[0.03448276]
[0.09195402]
[0.66666667]
[0.70114943]
[0.70114943]
[0.81609195]
[0.51724138]
[0.35632184]
[0.90804598]
[1.]
[0.64367816]
[0.02298851]
[0.1954023]
[0.01149425]
[0.52873563]
[0.51724138

In [9]:
print(scaled_train_samples.shape)

(2100, 1)


At this point, we've generated some sample raw data, put it into the numpy format that our model will require, and rescaled it to a scale ranging from 0 to 1.

In an upcoming episode, we'll use this data to train a neural network and see what kind of results we can get. 

## Create an artificial neural network with TensorFlow's Keras API

In this episode, we'll demonstrate how to create a simple artificial neural network using a **Sequential model** from the Keras API integrated within TensorFlow.

https://deeplizard.com/images/png/deep%20neural%20network%20with%204%20layers.png

In the last episode, we generated some data from an imagined clinical trial, and now we'll build a simple model that we can train on this data.

## Code Setup

First, we need to import all the libraries we'll be making use of.

In [10]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

We'll use all of these modules, except for the last two, to **build our neural network**. Note that we'll make use of the last two modules in the next episode when we **train** the model.

A GPU is not required to follow this course, but if you are using one, you'll need to first follow the GPU setup we covered in a previous episode. We can then check to be sure that TensorFlow is able to identify the GPU using the code below. It's also useful to enable memory growth on the GPU. 

In [11]:
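physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("Num GPUs Available: ", len(physical_devices))
for gpu in physical_devices:  # enable memory growth on each detected GPU, if any
    tf.config.experimental.set_memory_growth(gpu, True)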

Num GPUs Available:  0


##  Build a Sequential Model

Let's now create our model. We first create a variable named model and define it as follows. 

In [12]:
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])

***model*** is an instance of a Sequential object. A tf.keras.Sequential model is a linear stack of layers. It accepts a list, and each element in the list should be a layer.

As you can see, we have passed a list of layers to the Sequential constructor. Let's go through each of the layers in this list now. 

### First Hidden Layer

Our first layer is a **Dense** layer. This type of layer is our standard **fully-connected or densely-connected** neural network layer. The first required parameter that the Dense layer expects is the number of neurons or units the layer has, and we're arbitrarily setting this to 16.

Additionally, the model needs to know the shape of the input data. For this reason, we specify the shape of the input data in the first hidden layer in the model (and only this layer). The parameter called input_shape is how we specify this.

As discussed, we'll be training our network on the data that we generated and processed in the previous episode, and recall, this data is one-dimensional. The input_shape parameter expects a tuple of integers that matches the shape of the input data, so we correspondingly specify (1,) as the input_shape of our one-dimensional data.

You can think of the way we specify the input_shape here as acting as an implicit input layer. The input layer of a neural network is the underlying raw data itself, so we don't create an explicit input layer. This first Dense layer that we're working with now is actually the first hidden layer.

Lastly, an optional parameter that we'll set for the Dense layer is the activation function to use after this layer. We'll use the popular choice of **relu**. Note, if you don't explicitly set an activation function, then Keras will use the linear activation function. 
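
To make the computation of this layer concrete, here is a minimal NumPy sketch of what a dense layer with 16 units and relu does to a batch of one-dimensional inputs. The random weights are placeholders for illustration only and are not the values Keras would initialize or learn.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

batch = np.array([[0.0], [0.5], [1.0]])       # 3 scaled ages, shape (3, 1)
weights = rng.normal(size=(1, 16))            # one weight per input-unit pair
biases = np.zeros(16)                         # one bias per unit

pre_activation = batch @ weights + biases     # shape (3, 16)
output = np.maximum(pre_activation, 0.0)      # relu: negative values become 0

print(output.shape)
```

Each of the 3 inputs is turned into 16 values, one per unit, and these become the input to the next layer.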

### Second Hidden Layer

Our next layer will also be a Dense layer, and this one will have 32 nodes. The choice of how many neurons this layer has is also arbitrary, as the idea is to create a simple model, and then test and experiment with it. If we notice that it is insufficient, then at that time, we can troubleshoot the issue and begin experimenting with changing parameters, like number of layers, nodes, etc.

This Dense layer will also use relu as its activation function.

### Output layer

Lastly, we specify the output layer. This layer is also a Dense layer, and it will have **2 neurons**. This is because we have two possible outputs: either a patient experienced side effects, or the patient did not experience side effects.

This time, the activation function we'll use is softmax, which will give us a probability distribution among the possible outputs. 
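
As a rough illustration of what softmax does, the sketch below converts two made-up pairs of raw scores into probabilities. The `softmax` helper is written by hand here and stands in for the activation Keras applies internally.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5], [-1.0, 3.0]])   # made-up scores for two patients
probs = softmax(logits)

print(probs)
print(probs.sum(axis=-1))    # each row sums to 1
```

The larger score in each row receives the larger probability, and the two probabilities in a row always add up to 1.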

Note that we can call summary() on our model to get a quick visualization of it.

In [13]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 16)                32        
_________________________________________________________________
dense_1 (Dense)              (None, 32)                544       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 66        
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________
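
The Param # column can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [(1, 16), (16, 32), (32, 2)]    # (inputs, units) for each Dense layer
counts = [dense_params(i, u) for i, u in layer_sizes]

print(counts)         # [32, 544, 66]
print(sum(counts))    # 642
```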


Now we've created our very first model using the intuitive tf.keras.Sequential model type. In the next episode we'll train this model on the data we created last time. 

## Train an Artificial Neural Network with TensorFlow's Keras API

In this episode, we'll demonstrate how to train an artificial neural network using the Keras API integrated within TensorFlow.

In the previous episode, we went through the steps to build a simple network, and now we'll focus on training it using data we generated in an even earlier episode.

##  Compiling the model

The first thing we need to do to get the model ready for training is call the **compile()** function on it. 

In [14]:
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

This function configures the model for training and expects a number of parameters. First, we specify the **optimizer Adam**. Adam accepts an optional parameter **learning_rate**, which we'll set to 0.0001. Adam optimization is a **stochastic gradient descent (SGD) method**.

The next parameter we specify is **loss**. We'll be using **sparse_categorical_crossentropy**, given that our labels are in integer format.
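
To show what "integer format" means for this loss, the sketch below computes the cross-entropy by hand for two made-up predictions. It then checks that one-hot encoding the labels first gives the same value. This is a simplified illustration, not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability assigned to the true class."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.log(true_class_probs).mean()

labels = np.array([1, 0])                      # integer class indices
probs = np.array([[0.2, 0.8], [0.9, 0.1]])     # made-up softmax outputs

sparse_loss = sparse_categorical_crossentropy(labels, probs)

# The same loss computed from one-hot labels, as the non-sparse variant would expect.
one_hot = np.eye(2)[labels]
dense_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, dense_loss)
```

The two values agree. The sparse variant simply lets us keep the labels as 0 and 1 rather than converting them to one-hot vectors.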

Note that when we have only two classes, we could instead configure our output layer to have only one output, rather than two, and use **binary_crossentropy as our loss**, rather than sparse_categorical_crossentropy. Both options are valid for a two-class problem and should lead to comparable results.

With **binary_crossentropy**, however, the last layer would need to use **sigmoid**, rather than softmax, as its activation function.
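
The reason the two setups are interchangeable for two classes is that a two-unit softmax and a single sigmoid unit describe the same probability. A small numeric check, using made-up scores `z0` and `z1`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

z0, z1 = 0.3, 1.7                             # made-up scores for classes 0 and 1
two_unit = softmax(np.array([z0, z1]))[1]     # P(class 1) from a 2-unit softmax layer
one_unit = sigmoid(z1 - z0)                   # P(class 1) from a single sigmoid unit

print(two_unit, one_unit)
```

Both expressions give the same probability, because the softmax output for class 1 depends only on the difference between the two scores.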

Moving on, the last parameter we specify in **compile()** is metrics. This parameter expects a list of metrics that we'd like to be evaluated by the model during training and testing. We'll set this to a list that contains the string **'accuracy'**.

## Training the Model

Now that the model is compiled, we can train it using the fit() function.
 
The first item that we pass in to the fit() function is the training set **x**. Recall from a previous episode, we created the training set and gave it the name scaled_train_samples.

The next parameter that we set is the labels for the training set **y**, which we previously gave the name train_labels. We then specify the **batch_size**.

Next, we specify how many **epochs** we want to run. We set this to 30. Note that an epoch is a single pass of all the data to the network.

Lastly, we specify **verbose=2**. This controls how much output we see in the console during training. The verbosity levels range from 0 to 2: 0 is silent, 1 shows a progress bar, and 2 prints one line per epoch.

In [15]:
# model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=30, verbose=2)

We can see corresponding output for each of the 30 epochs. Judging by the loss and accuracy, we can see that both metrics steadily improve over time with accuracy reaching almost 94% and loss steadily decreasing until we reach 0.25.

Note that although this is a very simple model trained on simple data, we were able to reach pretty good results quickly and without much effort. In subsequent episodes, we'll demo more complex models as well as more complex data, but hopefully you're encouraged by how easily we were able to get started with tf.keras.
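
To relate batch_size and epochs to the amount of work done, the following back-of-the-envelope sketch assumes that all 2100 samples are used for training and that the weights are updated once per batch.

```python
import math

n_samples = 2100
batch_size = 10
epochs = 30

batches_per_epoch = math.ceil(n_samples / batch_size)   # a final batch may be smaller
total_updates = batches_per_epoch * epochs

print(batches_per_epoch)    # 210
print(total_updates)        # 6300
```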

##  Build a validation set with TensorFlow's Keras API

In this episode, we'll demonstrate how to use TensorFlow's Keras API to create a validation set on-the-fly during training.

We'll continue working with the same model we built and trained in the previous episode, but first, let's discuss what exactly a validation set is.

## What is a validation set?

Recall that we previously built a training set on which we trained our model. With each epoch that our model is trained, the model will continue to learn the features and characteristics of the data in this training set.

The hope is that later we can take this model, apply it to new data, and have the model accurately predict on data that it hasn't seen before based solely on what it learned from the training set.

Now, let's discuss where the addition of a validation set comes into play.

Before training begins, we can choose to remove a portion of the training set and place it in a validation set. Then, during training, the model will train only on the training set, and it will validate by evaluating the data in the validation set.

Essentially, the model is learning the features of the data in the training set, taking what it's learned from this data, and then predicting on the validation set. During each epoch, we will see not only the loss and accuracy results for the training set, but also for the validation set.

This allows us to see how well the model is generalizing on data it wasn't trained on because, recall, the validation data should not be part of the training data.

This also helps us see whether or not the model is **overfitting**. Overfitting occurs when the model only learns the specifics of the training data and is unable to generalize well on data that it wasn't trained on. Now let's discuss how we can create a validation set. 

## Creating A Validation Set

There are two ways to create a validation set to use with a **tf.keras.Sequential** model. 

###  Manually create validation set

The first way is to create a data structure to hold a validation set, and place data directly in that structure in the same way we did for the training set.

This data structure should be a tuple **valid_set = (x_val, y_val)** of Numpy arrays or tensors, where x_val is a numpy array or tensor containing validation samples, and y_val is a numpy array or tensor containing validation labels.

When we call **model.fit()**, we would pass in the validation set in addition to the training set. We pass the validation set by specifying the **validation_data** parameter.

model.fit(
      x=scaled_train_samples
    , y=train_labels
    , validation_data=valid_set
    , batch_size=10
    , epochs=30
    , verbose=2
)

When the model trains, it would continue to train only on the training set, but additionally, it would also be evaluating the validation set.
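
One possible way to build such a tuple is to slice off the end of arrays that have already been shuffled. In the sketch below, `x_all` and `y_all` are randomly generated stand-ins for the processed data, and holding out 210 rows is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-ins for the already shuffled and scaled arrays from earlier.
x_all = rng.random((2100, 1))
y_all = rng.integers(0, 2, size=2100)

n_val = 210                                    # hold out 10%, an arbitrary choice
x_train, y_train = x_all[:-n_val], y_all[:-n_val]
x_val, y_val = x_all[-n_val:], y_all[-n_val:]

valid_set = (x_val, y_val)                     # the tuple passed as validation_data
print(x_train.shape, x_val.shape)
```

With a split like this, `x_train` and `y_train` would be passed as x and y, so that the held-out rows are not also trained on.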

###  Create Validation Set With Keras

There is another way to create a validation set, and it saves a step!

If we don't already have a specified validation set created, then when we call **model.fit()**, we can set a value for the **validation_split** parameter. It expects a fractional number between 0 and 1. Suppose that we set this parameter to 0.1.

model.fit(
      x=scaled_train_samples
    , y=train_labels
    , validation_split=0.1
    , batch_size=10
    , epochs=30
    , verbose=2
)

With this parameter specified, Keras will split apart a fraction (10% in this example) of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.

Note that the **fit()** function shuffles the data before each epoch by default. When specifying the **validation_split** parameter, however, the validation data is selected from the last samples in the x and y data before shuffling.

**Therefore, when we use validation_split in this way to create our validation data, we need to be sure that our data has been shuffled ahead of time, as we did in an earlier episode**.
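
The sketch below shows why this matters. It mimics taking the last 10% of the labels as validation data, first on labels that are still in creation order and then on shuffled labels. The slicing is a simplified stand-in for what Keras does, not its actual code.

```python
import numpy as np

# Ordered labels: all 0s first, then all 1s (no shuffling applied).
ordered_labels = np.array([0] * 1050 + [1] * 1050)

validation_split = 0.1
split_at = int(len(ordered_labels) * (1 - validation_split))   # where validation starts

val_unshuffled = ordered_labels[split_at:]
print(np.bincount(val_unshuffled, minlength=2))    # only class 1 ends up in validation

rng = np.random.default_rng(seed=0)
shuffled_labels = rng.permutation(ordered_labels)
val_shuffled = shuffled_labels[split_at:]
print(np.bincount(val_shuffled, minlength=2))      # both classes should be represented
```

Without shuffling, the validation set would contain a single class and would tell us little about how well the model generalizes.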


##  Interpret Validation Metrics

Now, regardless of which method we use to create validation data, when we call **model.fit()**, then in addition to loss and accuracy being displayed for each epoch as we saw last time, we will now also see **val_loss** and **val_accuracy** to track the loss and accuracy on the validation set.

In [16]:
model.fit(x=scaled_train_samples,
          y=train_labels,
          validation_split=0.1,
          batch_size=10,
          epochs=30,
          verbose=2)

Epoch 1/30
189/189 - 0s - loss: 0.6772 - accuracy: 0.5111 - val_loss: 0.6599 - val_accuracy: 0.5857
Epoch 2/30
189/189 - 0s - loss: 0.6441 - accuracy: 0.6212 - val_loss: 0.6231 - val_accuracy: 0.7095
Epoch 3/30
189/189 - 0s - loss: 0.6043 - accuracy: 0.7344 - val_loss: 0.5772 - val_accuracy: 0.7905
Epoch 4/30
189/189 - 0s - loss: 0.5644 - accuracy: 0.7783 - val_loss: 0.5395 - val_accuracy: 0.8238
Epoch 5/30
189/189 - 0s - loss: 0.5288 - accuracy: 0.8196 - val_loss: 0.5012 - val_accuracy: 0.8333
Epoch 6/30
189/189 - 0s - loss: 0.4943 - accuracy: 0.8444 - val_loss: 0.4677 - val_accuracy: 0.8476
Epoch 7/30
189/189 - 0s - loss: 0.4638 - accuracy: 0.8614 - val_loss: 0.4368 - val_accuracy: 0.8571
Epoch 8/30
189/189 - 0s - loss: 0.4364 - accuracy: 0.8741 - val_loss: 0.4094 - val_accuracy: 0.8714
Epoch 9/30
189/189 - 0s - loss: 0.4118 - accuracy: 0.8820 - val_loss: 0.3855 - val_accuracy: 0.8810
Epoch 10/30
189/189 - 0s - loss: 0.3901 - accuracy: 0.8884 - val_loss: 0.3639 - val_accuracy: 0.8905

<tensorflow.python.keras.callbacks.History at 0x16eb79bdf40>

We can now see not only how well our model is learning the features of the training data, but also how well the model is generalizing to new, unseen data from the validation set. Next, we'll see how to use our model for inference. 
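
Two details of the log can be checked with a little arithmetic. The sketch below first reproduces the 189 steps per epoch, then compares training and validation accuracy for three of the epochs printed above. The gap calculation is just one simple way to eyeball overfitting, not an official metric.

```python
import math

# 10% of the 2100 samples are held out, leaving 1890 for training.
train_size = 2100 - 210
steps_per_epoch = math.ceil(train_size / 10)    # batch_size=10
print(steps_per_epoch)                          # 189, as in the "189/189" prefix

# (accuracy, val_accuracy) copied from epochs 1, 5 and 10 of the log above.
log = {1: (0.5111, 0.5857), 5: (0.8196, 0.8333), 10: (0.8884, 0.8905)}

for epoch, (acc, val_acc) in log.items():
    gap = acc - val_acc
    # A large positive gap (training far ahead of validation) would hint at overfitting.
    print(epoch, round(gap, 4))
```

In these three epochs the validation accuracy is at or slightly above the training accuracy, so nothing in this part of the log suggests overfitting.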

## Neural Network Predictions with TensorFlow's Keras API

In this episode, we'll demonstrate how to use a neural network for **inference** to make predictions on data from a test set. We'll continue working with the same **tf.keras.Sequential** model and data that we've used in the last few episodes to do so.

As we touched on previously, when we train a model, the hope is that we'll later be able to take the trained model, apply it to new data, and have the model generalize and accurately predict on data it hasn't seen before.

For example, suppose we have a model that categorizes images of cats or dogs and that the training data contained thousands of images of cats and dogs from a particular data set online. 

###  What is Inference?

Now suppose that later we want to take this model and use it to predict on other images of cats and dogs from a different data set. The hope is that, even though our model wasn't exposed to these particular dog and cat images during training, it will still be able to accurately make predictions for them based on what it learned from the cat and dog data set on which it was trained.

We call this process **inference**, as the model is using the knowledge it gained from training to infer a prediction or result.

At this point, the model we've been working with over the past few episodes has now been trained and validated. Given the results we've seen from the validation data, it appears that this model should do well on predicting on a new test set.

**Note that the test set is the set of data used specifically for inference after training has concluded**. 

## Creating The Test Set

We'll create a test set in the same fashion as we created the training set. In general, the test set should always be processed in the same way as the training set.

We won't go step-by-step over the code that generates and processes the test data below, as it has already been covered in detail in an earlier episode where we generated the training data, but be sure you have all the imports in place from the previous episodes, as well as all of the existing code up to this point.

In [17]:
test_labels =  []
test_samples = []

for i in range(10):
    # The 5% of younger individuals who did experience side effects
    random_younger = randint(13,64)
    test_samples.append(random_younger)
    test_labels.append(1)

    # The 5% of older individuals who did not experience side effects
    random_older = randint(65,100)
    test_samples.append(random_older)
    test_labels.append(0)

for i in range(200):
    # The 95% of younger individuals who did not experience side effects
    random_younger = randint(13,64)
    test_samples.append(random_younger)
    test_labels.append(0)

    # The 95% of older individuals who did experience side effects
    random_older = randint(65,100)
    test_samples.append(random_older)
    test_labels.append(1)

test_labels = np.array(test_labels)
test_samples = np.array(test_samples)
test_labels, test_samples = shuffle(test_labels, test_samples)

# Reuse the scaler fitted on the training data rather than re-fitting it on the test data.
scaled_test_samples = scaler.transform(test_samples.reshape(-1,1))

Note that the **MinMaxScaler object scaler** we're making use of at the end of this code block was defined in a previous episode. 
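
Reusing the scaler matters because the test data should be mapped with the same minimum and maximum that were learned from the training data. The sketch below uses small made-up age arrays and hand-written scaling to show how the result can differ when the statistics are recomputed on the test data. With scikit-learn's MinMaxScaler, reusing the training statistics corresponds to calling transform() on the already fitted scaler.

```python
import numpy as np

train_ages = np.array([13, 40, 65, 100])
test_ages = np.array([20, 50, 90])           # happens not to contain 13 or 100

# Statistics learned from the training data only.
train_min, train_max = train_ages.min(), train_ages.max()

reused = (test_ages - train_min) / (train_max - train_min)
refit = (test_ages - test_ages.min()) / (test_ages.max() - test_ages.min())

print(reused)    # consistent with how the training data was scaled
print(refit)     # age 20 becomes 0.0 here, unlike in training
```

In this tutorial's data the difference is likely small, since a test set of this size will usually contain ages at or near 13 and 100, but that is not guaranteed in general.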

##  Evaluating the test set

To get predictions from the model for the test set, we call **model.predict()**. 

In [18]:
predictions = model.predict(
      x=scaled_test_samples
    , batch_size=10
    , verbose=0
) 

To this function, we pass in the **test samples x**, specify a **batch_size**, and specify which level of verbosity we want from log messages during prediction generation. The log output produced while predicting isn't relevant for us, so we're setting verbose=0 for no output.

**Note that, unlike with training and validation sets, we do not pass the labels of the test set to the model during the inference stage.**

To see what the model's predictions look like, we can iterate over them and print them out.

In [19]:
for i in predictions:
    print(i)

[0.22598827 0.7740117 ]
[0.9305548  0.06944521]
[0.04828905 0.951711  ]
[0.07287429 0.92712575]
[0.9305548  0.06944521]
[0.07723728 0.92276275]
[0.03804165 0.9619584 ]
[0.9715029  0.02849705]
[0.970064   0.02993602]
[0.51238465 0.48761535]
[0.87766784 0.12233215]
[0.09499053 0.9050095 ]
[0.06112138 0.9388786 ]
[0.97103095 0.02896909]
[0.06112138 0.9388786 ]
[0.970064   0.02993602]
[0.0648227 0.9351773]
[0.24917096 0.7508291 ]
[0.9711891  0.02881091]
[0.970064   0.02993602]
[0.04287384 0.9571262 ]
[0.41711038 0.5828896 ]
[0.09499053 0.9050095 ]
[0.04550481 0.95449525]
[0.97242117 0.0275788 ]
[0.8632398  0.13676022]
[0.97103095 0.02896909]
[0.9705513  0.02944872]
[0.94270355 0.05729641]
[0.16585408 0.8341459 ]
[0.79087406 0.20912594]
[0.0648227 0.9351773]
[0.970064   0.02993602]
[0.8474058 0.1525942]
[0.9722731  0.02772687]
[0.3276479  0.67235214]
[0.8907664  0.10923357]
[0.09499053 0.9050095 ]
[0.02814519 0.97185487]
[0.96577847 0.0342215 ]
[0.0317618  0.96823823]
[0.10644677 0.89355326

Each element in predictions is itself an array of length 2, and the two values in each array sum to 1. This is because the two columns contain probabilities for each possible output: experienced side effects and did not experience side effects. In other words, each element is a probability distribution over all possible outputs.

The first column contains the probability for each patient not experiencing side effects, which is represented by a 0. The second column contains the probability for each patient experiencing side effects, which is represented by a 1.
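
To make the column layout concrete, the sketch below takes four rows copied from the printed predictions above (the first three and the tenth), checks that each row sums to roughly 1, and picks the index of the larger probability in each row.

```python
import numpy as np

# First three rows and the tenth row of the printed predictions above.
sample_predictions = np.array([
    [0.22598827, 0.7740117],
    [0.9305548, 0.06944521],
    [0.04828905, 0.951711],
    [0.51238465, 0.48761535],
])

print(sample_predictions.sum(axis=1))           # each row sums to (approximately) 1
print(np.argmax(sample_predictions, axis=1))    # [1 0 1 0]
```

An index of 1 means the model considers side effects more likely for that patient, and an index of 0 means it considers them less likely.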

We can also look only at the most probable prediction. 

In [20]:
rounded_predictions = np.argmax(predictions, axis=-1)

for i in rounded_predictions:
    print(i)

1
0
1
1
0
1
1
0
0
0
0
1
1
0
1
0
1
1
0
0
1
1
1
1
0
0
0
0
0
1
0
1
0
0
0
1
0
1
1
0
1
1
0
1
0
1
1
1
1
1
1
1
0
1
1
0
1
0
0
0
0
0
0
1
1
1
1
1
1
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
1
0
0
1
1
1
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
0
0
0
1
1
1
1
0
1
0
0
1
1
0
1
1
1
0
0
1
1
0
0
1
0
1
0
1
0
1
1
0
0
1
0
1
0
0
0
1
0
1
1
0
1
0
1
1
0
1
1
0
0
0
1
0
0
1
1
0
1
0
0
0
0
0
1
0
0
1
1
0
1
1
0
1
0
0
1
1
0
1
0
0
0
1
1
1
0
0
1
0
1
0
0
1
0
1
1
0
0
1
0
1
1
0
1
0
1
0
1
1
0
0
0
1
1
0
1
0
1
1
1
1
0
0
1
0
1
1
1
1
1
0
0
0
0
0
1
1
0
0
1
1
0
1
0
0
1
0
1
0
1
0
0
1
0
1
1
1
0
0
1
1
0
1
1
1
0
1
0
0
0
1
0
1
1
0
0
1
0
0
0
1
0
0
0
0
1
0
1
1
1
1
0
1
0
1
0
0
0
0
1
1
1
0
0
1
1
0
0
0
1
1
1
0
0
1
1
1
1
0
1
0
1
0
0
1
0
1
1
1
0
0
1
1
0
1
1
1
0
1
1
1
1
0
0
0
1
1
1
0
1
1
1
1
1
1
1
0
0
1
1
1
1
0
0
1
0
1
1
1
0
0
1
1
1
1
0
1
1
0
1
0
1
0
0
1
0
1
0
0
1
0
0
1
1
0
0
1
0
0
1
0
1
1
1
1
0
0
0
0
1
1
0


From the printed results, we can observe the underlying predictions from the model. However, we cannot judge how accurate these predictions are just by looking at the predicted output.

If we have corresponding labels for the test set (which, in this case, we do), then we can compare these true labels to the predicted labels to judge the accuracy of the model's predictions. We'll see how to visualize this using a tool called a **confusion matrix** in the next episode.
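
In the meantime, a single accuracy number can be obtained by comparing the two arrays element by element. The arrays below are short hypothetical stand-ins for test_labels and rounded_predictions, not actual results from the model.

```python
import numpy as np

# Hypothetical stand-ins for test_labels and rounded_predictions.
true_labels = np.array([1, 0, 1, 1, 0, 1, 0, 0])
predicted = np.array([1, 0, 1, 0, 0, 1, 0, 1])

accuracy = np.mean(predicted == true_labels)    # fraction of matching entries
print(accuracy)                                 # 0.75
```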