<a href="https://colab.research.google.com/github/robsub/Keras/blob/main/Part_1_tf_Keras_deepliard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Get (in this instance create) and pre-process the data

In [2]:
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

In [3]:
# create lists to hold samples and labels

train_labels = []
train_samples = []

Example data:

- An experimental drug was tested on individuals from ages 13 to 100 in a clinical trial.

- The trial had 2100 participants. Half were under 65 years old, half were 65 years or older.

- Around 95% of patients 65 or older experienced side effects. 

- Around 95% of patients under 65 experienced no side effects.

fit(
    x=None,
    y=None,
    batch_size=None,
    epochs=1,
    verbose='auto',
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_batch_size=None,
    validation_freq=1,
    max_queue_size=10,
    workers=1,
    use_multiprocessing=False
)

Trains the model for a fixed number of epochs (iterations on a dataset).

In [5]:
# See https://www.tensorflow.org/api_docs/python/tf/keras/Sequential for fit(). For x and y, let's use NumPy arrays (y must be in the same format as x)


for i in range(50):
  # The ~5% of younger individuals who did experience side effects
  random_younger = randint(13,64)
  train_samples.append(random_younger)
  train_labels.append(1) # 1 = side effects

  # The ~5% of older individuals who did not experience side effects
  random_older = randint(65,100)
  train_samples.append(random_older)
  train_labels.append(0) # 0 = no side effects

for i in range(1000):
  # The ~95% of younger individuals who did not experience side effects
  random_younger = randint(13,64)
  train_samples.append(random_younger)
  train_labels.append(0)

  # The ~95% of older individuals who did experience side effects
  random_older = randint(65,100)
  train_samples.append(random_older)
  train_labels.append(1)
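
The same dataset can also be built in a vectorized way, which makes it easy to check the proportions stated in the example data. This is only a sketch: `make_group`, `samples`, `labels` and the seed are illustrative names and choices that are not part of the notebook, and NumPy's random generator stands in for `random.randint`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

def make_group(low, high, n, label):
    """Return n ages drawn uniformly from [low, high] and n copies of label."""
    ages = rng.integers(low, high + 1, size=n)  # upper bound is exclusive
    return ages, np.full(n, label)

groups = [
    make_group(13, 64, 50, 1),     # younger, side effects (minority)
    make_group(65, 100, 50, 0),    # older, no side effects (minority)
    make_group(13, 64, 1000, 0),   # younger, no side effects
    make_group(65, 100, 1000, 1),  # older, side effects
]
samples = np.concatenate([ages for ages, _ in groups])
labels = np.concatenate([group_labels for _, group_labels in groups])

older = samples >= 65
print(len(samples))                # total number of participants
print(labels[older].mean())        # fraction of older patients with side effects
print(1 - labels[~older].mean())   # fraction of younger patients without side effects
```

Both fractions come out at 1000/1050, which is where the "around 95%" in the description comes from.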



In [14]:
for i in train_samples[:5]: # print first 5 only
    print(i)

39
64
23
58
71


In [15]:
for i in train_labels[:5]:
  print(i)

0
0
0
0
1


In [11]:
# Let's process the data into a form the fit function expects (we'll go with NumPy arrays)

train_labels = np.array(train_labels) # This line is converting the train_labels list to a numpy array. Numpy is a numerical computing library for Python and numpy.array() is a function that creates a new numpy array from a given list.
train_samples = np.array(train_samples) # This line is converting the train_samples list to a numpy array.
train_labels,train_samples = shuffle(train_labels, train_samples) # This line is shuffling the two numpy arrays train_labels and train_samples together using the shuffle() function from the sklearn.utils module. This is a common technique used in machine learning to randomize the order of the training data so that the model does not learn any order-related biases. The function returns two shuffled arrays, which are then unpacked and assigned to the original variable names train_labels and train_samples.

# Check the type of the variables after conversion
print(type(train_labels))   # Output: <class 'numpy.ndarray'>
print(type(train_samples))


<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
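
To see what "shuffling together" means, here is a minimal NumPy-only sketch. It is not how `sklearn.utils.shuffle` is implemented internally; it only illustrates the idea of applying one random ordering to both arrays. The small arrays and the names `ages`, `side_effects` and `perm` are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

ages = np.array([20, 30, 70, 80, 90])
side_effects = np.array([0, 0, 1, 1, 1])

# One random ordering of the indices, applied to both arrays
perm = rng.permutation(len(ages))
shuffled_ages = ages[perm]
shuffled_side_effects = side_effects[perm]

# Every age is still paired with its original label
for age, label in zip(shuffled_ages, shuffled_side_effects):
    print(age, label)
```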


In [12]:
# This code scales the train_samples data to the range [0, 1] (min-max normalization), a common preprocessing step in machine learning that often helps training.

# The MinMaxScaler is a scaler object from the sklearn.preprocessing module that scales data to a specified range, which is [0,1] in this case. The feature_range parameter specifies the desired range.

# train_samples is reshaped to a 2D numpy array using the reshape() function with -1 as the first parameter and 1 as the second parameter. This is necessary because the fit_transform() method of MinMaxScaler expects a 2D array as input.

# The fit_transform() method of MinMaxScaler is then called on the reshaped train_samples data to fit the scaler to the data and transform it to the specified range. The resulting scaled data is assigned to the scaled_train_samples variable.

# In short: the MinMaxScaler object is created with feature_range=(0,1), and the next line uses it to rescale the data from its current range (13 to 100) down to a scale of 0 to 1. The last part, train_samples.reshape(-1,1), changes the data from 1-dimensional to 2-dimensional, which is what fit_transform() expects.
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))

In [16]:
for i in scaled_train_samples[:5]:
  print(i)

[0.29885057]
[0.5862069]
[0.11494253]
[0.51724138]
[0.66666667]
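
The scaled values can be reproduced by hand with the min-max formula, (x - min) / (max - min). The sketch below uses plain NumPy instead of `MinMaxScaler`, and it assumes that the full training set contains both extremes (13 and 100), which is very likely with 2100 random ages but is not guaranteed. The names `ages`, `data_min` and `data_max` are illustrative.

```python
import numpy as np

ages = np.array([39, 64, 23, 58, 71], dtype=float)

# Assumption: the smallest and largest ages in the training set are 13 and 100
data_min, data_max = 13.0, 100.0

# Min-max scaling to the range [0, 1]
scaled = (ages - data_min) / (data_max - data_min)
print(scaled.reshape(-1, 1))

# Undo the scaling to recover the original ages
restored = scaled * (data_max - data_min) + data_min
print(restored)
```

For example, an age of 39 becomes (39 - 13) / 87, which matches the first scaled value printed above.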


# Simple tf.keras Sequential Model

In [17]:
import tensorflow as tf # to BUILD our first model
from tensorflow import keras # to BUILD our first model
from tensorflow.keras.models import Sequential # to BUILD our first model
from tensorflow.keras.layers import Activation, Dense # to BUILD our first model
from tensorflow.keras.optimizers import Adam # to TRAIN our first model
from tensorflow.keras.metrics import categorical_crossentropy # to TRAIN our first model

# tensorflow is an open-source software library for dataflow and differentiable programming across a range of tasks. It is commonly used for building deep learning models.
# keras is a high-level neural networks API, written in Python. Here it is used through tf.keras, the implementation that ships with TensorFlow (older standalone Keras releases could also run on top of CNTK or Theano). It is designed to enable fast experimentation with deep neural networks, and thus it is commonly used for building and training neural networks.
# Sequential is a Keras class for creating a sequential neural network model.
# Activation is a Keras class that specifies the activation function to be used in a neural network layer.
# Dense is a Keras class that specifies a fully connected neural network layer.
# Adam is a Keras class that specifies the optimizer used for training a neural network model.
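# categorical_crossentropy is a Keras function that computes the categorical cross-entropy between labels and predictions. It is imported here but not called directly in this notebook: compile() is given the 'sparse_categorical_crossentropy' loss by name.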
# You will use these libraries and classes to define, compile, and train your neural network model for your specific machine learning task.




In [19]:
# Build the sequential model (sequential is the simplest type of model using keras or tensorflow)
# The sequential model architecture is a type of feedforward neural network, which means the information flows only in one direction, from input to output.
# Even though backpropagation propagates the error from right to left, a model that consists only of forward-pass layers, like a Sequential model in Keras, is still a linear stack of layers: the output of each layer is fed directly into the next layer, without any recurrent connections that would allow feedback from later layers to earlier layers. "Linear" here describes the layout of the layers, not the function the model computes (the ReLU activations make that function non-linear). The model processes its input sequentially, one layer at a time, from left to right.

model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'), # hidden layer (not input layer)
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax'), # output layer
])

# This code is defining a sequential model with three dense layers:

# The first dense layer has 16 units/neurons, and takes an input of shape (1,) (a single input value) and applies the rectified linear unit (ReLU) activation function.
# The second dense layer has 32 units/neurons and applies the ReLU activation function.
# The third dense layer has 2 units/neurons and applies the softmax activation function.
# The first dense layer is defined to have an input shape of (1,) because the input to the model will be a single value, and the output will be one of two classes. The final dense layer has 2 units/neurons because there are two classes to predict.

# So overall, the model takes a single input value and applies two hidden layers with ReLU activation, followed by a final output layer with a softmax activation to predict one of two classes.




# Rob Field
# What is a dense layer? What other types of layers exist and could be used in sequential NN?

# A dense layer is a type of layer in a neural network where each neuron is fully connected to all neurons in the previous layer. In a dense layer, each neuron computes a weighted sum of its inputs, adds a bias term, and applies an activation function to the result.
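
The computation a dense layer performs (weighted sum, plus bias, then activation) can be sketched in a few lines of NumPy. This is an illustration of the idea, not Keras's actual implementation; `relu`, `dense_forward` and the random weights are hypothetical stand-ins.

```python
import numpy as np

def relu(z):
    # 0 for negative inputs, the input itself otherwise
    return np.maximum(z, 0.0)

def dense_forward(x, weights, bias, activation):
    # x: (batch, n_inputs), weights: (n_inputs, n_units), bias: (n_units,)
    return activation(x @ weights + bias)

rng = np.random.default_rng(seed=0)   # arbitrary seed
x = np.array([[0.3], [0.9]])          # 2 samples, 1 feature each
weights = rng.normal(size=(1, 16))    # random stand-in for learned weights
bias = np.zeros(16)

out = dense_forward(x, weights, bias, relu)
print(out.shape)  # (2, 16)
```

Each of the 2 samples produces 16 values, one per neuron, which corresponds to the `(None, 16)` output shape of the first layer.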

# Other types of layers that can be used in a sequential neural network include:

# Convolutional layers: these layers are used for image processing and feature extraction. They work by applying a convolution operation on the input, which is a sliding window that extracts features from the input.

# Pooling layers: these layers are used in conjunction with convolutional layers to reduce the dimensionality of the feature maps and provide translation invariance.

# Recurrent layers: these layers are used for processing sequential data, such as text or time series. They work by maintaining an internal state that is updated at each time step and can retain information from previous time steps.

# Dropout layers: these layers are used to prevent overfitting by randomly dropping out a fraction of the neurons during training.

# Batch normalization layers: these layers are used to normalize the inputs to each layer and speed up training by reducing the internal covariate shift.

# Embedding layers: these layers are used for natural language processing tasks, such as text classification or sentiment analysis. They work by mapping each word in the input text to a dense vector representation.





# An activation function is a mathematical function applied to the output of a neural network layer. It introduces non-linearity into the model and allows it to learn more complex patterns.

# ReLU (Rectified Linear Unit) is a commonly used activation function in deep learning that returns 0 for negative inputs and the input value for non-negative inputs.

# Softmax is an activation function used in the final layer of a neural network to normalize the output into a probability distribution over multiple classes. It forces the sum of the output values to be equal to 1 and makes it easier to interpret the output as probabilities.




# Rob Field
# When you say equal to 1, do you mean between 0 and 1?
# Both are true. The softmax function is used to convert the outputs of the neural network into a probability distribution, where each output is a number between 0 and 1, and the sum of all outputs is equal to 1.




# Rob Field
# sum of all outputs is equal to 1. - can you explain?
# The softmax activation function transforms the output of a layer into a probability distribution over the predicted classes. The function takes in a vector of arbitrary real-valued scores and squashes them down to a range between 0 and 1, so that they can be interpreted as probabilities.

# The output of a softmax function is a vector of the same length as the input, where each element represents the probability of the input belonging to a particular class. The sum of all the elements in the output vector is always equal to 1, ensuring that the outputs can be interpreted as probabilities that an input belongs to each class.
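
A small NumPy sketch makes the "between 0 and 1, and summing to 1" property concrete. The `softmax` helper and the example scores are illustrative; subtracting the maximum is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)  # for numerical stability only
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, -1.0])  # arbitrary raw outputs for two classes
probs = softmax(scores)

print(probs)        # each value lies between 0 and 1
print(probs.sum())  # the values add up to 1 (up to floating-point rounding)
```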



# The input_shape parameter in Keras defines the shape of the input data that will be passed to the neural network model. In the example you provided, the input_shape is (1,), which means that the input data will have one dimension, and the size of that dimension is 1.

# In other words, the input data in this case will be a one-dimensional array with one element, such as [3] or [-2]. This is a common input shape for models that take in scalar values as inputs, such as regression models.

# Other input shapes are possible depending on the data, such as (n,) where n is the number of input features. For example, if the input data is a 2D array with 100 rows (samples) and 10 columns (features), the input shape is (10,): input_shape describes a single sample and leaves out the number of samples. An input shape of (100, 10) would instead mean that every individual sample is itself a 100-by-10 array. The choice of input shape depends on the nature of the data and the type of problem being solved.
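
One way to read off a suitable `input_shape` is to look at the shape of the data array and drop the first (sample) dimension. The arrays below are made-up examples, and `as_column` and `table` are illustrative names.

```python
import numpy as np

ages = np.array([39, 64, 23, 58, 71])
print(ages.shape)            # (5,)   five samples, no feature axis yet

as_column = ages.reshape(-1, 1)
print(as_column.shape)       # (5, 1) five samples, one feature each
print(as_column.shape[1:])   # (1,)   shape of a single sample

table = np.zeros((100, 10))  # 100 samples with 10 features each
print(table.shape[1:])       # (10,)
```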

model.summary()


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 16)                32        
                                                                 
 dense_4 (Dense)             (None, 32)                544       
                                                                 
 dense_5 (Dense)             (None, 2)                 66        
                                                                 
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________
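
The `Param #` column in the summary can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. A short sketch, with `layer_sizes` as an illustrative name:

```python
# (number of inputs, number of units) for each Dense layer in the model above
layer_sizes = [(1, 16), (16, 32), (32, 2)]

total = 0
for n_inputs, n_units in layer_sizes:
    # one weight per input-unit pair, plus one bias per unit
    params = n_inputs * n_units + n_units
    total += params
    print(n_inputs, n_units, params)

print(total)  # 642
```

The per-layer results (32, 544 and 66) match the summary output.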




Looking at a non-sequential model (BEYOND THE SCOPE OF THIS COURSE SO FAR):

Here's an example code snippet that uses the functional API (instead of the Sequential class) and connects layers together by passing the output of one layer as the input to the next layer:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Define the input layer
input_layer = Input(shape=(784,))

# Define the first hidden layer and connect it to the input layer
hidden_layer_1 = Dense(units=256, activation='relu')(input_layer)

# Define the second hidden layer and connect it to the first hidden layer
hidden_layer_2 = Dense(units=128, activation='relu')(hidden_layer_1)

# Define the output layer and connect it to the second hidden layer
output_layer = Dense(units=10, activation='softmax')(hidden_layer_2)

# Define the model using the input and output layers
model = Model(inputs=input_layer, outputs=output_layer)

This still looks linear to me (from left to right) where are the loops?

You're correct, the code I provided still shows a linear flow from left to right. However, with the functional API, you can create more complex models that involve branches, multiple inputs or outputs, or skip connections.

For example, you can create a branch by connecting the output of one layer to multiple subsequent layers. These more complex models are often used for tasks such as image segmentation or language translation. A Python for loop, by contrast, is only a convenient way to stack several layers; it does not create a loop inside the model itself.

Here's an example of how you could use a for loop to stack layers with the functional API:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_layer = Input(shape=(1,))
hidden_layer = Dense(units=16, activation='relu')(input_layer)
for i in range(3):
    hidden_layer = Dense(units=16, activation='relu')(hidden_layer)
output_layer = Dense(units=2, activation='softmax')(hidden_layer)

model = Model(inputs=input_layer, outputs=output_layer)
In this example, we use a for loop to create three additional hidden layers with the same architecture. Each pass through the loop creates a new Dense layer and connects it to the output of the previous one, so the result is still a linear chain (four hidden layers in total) and not a loop in the model. The final output layer is connected to the last hidden layer.
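
For comparison, a genuinely non-linear topology is one where a layer's output feeds more than one downstream layer. A minimal sketch of such a branch with the functional API might look like the following; the layer sizes, the `tanh` activation and the names `branch_a`, `branch_b` and `branched_model` are arbitrary choices for illustration and are not taken from the course.

```python
from tensorflow.keras.layers import Concatenate, Dense, Input
from tensorflow.keras.models import Model

input_layer = Input(shape=(1,))

# Two branches that both read the same input
branch_a = Dense(units=16, activation='relu')(input_layer)
branch_b = Dense(units=8, activation='tanh')(input_layer)

# Merge the branches back into a single tensor of 16 + 8 = 24 values
merged = Concatenate()([branch_a, branch_b])
output_layer = Dense(units=2, activation='softmax')(merged)

branched_model = Model(inputs=input_layer, outputs=output_layer)
branched_model.summary()
```

This model cannot be written with `Sequential`, because the input is consumed by two layers at once. It is still feedforward: information flows from input to output without any feedback connection.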


# Train the NN 

In [23]:
# Call model.compile function - prepares the model for training
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
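
As a rough illustration of what the `sparse_categorical_crossentropy` loss measures: for each sample it takes the predicted probability of the true class (given as an integer label, which is why `train_labels` holds plain 0s and 1s) and averages the negative logarithm. The NumPy sketch below shows that calculation on made-up probabilities; it is not the Keras implementation, and `probs`, `labels` and `picked` are illustrative names.

```python
import numpy as np

# Made-up softmax outputs for three samples: [P(class 0), P(class 1)]
probs = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
])
labels = np.array([0, 1, 1])  # integer class labels, one per sample

# Probability the model assigned to the correct class of each sample
picked = probs[np.arange(len(labels)), labels]

# Average negative log-probability of the correct class
loss = -np.log(picked).mean()
print(picked)
print(loss)
```

The third sample, where the correct class only received probability 0.4, contributes most to the loss.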

In [24]:
model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=30, shuffle=True, verbose=2)

Epoch 1/30
210/210 - 1s - loss: 0.3666 - accuracy: 0.8929 - 988ms/epoch - 5ms/step
Epoch 2/30
210/210 - 0s - loss: 0.3496 - accuracy: 0.8971 - 328ms/epoch - 2ms/step
Epoch 3/30
210/210 - 0s - loss: 0.3351 - accuracy: 0.9005 - 317ms/epoch - 2ms/step
Epoch 4/30
210/210 - 0s - loss: 0.3233 - accuracy: 0.9105 - 329ms/epoch - 2ms/step
Epoch 5/30
210/210 - 0s - loss: 0.3132 - accuracy: 0.9076 - 334ms/epoch - 2ms/step
Epoch 6/30
210/210 - 0s - loss: 0.3049 - accuracy: 0.9152 - 326ms/epoch - 2ms/step
Epoch 7/30
210/210 - 0s - loss: 0.2980 - accuracy: 0.9148 - 333ms/epoch - 2ms/step
Epoch 8/30
210/210 - 0s - loss: 0.2921 - accuracy: 0.9252 - 321ms/epoch - 2ms/step
Epoch 9/30
210/210 - 0s - loss: 0.2874 - accuracy: 0.9214 - 321ms/epoch - 2ms/step
Epoch 10/30
210/210 - 0s - loss: 0.2831 - accuracy: 0.9267 - 320ms/epoch - 2ms/step
Epoch 11/30
210/210 - 0s - loss: 0.2796 - accuracy: 0.9290 - 327ms/epoch - 2ms/step
Epoch 12/30
210/210 - 0s - loss: 0.2766 - accuracy: 0.9262 - 332ms/epoch - 2ms/step
E

<keras.callbacks.History at 0x7f46341ff7c0>
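
The `210/210` at the start of each line in the log is the number of batches (steps) per epoch, which follows from the dataset size and the batch size. A quick check, with illustrative variable names:

```python
import math

n_samples = 2100   # participants in the generated dataset
batch_size = 10    # value passed to model.fit above

# The last batch may be smaller than the rest, hence the rounding up
steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # 210
```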

What is being cached to cause the first epoch to take 6ms per step and the rest 2ms (those were the timings the first time round; they have changed since writing this)? Presumably something is being cached; do you know what? In other words, what does training need to do only once, on the first epoch?

The time taken to complete an epoch in a neural network model can depend on several factors such as the size of the dataset, the complexity of the model, and the computing resources available.

In this case, the most likely cause is one-time setup work. On the first call to fit(), Keras builds its training function and TensorFlow traces it into a graph, and the optimizer creates its internal variables; the subsequent epochs reuse all of this, which results in faster processing times. Data being loaded into memory or a cache for the first time may also contribute.

Note that compiling the model happens earlier, in model.compile(), and is not repeated inside fit().

It's worth noting that the differences in execution time between epochs in this example are relatively small, and it's unlikely to have a significant impact on the overall training time.




Rob Field
I can see that the accuracy improves on each epoch. This makes sense, but does the accuracy always increase per epoch? or is it because this is a simple dataset and model?

The accuracy of a model on a given dataset can increase, decrease or remain unchanged from one epoch to the next. The direction and rate of change in accuracy can depend on various factors, such as the complexity of the model, the size and quality of the training dataset, the choice of optimizer and learning rate, and the stopping criterion used to terminate the training process.

In general, when a model is underfitting the data, the accuracy is likely to increase with each epoch as the model adjusts its parameters to better fit the training data. However, when the model starts to overfit the data, the accuracy on the training dataset may continue to improve, but the accuracy on the validation dataset may start to decrease, indicating that the model is no longer generalizing well to new data.

In the training output above, the accuracy trends upward but does not increase on every epoch: it dips slightly at epochs 5, 7, 9 and 12 (for example, from 0.9105 to 0.9076 between epochs 4 and 5). The overall trend suggests that the model is improving its ability to fit the training data, which is helped by the simplicity of the dataset and the model. In more complex scenarios, the accuracy may fluctuate more or plateau at some point, indicating that the model has stopped improving.
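
A quick way to check whether accuracy really rose on every epoch is to compare each value with the one before it. The list below copies the accuracies from the first 12 epochs of the log above; the variable names are illustrative.

```python
accuracies = [0.8929, 0.8971, 0.9005, 0.9105, 0.9076, 0.9152,
              0.9148, 0.9252, 0.9214, 0.9267, 0.9290, 0.9262]

# Epochs (1-based) whose accuracy is lower than the previous epoch's
dips = [
    epoch
    for epoch, (prev, curr) in enumerate(zip(accuracies, accuracies[1:]), start=2)
    if curr < prev
]
print(dips)  # [5, 7, 9, 12]
```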

In [22]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 16)                32        
                                                                 
 dense_4 (Dense)             (None, 32)                544       
                                                                 
 dense_5 (Dense)             (None, 2)                 66        
                                                                 
Total params: 642
Trainable params: 642
Non-trainable params: 0
_________________________________________________________________


# Build a Validation Set With TensorFlow's Keras API

In machine learning, a validation set is a portion of the dataset used to evaluate the performance of a model during training. In the context of sequential neural networks using Keras, the validation set is typically used to evaluate the model's performance on data that it has not seen before.

During the training process, the model is typically trained on a portion of the dataset called the training set. The model's performance on the training set is used to adjust its weights and improve its accuracy. However, to ensure that the model is not overfitting (i.e., performing well on the training set but poorly on new, unseen data), it is important to evaluate the model's performance on a separate dataset that it has not been trained on. This is where the validation set comes in.

In Keras, you can specify a validation set by passing the validation_data argument to the fit() function. The validation_data argument takes a tuple of inputs and targets (e.g., (x_val, y_val)), which represent the validation data. During training, Keras will evaluate the model's performance on the validation data at the end of each epoch, and this information can be used to monitor the model's progress and adjust its hyperparameters (e.g., learning rate, batch size, etc.) to improve its accuracy.
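
As a sketch of how a validation set could be carved out by hand, the example below holds out the last 10% of the (already shuffled) data and packs it into the tuple that `validation_data` expects. The random arrays are placeholders standing in for `scaled_train_samples` and `train_labels`, and the 10% fraction and variable names are illustrative choices. The `fit` signature shown earlier also lists a `validation_split` argument, which asks Keras to hold out a fraction of the training data itself.

```python
import numpy as np

rng = np.random.default_rng(seed=0)      # arbitrary seed
x = rng.random((2100, 1))                # placeholder for scaled_train_samples
y = rng.integers(0, 2, size=2100)        # placeholder for train_labels

val_fraction = 0.1                       # illustrative choice
n_val = int(len(x) * val_fraction)

# Keep the last n_val samples aside; train on the rest
x_train, y_train = x[:-n_val], y[:-n_val]
x_val, y_val = x[-n_val:], y[-n_val:]
print(x_train.shape, x_val.shape)        # (1890, 1) (210, 1)

# The held-out pair would then be passed to fit, for example:
# model.fit(x=x_train, y=y_train, validation_data=(x_val, y_val),
#           batch_size=10, epochs=30, verbose=2)
```

Holding out the tail of the arrays only gives a representative validation set if the data was shuffled beforehand, as it was earlier in this notebook.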