# Functional API in Keras

This notebook uses the Keras functional API to build a simple neural network (NN) model:
- Create an input layer
- Add hidden layers
- Define an output layer
- Create a Dropout layer
- Create a Batch Normalization layer
- Compile, train, and evaluate the created model.

To ensure a clean testing environment, a conda virtual environment was created.

_**Note**: TensorFlow 2.16, the version used here, supports Python 3.9 to 3.12. If the environment runs Python 3.13, there will be problems with installation._

In [2]:
# Install TensorFlow and NumPy 
# %pip install numpy  # 2.2.5
# %pip install tensorflow==2.16.2  # 2.16.2, installed in conda prompt

Import the required libraries

In [1]:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='tensorflow')

## 1 Define the Input Layer

As a simple example, assume we are working with a dataset in which each input is a **vector of length 20**, so the model expects input data with 20 features.

Printing the defined layer shows that it is a Keras tensor (`KerasTensor`) whose elements have dtype `float32`. The leading `None` in its shape is the batch dimension, which is left unspecified.

In [2]:
input_layer = Input(shape=(20,))
print("Input Layer:", input_layer)

Input Layer: <KerasTensor shape=(None, 20), dtype=float32, sparse=False, ragged=False, name=keras_tensor>


## 2 Add Hidden Layers

Here we add two fully connected layers, each with 64 nodes and a ReLU activation function. Each hidden layer takes the output of the previous layer as its input.

In simple words, `Dense(n, activation)(prev_layer)` defines the number of nodes in the _current layer_ in the first pair of brackets and refers to the _previous layer_ in the second pair.

In [3]:
hidden_layer1 = Dense(64, activation='relu')(input_layer)
hidden_layer2 = Dense(64, activation='relu')(hidden_layer1)
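
The two-bracket pattern can be illustrated without Keras. The sketch below uses a hypothetical `TinyDense` class (not part of Keras, and working on real NumPy arrays rather than symbolic Keras tensors) to show that the first pair of brackets configures a layer object and the second pair calls that object on the previous layer's output. The weight initialization and the `seed` argument are assumptions made only for illustration.

```python
import numpy as np


class TinyDense:
    """Hypothetical stand-in for a dense layer; not the Keras implementation."""

    def __init__(self, units, activation=None, seed=0):
        # First pair of brackets: configure the layer.
        self.units = units
        self.activation = activation
        self.seed = seed
        self.kernel = None
        self.bias = None

    def __call__(self, inputs):
        # Second pair of brackets: apply the layer to the previous output.
        if self.kernel is None:
            # Weights are created on the first call, once the input width is known.
            rng = np.random.default_rng(self.seed)
            self.kernel = rng.normal(scale=0.1, size=(inputs.shape[-1], self.units))
            self.bias = np.zeros(self.units)
        outputs = inputs @ self.kernel + self.bias
        if self.activation == "relu":
            outputs = np.maximum(outputs, 0.0)
        return outputs


x = np.ones((5, 20))                               # 5 samples, 20 features
h1 = TinyDense(64, activation="relu")(x)           # configure, then call on x
h2 = TinyDense(64, activation="relu", seed=1)(h1)  # call on the previous output
print(h1.shape, h2.shape)
```

Each call returns an array with one column per node, which is why the next layer can be chained onto it.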

## 3 Define the Output Layer

Let's say we are handling a **binary classification** problem. The output should have 1 node with a Sigmoid activation function.

In [4]:
output_layer = Dense(1, activation='sigmoid')(hidden_layer2)
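
To see why a single sigmoid node suits binary classification, the minimal sketch below computes the sigmoid by hand and turns the resulting probability into a class label. The helper names and the 0.5 decision threshold are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(z, threshold=0.5):
    """Turn a raw score into a 0/1 label; the threshold is an assumption."""
    return 1 if sigmoid(z) > threshold else 0


for score in (-2.0, 0.0, 2.0):
    print(score, round(sigmoid(score), 4), predict_label(score))
```

Negative scores map to probabilities below 0.5 and positive scores to probabilities above 0.5, so one output value is enough to separate two classes.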

## 4 Create the Model

Specify the input and output layers in the model. Use the `Model` class to create the model object.

In [6]:
model = Model(inputs=input_layer, outputs=output_layer)
model.summary()   # Print the model summary

The model is reported as "functional". In the summary, we can see:
- the layers and their output shapes
- the number of parameters in each layer
- the total number of parameters in the model and their size in memory
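
The parameter counts in the summary can be checked by hand, assuming each `Dense` layer holds one weight per input-node pair plus one bias per node (biases are used by default). The helper `dense_param_count` below is a hypothetical name, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


layer_sizes = [20, 64, 64, 1]   # input width, two hidden layers, output layer
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts, total)                   # [1344, 4160, 65] 5569
print(f"{total * 4 / 1024:.2f} KB")    # float32 uses 4 bytes per parameter
```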

## 5 Compile the Model

Compile the model before training it by calling `Model.compile` with the `loss`, `optimizer`, and `metrics` keyword arguments.

We need to specify:
- Loss function
- Optimizer
- Evaluation Metrics

In [8]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

"""
AI Generated Code Explanation:

Loss: binary_crossentropy
    Binary crossentropy is used for binary classification problems.
    It measures the performance of a model whose output is a probability value between 0 and 1.
    It compares the predicted probability with the actual class label (0 or 1) and calculates the loss.
Optimizer: adam
    Adam is an adaptive learning rate optimization algorithm.
    It combines the advantages of two other extensions of stochastic gradient descent (SGD), AdaGrad and RMSProp.
    It is computationally efficient and requires little memory.
Metrics: accuracy
    Accuracy is the ratio of correctly predicted instances to the total instances.
    It is a common metric for classification problems.
"""

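
As a rough illustration of the loss and metric described above, the sketch below computes binary cross-entropy and accuracy in plain Python. The function names, the clipping constant `eps`, and the 0.5 threshold are assumptions for this example, not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps keeps probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == (y == 1))
    return hits / len(y_true)


y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.3]
print(binary_crossentropy(y_true, y_prob))
print(accuracy(y_true, y_prob))
print(binary_crossentropy([0, 1], [0.5, 0.5]))   # ln 2, about 0.693
```

A loss near ln 2 (about 0.693) corresponds to a model that predicts 0.5 for every sample, which is the level to expect when the labels cannot be predicted from the features.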

## 6 Train the Model

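This example uses randomly generated data for training. In real-world problems, take care over how the data is split into training, validation, and test sets. Because the labels here are random and unrelated to the features, accuracy on the test set is expected to stay close to 50%.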

In [9]:
import numpy as np
X_train = np.random.rand(1000, 20)  # 1000 samples, 20 features
y_train = np.random.randint(0, 2, size=(1000, 1))  # Binary labels (0 or 1)

X_test = np.random.rand(200, 20)  # 200 samples, 20 features
y_test = np.random.randint(0, 2, size=(200, 1))  # Binary labels (0 or 1)
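
For a real dataset, one simple way to split the data is to shuffle the row indices once and cut them into disjoint parts. The sketch below is a minimal example under assumed settings: the 70/15/15 proportions, the seed, and the variable names (`X_all`, `X_tr`, `X_val`, `X_te`) are all hypothetical and chosen so that they do not overwrite the arrays defined above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)       # seed chosen only for repeatability
X_all = rng.random((1000, 20))
y_all = rng.integers(0, 2, size=(1000, 1))

# Shuffle the row indices once, then cut them into three disjoint parts.
indices = rng.permutation(len(X_all))
n_tr, n_val = 700, 150                    # hypothetical 70/15/15 split
tr_idx = indices[:n_tr]
val_idx = indices[n_tr:n_tr + n_val]
te_idx = indices[n_tr + n_val:]

X_tr, y_tr = X_all[tr_idx], y_all[tr_idx]
X_val, y_val = X_all[val_idx], y_all[val_idx]
X_te, y_te = X_all[te_idx], y_all[te_idx]
print(X_tr.shape, X_val.shape, X_te.shape)
```

A validation set built this way can be passed to `Model.fit` through its `validation_data` argument, keeping the test set untouched until the final evaluation.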

Use the `Model.fit` method with the training data and the `epochs` and `batch_size` keyword arguments to train the model.

In [10]:
model.fit(X_train, y_train, epochs=10, batch_size=32)  # Train the model

Epoch 1/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.4897 - loss: 0.6965
Epoch 2/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5495 - loss: 0.6880
Epoch 3/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5822 - loss: 0.6830
Epoch 4/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5572 - loss: 0.6791
Epoch 5/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5860 - loss: 0.6797
Epoch 6/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5340 - loss: 0.6803
Epoch 7/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6038 - loss: 0.6730
Epoch 8/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6381 - loss: 0.6638
Epoch 9/10
32/32 ━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x1d8b8f054f0>
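
The `32/32` in the log above counts batches, not samples. A quick check, assuming the last partial batch still counts as one step and that `model.evaluate` uses a batch size of 32 when none is given (`steps_per_epoch` here is a hypothetical helper name):

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(1000, 32))   # training set of 1000 samples
print(steps_per_epoch(200, 32))    # test set of 200 samples
```

The first value matches the 32 steps per epoch in the training log, and the second matches the `7/7` shown when the model is evaluated on the 200 test samples.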

## 7 Evaluate the Model

In [11]:
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.5452 - loss: 0.7033
Test Loss: 0.7039012312889099
Test Accuracy: 0.5099999904632568


---

## Dropout (Hyperparameter)

Dropout is a **regularization technique** that helps **prevent overfitting** and encourages the model to learn more **robust features that generalize better** to unseen data.

How it works:
- During (and _only_ during) **training**, a random **fraction** of the input units is set to zero at each update step
  - The fraction (the dropout rate) is a **hyperparameter**
  - This prevents the model from becoming overly reliant on any specific neurons
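
The mechanism can be sketched in a few lines of NumPy. This is an illustrative "inverted dropout" function, not the Keras source code: the function name, the seed, and the explicit `training` flag are assumptions. The surviving units are scaled up by `1 / (1 - rate)` so that the expected sum of the activations stays the same, and nothing is dropped at inference time.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout sketch; not the Keras implementation."""
    if not training or rate == 0.0:
        return x                               # inference: pass through unchanged
    keep_mask = rng.random(x.shape) >= rate    # True for units that survive
    return x * keep_mask / (1.0 - rate)        # rescale the surviving units


rng = np.random.default_rng(seed=0)            # seed is arbitrary
activations = np.ones((4, 8))
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)
print(train_out)   # every entry is either 0.0 or 2.0
print(infer_out)   # all ones, unchanged
```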



In [13]:
from tensorflow.keras.layers import Dropout 

# An example:

input_layer = Input(shape=(20,))

hidden_layer1 = Dense(64, activation='relu')(input_layer)

# Add a Dropout layer with a dropout rate of 0.5
dropout_layer = Dropout(rate=0.5)(hidden_layer1)

hidden_layer2 = Dense(64, activation='relu')(dropout_layer)
output_layer = Dense(1, activation='sigmoid')(hidden_layer2)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

# Compile the model to start training

## Batch Normalization

Batch Normalization is a technique used to improve the **training stability and speed** of neural networks. 

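It is active during **both training and inference**, but behaves differently: during training it normalizes with the mean and variance of the current batch, while during inference it uses moving averages of those statistics collected during training.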

How it works:
- Stability: 
  - It normalizes the output of a previous layer by **re-centering and re-scaling** the data, which helps in stabilizing the learning process. 
  - Normalization: the values passed on to the next layer have mean close to 0 and variance close to 1
- Speed: 
  - By **reducing the internal covariate shift** (the changes in the distribution of layer inputs), batch normalization allows the model to use **higher learning rates**, which often speeds up convergence.

Batch normalization layers also introduce **2 learnable parameters** per feature, a scale (`gamma`) and a shift (`beta`), which are applied to the normalized output and help restore the model's representational power.
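
The training and inference behavior can be sketched with NumPy. This is a simplified illustration, not the Keras source: the function names are hypothetical, and the `momentum=0.99` and `eps=1e-3` values are assumed to mirror the defaults of the Keras `BatchNormalization` layer.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, moving_mean, moving_var,
                     momentum=0.99, eps=1e-3):
    """Training mode: normalize with the current batch's statistics."""
    mean, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Update the moving averages that inference will use later.
    new_mean = momentum * moving_mean + (1.0 - momentum) * mean
    new_var = momentum * moving_var + (1.0 - momentum) * var
    return gamma * x_hat + beta, new_mean, new_var


def batch_norm_infer(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Inference mode: normalize with the stored moving averages."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta


rng = np.random.default_rng(seed=0)                 # seed is arbitrary
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))    # 32 samples, 4 features
gamma, beta = np.ones(4), np.zeros(4)               # learnable scale and shift
moving_mean, moving_var = np.zeros(4), np.ones(4)   # running statistics

out, moving_mean, moving_var = batch_norm_train(
    x, gamma, beta, moving_mean, moving_var)
print(out.mean(axis=0).round(3))   # close to 0 for every feature
print(out.var(axis=0).round(3))    # close to 1 for every feature

single = batch_norm_infer(x[:1], gamma, beta, moving_mean, moving_var)
print(single.shape)                # works on a single sample, no batch statistics
```

Because inference relies on the stored averages rather than on the batch itself, a prediction for one sample does not depend on which other samples happen to be in the same batch.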


In [14]:
from tensorflow.keras.layers import BatchNormalization 

# An example:

input_layer = Input(shape=(20,))

hidden_layer1 = Dense(64, activation='relu')(input_layer)

# Add a BatchNormalization layer
batch_norm_layer = BatchNormalization()(hidden_layer1)

hidden_layer2 = Dense(64, activation='relu')(batch_norm_layer)
output_layer = Dense(1, activation='sigmoid')(hidden_layer2)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

# Compile the model to start training

---

In [20]:
# Exercise 1

inputs = Input(shape=(24,))
hid_layer1 = Dense(64, activation='relu')(inputs)
dropout_layer1 = Dropout(rate = 0.5)(hid_layer1)   # Dropout
hid_layer2 = Dense(64, activation='relu')(dropout_layer1)
dropout_layer2 = Dropout(rate = 0.5)(hid_layer2)   # Dropout
hid_layer3 = Dense(16, activation='relu')(dropout_layer2)
output = Dense(1, activation='sigmoid')(hid_layer3) 
model = Model(inputs=inputs, outputs=output)
model.summary()

model.compile(loss='binary_crossentropy', optimizer = 'adam', metrics=['accuracy', 'mse'])

X_train = np.random.rand(1000, 24)      
y_train = np.random.randint(0, 2, size=(1000, 1))
X_test = np.random.rand(200, 24) 
y_test = np.random.randint(0, 2, size=(200, 1))

model.fit(X_train, y_train, epochs=10, batch_size=32)

loss, accuracy, mse = model.evaluate(X_test, y_test)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")
print(f"MSE: {mse}")

Epoch 1/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.4963 - loss: 0.6975 - mse: 0.2522
Epoch 2/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5173 - loss: 0.6927 - mse: 0.2498
Epoch 3/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.4903 - loss: 0.6972 - mse: 0.2520
Epoch 4/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5217 - loss: 0.6922 - mse: 0.2495
Epoch 5/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5036 - loss: 0.6976 - mse: 0.2522
Epoch 6/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5046 - loss: 0.6944 - mse: 0.2506
Epoch 7/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5186 - loss: 0.6919 - mse: 0.2493
Epoch 8/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s

In [19]:
# Exercise 2

inputs = Input(shape=(24,))
hid_layer1 = Dense(64, activation='tanh')(inputs)
hid_layer2 = Dense(64, activation='tanh')(hid_layer1)
hid_layer3 = Dense(16, activation='tanh')(hid_layer2)
output = Dense(1, activation='sigmoid')(hid_layer3) 
model = Model(inputs=inputs, outputs=output)
model.summary()

model.compile(loss='binary_crossentropy', optimizer = 'adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=32)

loss, accuracy = model.evaluate(X_test, y_test)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")


Epoch 1/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5056 - loss: 0.7148
Epoch 2/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5153 - loss: 0.6938
Epoch 3/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5437 - loss: 0.6933
Epoch 4/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5363 - loss: 0.6899
Epoch 5/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5262 - loss: 0.6889
Epoch 6/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5397 - loss: 0.6856
Epoch 7/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5604 - loss: 0.6823
Epoch 8/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5615 - loss: 0.6838
Epoch 9/10
32/32 ━━━━━━━━━━━━━━━━━━━

In [21]:
# Exercise 3

inputs = Input(shape=(24,))
hid_layer1 = Dense(64, activation='relu')(inputs)
batch_norm_layer1 = BatchNormalization()(hid_layer1)  # Batch Normalization
hid_layer2 = Dense(64, activation='relu')(batch_norm_layer1)
batch_norm_layer2 = BatchNormalization()(hid_layer2)
hid_layer3 = Dense(16, activation='relu')(batch_norm_layer2)
output = Dense(1, activation='sigmoid')(hid_layer3) 
model = Model(inputs=inputs, outputs=output)
model.summary()

model.compile(loss='binary_crossentropy', optimizer = 'adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=32)

loss, accuracy = model.evaluate(X_test, y_test)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")

Epoch 1/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.4850 - loss: 0.7860
Epoch 2/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5825 - loss: 0.6786
Epoch 3/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6536 - loss: 0.6310
Epoch 4/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6864 - loss: 0.5975
Epoch 5/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7270 - loss: 0.5761
Epoch 6/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6961 - loss: 0.5710
Epoch 7/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7439 - loss: 0.5468
Epoch 8/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7558 - loss: 0.5278
Epoch 9/10
32/32 ━━━━━━━━━━━━━━━━━━━━