# Feed Forward Networks Using Keras

Content in this notebook is taken from keras.io

In [5]:
!pip install keras
!pip install tensorflow

Collecting tensorflow
  Downloading tensorflow-2.17.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.2 kB)
Collecting astunparse>=1.6.0 (from tensorflow)
  Downloading astunparse-1.6.3-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting flatbuffers>=24.3.25 (from tensorflow)
  Downloading flatbuffers-24.3.25-py2.py3-none-any.whl.metadata (850 bytes)
Collecting gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 (from tensorflow)
  Downloading gast-0.6.0-py3-none-any.whl.metadata (1.3 kB)
Collecting google-pasta>=0.1.1 (from tensorflow)
  Downloading google_pasta-0.2.0-py3-none-any.whl.metadata (814 bytes)
Collecting libclang>=13.0.0 (from tensorflow)
  Downloading libclang-18.1.1-py2.py3-none-manylinux2010_x86_64.whl.metadata (5.2 kB)
Collecting ml-dtypes<0.5.0,>=0.3.1 (from tensorflow)
  Downloading ml_dtypes-0.4.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting opt-einsum>=2.3.2 (from tensorflow)
  Downloading opt_einsum-3.3.0-py3-none-any

In [6]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
import keras
import tensorflow as tf
import matplotlib.pyplot as plt
import pandas as pd

2024-09-16 19:00:47.967272: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-09-16 19:00:48.079452: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-09-16 19:00:48.163415: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-16 19:00:48.261777: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-16 19:00:48.294533: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-16 19:00:48.463292: I tensorflow/core/platform/cpu_feature_gu

### Building a Network for an XOR Gate

We will implement a two-input XOR gate using Keras. We chose the XOR problem to explain a basic neural network because it is a classic ANN research problem: its outputs are not linearly separable, so a network needs at least one hidden layer to learn it. We will try to predict the output of an XOR gate given two binary inputs.

A two-input XOR gate returns a `True` value when its inputs differ and a `False` value when they are equal.

| Input 1 | Input 2 | Output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

`Input 1` and `Input 2` will be our training data for the model and `Output` will be the target.
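
As a quick check of the truth table above, the following minimal sketch recomputes the `Output` column with NumPy. The variable names `inputs` and `outputs` are illustrative only and are not used elsewhere in the notebook.

```python
import numpy as np

# the four possible input pairs of a two-input gate
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# XOR is true exactly when the two inputs differ
outputs = np.logical_xor(inputs[:, 0], inputs[:, 1]).astype(int)

for (a, b), out in zip(inputs, outputs):
    print(a, b, "->", out)
```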

In [7]:
# training data: numpy array holding the two inputs of an XOR gate
training_data = np.array([[0,0],[0,1],[1,0],[1,1]], 'float32')

# test data: the same four input pairs in a different order
test_data = np.array([[0,1],[1,1],[1,0],[0,0]], 'float32')

# target: numpy array holding the XOR output for each row of training_data
target_data = np.array([[0],[1],[1],[0]], 'float32')

**Sequential Model**

The Keras `Sequential` class groups a linear stack of layers into a `tf.keras.Model` and provides training and inference features on that model. (Definition taken from keras.io)

`Sequential` has two methods for editing the layer stack, `add` and `pop`. The `add()` method adds a layer on top of the stack and the `pop()` method removes the last layer from the stack. Documentation for the `Sequential` class can be accessed at the <a href='https://keras.io/api/models/sequential/'>`Sequential`</a> source page.

**Dense Layer**

The `Dense` layer is a fully connected neural network layer and one of the most commonly used layers in Keras models. Below is its signature, showing every parameter that can be tuned together with its default value.

```
tf.keras.layers.Dense(
    units,
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)
```

Parameter details can be accessed on the <a href='https://keras.io/api/layers/core_layers/dense/'>`Dense`</a> layer source page.


**ReLU Activation Function**

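The ReLU function returns its input when the input is positive and zero otherwise: `relu(x) = max(0, x)`. It is a mix of the identity and the threshold function, and its name stands for Rectified Linear Unit (it is also called the rectifier). It is often described as the activation function closest to its biological counterpart.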

<img src='https://ist691.s3.us-east-2.amazonaws.com/images/relu.jpg' width='150'/>
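
A minimal NumPy sketch of the rectifier, assuming the standard definition `max(0, x)`. The helper name `relu` and the sample values are illustrative; Keras applies its own implementation when `activation = 'relu'` is passed to a layer.

```python
import numpy as np

def relu(x):
    """Identity for positive inputs, zero for everything else."""
    return np.maximum(0.0, x)

relu_out = relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0]))
print(relu_out)
```

Negative inputs are clipped to zero while positive inputs pass through unchanged.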

**Sigmoid Activation Function**

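The sigmoid activation function used here is the logistic sigmoid, `sigmoid(x) = 1 / (1 + exp(-x))`, which maps any real input into the range (0, 1). (The bipolar sigmoid is a different function: a logistic sigmoid rescaled and translated to have a range of (-1, 1).) Because its output lies between 0 and 1, it suits the single output neuron of a binary problem such as XOR.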

<img src='https://ist691.s3.us-east-2.amazonaws.com/images/sigmoid.jpg' width='300'/>
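
A minimal NumPy sketch of the logistic sigmoid, the function computed when `activation = 'sigmoid'` is passed to a Keras layer. The helper name `sigmoid` and the sample inputs are illustrative.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

sig_out = sigmoid(np.array([-4.0, 0.0, 4.0]))
print(sig_out)
```

An input of 0 maps to exactly 0.5, and the outputs for `-x` and `x` always sum to 1.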

In [8]:
model = Sequential()

# declare the input shape explicitly: two features per sample
model.add(keras.Input(shape = (2,)))

# hidden dense layer with 16 units and a ReLU activation function
model.add(Dense(16, activation = 'relu'))

# dense output layer with one neuron and a sigmoid activation function
model.add(Dense(1, activation = 'sigmoid'))



`model.summary()` gives us the summary of the model including the total number of parameters, different layers, output shape and parameters in each layer.

In [9]:
model.summary()
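
The parameter counts reported by `model.summary()` can be reproduced by hand: a `Dense` layer with bias has one weight per input-unit pair plus one bias per unit. The sketch below applies that rule to the XOR network (2 inputs, 16 hidden units, 1 output); the helper `dense_params` is hypothetical and not part of Keras.

```python
def dense_params(n_inputs, units, use_bias=True):
    """Trainable parameters of a fully connected layer."""
    return n_inputs * units + (units if use_bias else 0)

hidden_params = dense_params(2, 16)   # 2*16 weights + 16 biases
output_params = dense_params(16, 1)   # 16*1 weights + 1 bias
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```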

**Mean Square Error (MSE) Loss Function**

Computes the mean of squares of errors between labels and predictions.

<img src='https://ist691.s3.us-east-2.amazonaws.com/images/mse.png' width='200'/>
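
A minimal NumPy sketch of the MSE computation on the XOR targets. The prediction values are an assumption chosen for illustration: a network that outputs 0.5 for every input, roughly what an untrained sigmoid output produces.

```python
import numpy as np

y_true = np.array([[0.0], [1.0], [1.0], [0.0]])   # XOR targets
y_pred = np.full_like(y_true, 0.5)                # assumed "undecided" predictions

# mean of the squared differences between labels and predictions
mse = np.mean((y_true - y_pred) ** 2)
print(mse)
```

Every error is 0.5, so the result is 0.25, which is close to the loss values around 0.25 that appear in the first training epochs of the XOR model.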

**Adam Optimizer**

Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data.
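
To make the update rule concrete, here is a sketch of a single Adam step as described in the original Adam paper, applied to the toy function `f(w) = w**2`. The helper `adam_step`, the toy function, and the learning rate of 0.1 are assumptions for illustration; the Keras implementation may differ in details such as where epsilon is applied.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar or array parameter w at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
history_w = []
for t in range(1, 201):
    grad = 2 * w                                # gradient of f(w) = w**2
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
    history_w.append(w)

print(history_w[0], history_w[-1])
```

Unlike plain gradient descent, the step size is scaled by the running gradient statistics, so the first step moves `w` by roughly the learning rate regardless of the gradient's magnitude.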


**Binary Accuracy**

Calculates how often predictions match binary labels.

This metric creates two local variables, `total` and `count` that are used to compute the frequency with which `y_pred` matches `y_true`. This frequency is ultimately returned as binary accuracy: an <a href='https://en.wikipedia.org/wiki/Idempotence'>idempotent operation</a> that simply divides total by count.
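
The `total` / `count` bookkeeping can be sketched in NumPy as follows. The function `binary_accuracy` below is an illustrative stand-in rather than the Keras implementation, and the 0.5 threshold and sample predictions are assumptions.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that match the binary labels after thresholding."""
    y_hat = (y_pred > threshold).astype(y_true.dtype)
    total = np.sum(y_hat == y_true)   # number of matches
    count = y_true.size               # number of predictions
    return total / count

y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.4, 0.6])   # hypothetical model outputs
acc = binary_accuracy(y_true, y_pred)
print(acc)
```

Two of the four thresholded predictions match their labels, giving an accuracy of 0.5.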

Compile the model.

In [10]:
model.compile(loss = 'mean_squared_error',
              optimizer = 'adam',
              metrics = ['binary_accuracy'])

**`model.fit`**

Trains the model for a fixed number of epochs (iterations on a dataset).

Below are the parameter names and default values of the `model.fit()` function (the exact list varies between Keras versions).

```
model.fit(
    x=None,
    y=None,
    batch_size=None,
    epochs=1,
    verbose=1,
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_batch_size=None,
    validation_freq=1,
    max_queue_size=10,
    workers=1,
    use_multiprocessing=False,
)
```

For this section of the lab, we will use the `epochs` and `verbose` parameters of `model.fit`.

**`Epoch`**: One complete pass through the training data set, during which the weights are updated once per batch.

**`Verbose`**: Controls the progress output shown while training the model. 0 = silent, 1 = progress bar, and 2 = one line per epoch.

Descriptions of all other parameters can be found on the <a href='https://keras.io/api/models/model_training_apis/#fit-method'>`model.fit()`</a> source page.
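
The `1/1` and `4/4` counters in the training logs are the number of batches per epoch. When `batch_size` is left as `None`, Keras uses a batch size of 32, so the count can be estimated as below. The helper `steps_per_epoch` is hypothetical; 4 is the number of XOR training rows and 112 is the number of iris training rows left after the 75:25 split used later in this notebook.

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of weight updates in one epoch."""
    return math.ceil(n_samples / batch_size)

xor_steps = steps_per_epoch(4)      # all four XOR rows fit in one batch
iris_steps = steps_per_epoch(112)   # 112 training rows need four batches

print(xor_steps, iris_steps)
```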

In [11]:
history = model.fit(training_data, target_data, epochs = 10, verbose = 1)

Epoch 1/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 880ms/step - binary_accuracy: 0.5000 - loss: 0.2547
Epoch 2/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - binary_accuracy: 0.2500 - loss: 0.2544
Epoch 3/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - binary_accuracy: 0.2500 - loss: 0.2540
Epoch 4/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - binary_accuracy: 0.2500 - loss: 0.2536
Epoch 5/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step - binary_accuracy: 0.2500 - loss: 0.2533
Epoch 6/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - binary_accuracy: 0.2500 - loss: 0.2530
Epoch 7/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - binary_accuracy: 0.2500 - loss: 0.2526
Epoch 8/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - binary_accuracy: 0.2500 - loss: 0.2523
Epoch 9/10
1/1

#### `model.predict`

This function is used to generate output predictions for the input data.

> See the <a href='https://keras.io/api/models/model_training_apis/#predict-method'>`model.predict()`</a> source page.

In [12]:
print(model.predict(test_data).round())

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step
[[1.]
 [0.]
 [0.]
 [1.]]
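
To judge these predictions, they can be compared against the true XOR output for each row of `test_data`. The sketch below hard-codes the rounded predictions printed above (the array `predicted` is copied from that output, not recomputed) and checks them with NumPy.

```python
import numpy as np

test_data = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
predicted = np.array([1, 0, 0, 1])   # rounded model outputs from the cell above

# ground truth: XOR of the two input columns
expected = np.logical_xor(test_data[:, 0], test_data[:, 1]).astype(int)
correct = predicted == expected

print("expected:", expected)
print("correct: ", correct)
print("accuracy:", correct.mean())
```

Only the first two rows are classified correctly, so the network has not learned XOR after 10 epochs. Training for considerably more epochs would likely be needed before the predictions match the truth table.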


### Building a Network Using the `iris` Dataset

This is perhaps the best known dataset in the pattern recognition literature. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are *not* linearly separable from each other.

Predicted attribute: class of iris plant.

Attribute Information:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class: Iris-setosa; Iris-versicolor; Iris-virginica

In [13]:
# read in the dataset using the pd.read_csv function
iris = pd.read_csv('https://ist691.s3.us-east-2.amazonaws.com/iris.csv')

In [14]:
# df.head method returns the top 5 rows from the dataframe
iris.head()

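|  | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |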


In [15]:
iris.Species.value_counts()

Species
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: count, dtype: int64

In [16]:
# selecting sepal length, sepal width, petal length, petal width as features
X = iris.iloc[:,1:5].values

# selecting species as the target variable
y = iris.iloc[:,5].values

In [17]:
# convert categorical data into dummy variables
y = pd.get_dummies(y).values
y[:2]

array([[ True, False, False],
       [ True, False, False]])

Each row of the dummy array created in the last step is a one-hot vector: exactly one entry is `True` (equivalent to 1), and its position identifies the class that row belongs to.
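
The encoding can be mimicked in plain NumPy to show what happens and how to reverse it. This is a sketch, not what `pd.get_dummies` does internally; the four sample labels are chosen for illustration, and the classes are sorted alphabetically, which matches the column order seen above.

```python
import numpy as np

labels = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa'])

# sorted class names and, for each label, the index of its class
classes, class_idx = np.unique(labels, return_inverse=True)

# pick one row of the identity matrix per label: a one-hot vector
one_hot = np.eye(len(classes), dtype=bool)[class_idx]
print(one_hot)

# argmax recovers the class index, and from it the original label
recovered = classes[np.argmax(one_hot, axis=1)]
print(recovered)
```

The same `np.argmax(..., axis = 1)` trick is what converts one-hot targets and softmax outputs back to class indices later in the notebook.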

In [18]:
# splitting the data into train and test using the train_test_split function of sklearn
from sklearn.model_selection import train_test_split

# splitting the dataset in a 75:25 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size = 0.25,
                                                    random_state = 100)

In [19]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam

In [20]:
model = Sequential()
# declare the input shape explicitly: four features per flower
model.add(keras.Input(shape = (4,)))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(12, activation = 'sigmoid'))
model.add(Dense(8, activation = 'relu'))
model.add(Dense(3, activation = 'softmax'))


In [21]:
keras.utils.plot_model(model, 'my_first_model.png', show_shapes = True)

You must install pydot (`pip install pydot`) for `plot_model` to work.


In [22]:
model.summary()
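
As a cross-check on the summary, the trainable parameters of this stack can be tallied from the layer widths alone, assuming every `Dense` layer keeps its default bias. The list `layer_sizes` simply restates the architecture defined above (4 inputs, then 16, 12, 8, and 3 units).

```python
layer_sizes = [4, 16, 12, 8, 3]

# each Dense layer has (inputs * units) weights plus one bias per unit
per_layer = [n_in * n_out + n_out
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
iris_total = sum(per_layer)

print(per_layer, iris_total)
```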

In [23]:
model.compile(optimizer = Adam(learning_rate = 0.005),
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
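
Categorical cross-entropy compares each one-hot target with the predicted class probabilities and penalizes low probability on the true class. A minimal NumPy sketch follows; the function name, the clipping constant `eps`, and the sample arrays are assumptions for illustration and not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
uniform = np.full((2, 3), 1 / 3)             # a model with no preference
loss_uniform = categorical_crossentropy(y_true, uniform)

print(loss_uniform, np.log(3))
```

A model that assigns equal probability to all three classes scores `ln(3)`, about 1.0986, which is close to the loss of roughly 1.10 reported in the first training epoch.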

In [24]:
model.fit(X_train, y_train, epochs = 10)
y_pred = model.predict(X_test)

Epoch 1/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.3863 - loss: 1.1023
Epoch 2/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6409 - loss: 1.0520
Epoch 3/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4789 - loss: 1.0433
Epoch 4/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5565 - loss: 0.9996
Epoch 5/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5284 - loss: 0.9546
Epoch 6/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6534 - loss: 0.9099
Epoch 7/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6746 - loss: 0.8623
Epoch 8/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7235 - loss: 0.7874
Epoch 9/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1

In [25]:
y_test_class = np.argmax(y_test,axis = 1)
y_pred_class = np.argmax(y_pred,axis = 1)

In [26]:
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test_class, y_pred_class))
print(confusion_matrix(y_test_class, y_pred_class))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       0.42      1.00      0.59        10
           2       0.00      0.00      0.00        14

    accuracy                           0.63        38
   macro avg       0.47      0.67      0.53        38
weighted avg       0.48      0.63      0.52        38

[[14  0  0]
 [ 0 10  0]
 [ 0 14  0]]
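
The numbers in the report follow directly from the confusion matrix, whose rows are true classes and whose columns are predicted classes. The sketch below recomputes precision, recall, and accuracy with NumPy from the matrix printed above; treating an undefined precision (no predictions for a class) as 0 is an assumption that mirrors what the report shows.

```python
import numpy as np

cm = np.array([[14,  0, 0],
               [ 0, 10, 0],
               [ 0, 14, 0]])

true_pos = np.diag(cm)
predicted_totals = cm.sum(axis=0)   # column sums: how often each class was predicted
actual_totals = cm.sum(axis=1)      # row sums: how often each class actually occurs

# precision is undefined when a class is never predicted; report 0 in that case
precision = np.divide(true_pos, predicted_totals,
                      out=np.zeros(len(cm)), where=predicted_totals > 0)
recall = true_pos / actual_totals
accuracy = true_pos.sum() / cm.sum()

print(precision.round(2), recall.round(2), round(float(accuracy), 2))
```

Class 2 is never predicted: all 14 of its test samples are assigned to class 1, which drags class 1's precision down to 10/24 and leaves class 2 with zero precision and recall.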


