## Week 12 Extra Credit: Feed-Forward Neural Network

This notebook completes the **Extra Credit task for Week 12**: it implements and documents a Feed-Forward Neural Network (FFNN) on synthetic binary classification data.

In [None]:
# Install pyreadr so that the .rds files can be loaded
!pip install pyreadr --quiet

Now let's import the libraries.

In [None]:
import os
import pathlib
import sys
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
import pyreadr

## Loading Synthetic Datasets

The datasets were generated from a logistic model in R and stored as `.rds` files. Each one has numeric features and a binary target variable (`outcome`).

Three dataset sizes are evaluated:
- 1000 rows  
- 10,000 rows  
- 100,000 rows
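
The R script that produced these files is not part of this notebook, so the following is only a rough sketch of what "generated from a logistic model" can mean. The function name, the coefficients, the intercept, and the standard-normal features are all hypothetical. The real files contain features on very different scales.

```python
import numpy as np

def make_logistic_data(n_rows, coefs, intercept, seed=0):
    """Draw standard-normal features and Bernoulli outcomes from a logistic model."""
    rng = np.random.default_rng(seed)
    coefs = np.asarray(coefs, dtype=float)
    X = rng.normal(size=(n_rows, coefs.size))
    logits = intercept + X @ coefs
    probs = 1.0 / (1.0 + np.exp(-logits))   # P(outcome = 1 | x)
    y = rng.binomial(1, probs)              # one Bernoulli draw per row
    return X, y, probs

# Hypothetical coefficients for 8 features (not the values used in the R script)
demo_coefs = [0.5, 1.2, -0.3, 0.0, 0.4, 0.8, 0.6, 0.2]
X_demo, y_demo, p_demo = make_logistic_data(1000, demo_coefs, intercept=-0.5)
print(X_demo.shape, y_demo.mean())
```

Because the outcome is a noisy draw from `probs`, even a perfect model cannot reach 100% accuracy on data generated this way unless the probabilities are very close to 0 or 1.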


In [None]:
# Load the 1,000-row synthetic dataset
result_1000 = pyreadr.read_r('synthetic_data_1000.rds')
df_1000 = result_1000[None]
print("Loaded 1k dataset!")
df_1000.head()

Loaded 1k dataset!


Unnamed: 0,pregnant,glucose,pressure,triceps,insulin,mass,pedigree,age,outcome
0,0.0,144.0,88.0,13.0,82.0,34.5,0.687,50.0,1.0
1,4.0,127.0,76.0,14.0,54.0,35.4,1.321,24.0,0.0
2,3.0,98.0,64.0,26.0,166.0,32.4,0.403,25.0,0.0
3,1.0,179.0,62.0,23.0,32.0,37.6,2.137,31.0,1.0
4,0.0,129.0,76.0,31.0,115.0,20.4,0.871,21.0,0.0


In [None]:
# Load the 10,000-row synthetic dataset
result_10k = pyreadr.read_r('synthetic_data_10000.rds')
df_10k = result_10k[None]
print("10k dataset loaded")
df_10k.head()

10k dataset loaded


Unnamed: 0,pregnant,glucose,pressure,triceps,insulin,mass,pedigree,age,outcome
0,6.0,145.0,74.0,35.0,255.0,33.6,1.6,23.0,1.0
1,11.0,99.0,68.0,18.0,85.0,29.5,0.559,26.0,0.0
2,0.0,126.0,44.0,22.0,175.0,28.5,1.268,21.0,0.0
3,7.0,84.0,86.0,28.0,285.0,37.1,0.452,57.0,0.0
4,0.0,79.0,60.0,17.0,64.0,35.0,0.283,38.0,0.0


In [None]:
# Load the largest synthetic dataset (100,000 rows)
result_100k = pyreadr.read_r('synthetic_data_100000.rds')
df_100k = result_100k[None]
print("Successfully loaded 100k rows dataset")
df_100k.head()

Successfully loaded 100k rows dataset


Unnamed: 0,pregnant,glucose,pressure,triceps,insulin,mass,pedigree,age,outcome
0,5.0,123.0,56.0,28.0,64.0,45.4,0.361,35.0,1.0
1,0.0,174.0,78.0,30.0,105.0,35.9,0.342,21.0,1.0
2,1.0,121.0,74.0,18.0,277.0,46.8,0.245,27.0,0.0
3,1.0,129.0,82.0,13.0,159.0,27.7,0.748,21.0,0.0
4,4.0,101.0,70.0,25.0,71.0,31.6,0.808,23.0,0.0


## Preprocessing of the Data

For each dataset, I first extract the feature columns and the binary outcome.

I then standardize the features (zero mean and unit variance per column) so that they are on a comparable scale.

Finally, each dataset is split into:
- 80% Training Set  
- 20% Validation Set
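
As an illustration of what standard scaling computes, here is a minimal NumPy sketch of a column-wise z-score. The helper name `standardize` is hypothetical. To my understanding, `sklearn.preprocessing.scale` does the same calculation using the population standard deviation, but treat that equivalence as an assumption and not as a guarantee. The demo values are the first three `glucose` and `pressure` entries from the 1,000-row preview above.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: (x - mean) / std, using the population std (ddof=0)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0   # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std

# glucose and pressure for the first three rows of the 1,000-row preview
demo = np.array([[144.0, 88.0], [127.0, 76.0], [98.0, 64.0]])
scaled = standardize(demo)
print(scaled.mean(axis=0))   # each column mean is approximately 0
print(scaled.std(axis=0))    # each column std is approximately 1
```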


In [None]:
# Import the libraries needed for preprocessing
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

In [None]:
# Preprocess the 1,000-row dataset: separate the features from the target
features_1k = df_1000.drop(columns=['outcome']).values
target_1k = df_1000['outcome'].values

# Standardize each feature column to zero mean and unit variance
features_1k_scaled = preprocessing.scale(features_1k)

# 80/20 train/validation split for the 1,000-row dataset
X_train_1k, X_val_1k, y_train_1k, y_val_1k = train_test_split(
    features_1k_scaled, target_1k, test_size=0.2, random_state=42)

In [None]:
# Preprocess the 10,000-row dataset in the same way
features_10k = df_10k.drop(columns=['outcome']).values
target_10k = df_10k['outcome'].values

features_10k_scaled = preprocessing.scale(features_10k)

X_train_10k, X_val_10k, y_train_10k, y_val_10k = train_test_split(
    features_10k_scaled, target_10k, test_size=0.2, random_state=42)

In [None]:
# Preprocess the 100,000-row dataset in the same way
features_100k = df_100k.drop(columns=['outcome']).values
target_100k = df_100k['outcome'].values

# Standardize the features
features_100k_scaled = preprocessing.scale(features_100k)

# 80/20 train/validation split
X_train_100k, X_val_100k, y_train_100k, y_val_100k = train_test_split(
    features_100k_scaled, target_100k, test_size=0.2, random_state=42)
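
The cells above scale the full dataset before splitting, so the validation rows contribute to the mean and standard deviation. One possible alternative ordering is to split first and scale with training statistics only. The sketch below does this with NumPy alone. `split_then_scale` is a hypothetical helper, its shuffle differs from `train_test_split`, and it is not the code that produced the results reported in this notebook.

```python
import numpy as np

def split_then_scale(X, y, val_fraction=0.2, seed=42):
    """Shuffle, split into train/validation, then z-score with training statistics only."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(round(len(X) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]

    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)
    std[std == 0] = 1.0

    X_train = (X[train_idx] - mean) / std
    X_val = (X[val_idx] - mean) / std   # validation rows reuse the training mean/std
    return X_train, X_val, y[train_idx], y[val_idx]

# Random stand-in data with 8 features (not the real .rds contents)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=100.0, scale=15.0, size=(1000, 8))
y_demo = rng.integers(0, 2, size=1000)
X_tr, X_va, y_tr, y_va = split_then_scale(X_demo, y_demo)
print(X_tr.shape, X_va.shape)
```

With datasets of this size the difference from scaling before the split is likely to be small, but the ordering keeps the validation set fully unseen.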

## FFNN with 1 Hidden Layer – Model 1

**Architecture**:

- Input: 8 feature nodes  
- 1 Hidden Layer: 4 nodes, `ReLU` activation  
- Output: 1 node, `sigmoid` activation (for binary classification)

**Configuration**:

- Loss: `binary_crossentropy`  
- Optimizer: `adam`  
- Evaluation Metric: `accuracy`
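
To make the architecture and loss concrete, here is a minimal NumPy sketch of one forward pass through an 8-4-1 network and of the binary cross-entropy it would be scored with. The weights are random and untrained, and all names are illustrative. Keras handles this internally, so this is a sketch of the idea and not a copy of the Keras implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """8 inputs -> 4 ReLU units -> 1 sigmoid unit; returns P(outcome = 1) per row."""
    hidden = relu(X @ W1 + b1)         # shape (n, 4)
    return sigmoid(hidden @ W2 + b2)   # shape (n, 1)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], clipped for numerical safety."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(8, 4)), np.zeros(4)   # random, untrained weights
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)
X_demo = rng.normal(size=(5, 8))
y_demo = np.array([[1.0], [0.0], [0.0], [1.0], [0.0]])

probs = forward(X_demo, W1, b1, W2, b2)
print(probs.ravel(), binary_crossentropy(y_demo, probs))
```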

We train for 20 epochs and time each training session.
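
The training cells in this notebook time each run with a pair of `time.time()` calls. One optional way to reduce the repetition is a small context manager. The `stopwatch` helper below is hypothetical, and it uses `time.perf_counter()`, which is intended for measuring elapsed intervals.

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(results, label):
    """Store the elapsed wall-clock seconds of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with stopwatch(timings, "demo"):
    total = sum(i * i for i in range(100_000))   # stand-in for model.fit(...)
print(round(timings["demo"], 4), "s")
```

In the notebook, the body of the `with` block would be the `fit` call, and `timings` would collect one entry per dataset size.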

In [None]:
# Imports for building and timing the models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import time

In [None]:
# Here we build the model with 1 hidden layer
def build_nn_with_one_hidden(input_shape):
    model = Sequential()
    model.add(Dense(4, activation='relu', input_shape=input_shape))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

In [None]:
# Train Model 1 on the 1,000-row dataset
model_1k_h1 = build_nn_with_one_hidden((X_train_1k.shape[1],))

# start timer
start_time = time.time()
hist_1k_h1 = model_1k_h1.fit(X_train_1k, y_train_1k,
                             epochs=20, batch_size=32, verbose=0,
                             validation_data=(X_val_1k, y_val_1k))
end_time = time.time()

# get accuracy values
train_acc_1k_h1 = hist_1k_h1.history['accuracy'][-1]
val_acc_1k_h1 = hist_1k_h1.history['val_accuracy'][-1]
time_1k_h1 = end_time - start_time


In [None]:
# Train Model 1 on the 10,000-row dataset
model_10k_h1 = build_nn_with_one_hidden((X_train_10k.shape[1],))

# time it!
start = time.time()
history_10k_h1 = model_10k_h1.fit(X_train_10k, y_train_10k,
                                  epochs=20, batch_size=32, verbose=0,
                                  validation_data=(X_val_10k, y_val_10k))
end = time.time()

# get accuracy
train_accuracy_10k_h1 = history_10k_h1.history['accuracy'][-1]
val_accuracy_10k_h1 = history_10k_h1.history['val_accuracy'][-1]
time_taken_10k_h1 = end - start

In [None]:
# Run on the largest dataset (100k)
big_model_h1 = build_nn_with_one_hidden((X_train_100k.shape[1],))

In [None]:
# Train Model 1 on the 100,000-row dataset
print("Training on 100k rows - please wait...")
start_time_big = time.time()
hist_big_h1 = big_model_h1.fit(X_train_100k, y_train_100k,
                               epochs=20, batch_size=32, verbose=0,
                               validation_data=(X_val_100k, y_val_100k))
end_time_big = time.time()

# now we get performance metrics
train_acc_big_h1 = hist_big_h1.history['accuracy'][-1]
val_acc_big_h1 = hist_big_h1.history['val_accuracy'][-1]
time_big_h1 = end_time_big - start_time_big

Training on 100k rows - please wait...


In [None]:
# Print the results for Model 1
print("Results: 1 Hidden Layer (4 Nodes)\n")

print("1000 Rows -> Training Acc:", round(train_acc_1k_h1, 4),
      "| Validation Acc:", round(val_acc_1k_h1, 4),
      "| Time:", round(time_1k_h1, 2), "s")

print("10000 Rows -> Training Acc:", round(train_accuracy_10k_h1, 4),
      "| Validation Acc:", round(val_accuracy_10k_h1, 4),
      "| Time:", round(time_taken_10k_h1, 2), "s")

print("100000 Rows -> Training Acc:", round(train_acc_big_h1, 4),
      "| Validation Acc:", round(val_acc_big_h1, 4),
      "| Time:", round(time_big_h1, 2), "s")

Results: 1 Hidden Layer (4 Nodes)

1000 Rows -> Training Acc: 0.7387 | Validation Acc: 0.765 | Time: 11.64 s
10000 Rows -> Training Acc: 0.9962 | Validation Acc: 0.998 | Time: 19.48 s
100000 Rows -> Training Acc: 0.9988 | Validation Acc: 0.9988 | Time: 180.01 s
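
The accuracies above are taken from the last epoch only (`history[...][-1]`). As a sketch, the same Keras-style history dictionary can also report the best epoch, which helps show whether 20 epochs was enough. `summarize_history` is a hypothetical helper, and the numbers in `demo_history` are made up for illustration.

```python
def summarize_history(history):
    """Return last-epoch and best-epoch validation accuracy from a Keras-style history dict."""
    val = history["val_accuracy"]
    best_epoch = max(range(len(val)), key=val.__getitem__)   # first index of the maximum
    return {
        "last_train_acc": history["accuracy"][-1],
        "last_val_acc": val[-1],
        "best_val_acc": val[best_epoch],
        "best_epoch": best_epoch + 1,                         # 1-based epoch number
    }

# Made-up values for four epochs (not results from this notebook)
demo_history = {
    "accuracy":     [0.61, 0.70, 0.74, 0.75],
    "val_accuracy": [0.60, 0.72, 0.78, 0.76],
}
summary = summarize_history(demo_history)
print(summary)
```

In the notebook, the argument would be something like `hist_1k_h1.history`.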


## FFNN with 2 Hidden Layers – Model 2

An additional hidden layer is added to this model to increase capacity.

**Architecture**:

- Hidden Layer 1: 4 nodes, `ReLU`  
- Hidden Layer 2: 4 nodes, `ReLU`  
- Output Layer: 1 node, `sigmoid`

The rest of the training configuration is unchanged.
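
To quantify how much capacity the second hidden layer adds, the trainable parameters of a fully connected stack can be counted by hand: each layer has `inputs * outputs` weights plus one bias per output. The helper name below is hypothetical.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers, e.g. [8, 4, 1]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

model_1_params = dense_param_count([8, 4, 1])      # (8*4 + 4) + (4*1 + 1)
model_2_params = dense_param_count([8, 4, 4, 1])   # adds a 4 -> 4 layer: 4*4 + 4
print(model_1_params, model_2_params)  # 41 61
```

Both networks are very small, so the extra layer adds only 20 parameters.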

In [None]:
# Define the model with 2 hidden layers
def build_deeper_model(input_shape):
    NN = Sequential()
    NN.add(Dense(4, activation='relu', input_shape=input_shape))
    NN.add(Dense(4, activation='relu'))
    NN.add(Dense(1, activation='sigmoid'))
    NN.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
    return NN

In [None]:
# Train Model 2 (2 hidden layers) on the 1,000-row dataset
model_1k_deeper = build_deeper_model((X_train_1k.shape[1],))

t1 = time.time()
hist_1k_deeper = model_1k_deeper.fit(X_train_1k, y_train_1k,
                                    epochs=20, batch_size=32, verbose=0,
                                    validation_data=(X_val_1k, y_val_1k))
t2 = time.time()

acc_train_1k_deeper = hist_1k_deeper.history['accuracy'][-1]
acc_val_1k_deeper = hist_1k_deeper.history['val_accuracy'][-1]
time_1k_deeper = t2 - t1

In [None]:
# Train Model 2 (2 hidden layers) on the 10,000-row dataset
model_10k_h2 = build_deeper_model((X_train_10k.shape[1],))

start_timer = time.time()
hist_10k_h2 = model_10k_h2.fit(X_train_10k, y_train_10k,
                              epochs=20, batch_size=32, verbose=0,
                              validation_data=(X_val_10k, y_val_10k))
end_timer = time.time()

train_acc_10k_h2 = hist_10k_h2.history['accuracy'][-1]
val_acc_10k_h2 = hist_10k_h2.history['val_accuracy'][-1]
time_10k_h2 = end_timer - start_timer


In [None]:
# Train Model 2 (2 hidden layers) on the 100,000-row dataset
model_100k_deeper = build_deeper_model((X_train_100k.shape[1],))

print("Training deeper model on 100k dataset - this might take a while...")
t_start = time.time()
hist_100k_h2 = model_100k_deeper.fit(X_train_100k, y_train_100k,
                                    epochs=20, batch_size=32, verbose=0,
                                    validation_data=(X_val_100k, y_val_100k))
t_end = time.time()

train_accuracy_100k_h2 = hist_100k_h2.history['accuracy'][-1]
validation_accuracy_100k_h2 = hist_100k_h2.history['val_accuracy'][-1]
training_time_100k_h2 = t_end - t_start

Training deeper model on 100k dataset - this might take a while...


In [None]:
# Print the results for Model 2
print(" Results: 2 Hidden Layers (4 Nodes Each)\n")

print("1000 Rows -> Training Acc:", round(acc_train_1k_deeper, 4),
      "| Validation Acc:", round(acc_val_1k_deeper, 4),
      "| Time:", round(time_1k_deeper, 2), "s")

print("10000 Rows -> Training Acc:", round(train_acc_10k_h2, 4),
      "| Validation Acc:", round(val_acc_10k_h2, 4),
      "| Time:", round(time_10k_h2, 2), "s")

print("100000 Rows -> Training Acc:", round(train_accuracy_100k_h2, 4),
      "| Validation Acc:", round(validation_accuracy_100k_h2, 4),
      "| Time:", round(training_time_100k_h2, 2), "s")

 Results: 2 Hidden Layers (4 Nodes Each)

1000 Rows -> Training Acc: 0.9588 | Validation Acc: 0.93 | Time: 9.86 s
10000 Rows -> Training Acc: 0.9966 | Validation Acc: 0.999 | Time: 23.4 s
100000 Rows -> Training Acc: 0.9984 | Validation Acc: 0.9982 | Time: 179.34 s


## Summary of Feed-Forward Neural Network Results

The final results for both architectures are below.

### Model 1 – 1 Hidden Layer (4 Nodes)
- 1000 Rows → Training Acc: 0.7387 | Validation Acc: 0.765 | Time: 11.64 s  

- 10000 Rows → Training Acc: 0.9962 | Validation Acc: 0.998 | Time: 19.48 s  

- 100000 Rows → Training Acc: 0.9988 | Validation Acc: 0.9988 | Time: 180.01 s  

### Model 2 – 2 Hidden Layers (4 Nodes Each)

- 1000 Rows → Training Acc: 0.9588 | Validation Acc: 0.93 | Time: 9.86 s  

- 10000 Rows → Training Acc: 0.9966 | Validation Acc: 0.999 | Time: 23.4 s  

- 100000 Rows → Training Acc: 0.9984 | Validation Acc: 0.9982 | Time: 179.34 s  

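Both models reach validation accuracy above 0.99 on the 10,000-row and 100,000-row datasets. On the 1,000-row dataset the two-layer model is clearly better (validation accuracy 0.93 versus 0.765). On the two larger datasets the models are within 0.001 of each other, and training times are comparable for both architectures at every dataset size. These figures come from a single training run per configuration, so small differences should be read with caution.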
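
The comparison between the two models can be checked directly from the reported numbers. The short sketch below copies the validation accuracies from the summary above and computes the difference per dataset size. The dictionary layout and function name are illustrative only.

```python
# Validation accuracies copied from the summary above
val_acc = {
    1_000:   {"model_1": 0.765,  "model_2": 0.93},
    10_000:  {"model_1": 0.998,  "model_2": 0.999},
    100_000: {"model_1": 0.9988, "model_2": 0.9982},
}

def model_2_advantage(results):
    """Model 2 minus Model 1 validation accuracy, per dataset size."""
    return {rows: round(acc["model_2"] - acc["model_1"], 4)
            for rows, acc in results.items()}

advantage = model_2_advantage(val_acc)
print(advantage)
```

A positive value means the two-layer model scored higher. Only the 1,000-row difference is large, and the 100,000-row difference is slightly negative.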

## Reflection

This notebook shows the use of **Feed-Forward Neural Networks (FFNNs)** on structured classification data.

Key observations:
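- Validation accuracy improves substantially with more training data, most visibly between 1,000 and 10,000 rows.

- The 2-hidden-layer model is clearly better on the 1,000-row dataset; on the larger datasets the two models are nearly identical.

- Run time increases with dataset size. The extra hidden layer made no consistent difference to run time in these runs.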

- Preprocessing and data splitting are important to get stable and unbiased evaluation.
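
To put the run times in context, the reported Model 1 times can be divided by the number of mini-batch updates implied by the settings used in this notebook (80% training split, batch size 32, 20 epochs). This is a rough sketch with a hypothetical helper name. The measured times also include the validation pass after each epoch and any one-off startup cost, so the per-step figures are only indicative.

```python
import math

EPOCHS, BATCH_SIZE, TRAIN_FRACTION = 20, 32, 0.8

# Model 1 training times in seconds, copied from the results above
reported_seconds = {1_000: 11.64, 10_000: 19.48, 100_000: 180.01}

def seconds_per_step(total_rows, seconds):
    """Divide wall-clock time by the number of mini-batch updates across all epochs."""
    train_rows = int(total_rows * TRAIN_FRACTION)
    steps = math.ceil(train_rows / BATCH_SIZE) * EPOCHS
    return steps, seconds / steps

for rows, secs in reported_seconds.items():
    steps, per_step = seconds_per_step(rows, secs)
    print(f"{rows:>7} rows: {steps:>6} steps, {per_step * 1000:.2f} ms per step")
```

On the reported figures, the 1,000-row run costs roughly 23 ms per step while the two larger runs cost under 4 ms per step, which suggests that fixed overhead dominates the smallest run.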

This notebook is an addendum to the write-up in `README_ExtraCredit.md`.