# Introduction to Neural Networks

In this assignment you first will be introduced to the components of a deep learning model. You will study the code that creates and executes a model and you will apply the theoretical background to propose a model for image data. 

Learning goals:
- understand components of a deep learning model and how they work mathematically
- relate the components to the hyperparameters and model setup of a Keras (TensorFlow) model
- propose improvements in design 

Data:

The data we will use is the Breast Cancer Wisconsin (Diagnostic) Data Set of the UCI Machine Learning Repository. You are however free to use your own dataset of interest to study the code. 

<a name='2'></a>
In case you want to study image data classification, you can use a dataset such as the Dataset of breast ultrasound images from Al-Dhabyani W, Gomaa M, Khaled H, Fahmy 2020 Feb;28:104863. DOI: 10.1016/j.dib.2019.104863


Sources used: 
- https://medium.com/mlearning-ai/binary-classification-of-breast-cancer-diagnosis-using-tensorflow-neural-networks-30ac8f40388
- Deep Learning for the Life Sciences by Bharath Ramsundar, Peter Eastman, Pat Walters, Vijay Pande Released April 2019 Publisher(s): O'Reilly Media, Inc. ISBN: 9781492039839 https://www.oreilly.com/library/view/deep-learning-for/9781492039822/



# Assignment

Study the material and answer the questions

1. Study the [background text](#0)
2. Study the [code steps](#1). Add comments in your own words and explain design choices such as
    - number of [layers](#01), 
    - [width](#02) of layers, 
    - number of [epochs](#03), 
    - [activation functions](#04), 
    - [loss function](#05), 
    - [gradient descent function](#06), 
    - [regularization function](#07)
3. Run the [code](#1). Evaluate the performance by discussing the results of the evaluation metrics. What hyperparameters would you recommend changing? Explain your choices. 
4. How do I set up a `batch_size` and how does it affect the outcome? Why do you think the `batch_size` was not set in the first place?
5. (Optional) Would there be a possibility to execute cross validation? How? 
6. (Optional) How can I introduce a validation test set? What would I need to change in the code?
7. Study the [tensor](#2) text. Consider a dataset of breast cancer images. What needs to be changed to the deep learning model design to make a model based on pictures? You can answer this in words, but if you like you can also try to code the solution. 

<a name='1'></a>
# Study Case

Consider the Breast Cancer Wisconsin (Diagnostic) Data Set. UCI Machine Learning Repository: Breast Cancer Wisconsin (diagnostic) data set. (n.d.). https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29. Consider the code below. 


In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import yaml

In [None]:
# Load the configuration from the YAML file
with open("config.yml", "r") as file:
    config = yaml.safe_load(file)

# Get the dataset path from the configuration
dataset = config["dataset"]["path"]

# Load the dataset
df = pd.read_csv(dataset)


In [None]:

# Preprocess the labels: encode the categorical diagnosis labels as integers (0/1)
le = LabelEncoder()
le.fit(df['diagnosis'])
df['diagnosis'] = le.transform(df['diagnosis'])

# Split the data into features and labels
X = df[df.columns[2:-1]]
y = df['diagnosis']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
  X, y, test_size=0.20, random_state=42)

# Normalize the features using MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Define the model architecture
model = Sequential()

# Add first hidden layer with 20 neurons
# Add 'relu' activation function
model.add(Dense(20, activation='relu'))
# Add dropout layer to prevent overfitting
model.add(Dropout(0.5))

# Add second hidden layer with 10 neurons and 'relu' activation function
model.add(Dense(10, activation='relu'))
# Add dropout layer to prevent overfitting
model.add(Dropout(0.5))

# Add output layer with 'sigmoid' activation function for binary classification
model.add(Dense(1, activation='sigmoid'))

# Compile the model with binary cross entropy loss function and Adam optimizer
model.compile(loss='binary_crossentropy', optimizer='adam')

# Train the model for 100 epochs
model.fit(x=X_train, y=y_train, epochs=100, validation_data=(X_test, y_test))

# Plot the loss during training
model_loss = pd.DataFrame(model.history.history)
model_loss.plot()


In [None]:

# Make predictions and print the classification report
predicted=(model.predict(X_test) > 0.5).astype(int)
print(classification_report(y_test, predicted))


In [None]:
# Plot the confusion matrix
# (use a new name so the imported confusion_matrix function is not overwritten)
cm = confusion_matrix(y_test, predicted)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm,
                                    display_labels=[False, True])

cm_display.plot()
plt.show()

-----

# 2- Design Choice:
### Number and width of layers: 
The model has 3 Dense layers in total (with a Dropout layer after each hidden layer). The first two are hidden layers with 20 and 10 nodes, the final layer is the output layer with one node.
Having multiple layers allows the model to learn more complex representations of the data.

**Reducing the number of nodes for each subsequent layer is a common practice in deep learning as it forces the model to learn a compressed representation of the input data.**
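
To make the effect of depth and width concrete, the number of trainable parameters can be counted by hand. The sketch below is only an illustration: the helper `dense_param_count` is a hypothetical name, and the 30 input features are an assumption about the feature columns selected in the code above.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected (Dense) layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# Assumption: 30 input features, followed by the 20-10-1 architecture used above
layer_sizes = [30, 20, 10, 1]
print(dense_param_count(layer_sizes))  # 841
```

Doubling the width of a hidden layer roughly doubles the weights going into and out of it, which is one reason narrow later layers keep small models compact.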

### Number of epochs:
The model is trained for <font color='red'>100 epochs.</font>
The number of epochs is often chosen based on the complexity of the task, the amount of data available and also the expertise of the data scientist.

### Activation functions:
The hidden layers use the ReLU (Rectified Linear Unit) activation function. The final layer uses the sigmoid activation function, which is common for binary classification tasks as it outputs probabilities between 0 and 1.
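
As a rough illustration of what these two functions compute, here is a minimal sketch in plain Python. It works on single numbers only; Keras applies the same formulas element-wise to whole tensors.

```python
import math


def relu(z):
    """Rectified Linear Unit: pass positive values through, clip negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 4))
```

A sigmoid output above 0.5 corresponds to a positive pre-activation, which is why the prediction code thresholds the output at 0.5.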

### Loss function:
The model uses binary cross entropy as its loss function, which is a common choice for binary classification tasks.
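
The following sketch shows how binary cross entropy can be computed by hand. The labels and probabilities are made-up example values, and the clipping constant `eps` is an assumption chosen here to keep the logarithm finite; it is not taken from the original code.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy; eps keeps log() away from 0."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip the predicted probability
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up example values
y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]
mostly_wrong = [0.1, 0.9, 0.2, 0.8]
print(binary_cross_entropy(y_true, mostly_right))
print(binary_cross_entropy(y_true, mostly_wrong))
```

Confident wrong predictions are penalised much more heavily than confident correct ones are rewarded, which pushes the sigmoid output toward well-calibrated probabilities.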

### Gradient descent function (optimizer):
The Adam optimizer is used for gradient descent. It combines the advantages of two other extensions of stochastic gradient descent: **AdaGrad and RMSProp.**
It keeps running estimates of the mean and the uncentered variance of each weight's gradient and uses them to adapt the learning rate of every weight individually.
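
To illustrate the idea, here is a minimal sketch of a single Adam update for one scalar weight, applied to a toy quadratic. The function name `adam_step` and the toy objective are assumptions made for this example; the default values mirror commonly used Adam settings and are not read from the model above.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar weight (t is the 1-based step count)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy problem (assumption): minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(round(w, 3))
```

Because the step is divided by the square root of the squared-gradient average, the first updates have a size close to the learning rate regardless of how large the raw gradient is.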

### Regularization function:
Dropout is used as a regularization technique. Dropout randomly sets a fraction of input units to 0 at each update during training time, which helps prevent overfitting.
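
A minimal sketch of the mechanism, in plain Python, is given below. It assumes the "inverted dropout" variant, in which the surviving units are scaled by `1 / (1 - rate)` during training so that the expected activation stays the same; the function name and example values are hypothetical.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is inactive at prediction time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


random.seed(0)
hidden = [1.0] * 10
print(dropout(hidden, rate=0.5))                  # mix of 0.0 and 2.0
print(dropout(hidden, rate=0.5, training=False))  # unchanged
```

Since a different random subset of units is dropped at every update, the network cannot rely on any single unit, which is the intuition behind the regularizing effect.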

-----------

# 3- Performance evaluation and hyperparameter tuning:

The model performance is good. The final validation loss is low at 0.0624 and the test set accuracy is quite high at 0.98.

This means that the model correctly classifies 98% of the test set samples. Moreover, the precision, recall and F1-score are also high (0.98) for both classes, indicating that the model has a good balance between precision and recall and does not favor one class over the other.
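
For reference, the metrics in the classification report can be derived from the four cells of the confusion matrix. The sketch below uses hypothetical counts chosen only for illustration; they are not the results of the run described above.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy for the positive class."""
    precision = tp / (tp + fp)   # how many predicted positives are correct
    recall = tp / (tp + fn)      # how many actual positives are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts, for illustration only
metrics = binary_metrics(tp=40, fp=2, fn=3, tn=69)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a medical setting, recall for the malignant class is often the number to watch, because a false negative means a missed diagnosis.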

If we want to experiment with some hyperparameters:

- Number of layers and width of layers: Increasing the number of layers and nodes might allow the model to capture more complex patterns in the data. Conversely, reducing them might prevent potential overfitting if it's an issue.

- Batch size: You might want to try increasing or decreasing the batch size. A smaller batch size might increase the generalization capability of the model, while a larger batch size could speed up the learning process.

- Number of epochs: If the model is not converging, we might need to increase the number of epochs. If the model is overfitting, we might need to decrease the number of epochs or use early stopping.

- Dropout rate: If the model is overfitting, we could increase the dropout rate to introduce more regularization. Conversely, if the model is underfitting, we might need to decrease the dropout rate.

- Optimizer: Although Adam is used very often and is a solid choice, we can try other optimizers such as SGD (with or without momentum) or RMSProp, although this might have only a minor impact.

--------

# 4- Batch size setup and effect:

Batch size is the number of samples processed before the model is updated. The size of a batch must be more than or equal to one and less than or equal to the number of samples in the training dataset.
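
The link between batch size and the number of weight updates can be sketched as follows. The figure of 455 training samples is an assumption (80% of the 569 samples in the data set); the helper name is hypothetical.

```python
import math


def updates_per_epoch(n_train, batch_size):
    """Number of weight updates in one pass over the training data."""
    return math.ceil(n_train / batch_size)  # the last batch may be smaller


n_train = 455  # assumption: 80% of the 569 samples end up in the training set
for batch_size in (1, 32, 40, 455):
    print(batch_size, updates_per_epoch(n_train, batch_size))
```

With a batch size of 1 the model is updated 455 times per epoch, while a batch size equal to the training set size gives a single update per epoch.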

The effect of batch_size on the outcome is dependent on several factors. Smaller batch sizes, down to a batch size of 1 (stochastic gradient descent), can result in a model that generalizes better, at the cost of longer training times. Larger batch sizes train faster, but might lead to a model that doesn't generalize as well, and also consumes more memory, because you have to load more data at once.
When the batch size is not explicitly set, Keras uses its default batch size of 32. This number is a common choice that tends to work well in many scenarios.
We can change it through the `batch_size` argument of `model.fit`:



In [None]:
model.fit(x=X_train, y=y_train, epochs=100, batch_size=40, validation_data=(X_test, y_test))


----------

# 5- Cross-validation:


Cross-validation can be implemented by wrapping the Keras model in the `KerasClassifier` (or `KerasRegressor`) wrapper from the SciKeras package, which makes it compatible with scikit-learn. K-fold cross-validation can then be performed by passing the wrapped model to scikit-learn's `cross_val_score` function.
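
To show what K-fold cross-validation does with the data, here is a minimal sketch of the index bookkeeping in plain Python. It is a simplified illustration without shuffling or stratification, so it is not a replacement for the scikit-learn splitters; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 5)):
    print(fold, val_idx)
```

Each fold trains a fresh model on the training indices and scores it on the validation indices; the reported score is the mean over the folds.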

Introducing a validation test set:
A validation set can be introduced during the model.fit call. In the provided code, the test set is used as the validation set. To create a separate validation set, we need to further split the training data into a smaller training set and a validation set. This can be done using train_test_split again on the X_train and y_train.

Adapting for image data:
For image data, we might want to use convolutional layers (Conv2D for 2D images) in our model as they are designed to automatically and adaptively learn spatial hierarchies of features from the input. We'd also need to reshape our input data so it's suitable for these convolutional layers. The shape should be (batch_size, height, width, channels) for color images, or (batch_size, height, width, 1) for grayscale images.

We'll also likely want to include some MaxPooling2D layers after the convolutional layers to downsample the input along its spatial dimensions. Dropout or Batch Normalization might also be helpful to prevent overfitting. Finally, before passing the data to the dense output layer, we'll need to flatten the output from the convolutional and pooling layers.
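
The effect of convolution and pooling on the spatial size can be checked with a little arithmetic. The sketch below assumes a 64x64 input, 3x3 kernels without padding, 2x2 pooling and 64 filters in the last convolutional layer, matching the CNN shown later in this document; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool):
    """Output length of non-overlapping pooling along one spatial dimension."""
    return size // pool


size = 64                   # assumed input height and width
size = conv_out(size, 3)    # 62
size = pool_out(size, 2)    # 31
size = conv_out(size, 3)    # 29
size = pool_out(size, 2)    # 14
flattened = size * size * 64  # 64 feature maps in the last Conv2D layer
print(size, flattened)  # 14 12544
```

The flattened length is the number of inputs the first Dense layer receives, so it largely determines how many parameters that layer has.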

Remember to adjust the loss function and the final layer's activation function if the problem is not binary classification. For multiclass classification, use 'categorical_crossentropy' as loss function and 'softmax' as the activation function in the output layer. For regression, use 'mse' or 'mae' as the loss function and no activation function in the output layer.
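
For the multiclass case, the sketch below illustrates how softmax turns raw scores into probabilities and how categorical cross entropy scores them against a one-hot label. The scores are made-up values, and `eps` is an assumption added here to keep the logarithm finite.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Loss for a single sample with a one-hot encoded label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


probs = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
print([round(p, 3) for p in probs])
print(categorical_cross_entropy([1, 0, 0], probs))
```

Only the probability assigned to the true class contributes to the loss, so the loss is small when that probability is close to 1.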

In [None]:
from scikeras.wrappers import KerasRegressor, KerasClassifier
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense, Dropout


# Define a function to create the model, required for KerasClassifier
def create_model():
    model = Sequential()
    model.add(Dense(20, input_dim=X_scaled.shape[1], activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

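# Create the scikit-learn compatible classifier
# (recent SciKeras versions take the builder via `model`; `build_fn` is deprecated)
model = KerasClassifier(model=create_model, epochs=100, verbose=0)

# Normalize the features using MinMaxScaler
# Note: fitting the scaler on all of X before cross-validation lets the
# validation folds influence the scaling; a scikit-learn Pipeline avoids this
scaler = MinMaxScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)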

# evaluate using 5-fold cross validation
results = cross_val_score(model, X_scaled, y, cv=5)
print(results.mean())



**The mean score of the results returned from the 5-fold cross-validation is 0.9684; this is the accuracy of our model.
It means that on average, the model correctly classified about 96.84% of the instances across the five different splits of the data into training and testing sets.**

### Tricky point:
`input_dim=X_scaled.shape[1]`

The Keras model expects the input shape of the first layer to be provided, so we specify the `input_dim` argument in the first layer of the model to prevent an error.



--------

# 6- Validation test

To introduce a validation set, we can split the data into three sets: training set, validation set, and test set.
The training set is used to train the model, the validation set is used to tune parameters and to decide when to stop training (early stopping), and the test set is used to evaluate the final model. 
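
One common way to obtain three sets is to split twice: first hold out a test set, then split the remainder into training and validation sets. The sketch below only checks the resulting sizes. It assumes 569 samples and that the held-out share is rounded up, which is my understanding of how `train_test_split` behaves; the helper name is hypothetical.

```python
import math


def split_sizes(n_samples, held_out_fraction):
    """Return (kept, held_out) sizes, rounding the held-out part up."""
    n_held_out = math.ceil(held_out_fraction * n_samples)
    return n_samples - n_held_out, n_held_out


n_total = 569  # assumption: size of the Breast Cancer Wisconsin (Diagnostic) data set
n_temp, n_test = split_sizes(n_total, 0.20)  # hold out 20% for testing
n_train, n_val = split_sizes(n_temp, 0.25)   # 25% of the remaining 80% = 20% overall
print(n_train, n_val, n_test)  # 341 114 114
```

The result is roughly a 60/20/20 split, with the validation and test sets having the same size.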

In [None]:
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.optimizers import Adam

# Split the data into training+validation set and test set
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Then split training+validation set into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42) # 0.25 x 0.8 = 0.2

# Scale the data (important for neural networks)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Create model
model = Sequential()
model.add(Dense(32, input_dim=X.shape[1], activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])

# Fit the model
history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, batch_size=10)

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)

print(f'Test accuracy: {test_acc}')


Here is my interpretation:
The training loss (loss) and validation loss (val_loss) start at 0.8704 and 0.5503, respectively, indicating that the model does not fit the data well initially.

As the epochs increase, the loss for both the training and validation sets significantly decreases, which means that the model is learning and improving its performance.

The accuracy of the model on the training set (accuracy) starts at 0.4633, but quickly improves to reach 0.9971 by epoch 100, indicating that the model is fitting the training data very well.

The validation accuracy (val_accuracy) also increases over time, but it reaches a peak of 0.9649 around epoch 17 and remains roughly constant thereafter, suggesting that the model generalizes well to unseen data.

It's also important to note that from around epoch 27 onwards, while the training loss continues to decrease, the validation loss starts to increase. This is an indication of overfitting: the model is starting to memorize the training data and loses its ability to generalize to new data.

The final test accuracy is 0.9737, which is close to the final validation accuracy. This suggests that the model is likely to perform similarly on new data in the future.

In conclusion, the model seems to be performing quite well, although there are signs of overfitting in the later epochs. We might need to consider strategies to mitigate overfitting, such as early stopping (stopping training when the validation loss starts to increase).
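
The early stopping rule can be sketched in plain Python as follows. The validation-loss values are hypothetical and the function name is made up for this example. In Keras, the `EarlyStopping` callback (with arguments such as `monitor='val_loss'`, `patience` and `restore_best_weights`) offers this behaviour during `model.fit`.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is None if the rule never triggers.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch              # no improvement for `patience` epochs
    return None, best_epoch


# Hypothetical validation-loss curve that starts to rise after epoch 4
val_losses = [0.55, 0.40, 0.30, 0.25, 0.26, 0.27, 0.29, 0.31, 0.33]
print(early_stopping_epoch(val_losses, patience=3))  # (7, 4)
```

The patience value trades off robustness against noise in the validation loss versus wasted epochs after the best point.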




-------------

# 7- Model based on pictures:

If the input data are images instead of structured data, the structure of the deep learning model needs to be changed. For image data, we usually put convolutional layers (Conv2D) in front of the Dense layers. Also, we need to ensure that the input shape matches the shape of the images.
To adapt the code to work with image data, we also change the data preprocessing steps so that they handle images.

In [None]:
import yaml

# Load the configuration from the YAML file
with open("config.yml", "r") as file:
    config = yaml.safe_load(file)

# Get the dataset path from the configuration
image_dataset = config["image_dataset"]["path"]


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import Adam

# Set up the ImageDataGenerators like this:
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   validation_split=0.2) # set validation split

train_set = train_datagen.flow_from_directory(image_dataset,
                                              target_size=(64, 64),
                                              batch_size=32,
                                              class_mode='categorical',
                                              subset='training') # set as training data

validation_set = train_datagen.flow_from_directory(image_dataset, # same directory as training data
                                              target_size=(64, 64),
                                              batch_size=32,
                                              class_mode='categorical',
                                              subset='validation') # set as validation data

# Set up a simple CNN architecture like this:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax')) # 3 output neurons for 3 classes

# Compile the model
opt = Adam(learning_rate=0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(train_set,
                    epochs=50,
                    validation_data=validation_set)


Both the training loss and the validation loss decrease over time, which is a good sign. This indicates that the model is learning to classify the images correctly.

Both the training accuracy and the validation accuracy increase over time, again indicating that the model is learning effectively.

The model's performance on the validation set is relatively close to its performance on the training set, which suggests that overfitting is not a significant issue.

**Final Performance:**
By the 50th epoch, the model achieves a validation accuracy of about 74.92%, which means it correctly classifies approximately 75% of the images in the validation set.

In conclusion, the model seems to be performing reasonably well with a final validation accuracy of around 75%. If a higher accuracy is required, experimenting with different model architectures, adding more data, or using stronger data augmentation might help. 





-----

Fatemeh and I did this assignment together.