# Lab 1: Neural Networks
This week, you will use an EEG recording from one patient with epileptic seizures to train a convolutional neural network. For more information about working with Keras, see https://keras.io/api/.

## A: Data pre-processing
We first load all Python packages required for the code. These are all the packages you will need for this lab, so **you’re not allowed** to import any packages besides the ones we provide.

#### Load packages

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import metrics
from tensorflow import keras as K, nn
import pandas as pd
import csv

Now, we load the dataset. This is an EEG recording of one patient with epilepsy, together with annotator labels. A label of 1 means that the patient had an epileptic seizure in that fragment of the data; a label of 0 means that the patient did not.


The code below provides you with 'data', 'label' and 'test_data'. Together, 'data' and 'label' form the dataset that you will use to train and validate neural networks. At the end of the lab, you will use your trained convolutional neural network to predict the labels of 'test_data'. When you hand in the lab, Kaggle will compare your predicted labels against a set of secret annotator labels.

#### Import data

In [2]:
# Training file: the first column holds the label, the remaining columns hold the EEG samples.
path_data = '../input/machine-learning-for-io-t-2025-lab-1/patient05_train.csv'
with open(path_data) as file:  # the file is closed automatically after reading
    csvreader = csv.reader(file)
    rows = []
    for row in csvreader:
        res = [float(i) for i in row]
        rows.append(res)
rows = np.asarray(rows)
print(np.shape(rows))
label = rows[:, 0].astype(int)
data = rows[:, 1:]

# Test file: EEG samples only, no label column.
path_data = '../input/machine-learning-for-io-t-2025-lab-1/patient05_test.csv'
with open(path_data) as file:
    csvreader = csv.reader(file)
    rows = []
    for row in csvreader:
        res = [float(i) for i in row]
        rows.append(res)
test_data = np.asarray(rows)
print(np.shape(test_data))


#### Split the dataset into training and validation sets.

In [None]:
train_data, validation_data, train_labels, validation_labels = train_test_split(data, label, test_size=...)

print('train data shape:', np.shape(train_data))
print('validation data shape:', np.shape(validation_data))
print('train labels shape:', np.shape(train_labels))
print('validation labels shape:', np.shape(validation_labels))

#### Standardize the training set to have a mean of 0 and a variance of 1.

Hint: Compute the mean and standard deviation (the square root of the variance) on the training set only, and use those same values to standardize the validation and test sets.

In [None]:
train_data = ...

validation_data = ...

test_data = ...

print(np.shape(train_data))
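
The hint above can be illustrated with a minimal sketch on synthetic data. The array names, shapes and values here are hypothetical stand-ins, not the lab data, and the sketch uses one global mean and standard deviation; per-feature statistics (`axis=0`) would be an equally reasonable choice.

```python
import numpy as np

# Synthetic stand-ins for the lab arrays (hypothetical shapes and values).
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(100, 8))
val = rng.normal(loc=5.0, scale=3.0, size=(20, 8))

# The statistics come from the training set only.
mean = train.mean()
std = train.std()

# The same mean and std are reused for every other split.
train_std = (train - mean) / std
val_std = (val - mean) / std

print(train_std.mean(), train_std.std())  # close to 0 and 1
print(val_std.mean(), val_std.std())      # near 0 and 1, but not exactly
```

Reusing the training statistics keeps information from the validation and test sets out of the pre-processing step, so the validation score stays an honest estimate.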

## B: Model training

#### Create a convolutional neural network using Keras to be applied to the training set.

In [None]:
model = K.models.Sequential([
    ...
    ])
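
If the network starts with a one-dimensional convolution (an assumption; the lab leaves the architecture open), note that Keras `Conv1D` layers expect input of shape (samples, steps, channels), whereas the arrays loaded above are two-dimensional. A minimal sketch of adding a channel axis, using a hypothetical array in place of the lab data:

```python
import numpy as np

# Hypothetical stand-in: 10 fragments of 256 samples each.
x = np.zeros((10, 256))

# Add a trailing axis so each fragment has a single channel.
x_3d = x[..., np.newaxis]

print(x.shape)     # (10, 256)
print(x_3d.shape)  # (10, 256, 1)
```

The same reshaping would then apply to the training, validation and test arrays alike.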

#### Compile the neural network.

In [None]:
model.compile(optimizer=..., loss=..., metrics=['accuracy'])
model.summary()

#### Fit the neural network using the training and validation sets.

Hint: You can use an early-stopping callback to avoid overfitting.

In [None]:
history = model.fit(train_data, train_labels, validation_data=(validation_data, validation_labels), batch_size=..., epochs=...)
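
To make the early-stopping hint concrete, here is a plain-Python sketch of the patience rule behind it: training stops once the validation loss has failed to improve for a set number of consecutive epochs. The function name, the loss values and the patience of 3 are all hypothetical. In Keras, this behaviour is available as `K.callbacks.EarlyStopping` (with arguments such as `monitor` and `patience`), passed to `model.fit` through its `callbacks` argument.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: improvement stalls after epoch 2.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop = early_stopping_epoch(val_losses, patience=3)
print(stop)
```

In this example the loss would have improved again at the last epoch, which shows the trade-off: a small patience saves time but can stop too early, while a large one allows more overfitting before stopping.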

#### Plot the learning curves.

Try to analyze the curves for yourself.

In [None]:
plt.title('Learning Curves')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.legend()
plt.show()

#### For the validation set, plot the ROC curve. Calculate and report the AUC score.

Hint: You should use the **predict** function to predict the labels. Then, from the **metrics** module, call the functions that calculate the false positive rate, the true positive rate and the AUC score.

In [None]:
predicted_labels_val = ...
fpr, tpr, _ = ...
auc = ...
print(auc)

#create ROC curve
plt.plot(fpr, tpr)
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
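
As an illustration of the `metrics` functions mentioned in the hint, the sketch below computes an ROC curve and its AUC for four hypothetical samples. The labels and scores are made up; with a real model, the score would be the predicted probability of class 1 (for a two-output softmax model, that would be the second column of the prediction array, which is an assumption about the architecture).

```python
import numpy as np
from sklearn import metrics

# Hypothetical true labels and predicted scores for class 1.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# False positive rate and true positive rate at each threshold.
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)

# Area under the ROC curve, computed from the curve points.
auc_value = metrics.auc(fpr, tpr)
print(auc_value)  # 0.75

# roc_auc_score computes the same quantity directly from labels and scores.
print(metrics.roc_auc_score(y_true, y_score))  # 0.75
```

The curve needs continuous scores rather than hard 0/1 labels, because each point on it corresponds to a different decision threshold.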

#### Predict labels for the test set.

In [None]:
predicted_labels_test = model.predict(test_data)
#submission_array = ...
# You can use the code below, or any other code that results in a vector of size [~218 x 1] of 0s and 1s that correspond to
# the predicted labels. It is important to do this part correctly, as it directly affects your final score.
# Note: the line below assumes the model outputs two values per sample (one score per class).

submission_array = [0 if predicted_labels_test[i,0]>predicted_labels_test[i,1] else 1 for i in range(np.shape(predicted_labels_test)[0])]
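
For reference, the same conversion from scores to labels can be written with NumPy. This is a sketch on hypothetical prediction arrays, covering both a two-output model (one score per class) and, as an alternative assumption, a model with a single sigmoid output.

```python
import numpy as np

# Hypothetical predictions from a two-output model: one row per sample.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.4, 0.6]])

# The predicted label is the index of the larger score in each row.
labels = np.argmax(probs, axis=1)
print(labels)  # [0 1 1]

# Hypothetical predictions from a single sigmoid output: threshold at 0.5.
sig = np.array([[0.3], [0.7]])
labels_sig = (sig[:, 0] > 0.5).astype(int)
print(labels_sig)  # [0 1]
```

The two forms differ only on exact ties: the list comprehension above assigns 1 when both scores are equal, whereas `np.argmax` returns 0.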

#### Collect the predictions.

In [None]:
x = [i for i in range(1,len(submission_array)+1)]
d = {'Id':x,'Category': submission_array}
df = pd.DataFrame(data=d)
print(df)
file_name = 'submission.csv'
df.to_csv(file_name, sep=',',index=False)

## C: Questions

Think about these questions so that you can present your lab and results to the TA during the lab sessions:

**1.** Describe how you chose the layers and activation functions, and explain how these choices are justified based on your dataset’s characteristics and the specific problem you aim to solve.

**2.** How do parameters like the network’s depth, number of filters, and kernel size affect its ability to capture intricate patterns in the data? Describe how you evaluate the number of parameters in your model and how these factors relate to issues like underfitting and overfitting.

**3.** Describe how you optimize model complexity for your task. Discuss techniques such as regularization and grid search, and explain how each could impact your model’s performance and ability to generalize well to new data.
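
As a starting point for question 2, the number of trainable parameters in standard one-dimensional convolutional and dense layers (with bias terms) can be worked out by hand. The helper names and layer sizes below are hypothetical. The totals can be checked against the output of `model.summary()`.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights per filter (kernel_size * in_channels) plus one bias, times filters."""
    return (kernel_size * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """One weight per input plus one bias, for each output unit."""
    return (in_units + 1) * out_units


# Hypothetical layer sizes, chosen only to show the arithmetic.
print(conv1d_params(kernel_size=3, in_channels=1, filters=16))   # 64
print(conv1d_params(kernel_size=3, in_channels=16, filters=32))  # 1568
print(dense_params(in_units=128, out_units=2))                   # 258
```

The convolutional counts do not depend on the length of the input signal, whereas the first dense layer after flattening does, which is often where most of the parameters end up.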