# Problem Session 11
## A Pumpkin Seeds Neural Network

This notebook will serve as an introduction to our neural network content. In particular, this material will touch on the following lecture notebooks:
- `Lectures/Neural Networks/1. Perceptrons`,
- `Lectures/Neural Networks/2. The MNIST Data Set`,
- `Lectures/Neural Networks/3. Multilayer Neural Networks` and
- `Lectures/Neural Networks/4. keras`.

Note that while you may be completing this problem session prior to the live neural networks lecture, the goal is to make this notebook as understandable as possible.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#### 1. `keras` check

In order to run some of the code in this notebook you will need to have `keras` properly installed. `keras` is the Python package that we will use to build neural networks in this boot camp.

##### a. 

Try to run the following to check that you have `keras` installed.

In [None]:
## Try this first
import keras

## If that does not work, try this
# from tensorflow import keras

In [None]:
## When I wrote this notebook I had version 2.11
print(keras.__version__)

If you do not have `keras` installed, you can try:
- Running `pip install keras` in your command prompt/terminal,
- Running `conda install keras` in your command prompt/terminal,
- Following the directions on the Erdős Institute website for installing with Anaconda Navigator, or
- Following the installation directions at <a href="https://keras.io/getting_started/">https://keras.io/getting_started/</a>.

Note that individuals using an Apple computer with M1 or M2 chips may need to set up a new conda environment with `keras` as an initial install.

#### 2. Load the data

In this notebook you will work to build neural network models to make predictions on the pumpkin seed data featured in the previous three notebooks. 

##### a.

Load and prepare that data using the code below.

In [None]:
seeds = pd.read_excel("../../data/Pumpkin_Seeds_Dataset.xlsx")

seeds['y'] = 0

seeds.loc[seeds.Class=='Ürgüp Sivrisi', 'y']=1

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
seeds_train, seeds_test = train_test_split(seeds.copy(),
                                              shuffle=True,
                                              random_state=123,
                                              test_size=.1,
                                              stratify=seeds.y.values)

##### b.

While probably not needed for this problem, it is common to use a single validation set instead of cross-validation when building neural networks, because neural networks can take a long time to train. To help you practice that step, split `seeds_train` into a smaller training set and a validation set.
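One way to make such a split is a second call to `train_test_split`. The sketch below is only an illustration: `demo_train` is a small synthetic data frame standing in for `seeds_train`, and the 20% validation size is an assumption, not a value prescribed by this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for seeds_train (hypothetical data, not the pumpkin seeds)
rng = np.random.default_rng(123)
demo_train = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
demo_train["y"] = np.repeat([0, 1], 100)

# Stratify on the label so both pieces keep the same class balance
demo_tt, demo_val = train_test_split(demo_train.copy(),
                                     shuffle=True,
                                     random_state=123,
                                     test_size=0.2,  # assumed validation fraction
                                     stratify=demo_train.y.values)

print(demo_tt.shape, demo_val.shape)  # (160, 4) (40, 4)
```

The same pattern should carry over to `seeds_train` by swapping in the real data frame and its `y` column.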

In [None]:
## code here




##### c.

Neural networks are a model type where differing data scales can greatly impact performance. Fit a `StandardScaler` on the training set from the validation split. Then transform the two sets using that scaler. Store the transformed data in `X_tt` and `X_val` respectively. Then also store the labels in `y_tt` and `y_val`.
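The fit-on-training, transform-both pattern can be sketched as follows. The arrays `raw_tt` and `raw_val` are hypothetical synthetic features with very different scales; they are assumptions made for illustration and are not the pumpkin seed columns.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales
rng = np.random.default_rng(0)
raw_tt = rng.normal(loc=[10.0, 1000.0], scale=[2.0, 300.0], size=(160, 2))
raw_val = rng.normal(loc=[10.0, 1000.0], scale=[2.0, 300.0], size=(40, 2))

# Fit the scaler on the training piece only
scaler = StandardScaler()
scaler.fit(raw_tt)

# Apply the same training means and standard deviations to both pieces
demo_X_tt = scaler.transform(raw_tt)
demo_X_val = scaler.transform(raw_val)

print(demo_X_tt.mean(axis=0), demo_X_tt.std(axis=0))
```

Fitting on the training piece alone keeps information from the validation set out of the preprocessing step, so the validation columns will be close to, but not exactly, mean zero and variance one.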

In [None]:
## code here



In [None]:
## code here



#### 3. An `sklearn` neural network

In this problem you'll first use `MLPClassifier`, which is `sklearn`'s neural network classifier.

##### a. 

Import the `MLPClassifier` from `sklearn.neural_network`, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html">https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html</a>.

In [None]:
## code here



##### b. 

The `MLPClassifier` is what is known as a <i>feed forward</i> neural network.

The feed forward network you will build in this part looks schematically like this:

<img src="pumpkin_nn.png" width="70%"></img>

The nodes (circles) in the "Input Layer" represent the 12 features for this data set. Each node corresponds to one of the unique features. 

The hidden layers are made of nodes in which an activation function, $\sigma$, is applied to a weighted sum of the previous layer's nodes. For example, the value of the topmost node in the first hidden layer would be given by:
$$
\sigma\left( \sum_{i=1}^{12} w_i x_i \right),
$$

for some nonlinear function $\sigma$ and some weights, $w_i$, that would be found through fitting. The arrows from one node to another denote these weighted sums. The output layer gives the prediction of the model.
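As a concrete illustration of the formula, here is a minimal `numpy` sketch of one hidden node. The feature values and weights are made up, and ReLU is only one assumed choice for $\sigma$. Fitted networks typically also add a bias term to the weighted sum, which is omitted here to match the formula above.

```python
import numpy as np

def relu(z):
    """ReLU activation, max(0, z); an assumed choice for sigma."""
    return np.maximum(0.0, z)

x = np.ones(12)        # hypothetical feature values x_1, ..., x_12
w = np.full(12, 0.5)   # hypothetical weights w_1, ..., w_12

# sigma applied to the weighted sum of the 12 inputs
node_value = relu(np.sum(w * x))
print(node_value)             # 6.0

# A negative weighted sum is clipped to zero by ReLU
print(relu(np.sum(-w * x)))   # 0.0
```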

This network is said to have two hidden layers, each of which is $5$ nodes tall.

To build this network set the `hidden_layer_sizes` argument to `(5,5,)` when defining the `MLPClassifier` object.

Make this model and fit it with the training set from the validation split. Then find the accuracy on the validation set.
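For orientation, the same workflow on synthetic data might look like the sketch below. The data set from `make_classification`, the names `demo_mlp`, `X_a`, `X_b`, and the `random_state` values are all assumptions made for illustration; only `hidden_layer_sizes` and `max_iter` mirror this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical 12-feature binary classification data (not the pumpkin seeds)
X_demo, y_demo = make_classification(n_samples=400, n_features=12,
                                     n_informative=6, random_state=123)
X_a, X_b, y_a, y_b = train_test_split(X_demo, y_demo, test_size=0.2,
                                      random_state=123, stratify=y_demo)

# Scale using statistics from the training piece only
scaler = StandardScaler().fit(X_a)

# Two hidden layers with 5 nodes each
demo_mlp = MLPClassifier(hidden_layer_sizes=(5, 5),
                         max_iter=10000,
                         random_state=123)
demo_mlp.fit(scaler.transform(X_a), y_a)

# Accuracy on the held-out piece
demo_acc = accuracy_score(y_b, demo_mlp.predict(scaler.transform(X_b)))
print(demo_acc)

# Weight matrices: 12 -> 5, 5 -> 5, 5 -> 1
print([c.shape for c in demo_mlp.coefs_])
```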

##### Sample Solution

In [None]:
## Make the model object here
## Note you will need to increase the max_iter, 10000 should work
mlp1 = 

## Fit the model object


In [None]:
from sklearn.metrics import accuracy_score

In [None]:
## calculate the validation accuracy


##### c.

One way to improve neural network performance is to add more nodes to your hidden layers or to add additional hidden layers. Try some different hidden layer structures and see how the validation accuracies compare to the initial model.

In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



#### 4. Your first `keras` neural network

While `sklearn`'s neural network is relatively easy to use, it is not common for people to use `MLPClassifier` (or `MLPRegressor`) to build neural networks. Instead it is much more common for people to use a python package dedicated to the construction of neural networks. For us that package will be `keras`, <a href="https://keras.io/">https://keras.io/</a>.

In this problem you will build your first neural network made with `keras`.

##### a.

We will start by building the exact same neural network as in <i>3. b.</i>. First we will need to import all of the classes and functions used in the creation of a neural network with `keras`. Run the code chunk below to do so.

In [None]:
## Import the following
from keras import models
from keras import layers
from keras import optimizers
from keras import losses
from keras import metrics
from keras.utils import to_categorical  # this import should work with the keras version in the erdos_sp_2024 conda env


### If you have an older version of keras, try this instead ###
#from keras.utils.np_utils import to_categorical

##### b.

Together we will build this `keras` neural network step by step. The first step is to make an empty `Sequential` model. This can be created with `models.Sequential()`. Store this in a variable called `model1`.

In [None]:
## code here



##### c.

Now we have to add the hidden layers to our model. The layers we will add are called `Dense` layers in `keras`. The first input to `layers.Dense()` is the number of nodes in the hidden layer. The second input is the type of activation function, the $\sigma$ from above; for us this will be the `'relu'` activation function. The first `Dense` layer also requires an `input_shape` input, which tells the model how large your input layer is.
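To see what a dense layer computes, and why it needs to know the input size, here is a rough `numpy` sketch of a 5-node layer with ReLU acting on a batch of 12-feature observations. The random weights `W`, the zero biases `b`, and the batch `X_batch` are hypothetical; a fitted layer would learn its weights and biases from data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_nodes = 12, 5

X_batch = rng.normal(size=(4, n_features))   # 4 hypothetical observations
W = rng.normal(size=(n_features, n_nodes))   # one weight per (input, node) pair
b = np.zeros(n_nodes)                        # one bias per node

# Weighted sums for every node at once, followed by ReLU
hidden = np.maximum(0.0, X_batch @ W + b)
print(hidden.shape)  # (4, 5)
```

The shape of `W` depends on the number of input features, which is the information `input_shape` supplies for the first layer; later layers can infer their input size from the layer before them.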

In the code chunk below, try to add the second hidden `Dense` layer. I will provide the code for adding the first `Dense` layer.

In [None]:
model1.add(layers.Dense(5, activation='relu', input_shape=(X_tt.shape[1],)))


## Add the second Dense layer here,
## Remember first the number of nodes,
## Then the activation function, but
## do not include an input_shape


In [None]:
## This will show you the model we have built so far
model1.summary()

##### d.

After the hidden layers we have to put in the output layer. This is also a `Dense` layer, but there should only be $1$ node and the `activation` function should be `'sigmoid'`. The sigmoid activation takes the weighted sum and turns it into a probability.
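A short `numpy` sketch of the sigmoid function shows how weighted sums map to probabilities. The input values are made up, and the 0.5 cutoff for turning a probability into a class label is a common convention assumed here, not something specified in this notebook.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

weighted_sums = np.array([-4.0, 0.0, 4.0])  # hypothetical output-node inputs
probs = sigmoid(weighted_sums)
print(probs)

# Assumed convention: predict class 1 when the probability is at least 0.5
predictions = (probs >= 0.5).astype(int)
print(predictions)  # [0 1 1]
```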

Try adding the output layer. Then check your network with `.summary()`.
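When reading the `.summary()` output, it can help to know how many parameters to expect. The sketch below assumes each `Dense` layer has one weight per input-node pair plus one bias per node, which is the default behavior, and uses the 12-5-5-1 layout described in this notebook. The helper `dense_param_count` is a hypothetical name.

```python
def dense_param_count(n_inputs, n_nodes):
    """Weights (n_inputs * n_nodes) plus one bias per node."""
    return n_inputs * n_nodes + n_nodes

# Input layer, two hidden layers, output layer
layer_sizes = [12, 5, 5, 1]

counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [65, 30, 6]
print(sum(counts))  # 101
```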

In [None]:
## code here



In [None]:
## code here



##### e. 

Before fitting a `keras` model you have to `compile` it. Compiling tells the model how you want it to be fit:
- What optimization algorithm to use,
- What `loss` function you want to use, and
- What metrics you want to keep track of.
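For a binary problem like this one, a common loss is binary cross-entropy. The following is a minimal `numpy` sketch of the standard formula, not the exact `keras` implementation; the labels, the predicted probabilities, and the clipping constant `eps` are assumptions for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Average negative log-likelihood of the true labels."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])  # mostly right, fairly sure
unsure = np.array([0.5, 0.5, 0.5, 0.5])     # always predicts 50/50

confident_loss = binary_crossentropy(y_true, confident)
unsure_loss = binary_crossentropy(y_true, unsure)
print(confident_loss, unsure_loss)
```

Lower values mean the predicted probabilities agree better with the labels; always predicting 0.5 gives a loss of $\log 2$.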

Run the code chunk below to see how you `compile` a `keras` model.

In [None]:
## Compile the network here with
## the 'rmsprop' optimizer, the 'binary_crossentropy' loss and
## 'accuracy' as the only metric
model1.compile(optimizer = 'rmsprop',
                 loss = 'binary_crossentropy',
                 metrics = ['accuracy'])

##### f.

Now we are ready to fit the model.

Neural networks are fit using the method of gradient descent (for more information on that check out the Supervised Learning Gradient Descent lecture notebook). The version used here works by cycling through the training observations in random order and updating the current guesses for the weights in the weighted sums. Each pass through the training set is called an <i>epoch</i>. When calling `.fit` in `keras` you have to provide the number of epochs. For this example we will use `100`. You will also see an argument called `batch_size`, which controls how many training observations are used at each update step. We will set the `batch_size` to `25`.
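The relationship between epochs, batches, and updates can be sketched with a tiny mini-batch gradient descent loop. This is an illustration on a single logistic unit, not what `keras` does internally; the synthetic data, the learning rate, and the sizes (110 observations, 3 epochs) are assumptions, with only the batch size of 25 taken from this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, batch_size, n_epochs = 110, 25, 3

# Hypothetical data: the label is 1 when the two features sum to a positive number
X = rng.normal(size=(n_obs, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)         # current weight guesses
learning_rate = 0.1     # assumed step size
n_updates = 0

for epoch in range(n_epochs):
    order = rng.permutation(n_obs)             # new random order each epoch
    for start in range(0, n_obs, batch_size):
        batch = order[start:start + batch_size]
        p = 1.0 / (1.0 + np.exp(-(X[batch] @ w)))         # predicted probabilities
        grad = X[batch].T @ (p - y[batch]) / len(batch)   # cross-entropy gradient
        w -= learning_rate * grad                         # one update step
        n_updates += 1

updates_per_epoch = int(np.ceil(n_obs / batch_size))
print(updates_per_epoch, n_updates)  # 5 15
```

With 110 observations and a batch size of 25, each epoch has four full batches and one batch of 10, for five updates per epoch.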

Run the code chunk below to train the neural network you built.

In [None]:
## You'll train the model for 100 epochs
n_epochs = 100

## fit the model here
## First the X are input
## Then the y
## then the epochs,
## the batch_size
## and finally we can provide a validation set
## keras is nice and calculates the accuracy on this as the model is trained
history1 = model1.fit(X_tt,
                       y_tt,
                       epochs = n_epochs,
                       batch_size = 25,
                       validation_data = (X_val, 
                                          y_val))

##### g.

You may have noticed that we stored the results of `.fit` in a variable called `history1`. `history1` has an attribute called `.history` which contains a dictionary of:
- The loss function on the training set at each epoch stored with the `loss` key,
- The accuracy on the training set at each epoch stored with the `accuracy` key,
- The loss function on the validation set at each epoch stored with the `val_loss` key, and
- The accuracy on the validation set at each epoch stored with the `val_accuracy` key.

Try plotting the training accuracy and the validation accuracy against the epoch below. Then make a similar plot but with the loss function on both sets.
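A plot of this kind could be sketched as follows. The dictionary `demo_history` holds made-up numbers in the same layout as `history1.history`; it is an assumption for illustration, not output from a real training run.

```python
import matplotlib.pyplot as plt

# Hypothetical values with the same keys as history1.history
demo_history = {"accuracy": [0.70, 0.80, 0.85, 0.87, 0.88],
                "val_accuracy": [0.68, 0.79, 0.83, 0.84, 0.84]}

# Epochs are counted from 1
epochs = range(1, len(demo_history["accuracy"]) + 1)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(epochs, demo_history["accuracy"], label="Training")
ax.plot(epochs, demo_history["val_accuracy"], label="Validation")
ax.set_xlabel("Epoch")
ax.set_ylabel("Accuracy")
ax.legend()
```

In a notebook the figure renders when the cell finishes. Swapping in the `loss` and `val_loss` keys gives the corresponding loss plot.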

In [None]:
## Demonstrating the dictionary
history1.history

In [None]:
sns.set_style("whitegrid")


##### h. 

A common step in neural network training is trying to identify when to stop training the model. This can be done by looking at the two plots you just made and looking for when the validation set performance leveled off.

With that in mind, how many epochs would you choose?
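Besides reading the plots by eye, one simple heuristic is to take the epoch with the smallest validation loss. The sketch below uses a made-up `demo_val_loss` array in place of `history1.history['val_loss']`; the heuristic itself is an assumption and only one of several reasonable stopping rules.

```python
import numpy as np

# Hypothetical validation losses, one per epoch
demo_val_loss = np.array([0.60, 0.45, 0.38, 0.36, 0.37, 0.39])

# argmin is zero-based, so add 1 to report the epoch number
best_epoch = int(np.argmin(demo_val_loss)) + 1
print(best_epoch)  # 4
```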

##### Write here



#### 5. Building more models

Congratulations! You have successfully built a `keras` model. Feel free to use the remainder of this notebook to play around and build neural networks with different architectures. You may want to compare their performance to `model1`.

You could also test whether using PCA to preprocess your data first would improve the neural network's performance.
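A PCA preprocessing step might be sketched like this. The synthetic 12-column data, the names `X_a` and `X_b`, and the choice to keep 95% of the variance are assumptions for illustration. As with scaling, the PCA is fit on the training piece only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 12 correlated columns driven by 3 underlying factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 12))
X_all = latent @ mixing + 0.05 * rng.normal(size=(200, 12))
X_a, X_b = X_all[:160], X_all[160:]   # training and validation pieces

# Scale, then keep enough components to explain 95% of the variance
scaler = StandardScaler().fit(X_a)
pca = PCA(n_components=0.95).fit(scaler.transform(X_a))

X_a_pca = pca.transform(scaler.transform(X_a))
X_b_pca = pca.transform(scaler.transform(X_b))
print(X_a_pca.shape, X_b_pca.shape)
```

The transformed arrays could then be passed to a network whose input size is set to the number of retained components.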

In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2023.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).