
# CSE 40868: Neural Networks - Spring 2025

## Practical Session 1 - Single Layer "Quasi-Perceptrons"

_New: Jan 14, 2025_

Logistics:
 * Using the "File > Save to Github" menu above, save a copy of this notebook to your class Github repo. Make sure you save it in the `Practicals/Practical1` folder by prepending `Practicals/Practical1/` to the notebook file name.
 * Study and run the notebook *before class on January 22*.
 * Arrive in class ready to do some of the work required.
 * Finish your work and commit the changes to your repo by 11:59pm on Friday, January 24.

Reference: [building a single-layer neural network in PyTorch](https://machinelearningmastery.com/building-a-single-layer-neural-network-in-pytorch/) (Machine Learning Mastery)


This practical session is intended to familiarize you with common idioms of [PyTorch](https://pytorch.org) -- the current dominant approach to neural net design and training.

We're going to implement a single-neuron net that will act similarly to a perceptron. PyTorch is overkill for this, but the idioms are mostly here and will transfer right over to your future code.

We aren't using GPUs in this notebook.  Supporting them adds a bit of extra stuff that gets in the way of a good understanding on a first exposure.

It might be quite useful for you to examine/run a beginner's tutorial from the PyTorch people that addresses a different computer vision problem (classification of clothing items): go [here](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) for that tutorial.

# Prof-provided material

## 1. Import all the things

If you add code that needs a new thing to be `import`ed, put it here.


In [None]:
#include all the things
import os
import numpy as np
import gdown
import random
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchsummary
import tqdm.notebook as tqdm


## 2. Initialize the random number generator

Initialize the random number generator used by PyTorch. Sometimes you want a repeatable set of random numbers, and PyTorch can do that. We don't need that here.  Just a random initialization is all we need.  The return value is the 64-bit integer used to seed the generator (we assign it to `_` to avoid generating output from the cell).
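For contrast, here is a minimal sketch of the repeatable alternative mentioned above: `torch.manual_seed()` fixes the seed, so the same "random" numbers come out every time. The seed value 1234 is an arbitrary choice for illustration. Note that Python's own `random` module (used below for `random.choice`) has a separate generator that PyTorch's seeding functions do not touch.

```python
import torch

# Repeatable: fixing the seed makes PyTorch's generator replay the same sequence.
torch.manual_seed(1234)
a = torch.rand(3)
torch.manual_seed(1234)
b = torch.rand(3)
print(torch.equal(a, b))  # True: same seed, same numbers

# Non-repeatable (what this notebook does): seed from a non-deterministic source.
# torch.random.seed() returns the 64-bit seed it chose, in case you want to log it.
seed_used = torch.random.seed()
print(f'seeded with {seed_used}')
```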

In [None]:
# "Sets the seed for generating random numbers to a non-deterministic random number on all devices."
_ = torch.random.seed()

## 3. Get the data and examine it a bit

The code in the cell below downloads a 40MB data file and reads it into a dictionary. The file contains a bunch of 128-dimensional data samples from a face recognition experiment using data collected here at ND.

The vectors are points in a high-dimensional space, and they are grouped by identity in a `dict`: each `key` is a subject ID (a string of digits, such as `'02463'`), and each `value` is a numpy array with 128 columns and N rows, where N is the number of images of that subject.
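To make this layout concrete, here is a small stand-in dictionary with the same structure. The subject IDs and sample counts are made up, and the values are random numbers rather than real face embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for d: subject ID (string) -> array of shape (N, 128),
# one row per image of that subject.
toy_d = {
    '00001': rng.normal(size=(5, 128)),
    '00002': rng.normal(size=(3, 128)),
}

for subject_id, samples in toy_d.items():
    n_images, dim = samples.shape
    print(f'{subject_id}: {n_images} samples, each {dim}-dimensional')
# 00001: 5 samples, each 128-dimensional
# 00002: 3 samples, each 128-dimensional
```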

In [None]:
# download data from Google Drive using gdown

url = 'https://drive.google.com/uc?id=19heLsvTf6AHj9irshCS-zdhTeL-w-Zc7'
localname = 'embeddingsbysubject.npy'
if os.path.exists(localname):
    print(f'{localname} already exists. Delete if you want to re-download it.')
else:
    gdown.download(url,localname)

# load the file
# the file contains a numpy array with one element: a ginormous dict.
# the .item() method pries the dict out of the array

d = np.load(localname,allow_pickle=True).item()

# brief examination of the dictionary
# a list of keys
lkeys = list(d.keys())

# a random choice from the list - a subject ID
rkey = random.choice(lkeys)
print(f'{len(lkeys)} keys present. One of them is {rkey}.')

# how many samples for that ID? get the value, it's a numpy array.
# look at its shape
v = d[rkey]
s = d[rkey].shape
print(f'd[{rkey}].shape is {s}: {s[0]} samples, each one {s[1]}-dimensional')


## 4. Define a Pytorch `Dataset` subclass

`torch.utils.data.Dataset` is an abstract class in PyTorch that represents a source of data. You create a subclass that implements `__init__()`, `__len__()`, and `__getitem__()`, which do the obvious things: set up the data, report how many samples there are, and return the sample at a given index. Most of the action is in the init method, and there are "patterns" for file-based data sets, randomly created data sets, streaming data sets (PyTorch provides `torch.utils.data.IterableDataset` for those), _etc._
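A minimal sketch of the three-method pattern, using a made-up in-memory dataset (`SquaresDataset` is a hypothetical name, not part of PyTorch). A map-style `Dataset` like this is indexed through `__getitem__()`, and `DataLoader`'s default samplers use `__len__()` to know how many indices to draw.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the number i, its label is i**2."""
    def __init__(self, n):
        # most of the setup work happens here; a file-based dataset would
        # read its file here
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        # number of samples
        return len(self.x)

    def __getitem__(self, idx):
        # one (sample, label) pair
        return self.x[idx], self.x[idx] ** 2

ds = SquaresDataset(5)
print(len(ds))   # 5
print(ds[3])     # (tensor(3.), tensor(9.))

# DataLoader groups individual samples into batches (here: 2, 2, then 1)
for xb, yb in DataLoader(ds, batch_size=2):
    print(xb, yb)
```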

Here, we define a `Dataset` subclass named `TwoPersonData` that is tied closely to the data file we just read. It's initialized with the dictionary and two subject IDs. It works as long as the dict is organized as described above: both subject IDs must be valid keys, and each must map to a numpy array of samples.
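The core of `TwoPersonData.__init__()` is stacking the two subjects' arrays and building a matching label vector. Here is a small sketch of just that step, on made-up arrays (the shapes are arbitrary). Row `i` of `data` and entry `i` of `label` always describe the same sample, which is what `__getitem__()` relies on.

```python
import numpy as np

# hypothetical arrays standing in for d[subid0] and d[subid1]
a = np.zeros((3, 4))   # 3 samples of subject 0
b = np.ones((2, 4))    # 2 samples of subject 1

data = np.vstack([a, b])                         # rows of a, then rows of b
label = np.array([0] * len(a) + [1] * len(b))    # 0s for a, then 1s for b

print(data.shape)  # (5, 4)
print(label)       # [0 0 0 1 1]
```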


In [None]:
# Note: we're using a sigmoid for classification so the class labels must be
# 0 and 1
class TwoPersonData(torch.utils.data.Dataset):
    def __init__(self,source,subid0,subid1):
        """initialize a data set from two entries in the dict `source` with the supplied keys."""
        self.data = np.vstack([source[subid0],source[subid1]])
        self.n0 = source[subid0].shape[0] # number of samples for subject subid0
        self.n1 = source[subid1].shape[0] # number of samples for subject subid1
        # since we turn the subject-IDs into 0 and 1, keep a mapping around
        self.labeldict = {subid0:0, subid1:1} # give it subid, get class label
        self.rlabeldict = {0:subid0, 1:subid1} # give class label, get subid
        # label vector has self.n0 0s followed by self.n1 1s
        self.label = np.array([0]*self.n0 + [1]*self.n1)

    def __len__(self):
        return self.n0 + self.n1

    def __getitem__(self,idx):
        # one sample (a row of self.data) and its 0/1 class label
        return self.data[idx,:],self.label[idx]

    def dimension(self):
        """dimensionality of each sample (number of columns in the data)"""
        assert self.data.shape[0] > 0, "Error: no data"
        return self.data.shape[1]


## 5. Define a single-linear-unit neural network class

Here, we define a class `SingleNeuronNet`. We can create an instance of it and then train that instance.

**Important terminology!** An instance of a network class is what we refer to as a `model`.

The constructor takes the input size. The network computes a biased linear function of its input (that's what `nn.Linear` does) and feeds the result through a sigmoid function (`torch.sigmoid`). Recall that the sigmoid approaches 0 as its input goes to negative infinity, approaches 1 as its input goes to positive infinity, and equals 0.5 at 0. It's differentiable everywhere, which is critically important for backpropagation-based training to work. Note: the classic perceptron uses a hard threshold, whose derivative is zero everywhere except at the threshold itself (where it is undefined), so it can't be trained by gradient descent. That is why this net will not train (or operate) exactly like a classically trained perceptron.
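A short standard-library sketch of the contrast: the sigmoid is `1 / (1 + exp(-z))`, and its derivative has the convenient form `sigmoid(z) * (1 - sigmoid(z))`, which is positive for every `z` (though tiny for large `|z|`). The perceptron's step function, by comparison, is flat on both sides of its threshold, so it gives gradient descent nothing to follow. At `z = 0` the sigmoid is 0.5 and its slope is 0.25, its maximum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # derivative of the sigmoid: sigmoid(z) * (1 - sigmoid(z)),
    # mathematically positive for every z, but tiny when |z| is large
    s = sigmoid(z)
    return s * (1.0 - s)

def step(z):
    # classic perceptron activation: a hard threshold at 0
    return 1.0 if z >= 0 else 0.0

for z in [-6.0, -1.0, 0.0, 1.0, 6.0]:
    print(f'z={z:5.1f}  sigmoid={sigmoid(z):.4f}  slope={sigmoid_grad(z):.4f}  step={step(z)}')
```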

In [None]:
class SingleNeuronNet(nn.Module):
    def __init__(self, input_size):
        super(SingleNeuronNet, self).__init__()
        # Define the single layer with input_size input features and 1 neuron
        self.layer = nn.Linear(input_size, 1)

    def forward(self, x):
        # pass the input through this one layer and use the sigmoid to perform a
        # soft thresholding (essentially estimating the probability that the sample
        # is from class 1).
        # Sigmoid saturates at 0 and 1, so if you are using it
        # in a 2-class classifier, the class labels need to be 0 and 1!
        output = torch.sigmoid(self.layer(x))
        return output
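
A quick sketch of what the forward pass produces for a batch. To run on its own it repeats the same layer-plus-sigmoid computation that `SingleNeuronNet` performs; the batch of 4 random 128-D inputs is made up. The `(N, 1)` output shape is why the training code calls `.unsqueeze(1)` on the labels and why `evaluate()` squeezes the thresholded predictions.

```python
import torch
import torch.nn as nn

layer = nn.Linear(128, 1)        # same layer SingleNeuronNet builds for 128-D input
X = torch.randn(4, 128)          # hypothetical batch: 4 samples, 128 features each
out = torch.sigmoid(layer(X))    # same computation as SingleNeuronNet.forward()

print(out.shape)                                 # torch.Size([4, 1]): one output per sample, nested
print(bool(((out >= 0) & (out <= 1)).all()))     # True: sigmoid outputs lie between 0 and 1
```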

## 6. Define an evaluation function

If we have a model, and want to see how well it works on a 2-class classification task with data from a specified loader, this function can be used. It is specialized to this problem (two classes with class labels 0 and 1).

You can call this function before training, to see how poorly your untrained model performs.  You can call it during training, to see how much better it gets as training proceeds. You can call it after training is done, to see the results at the end of training.

It is risky to evaluate a model on its training data, as the results may be positively biased. But you can evaluate a model on validation data during training, to get a sense for the trend in performance on hopefully-independent data. And after training is done, you can evaluate the trained model on the testing data, yielding performance figures that you would report as the official "performance" of your model.

In [None]:
def evaluate(model, loader):
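    # switch the model into evaluation mode. This changes how layers such as
    # dropout and batch normalization behave (this net has none); it does NOT
    # turn off gradient calculation - torch.no_grad() below does that, which
    # makes evaluation faster and lighter on memory.
    was_training = model.training
    model.eval()

    # create a list to store the prediction correct-nesses
    correct = []

    with torch.no_grad():
        # process all the data in the dataloader, in batches
        for X,y in loader:
            X = X.float()
            y = y.float()
            ypred = model(X)
            # ypred is a "probability of class 1" - threshold it.
            # .squeeze(1) de-nests it: shape (N,1) -> (N,). (A bare .squeeze()
            # would also drop the batch dimension of a 1-sample batch.)
            ypred = (ypred > 0.5).float().squeeze(1)
            # if the prediction is correct, append True; else append False
            correct += (ypred == y).tolist()

    # put the model back in the mode it was in when we were called
    model.train(was_training)

    # return the classification accuracy
    acc = sum(correct)/len(correct)
    return acc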
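
Two details of evaluation code are easy to get wrong, and a small sketch makes them visible. First, `model.eval()` only changes how layers such as dropout and batch normalization behave; it is `torch.no_grad()` that stops gradient tracking. Second, `.squeeze()` with no argument removes *every* size-1 dimension, so a final batch holding a single sample collapses to a 0-d tensor, while `.squeeze(1)` keeps the batch dimension. The tiny 3-input model and one-sample batch are made up for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
X = torch.randn(1, 3)   # hypothetical batch containing a single sample

model.eval()
print(model(X).requires_grad)      # True: eval mode alone still tracks gradients
with torch.no_grad():
    print(model(X).requires_grad)  # False: no_grad() turns tracking off

pred = (torch.sigmoid(model(X)) > 0.5).float()   # shape (1, 1)
print(pred.squeeze().shape)    # torch.Size([]): 0-d, so .tolist() gives a bare float
print(pred.squeeze(1).shape)   # torch.Size([1]): still a 1-element batch
```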

## 7. Training function

Define a function that trains a supplied model, using supplied training data (via a DataLoader), for a specified number of epochs and with a specified learning rate.

This particular trainer makes some decisions for you: SGD optimizer and binary cross-entropy loss are the biggies.
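Binary cross-entropy for a predicted probability `p` and a 0/1 label `y` is `-(y*log(p) + (1-y)*log(1-p))`, averaged over the batch by default. The sketch below, on made-up numbers, checks that `BCELoss` on sigmoid outputs matches both that formula and `BCEWithLogitsLoss` applied to the raw (pre-sigmoid) scores. `BCEWithLogitsLoss` folds the sigmoid into the loss, which the PyTorch documentation notes is more numerically stable than a separate sigmoid followed by `BCELoss`.

```python
import torch

logits = torch.tensor([2.0, -1.0, 0.5])   # hypothetical raw scores from a linear layer
y = torch.tensor([1.0, 0.0, 0.0])         # 0/1 class labels
p = torch.sigmoid(logits)                 # probabilities, like SingleNeuronNet's output

bce = torch.nn.BCELoss()(p, y)                        # expects probabilities
bce_logits = torch.nn.BCEWithLogitsLoss()(logits, y)  # expects raw scores
by_hand = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

print(bce.item(), bce_logits.item(), by_hand.item())  # three (nearly) equal values
```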

Usability note: see the use of `tqdm` in the training loop to produce a progress bar. An iteration of training can take some time, and (in general) if you're training for a lot of iterations, it might take minutes or hours or days or weeks or months or years (not kidding) for training to complete.  So some feedback that something is actually happening is useful.

In [None]:
def train(model, train_loader, val_loader=None, epochs=100, learning_rate=0.001, debug=False):
    """ train the model using data from the dataloader, for the specified
    number of epochs and with the specified learning rate. """

    # good old stochastic gradient descent optimizer
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

    # binary cross-entropy loss. The rule: if you used a sigmoid to generate
    # the prediction (hence, class probability), use BCELoss. If you didn't
    # (hence, logits instead of probabilities), use BCEWithLogitsLoss.
    # The net above uses a Sigmoid in its output.
    #
    # This loss and the sigmoid make this net only an "approximate" perceptron.
    # True perceptrons use hard thresholds and cannot be trained with gradient
    # descent as performed in pytorch.

    criterion = torch.nn.BCELoss()

    running_losses = []
    training_accuracies = []
    validation_accuracies = []

    for epoch in tqdm.tqdm(range(epochs)):
        model.train() # training mode; set every epoch because evaluate() calls model.eval()
        running_loss = 0
        for X,y in train_loader:
            # X,y are tensors: make sure they are floats, not doubles
            X = X.float()
            y = y.float().unsqueeze(1) # .unsqueeze(1) to match the (N,1) prediction
            # clear gradients left over from the previous batch
            # (PyTorch adds new gradients to the old ones)
            optimizer.zero_grad()
            # run the net on X, get a y
            ypredicted = model(X)
            # calculate loss
            loss = criterion(ypredicted,y)
            # backpropagate the loss, then take one SGD step
            loss.backward()
            optimizer.step()
            # accumulate loss
            running_loss += loss.item()
        # calc metrics once per epoch, after all batches have been processed
        train_acc = evaluate(model,train_loader)
        if val_loader is not None:
            val_acc = evaluate(model,val_loader)
        # update stats-by-epoch
        running_losses.append(running_loss)
        training_accuracies.append(train_acc)
        if val_loader is not None:
            validation_accuracies.append(val_acc)
        if debug:
            print(f'epoch {epoch}')
            print(f' training loss {running_loss}')
            print(f' training accuracy, not that we should care: {train_acc}')
            if val_loader is not None:
                print(f' Validation accuracy: {val_acc}')
    if val_loader is not None:
        return [running_losses, training_accuracies, validation_accuracies]
    else:
        return [running_losses, training_accuracies]
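
PyTorch *adds* each new gradient into a parameter's `.grad` rather than overwriting it, which is why the standard idiom calls `optimizer.zero_grad()` once per batch, before `loss.backward()`. A tiny sketch with a single made-up parameter `w` shows the accumulation:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)   # hypothetical single parameter
opt = torch.optim.SGD([w], lr=0.1)

(3 * w).backward()
print(w.grad)        # tensor(3.)
(3 * w).backward()   # no zero_grad() in between: gradients accumulate
print(w.grad)        # tensor(6.)

opt.zero_grad()      # clear the accumulated gradient before the next step
(3 * w).backward()
print(w.grad)        # tensor(3.)
```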


## 8. Finally: Run an experiment!

Let's run a classification task on the data. In this cell, we

1. Instantiate the data set and partition it into training, testing, and validation partitions
  * Choose two people in the set of subjects and create a `TwoPersonData` from them.
  * Then partition this into three pieces
    * 60% for training
    * 20% for validation
    * 20% for testing

    The partitioning uses PyTorch's random number generator, which we seeded (non-deterministically) in section 2, so each run of this cell produces a different split.

2. Create `DataLoader`s for batches of training, validation, and testing data

  The `DataLoader` is a generic means for *supplying the food* to the training and testing processes. A `DataLoader` wraps a `Dataset` or a compatible data structure (like the subsets returned by `random_split()`) and provides a means of drawing "batches" of data from it. IMHO, much of the value of the `DataLoader` comes from explicitly separating the data by its role, e.g. separate DataLoaders for train, test, and val.

  We're creating a `DataLoader` for training, one for validation, and one for testing.

3. Create an instance of the model.  `torchsummary.summary()` is a handy function to print out info about the model.  You have to supply the input shape as a tuple. Here, our input is a 128-D vector, so its input shape is the 1-element tuple `(128,)`.

4. Train and test!
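A sketch of steps 1 and 2 on a made-up 10-sample dataset (`TensorDataset` stands in for `TwoPersonData`). Passing fractions to `random_split()` requires a reasonably recent PyTorch release (older ones accept only integer lengths), and the seeded `generator` is there only to make this sketch repeatable.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# hypothetical stand-in dataset: 10 samples, 128 features, 0/1 labels
X = torch.randn(10, 128)
y = torch.tensor([0] * 5 + [1] * 5)
dataset = TensorDataset(X, y)

# 60/20/20 split; a seeded generator makes the split repeatable
train_set, val_set, test_set = random_split(
    dataset, [0.6, 0.2, 0.2], generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set), len(test_set))   # 6 2 2

# the partitions never share a sample
assert not set(train_set.indices) & set(val_set.indices)

train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
for Xb, yb in train_loader:
    print(Xb.shape, yb.shape)   # a batch of 4, then the leftover batch of 2
```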

In [None]:
# assemble data from subjects 02463 and 04261 into a TwoPersonData set
# instance

twocdata = TwoPersonData(source=d,subid0='02463',subid1='04261')

print(f'TwoPersonData has {len(twocdata)} samples.')


# split the data into training, validation, and testing partitions
# (fractions must add up to 1.0)
train_set_frac = 0.6
val_set_frac = 0.2
test_set_frac = 0.2
train_set, val_set, test_set = torch.utils.data.random_split(twocdata,[train_set_frac,val_set_frac,test_set_frac])

# how many samples and labels are issued when we iterate the DataLoader?
batch_size = 8

# create a DataLoader for each partition. Shuffling matters only for training;
# evaluation visits every sample regardless of order.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, drop_last=False)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size, shuffle=False, drop_last=False)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False, drop_last=False)


# create the model with number of inputs matching our data dimensionality.
# and one neuron generating an output (class)

model = SingleNeuronNet(twocdata.dimension())

# print out a summary of the model we just created. You're required to supply
# the size of the input to torchsummary.summary(). twocdata.dimension()
# (defined above) is the size of a sample in the data
# (128), and each sample is 1D (a 128-dimensional vector).
# In general, the input to a net can have any dimensionality you want, so
# torchsummary.summary() requires a tuple for the input size, which
# leads to the odd-looking 1-element tuple (128,) here.
# device='cpu' because this notebook doesn't use a GPU (the default is 'cuda').

torchsummary.summary(model,input_size=(twocdata.dimension(),),device='cpu')


# hyperparameters - number of epochs of training
num_epochs = 100
# learning rate - set through experimentation (and experience)
learning_rate = 0.01

#train it
[losses,tacc,vacc] = train(model,train_loader,val_loader=val_loader,epochs=num_epochs,learning_rate=learning_rate)

# get performance on testing data
test_acc = evaluate(model,test_loader)

plt.plot(range(num_epochs),losses,'k-')
plt.title('Running loss vs. epoch')
plt.xlabel('Epoch')
plt.ylabel('Training Loss')
plt.show()


plt.plot(range(num_epochs),tacc,'r-',label='Training Acc.')
plt.plot(range(num_epochs),vacc,'g-',label='Validation Acc.')
plt.scatter(num_epochs-1,test_acc,label=f'Testing Acc.: {test_acc:6.4f}',s=24,c='blue')
plt.xlim(0,num_epochs)
plt.ylim(0.0,1.0)
plt.legend()
plt.title('Accuracies')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()

## What did we learn?

1. Implementing and training a model has several parts:
 *   Creating a Dataset class and instantiating it
 *   Wrapping it in DataLoaders for training, validation, testing
 *   Creating a model class and instantiating it
 *   Defining functions for evaluation and training
 *   Training the model
 *   Estimating the trained model's performance (_on data **not** used to train it_!)

2. There are lots of fiddly details in all of this (hence, this complete implementation - it's an example you can refer to in future practicals and assignments)

3. In the experiment above, the model was trained to distinguish between two people and training got it to a place where it performed at an accuracy of 96%-99% (the number will change as you run the experiment multiple times - but probably not by much).
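Putting the parts listed above together, here is a compact end-to-end sketch that runs without the face-embedding file. The data are two synthetic Gaussian clusters in 8 dimensions (an assumption made purely so the example is self-contained), and `nn.Sequential(nn.Linear, nn.Sigmoid)` is used as a shorter equivalent of `SingleNeuronNet`; the loss, optimizer, learning rate, and batch size mirror the ones used in this notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# 1. Data: two hypothetical, well-separated clusters
n_per_class, dim = 100, 8
X = torch.cat([torch.randn(n_per_class, dim) - 2.0,
               torch.randn(n_per_class, dim) + 2.0])
y = torch.cat([torch.zeros(n_per_class), torch.ones(n_per_class)])
train_set, test_set = random_split(TensorDataset(X, y), [0.8, 0.2])

# 2. DataLoaders
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
test_loader = DataLoader(test_set, batch_size=8, shuffle=False)

# 3. Model: one linear unit followed by a sigmoid
model = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

# 4. Training: SGD + BCELoss, zeroing gradients before every step
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCELoss()
for epoch in range(20):
    model.train()
    for Xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(Xb), yb.unsqueeze(1))
        loss.backward()
        optimizer.step()

# 5. Evaluation on data not used for training
model.eval()
correct = total = 0
with torch.no_grad():
    for Xb, yb in test_loader:
        pred = (model(Xb) > 0.5).float().squeeze(1)
        correct += (pred == yb).sum().item()
        total += len(yb)
test_acc = correct / total
print(f'test accuracy: {test_acc:.3f}')
```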



# Student Work


## Part 1. Shortish answer
Answer these questions by placing text in the cell after the word "Answer: "

1. The output of `torchsummary.summary()` in the code cell above indicates 129 parameters. Explain where the number 129 comes from for this network.

STUDENT ANSWER GOES HERE
 Answer: The single `nn.Linear` layer has one weight per input dimension (128 weights) plus one bias term, so it has 128 + 1 = 129 trainable parameters.

2. Reach back in time to the happy days of CSE30124 and explain why it might be likely for the training accuracy to exceed the testing accuracy in the second plot above.  You don't have to provide a mathematical discussion -- just explain what is going on and why.

STUDENT ANSWER GOES HERE
 Answer: ***

## Part 2. Ten Monte Carlo runs

Copy the experiment code in the previous code cell to the cell below and modify it. This modified code needs to do several things:
 3. Run the experiment with the same configuration **ten times**. Each run **must** use a separately created instance of the SingleNeuronNet model. Don't "recycle" the model from run to run. Since each run uses a different random split of the data (and a different random initialization of the model's weights), the experiments will all have different results. This is sometimes called a "Monte Carlo experiment". According to [Wikipedia](https://en.wikipedia.org/wiki/Monte_Carlo_method):
> The name comes from the Monte Carlo Casino in Monaco, where the primary developer of the method, mathematician Stanisław Ulam, was inspired by his uncle's gambling habits.


 4. Compute and plot the _average_ training and validation accuracy over the ten runs, as a function of epoch. Don't plot each run's accuracy curve; average them and plot the average. Also plot the _average_ testing performance as a single data point at epoch `num_epochs-1`. It is not necessary to plot the loss (average or otherwise), so you can remove the code that generates the loss plot. Summary: Your code should generate a single graph with an average training accuracy curve, an average validation accuracy curve, and an average test performance dot.

Note: the ten-experiment run takes 5 minutes in my solution.


In [None]:
# PRACTICAL CODE GOES HERE!



## Part 3 - A bigger experiment

The previous experiments used data from only two subjects of the many in the data file provided. Let's look at more subjects.

5. In the code cell below, filter the dictionary `d` to create a new dictionary `d2`, containing those entries ((`key`, `value`) pairs) where the number of samples for the subject specified by `key` is greater than or equal to 100.

Hints:
 * `len(d)` is the number of keys in `d`
 * `len(d[k])` is the number of samples for the subject `k`
 * a dictionary comprehension with an `if` clause can make this a _very_ short piece of code.


In [None]:
# PRACTICAL CODE GOES HERE!




6. Copy the original provided experiment code to the cell below, and modify it to perform ten training and evaluation experiments.
 * For each of the ten experiments, choose a _distinct pair of subjects_ (keys from `d2`, created above) to create the dataset that is provided to the dataloaders. Leave all of the other parameters unchanged.
 * Generate three output plots.
  * One is a plot of training loss for each pair of subjects (10 curves on this plot)
  * One is a plot of training accuracy for each pair of subjects (10 more curves)
  * One is a plot of validation accuracy for each pair of subjects (10 more curves). On this plot, also plot the testing accuracy for each pair as a dot at epoch `num_epochs-1` (ten dots).
 Use a legend, and label each curve and dot with the string "subjectID1-vs-subjectID2", (e.g. "04203-vs-04459"), and make sure each experiment uses a different color so your grader can tell the experiments apart.

 Note: this experiment took five minutes in the prof's implementation.

7. Immediately below, comment on the results you obtained in this experiment.  Did they show some consistency in behavior, or were they very different in performance? Since you know that these samples came from face images of different people, what might cause differences in performance?

Answer: STUDENT ANSWER GOES HERE


In [None]:
# PRACTICAL CODE GOES HERE
