# Week 14, ASTR 596: Fundamentals of Data Science


## Classification with a Multi-layer Perceptron (MLP)
#### Shamelessly ripped off from Ashley Villar, who gets all the credit

### Gautham Narayan 

##### <gsn@illinois.edu>

Here, we will learn how to use one of the most common libraries for building neural networks: PyTorch.

In [None]:
# Run this to install some packages
!pip install astronn torch tensorflow

In [None]:
# Run this 
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

# GN - deal with a stupid astroNN error
import ssl 
ssl._create_default_https_context = ssl._create_unverified_context

%matplotlib inline

## A few notes on PyTorch syntax

PyTorch datatype summary: the model expects single-precision input. You can change the type of a tensor with `tensor_name.type(dtype)`, where `tensor_name` is the name of your tensor and `dtype` is the target type. To cast to single-precision floating point, use `tensor_name.float()`. A NumPy array is cast with `array_name.astype(type)`; for single precision, the type should be `np.float32`. Before we analyze tensors, we often want to convert them to NumPy arrays with `tensor_name.numpy()`.

If PyTorch has been tracking the operations that produced the current tensor value, you need to detach the tensor from the computation graph (meaning you no longer track things like its derivative) before you can convert it to a NumPy array: `tensor_name.detach()`. A single-element tensor can be converted to a plain Python number with `scalar.item()`.

PyTorch lets you easily use your CPU or GPU; however, we are not using this feature here. If your tensor is currently on the GPU, you can bring it onto the CPU with `tensor_name.cpu()`.
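
As a quick illustration of the conversions described in these notes, here is a minimal sketch on a made-up array. The names `array_name`, `tensor_name`, `w`, and `y` are placeholders for this example only and are not used elsewhere in the notebook.

```python
import numpy as np
import torch

# Toy double-precision NumPy array (hypothetical data, not from Galaxy10)
array_name = np.arange(6, dtype=np.float64).reshape(2, 3)

# NumPy cast to single precision
array32 = array_name.astype(np.float32)

# torch.tensor keeps the NumPy dtype, so this tensor is float64 ...
tensor_name = torch.tensor(array_name)
# ... and .float() casts it to torch.float32
tensor32 = tensor_name.float()
print(tensor_name.dtype, tensor32.dtype)

# A tensor that tracks gradients must be detached before conversion to NumPy
w = torch.ones(3, requires_grad=True)
y = (tensor32 * w).sum()             # scalar tensor attached to the graph
y_value = y.item()                   # plain Python float
w_numpy = w.detach().cpu().numpy()   # .cpu() changes nothing if already on the CPU
print(y_value, w_numpy)
```

Calling `w.numpy()` directly, without `detach()`, raises an error because `w` requires gradients.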

# Problem 1: Understanding the Data

For this problem set, we will use the Galaxy10 dataset made available via the astroNN module. This dataset is made up of 17736 images of galaxies which have been labelled by hand. See this [link](https://astronn.readthedocs.io/en/latest/galaxy10.html) for more information. 

First we will visualize our data.

**Problem 1a** Show one example of each class as an image.



In [None]:
from astroNN.datasets import load_galaxy10
from astroNN.datasets.galaxy10 import galaxy10cls_lookup

#helpful functions:
#Load the images and labels as numbers
images, labels_original = load_galaxy10()

#convert numbers to a string
galaxy10cls_lookup(labels_original[0])

# Plot an example image from each class
#### YOUR CODE HERE

**Problem 1b** Make a histogram showing the fraction of each class

Keep only the top two classes (i.e., the classes with the most galaxies)

In [None]:
# make a histogram of the fraction of each class 
#### YOUR CODE HERE

#Only work with 1 and 2
#### YOUR CODE HERE

# How many total galaxies are in class 1 + 2
#### YOUR CODE HERE - one line

# what is the shape of the images array
#### YOUR CODE HERE - one line

This next block of code converts the data to a format which is more compatible with our neural network.

In [None]:
torch.set_default_dtype(torch.float)

# we one-hot encode (see last lecture) the labels - this is just to make the labels work with torch
labels_top_two_one_hot = F.one_hot(torch.tensor(labels_top_two - np.min(labels_top_two)).long(), num_classes=2)

# convert the images to "Tensor" objects for torch, and make sure everything is a float
images_top_two = torch.tensor(images_top_two).float()
labels_top_two_one_hot = labels_top_two_one_hot.float()

# we're going to flatten the images for our MLP - i.e. make a 2D image 1D 
# this won't matter in this toy problem, but might for more complex problems
# it does have the great advantage of making your problem faster

images_top_two_flat = #### YOUR CODE HERE - one line

# if the images are 69x69x3 channels, what's the flattened image size?
#### YOUR CODE HERE - one line

# Normalize the flux of the images here
# this is standard - subtract mean, divide by standard deviation
# to evaluate the mean and standard deviation though, use torch instead of numpy, since we've converted
# to a tensor already

#### YOUR CODE HERE - one line
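
To see what these conversions do without touching the real data, here is a hedged sketch on tiny synthetic "images". All `toy_*` names and sizes are made up for illustration, and using one global mean and standard deviation is only one possible normalization choice.

```python
import numpy as np
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: 4 tiny "images" of 5x5 pixels with 3 channels, labels 1 and 2
toy_images = torch.rand(4, 5, 5, 3)
toy_labels = np.array([1, 2, 2, 1])

# Shift labels to start at 0, then one-hot encode: 1 -> [1, 0], 2 -> [0, 1]
toy_one_hot = F.one_hot(
    torch.tensor(toy_labels - toy_labels.min()).long(), num_classes=2
).float()

# Flatten every image to one row: (4, 5, 5, 3) -> (4, 75)
toy_flat = toy_images.reshape(toy_images.shape[0], -1)

# Standardize with a single global mean and standard deviation
toy_norm = (toy_flat - torch.mean(toy_flat)) / torch.std(toy_flat)

print(toy_one_hot)
print(toy_flat.shape, toy_norm.mean().item(), toy_norm.std().item())
```

After standardization the values have mean close to 0 and standard deviation close to 1, which tends to make gradient descent better behaved.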

**Problem 1c** Split the data into a training and test set (66/33 split) using the train_test_split function from sklearn

In [None]:
from sklearn.model_selection import train_test_split

#### YOUR CODE HERE - one line

# given the size you computed for the images above, and the number of total objects you have in class 1 and 2, 
# what do you expect for the size of the resulting tensor?

#### YOUR CODE HERE - one line

In [None]:
# verify that your images_train is the size you expect
#### YOUR CODE HERE - ONE LINE

In [None]:
# and quickly plot the training data to see how it's structured now 

#### YOUR CODE HERE - ONE LINE
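
If the shapes returned by `train_test_split` are unclear, this small sketch on synthetic arrays may help. The `toy_X` and `toy_y` arrays and the `random_state` value are assumptions made for this example only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 12 flattened "images" with 75 features each, plus one-hot labels
toy_X = np.random.default_rng(0).normal(size=(12, 75)).astype(np.float32)
toy_y = np.eye(2, dtype=np.float32)[np.arange(12) % 2]

# test_size=0.33 holds out ceil(0.33 * 12) = 4 samples; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    toy_X, toy_y, test_size=0.33, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

The number of columns is unchanged by the split; only the number of rows (objects) is divided between the two sets.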

The next cell outlines how to build an MLP with PyTorch.

**Problem 2a** Talk to a partner about how this code works, line by line. Add another hidden layer which is the same size as the first hidden layer.

This should help:
https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html

In [None]:
class MLP(torch.nn.Module):
      # this defines the model
        def __init__(self, input_size, hidden_size):
            super(MLP, self).__init__()
            print(input_size,hidden_size)
            self.input_size = input_size
            self.hidden_size  = hidden_size
            self.hiddenlayer = torch.nn.Linear(self.input_size, self.hidden_size)
            self.outputlayer = torch.nn.Linear(self.hidden_size, 2)
            self.sigmoid = torch.nn.Sigmoid()
            self.softmax = torch.nn.Softmax(dim=-1)  # normalize over the class dimension
            self.relu = torch.nn.ReLU()
            
        def forward(self, x):
            layer1 = self.hiddenlayer(x)
            activation = self.sigmoid(layer1)
            layer2 = self.outputlayer(activation)
            
            #### YOUR CODE TO ADD ANOTHER LAYER HERE
            # You need an activation function - it can be the same as the one you used in layer1
            # and an outputlayer 
            activation2 = 
            layer3 = 
            
            # and finally something non-linear again to get the outputlayer you just created
            # into your network's output
            output = #### YOUR CODE HERE 
            return output
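
When discussing the class with your partner, it can help to trace tensor shapes through the individual pieces. The sketch below is not the class above: it uses standalone layers with made-up sizes (75 inputs, 10 hidden units) purely to show what a linear layer, a sigmoid, and a softmax each do.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: a batch of 4 inputs with 75 features, 10 hidden units, 2 classes
x = torch.rand(4, 75)
hidden = torch.nn.Linear(75, 10)
out = torch.nn.Linear(10, 2)

h = torch.sigmoid(hidden(x))           # shape (4, 10), values in (0, 1)
scores = out(h)                        # shape (4, 2), unbounded real numbers
probs = torch.softmax(scores, dim=1)   # shape (4, 2), each row sums to 1

print(h.shape, scores.shape, probs.shape)
print(probs.sum(dim=1))
```

The final softmax turns the raw scores into per-class probabilities, which is what a loss such as the binary cross-entropy expects.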

The next block of code will show how one can train the model for 100 epochs. Note that we use the *binary cross-entropy* as our objective function and *stochastic gradient descent* as our optimization method.

**Problem 2b** Edit the code so that the function plots the training and test loss for each epoch.

In [None]:
# train the model
def train_model(training_data,training_labels, test_data,test_labels, model):
    # define the optimization
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    criterion = torch.nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.007,momentum=0.9)
    for epoch in range(100):
        # clear the gradient
        optimizer.zero_grad()
        # compute the model output
        myoutput = model(training_data)
        # calculate loss
        loss = criterion(myoutput, training_labels)
        # credit assignment
        loss.backward()
        # update model weights
        optimizer.step()

        #### YOUR CODE HERE
        # first evaluate the model on the test data
        output_test = #### YOUR CODE HERE - one line
        
        # next compute the loss (in this case the binary cross-entropy) given the predicted output and the 
        # test labels
        loss_test = #### YOUR CODE HERE - one line
        
        # finally plot the loss for the training and test set (black and red)
        # the loss is a tensor, so see the note at the top before you convert it to a numpy object
        #### YOUR CODE HERE
        
        # here's a hint - this is going to print the loss at each epoch
        print(epoch,loss.detach().numpy())
    plt.show()
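
To make the objective function concrete, here is a small sketch that writes out the binary cross-entropy by hand and compares it with `torch.nn.BCELoss`. The probabilities `p` and targets `t` are invented for this example.

```python
import torch

# Hypothetical predicted probabilities and one-hot targets for 3 objects, 2 classes
p = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
t = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Binary cross-entropy written out: mean over all elements of
# -[t * log(p) + (1 - t) * log(1 - p)]
manual = -(t * torch.log(p) + (1 - t) * torch.log(1 - p)).mean()

# Same quantity from the built-in criterion (default reduction is the mean)
builtin = torch.nn.BCELoss()(p, t)

print(manual.item(), builtin.item())
```

The two numbers should agree to floating-point precision. The third object is misclassified, so it contributes the largest share of the loss.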


The next block trains the model, assuming a hidden layer size of 100 neurons.

**Problem 2c** Change the learning rate `lr` to minimize the cross entropy score

In [None]:
model = MLP(np.shape(images_train[0])[0],100)
train_model(images_train, labels_train, images_test, labels_test, model)

Write a function called `evaluate_model` which takes the image data, labels, and model as input and returns the accuracy. You can use the `accuracy_score` function.

In [None]:
# evaluate the model
def evaluate_model(data,labels, model):
    #### YOUR CODE HERE
    return acc

# evaluate the model
acc = evaluate_model(images_test,labels_test, model)
print('Accuracy: %.3f' % acc)

**Problem 2d** Make a confusion matrix for the test set. You can use the `confusion_matrix` and `ConfusionMatrixDisplay` functions we imported from sklearn.

In [None]:
# evaluate the model on the test images
#### YOUR CODE HERE - one line

# convert your predictions back to a numpy object
#### YOUR CODE HERE - one line

# you can use numpy's argmax with an axis argument to get the maximum prediction for each row/image
#### YOUR CODE HERE - one line each for truth and prediction
# I'm calling the variables truth and best_class

# the confusion_matrix object just takes truth and prediction as arguments
cm = confusion_matrix(truth, best_class)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
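
The step from one-hot rows to class indices is easy to get wrong, so here is a hedged sketch on invented numbers. The `toy_*` arrays are placeholders and not the network's actual output.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical network outputs (probabilities) and one-hot truth for 5 objects
toy_pred = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9], [0.7, 0.3]])
toy_true = np.array([[1, 0], [0, 1], [0, 1], [0, 1], [1, 0]])

# argmax along axis=1 picks the most probable class in each row
toy_best_class = np.argmax(toy_pred, axis=1)   # [0, 1, 0, 1, 0]
toy_truth = np.argmax(toy_true, axis=1)        # [0, 1, 1, 1, 0]

print(accuracy_score(toy_truth, toy_best_class))
print(confusion_matrix(toy_truth, toy_best_class))
```

In the confusion matrix, rows are the true classes and columns are the predicted classes, so off-diagonal entries count misclassified objects.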

**Challenge Problem 3** Add a third class to your classifier and begin accounting for uneven classes. There are several steps to this:

1. Edit the neural network to output 3 classes
2. Change the criterion to a *custom criterion* function, such that the entropy of each class is weighted by the inverse fraction of each class size (e.g., if the galaxy class breakdowns are 1:2:3, the weights would be 6:3:2).

This is basically a copy and paste of all of the above, with light modifications.
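
As a starting point for step 2, here is one possible reading of a class-weighted criterion, sketched on the 1:2:3 example from the text. The function name `weighted_cross_entropy`, its `eps` guard, and the toy inputs are all assumptions for illustration; other weighting schemes would also satisfy the problem statement.

```python
import torch

# Class counts in the ratio 1:2:3 (the example from the text; real counts will differ)
counts = torch.tensor([1.0, 2.0, 3.0])
fractions = counts / counts.sum()
weights = 1.0 / fractions            # inverse fractions, in the ratio 6 : 3 : 2

def weighted_cross_entropy(probs, targets, class_weights, eps=1e-12):
    """Hypothetical custom criterion: mean over objects of -sum_k w_k * t_k * log(p_k)."""
    per_object = -(class_weights * targets * torch.log(probs + eps)).sum(dim=1)
    return per_object.mean()

# Toy softmax outputs and one-hot targets for 3 objects, one per class
probs = torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
targets = torch.eye(3)

loss = weighted_cross_entropy(probs, targets, weights)
print(weights, loss.item())
```

Because the rarest class gets the largest weight, mistakes on it are penalized more heavily, which counteracts the tendency to favor the majority class.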