<a href="https://colab.research.google.com/github/spetryk/ai4all2020/blob/master/3-Detecting_Fake_Brain_Scans.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
## For Google Colab: clone the course repository and create the data directory
! git clone https://github.com/spetryk/ai4all2020.git
%cd ai4all2020/
%mkdir data

In [None]:
%matplotlib inline

# Functions for training neural network
from tools import *
from scans_utils import MRIDataset
import torch
import torch.optim as optim
import torch.nn as nn
from torch.utils.data import DataLoader
import os


# Functions for visualizations
import torchvision
import torchvision.utils as vutils
import matplotlib.pyplot as plt
import numpy as np
from tqdm.notebook import tqdm

import warnings
warnings.filterwarnings("ignore")

# Loading the Dataset

The dataset we're using consists of real and fake MRI brain scans. The real brain scans come from a dataset called Brainomics, downloadable here: https://osf.io/vhtf6/files/. To make the fake brain scans, I trained a GAN much like we did yesterday.

When you run the next cell for the first time, it will take some time to download the ~1GB dataset.

Given a `dataset` variable, you can access the ith image and its label with:
```
image, label = dataset[i]
```

You can compute the number of data points in a dataset by calling `len(dataset)`, and if you want to see what an `image` looks like, you can run
```
plt.matshow(image)
plt.show()
```
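
Indexing and `len` work on a dataset because the class defines `__getitem__` and `__len__`. The sketch below uses a hypothetical `ToyDataset` with made-up 2x2 "images"; it only illustrates the idea and is not how `MRIDataset` is actually implemented.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset of (image, label) pairs."""

    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        # called by len(dataset)
        return len(self.images)

    def __getitem__(self, i):
        # called by dataset[i]
        return self.images[i], self.labels[i]


# two made-up 2x2 images; label 1 means "real", label 0 means "fake"
toy_dataset = ToyDataset(
    images=[[[0.0, 1.0], [1.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]],
    labels=[1, 0],
)

image, label = toy_dataset[1]
n_points = len(toy_dataset)
print("number of data points:", n_points)
print("label of image 1:", label)
```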

In [None]:
train_dataset = MRIDataset("data", train=True)
train_loader = DataLoader(train_dataset, batch_size=64)

val_dataset = MRIDataset("data", train=False)
val_loader = DataLoader(val_dataset, batch_size=64)

# Exercise 1: Inspecting the Data

It's always important to understand what your dataset looks like, since many real-world datasets have oddities that can affect your machine learning algorithms. Here are a few simple things you might want to look into:

* How many MRI images are in the dataset?
* How many images are in the training set, and how many are in the validation set? Is this a good ratio for this dataset, and why might we want to use a different ratio?
* A dataset has a balanced class distribution if there are the same number of images in each class (here, real and fake). Unbalanced datasets are more difficult to train on, since the model can learn how to do well on only the majority class. Is this dataset balanced?
* If we use a classifier that always guesses "fake" no matter what image you present it with, what percent accuracy will that classifier get? What if the classifier guesses randomly between "real" and "fake"?
* What is the mean pixel value of the real images? What is the mean pixel value of the fake images?
* If we use a classifier that distinguishes between real and fake images by just looking at the mean pixel value of the image, what accuracy will that classifier get?
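
As a starting point for the class-balance questions above, here is a minimal sketch of how you might count labels and work out the accuracy of a constant guess. The `toy_labels` list is made up for illustration and does not reflect the real dataset; to answer the questions you would collect the labels from `train_dataset` instead.

```python
from collections import Counter

# Hypothetical labels for illustration only (1 = real, 0 = fake).
# These are NOT the class counts of the actual MRI dataset.
toy_labels = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

counts = Counter(toy_labels)
n_total = len(toy_labels)

# A classifier that always guesses "fake" is correct exactly on the fake images.
always_fake_accuracy = counts[0] / n_total

# The best constant guess is whichever class is more common.
majority_accuracy = max(counts.values()) / n_total

print("class counts:", dict(counts))
print("always-'fake' accuracy: {:.0%}".format(always_fake_accuracy))
print("majority-class accuracy: {:.0%}".format(majority_accuracy))
```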

You may notice that some of the fake images in this dataset do not look very realistic. This is because I trained a single GAN without fiddling with it much, and it only trained for about one day on cheap hardware. As you saw in the previous notebook, GANs can produce highly realistic images if trained with enough computational resources.

In [None]:
image, label = train_dataset[1]
print("Label: {} (i.e. '{}')".format(label, "real" if label else "fake"))
plt.matshow(image)
plt.show()

image, label = train_dataset[11]
print("Label: {} (i.e. '{}')".format(label, "real" if label else "fake"))
plt.matshow(image)
plt.show()

# Exercise 2: Build your model
A neural network consists of a sequence of layers: the input (in this case an image that we need to detect as real or fake) passes to the first layer, and then the output of the first layer is used as the input of the second layer, etc. At the very end, the last layer should output a single number that indicates the model's guess: a number close to 0 means the model thinks the image is fake, and a number close to 1 means the model thinks the image is real.

There are many different types of neural network layers. The simplest is called a *linear* layer. A linear layer multiplies every number in the input by some weight, which is adjusted during training, and then it sums up the result of all those multiplications (PyTorch's `nn.Linear` also adds a learned bias term to that sum by default). We want the model to predict something close to 0 to indicate "fake," and something close to 1 to indicate "real." However, the output of a linear layer can be any number. Therefore, we use a "sigmoid" function to squish the linear layer's output into the range (0,1). A classifier with a single linear layer can be created in PyTorch as follows:
```
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(image_shape[0] * image_shape[1], 1),
                      nn.Sigmoid())
```
where `image_shape` is the shape, in pixels, of an individual image. Try out this model, or modify it in any way you like.
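
To make the arithmetic concrete, here is a plain-Python sketch of what flatten, linear, and sigmoid compute for one tiny image. The 2x2 image, the weights, and the bias are made-up numbers chosen for illustration; in the real model the weights are learned and there is one weight per pixel.

```python
import math


def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


# hypothetical 2x2 "image" and hand-picked parameters (assumptions, not learned values)
image = [[0.0, 0.5],
         [1.0, 0.25]]
weights = [0.2, -0.4, 0.6, 0.8]
bias = -0.1

# flatten: turn the 2D grid into one long list of pixels (the role of nn.Flatten)
flat = [pixel for row in image for pixel in row]

# linear: weighted sum of the pixels plus a bias (the role of nn.Linear with one output)
z = sum(w * p for w, p in zip(weights, flat)) + bias

# sigmoid: convert the raw score into a number between 0 and 1 (the role of nn.Sigmoid)
prediction = sigmoid(z)

print("raw score:", z)
print("prediction:", prediction)
```

A prediction above 0.5 would be read as "real" and one below 0.5 as "fake".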

In [None]:
# image_shape is the number of pixels wide and tall each scan is
image_shape = (120, 64)

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(image_shape[0] * image_shape[1], 1),
                      nn.Sigmoid())

# set up the loss function and optimizer
calculate_loss = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Exercise 3: Fill in the Evaluation Loop
This function measures the performance of the input model on the input dataset. Fill in the TODOs.
To calculate the loss function, you can use a function called `calculate_loss`, which takes the model's predictions, as well as the ground truth labels. For example, if the model predicts `[0.2, 0.8, 0.9]` for three real images, you could compute the loss by running
```
loss = calculate_loss(torch.tensor([0.2, 0.8, 0.9]), torch.tensor([1.0, 1.0, 1.0]))
```
To figure out how to compute the correct loss values, keep in mind that `calculate_loss` will return a small value (i.e. the model is doing well) if the predictions and the true labels match.
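
For intuition about why matching predictions give a small loss, here is a plain-Python sketch of the binary cross-entropy formula, averaged over the batch as `nn.BCELoss` does by default. The helper name `binary_cross_entropy` is made up for this sketch; in the notebook you would call `calculate_loss` on tensors instead.

```python
import math


def binary_cross_entropy(predictions, labels):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch (hypothetical helper)."""
    total = 0.0
    for p, y in zip(predictions, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predictions)


predictions = [0.2, 0.8, 0.9]

# the same predictions scored against two different sets of true labels
loss_if_all_real = binary_cross_entropy(predictions, [1, 1, 1])
loss_if_all_fake = binary_cross_entropy(predictions, [0, 0, 0])

print("loss when all three images are real:", loss_if_all_real)
print("loss when all three images are fake:", loss_if_all_fake)
```

The predictions are mostly close to 1, so the loss is lower when the true labels are all 1 than when they are all 0.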

In [None]:
def evaluate(model, data_loader):
    n_correct = 0
    n_images = 0
    losses = []
    for images, labels in data_loader:
        labels = labels.float()
        
        predictions =               # TODO use the model to make a prediction on images
        loss =                      # TODO calculate the loss on this batch
        
        # save the loss from this iteration through the loop
        losses.append(loss.item())
        predictions = predictions.view(-1)
        
        number_correct_this_round = # TODO count how many images the model got correct

        # keep track of the total number of images the model got correct, and the total number of images
        n_correct += number_correct_this_round.item()
        n_images += labels.numel()
    print("Average Loss:", np.mean(losses))
    # calculate the accuracy as <number correct> / <total number of images>
    print("Accuracy: {:3.2f}%".format(n_correct / n_images * 100))

# Exercise 4: Fill in the Training Loop
As in the evaluation loop, we need to compute the model's predictions as well as the loss on each batch of data. The loss is then used to improve the model (this happens in `loss.backward()` and `optimizer.step()`). The training loop steps through all of the training data, improving the model at every iteration. It does this many times -- in this case for 10 "epochs."
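
To show what "using the loss to improve the model" means, here is a plain-Python sketch of gradient descent on a one-weight classifier. The toy data, the learning rate, and the hand-written gradient are assumptions for illustration. In the notebook, `loss.backward()` computes the gradients automatically, and the Adam optimizer applies a more elaborate update than the plain step shown here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# hypothetical toy data: one number per example, labeled 1 when it is positive
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0]

w, b = 0.0, 0.0          # model parameters, starting from zero
learning_rate = 0.5      # assumed value for this sketch
loss_history = []

for epoch in range(10):
    # forward pass: the model's predictions on every example
    preds = [sigmoid(w * x + b) for x in xs]

    # mean binary cross-entropy loss over the examples
    loss = sum(-(y * math.log(p) + (1 - y) * math.log(1 - p))
               for p, y in zip(preds, ys)) / len(xs)
    loss_history.append(loss)

    # gradients of the loss with respect to w and b
    # (this is the part loss.backward() does for you in PyTorch)
    grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)

    # nudge the parameters in the direction that lowers the loss
    # (a plain gradient-descent version of optimizer.step())
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("first loss:", loss_history[0])
print("last loss:", loss_history[-1])
```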

In [None]:
# recall that the detector model is in a variable called `model`
step = 0
for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()
        labels = labels.float()
        
        predictions = # TODO predict whether the brain scans in the images variable are real or fake
        loss =        # TODO compute the loss on this batch
        
        # update the model by tweaking it to improve the loss a little bit
        loss.backward()
        optimizer.step()
        
        # print out the loss every once in a while
        if step % 100 == 0:
            print("loss:", loss.item())
        step += 1
    # see how well the model does on the validation dataset
    evaluate(model, val_loader)

# Exercise 5: Analyze the Trained Model
It's always a good idea to take a look at the trained model in order to understand where it does well and where it does poorly. Here are a few questions you might consider:
* What is the final accuracy of the model on the validation data? What is the final accuracy of the model on the training data? Why is there a discrepancy?
* What is the accuracy of the model on real images in the validation data? What about the accuracy on fake images? If these numbers are very different, why might that be?
* What are some real images that the model classified as fake? Why do you think they were classified as fake? What about fake images that the model classified as real? Do those images look particularly realistic to you?
* What might you change about the training procedure to improve performance? Should you train for more epochs? Larger learning rate? Different batch size?
* What might you change about the model to improve performance? Should you add more layers? Different layers (what about convolutions?)
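
For the per-class accuracy question above, here is a minimal sketch of splitting accuracy into "real" and "fake" parts. The predictions and labels are made up, and the 0.5 threshold is an assumption about how to turn a prediction into a guess; with the real model you would collect predictions from `val_loader` instead.

```python
# hypothetical model outputs and true labels (1 = real, 0 = fake)
toy_predictions = [0.9, 0.8, 0.3, 0.6, 0.2, 0.6, 0.7, 0.4]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# turn each prediction into a hard guess using an assumed threshold of 0.5
guesses = [1 if p > 0.5 else 0 for p in toy_predictions]


def accuracy_for_class(cls):
    # look only at the images whose true label is `cls`
    pairs = [(g, y) for g, y in zip(guesses, toy_labels) if y == cls]
    return sum(g == y for g, y in pairs) / len(pairs)


real_accuracy = accuracy_for_class(1)
fake_accuracy = accuracy_for_class(0)
overall_accuracy = sum(g == y for g, y in zip(guesses, toy_labels)) / len(toy_labels)

print("accuracy on real images: {:.1%}".format(real_accuracy))
print("accuracy on fake images: {:.1%}".format(fake_accuracy))
print("overall accuracy: {:.1%}".format(overall_accuracy))
```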