# Intro to image classification with PyTorch

**Make sure you look at [`Intro to image classification`](Intro_to_image_classification.ipynb) before coming here.**

We'll use PyTorch on its own in this notebook. The accompanying notebook, [`Intro to image classification with skorch`](Intro_to_image_classification_with_skorch.ipynb), shows how some helper libraries can simplify the workflow.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

## The fossil dataset

Let's generate a workflow to classify images using a CNN.
We'll make use of a collection of functions in `utils.py` to help process the images found in the `data/fossils` folder.

In [None]:
X = np.load('../data/fossils/X.npy')
y = np.load('../data/fossils/y.npy')

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.15, random_state=42)

In [None]:
X_train.shape

In [None]:
plt.imshow(X_train[1].reshape(32,32))
plt.colorbar()

## Augmentation

Neural networks like a lot of data. Augmentation increases the size of the dataset without having to collect more examples, by adding modified copies of the records we already have.

For example, let's flip the image above:

In [None]:
img = X_train[1].reshape(32,32)

flipped = np.flip(img, axis=1)

plt.imshow(flipped)

In [None]:
from scipy.ndimage import zoom

cropped = zoom(flipped, 1.1)

cropped = cropped[1:-2, 1:-2]

plt.imshow(cropped)

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">
<div class="alert alert-success">
<h3>Exercise</h3>

- Write a function to randomly flip and crop each record in `X_train`. (It's okay to use a loop for this.)
- Add your new flipped records to `X_train`, and their labels to `y_train`.
</div>
</div>

In [None]:
# YOUR CODE HERE



In [None]:
X_train, y_train = augment(X_train, y_train)

In [None]:
plt.imshow(X_train[499].reshape(32, 32))

In [None]:
X_train.shape

In [None]:
y_train[499]

## `sklearn.neural_network`

We'll first train a fully connected network. This requires the images to be 1D vectors, like the ones we have, but this means we'll lose some of the 2D spatial properties... Until we use a convolutional neural network!

See the notebook [Intro to image classification](Intro_to_image_classification.ipynb).

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

clf = MLPClassifier(hidden_layer_sizes=[100, 24], max_iter=500)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_val)
print(classification_report(y_val, y_pred))

We'll start by replicating this fully connected network in PyTorch.

## The `pytorch` approach

We'll need to encode the target variable so that the classes are represented by integers. We can use scikit-learn's `LabelEncoder` for that:

In [None]:
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(np.append(y_train, y_val))

y_train = encoder.transform(y_train)
y_val = encoder.transform(y_val)

In [None]:
y_val

Now we can define a model as a subclass of `nn.Module` and train it.

In [None]:
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

Define the architecture of the network:

In [None]:
class FossilNet(torch.nn.Module):
    def __init__(self):
        super(FossilNet, self).__init__()
        self.fc1 = nn.Linear(1024, 100)
        self.act1 = nn.ReLU()
        self.fc2 = nn.Linear(100, 24)
        self.act2 = nn.ReLU()
        self.out = nn.Linear(24, 3)
        # nb Criterion includes softmax.
        
    def forward(self, x):
        z1 = self.fc1(x)
        a1 = self.act1(z1)
        z2 = self.fc2(a1)
        a2 = self.act2(z2)
        z3 = self.out(a2)
        return z3

model = FossilNet().to(device)  # Keep the model on the same device as the data.

In [None]:
model

Now define the loss function, which Torch calls the 'criterion', and the optimizer:

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(),
                            lr = 0.003,
                            weight_decay=0.01,  # L2 regularization.
                            momentum=0.9,
                           )
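The comment in `FossilNet` notes that the criterion includes the softmax. As a minimal sketch with made-up logits and labels (the tensors below are illustrative, not taken from the fossil data), `nn.CrossEntropyLoss` should give the same value as applying `log_softmax` and then the negative log-likelihood loss:

```python
import torch
from torch import nn
import torch.nn.functional as F

# Made-up raw network outputs (logits) for 4 records and 3 classes.
logits = torch.tensor([[ 2.0, 0.5, -1.0],
                       [ 0.1, 0.2,  0.3],
                       [-1.5, 3.0,  0.0],
                       [ 0.0, 0.0,  0.0]])
targets = torch.tensor([0, 2, 1, 1])  # Integer class labels.

# The criterion applied directly to the logits.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same thing in two explicit steps: log-softmax, then NLL.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```

This is why the network's last layer is a plain `nn.Linear` with no activation.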

Get the data ready for Torch:

In [None]:
X_train_ = torch.Tensor(X_train).to(device)
y_train_ = torch.Tensor(y_train).type(torch.LongTensor).to(device)
X_val_ = torch.Tensor(X_val).to(device)
y_val_ = torch.Tensor(y_val).type(torch.LongTensor).to(device)

Now we can write the training loop:

In [None]:
epochs = 500
vals, trains = [], []
idx = np.arange(0, y_train.size)

for epoch in range(epochs):
    np.random.shuffle(idx)
    X_train_ = X_train_[idx]
    y_train_ = y_train_[idx]
    
    # Train.
    model.train()
    optimizer.zero_grad()
    y_pred = model(X_train_)  # No batches.
    loss = criterion(y_pred, y_train_)  
    loss.backward()
    optimizer.step()
    
    # Capture training loss.
    print(f"Epoch {epoch}/{epochs}: train loss: {loss.item():.3f}")
    trains.append(loss.item())

    # Capture validation loss.
    model.eval()
    with torch.no_grad():
        y_pred = model(X_val_)
        loss = criterion(y_pred, y_val_)    
        vals.append(loss.item())
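The loop above passes the whole training set through the network in one step ("No batches"). A common alternative is mini-batch training. The following is only a sketch on random stand-in data: the names `X_demo`, `y_demo`, `demo_model` and the batch size of 16 are assumptions for illustration, not part of this notebook's workflow.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data: 64 flattened 32x32 'images' and 3 classes.
X_demo = torch.rand(64, 1024)
y_demo = torch.randint(0, 3, (64,))

demo_model = nn.Sequential(
    nn.Linear(1024, 100), nn.ReLU(),
    nn.Linear(100, 24), nn.ReLU(),
    nn.Linear(24, 3),
)
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.003, momentum=0.9)

# The DataLoader reshuffles and splits the data into batches every epoch.
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=16, shuffle=True)

batch_losses = []
for epoch in range(3):
    demo_model.train()
    for X_batch, y_batch in loader:
        demo_optimizer.zero_grad()
        loss = demo_criterion(demo_model(X_batch), y_batch)
        loss.backward()
        demo_optimizer.step()  # One weight update per batch.
        batch_losses.append(loss.item())

print(f"{len(batch_losses)} weight updates in 3 epochs")
```

With batches, the weights are updated several times per epoch rather than once, and the `DataLoader` takes care of the shuffling.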

And inspect the history:

In [None]:
plt.plot(trains, label='Training loss')
plt.plot(vals, label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

[Validation loss lower than training loss?](https://twitter.com/aureliengeron/status/1110839223878184960)

This can happen for a few reasons:

- The training loss is measured during the epoch, while validation loss is measured after it. So the model used in validation is a bit better.
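- In frameworks that add the regularization penalty to the loss, the training loss includes it whereas the validation loss does not. (Here, `weight_decay` acts inside the optimizer step, so the training loss we record does not include the L2 penalty.)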
- The validation data might be more predictable than the training data.

## Evaluation

In [None]:
y_out = model(X_val_).detach().cpu().numpy()  # Move to the CPU before converting to NumPy.

But these are not probabilities:

In [None]:
np.sum(y_out, axis=-1)

In [None]:
from scipy.special import softmax

y_prob = softmax(y_out, axis=-1)

np.sum(y_prob, axis=-1)

That's better!
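For reference, softmax exponentiates each output and divides by the sum over the row, so every row is positive and sums to 1. A minimal NumPy sketch, checked against `scipy.special.softmax` (the function name `softmax_rows` and the sample array are illustrative only):

```python
import numpy as np
from scipy.special import softmax

def softmax_rows(z):
    """Row-wise softmax of a 2D array of raw network outputs."""
    z = np.asarray(z, dtype=float)
    # Subtracting the row maximum does not change the result,
    # but avoids overflow in np.exp for large outputs.
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

sample = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
probs = softmax_rows(sample)

print(probs.sum(axis=-1))
print(np.allclose(probs, softmax(sample, axis=-1)))
```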

Now we can find the argmax for each record:

In [None]:
y_pred = np.argmax(y_prob, axis=-1)

In [None]:
print(classification_report(y_val, y_pred))

## Class probability

The network can emit probabilities. Each instance's vector contains the probability of each class. The argmax of this gives the predicted class.

In our poor result, the classes are almost equally likely.

In [None]:
import utils

utils.visualize(X_val, y_val, y_prob,
                ncols=5, nrows=3,
                shape=(32, 32),
                classes=encoder.classes_)
plt.show()

## Convolution

Convolutional networks replace the weights with kernels, and the multiplication step with convolution.

Let's see what convolution can do to an image.

In [None]:
plt.imshow(img)

In [None]:
kernel = np.array([[-1, 0, 1],   # Sobel edge detector
                   [-2, 0, 2],
                   [-1, 0, 1]])

plt.imshow(kernel)

In [None]:
from scipy.signal import convolve2d

attr = convolve2d(img, kernel.T, mode='valid')

plt.imshow(attr)
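To see what `convolve2d` computes in `'valid'` mode, here is a sketch of the same operation written as explicit loops. The helper name `convolve_valid` and the random test image are assumptions for illustration. Note that `convolve2d` flips the kernel before sliding it over the image, whereas PyTorch's `nn.Conv2d` slides the kernel without flipping (cross-correlation).

```python
import numpy as np
from scipy.signal import convolve2d

def convolve_valid(image, kernel):
    """2D convolution over positions where the kernel fits entirely."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # True convolution flips the kernel.
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * flipped)
    return out

rng = np.random.default_rng(42)
demo_img = rng.random((32, 32))
sobel = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]])

manual = convolve_valid(demo_img, sobel)
reference = convolve2d(demo_img, sobel, mode='valid')

print(manual.shape, np.allclose(manual, reference))
```

A 3 × 3 kernel with no padding trims one pixel from each edge, so a 32 × 32 image becomes 30 × 30.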

Here's a nice resource on ConvNets: https://cs231n.github.io/convolutional-networks/

## A convolutional neural network

In [None]:
class FossilCNN(torch.nn.Module):
    def __init__(self):
        super(FossilCNN, self).__init__()

        self.conv1 = nn.Conv2d(1, 24, (3, 3), padding=0)
        self.act1 = nn.ReLU()
        self.bn1 = nn.BatchNorm2d(24)

        self.conv2 = nn.Conv2d(24, 8, (3, 3), padding=0)
        self.act2 = nn.ReLU()
        self.bn2 = nn.BatchNorm2d(8)

        self.fc = nn.Linear(8 * 28 * 28, 3)
        
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.act1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.act2(x)
        x = self.bn2(x)
        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        return x

model = FossilCNN().to(device)  # Keep the model on the same device as the data.

In [None]:
model
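The `8 * 28 * 28` in the final `nn.Linear` layer comes from the shape of the data after the two convolutions: each 3 × 3 convolution with `padding=0` removes 2 pixels from each dimension, so 32 becomes 30, then 28, with 8 channels at the end. A small sketch of that arithmetic, cross-checked with a dummy batch (the helper `conv_output_size` is a hypothetical name, not part of PyTorch):

```python
import torch
from torch import nn

def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Output length along one dimension of a convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 32
size = conv_output_size(size, kernel_size=3)  # After conv1.
size = conv_output_size(size, kernel_size=3)  # After conv2.
n_features = 8 * size * size                  # 8 output channels.

# Cross-check by pushing one blank image through the same two layers.
layers = nn.Sequential(
    nn.Conv2d(1, 24, (3, 3), padding=0),
    nn.Conv2d(24, 8, (3, 3), padding=0),
)
with torch.no_grad():
    out = layers(torch.zeros(1, 1, 32, 32))

print(size, n_features, tuple(out.shape))
```

If you change the kernel size, padding or number of channels, the input size of `self.fc` has to change to match.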

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(),
                            lr = 0.003,
                            weight_decay=0.01,  # L2 regularization.
                            momentum=0.9,
                           )

In [None]:
X_train_ = torch.Tensor(X_train.reshape(-1, 1, 32, 32)).to(device)
y_train_ = torch.Tensor(y_train).type(torch.LongTensor).to(device)
X_val_ = torch.Tensor(X_val.reshape(-1, 1, 32, 32)).to(device)
y_val_ = torch.Tensor(y_val).type(torch.LongTensor).to(device)

In [None]:
epochs = 100
vals, trains = [], []
idx = np.arange(0, y_train.size)

for epoch in range(epochs):
    np.random.shuffle(idx)
    X_train_ = X_train_[idx]
    y_train_ = y_train_[idx]
    
    # Train.
    model.train()
    optimizer.zero_grad()
    y_pred = model(X_train_)  # No batches.
    loss = criterion(y_pred, y_train_)  
    loss.backward()
    optimizer.step()
    
    # Capture training loss.
    print(f"Epoch {epoch}/{epochs}: train loss: {loss.item():.3f}")
    trains.append(loss.item())

    # Capture validation loss.
    model.eval()
    with torch.no_grad():
        y_pred = model(X_val_)
        loss = criterion(y_pred, y_val_)    
        vals.append(loss.item())

In [None]:
plt.plot(trains, label='Training loss')
plt.plot(vals, label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

## Evaluation

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">
Can you evaluate this model? Write a function to handle everything. You will need to:

- Compute the model output to make `y_out` (don't forget to detach the tensor).
- Use the `softmax` function to turn the output into probabilities, `y_prob`.
- Get the argmax of the probabilities to make `y_pred`.
- Return `y_prob` and `y_pred`.
- Print a classification report.
</div>

In [None]:
def predict(X, model):
    """Use `model` to predict on `X`."""
    X = torch.Tensor(X.reshape(-1, 1, 32, 32)).to(device)
    # YOUR CODE HERE
    
    
    return y_prob, y_pred

In [None]:
y_prob, y_pred = predict(X_val, model)  # Refresh the predictions using the CNN.

utils.visualize(X_val, y_val, y_prob,
                ncols=5, nrows=3,
                shape=(32, 32),
                classes=encoder.classes_
               )

## The kernels

In [None]:
w1 = model.conv1.weight.detach().cpu().numpy()
w1.shape

In [None]:
fig, axs = plt.subplots(nrows=3, ncols=8, figsize=(12, 6))
for w, ax in zip(w1, axs.ravel()):
    ax.imshow(np.sum(w, axis=0))
    ax.axis('off')

In [None]:
w2 = model.conv2.weight.detach().cpu().numpy()

fig, axs = plt.subplots(nrows=1, ncols=8, figsize=(12, 3))
for w, ax in zip(w2, axs.ravel()):
    ax.imshow(np.sum(w, axis=0))
    ax.axis('off')

## Model persistence and future inference

The easiest way to save a model is with `torch.save`, but `state_dict` is just an `OrderedDict` so you can do anything you want with it.

In [None]:
torch.save(model.state_dict(), './fossilnet.pt')
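To illustrate what a `state_dict` holds, here is a sketch using a tiny stand-in model (`demo` and `clone` are illustrative names, not the fossil network). It lists the parameter names and shapes, then copies the parameters into a second model with the same architecture:

```python
from collections import OrderedDict

import torch
from torch import nn

# A tiny stand-in model; only layers with parameters appear in the state dict.
demo = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
state = demo.state_dict()

print(isinstance(state, OrderedDict))
for name, tensor in state.items():
    print(name, tuple(tensor.shape))

# The state dict holds parameters only, so we need a model with the
# same architecture to load them into.
clone = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
clone.load_state_dict(state)
```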

Later, you or someone else can load it. Note that you need to instantiate the model first; the state dictionary does not contain the architecture.

In [None]:
model = FossilCNN()
model.load_state_dict(torch.load('./fossilnet.pt', map_location=device))
model.to(device)
model.eval()

In [None]:
from PIL import Image
import io
import requests

url = "https://www.treasuremountainmining.com/image/cache/data/2017/08-17/Adam30/EB0817AMMOR4-650x650.jpg"
r = requests.get(url)
img = Image.open(io.BytesIO(r.content))
img

In [None]:
img = img.convert(mode='L')
img.thumbnail(size=(32, 32))
img

In [None]:
ima = np.asarray(img) / 255
ima.shape

In [None]:
# predict() handles the reshaping and the conversion to a tensor.
y_prob, y_pred = predict(ima, model)

print(f"Class {encoder.classes_[y_pred].item().upper()} with p={np.max(y_prob):.3f}")

---

&copy; 2020 Agile Scientific