# Intro to PyTorch for regression tasks

We'll use PyTorch together with the supporting libraries `torchlayers` and `skorch` to train a regressor that maps synthetic acoustic seismic waveforms to their corresponding velocity profiles. The dataset was put together by Lukas Mosser and is hosted on GitHub: [https://github.com/LukasMosser/SNIST](https://github.com/LukasMosser/SNIST)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import datetime
import os

In [None]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
urls = [
        'https://raw.githubusercontent.com/LukasMosser/SNIST/master/data/train/train_amplitudes.npy',
        'https://raw.githubusercontent.com/LukasMosser/SNIST/master/data/train/train_velocities.npy',
        'https://raw.githubusercontent.com/LukasMosser/SNIST/master/data/test/test_amplitudes.npy',
        'https://raw.githubusercontent.com/LukasMosser/SNIST/master/data/test/test_velocities.npy',
        'https://raw.githubusercontent.com/LukasMosser/SNIST/master/data/test/test_amplitudes_noise_1.npy',
        'https://raw.githubusercontent.com/LukasMosser/SNIST/master/data/test/test_amplitudes_noise_2.npy'
    ]

NumPy lets you point at URL data sources. It takes care of downloading the files and keeping track of where they are stored relative to a root folder specified by the user.

In [None]:
ds = np.DataSource('../data/')

train_amplitudes = np.load(ds.open(urls[0], mode='rb'))
train_velocities = np.load(ds.open(urls[1], mode='rb'))

In [None]:
plt.imshow(train_amplitudes[0], aspect=0.06)

In [None]:
plt.plot(train_velocities[0])

Let's define some dataset parameters. Note that these come from the [SNIST](https://github.com/LukasMosser/SNIST) properties.

In [None]:
N_train = 600    # Number of total training examples
N_val = 150      # Number of samples used for validation
N_samples = 271  # Number of samples in time
N_recorders = 20 # Number of recording stations
N_target = 9     # Number of layers in the target velocity model
N_z = 360        # Number of grid blocks in z-dimension (only used for visualisation)

Now some neural network parameters:

In [None]:
lr = 1e-2                  # Learning rate
batch_size = N_train-N_val # Batch size for training: the data is small, so the whole training set fits in one batch
N_epochs = 200             # Number of epochs to train for

In [None]:
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

In [None]:
# Number of neurons in hidden layer
n_hidden = 50

# Setup a sequential network
model = nn.Sequential(
            nn.Linear(N_recorders*N_samples, n_hidden),
            nn.Sigmoid(),
            nn.Linear(n_hidden, N_target),
        )

model.to(device)
print(model)
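
As a quick sanity check, the sketch below rebuilds the same two-layer network and pushes a dummy batch through it to confirm the output shape and count the trainable parameters. The batch of 4 random examples is hypothetical and only stands in for real flattened gathers.

```python
import torch
from torch import nn

# Values copied from the parameter cells above.
N_samples, N_recorders, N_target, n_hidden = 271, 20, 9, 50

model = nn.Sequential(
    nn.Linear(N_recorders * N_samples, n_hidden),
    nn.Sigmoid(),
    nn.Linear(n_hidden, N_target),
)

# Hypothetical batch of 4 flattened examples filled with random numbers.
dummy_batch = torch.randn(4, N_recorders * N_samples)
output = model(dummy_batch)
print(output.shape)

# Count the trainable weights and biases.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)
```

Almost all of the parameters sit in the first layer, because every one of the 5420 input values is connected to each of the 50 hidden neurons.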

We need to make sure the inputs are standardized:

In [None]:
train_mean, train_std = train_amplitudes.mean(), train_amplitudes.std()
train_vel_max = train_velocities.max()

In [None]:
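# Standardize the inputs and cast to float32, the default dtype of PyTorch layers (skorch expects it too)
X = (train_amplitudes - train_mean) / train_std
X = X.astype(np.float32)

# Scale the targets by their maximum so the largest value is 1
y = (train_velocities / train_vel_max).astype(np.float32)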

X_train = X[:-N_val]
y_train = y[:-N_val]


X_test = X[N_train-N_val:]
y_test = y[N_train-N_val:]

Let's confirm that the shapes of these matrices are OK:

In [None]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape
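
The standardization and hold-out split can be checked on synthetic data without downloading anything. This is only a sketch: the array shapes `(600, 271, 20)` and `(600, 9)` are assumed from the dataset parameters above, and the random values are made up.

```python
import numpy as np

N_train, N_val, N_samples, N_recorders, N_target = 600, 150, 271, 20, 9

# Synthetic stand-ins for the real arrays; shapes and value ranges are assumptions.
rng = np.random.default_rng(0)
amplitudes = rng.normal(loc=3.0, scale=2.0, size=(N_train, N_samples, N_recorders))
velocities = rng.uniform(1500.0, 4500.0, size=(N_train, N_target))

# Standardize inputs with global statistics; scale targets by their maximum.
X = (amplitudes - amplitudes.mean()) / amplitudes.std()
y = velocities / velocities.max()

# Hold out the last N_val examples for validation.
X_train, y_train = X[:-N_val], y[:-N_val]
X_val, y_val = X[-N_val:], y[-N_val:]

print(X.mean(), X.std())
print(X_train.shape, X_val.shape)
```

After standardization the inputs have a mean close to 0 and a standard deviation close to 1, and the two slices together cover all 600 examples without overlap.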

Each training example has to be flattened to 1D before it can be fed to this fully connected network.

In [None]:
X_train = X_train.reshape((N_train-N_val, N_samples*N_recorders))
X_test = X_test.reshape((N_val, N_samples*N_recorders))
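
If you would rather not spell out the sizes, passing `-1` lets NumPy infer the flattened length. The tiny array below is a hypothetical stand-in (2 examples of shape 3 by 4) chosen so the ordering is easy to follow.

```python
import numpy as np

# A tiny stand-in: 2 examples, each a 3 x 4 array.
X = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# -1 lets NumPy infer the flattened length (3 * 4 = 12).
X_flat = X.reshape(len(X), -1)
print(X_flat.shape)

# Row-major order: the first 4 values are the first row of the first example.
print(X_flat[0, :4])
```

Because the reshape only changes how the same values are indexed, reshaping back to the original shape recovers the array exactly.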

Now we can implement the training loop. First, wrap the training arrays in tensors and a `DataLoader`:

In [None]:
import torch.utils.data as utils

X_train_ = torch.tensor(X_train, dtype=torch.float).to(device)
y_train_ = torch.tensor(y_train, dtype=torch.float).to(device)

traindata = utils.TensorDataset(X_train_, y_train_)
trainloader = utils.DataLoader(traindata)
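
Called without extra arguments, `DataLoader` yields one example per batch. The sketch below compares that default with mini-batches and with a single full batch, using random tensors as stand-ins for the real data; the batch size of 64 is an arbitrary choice for illustration.

```python
import torch
import torch.utils.data as utils

# Random stand-ins with the flattened shapes used above.
X = torch.randn(450, 5420)
y = torch.rand(450, 9)
dataset = utils.TensorDataset(X, y)

# Default: batch_size=1 and no shuffling, so 450 batches per epoch.
loader_default = utils.DataLoader(dataset)

# Mini-batches of 64, reshuffled every epoch: 8 batches, the last one holding 2 examples.
loader_minibatch = utils.DataLoader(dataset, batch_size=64, shuffle=True)

# One batch holding the whole training set.
loader_full = utils.DataLoader(dataset, batch_size=len(dataset))

print(len(loader_default), len(loader_minibatch), len(loader_full))
```

The number of batches per epoch is also the number of optimizer steps per epoch, so this choice changes both the runtime and how noisy the updates are.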

In [None]:
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
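
To see what these two objects compute, here is a small worked example with hand-picked numbers: the mean squared error of four values, and a single SGD step on the toy function f(w) = sum(w^2). The tensors are made up for illustration.

```python
import torch
from torch import nn

# Mean squared error by hand: differences 0, 1, 2, 3 -> squares 0, 1, 4, 9 -> mean 3.5.
pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.ones(2, 2)
loss = nn.MSELoss()(pred, target)
print(loss.item())

# One SGD step on f(w) = sum(w^2): the gradient is 2w, so w becomes w - lr * 2w.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)
optimizer.zero_grad()
(w ** 2).sum().backward()
optimizer.step()
print(w.data)
```

With `lr=0.1` the step moves `w` from `[1.0, 2.0]` to `[0.8, 1.6]`, which is the update the training loop applies to every parameter of the network.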

In [None]:
epochs = 10

for epoch in range(epochs):
    epoch_loss = 0.0
    for i, data in enumerate(trainloader):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"# {epoch+1}  Loss {epoch_loss}")
print('Finished Training')

We can make predictions with the trained model:

In [None]:
X_test_ = torch.tensor(X_test, dtype=torch.float).to(device)
y_test_ = torch.tensor(y_test, dtype=torch.float).to(device)

valdata = utils.TensorDataset(X_test_, y_test_)
valloader = utils.DataLoader(valdata)

In [None]:
y_pred = np.array([model(xi).cpu().detach().numpy() for xi, yi in valloader])
y_pred = np.squeeze(y_pred)

In [None]:
plt.plot(y_pred[0], label='Predicted velocity')
plt.plot(y_test[0], label='True velocity')
plt.legend()
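
The per-example prediction loop above can also be written as one batched forward pass. The sketch below uses an untrained stand-in model and random inputs, so only the shapes are meaningful; `model.eval()` and `torch.no_grad()` are the usual way to switch off training behaviour and gradient tracking during inference.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Untrained stand-in model and random validation inputs.
model = nn.Sequential(nn.Linear(5420, 50), nn.Sigmoid(), nn.Linear(50, 9))
X_val = torch.randn(150, 5420)

model.eval()           # put layers such as dropout into inference mode
with torch.no_grad():  # skip building the autograd graph
    y_pred = model(X_val).cpu().numpy()

print(y_pred.shape)
```

This returns one row of predicted velocities per validation example, with no need for `np.squeeze` afterwards.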

You can save the model like this:

In [None]:
torch.save(model.state_dict(), 'model.ckpt')
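
A `state_dict` stores only the weights, so loading requires rebuilding the same architecture first. The sketch below shows the round trip with an untrained stand-in model and a temporary file; `make_model` is a hypothetical helper, not part of PyTorch.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    # Hypothetical helper: must match the architecture that produced the checkpoint.
    return nn.Sequential(nn.Linear(5420, 50), nn.Sigmoid(), nn.Linear(50, 9))


trained = make_model()  # stand-in for the trained model
path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
torch.save(trained.state_dict(), path)

# Rebuild the network, then load the saved weights into it.
restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

x = torch.randn(3, 5420)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)
```

Passing `map_location="cpu"` lets a checkpoint saved on a GPU be loaded on a machine without one.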

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">
<h3>Exercise</h3>

- Create a training function that performs the whole training loop. For now, it should take `X_train` and `y_train` as parameters and return a trained model
- Make the number of neurons in the single hidden layer a parameter and see how the training performance (loss) varies
- Add more hidden layers and train models with different network configurations
</div>

In [None]:
# Your code here.



A couple of extra libraries can simplify building neural networks with PyTorch:
- `torchlayers`: Infers the input dimensionality of each layer from an example input, so you only specify the output size
- `skorch`: Wraps a PyTorch network in a `scikit-learn` compatible object that can be used in scikit-learn workflows

In [None]:
import torchlayers as tl
from skorch import NeuralNetRegressor

A model can now be created like this:

In [None]:
# Define the architecture of the network
net_arch = torch.nn.Sequential(
    tl.Linear(50),  # specify ONLY the output size (out_features)
    tl.Sigmoid(),  # activation for the first hidden layer
    tl.Linear(10),  # specify ONLY the output size (out_features)
    tl.Sigmoid(),  # activation for the second hidden layer
    tl.Linear(N_target),  # one output per layer of the velocity model
)

# Build the network: torchlayers needs an example input to infer the internal dimensions
net = tl.build(net_arch, torch.randn(1, *X_train[0].shape))
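
For comparison, PyTorch itself ships lazy layers that infer their input size in a similar way. This is a different mechanism from `torchlayers`, shown here only as a sketch: `nn.LazyLinear` fixes `in_features` the first time data passes through, so a dry run with an example input plays the role of the build step. The layer sizes mirror the network above.

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.LazyLinear(50),  # in_features is inferred on the first forward pass
    nn.Sigmoid(),
    nn.LazyLinear(10),
    nn.Sigmoid(),
    nn.LazyLinear(9),   # one output per velocity layer
)

# A dry run with an example input materialises the weights.
example = torch.randn(1, 20 * 271)
out = net(example)

print(net[0].in_features, net[2].in_features, net[4].in_features)
print(out.shape)
```

After the dry run each layer reports the input size it inferred from the layer before it.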

Now we can create the scikit-learn compatible object:

In [None]:
model = NeuralNetRegressor(
    net,
    max_epochs=10,
    lr=0.01,
)

model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)  # predicted (scaled) velocities, reused in the plot below
y_pred[0]

In [None]:
plt.plot(y_pred[0], label='Predicted velocity')
plt.plot(y_test[0], label='True velocity')
plt.legend()

# Convolutional Neural Networks
What's a Convolutional Neural Network?
 - Keep these cheatsheets at hand: https://github.com/afshinea/stanford-cs-230-deep-learning
 
![alt text](../images/convolution-layer-a.png)

## What's max-pooling?
![alt text](../images/max-pooling-a.png)
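
To make the two pictures concrete, here is a minimal 1D sketch in plain PyTorch with a hand-picked signal. The values, the kernel size of 3 and the 8 output channels are arbitrary choices for illustration.

```python
import torch
from torch import nn

# Max-pooling keeps the largest value in each window of 2.
signal = torch.tensor([[[1.0, 3.0, 2.0, 5.0, 4.0, 0.0]]])  # (batch, channels, length)
pooled = nn.MaxPool1d(kernel_size=2)(signal)
print(pooled)  # windows (1, 3), (2, 5), (4, 0) -> 3, 5, 4

# A convolution slides learned kernels along the signal. With kernel_size=3 and
# no padding the length shrinks from 6 to 4, and 8 kernels give 8 output channels.
conv = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3)
features = conv(signal)
print(features.shape)
```

Pooling has no learnable parameters, whereas each convolution kernel is a small set of weights that is trained like any other layer.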

In [None]:
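# Rebuild the arrays from the raw data to recover the unflattened shape the convolution needs
X = (train_amplitudes - train_mean) / train_std
X = X.astype(np.float32)  # float32 matches the default dtype of PyTorch layers

y = (train_velocities / train_vel_max).astype(np.float32)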

X_train = X[:-N_val]
y_train = y[:-N_val]


X_test = X[N_train-N_val:]
y_test = y[N_train-N_val:]

In [None]:
# torch.nn and torchlayers can be mixed easily
net_arch = torch.nn.Sequential(
    tl.Conv(32),  # specify ONLY out_channels
    nn.ReLU(),  # use torch.nn wherever you wish
    tl.GlobalMaxPool(),  # Known from Keras
    tl.Linear(5), # Add a fully connected hidden layer
    tl.ReLU(), # Activate the hidden layer
    tl.Linear(N_target),  # linear output: one value per layer of the velocity model
)

net = tl.build(net_arch, torch.randn(1, *X_train[0].shape))
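
A roughly comparable network can be sketched in plain PyTorch. This is not guaranteed to match what `torchlayers` builds (its default kernel size and choice of convolution are not stated here); it assumes a 1D convolution with `kernel_size=3` and an example shape of `(batch, 271, 20)`, where PyTorch treats axis 1 as channels.

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.LazyConv1d(32, kernel_size=3),  # in_channels inferred from the input
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),           # global max over the remaining axis
    nn.Flatten(),                      # (batch, 32, 1) -> (batch, 32)
    nn.Linear(32, 5),
    nn.ReLU(),
    nn.Linear(5, 9),
)

# Assumed example shape: (batch, 271, 20).
example = torch.randn(2, 271, 20)
out = net(example)
print(out.shape)
```

Global max-pooling collapses each of the 32 feature maps to a single number, which is why the first fully connected layer needs only 32 inputs regardless of the signal length.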

In [None]:
model = NeuralNetRegressor(
    net,
    max_epochs=10,
    lr=0.01,
)

model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)  # predicted (scaled) velocities, reused in the plot below
y_pred[0]

In [None]:
plt.plot(y_pred[0], label='Predicted velocity')
plt.plot(y_test[0], label='True velocity')
plt.legend()

## Model persistence

The easiest way to save a model is to `pickle` the trained model object.

In [None]:
import pickle

# saving
with open('torch_regressor.pkl', 'wb') as f:
    pickle.dump(model, f)

# loading
with open('torch_regressor.pkl', 'rb') as f:
    model = pickle.load(f)

In [None]:
model.predict(X_test)
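
The same round trip can be verified in memory. The sketch below pickles a small, untrained stand-in network (not the skorch model above) and checks that the restored copy gives identical outputs. Keep in mind that unpickling runs arbitrary code, so only load pickle files you trust.

```python
import pickle

import torch
from torch import nn

# Small untrained stand-in network.
model = nn.Sequential(nn.Linear(4, 3), nn.Sigmoid(), nn.Linear(3, 2))

# Serialise to bytes and back; a file opened in 'wb' / 'rb' mode works the same way.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

x = torch.randn(5, 4)
with torch.no_grad():
    identical = torch.equal(model(x), restored(x))
print(identical)
```

Unlike a `state_dict`, the pickle stores the whole object, so the classes it refers to must be importable when the file is loaded.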