In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys; sys.path.extend(["../src", ".."])
import sensai
import logging
import config

c = config.get_config(reload=True)
sensai.util.logging.configureLogging(level=logging.INFO)

# Tensor Models with PyTorch-Lightning

In this notebook we show how sensAI's TensorModel wrappers can be used together with pytorch-lightning models
and trainers for even faster development and experimentation.

In [None]:
from IPython.display import display
import torch
from torch.nn import functional as F
import pytorch_lightning as pl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sensai.data import InputOutputArrays, DataSplitterFractional

from sensai.pytorch_lightning import PLTensorToScalarClassificationModel
from sensai.tensor_model import extractArray

import logging
logging.basicConfig(level=logging.INFO)

from config import get_config

c = get_config()

## Loading the Data

Unlike in the MNIST-based pytorch-lightning tutorial, here we will load the data in a more "realistic" way,
namely from disk with pandas.

In [None]:
X = pd.read_csv(c.datafile_path("mnist_train.csv.zip"))
labels = pd.DataFrame(X.pop("label")).astype(np.int64)
X = X.values.reshape(len(X), 28, 28) / 2 ** 8
X = pd.DataFrame({"mnist_image": list(X)}, index=labels.index)

display(X.head())
display(labels.head())

display("Plotting a sample image from the data set")
some_image = X.iloc[13, 0]
plt.imshow(some_image)
plt.show()


## Using Data Loaders in Pure PyTorch Lightning

First, let us see how training would proceed in pure pytorch-lightning.

We will use sensAI only for obtaining torch data loaders (which would otherwise require a few more lines of code)
by transforming the data frames to arrays, splitting them and converting them to loaders.
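
To make the two-stage splitting concrete, here is a minimal numpy-only sketch. It is not sensAI's implementation:
the helper `fractional_split`, the fixed seed and the data set size of 1000 are assumptions made purely for
illustration. It first holds out a test fraction of 0.2 and then takes a validation fraction of 0.1 from the remainder.

```python
import numpy as np

TEST_FRACTION = 0.2
VALIDATION_FRACTION = 0.1


def fractional_split(indices, first_fraction, rng):
    """Shuffle the indices and split them in two; the first part receives first_fraction of them."""
    shuffled = rng.permutation(indices)
    n_first = int(round(first_fraction * len(shuffled)))
    return shuffled[:n_first], shuffled[n_first:]


rng = np.random.default_rng(42)
all_idx = np.arange(1000)  # hypothetical data set with 1000 points

# Stage 1: hold out the test set
full_train_idx, test_idx = fractional_split(all_idx, 1 - TEST_FRACTION, rng)
# Stage 2: take the validation set out of the remaining training data
train_idx, val_idx = fractional_split(full_train_idx, 1 - VALIDATION_FRACTION, rng)

print(len(train_idx), len(val_idx), len(test_idx))  # 720 80 200
```

Note that the validation fraction is relative to the remaining training data, so with these values the validation set
makes up 8% of the full data set rather than 10%.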

In [None]:
TEST_FRACTION = 0.2
VALIDATION_FRACTION = 0.1

full_ds = InputOutputArrays(extractArray(X), extractArray(labels))

# Hold out the test set first, then split the remainder into training and validation sets
full_train_ds, test_ds = DataSplitterFractional(1-TEST_FRACTION).split(full_ds)
train_ds, val_ds = DataSplitterFractional(1-VALIDATION_FRACTION).split(full_train_ds)
train_dataloader = train_ds.toTorchDataLoader()
val_dataloader = val_ds.toTorchDataLoader()
test_dataloader = test_ds.toTorchDataLoader()

Now that we have the data loaders, let us forget about sensAI for the moment. We declare the model and create the
trainer with pytorch-lightning, then fit on the MNIST data.

In [None]:
class MNISTModel(pl.LightningModule):

    def __init__(self):
        super(MNISTModel, self).__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x: torch.Tensor):
        x = x.float()
        x = torch.relu(self.l1(x.view(x.size(0), -1)))
        return F.log_softmax(x, dim=1)

    def training_step(self, batch, *args):
        x, y = batch
        loss = F.nll_loss(self(x), y)
        return loss

    def validation_step(self, batch, *args):
        x, y = batch
        loss = F.nll_loss(self(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

In [None]:
mnist_model = MNISTModel()

trainer = pl.Trainer(max_epochs=5, progress_bar_refresh_rate=20)
trainer.fit(mnist_model, train_dataloader, val_dataloader)

Let us pick some images from the test set and compare the predictions with the true labels.

In [None]:
mini_test_set = test_dataloader.dataset[10:20]
test_images, test_labels = mini_test_set

display(mnist_model(test_images).argmax(axis=1))
display(test_labels)

In [None]:
from sklearn.metrics import accuracy_score

# Accuracy of the pure pytorch-lightning model on the full test set
accuracy_score(test_ds.outputs, mnist_model(test_dataloader.dataset[:][0]).argmax(axis=1))

## Wrapping the Model with sensAI

Now let us wrap the model with sensAI interfaces. Since sensAI offers dedicated wrappers
for pytorch-lightning models, this requires only one additional line of code.

This model maps a tensor to a single label, so the correct class to wrap it with is `PLTensorToScalarClassificationModel`,
where the `PL` prefix stands for pytorch-lightning.
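
To illustrate what "tensor to scalar" means on the data frame level, here is a rough sketch that uses only numpy and
pandas. It does not call sensAI and is not the wrapper's actual code: the random images, the random class scores
standing in for a model, and the names `X_demo`, `batch`, `scores` and `predicted` are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the MNIST frame: a single column whose cells are 28x28 arrays
rng = np.random.default_rng(0)
images = rng.random((5, 28, 28))
X_demo = pd.DataFrame({"mnist_image": list(images)})

# Stack the per-row arrays into one batch array of shape (n_samples, 28, 28)
batch = np.stack(X_demo["mnist_image"].to_numpy())

# Hypothetical model output: one row of 10 class scores per image
scores = rng.random((len(batch), 10))

# Reduce each row of scores to a single label and return the result as a
# data frame that shares the index of the input frame
predicted = pd.DataFrame({"label": scores.argmax(axis=1)}, index=X_demo.index)

print(batch.shape, predicted.shape)  # (5, 28, 28) (5, 1)
```

Each input row holds a whole tensor (the image), while each output row holds a single scalar (the predicted label).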

In [None]:
mnist_model = MNISTModel()
trainer = pl.Trainer(max_epochs=3, progress_bar_refresh_rate=20)
sensaiMnistModel = PLTensorToScalarClassificationModel(mnist_model, trainer, validationFraction=VALIDATION_FRACTION)

NB: Even without dedicated wrappers, it would require only a few more lines of code to get a custom implementation of
a suitable sensAI base class that wraps one's model.

With the wrapped model, we can fit directly on the data frames. We lose none of the conveniences of pytorch-lightning,
since both the original model and the trainer remain available in `sensaiMnistModel`. By wrapping the
model and trainer, we gain the safety, transparency and flexibility in feature engineering, as well
as the extensive support for model evaluation, that sensAI is all about.

In [None]:
display(labels.dtypes)
np.stack(np.stack(labels.values, axis=1).squeeze(), axis=0).shape

In [None]:
ioData = sensai.InputOutputData(X, labels)
trainData, testData = DataSplitterFractional(0.8).split(ioData)

sensaiMnistModel.fitInputOutputData(trainData)

The wrapped model performs predictions on data frames. Let us take some points from the test set,
perform a prediction on them and compare the result with the true labels.

In [None]:
display("Predicted data frame")
display(sensaiMnistModel.predict(testData.inputs.iloc[:10]))
display("True labels data frame")
display(testData.outputs.iloc[:10])

## Evaluating the Model

In [None]:
evaluator = sensai.evaluation.VectorClassificationModelEvaluator(trainData, testData)
evaluator.evalModel(sensaiMnistModel).getEvalStats().metricsDict()