In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
# To process image data, we use fastai.vision
from fastai.vision.all import *
# From this point on at the latest, we really should use the GPU for computing.
# We first test whether a CUDA device is available
if torch.cuda.is_available():
    print( torch.cuda.get_device_name(0) )
else:
    print( "No CUDA device found, computing on the CPU" )
print( torch.__version__ )

In [None]:
# Fix the random seeds so that the pseudo-random numbers are reproducible
torch.manual_seed(0) # for PyTorch
random.seed(0)       # for Python
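
The effect of a fixed seed can be illustrated with Python's own generator. This is a minimal sketch; the helper `sample_with_seed` is a hypothetical name introduced only for this illustration.

```python
import random

def sample_with_seed(seed, n=3):
    """Return n pseudo-random floats drawn after seeding Python's generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first_run = sample_with_seed(0)
second_run = sample_with_seed(0)
other_seed = sample_with_seed(1)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # different seed, different sequence
```

The same idea applies to `torch.manual_seed`, which seeds PyTorch's generator.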

## Load Data

In [None]:
from pathlib import Path
import os

# First, we create a path-object that points to the data
path = Path('data/mnist_png/')
sub_directories = [f.path for f in os.scandir(path) if f.is_dir()]
     
image_files = get_image_files(path)
image_files

# Now we define a function that creates a label for each filename. In our case,
# the class is encoded in the directory name. To create the label function,
# we typically have to check the directory structure and the structure of the filenames.
def label_function(filename):
    return filename.parents[0].name
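
To see what the label function returns, it can be applied to a single path. The sketch below repeats the function so that it runs on its own; the file name is a hypothetical example and does not need to exist on disk.

```python
from pathlib import Path

def label_function(filename):
    # The label is the name of the directory that directly contains the file.
    return filename.parents[0].name

# Hypothetical file path, used only to illustrate the lookup.
example = Path('data/mnist_png/3/img_0001.png')
print(label_function(example))
```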

When loading the images, we can pass transformations that change the images (for example, adapt their size) and perform data augmentation. In our case the images already have the appropriate size (64x64 pixels), but we use several data augmentations to improve generalization. We also normalize the images with the statistics from ImageNet.

In [None]:
item_transforms = []

# If images need to be resized or cropped, this can be done as follows:
# A drawback of vgg16 is its fixed input size of $224\times224$ pixels.
batch_transforms = [*aug_transforms(size=224), Normalize.from_stats(*imagenet_stats)]
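
Normalizing with the ImageNet statistics means subtracting a per-channel mean and dividing by a per-channel standard deviation. The following is a minimal sketch of that arithmetic for a single RGB pixel with values in [0, 1]; `normalize_pixel` is a hypothetical helper, not part of fastai, and the constants are the commonly used ImageNet values.

```python
# ImageNet per-channel statistics (RGB), as commonly used for pretrained models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel with values in [0, 1]: (value - mean) / std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(IMAGENET_MEAN))
# A pure white pixel ends up more than two standard deviations above the mean.
print(normalize_pixel((1.0, 1.0, 1.0)))
```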

We create a random splitter object to split the data into training and validation data. The test data is separated at the file level.

In [None]:
splitter = RandomSplitter(valid_pct=0.2, seed=42)
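
Conceptually, a random splitter shuffles the item indices and sets aside a fixed fraction for validation. The sketch below shows that idea in plain Python; `sketch_random_split` is a hypothetical function and not fastai's implementation, so the concrete indices will differ from those produced by `RandomSplitter`.

```python
import random

def sketch_random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle the indices 0..n_items-1 and cut off valid_pct of them for validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # a seeded generator makes the split reproducible
    cut = int(valid_pct * n_items)
    return indices[cut:], indices[:cut]  # (training indices, validation indices)

train_idx, valid_idx = sketch_random_split(100)
print(len(train_idx), len(valid_idx))
```

Because the seed is fixed, repeated calls return the same split, which keeps the validation results comparable across experiments.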

Now we create a DataBlock. First, we specify which "blocks", i.e. types of encoding, we want to use. In our case we need an ImageBlock for the input and a CategoryBlock for the output.

In [None]:
blocks = (ImageBlock, CategoryBlock)

block = DataBlock(blocks=blocks,
                  get_items=get_image_files,
                  get_y=label_function,
                  splitter=splitter,
                  item_tfms=item_transforms,
                  batch_tfms=batch_transforms)

In [None]:
batchSize = 32
data_loader = block.dataloaders(path, bs=batchSize, num_workers=0)

To check the impact of the transformations, we can repeatedly execute the cell below. Every run creates new, transformed images.

In [None]:
data_loader.show_batch()

## Training: VGG16

Now we know what our data looks like and are convinced that the loading of both images and labels, as well as the transformations, works as intended. It is time to train a first model. You can experiment with different architectures and metrics.

In [None]:
architektur = vgg16
metrik = error_rate

Additional architectures to test are:

* alexnet
* vgg16
* densenet121 (161, 169, 201)
* resnet18 (34, 50, 101, 152)

Even more architectures are available in "torchvision". The module can be imported as follows:

In [None]:
import torchvision.models as torchModels

Additional metrics that can be tested include:
* accuracy
* error_rate
* dice

There are more metrics, which cannot be tested with our example because they are intended for regression rather than classification:
* mean_squared_error
* mean_absolute_error
* mean_squared_logarithmic_error
* exp_rmspe
* explained_variance
* r2_score
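
The two basic classification metrics are complementary: the error rate is one minus the accuracy. A minimal sketch in plain Python, using made-up labels; `label_accuracy` and `label_error_rate` are hypothetical helpers and not the fastai metric functions.

```python
def label_accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def label_error_rate(predicted, actual):
    """Fraction of misclassified samples, i.e. 1 - accuracy."""
    return 1 - label_accuracy(predicted, actual)

# Made-up labels for illustration only.
actual    = ['0', '1', '1', '0', '1']
predicted = ['0', '1', '0', '0', '1']
print(label_accuracy(predicted, actual), label_error_rate(predicted, actual))
```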

In [None]:
learner = vision_learner(data_loader,
                         architektur,
                         metrics = metrik)

learner.summary()

The learning rate finder automatically tests different learning rates. Remember to recreate the learner beforehand so that you always start from a random initialization of the weights.
We store the model with the randomly initialized weights for later experiments.

In [None]:
learner.save("vgg16_initial")
learner.lr_find()
gewaehlteLernrate = 1e-03
learner.load("vgg16_initial")
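
A learning rate range test trains for a number of iterations while steadily increasing the learning rate and recording the loss. The sketch below only generates such a sweep, assuming geometrically spaced rates; the function name, the bounds, and the number of steps are illustrative assumptions and not taken from fastai's implementation.

```python
def exponential_lr_schedule(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Learning rates growing geometrically from start_lr to end_lr over num_it steps."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

schedule = exponential_lr_schedule()
# First, middle, and last learning rate of the sweep.
print(schedule[0], schedule[len(schedule) // 2], schedule[-1])
```

A common heuristic is to pick a rate from the region where the recorded loss still falls steeply, well before it starts to diverge.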

The fit function trains the model for the given number of epochs, here one epoch.

In [None]:
learner.fit( 1, lr=gewaehlteLernrate )

In [None]:
learner.recorder.plot_loss()

In [None]:
learner.save('vgg16_phase1')

## Results

We can now examine the trained model with regard to the accuracy of the classification. First, we look at which classes are most frequently confused with each other. Of course, this comparison is more informative when there are many classes. In our case with 2 classes, we can only see whether the confusions are roughly balanced in both directions.

In [None]:
interpretation = ClassificationInterpretation.from_learner(learner)
interpretation.plot_top_losses(9, figsize=(20,11))
interpretation.most_confused(min_val=1)

In our simple case, this can also be seen easily in a confusion matrix.

In [None]:
interpretation.plot_confusion_matrix(figsize=(12,12), dpi=60)
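
What the confusion matrix and the list of most confused classes contain can be sketched in plain Python by counting (actual, predicted) pairs. The labels below are made up, and `confusion_counts` and `most_confused_pairs` are hypothetical helpers, not fastai functions.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def most_confused_pairs(actual, predicted, min_val=1):
    """Off-diagonal pairs occurring at least min_val times, most frequent first."""
    counts = confusion_counts(actual, predicted)
    pairs = [(a, p, n) for (a, p), n in counts.items() if a != p and n >= min_val]
    return sorted(pairs, key=lambda t: t[2], reverse=True)

# Made-up labels for illustration only.
actual    = ['0', '0', '1', '1', '1', '0']
predicted = ['0', '1', '1', '0', '0', '0']
print(most_confused_pairs(actual, predicted))
```

Each entry reads as (actual class, predicted class, number of occurrences); the diagonal of the confusion matrix corresponds to the pairs that are filtered out here.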