# Motivation
Trying a simple and straightforward fastai v2 process:
the full fastai2 (v2.6.3) path from training to testing and submission, with augmentation during training, validation and testing.
I try to use only fastai implementations, including FocalLoss(). The one exception is the Albumentations library, used for data augmentation.

In [1]:
from fastai.vision.all import *
import pandas as pd
%matplotlib inline

# Data Augmentation
This section is not "pure fastai".

I got frustrated playing around with fastai2 augmentations. Only a few are available as item transforms. More are available as batch transforms, but they will not work for validation and testing. I want the same augmentations to apply to validation as well as to training, and augmentations during testing are essential for me because I use TTA (test-time augmentation).

The Albumentations library offers a great variety of transformations and an efficient implementation, frustration-free. In fact, I got this reference from the official fastai tutorial, so the fastai developers do recognize the current limitations: https://docs.fast.ai/tutorial.albumentations.html
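As far as I understand the fastcore `Transform` class that fastai builds on, each transform carries a `split_idx` attribute deciding which split it runs on: `None` means every split, `0` the training set only, `1` the validation set only. fastai's random augmentations are restricted to the training split, while the `AlbTransform` wrapper below keeps the default `None`. The plain-Python sketch here only mimics that rule; `SplitAwareTransform` is a hypothetical stand-in, not fastai's class.

```python
TRAIN, VALID = 0, 1

class SplitAwareTransform:
    """Hypothetical, simplified model of how a fastai/fastcore Transform picks its split."""
    def __init__(self, fn, split_idx=None):
        self.fn = fn
        self.split_idx = split_idx  # None = every split, 0 = train only, 1 = valid only

    def __call__(self, x, split_idx=None):
        # Return the input unchanged when the transform is restricted to another split
        if self.split_idx is not None and split_idx != self.split_idx:
            return x
        return self.fn(x)

always = SplitAwareTransform(lambda x: x + 1)             # like AlbTransform (split_idx=None)
train_only = SplitAwareTransform(lambda x: x + 1, TRAIN)  # like fastai's random augmentations

print(always(0, TRAIN), always(0, VALID))          # 1 1
print(train_only(0, TRAIN), train_only(0, VALID))  # 1 0
```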

In [2]:
import albumentations as Alb
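class AlbTransform(Transform):
    # split_idx keeps its default (None), so this runs on training, validation and test items alike
    def __init__(self, aug): self.aug = aug
    def encodes(self, img: PILImage):
        aug_img = self.aug(image=np.array(img))['image']  # Albumentations works on NumPy arrays
        return PILImage.create(aug_img)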
    
def get_augs(): return Alb.Compose([
    Alb.Transpose(),
    Alb.Flip(),
    Alb.RandomRotate90(),
    Alb.HueSaturationValue(
      hue_shift_limit=5, 
      sat_shift_limit=5, 
      val_shift_limit=5 ),
])
# the patches are 96x96 pixels; many suggest upscaling them to 196x196
item_tfms = [Resize(196), AlbTransform(get_augs())]
batch_tfms = Normalize.from_stats(*imagenet_stats) 
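`Transpose`, `Flip` and `RandomRotate90` each map a square patch to one of its 8 rotations/reflections (the dihedral group of the square), and any composition of them stays within those 8, so the geometric part of `get_augs()` can only produce these variants. That is presumably why they suit tissue patches, which have no natural orientation. The NumPy sketch below enumerates the 8 variants of a random 96x96 RGB array standing in for a patch; `dihedral_variants` is a hypothetical helper, not part of Albumentations.

```python
import numpy as np

def dihedral_variants(img):
    """Return the 8 rotations/reflections of a square H x W x C image (hypothetical helper)."""
    variants = []
    for base in (img, img.transpose(1, 0, 2)):  # as-is, or transposed (swap H and W)
        for k in range(4):                        # rotate by 0, 90, 180, 270 degrees
            variants.append(np.rot90(base, k, axes=(0, 1)))
    return variants

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)  # random stand-in for one RGB patch
variants = dihedral_variants(patch)

print(len(variants))                         # 8
print(len({v.tobytes() for v in variants}))  # 8: all distinct
# a vertical flip (one of Flip's options) is already among them
print(any(np.array_equal(v, patch[::-1]) for v in variants))  # True
```

The colour part (`HueSaturationValue` with limits of 5) only nudges the stain colours slightly on top of these geometric variants.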

# Create Training-Validation data loader

In [3]:
train_path='../input/histopathologic-cancer-detection/train/'
train_df = pd.read_csv('../input/histopathologic-cancer-detection/train_labels.csv')
# for interactive debugging, reduce the number of training images: train_df = train_df[:1024]
dls = ImageDataLoaders.from_df(train_df, path=train_path, suff='.tif', 
    item_tfms=item_tfms, batch_tfms=batch_tfms, shuffle=True)
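`ImageDataLoaders.from_df` reads file names from the first column (`fn_col=0`) and labels from the second (`label_col=1`), builds each path as `path` + id + `suff`, and by default holds out a random 20% of rows for validation (`valid_pct=0.2`); with no `seed`, the split differs between runs. The pandas/NumPy sketch below mimics that bookkeeping rather than calling fastai; `toy_df` and its ids are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for train_labels.csv: column 0 = image id, column 1 = label
toy_df = pd.DataFrame({'id': [f'img{i:03d}' for i in range(10)],
                       'label': [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]})
prefix = '../input/histopathologic-cancer-detection/train/'

# File name = path prefix + value in the first column + suffix
files = prefix + toy_df.iloc[:, 0] + '.tif'

# Random 80/20 split by row position; fixing the seed gives the same split every run
rng = np.random.default_rng(42)
idx = rng.permutation(len(toy_df))
n_valid = int(0.2 * len(toy_df))
valid_idx, train_idx = idx[:n_valid], idx[n_valid:]

print(files.iloc[0])                                 # ../input/histopathologic-cancer-detection/train/img000.tif
print(len(train_idx), len(valid_idx))                # 8 2
print(toy_df['label'].value_counts(normalize=True))  # class balance of the toy labels
```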

In [4]:
# uncomment to check that the data loaders work

# dls.train.show_batch(max_n=12)
# dls.valid.show_batch(max_n=12)

# Load a pretrained model and fine-tune it on our data

In [5]:
learn = vision_learner(dls, densenet121, path='.', 
    loss_func=FocalLoss(),  
    metrics=BalancedAccuracy() ) # because the classes in this dataset are imbalanced (about 40/60)
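Focal loss scales each sample's cross-entropy by (1 - p_t)^γ, where p_t is the predicted probability of the true class, so confidently correct samples contribute little; I believe fastai's `FocalLoss` uses γ = 2 by default. `BalancedAccuracy` wraps scikit-learn's `balanced_accuracy_score`, the mean of per-class recall. A NumPy sketch of both on made-up logits; `focal_loss` and `balanced_accuracy` are hypothetical helpers, not the fastai code.

```python
import numpy as np

def focal_loss(logits, targets, gamma=2.0):
    """Mean focal loss for integer class targets: mean((1 - p_t)**gamma * CE)."""
    z = logits - logits.max(axis=1, keepdims=True)                # shift for a stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    ce = -log_probs[np.arange(len(targets)), targets]             # per-sample cross-entropy
    p_t = np.exp(-ce)                                             # probability of the true class
    return float(np.mean((1 - p_t) ** gamma * ce))

def balanced_accuracy(pred_labels, targets):
    """Mean of per-class recall: every class counts equally, whatever its frequency."""
    recalls = [np.mean(pred_labels[targets == c] == c) for c in np.unique(targets)]
    return float(np.mean(recalls))

logits = np.array([[2.0, -1.0], [0.5, 0.3], [-1.0, 1.5], [0.2, 0.1]])  # made-up outputs
targets = np.array([0, 0, 1, 1])
# gamma=0 reduces to plain cross-entropy; gamma=2 down-weights the easy samples
print(focal_loss(logits, targets), focal_loss(logits, targets, gamma=0.0))

# always predicting the majority class 0 on a 3:1 set: plain accuracy 0.75, balanced 0.5
print(balanced_accuracy(np.array([0, 0, 0, 0]), np.array([0, 0, 0, 1])))  # 0.5
```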

In [6]:
learn.fine_tune(6, freeze_epochs=3) # 3 epochs with the pretrained body frozen, then 6 unfrozen; both phases use fit_one_cycle()
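As I read the fastai source, `fine_tune` calls `fit_one_cycle` twice: first for `freeze_epochs` with only the new head trainable (using `pct_start=0.99`), then, after halving the base learning rate and unfreezing, for `epochs` with discriminative learning rates. The one-cycle schedule itself is two cosine segments: a warm-up from `lr_max/div` to `lr_max` over the first `pct_start` fraction of training, then an anneal down to `lr_max/div_final`. A sketch of that schedule, assuming fastai's `fit_one_cycle` defaults `div=25`, `div_final=1e5` and `pct_start=0.25`; `one_cycle_lr` is a hypothetical helper.

```python
import math

def one_cycle_lr(step, total_steps, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at `step` of a one-cycle schedule made of two cosine segments."""
    def cos_interp(start, end, pos):  # pos runs from 0 (start) to 1 (end)
        return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

    pos = step / total_steps
    if pos < pct_start:  # warm-up: lr_max/div -> lr_max
        return cos_interp(lr_max / div, lr_max, pos / pct_start)
    # anneal: lr_max -> lr_max/div_final
    return cos_interp(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

total = 100
lrs = [one_cycle_lr(s, total, lr_max=1e-3) for s in range(total + 1)]
print(lrs[0], lrs[25], lrs[-1])  # starts at lr_max/25, peaks at lr_max, ends near lr_max/1e5
```

For the unfrozen phase, `fine_tune` passes `div=5.0` and `pct_start=0.3`, so `one_cycle_lr(step, total, lr_max, div=5.0, pct_start=0.3)` would approximate that phase's schedule.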

In [7]:
# learn.export()
learn.save('my_super_model') # I will try to continue training it in a separate notebook

# Create Testing data loader

In [8]:
test_path='../input/histopathologic-cancer-detection/test/'
test_df = pd.read_csv('../input/histopathologic-cancer-detection/sample_submission.csv')
# for interactive debugging: test_df = test_df[:12]
# a separate loader because the test images live in a different folder than the training images
tdls = ImageDataLoaders.from_df(test_df, path=test_path, suff='.tif',
    item_tfms=item_tfms, batch_tfms=batch_tfms, shuffle=False)
tst_dl = tdls.test_dl(test_df) 

In [9]:
# uncomment to check that the test loader works; I still need to confirm the augmentations are applied here, as it seems they are not
# tst_dl.show_batch(max_n=12)

# Testing round with TTA (Test Time Augmentation)

In [None]:
preds, y = learn.tta(dl=tst_dl, n=16, use_max=False)
preds_f1 = torch.softmax(preds, dim=1)[:, 1]
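My reading of `Learner.tta`: it runs `n` prediction passes with the training-split transforms active, averages them (or takes the maximum with `use_max=True`), runs one more pass on the validation split, and blends the two with `torch.lerp(aug, plain, beta)`, where `beta=0.25` by default. The NumPy sketch below reproduces that final combination and the softmax from the cell above on made-up logits; `tta_combine` and `softmax` are hypothetical helpers. If I read fastai's split handling correctly, `AlbTransform` keeps `split_idx=None`, so in this notebook the "plain" pass is augmented as well.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tta_combine(aug_logits, plain_logits, beta=0.25, use_max=False):
    """aug_logits: (n, N, C) from n augmented passes; plain_logits: (N, C) from one plain pass."""
    if use_max:
        return np.maximum(aug_logits.max(axis=0), plain_logits)
    aug_mean = aug_logits.mean(axis=0)
    return aug_mean + beta * (plain_logits - aug_mean)  # same as torch.lerp(aug_mean, plain, beta)

rng = np.random.default_rng(0)
aug_logits = rng.normal(size=(16, 5, 2))  # made up: n=16 passes, 5 images, 2 classes
plain_logits = rng.normal(size=(5, 2))
probs_pos = softmax(tta_combine(aug_logits, plain_logits))[:, 1]  # probability of label 1
print(probs_pos.shape)  # (5,)
```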

# Prepare submission file

In [None]:
test_df['label'] = preds_f1.cpu().numpy()  # probability of label 1, in test_df row order (tst_dl is not shuffled)
test_df.to_csv('submission_HST.csv', header=True, index=False)
test_df.head()
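The submission needs one row per test image, in the order of `sample_submission.csv`, with the `id` column untouched and `label` holding the predicted probability of label 1. A small sanity-check sketch with made-up ids and probabilities (`toy_sub` and `probs` are hypothetical) that writes to an in-memory buffer instead of a file:

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-ins for sample_submission.csv and the positive-class probabilities
toy_sub = pd.DataFrame({'id': ['a1', 'b2', 'c3'], 'label': [0, 0, 0]})
probs = np.array([0.91, 0.08, 0.55])

# One prediction per row; assignment is positional, so the predictions must come
# from an unshuffled loader built from the same rows in the same order
assert len(probs) == len(toy_sub)
toy_sub['label'] = probs

buf = io.StringIO()
toy_sub.to_csv(buf, header=True, index=False)
print(buf.getvalue())
```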

# Have a nice day!
Was it easy? Upvote and comment, please :)