- Techniques covered:
    - Normalization
    - Mixup
    - Progressive resizing
    - Test time augmentation
    - Label smoothing

- We are going to train a model from scratch (not using transfer learning)
- We will use a subset of ImageNet


## Imagenette
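Imagenette is a subset of ImageNet containing 10 easily classified classes, so experiments on it run much faster than on the full dataset. More generally, if experiments on a dataset take too long, think about how you could cut down your dataset, or simplify your model, to improve your experimentation speed.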

In [1]:
# get started with this dataset:
from fastai.vision.all import *
path = untar_data(URLs.IMAGENETTE)

In [2]:
# get our dataset into a DataLoaders object, using presizing
# (resize each item to 460 first, then augment and crop to 224 on the GPU):

dblock = DataBlock(
    blocks=(ImageBlock(), CategoryBlock()),
    get_items=get_image_files,
    get_y=parent_label,
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75)
)

dls = dblock.dataloaders(path, bs=64)



In [3]:
# do a training run that will serve as a baseline:

model = xresnet50()
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)

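| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 1.675329 | 1.779331 | 0.48581 | 06:27 |
| 1 | 1.24867 | 1.509915 | 0.5295 | 06:11 |
| 2 | 0.953398 | 1.093729 | 0.631068 | 06:11 |
| 3 | 0.740561 | 0.689591 | 0.771098 | 06:11 |
| 4 | 0.605196 | 0.561991 | 0.820388 | 06:12 |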


## Normalization
Training a model is easier when the input data is normalized, that is, when it has:
- a mean of 0
- a standard deviation of 1

Computer vision libraries typically represent pixels with values that are not normalized, either:
- between 0 and 255, or
- between 0 and 1
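As a minimal sketch of what normalization does, the plain-Python helper below (`normalize_channels` is hypothetical, not fastai's implementation) computes each channel's mean and standard deviation over all of its pixels, which is the same grouping as `x.mean(dim=[0,2,3])` in the next cell, and rescales the values to mean 0 and standard deviation 1. It assumes pixels are already floats in [0, 1], stored as one flat list per channel, and that no channel is constant.

```python
from statistics import fmean, pstdev

def normalize_channels(channels):
    """Normalize each channel (a flat list of pixel values) to mean 0, std 1.

    Assumed layout: one list of floats per channel, e.g. [reds, greens, blues].
    A constant channel would have std 0 and cause a division by zero.
    """
    normalized = []
    for values in channels:
        mean = fmean(values)
        std = pstdev(values)  # population std over every pixel in the channel
        normalized.append([(v - mean) / std for v in values])
    return normalized

# Two tiny "channels" of pixel values in [0, 1]
batch = [[0.2, 0.4, 0.6, 0.8], [0.1, 0.1, 0.5, 0.9]]
out = normalize_channels(batch)
```

After the call, every channel in `out` has a mean of (numerically) 0 and a standard deviation of 1.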


In [4]:
# grab a batch of data:
x,y = dls.one_batch()

# look at those values
# by averaging over all axes except for the channel axis (axis 1):
x.mean(dim=[0,2,3]), x.std(dim=[0,2,3])

(TensorImage([0.4680, 0.4675, 0.4215], device='cuda:0'),
 TensorImage([0.2848, 0.2808, 0.2951], device='cuda:0'))

As expected, the mean and standard deviation are not very close to the desired values of 0 and 1.

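To normalize the data, we add fastai's `Normalize` transform to the batch transforms, passing it the mean and standard deviation to use. Here we use the ImageNet statistics, via `Normalize.from_stats(*imagenet_stats)`.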

In [6]:
# add the Normalize transform (with ImageNet statistics) to the batch transforms
def get_dls(bs, size):
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=parent_label,
                       item_tfms=Resize(460),
                       batch_tfms=[*aug_transforms(size=size, min_scale=0.75),
                                   Normalize.from_stats(*imagenet_stats)])
    return dblock.dataloaders(path, bs=bs)

dls = get_dls(64, 224)

# take a look at one batch now:
x,y = dls.one_batch()
x.mean(dim=[0,2,3]),x.std(dim=[0,2,3])

(TensorImage([-0.1885, -0.1044, -0.0593], device='cuda:0'),
 TensorImage([1.2236, 1.2316, 1.2718], device='cuda:0'))

In [7]:
# check what effect this had on training our model:

model = xresnet50()
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)

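| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 1.697637 | 1.5934 | 0.499253 | 06:12 |
| 1 | 1.31428 | 1.659077 | 0.54481 | 06:11 |
| 2 | 0.99084 | 0.872144 | 0.717326 | 06:11 |
| 3 | 0.760564 | 0.726331 | 0.765497 | 06:11 |
| 4 | 0.614061 | 0.594109 | 0.813294 | 06:11 |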


Normalization made little difference here (final accuracy 0.813 vs. 0.820 for the baseline), but it becomes especially important when using pretrained models.

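When you distribute a model, you also need to distribute the statistics used for normalization, since anyone using it for inference must normalize their data the same way. If you are using a pretrained model, find out what normalization statistics were used to train it and match them.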

When using a pretrained model through `cnn_learner`, the fastai library automatically adds the `Normalize` transform.
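A minimal sketch of why the statistics must travel with the model: the hypothetical helpers below normalize and denormalize a single RGB pixel using the standard ImageNet per-channel values (the numbers held in fastai's `imagenet_stats`). Inference code that used different statistics would feed the model inputs on a different scale from the ones it was trained on.

```python
# Standard ImageNet per-channel statistics (same values as fastai's imagenet_stats)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel (floats in [0, 1]) with per-channel stats."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel, e.g. to display what the model actually saw."""
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))

pixel = (0.485, 0.456, 0.406)                 # exactly the ImageNet mean
print(normalize_pixel(pixel))                 # (0.0, 0.0, 0.0)
restored = denormalize_pixel(normalize_pixel((0.9, 0.1, 0.5)))  # ≈ (0.9, 0.1, 0.5)
```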

## Progressive Resizing
All our training up until now has been done at size 224. We could have begun training at a smaller size before going to that. This is called progressive resizing: start training with small images (for speed) and finish training with large images (for accuracy).
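A back-of-the-envelope check on why the small-image phase is cheaper: a 128-pixel square image has about a third as many pixels as a 224-pixel one, so the convolutional layers do much less work per image. This is only arithmetic (`pixel_ratio` is a hypothetical helper), not a timing model; other costs, such as data loading, likely don't shrink in the same proportion.

```python
def pixel_ratio(small, large):
    """Number of pixels in a small square image relative to a large one."""
    return (small * small) / (large * large)

ratio = pixel_ratio(128, 224)
print(f"{ratio:.3f}")  # 0.327
```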

Changing the image size partway through training asks the model to learn to do something a little bit different from what it learned before, which is reminiscent of transfer learning.

Progressive resizing is also another form of data augmentation, so we can expect it to help the model generalize better.

We reuse the `get_dls` function defined earlier, which takes a batch size and an image size and returns DataLoaders.

In [8]:
# create DataLoaders with small images (batch size 128, image size 128)
dls = get_dls(128,128)
learn = Learner(dls, xresnet50(), loss_func=CrossEntropyLossFlat(), 
                metrics=accuracy)

# train for fewer epochs than you might otherwise do:
learn.fit_one_cycle(4, 3e-3)

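| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 1.939635 | 1.801803 | 0.501867 | 03:13 |
| 1 | 1.330091 | 1.232446 | 0.630695 | 03:04 |
| 2 | 0.979738 | 0.873671 | 0.720687 | 03:05 |
| 3 | 0.750537 | 0.657811 | 0.809186 | 03:05 |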


In [9]:
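# then replace the DataLoaders inside the Learner with full-size (224px) images
learn.dls = get_dls(64, 224)
# fine-tune: fine_tune runs one initial epoch (freeze_epochs=1 by default)
# and then 5 more, which is why two result tables are printed
learn.fine_tune(5, 1e-3)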

| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 0.816524 | 0.82011 | 0.75168 | 06:12 |


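| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 0.669446 | 0.769156 | 0.769604 | 06:12 |
| 1 | 0.674326 | 0.947356 | 0.726288 | 06:12 |
| 2 | 0.591393 | 0.589485 | 0.818148 | 06:12 |
| 3 | 0.48366 | 0.478636 | 0.850261 | 06:12 |
| 4 | 0.430764 | 0.452525 | 0.861464 | 06:11 |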


We get much better performance (final accuracy 0.861 vs. 0.820 for the baseline), and each epoch of the initial training on small images was much faster (about 3 minutes vs. about 6 minutes).

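For transfer learning, progressive resizing may actually hurt performance. This is most likely when the pretrained model was trained on data similar to your task and on similar-sized images, so its weights don't need to change much.

If your transfer learning task is going to use images that are of different sizes, shapes, or styles than those used in the pretraining task, progressive resizing will probably help.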

## Test Time Augmentation
When we train with random cropping, we still need a way to handle the full rectangular images at validation and inference time:
- Center cropping: fastai automatically uses center cropping for the validation set, selecting the largest square area in the center of the image. The problem is that critical features may be cropped out.
- Squishing: simply squish or stretch the rectangular images to fit into a square. This is harder for the model, because it has to learn to recognize squished and stretched images.
- Multiple crops: select a number of areas to crop from the original rectangular image, pass each through the model, and take the maximum or average of the predictions. This is known as test time augmentation (TTA).
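The combining step of TTA can be sketched in plain Python. `tta_predict` is a hypothetical helper, not fastai's API: it runs a model on several augmented views of the same image and averages (or takes the max of) the class probabilities. In the toy example, the "model" just looks up canned predictions per crop; the center crop alone would pick class 0, but averaging over three crops picks class 1.

```python
def tta_predict(model, views, use_max=False):
    """Combine a model's predictions over several augmented views of one image.

    `model` is any callable mapping an input to a list of class probabilities;
    `views` are augmented versions (e.g. different crops) of the same image.
    Returns the combined probabilities and the index of the predicted class.
    """
    preds = [model(v) for v in views]
    n_classes = len(preds[0])
    if use_max:
        combined = [max(p[c] for p in preds) for c in range(n_classes)]
    else:
        combined = [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]
    predicted_class = max(range(n_classes), key=lambda c: combined[c])
    return combined, predicted_class

# Toy "model": canned predictions for three crops of one image
canned = {
    "center": [0.6, 0.4],  # the center crop cut off the key feature
    "left": [0.2, 0.8],
    "right": [0.3, 0.7],
}
probs, cls = tta_predict(canned.get, ["center", "left", "right"])
```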


In [10]:
# pass any DataLoader to fastai's tta method, by default it will use your validation set:
preds, targs = learn.tta()
accuracy(preds, targs).item()

0.8629574179649353

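TTA gives a small boost in performance here (accuracy 0.863 vs. 0.861 without it), with no additional training required. The cost is slower inference, since the model runs on several versions of each image.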

## Mixup
Mixup is a data augmentation technique that can provide higher accuracy, especially when you don't have much data and don't have a pretrained model that was trained on data similar to your dataset.

It's helpful to have data augmentation techniques that let you "dial up" or "dial down" the amount of change, to see what works best for you.

Mixup works as follows. For each image:
1. Select another image from your dataset at random.
2. Pick a weight at random.
3. Take a weighted average of the selected image with your image; this is your independent variable.
4. Take a weighted average of the selected image's labels with your image's labels; this is your dependent variable.

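In code, assuming `dataset`, `image1`, and `target1` are already defined and that images and targets support arithmetic (e.g. tensors):
```
from random import randint, uniform

image2, target2 = dataset[randint(0, len(dataset) - 1)]  # randint includes both ends
t = uniform(0.5, 1.0)
new_image = t * image1 + (1 - t) * image2
new_target = t * target1 + (1 - t) * target2
```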
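The four steps can also be sketched as a self-contained, stdlib-only helper. `mixup` and `random_mixup` are hypothetical names, and images and targets are plain lists of floats rather than tensors. This follows the uniform weight range used in the code above; fastai's `MixUp` callback instead draws its weight from a Beta distribution (controlled by its `alpha` parameter) and mixes each batch with a shuffled copy of itself.

```python
import random

def mixup(image1, target1, image2, target2, t):
    """Blend two examples: weight t on the first, (1 - t) on the second.

    Images and one-hot targets are plain lists of floats of matching length.
    """
    new_image = [t * a + (1 - t) * b for a, b in zip(image1, image2)]
    new_target = [t * a + (1 - t) * b for a, b in zip(target1, target2)]
    return new_image, new_target

def random_mixup(dataset, index, rng=random):
    """Mix dataset[index] with a randomly chosen example from the same dataset."""
    image1, target1 = dataset[index]
    image2, target2 = dataset[rng.randrange(len(dataset))]  # step 1
    t = rng.uniform(0.5, 1.0)                               # step 2
    return mixup(image1, target1, image2, target2, t)       # steps 3 and 4

# Usage: a two-example toy dataset of 2-pixel "images" and one-hot targets
data = [([0.0, 0.0], [1.0, 0.0]), ([1.0, 1.0], [0.0, 1.0])]
mixed_image, mixed_target = random_mixup(data, 0, random.Random(0))
```

Because both targets are one-hot, the mixed target always sums to 1, and with `t >= 0.5` the original image's label keeps the larger share.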

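For example, if we mix a church image (weight 0.3) with a gas station image (weight 0.7), the new target is the same linear combination of the one-hot-encoded targets: 0.3 × church + 0.7 × gas_station.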

Assuming church is class index 2 and gas station is class index 7, the one-hot-encoded representations are as follows:
```
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0] and [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```
Here is our final target:
```
[0, 0, 0.3, 0, 0, 0, 0, 0.7, 0, 0]
```
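The arithmetic for this example can be checked directly. `one_hot` is a hypothetical helper, and the class indices 2 and 7 are the same illustrative ones used in the vectors above.

```python
def one_hot(index, n_classes=10):
    """One-hot encode a class index as a list of floats."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]

church, gas_station = one_hot(2), one_hot(7)
target = [0.3 * c + 0.7 * g for c, g in zip(church, gas_station)]
print(target)  # [0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0]
```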

fastai does all of this for us when we add the `MixUp` callback to our Learner (callbacks inject custom behavior into the training loop).

In [12]:
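# here is how to train a model with Mixup
# (note: `dls` is still the 128px, bs=128 DataLoaders created in In [8])

model = xresnet50()
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(),
                metrics=accuracy, cbs=MixUp())  # MixUp's default alpha is 0.4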
learn.fit_one_cycle(5, 3e-3)

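| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 2.185049 | 2.679157 | 0.326736 | 03:07 |
| 1 | 1.735136 | 1.690184 | 0.51755 | 03:10 |
| 2 | 1.487736 | 1.058351 | 0.660194 | 03:10 |
| 3 | 1.335943 | 0.863896 | 0.723674 | 03:09 |
| 4 | 1.223124 | 0.73315 | 0.785287 | 03:07 |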


## Label Smoothing
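Label smoothing replaces all the 1s in our one-hot-encoded targets with a number a bit less than 1, and all the 0s with a number a bit more than 0, and then trains as usual. This discourages the model from becoming overconfident in its predictions.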

In our Imagenette example, which has 10 classes, the targets become something like this (here the correct class is index 3):
```
[0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
```
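These numbers follow from the usual label smoothing formula: every class gets ε/N, and the correct class additionally gets 1 − ε. The sketch below assumes ε = 0.1, which is the default `eps` of fastai's `LabelSmoothingCrossEntropy`; `smooth_labels` is a hypothetical helper (fastai applies the smoothing inside the loss rather than building target vectors like this).

```python
def smooth_labels(target_index, n_classes, eps=0.1):
    """Label-smoothed target: eps / n_classes everywhere, plus 1 - eps on the true class."""
    base = eps / n_classes
    return [(1 - eps) + base if i == target_index else base for i in range(n_classes)]

# 10 classes, correct class at index 3: 0.01 for the others, 0.91 for the true class
targets = smooth_labels(3, 10)
```

The smoothed targets still sum to 1, so they remain a valid probability distribution.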

In [13]:
# to use label smoothing in practice, we only change the loss function passed to Learner
# (note: `dls` is still the 128px, bs=128 DataLoaders created in In [8])

model = xresnet50()
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(),
                metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)

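| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|------|
| 0 | 2.736039 | 2.85854 | 0.45519 | 03:07 |
| 1 | 2.229111 | 2.3722 | 0.56012 | 03:10 |
| 2 | 1.959794 | 1.908748 | 0.675504 | 03:09 |
| 3 | 1.755817 | 1.660063 | 0.784914 | 03:10 |
| 4 | 1.628465 | 1.584881 | 0.821135 | 03:10 |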
