## How to efficiently load images?

Things to consider:  
* find out how long it takes to load and resize the images
* find out whether training is affected by saving the images at a smaller size
* how does batch size impact loading?
* how does num_workers impact loading? Do too many subprocesses bring in overhead?
* it would be interesting to load everything into RAM and see how that goes
    * this needs a custom dataset loader
* is there an interaction between batch size and num_workers?
* final test: compare training time
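
The "load everything into RAM" idea from the list above could look roughly like the sketch below. It is only an illustration: `InMemoryDataset`, `load_fn` and `fake_load` are hypothetical names, not part of fastai or of `utils`. The class implements just `__len__` and `__getitem__`, which is the map-style interface a PyTorch `DataLoader` expects.

```python
from typing import Any, Callable, List, Sequence, Tuple


class InMemoryDataset:
    """Map-style dataset that loads every item once and then serves it from RAM."""

    def __init__(self, paths: Sequence[str], labels: Sequence[Any],
                 load_fn: Callable[[str], Any]) -> None:
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        # Pay the disk read and decode cost a single time, up front.
        self.items: List[Any] = [load_fn(p) for p in paths]
        self.labels: List[Any] = list(labels)

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        return self.items[idx], self.labels[idx]


# Stand-in loader for demonstration only; a real one would open and resize the image.
def fake_load(path: str) -> bytes:
    return path.encode()


ds = InMemoryDataset(["a.png", "b.png"], [0, 1], fake_load)
print(len(ds), ds[1])
```

Whether this fits in memory depends on the stored size: a 224x224 RGB image kept as uint8 takes 224 * 224 * 3 = 150,528 bytes. Random augmentations would still have to run on every access, so only the read and resize cost is saved.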

In [1]:
import fastai
from fastai import *
from fastai.vision import *
import utils  # personal helper functions

%matplotlib inline

In [2]:
# global variables
labels='train_stratified_split.csv'
padding_mode='border'
param_baseline = {'experiment': 'baseline', 
                  'image_size': 224, 
                  'batch_size': 32, 
                  'num_workers':0, 
                  'folder':'train'}

In [3]:
def load_train_image(data):
    'Just load the data and nothing else'
    for i, batch in enumerate(data.train_dl): 
        pass

def get_data(params):
    'wrapper function for parameter settings'
    # use the `params` argument rather than the global `param`
    return utils.get_data(size=params['image_size'], 
                          bs=params['batch_size'], 
                          csv=labels, 
                          folder=params['folder'], 
                          num_workers=params['num_workers'], 
                          padding_mode=padding_mode)

In [4]:
tfms = get_transforms(flip_vert=False, max_zoom=1);
[i.tfm.name for i in tfms[0]]  # transformation done on training data set

['TfmCrop', 'TfmPixel', 'TfmCoord', 'TfmAffine', 'TfmLighting', 'TfmLighting']

In [5]:
# first step is to resize the image
vision.transform._crop_pad??

Signature:
vision.transform._crop_pad(
    x,
    size,
    padding_mode='reflection',
    row_pct: <function uniform at 0x7f5ff60cc0d0> = 0.5,
    col_pct: <function uniform at 0x7f5ff60cc0d0> = 0.5,
) -> fastai.vision.image.Image
Docstring: <no docstring>
Source:   
def _crop_pad(x, size, padd

In [6]:
exp_param = pd.DataFrame(columns=['experiment', 'image_size', 'batch_size', 'num_workers', 'folder', 'avg', 'stdev'])

## Baseline: Loading unprocessed images
How long does it take to load the images from a dataloader with all the transformations applied?

In [7]:
param = param_baseline.copy()

In [8]:
result = %timeit -n1 -r3 -o load_train_image(get_data(param))

6min 2s ± 2.61 s per loop (mean ± std. dev. of 3 runs, 1 loop each)
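
Outside IPython, roughly the same measurement as `%timeit -n1 -r3 -o` can be taken with the standard library. This is a minimal sketch: `time_runs` is a hypothetical helper, and the population standard deviation is used on the assumption that this matches what `%timeit -o` reports (swap in `statistics.stdev` for a sample estimate).

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_runs(fn: Callable[[], object], repeats: int = 3) -> Tuple[float, float, List[float]]:
    """Call fn `repeats` times; return (mean, population stdev, raw timings) in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.pstdev(timings), timings


# Cheap stand-in workload; in this notebook the call would be
# lambda: load_train_image(get_data(param))
avg, stdev, timings = time_runs(lambda: sum(range(100_000)))
print(f"{avg:.4f} s ± {stdev:.4f} s over {len(timings)} runs")
```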


In [9]:
param['avg'], param['stdev'] = result.average, result.stdev

In [10]:
exp_param = exp_param.append(param, ignore_index=True)

## Experiment 1: Number of workers
How does the number of workers impact the loading of the data?  
Is there a limit beyond which the overhead is no longer worth it?

In [24]:
workers_ls = [1,2,3,4,5,6,10,20]
param = param_baseline.copy()
param['experiment'] = 'number of workers'

In [25]:
for x in workers_ls:
    print(f'Workers: {x}')
    param['num_workers'] = x
    result = %timeit -n1 -r3 -o load_train_image(get_data(param))
    param['avg'], param['stdev'] = result.average, result.stdev
    exp_param = exp_param.append(param, ignore_index=True)

Workers: 1
6min 47s ± 338 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Workers: 2
3min 41s ± 799 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Workers: 3
2min 44s ± 422 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Workers: 4
2min 15s ± 213 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Workers: 5
2min ± 328 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Workers: 6
1min 50s ± 486 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Workers: 10
1min 36s ± 196 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Workers: 20
1min 33s ± 857 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)


In [None]:
#plot result? vs baseline?
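
Before plotting, the comparison against the baseline can be put into numbers. The sketch below uses the mean load times recorded in the results table of this notebook; the `speedup` and `efficiency` definitions (baseline time divided by measured time, and speedup per worker) are my own choice of summary, not something the original run computed.

```python
# Mean load times in seconds, copied from the results table in this notebook.
baseline_s = 362.446834  # num_workers=0
worker_times_s = {
    1: 407.077534, 2: 221.52017, 3: 164.20188, 4: 135.563526,
    5: 120.119384, 6: 110.612599, 10: 96.664971, 20: 93.646667,
}

# Speedup > 1 means faster than loading in the main process.
speedup = {w: baseline_s / t for w, t in worker_times_s.items()}
# Efficiency is speedup per worker; 1.0 would be perfect scaling.
efficiency = {w: s / w for w, s in speedup.items()}

for w in worker_times_s:
    print(f"workers={w:>2}  time={worker_times_s[w]:7.1f}s  "
          f"speedup={speedup[w]:.2f}x  efficiency={efficiency[w]:.2f}")
```

Two things stand out in the measurements: a single worker is slower than the baseline (407 s vs 362 s), and going from 10 to 20 workers only shaves about 3% off the load time on this six-core CPU.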

## Experiment 2: Batch Size

Batch size shouldn't impact the loading time because the total amount of work is the same. But what if there is an interaction between num_workers and batch_size? Does a larger batch size allow better use of the subprocesses?
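
To test for such an interaction, both parameters would have to be varied together rather than one at a time. A possible sketch follows; `run_grid`, `measure` and `fake_measure` are hypothetical names, and the placeholder measurement only exists so the example runs without the image data.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List


def run_grid(base: Dict[str, object],
             workers: Iterable[int],
             batch_sizes: Iterable[int],
             measure: Callable[[Dict[str, object]], float]) -> List[Dict[str, object]]:
    """Time every (num_workers, batch_size) pair; return one result row per pair."""
    rows = []
    for w, bs in product(workers, batch_sizes):
        # Build a fresh dict so each row is independent of `base` and of other rows.
        params = dict(base, experiment="workers x batch_size",
                      num_workers=w, batch_size=bs)
        params["avg"] = measure(params)
        rows.append(params)
    return rows


# Placeholder measurement; in the notebook this would time
# load_train_image(get_data(params)) instead.
def fake_measure(params: Dict[str, object]) -> float:
    return 0.0


rows = run_grid({"image_size": 224, "folder": "train"}, [0, 4], [8, 32, 128], fake_measure)
print(len(rows), rows[0])
```

With run times of several minutes per setting, a small grid like the one above is probably all that is practical.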

In [29]:
batch_ls = [1, 8, 16, 32, 64, 128]
param = param_baseline.copy()
param['experiment'] = 'batch_size'

With num_workers = 0, the CPU usage is 50%.

In [30]:
for x in batch_ls:
    print(f'Batch Size: {x}')
    param['batch_size'] = x
    result = %timeit -n1 -r3 -o load_train_image(get_data(param))
    param['avg'], param['stdev'] = result.average, result.stdev
    exp_param = exp_param.append(param, ignore_index=True)

Batch Size: 1
6min 7s ± 1.1 s per loop (mean ± std. dev. of 3 runs, 1 loop each)
Batch Size: 8
6min 4s ± 186 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Batch Size: 16
6min 3s ± 799 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Batch Size: 32
6min 3s ± 622 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Batch Size: 64
6min 2s ± 488 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
Batch Size: 128
6min 5s ± 671 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)


## Export result

In [27]:
exp_copy = exp_param.copy()

In [32]:
exp_param

Unnamed: 0,experiment,image_size,batch_size,num_workers,folder,avg,stdev
0,baseline,224,32,0,train,362.446834,2.609809
1,number of workers,224,32,1,train,407.077534,0.337877
2,number of workers,224,32,2,train,221.52017,0.798716
3,number of workers,224,32,3,train,164.20188,0.422187
4,number of workers,224,32,4,train,135.563526,0.213248
5,number of workers,224,32,5,train,120.119384,0.327717
6,number of workers,224,32,6,train,110.612599,0.48622
7,number of workers,224,32,10,train,96.664971,0.195501
8,number of workers,224,32,20,train,93.646667,0.857278
9,batch_size,224,1,0,train,367.511921,1.098099


In [33]:
exp_param.to_csv(utils.data_fp/'loading_runtime.csv', index=False)

In [None]:
exp_param = pd.read_csv(utils.data_fp/'loading_runtime.csv')

### Hardware of current system

In [31]:
hardware_info = !lshw -short -sanitize
for line in hardware_info:
    for word in ['WARNING','Description', '==', 'processor', 'memory', 'display']:
        if word in line: print(line)

H/W path              Device   Class       Description
/0/0                           memory      15GiB System memory
/0/1                           processor   AMD Ryzen 5 1600 Six-Core Processor
/0/100/1.3/0.2/4/0             display     GP106 [GeForce GTX 1060 6GB]
/0/100/3.1/0                   display     GP106 [GeForce GTX 1060 6GB]
