## Using `mpify` to Train Fastai2/Course-V4 Jupyter Notebooks "Distributedly"

###  To train a `fastai2` learner on multiple processes inside a Jupyter notebook ...

The `DataLoaders` must be created fresh in each process, because instantiating it initializes a CUDA GPU context, which cannot be reused in another process.

Thus in all the examples here, both the `DataBlock` and the `DataLoaders` (often named `dls`) are created inside the target function body. Other variables (the `path` of an untar'ed dataset, or `df`, a loaded DataFrame) and the many helper functions can be passed to the distributed training API `in_torchddp()` via the `imports=` and `need=` parameters.
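
To make the roles of `imports=` and `need=` concrete, here is a minimal sketch of how a worker's namespace could be assembled from the two strings. It is not `mpify`'s actual implementation: the helper `build_worker_namespace` and the toy `path` and `get_x` are hypothetical stand-ins, and the only assumption taken from the examples here is that `need=` is a space-separated list of names. It also shows why a forgotten helper surfaces as a `NameError`.

```python
def build_worker_namespace(imports: str, need: str, caller_ns: dict) -> dict:
    """Hypothetical helper: assemble the globals a worker process would see."""
    worker_ns = {}
    # Run the import statements so the worker gets its own module references.
    exec(imports, worker_ns)
    # Copy each requested name from the caller; fail early if one is missing.
    for name in need.split():
        if name not in caller_ns:
            raise NameError(f"name '{name}' is not defined in the caller")
        worker_ns[name] = caller_ns[name]
    return worker_ns


# Toy stand-ins for the notebook's `path` and `get_x` (not the real dataset).
path = "/data/pascal"
def get_x(r): return f"{path}/train/{r['fname']}"

imports = '''
import math
from os.path import join
'''
ns = build_worker_namespace(imports, "path get_x", {"path": path, "get_x": get_x})

# Names from both `imports` and `need` are now visible to the worker.
print(sorted(k for k in ns if not k.startswith("__")))
```

Anything the target function touches that appears in neither string is simply absent on the worker side, which is the situation the `need=` list is meant to prevent.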

### Quick links to chapters `mpify`ed

[01_intro.ipynb](/examples/fastai2_course-v4_01_intro_distrib.ipynb)

[05_pet_breeds.ipynb](/examples/fastai2_course-v4_05_pet_breeds_distrib.ipynb)

[06 multicat.ipynb](/examples/fastai2_course-v4_06_multicat_distrib.ipynb)

[07 Sizing and TTA.ipynb](/examples/fastai2_course-v4_07_sizing_tta_distrib.ipynb)

[08 Collab.ipynb](/examples/fastai2_course-v4_08_collab_distrib.ipynb)


Below are distributed training examples corresponding to the fastai2 course-v4 notebook <a href='https://github.com/fastai/course-v4/blob/master/nbs/06_multicat.ipynb' target='_blank'>`06_multicat.ipynb`</a>.

### <a name='06multicat'></a> 06 Multicat  - Multi-Label Classifications

The Chapter 6 "Multi-Label Classification" notebook builds a `learn` object from
several pieces spread across many cells.

The `DataBlock` object needs `path` and a few functions (`get_x`, `get_y`, `splitter`), and `dls` needs `df`. So we list them together in `need=`.

I do notice accuracy degradation: after the same 3 epochs, accuracy is `0.81`, as opposed to the `0.95` range in the book.

So what can we do? Save the model after the first training, then call `in_torchddp()` again with the `load=filename` flag and tweaked `nepochs` and `freeze_epochs=` values.



In [None]:
# These are defined earlier in the notebook:
from utils import *
from fastai2.vision.all import *

path = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(path/'train.csv')

def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')
def splitter(df):
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train,valid

The chapter defines the above variables and functions across several cells.

The target function accepts a `load` parameter for loading saved model state. It uses `resnet50` and the `Learner.fine_tune()` training method, with `nepochs` as the positional argument. Other training arguments, such as `base_lr` and `freeze_epochs`, are passed through `**kwargs`.
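
The argument flow just described can be checked in isolation with a stub. This is only a sketch: `StubLearner` is a hypothetical stand-in that records calls rather than training anything, and `train` mirrors the shape of the target function, not its real body.

```python
class StubLearner:
    """Stand-in for a fastai `Learner`; it only records what it is asked to do."""
    def __init__(self):
        self.calls = []

    def load(self, name):
        self.calls.append(("load", name))

    def fine_tune(self, epochs, *args, **kwargs):
        self.calls.append(("fine_tune", epochs, args, kwargs))


def train(nepochs, *args, load=None, **kwargs):
    learn = StubLearner()
    if load:
        learn.load(load)  # `load` is consumed here and never forwarded
    # Everything else goes to fine_tune() untouched.
    learn.fine_tune(nepochs, *args, **kwargs)
    return learn


learn = train(3, base_lr=3e-3, freeze_epochs=4)
resumed = train(5, load="after3", freeze_epochs=0)
print(learn.calls)
print(resumed.calls)
```

Making `load` keyword-only (it sits after `*args`) keeps it from being mistaken for a positional training argument.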

In [None]:

# to perform those trainings in DDP

from mpify import in_torchddp
ngpus = 3  # Modify to your taste

def train_multicat(nepochs, *args, load:str=None, **kwargs):
    dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x, 
                   get_y=get_y,
                   item_tfms = RandomResizedCrop(128, min_scale=0.35))
    dls = dblock.dataloaders(df)
    
    learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
    
    if load: learn.load(load); print(f'Model and state loaded from {load}')

    with learn.distrib_ctx():
        learn.fine_tune(nepochs, *args, **kwargs)
        
    return learn
        
imports='''
from utils import *
from fastai2.vision.all import *
from fastai2.distributed import *
'''
need="path df get_x get_y splitter"

learn = in_torchddp(ngpus, train_multicat, 3, base_lr=3e-3, freeze_epochs=4,
                    imports=imports, need=need)


The next training uses a different dataset, `BIWI_HEAD_POSE`, and a different set of helper routines.
I missed `img2pose` on my first pass collecting them. The resulting `NameError` exception pointed it out, and I simply added it to `need=`.

#### To Train Distributedly More Than Once ...
What if a few epochs of training do not reach good enough accuracy? How can we train "distributedly" again in Jupyter?

Because `mpify.in_torchddp()` returns the resulting `Learner` object to the main Jupyter shell, the user can save its state to a file and pass that file to subsequent `in_torchddp()` calls. Each call spawns a new group of processes from scratch, which start training after `load`ing from the file, *as demonstrated below*.


In [None]:
path = untar_data(URLs.BIWI_HEAD_POSE)

def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt')

cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)
def get_ctr(f):
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])

def train_biwi(nepochs, *args, load:str=None, **kwargs):
    biwi = DataBlock(
        blocks=(ImageBlock, PointBlock),
        get_items=get_image_files,
        get_y=get_ctr,
        splitter=FuncSplitter(lambda o: o.parent.name=='13'),
        batch_tfms=[*aug_transforms(size=(240,320)), 
                    Normalize.from_stats(*imagenet_stats)])

    dls = biwi.dataloaders(path)

    learn = cnn_learner(dls, resnet18, y_range=(-1,1))
    if load: learn.load(load); print(f'Model and state loaded from {load}')

    lr = 1e-2
    with learn.distrib_ctx(): learn.fine_tune(nepochs, lr)
    return learn

imports='''
from utils import *
from fastai2.vision.all import *
from fastai2.distributed import *
import numpy as np
'''

need="path cal get_ctr img2pose"

learn = in_torchddp(ngpus, train_biwi, 3, imports=imports, need=need)

# Not satisfied with the accuracy?  Save then train 5 more epochs, starting from the current state
learn.save('biwi_after3')
learn = in_torchddp(ngpus, train_biwi, 5, load='biwi_after3', imports=imports, need=need)
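
The save-then-resume flow above can be pictured with a toy example that uses a JSON file in place of a saved model. Everything here is a hypothetical illustration (the `train` and `save` helpers and the `epochs_done` counter are not fastai or `mpify` APIs). The point is only that each call starts from scratch unless it is told which file to load.

```python
import json
import tempfile
from pathlib import Path


def train(nepochs, state_dir, load=None):
    """Toy 'training': each call starts with fresh state, like a new process group."""
    state = {"epochs_done": 0}
    if load:
        # Resume from the file written by an earlier run.
        state = json.loads((Path(state_dir) / f"{load}.json").read_text())
    state["epochs_done"] += nepochs
    return state


def save(state, state_dir, name):
    """Persist the state so a later, unrelated run can pick it up."""
    (Path(state_dir) / f"{name}.json").write_text(json.dumps(state))


with tempfile.TemporaryDirectory() as d:
    first = train(3, d)
    save(first, d, "after3")
    second = train(5, d, load="after3")  # continues from the saved state
    fresh = train(5, d)                  # without load=, earlier progress is lost

print(first, second, fresh)
```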