README_AUTOTRAIN.md

junky lib: PyTorch utilities

Autotrain

This is a tool for selecting the hyperparameters of a PyTorch model. It can work in parallel using multiple devices. If some of the parallel threads die during training (because of a MemoryError or anything else), their tasks will be redone after all other threads have finished their work.

import junky
junky.torch_autotrain(
    make_model_method, train_method, create_loaders_method=None,
    make_model_args=(), make_model_kwargs=None, make_model_fit_params=None,
    train_args=(), train_kwargs=None, devices=torch.device('cpu'),
    best_model_file_name='model.pt', best_model_device=None, seed=None
)

Args:

make_model_method: method to create the model. Returns the model and, if specified, some other params that should be passed to train_method. The method has a signature as follows:
callable(*make_model_args, **make_model_kwargs, **fit_kwargs) -> model|tuple(model, <other train args>).
Here, fit_kwargs - params that are constructed from make_model_fit_params.

train_method: method to train and validate the model. Signature:
callable(device, loaders, model, *other_train_args, best_model_backup_method, log_prefix, *train_args, **train_kwargs) -> <train statistics>.
Here:
device - one of the devices that is assigned to train the model;
loaders - the return value of create_loaders_method, or () if create_loaders_method is None (default);
other_train_args - params returned by make_model_method besides the model (if any). E.g.: optimizer, criterion, etc.;
best_model_backup_method - method that saves the best model over all runs. Signature:
callable(best_model, best_model_score).
This method must be invoked in train_method to save the best model;
log_prefix - prefix that train_method should put at the beginning of any output. Otherwise, you can't distinguish messages from parallel threads.
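The argument order described above can be illustrated with a minimal sketch of a train_method. Everything specific in it is an assumption made for illustration: optimizer and criterion stand in for other_train_args, epochs and run_epoch are hypothetical train_kwargs, a higher score is assumed to be better, and the returned dict is only one possible shape for the train statistics.

```python
def train_method(device, loaders, model, optimizer, criterion,
                 best_model_backup_method, log_prefix,
                 epochs=3, run_epoch=None):
    """Hypothetical train_method following the documented argument order.

    optimizer, criterion - stand-ins for *other_train_args;
    epochs, run_epoch    - hypothetical **train_kwargs; run_epoch(model, epoch)
                           is a placeholder for one real train + validation
                           pass and must return a score.
    """
    best_score, history = None, []
    for epoch in range(1, epochs + 1):
        score = run_epoch(model, epoch)
        history.append(score)
        # Start every message with log_prefix so that output from
        # parallel threads can be told apart.
        print(f'{log_prefix}epoch {epoch}: score = {score:.4f}')
        # Assumption: higher is better; back up the model on improvement.
        if best_score is None or score > best_score:
            best_score = score
            best_model_backup_method(model, best_score)
    # Assumption: any object may serve as the train statistics.
    return {'best_score': best_score, 'history': history}
```

In a real train_method, run_epoch would be replaced by the actual training and validation loops over loaders on device.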

create_loaders_method: method to create torch.utils.data.DataLoader objects to use in train_method. Every thread calls it only once and then passes the result to train_method for every model that this thread is assigned to. The signature of create_loaders_method:
callable() -> <loader>|tuple(<loaders>).
If None (default), train_method must create loaders by itself.

Important: you can't use one DataLoader in several threads. You must have a separate DataLoader for every thread; otherwise, your training will break.

make_model_args: positional args (of tuple type) for make_model_method. Will be passed as is.

make_model_kwargs: keyword args (of dict type) for make_model_method. Will be passed as is.

make_model_fit_params: a specification of the varying fit_kwargs of make_model_method; the tool looks for the best among all their combinations. The type of make_model_fit_params: iterable of iterables; the nesting depth is unlimited. Examples:
[('a', [50, 100]), ('b', [.1, .5])] produces fit_kwargs:

{'a': 50, 'b': .1},
{'a': 50, 'b': .5},
{'a': 100, 'b': .1},
{'a': 100, 'b': .5};

[('a', [50, 100]), [('b', [.1, .5])], [('b', None), ('c', ['X', 'Y'])]] produces

{'a': 50, 'b': .1},
{'a': 50, 'b': .5},
{'a': 100, 'b': .1},
{'a': 100, 'b': .5},
{'a': 50, 'b': None, 'c': 'X'},
{'a': 50, 'b': None, 'c': 'Y'},
{'a': 100, 'b': None, 'c': 'X'},
{'a': 100, 'b': None, 'c': 'Y'}.
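The two examples above can be reproduced with a short sketch. It is not junky's actual implementation, only one reading of the examples: the helper name expand_fit_params is hypothetical, a (name, values) tuple is treated as a varying parameter, and every nested list is treated as an alternative group that is combined with the parameters of the enclosing level.

```python
from itertools import product


def expand_fit_params(spec):
    """Expand a nested parameter spec into a list of fit_kwargs dicts.

    Assumed reading of the examples (not taken from junky's source):
    - a 2-tuple (str, values) is a parameter; a non-list value counts
      as a single choice;
    - any other item is a nested group of alternatives, expanded
      recursively and merged with the parameters of this level.
    """
    params, groups = [], []
    for item in spec:
        if isinstance(item, tuple) and len(item) == 2 \
                and isinstance(item[0], str):
            name, values = item
            if not isinstance(values, (list, tuple)):
                values = [values]
            params.append((name, list(values)))
        else:
            groups.append(item)

    # Cartesian product of the parameters given directly at this level.
    names = [name for name, _ in params]
    base = [dict(zip(names, combo))
            for combo in product(*(values for _, values in params))]
    if not groups:
        return base

    # Each nested group is an alternative branch on top of `base`.
    result = []
    for group in groups:
        branch = expand_fit_params(group)
        result.extend({**b, **s} for b in base for s in branch)
    return result


if __name__ == '__main__':
    spec = [('a', [50, 100]),
            [('b', [.1, .5])],
            [('b', None), ('c', ['X', 'Y'])]]
    for fit_kwargs in expand_fit_params(spec):
        print(fit_kwargs)
```

Under these assumptions the function yields the same dicts, in the same order, as both lists shown above.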

train_args: positional args (of tuple type) for train_method. Will be passed as is.

train_kwargs: keyword args (of dict type) for train_method. Will be passed as is.

devices: what devices to use for training. This can be a single device, a list of available devices, or a dict of available devices with the max number of simultaneous threads on each. The possible types are: <device>, tuple(<device>), dict({<device>: int}). Examples:
torch.device('cpu') - one thread on CPU (default);
('cuda:0', 'cuda:1', 'cuda:2') - 3 GPUs, 1 thread on each;
{'cuda:0': 3, 'cuda:1': 3} - 2 GPUs, 3 threads on each.
NB: <device> == (<device>,) == {<device>: 1}
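The equivalence in the NB above can be sketched as a normalization to the dict form. The helper name normalize_devices is hypothetical and this is only an illustration of the three accepted shapes, not the code the library uses; plain strings are used as devices to keep the sketch free of dependencies.

```python
def normalize_devices(devices):
    """Bring any of the three accepted shapes to {device: max_threads}.

    Hypothetical helper written for illustration only.
    """
    if isinstance(devices, dict):
        # Already {device: max number of simultaneous threads}.
        return dict(devices)
    if isinstance(devices, (list, tuple)):
        # A sequence of devices means one thread on each.
        return {device: 1 for device in devices}
    # A single device means one thread on it.
    return {devices: 1}


if __name__ == '__main__':
    print(normalize_devices('cpu'))
    print(normalize_devices(('cuda:0', 'cuda:1', 'cuda:2')))
    print(normalize_devices({'cuda:0': 3, 'cuda:1': 3}))
```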

best_model_file_name: file name for the best model when saving. Default 'model.pt'.

best_model_device: the device where the best model will be loaded. If None, the best model will not be loaded in memory.

The tool returns tuple(best_model, best_model_name, best_model_score, best_model_params, stats). Here:
best_model - the best model if best_model_device is not None, else None;
best_model_name - the key of the best model stats;
best_model_score - the score of the best model;
best_model_params - fit_kwargs of the best model;
stats - all returns of all train_methods. Format:
[(<model name>, <model best score>, <model params>, <*train_method* return>), ...]
stats is sorted by <model best score>, in such a way that stats[0] corresponds to the best model.
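Since stats is already sorted with the best model first, inspecting the leaders only takes a slice. The helper below is a hypothetical convenience, not part of junky; it relies solely on the tuple layout documented above.

```python
def summarize_stats(stats, top=3):
    """Format the first `top` entries of a stats list.

    Each entry is expected to follow the documented layout:
    (<model name>, <model best score>, <model params>, <train_method return>).
    """
    lines = []
    for rank, (name, score, params, _train_return) in enumerate(stats[:top], 1):
        lines.append(f'{rank}. {name}: score={score}, params={params}')
    return lines
```

Calling print('\n'.join(summarize_stats(stats))) on the last element of the returned tuple gives a short leaderboard.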

Sometimes, it's necessary to extract results from the output of torch_autotrain(). The method to do so is:

junky.parse_autotrain_log(log_fn, silent=False)

Here, log_fn is the name of the file that contains the torch_autotrain() log.

silent: if True, suppress output.

Returns a list [(<model name>, <model best score>, <model params>, <is training finished>), ...] sorted by <model best score>.

NB: if you use torch_autotrain() from a Jupyter notebook, you don't have to isolate its output. Usually, you can just select and copy the full text of the notebook page and save it to a "log" file. Then, pass this file to parse_autotrain_log().

If the training of some model has not finished yet, its name in the output will start with a * sign.