Skeletor is a lightweight wrapper for research code. It is meant to enable fast, parallelizable prototyping without sacrificing reproducibility or ease of experiment analysis.
You can install it with:
pip install skeletor-ml
Why use skeletor?
Tracking and analyzing experiment results is easy. Skeletor uses track, which provides a simple interface to log metrics throughout training and to view those metrics in a pandas DataFrame afterwards. It can log locally and to S3. Compared to other logging tools, track has minimal overhead and a very simple interface. No longer do you need to decorate every function or specify a convoluted experiment pipeline.
Orchestrating many experiments in parallel is simple and robust. Almost every experiment tracking framework implements its own scheduling and hyperparameter search algorithms. Luckily, I don't trust myself to do this correctly. Instead, skeletor uses ray, a high-performance distributed execution framework. In particular, it uses ray tune for scalable hyperparameter search.
Necessary packages are listed in
pip install skeletor-ml to get started.
A basic example
train.py might look like:
import skeletor from skeletor.models import build_model from skeletor.datasets import build_dataset from skeletor.optimizers import build_optimizer import track def add_args(parser): parser.add_argument('--arch', default='resnet50') parser.add_argument('--lr', default=0.1, type=float) def train(epoch, trainloader, model, optimizer): ... return avg_train_loss def test(epoch, testloader, model): ... return avg_test_loss def experiment(args): trainloader, testloader = build_dataset('cifar10') model = build_model(args.arch, num_classes=10) opt = build_optimizer('SGD', lr=args.lr) for epoch in range(200): track.debug("Starting epoch %d" % epoch) train_loss = train(epoch, trainloader, model, opt) test_loss = test(epoch, testloader, model) track.metric(iteration=epoch, train_loss=train_loss, test_loss=test_loss) skeletor.supply_args(add_args) skeletor.execute(experiment)
You just have to supply (1) a function that adds your desired arguments to an
ArgumentParser object, and (2) a function that runs the experiment using the parsed arguments. You can then use
track to log statistics during training.
You can supply a third function to run analysis after training.
skeletor.supply_postprocess(postprocess_fn) takes in a user-defined function of the form
proj is a
Internally, the basic experiment flow is:
run add_args(parser) -> parse the args -> run experiment_fn(args) -> optionally run postprocess_fn(proj)
To launch an experiment in
train.py, you just do
python train.py <my args> <experimentname>. The results will go in
<logroot>/<experimentname>. For example, you can do something like
CUDA_VISIBLE_DEVICES=0 python train.py --arch ResNet50 --lr .1 resnet_cifar
The same code can be used to launch several experiments in parallel. Suppose I have a config called
config.yaml that looks like:
arch: ResNet50 lr: grid_search: [.001, .01, .1, 1.0]
I can test out all of these learning rates at the same time by running:
CUDA_VISIBLE_DEVICES=0,1 python train.py --config=config.yaml --self_host=2 resnet_cifar
Ray will handle scheduling the jobs across all available resources.
track records) will be stored in
track docs for how to access these records as DataFrames.
You can find an example of running a grid search for training a residual network on CIFAR-10 in PyTorch in
Getting experiment results
I added a utility in
skeletor.proc for converting all
track trial records for an experiment into a single Pandas DataFrame. It can also pickle it.
That means if I run an experiment like above called
resnet_cifar, I can access all of the results for all the trials as a single DataFrame by calling
Registering custom models, dataloaders, and optimizers
Registering custom classes allows you to construct an instance of the specified class by calling
build_optimizer with the class string name. This is useful for hyperparameter searching because you can search over these choices directly by class name.
I try to provide a simple interface for registering custom implementations with skeletor. For example, I can register a custom
Model class by calling
skeletor.models.add_model(Model). This allows me to create models through
skeletor.models.build_model('Model'). You can also register entire modules full of definitions at once. There are analogous functions
add_dataset, add_optimizer for datasets and optimizers.
class MyNetwork(Module): ... skeletor.models.add_model(MyNetwork) arch_name = 'MyNetwork' model = skeletor.models.build_model(arch_name)
Help me out / Things to Do
We have active issues! Feel free to suggest new improvements or add PRs to contribute.