This is a ready-to-go PyTorch template utilizing Lightning and wandb.
The template uses Lightning CLI for config management.
It follows most of the Lightning CLI docs, but is integrated with wandb.
Since Lightning CLI instantiates classes on the fly, some workarounds were needed while integrating the WandbLogger into the template.
This might not be the best practice, but it works and is quite convenient.
It uses Lightning CLI, so most of its usage can be found in the official docs.
There are some added arguments related to wandb:

- `--name` or `-n`: Name of the run, displayed in wandb
- `--version` or `-v`: Version of the run, displayed in wandb as tags
Basic cmdline usage is as follows, assuming the cwd is the project root dir.

```bash
python src/main.py fit -c configs/config.yaml -n debug-fit-run -v debug-version
```

If using wandb for logging, change the `project` key in `cli_module/rich_wandb.py`.
If you want to access the log directory in your LightningModule, you can do so as follows.

```python
log_root_dir = self.logger.log_dir or self.logger.save_dir
```
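For example, inside a LightningModule hook (a minimal sketch; the module name and hook are illustrative, and the import path assumes a recent Lightning release):

```python
import lightning.pytorch as pl


class MyModel(pl.LightningModule):  # hypothetical module, for illustration only
    def on_fit_start(self):
        # The template's pattern: prefer log_dir (local loggers), and fall
        # back to save_dir when log_dir is unavailable (e.g. with wandb).
        log_root_dir = self.logger.log_dir or self.logger.save_dir
        print(f"Logs for this run are written under: {log_root_dir}")
```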
If using wandb for logging, model ckpt files are uploaded to wandb.
Since ckpt files are large, a clean-up process is needed.
The clean-up process deletes all model ckpt artifacts that have no aliases (e.g. best, latest).
To turn the clean-up process off, add the following to config.yaml. Every version of the model ckpt files will then be kept in wandb.

```yaml
trainer:
  logger:
    init_args:
      clean: false
```
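For reference, the clean-up is conceptually equivalent to this sketch against the wandb public API (the run path below is a placeholder; the template's actual implementation lives in the custom logger):

```python
import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")  # placeholder run path

# Delete every model checkpoint artifact version that carries no alias
# (i.e. is neither "best" nor "latest").
for artifact in run.logged_artifacts():
    if artifact.type == "model" and not artifact.aliases:
        artifact.delete()
```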
One can save model checkpoints using Lightning callbacks.
A checkpoint contains the model weights and other state_dicts needed to resume training.
There are several ways to save ckpt files, either locally or in the cloud.

- Leave everything at its defaults and ckpt files will be saved locally (at `logs/${name}/${version}/fit/checkpoints`).
- If you want to save ckpt files as wandb Artifacts, add the following config. (The ckpt files will be saved locally too; a retrieval sketch follows this list.)

  ```yaml
  trainer:
    logger:
      init_args:
        log_model: all
  ```

- If you want to save ckpt files in the cloud rather than locally, change the save path by adding the following config. (The ckpt files will NOT be saved locally.)

  ```yaml
  model_ckpt:
    dirpath: gs://bucket_name/path/for/checkpoints
  ```
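If checkpoints were logged as wandb Artifacts (second option above), they can be retrieved later through the wandb API. A minimal sketch, assuming the default `model-<run_id>` artifact naming of Lightning's WandbLogger; the entity, project, and alias are placeholders:

```python
import wandb

api = wandb.Api()
# Placeholder artifact path; adjust entity/project/run_id/alias to your run.
artifact = api.artifact("entity/project/model-run_id:best", type="model")
ckpt_dir = artifact.download()  # local directory containing the .ckpt file
```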
You can set async checkpoint saving by providing the config as follows.

```yaml
trainer:
  plugins:
    - AsyncCheckpointIO
```
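The yaml above is equivalent to passing the plugin programmatically. A sketch (the import path assumes a recent Lightning version):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.plugins.io import AsyncCheckpointIO

# Checkpoints are written by a background thread, so training steps
# are not blocked on checkpoint I/O.
trainer = Trainer(plugins=[AsyncCheckpointIO()])
```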
Just add the BatchSizeFinder callback in the config.

```yaml
trainer:
  callbacks:
    - class_path: BatchSizeFinder
```

Or add it on the cmdline.

```bash
python src/main.py fit -c configs/config.yaml --trainer.callbacks+=BatchSizeFinder
```

For tuning, run `src/tune.py` (NOTE: no subcommand in the cmdline).

```bash
python src/tune.py -c configs/config.yaml
```
Basically, all logs are stored in `logs/${name}/${version}/${job_type}`, where `${name}` and `${version}` are configured in the yaml file or on the cmdline.
`${job_type}` can be one of fit, test, validate, etc.
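For example, a fit run followed by a test run with name `my-run` and version `v1` would produce a layout like this (illustrative):

```
logs/
└── my-run/
    └── v1/
        ├── fit/
        │   └── checkpoints/
        └── test/
```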
To run a test:

```bash
python src/main.py test -c configs/config.yaml -n debug-test-run -v debug-version --ckpt_path YOUR_CKPT_PATH
```

To-do:

- Check pretrained weight loading
- Consider multiple-optimizer use cases (e.g. GANs)
- Add instructions in README (ongoing)
- Clean code