
Conversation

@erwulff (Collaborator) commented Dec 5, 2023

Implement LR schedules in the PyTorch training code. Add MambaLayer and make it configurable through parameter config files and for HPO.
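For context, a layer along these lines can be sketched on top of the `mamba_ssm` package. This is a minimal illustration, not the PR's actual implementation; the constructor arguments stand in for the kind of knobs exposed via the parameter config files and may differ from the real ones.

```python
import torch.nn as nn
from mamba_ssm import Mamba  # https://github.com/state-spaces/mamba

class MambaLayer(nn.Module):
    """Sketch of a residual Mamba block over per-event particle sequences.

    The hyperparameters (d_state, d_conv, expand, dropout) are assumed
    config-file knobs; the layer in this PR may be structured differently.
    """

    def __init__(self, embedding_dim, d_state=16, d_conv=4, expand=2, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(embedding_dim)
        # Mamba's documented constructor: Mamba(d_model, d_state, d_conv, expand)
        self.mixer = Mamba(d_model=embedding_dim, d_state=d_state, d_conv=d_conv, expand=expand)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, sequence_length, embedding_dim)
        return x + self.dropout(self.mixer(self.norm(x)))
```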

This PR also contains the following changes:

  • --load now takes as input the path to a saved checkpoint from which to start a new training. That is, the training starts from epoch 1, with LR schedules starting from the beginning; only the pre-trained model weights are loaded from the checkpoint.
  • --resume-training is a new command-line argument that takes as input the path to a training directory containing an unfinished training and attempts to restore it, continuing the training from the last saved checkpoint (see the sketch after this list).
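Schematically, the two flags differ in how much checkpoint state is restored. The following is a minimal sketch of that logic; the checkpoint keys and the `find_last_checkpoint` helper are hypothetical, not the PR's actual code.

```python
import glob
import os

import torch

def find_last_checkpoint(train_dir):
    # Hypothetical helper: pick the most recently written checkpoint file.
    ckpts = glob.glob(os.path.join(train_dir, "checkpoints", "*.pth"))
    return max(ckpts, key=os.path.getmtime)

def restore_state(args, model, optimizer, scheduler):
    start_epoch = 1
    if args.load:
        # --load: take only the pre-trained weights; the epoch counter and
        # the LR schedule start from scratch.
        ckpt = torch.load(args.load, map_location="cpu")
        model.load_state_dict(ckpt["model_state_dict"])
    elif args.resume_training:
        # --resume-training: restore the full training state from the last
        # checkpoint in the training directory and continue from there.
        ckpt = torch.load(find_last_checkpoint(args.resume_training), map_location="cpu")
        model.load_state_dict(ckpt["model_state_dict"])
        optimizer.load_state_dict(ckpt["optimizer_state_dict"])
        scheduler.load_state_dict(ckpt["scheduler_state_dict"])
        start_epoch = ckpt["epoch"] + 1
    return start_epoch
```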

[Screenshot] Learning rate versus training step for a training run using the cosinedecay LR schedule, interrupted halfway through and then resumed.

[Screenshot] Learning rate versus training step for a training run using the onecycle LR schedule, interrupted halfway through and then resumed.
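For the LR curves above to continue seamlessly after an interruption, the scheduler's own state has to be saved and restored alongside the model and optimizer. A minimal sketch using the stock PyTorch schedulers follows; the checkpoint layout here is an assumption, not the PR's actual format.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, OneCycleLR

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
steps = 1000

# Stock PyTorch counterparts of the cosinedecay / onecycle schedules.
scheduler = CosineAnnealingLR(optimizer, T_max=steps)
# scheduler = OneCycleLR(optimizer, max_lr=1e-3, total_steps=steps)

# Saving: the scheduler state goes into the checkpoint next to the model
# and optimizer states.
torch.save(
    {
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "scheduler_state_dict": scheduler.state_dict(),
    },
    "checkpoint.pth",
)

# Resuming: load_state_dict restores the scheduler's internal step counter,
# so the LR curve picks up exactly where it left off instead of restarting.
checkpoint = torch.load("checkpoint.pth")
scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
```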

@erwulff erwulff marked this pull request as ready for review December 5, 2023 20:58
@erwulff erwulff changed the title from "Learning rate schedules" to "Learning rate schedules and Mamba layer" on Dec 9, 2023
@jpata jpata merged commit 56fbf57 into jpata:main Dec 11, 2023
farakiko pushed a commit to farakiko/particleflow that referenced this pull request Jan 23, 2024
* fix: update parameter files

* fix: better comet-ml logging

* update flatiron Ray Train submissions scripts

* update sbatch script

* log overridden config to comet-ml instead of original

* fix: checkpoint loading

specify full path to checkpoint using --load-checkpoint

* feat: implement LR schedules in the PyTorch training code

* update sbatch scripts

* feat: LR schedules support checkpointing and resuming training

* update sbatch scripts

* update ray tune search space

* fix: dropout parameter not taking effect on torch gnn-lsh model

* make more gnn-lsh parameters configurable

* make activation function configurable

* update raytune search space

* feat: add MambaLayer

* update raytune search space

* update pyg-cms.yaml

* fix loading of checkpoint in testing with a Ray Train-based run
