
Conversation

@erwulff (Collaborator) commented Dec 5, 2023

Implement LR schedules in the PyTorch training code. Add MambaLayer and make it configurable through parameter config files and for HPO.
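For context, a layer along these lines can be sketched on top of the `mamba_ssm` package. This is a minimal illustration, not the PR's actual implementation; the constructor arguments stand in for the kind of knobs exposed via the parameter config files and may differ from the real ones.

```python
import torch.nn as nn
from mamba_ssm import Mamba  # https://github.com/state-spaces/mamba

class MambaLayer(nn.Module):
    """Sketch of a residual Mamba block over per-event particle sequences.

    The hyperparameters (d_state, d_conv, expand, dropout) are assumed
    config-file knobs; the layer in this PR may be structured differently.
    """

    def __init__(self, embedding_dim, d_state=16, d_conv=4, expand=2, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(embedding_dim)
        # Mamba's documented constructor: Mamba(d_model, d_state, d_conv, expand)
        self.mixer = Mamba(d_model=embedding_dim, d_state=d_state, d_conv=d_conv, expand=expand)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, sequence_length, embedding_dim)
        return x + self.dropout(self.mixer(self.norm(x)))
```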

This PR also contains the following changes:

  • --load now takes as input the path to a saved checkpoint from which to start a new training. That is, the training starts from epoch 1, with LR schedules starting from the beginning; only the pre-trained model weights are loaded from the checkpoint.
  • --resume-training is a new command-line argument that takes as input the path to a training directory containing an unfinished training and attempts to restore it, continuing the training from the last saved checkpoint (see the sketch after this list).
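Schematically, the two flags differ in how much checkpoint state is restored. The following is a minimal sketch of that logic; the checkpoint keys and the `find_last_checkpoint` helper are hypothetical, not the PR's actual code.

```python
import glob
import os

import torch

def find_last_checkpoint(train_dir):
    # Hypothetical helper: pick the most recently written checkpoint file.
    ckpts = glob.glob(os.path.join(train_dir, "checkpoints", "*.pth"))
    return max(ckpts, key=os.path.getmtime)

def restore_state(args, model, optimizer, scheduler):
    start_epoch = 1
    if args.load:
        # --load: take only the pre-trained weights; the epoch counter and
        # the LR schedule start from scratch.
        ckpt = torch.load(args.load, map_location="cpu")
        model.load_state_dict(ckpt["model_state_dict"])
    elif args.resume_training:
        # --resume-training: restore the full training state from the last
        # checkpoint in the training directory and continue from there.
        ckpt = torch.load(find_last_checkpoint(args.resume_training), map_location="cpu")
        model.load_state_dict(ckpt["model_state_dict"])
        optimizer.load_state_dict(ckpt["optimizer_state_dict"])
        scheduler.load_state_dict(ckpt["scheduler_state_dict"])
        start_epoch = ckpt["epoch"] + 1
    return start_epoch
```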

[Screenshot] Learning rate versus training step for a training run using the cosinedecay LR schedule, interrupted halfway through and then resumed.

[Screenshot] Learning rate versus training step for a training run using the onecycle LR schedule, interrupted halfway through and then resumed.
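For the LR curves above to continue seamlessly after an interruption, the scheduler's own state has to be saved and restored alongside the model and optimizer. A minimal sketch using the stock PyTorch schedulers follows; the checkpoint layout here is an assumption, not the PR's actual format.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, OneCycleLR

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
steps = 1000

# Stock PyTorch counterparts of the cosinedecay / onecycle schedules.
scheduler = CosineAnnealingLR(optimizer, T_max=steps)
# scheduler = OneCycleLR(optimizer, max_lr=1e-3, total_steps=steps)

# Saving: the scheduler state goes into the checkpoint next to the model
# and optimizer states.
torch.save(
    {
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "scheduler_state_dict": scheduler.state_dict(),
    },
    "checkpoint.pth",
)

# Resuming: load_state_dict restores the scheduler's internal step counter,
# so the LR curve picks up exactly where it left off instead of restarting.
checkpoint = torch.load("checkpoint.pth")
scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
```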

@erwulff erwulff marked this pull request as ready for review December 5, 2023 20:58
@erwulff erwulff changed the title from "Learning rate schedules" to "Learning rate schedules and Mamba layer" on Dec 9, 2023
@jpata jpata merged commit 56fbf57 into jpata:main Dec 11, 2023
farakiko pushed a commit to farakiko/particleflow that referenced this pull request Jan 23, 2024
* fix: update parameter files

* fix: better comet-ml logging

* update flatiron Ray Train submissions scripts

* update sbatch script

* log overridden config to comet-ml instead of original

* fix: checkpoint loading

specify full path to checkpoint using --load-checkpoint

* feat: implement LR schedules in the PyTorch training code

* update sbatch scripts

* feat: LR schedules support checkpointing and resuming training

* update sbatch scripts

* update ray tune search space

* fix: dropout parameter not taking effect on torch gnn-lsh model

* make more gnn-lsh parameters configurable

* make activation function configurable

* update raytune search space

* feat: add MambaLayer

* update raytune search space

* update pyg-cms.yaml

* fix loading of checkpoint in testing with a Ray Train-based run
