
Implementing pytorch-lightning #94

Merged: 16 commits into graphnet-team:main from pytorch-lightning on Dec 9, 2021

Conversation

@asogaard (Collaborator) commented on Dec 7, 2021

This PR introduces pytorch-lightning (PyL) to handle the main training loop. This brings a number of quality-of-life benefits and means that we can reduce maintenance overhead quite substantially, since the custom trainer classes and callbacks can be replaced more or less wholesale by PyL's built-in classes. Custom callbacks (e.g. the PiecewiseLinearLR scheduler) have been reimplemented using PyL's classes and have been confirmed to behave exactly like the original classes they replace. A new example script, examples/test_model_training_pytorch_lightning.py, shows how to train a model in the PyL framework.
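For orientation, here is a minimal sketch of what training in the PyL framework looks like. The model, architecture, and data below are hypothetical placeholders, not the classes from this repository or its example script:

```python
# Minimal pytorch-lightning training sketch. `ToyModel` and the random
# dataset are hypothetical stand-ins, not the repository's actual code.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 1)  # placeholder architecture

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# The Trainer replaces the custom training loop: device placement,
# checkpointing, and logging are handled by PyL.
data = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
loader = DataLoader(data, batch_size=32)

trainer = pl.Trainer(max_epochs=10)
trainer.fit(ToyModel(), loader)
```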

However, in the interest of ensuring full reproducibility of the paper results, I have agreed with @RasmusOrsoe to keep all of the old (i.e. non-PyL) model and training code in the src/gnn_reco/legacy/ module for the time being. After the low-energy performance paper is completed, we will tag the repo version and start deprecating these in favour of the functionality in the main branch. The example script examples/test_model_training_sqlite.py shows how to perform energy regression using this legacy code. The two example scripts have been tested and found to yield similar losses during training.

This PR addresses the problem with energy reconstruction pointed out in #88.
Closes #76

Tagging @BozianuLeon and @kaareendrup as this might be of interest to you.

@asogaard added the bug (Something isn't working) and feature (New feature or request) labels on Dec 7, 2021
@asogaard self-assigned this on Dec 7, 2021
@RasmusOrsoe (Collaborator) left a comment:

Nice! Good to see that PyL doesn't change the usage much.

One question: in the original implementation of the PiecewiseLinearLR scheduler, it is possible to specify the initial LR (lower bound), the maximal LR, and the ending LR, i.e. three points in total. Is this still possible in the PyL implementation?

Rasmus

@asogaard (Collaborator, Author) commented on Dec 9, 2021

Yes, @RasmusOrsoe! I have implemented a generic piecewise linear learning rate scheduler here, i.e. one where you can specify an arbitrary number of "milestones" (epoch/step counts) and a corresponding learning rate multiplier for each. The scheduler then interpolates linearly between these milestones.

A special case is then having a starting LR multiplier, an end LR multiplier, and a "peak" somewhere in between. I have included an example of this, which should mirror the original implementation exactly, here.
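To illustrate the idea, here is a minimal sketch of such a scheduler using plain PyTorch's LambdaLR together with numpy.interp. This is not the implementation merged in this PR, and the milestone/multiplier values are made up; the three-milestone configuration at the end corresponds to the start/peak/end special case discussed above:

```python
# Sketch of a generic piecewise linear LR scheduler (illustrative,
# not the implementation merged in this PR).
import numpy as np
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(16, 1)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Milestones are step counts; factors are the LR multipliers at those
# steps. Between milestones, numpy.interp interpolates linearly;
# outside the range, the first/last factor is held constant. Three
# milestones reproduce the original start/peak/end behaviour.
milestones = [0, 1_000, 10_000]  # start, peak, end (illustrative)
factors = [1e-2, 1.0, 1e-2]      # warm up to peak LR, then decay

scheduler = LambdaLR(
    optimizer,
    lr_lambda=lambda step: np.interp(step, milestones, factors),
)

# The multiplier as a function of step:
for step in (0, 500, 1_000, 5_000, 10_000):
    print(step, float(np.interp(step, milestones, factors)))
# -> 0.01, 0.505, 1.0, ~0.56, 0.01
```

In a LightningModule, such a scheduler would be returned from configure_optimizers with interval "step", so that PyL steps it once per batch rather than once per epoch.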

@RasmusOrsoe RasmusOrsoe self-requested a review December 9, 2021 11:58
@RasmusOrsoe (Collaborator) left a comment:

Cool! Thanks!

@asogaard asogaard merged commit e3a7ec6 into graphnet-team:main Dec 9, 2021
@asogaard asogaard deleted the pytorch-lightning branch December 9, 2021 12:01
@asogaard (Collaborator, Author) commented on Dec 9, 2021

Tagging @kaareendrup cf. our chat earlier today.

Labels: bug (Something isn't working), feature (New feature or request)

Merging this pull request may close: Use pytorch-lightning for managing training