SaveModelCallback every nth epoch #3375

Merged 2 commits on Jun 15, 2021

KeremTurgutlu (Contributor) commented May 17, 2021

Minor modification to allow saving a model every nth epoch by passing an integer to the existing every_epoch parameter. The main motivation is to reduce the disk space taken up by checkpoints, especially when using with_opt=True during long runs. For example, a 300-epoch training run can produce 100+ GB of checkpoint files, but saving every 10, 20, or 50 epochs is usually enough.

Also added a test with a synthetic learner that trains for 4 epochs and saves every 2nd epoch, starting from epoch 0.
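To make the saving schedule and the disk-space motivation concrete, here is a rough sketch in plain Python. It does not use fastai: `saved_epochs` and `estimated_disk_gb` are hypothetical helpers, and the modulo rule is only my reading of the description above (epochs counted from 0, a checkpoint whenever `epoch % every_epoch == 0`). The disk estimate also assumes every checkpoint is the same size, which may not hold exactly.

```python
def saved_epochs(n_epoch, every_epoch):
    """Return the 0-indexed epochs at which a checkpoint would be written.

    Hypothetical helper: `every_epoch` may be True (every epoch) or a
    positive integer n (every nth epoch, starting from epoch 0).
    """
    if not every_epoch or int(every_epoch) < 1:
        raise ValueError("every_epoch must be True or a positive integer")
    step = int(every_epoch)  # True behaves like 1
    return [epoch for epoch in range(n_epoch) if epoch % step == 0]


def estimated_disk_gb(n_epoch, every_epoch, gb_per_checkpoint):
    """Estimate total checkpoint size, assuming uniform checkpoint sizes."""
    return len(saved_epochs(n_epoch, every_epoch)) * gb_per_checkpoint


if __name__ == "__main__":
    # The test scenario: 4 epochs, saving every 2nd epoch.
    print(saved_epochs(4, 2))

    # The motivating scenario: ~100 GB over 300 epochs, saving every 10th.
    per_checkpoint = 100 / 300
    print(round(estimated_disk_gb(300, 10, per_checkpoint), 1))
```

Under these assumptions, the 4-epoch test case saves at epochs 0 and 2, and the 300-epoch run drops from roughly 100 GB to roughly 10 GB when saving every 10th epoch.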

P.S. If desired, we can also extend this callback to save every nth iteration. That would be particularly useful when the training set is large and users want checkpoints without waiting for a full epoch to complete. I made a similar modification while training CLIP on 40M image-text pairs, simply by adding something like:

def after_batch(self):
    if self.train_iter % self.every_iter == 0:
        self._save(f'{self.fname}_iter{self.train_iter}')
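As a standalone illustration of the per-iteration idea, the sketch below simulates it without fastai. `IterCheckpointSketch` and `simulate` are hypothetical names, `_save` only records the file name instead of writing a model, and the `training` flag is my own addition: it assumes the hook may also fire on validation batches, where the iteration counter would not advance and the same checkpoint could otherwise be saved repeatedly.

```python
from dataclasses import dataclass, field


@dataclass
class IterCheckpointSketch:
    """Hypothetical stand-in for a callback; not fastai's SaveModelCallback."""
    fname: str = "model"
    every_iter: int = 1000
    saved: list = field(default_factory=list)

    def _save(self, name):
        # A real callback would write the model (and optionally the
        # optimizer state) to disk; here we only record the name.
        self.saved.append(name)

    def after_batch(self, train_iter, training=True):
        # Only save on training batches, every `every_iter` iterations.
        if training and train_iter % self.every_iter == 0:
            self._save(f"{self.fname}_iter{train_iter}")


def simulate(n_iters, every_iter):
    """Run the sketch over `n_iters` training batches and return saved names."""
    cb = IterCheckpointSketch(every_iter=every_iter)
    # Assumes the counter has already been incremented when the hook runs,
    # so the first batch is seen as iteration 1.
    for train_iter in range(1, n_iters + 1):
        cb.after_batch(train_iter)
    return cb.saved


if __name__ == "__main__":
    print(simulate(10, 4))
```

With 10 training batches and `every_iter=4`, the sketch records checkpoints at iterations 4 and 8. A real implementation would also need to decide whether to keep every iteration checkpoint or overwrite the previous one, since the disk-space concern raised above applies here too.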

@KeremTurgutlu KeremTurgutlu requested a review from jph00 as a code owner May 17, 2021 15:06
@KeremTurgutlu KeremTurgutlu changed the title Savemodel cb Save Model every nth epoch May 17, 2021
@KeremTurgutlu KeremTurgutlu changed the title Save Model every nth epoch SaveModelCallback every nth epoch May 17, 2021
jph00 (Member) commented Jun 15, 2021

Thanks Kerem :)

@jph00 jph00 merged commit 9edf88a into fastai:master Jun 15, 2021