This repository was archived by the owner on Jun 3, 2025. It is now read-only.
[BugFix][Torchvision] update optimizer state dict before transfer learning #1358
Merged
Conversation
…params from saved optimizer(s) state_dict
corey-nm reviewed Feb 1, 2023
corey-nm reviewed Feb 1, 2023
… state_dict only when `args.resume` is set
KSGulin reviewed Feb 2, 2023
KSGulin previously approved these changes Feb 2, 2023
KSGulin approved these changes Feb 2, 2023
dbogunowicz approved these changes Feb 2, 2023
rahul-tuli added a commit that referenced this pull request Feb 2, 2023
…rning (#1358)
* Add: an `_update_checkpoint_optimizer(...)` for deleting mismatching params from saved optimizer(s) state_dict
* Remove: `_update_checkpoint_optimizer` in favor of loading in the optim state_dict only when `args.resume` is set
* Remove: un-needed imports
* Address review comments
* Style
bfineran pushed a commit that referenced this pull request Feb 2, 2023
bfineran pushed a commit that referenced this pull request Feb 3, 2023
Torchvision Integration Sparse Transfer Learning Bugfix
Current State
As of now, our torchvision integration has a bug where, during Sparse Transfer Learning, the number of output classes can be mismatched between the saved optimizer state_dict (from pretraining) and the re-created optimizer (when starting finetuning). This leads to broken flows and errors whenever there is a mismatch in the number of output classes between the upstream and downstream datasets.
For example: Sparse Transfer Learning a resnet50 model originally trained on ImageNet to Imagenette.

COMMAND:
sparseml.image_classification.train \
    --recipe "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none?recipe_type=transfer-classification" \
    --checkpoint-path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none" \
    --arch-key resnet50 \
    --dataset-path /home/XXXXX/datasets/imagenette-160

ERROR:
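The failure mode can be sketched without torch: when the classifier head is rebuilt for a different number of classes, buffers saved in the optimizer state_dict no longer match the live parameter shapes. The sketch below is a minimal, hypothetical model of that shape check — `load_momentum`, the shapes, and the error message are illustrative, not SparseML's or PyTorch's actual code:

```python
# Minimal, torch-free sketch of the failure mode. The names and the
# shape check below are illustrative only -- not SparseML's or
# PyTorch's actual implementation.

def load_momentum(current_shapes, saved_state):
    """Apply saved per-parameter buffers, refusing shape mismatches."""
    for idx, shape in current_shapes.items():
        buf = saved_state.get(idx)
        if buf is not None and buf["shape"] != shape:
            raise RuntimeError(
                f"momentum buffer for param {idx} has shape "
                f"{buf['shape']}, expected {shape}"
            )

# fc-head momentum saved during ImageNet pretraining (1000 classes)
saved_state = {0: {"shape": (1000, 2048)}}
# optimizer re-created for the Imagenette head (10 classes)
current_shapes = {0: (10, 2048)}

try:
    load_momentum(current_shapes, saved_state)
except RuntimeError as err:
    print(f"RuntimeError: {err}")
```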
Proposed Fix
The proposed fix is to load the optimizer state_dict only when a previous training run is being resumed, i.e., when the --resume flag is set. A finetuning run should be considered a new run where the optimizer state does NOT need to be loaded in. We also raise a warning when the optim state_dict is not loaded.
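The gist of the fix can be sketched as follows. This is a hedged, simplified model of the described behavior, not the PR's actual code; `maybe_restore_optimizer` and its arguments are illustrative names, with `resume` standing in for the CLI's `args.resume`:

```python
import warnings

# Hedged sketch of the fix, not the PR's exact code: restore the saved
# optimizer state only when resuming; otherwise warn and start fresh.

def maybe_restore_optimizer(load_state_fn, checkpoint, resume):
    """Return True if the optimizer state was restored from `checkpoint`."""
    if resume and "optimizer" in checkpoint:
        load_state_fn(checkpoint["optimizer"])
        return True
    warnings.warn(
        "Optimizer state_dict not loaded; pass --resume to restore it "
        "when continuing a previous training run."
    )
    return False
```

With resume unset (the transfer-learning case) the stale state from pretraining is simply ignored, so the mismatched head shapes never reach the optimizer.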
After This Pull Request
The original command works as expected, even when the number of output classes is mismatched between the upstream (ImageNet has 1000 classes) and downstream (Imagenette has 10 classes) datasets.
OUTPUT:
NOTE: The proposed fixes are in relation to this TICKET