Additional SOTA ingredients on Classification Recipe #4493
Conversation
LGTM overall. I would recommend splitting this into a norm PR and an EMA PR so that we can easily track changes in the commit history.
@kazhang Awesome, thanks for confirming. I need to keep the changes "stacked" for now to be able to test them on the new recipes, but I can certainly split it prior to merging. I wanted to get your eyes here early, as some of the approaches are adopted from ClassyVision, which you know very well. :)
Thanks @datumbox, looks good to me. Could you share the training logs for the EMA run?
@prabhat00155 Sure thing. Still running stuff. Happy to provide them when I'm done.
Adding comments to improve reviewing:
```
torchrun --nproc_per_node=8 train.py --model inception_v3 \
    --val-resize-size 342 --val-crop-size 299 --train-crop-size 299 --test-only --pretrained
```
Since we removed the hardcoding of parameters based on model names, we now need to provide extra parameters.
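For reference, a hedged sketch of the extra flags this implies on the argparse side; the flag names are taken from the command above, but the defaults shown here are illustrative and may not match the ones in train.py.

```
import argparse

# Hypothetical sketch of the extra CLI flags implied by the command above.
parser = argparse.ArgumentParser(description="classification recipe (sketch)")
parser.add_argument("--val-resize-size", default=256, type=int, help="validation resize size")
parser.add_argument("--val-crop-size", default=224, type=int, help="validation central crop size")
parser.add_argument("--train-crop-size", default=224, type=int, help="training random crop size")
parser.add_argument("--interpolation", default="bilinear", type=str, help="interpolation method")
args = parser.parse_args()
```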
One thing on my mind, not related to this PR: could we also let users pass kwargs to the models through the command line (in addition to the train.py arguments)? For example, when I train the ViT model, training from scratch and fine-tuning require 2 different heads; in this case I want to configure the representation_size differently, and currently I need to manually change the Python defaults to reflect this. wdyt?
We will probably need to introduce more parameters to be able to do this. We will do it to enable your work, but it's also part of the reason why the ArgumentParser is a poor solution. Hopefully this will be deprecated by the STL work you are preparing!
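For illustration only, a minimal sketch of the kind of mechanism discussed above; the --model-kwargs flag and its JSON format are hypothetical, not something this PR adds.

```
import argparse
import json

import torchvision

# Hypothetical sketch (not part of this PR): a --model-kwargs flag whose JSON payload is
# forwarded to the model builder, e.g. --model-kwargs '{"representation_size": 768}'.
parser = argparse.ArgumentParser()
parser.add_argument("--model", default="resnet18", type=str)
parser.add_argument("--model-kwargs", default="{}", type=str, help="JSON dict of extra builder kwargs")
args = parser.parse_args()

model = torchvision.models.__dict__[args.model](**json.loads(args.model_kwargs))
```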
 else:
     aa_policy = autoaugment.AutoAugmentPolicy(auto_augment_policy)
-    trans.append(autoaugment.AutoAugment(policy=aa_policy))
+    trans.append(autoaugment.AutoAugment(policy=aa_policy, interpolation=interpolation))
The change of interpolation here is non-BC, but I consider it a bug fix rather than the removal of a previous feature: in the previous recipe there was a mismatch between the interpolation used for resizing and the one used by the AA methods.
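To make the fix concrete, a minimal sketch of the new behaviour, with an illustrative policy and interpolation value:

```
from torchvision.transforms import InterpolationMode, autoaugment

# Previously AutoAugment fell back to its own default (nearest) interpolation even when the
# resize/crop transforms used another mode; now one shared mode is forwarded everywhere.
interpolation = InterpolationMode.BILINEAR  # in the script this comes from --interpolation
aa = autoaugment.AutoAugment(policy=autoaugment.AutoAugmentPolicy("imagenet"), interpolation=interpolation)
```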
@@ -40,16 +38,19 @@ def train_one_epoch(
     loss.backward()
     optimizer.step()

+    if model_ema and i % args.model_ema_steps == 0:
+        model_ema.update_parameters(model)
Moving the EMA updates to a per-iteration level rather than per epoch.
model_ema.update_parameters(model)
if epoch < args.lr_warmup_epochs:
    # Reset ema buffer to keep copying weights during warmup period
    model_ema.n_averaged.fill_(0)
Always copy the weights during warmup.
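For context, the EMA helper in the references wraps torch.optim.swa_utils.AveragedModel, which copies the source parameters outright whenever its n_averaged counter is 0; a rough sketch of that idea (not the exact utils.py code):

```
import torch


class ExponentialMovingAverage(torch.optim.swa_utils.AveragedModel):
    """Rough sketch of an EMA wrapper built on AveragedModel (not the exact utils.py code)."""

    def __init__(self, model, decay, device="cpu"):
        def ema_avg(avg_param, param, num_averaged):
            # Only used once n_averaged > 0; when n_averaged == 0 (e.g. right after
            # n_averaged.fill_(0) during warmup) AveragedModel copies the weights verbatim.
            return decay * avg_param + (1.0 - decay) * param

        super().__init__(model, device, ema_avg)
```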
-resize_size, crop_size = sizes[e_type]
-interpolation = InterpolationMode.BICUBIC
+val_resize_size, val_crop_size, train_crop_size = args.val_resize_size, args.val_crop_size, args.train_crop_size
+interpolation = InterpolationMode(args.interpolation)
Remove hardcoding of resize/crops based on model names. Instead use parameters.
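Roughly speaking, the presets are now driven entirely by CLI values; a simplified sketch of equivalent pipelines, assuming standard ImageNet normalization constants and illustrative argument values:

```
from types import SimpleNamespace

import torchvision.transforms as T
from torchvision.transforms import InterpolationMode

# Illustrative values standing in for the CLI args; the real defaults live in train.py.
args = SimpleNamespace(interpolation="bilinear", train_crop_size=224, val_resize_size=256, val_crop_size=224)

interpolation = InterpolationMode(args.interpolation)
train_transform = T.Compose([
    T.RandomResizedCrop(args.train_crop_size, interpolation=interpolation),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
val_transform = T.Compose([
    T.Resize(args.val_resize_size, interpolation=interpolation),
    T.CenterCrop(args.val_crop_size),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
```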
)
+elif opt_name == "adamw":
+    optimizer = torch.optim.AdamW(parameters, lr=args.lr, weight_decay=args.weight_decay)
Adding AdamW, which is necessary for training ViT.
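Related note: the parameters object handed to the optimizer can be a list of param groups, which is what makes the "custom weight decay for Normalization layers" item possible. A hedged sketch of that idea (not necessarily the exact helper used by the script; the model and hyper-parameters are placeholders):

```
import torch
from torch import nn


def split_norm_params(model, weight_decay, norm_weight_decay):
    # Give normalization layers their own weight decay, everything else the default.
    norm_classes = (nn.modules.batchnorm._BatchNorm, nn.LayerNorm, nn.GroupNorm)
    norm_params, other_params = [], []
    for module in model.modules():
        params = [p for p in module.parameters(recurse=False) if p.requires_grad]
        if isinstance(module, norm_classes):
            norm_params.extend(params)
        else:
            other_params.extend(params)
    return [
        {"params": other_params, "weight_decay": weight_decay},
        {"params": norm_params, "weight_decay": norm_weight_decay},
    ]


model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())  # placeholder model
parameters = split_norm_params(model, weight_decay=2e-5, norm_weight_decay=0.0)
optimizer = torch.optim.AdamW(parameters, lr=3e-3)
```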
adjust = args.world_size * args.batch_size * args.model_ema_steps / args.epochs
alpha = 1.0 - args.model_ema_decay
alpha = min(1.0, alpha * adjust)
model_ema = utils.ExponentialMovingAverage(model_without_ddp, device=device, decay=1.0 - alpha)
Parameterize the EMA decay independently of the number of epochs.
-lr_scheduler.load_state_dict(checkpoint["lr_scheduler"])
+if not args.test_only:
+    optimizer.load_state_dict(checkpoint["optimizer"])
+    lr_scheduler.load_state_dict(checkpoint["lr_scheduler"])
Quality of life improvement to avoid the super annoying error messages if you don't define all optimizer params during validation.
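As a usage note (the model and checkpoint path below are illustrative), this lets a validation-only run resume a checkpoint without re-specifying the training hyper-parameters it was created with:

```
torchrun --nproc_per_node=8 train.py --model resnet50 --test-only --resume /path/to/checkpoint.pth
```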
if model_ema:
    evaluate(model_ema, criterion, data_loader_test, device=device, log_suffix="EMA")
else:
    evaluate(model, criterion, data_loader_test, device=device)
Choose which model to validate depending on the flag provided.
-default=0.9,
-help="decay factor for Exponential Moving Average of model parameters (default: 0.9)",
+default=0.99998,
+help="decay factor for Exponential Moving Average of model parameters (default: 0.99998)",
Reconfiguring the default EMA decay value now that we update per iteration instead of per epoch.
n00b q: Is this default value 0.99998 used most often?
It's a good guess for ImageNet, considering the typical batch size for 8 GPUs. The reason for changing it so drastically is that we switched from one update per epoch to an update every X iterations (X=32, configurable).
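To make the adjustment concrete, a small worked example; the batch size, EMA step count and epoch count below are assumptions for illustration, not the recipe's exact values:

```
# Effective per-update EMA decay after the adjustment shown above (values are assumptions).
world_size, batch_size, model_ema_steps, epochs = 8, 128, 32, 600
model_ema_decay = 0.99998

adjust = world_size * batch_size * model_ema_steps / epochs   # ~54.6
alpha = min(1.0, (1.0 - model_ema_decay) * adjust)            # ~0.00109
print(f"effective per-update decay: {1.0 - alpha:.5f}")       # ~0.99891
```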
-def train_one_epoch(
-    model, criterion, optimizer, data_loader, device, epoch, print_freq, amp=False, model_ema=None, scaler=None
-):
+def train_one_epoch(model, criterion, optimizer, data_loader, device, epoch, args, model_ema=None, scaler=None):
I'm not a huge fan of passing the whole args to a single method (as it's not clear what is actually needed by the function), but I can see you do this just to reduce the number of arguments. In the future we might want to add type hints for all the args used in this script, and also some documentation.
Indeed, args is passed to reduce the number of parameters (merging 4 into 1). The same pattern is already used in other places of the script, for example:

vision/references/classification/train.py, line 106 at e08c9e3:
def load_data(traindir, valdir, args):

Concerning type hints/documentation, I think you are right. For some reason most of the string args don't define it. I've raised a new issue (#4694) to improve it.
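For illustration, a rough sketch of what such type hints could look like for this signature (hypothetical, not part of this PR):

```
import argparse
from typing import Optional

import torch
from torch import nn
from torch.utils.data import DataLoader


def train_one_epoch(
    model: nn.Module,
    criterion: nn.Module,
    optimizer: torch.optim.Optimizer,
    data_loader: DataLoader,
    device: torch.device,
    epoch: int,
    args: argparse.Namespace,
    model_ema: Optional[nn.Module] = None,
    scaler: Optional[torch.cuda.amp.GradScaler] = None,
) -> None:
    ...
```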
Overall LGTM. Would you like to share the training logs for the EMA run? Then I think we are good to go!
@sallysyw Thanks for the review. Still got a few jobs running (will post everything once I finish and possibly write a blogpost), but I'll send you the logs of the best current model.
Hey @datumbox! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py |
Summary:

* Update EMA every X iters.
* Adding AdamW optimizer.
* Adjusting EMA decay scheme.
* Support custom weight decay for Normalization layers.
* Fix indentation bug.
* Change EMA adjustment.
* Quality of life changes to facilitate testing.
* ufmt format.
* Fixing imports.
* Adding FixRes improvement.
* Support EMA in store_model_weights.
* Adding interpolation values.
* Change train_crop_size.
* Add interpolation option.
* Removing hardcoded interpolation and sizes from the scripts.
* Fixing linter.
* Incorporating feedback from code review.

Reviewed By: NicolasHug
Differential Revision: D31916313
fbshipit-source-id: 6136c02dd6d511d0f327b5a72c9056a134abc697
Partially resolves #3995
Add support for the following SOTA ingredients in our classification recipe:
Based on the work of @pdollar and @mannatsingh on pycls. Inspired by their work on "Early Convolutions Help Transformers See Better". Also contains improvements from the work of @TouvronHugo on "Fixing the train-test resolution discrepancy".
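On the FixRes side, the core idea is to train on smaller crops than the resolution used at evaluation, which with the new flags is just a matter of passing different sizes, for example (model and sizes below are illustrative, not the recipe's final values):

```
torchrun --nproc_per_node=8 train.py --model resnet50 \
    --train-crop-size 176 --val-resize-size 232 --val-crop-size 224
```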
cc @datumbox @sallysyw