What happens when EMAHook and GradientCumulativeOptimizerHook are both used? #1509
-
I am trying to reproduce the results of ConvNeXt-Tiny on my GPU server. Since I only have a small number of GPUs, I used gradient accumulation, expecting it to be equivalent to a larger batch size since there are no batch norm layers in ConvNeXt. However, I couldn't reproduce the results (I got around 81.6%~81.7%). After looking at the config file, I realized that since GradientCumulativeOptimizerHook delays the optimizer update for 8 iterations, EMAHook is still called 8 times per parameter update (which would be similar to momentum=8e-4). Is this concern valid? If it is, is there a possible fix? Snippet of my config:
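To make the arithmetic behind the concern concrete, here is a minimal standalone sketch (not the config itself; the momentum value 1e-4 is only an assumed example): applying the EMA update `ema = (1 - m) * ema + m * param` 8 times to a frozen parameter is, for small `m`, nearly identical to a single update with momentum `8 * m`.

```python
# Minimal sketch of the concern (illustrative values, not the real config).
# EMA update rule: ema = (1 - m) * ema + m * param
m = 1e-4          # assumed per-iteration EMA momentum
k = 8             # cumulative_iters: the parameter is frozen for 8 iterations

ema, param = 0.0, 1.0

# EMAHook fires every iteration, but param only changes every k iterations,
# so the same param value is blended into the EMA k times in a row.
for _ in range(k):
    ema = (1 - m) * ema + m * param
print(ema)        # ~7.997e-4, i.e. almost exactly one update with momentum k*m

# Single update with the effective momentum k * m, for comparison.
print(k * m * param)   # 8e-4
```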
Replies: 1 comment 1 reply
-
I am very sorry, but we have not done the relevant experiments. I don't think the biggest problem is the EMA: generally speaking, EMA only gives a gain of about 0.2%, and more frequent EMA updates should not reduce the effectiveness of the model.
Maybe you can try `--auto-scale` without `GradientCumulativeOptimizerHook`. Our result was obtained with only 16 GPUs, which is different from the 32 GPUs in the official paper.
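If you do want to keep gradient accumulation, one possible workaround (a sketch we have not verified experimentally, with illustrative values taken from your description) is to keep the EMA update in step with the effective optimizer step, either via EMAHook's `interval` argument or by dividing the momentum by the number of accumulated iterations:

```python
# Sketch of a possible workaround (not verified experimentally).
# cumulative_iters=8 and momentum=1e-4 are illustrative, from the question above.
cumulative_iters = 8

optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=cumulative_iters,
)

custom_hooks = [
    # Option A: only update the EMA once per effective optimizer step.
    dict(type='EMAHook', momentum=1e-4, interval=cumulative_iters),
    # Option B (alternative): keep interval=1 but shrink the momentum so the
    # effective momentum per optimizer step stays roughly the same.
    # dict(type='EMAHook', momentum=1e-4 / cumulative_iters, interval=1),
]
```

Both options target the same effective momentum per parameter update; Option A is closer to how the run would behave without accumulation, since the EMA then only ever sees parameter values that have actually changed.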