This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Conversation

sf-wind commented on Mar 4, 2023

Summary:
Currently the EMA computation runs in the after-step hook, which sits on the critical path where no other work is available and therefore increases the training iteration time. This diff moves the EMA computation to after the backward pass but before the optimizer step. That way, most of the EMA computation time on the CPU can be hidden, because the CPU is otherwise idle waiting for the GPU to finish the backward pass. The change can hide the EMA CPU time entirely: it reduces the EMA time from 20 ms to 4 ms, where the 4 ms is GPU time.

However, with this change the EMA reads parameter values from the previous iteration (since it runs before the optimizer step). Since training runs for many epochs, a one-iteration lag should not be significant.

Reviewed By: tglik

Differential Revision: D43527552
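
The new hook ordering amounts to the following training-loop sketch. This is a minimal illustration of the placement described above, not the actual d2go EMA hook; the model, data, and decay value are made-up stand-ins.

```python
import copy
import torch

# Minimal sketch of the hook ordering described in this PR; names and
# shapes are illustrative only.
model = torch.nn.Linear(128, 128)
ema_model = copy.deepcopy(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
decay = 0.999

def update_ema():
    # EMA update; with the new placement it sees the parameters from the
    # previous optimizer step (the one-iteration lag mentioned above).
    with torch.no_grad():
        for p, ema_p in zip(model.parameters(), ema_model.parameters()):
            ema_p.lerp_(p, 1.0 - decay)

for _ in range(10):
    data, target = torch.randn(32, 128), torch.randn(32, 128)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    loss.backward()
    update_ema()      # new placement: in GPU training, the CPU issues this EMA
                      # work while the backward kernels are still running
    optimizer.step()  # old placement ran the EMA in an after-step hook here,
                      # on the critical path
```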

sf-wind and others added 2 commits March 4, 2023 14:47
Summary: Currently the EMA implementation does the multiplication first and then the addition, which requires two round trips to HBM. With the lerp operator, a single kernel does both. This change uses lerp to compute the EMA instead, reducing the GPU EMA computation time by 40% (see the sketch after the commit list below).

Differential Revision: https://www.internalfb.com/diff/D43525938?entry_point=27

fbshipit-source-id: cc8389a5d93f52bfa472b1533ea52bb8c19834cd
Summary: Moves the EMA computation to after the backward pass but before the optimizer step, as described in the pull request description above.

Reviewed By: tglik

Differential Revision: D43527552

fbshipit-source-id: 4eea88a935befad3cf6f8f20e6198f6b3a3169b6
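
The lerp-based update from the first commit amounts to the following before/after comparison. This is an illustrative sketch (tensor names and shapes are made up), not the exact d2go code.

```python
import torch

decay = 0.999
param = torch.randn(1024, 1024)
ema_param = torch.randn(1024, 1024)

# Old form: multiply then add -- two elementwise kernels, two HBM round trips.
ema_param.mul_(decay).add_(param, alpha=1.0 - decay)

# New form: a single lerp kernel computes the algebraically identical update
#   ema_param = ema_param + (1 - decay) * (param - ema_param)
# in one pass over memory.
ema_param.lerp_(param, 1.0 - decay)
```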
facebook-github-bot added the CLA Signed and fb-exported labels on Mar 4, 2023
facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D43527552

facebook-github-bot (Contributor) commented:

This pull request has been merged in a7dc757.

facebook-github-bot pushed a commit to facebookresearch/detectron2 that referenced this pull request Mar 5, 2023
Summary:
X-link: facebookresearch/d2go#494

Moves the EMA computation to after the backward pass but before the optimizer step, as described in the pull request description above.

Reviewed By: tglik

Differential Revision: D43527552

fbshipit-source-id: 1faa9d910b20cae0fc77da541bc0ad176bce18a8