Add support for FusedAdam to be mathematically equivalent to pytorch/AdamW#10106

Merged
baijumeswani merged 3 commits into master from bmeswani/update_fused_adam on Jan 21, 2022

Conversation

@baijumeswani
Contributor

ORT's FusedAdam is currently mathematically equivalent to transformers/AdamW. Users who want the pytorch/AdamW formulation would see a convergence disparity because of subtle differences between the two implementations.

This pull request lets users select the implementation they want, so that they get the performance gains of FusedAdam together with aligned convergence.
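To make the "subtle differences" concrete, here is a minimal scalar sketch of one optimizer step in each style. It is an illustration only, not ORT's fused kernel: the function names are hypothetical, and the two update rules reflect my reading of the reference `torch.optim.AdamW` and `transformers.AdamW` implementations (where weight decay is applied, and where `eps` enters relative to the bias correction), which the PR description itself does not spell out.

```python
import math


def adamw_step_torch_style(p, g, m, v, step, lr=1e-3, beta1=0.9,
                           beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One scalar step in the style of torch.optim.AdamW (assumed behavior)."""
    # Decoupled weight decay is applied BEFORE the Adam update.
    p = p * (1.0 - lr * weight_decay)
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    bias_correction1 = 1.0 - beta1 ** step
    bias_correction2 = 1.0 - beta2 ** step
    # eps is added after the second moment has been bias-corrected.
    denom = math.sqrt(v) / math.sqrt(bias_correction2) + eps
    p = p - (lr / bias_correction1) * m / denom
    return p, m, v


def adamw_step_transformers_style(p, g, m, v, step, lr=1e-3, beta1=0.9,
                                  beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One scalar step in the style of transformers.AdamW (assumed behavior)."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    bias_correction1 = 1.0 - beta1 ** step
    bias_correction2 = 1.0 - beta2 ** step
    # eps is added to the uncorrected sqrt(v); the bias correction is
    # folded into the step size instead.
    step_size = lr * math.sqrt(bias_correction2) / bias_correction1
    p = p - step_size * m / (math.sqrt(v) + eps)
    # Weight decay is applied AFTER the update, to the updated parameter.
    p = p - lr * weight_decay * p
    return p, m, v


if __name__ == "__main__":
    # Same starting point and gradient for both variants.
    args = dict(p=1.0, g=0.5, m=0.0, v=0.0, step=1, lr=0.1, weight_decay=0.1)
    p_torch, _, _ = adamw_step_torch_style(**args)
    p_hf, _, _ = adamw_step_transformers_style(**args)
    print("torch-style:", p_torch)
    print("transformers-style:", p_hf)
    print("difference:", p_torch - p_hf)
```

With weight decay disabled, the two variants differ only in how `eps` interacts with the bias correction, so the gap is tiny; with weight decay enabled, the ordering of the decay adds a further small per-step gap that can accumulate over training.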

@baijumeswani added the training (issues related to ONNX Runtime training; typically submitted using template) and component:training-frontend labels on Dec 21, 2021
Comment thread orttraining/orttraining/python/training/optim/fused_adam.py Outdated
Comment thread orttraining/orttraining/python/training/optim/fused_adam.py
@baijumeswani merged commit 1416065 into master on Jan 21, 2022
@baijumeswani deleted the bmeswani/update_fused_adam branch on January 21, 2022 21:38
Labels

training issues related to ONNX Runtime training; typically submitted using template

4 participants