
[Hotfix] Fix BOFT mixed precision #1925

Open · wants to merge 1 commit into base: main
Conversation

@Edenzzzz commented on Jul 14, 2024

The authors of BOFT seemingly forgot to cast some data types in the bf16/fp16 mixed-precision setting, so @srguo24 and I fixed them during our research project.
See the error below (reproducible when running any model with the transformers Trainer in bf16):
[screenshot: error traceback]
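
Roughly, the failure and the kind of cast that fixes it look like this (a minimal sketch with placeholder names and shapes, not the exact peft code):

```python
# Sketch only: under bf16/fp16 mixed precision the butterfly rotation and scale
# end up in fp32, so combining them with a bf16 weight raises a dtype-mismatch
# RuntimeError unless they are cast first.
import torch

def apply_boft(base_weight: torch.Tensor,
               butterfly_oft_mat: torch.Tensor,
               boft_s: torch.Tensor) -> torch.Tensor:
    # Cast the adapter tensors to the base weight's dtype before the matmul,
    # mirroring the kind of cast this PR adds.
    butterfly_oft_mat = butterfly_oft_mat.to(base_weight.dtype)
    boft_s = boft_s.to(base_weight.dtype)
    rotated = butterfly_oft_mat @ base_weight
    return rotated * boft_s
```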

@BenjaminBossan (Member) left a comment

Thanks a lot for providing this fix for BOFT. Do you have a small example that produces the error you showed? Ideally, we can turn this into a unit test for this bug.
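
Not taken from the PR, but a minimal reproduction could look roughly like this (assuming peft's BOFTConfig and a toy linear model; shapes and arguments are placeholders):

```python
import torch
from torch import nn
from peft import BOFTConfig, get_peft_model

# Toy model with a single linear layer to wrap with a BOFT adapter.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(64, 64)

    def forward(self, x):
        return self.proj(x)

base = Tiny().to(torch.bfloat16)
config = BOFTConfig(target_modules=["proj"], boft_block_size=4)
model = get_peft_model(base, config)

x = torch.randn(2, 64, dtype=torch.bfloat16)
# Without the casts added in this PR, the forward pass is expected to raise a
# dtype-mismatch RuntimeError inside the BOFT layer.
out = model(x)
out.sum().backward()
```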

@@ -77,9 +78,6 @@ def get_fbd_cuda():
if _FBD_CUDA is not None:
return _FBD_CUDA

# This import initializes cuda context and should thus be local, see issue 1877
from torch.utils.cpp_extension import load

@BenjaminBossan (Member):

Could you please undo this change? The import should be local. Maybe merging with/rebasing on the latest main is sufficient.
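
For reference, the local-import pattern being referred to looks roughly like this (a sketch with assumed extension source file names, not the exact peft code):

```python
_FBD_CUDA = None

def get_fbd_cuda():
    global _FBD_CUDA
    if _FBD_CUDA is not None:
        return _FBD_CUDA
    # Keeping the import local means merely importing this module does not
    # initialize the CUDA context (see issue 1877).
    from torch.utils.cpp_extension import load
    # The source file names below are placeholders for the real CUDA extension sources.
    _FBD_CUDA = load(name="fbd_cuda", sources=["fbd_cuda.cpp", "fbd_cuda_kernel.cu"])
    return _FBD_CUDA
```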

boft_rotation = butterfly_oft_mat @ boft_rotation
boft_scale = boft_s * boft_scale

@BenjaminBossan (Member):

Remove.
