
[NVPTX] Custom lower integer<->bf16 conversions for sm_80 #74827

Closed
wants to merge 1 commit

Conversation

@d0k (Member) commented Dec 8, 2023

sm_80 only has f32->bf16 conversions; the remaining integer conversions arrived with sm_90. Use a two-step conversion for sm_80.

There doesn't seem to be a way to express this promotion directly within the legalization framework, so fall back on Custom lowering.
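A minimal sketch of what such a two-step Custom lowering can look like, with hypothetical helper names that the Custom action would dispatch to; this is illustrative, not the exact code from this PR:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// int -> bf16: go through f32, which sm_80 can convert from natively.
// Helper name is hypothetical.
static SDValue lowerIntToBF16(SDValue Op, SelectionDAG &DAG) {
  assert(Op.getValueType() == MVT::bf16 && "expected an int->bf16 conversion");
  SDLoc DL(Op);
  // Step 1: SINT_TO_FP/UINT_TO_FP into f32.
  SDValue F32 = DAG.getNode(Op.getOpcode(), DL, MVT::f32, Op.getOperand(0));
  // Step 2: round f32 down to bf16, the one bf16 conversion sm_80 does have.
  return DAG.getNode(ISD::FP_ROUND, DL, MVT::bf16, F32,
                     DAG.getIntPtrConstant(0, DL, /*isTarget=*/true));
}

// bf16 -> int: extend to f32 first, then use the native f32 -> int path.
static SDValue lowerBF16ToInt(SDValue Op, SelectionDAG &DAG) {
  SDLoc DL(Op);
  SDValue F32 = DAG.getNode(ISD::FP_EXTEND, DL, MVT::f32, Op.getOperand(0));
  return DAG.getNode(Op.getOpcode(), DL, Op.getValueType(), F32);
}
```

With the operations registered as Custom, the legalizer hands these nodes to the target's LowerOperation hook instead of trying to legalize them itself.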

@d0k requested a review from Artem-B on December 8, 2023 at 11:23
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 11, 2023
We tried this before with an intrinsic, but that breaks vectorization. Relying
on native LLVM types doesn't, while delivering the same code improvements. The
downside is that LLVM now knows that it's a bfloat instead of an i16 and will
optimize based on that. While making this change I had to patch a bunch of
holes in the NVPTX LLVM backend; there might be more.

Depends on llvm/llvm-project#74827

PiperOrigin-RevId: 589102456
@Artem-B (Member) left a comment


LGTM in general, but I'm curious whether FP_TO_BF16 and BF16_TO_FP would produce better/worse/same SASS for these conversions.
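For reference, the alternative raised here would route the float step through the semi-softened bf16 helper nodes, which carry the bf16 payload in an i16. A hedged sketch, with a hypothetical helper name; whether ptxas turns either form into better, worse, or the same SASS would have to be measured:

```cpp
// Alternative int -> bf16 lowering via ISD::FP_TO_BF16; illustrative only.
static SDValue lowerIntToBF16ViaHelperNode(SDValue Op, SelectionDAG &DAG) {
  SDLoc DL(Op);
  // int -> f32 first, as in the two-step lowering above.
  SDValue F32 = DAG.getNode(Op.getOpcode(), DL, MVT::f32, Op.getOperand(0));
  // FP_TO_BF16 yields the bf16 bits in an i16, so bitcast back to bf16.
  SDValue Bits = DAG.getNode(ISD::FP_TO_BF16, DL, MVT::i16, F32);
  return DAG.getNode(ISD::BITCAST, DL, MVT::bf16, Bits);
}
```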

@@ -766,6 +766,12 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
     AddPromotedToType(Op, MVT::bf16, MVT::f32);
   }
 
+  for (MVT VT : {MVT::i1, MVT::i16, MVT::i32, MVT::i64}) {
+    setOperationAction(
+        {ISD::SINT_TO_FP, ISD::UINT_TO_FP, ISD::FP_TO_SINT, ISD::FP_TO_UINT},
@Artem-B (Member):

Should we make it conditional on SM/PTX here, instead of checking in the custom lowering?

@d0k (Author):
Makes sense, done.
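A sketch of the conditional registration being discussed, using subtarget queries that exist in the NVPTX backend; the exact cutoffs shown (sm_90 with PTX 7.8 for the direct integer<->bf16 conversions) are stated here as an assumption:

```cpp
// Register the Custom lowering only when the target lacks direct
// integer<->bf16 conversions (assumed cutoff: sm_90 with PTX 7.8).
if (STI.getSmVersion() < 90 || STI.getPTXVersion() < 78) {
  for (MVT VT : {MVT::i1, MVT::i16, MVT::i32, MVT::i64})
    setOperationAction(
        {ISD::SINT_TO_FP, ISD::UINT_TO_FP, ISD::FP_TO_SINT, ISD::FP_TO_UINT},
        VT, Custom);
}
```

Gating it at registration time keeps newer targets on the Legal path entirely, so the custom hook never has to re-check the subtarget.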

@Artem-B (Member):

It would be useful to update the review with the latest changes. I was puzzled for a bit to see this pull request closed with this item marked as done but unchanged; then I went to check the actual commit and found the expected changes present there.

Two further review threads on llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp were marked resolved.
@d0k (Author) commented Dec 11, 2023

Thanks for the review :)

d0k added a commit that referenced this pull request Dec 11, 2023 (same message as the PR description above)
@d0k closed this Dec 11, 2023
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 12, 2023 (same message as above; PiperOrigin-RevId: 590118269)

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 12, 2023 (same message as above; PiperOrigin-RevId: 590118269)