-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[0009] A proposal for DXIL Function Scalarization #62
Conversation
This covers the motivations, background of how Scalarization is working in DXC and the approach to tasking we should take. This is also and updated proposal to what was decided in team meeting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments mostly calling out typos and bits I had trouble following.
proposals/NNNN-DXIL-Scalarization.md
Outdated
|
||
`DXILOpLowering` is also the last place for a functional reason. The scalarizer | ||
pass only operates on llvm intrinsics that are `TriviallyVectorizable`. Further | ||
it only converts the scalarized llvm intrinsics meaning there is no way for it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it only converts the scalarized llvm intrinsics meaning there is no way for it | |
it only converts to(?) scalarized llvm intrinsics meaning there is no way for it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no the is correct here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further it only converts the scalarized llvm intrinsics
If this is correct, then I'm afraid I'm not sure I understand what it means. I would have thought that the scalarizer converts the vectorized intrinics to scalar ones. So I guess I've got a gap in my understanding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when i say llvm intrinsics i mean intrinsics defined in intrinsics.td. So im trying to make a distinction between the intrinsics that are exposed to all backends in intrinsics.td and the direct x intrinsics. The distinction being that if we want this to work with our backend we need a way of exposing the direct x intrinsics to the scalarizer pass.
proposals/NNNN-DXIL-Scalarization.md
Outdated
makes the most sense to use the Scalarizer pass. | ||
- [Scalarizer.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/Scalarizer.cpp) | ||
|
||
The `scalarizer` pass with the `-scalarize-load-store` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan for static global variables with vector or vector array type and groupshared variables with vector array type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are handled by the same scalarizer pass with the load store flag. Here is a groushared example
groupshared float3 sharedData[2];
export void fn2() {
sharedData[0] = float3(1.0f, 2.0f, 3.0f);
sharedData[1] = float3(2.0f, 4.0f, 6.0f);
}
generates the following IR
@"sharedData" = local_unnamed_addr addrspace(3) global [2 x <3 x float>] zeroinitializer, align 16
define void @fn2 () local_unnamed_addr {
store <3 x float> <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>, ptr addrspace(3) @"sharedData", align 16
store <3 x float> <float 2.000000e+00, float 4.000000e+00, float 6.000000e+00>, ptr addrspace(3) getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), align 16
ret void
}
opt.exe -S -passes=scalarizer -scalarize-load-store scalarize_store.ll
@"sharedData" = local_unnamed_addr addrspace(3) global [2 x <3 x float>] zeroinitializer, align 16
define void @"?fn2@@YAXXZ"() local_unnamed_addr {
store float 1.000000e+00, ptr addrspace(3) @"sharedData", align 16
store float 2.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) @"sharedData", i32 1), align 4
store float 3.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) @"sharedData", i32 2), align 8
store float 2.000000e+00, ptr addrspace(3) getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), align 16
store float 4.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), i32 1), align 4
store float 6.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), i32 2), align 8
ret void
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of @"sharedData" is still [2 x <3 x float>] after opt.
We need it to be [2 x [3 x float]] for groupshared.
For static global, we could split it to 3 [2 x float] globals if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous proposal covered global scope, but this one doesn't. i'll make that more explicit. This won't change the layout of global scope just how we access that data. The last piece that DXC does that this pass doesn't is to encode sharedData
as a 6 element 1d array. All the store operations are correct.
!20 = !DIGlobalVariable(name: "sharedData", linkageName: "\01?sharedData@@3PAV?$vector@M$02@@A.v.1dim", scope: !0, file: !1, line: 1, type: !21, isLocal: false, isDefinition: true, variable: [6 x float] addrspace(3)* @"\01?sharedData@@3PAV?$vector@M$02@@A.v.1dim")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we'll have module pass to cover global variables?
Flat multiple dim array into 1d array should be a separate pass, not part of scalarization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will have to be another pass to handle data layout like this. Thats already something I'm considering because of memory intrinsics.
proposals/NNNN-DXIL-Scalarization.md
Outdated
* Author(s): [Farzon Lotfi](https://github.com/farzonl) | ||
* Sponsor: [Farzon Lotfi](https://github.com/farzonl) | ||
* Status: **Under Consideration** | ||
* Impacted Projects: Clang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in our past proposals in HLSL specs we have used Clang to indicate the upstream LLVM project and DXC to indicate DXC, and Clang and DXC to indicate both. I'm keeping that same distinction, but open to changing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably just remove the Impacted Projects
line from the template in this repo. Nothing goes into this repo that isn't targeting LLVM/Clang.
48aff81
to
3169aef
Compare
proposals/NNNN-DXIL-Scalarization.md
Outdated
|
||
`DXILOpLowering` is also the last place for a functional reason. The scalarizer | ||
pass only operates on llvm intrinsics that are `TriviallyVectorizable`. Further | ||
it only converts the scalarized llvm intrinsics meaning there is no way for it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further it only converts the scalarized llvm intrinsics
If this is correct, then I'm afraid I'm not sure I understand what it means. I would have thought that the scalarizer converts the vectorized intrinics to scalar ones. So I guess I've got a gap in my understanding here.
a9e708e
to
4112bcc
Compare
This covers the motivations, background of how Scalarization is working in DXC
and the approach to tasking we should take.
This is also and updated proposal to what was decided in team meeting
It does not cover Data scalarization.