Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[0009] A proposal for DXIL Function Scalarization #62

Merged
merged 5 commits into from
Sep 11, 2024

Conversation

farzonl
Copy link
Member

@farzonl farzonl commented Sep 5, 2024

This covers the motivations, background of how Scalarization is working in DXC
and the approach to tasking we should take.
This is also and updated proposal to what was decided in team meeting

It does not cover Data scalarization.

This covers the motivations, background of how Scalarization is working in DXC
 and the approach to tasking we should take.
This is also and updated proposal to what was decided in team meeting
Copy link
Collaborator

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments mostly calling out typos and bits I had trouble following.

proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved

`DXILOpLowering` is also the last place for a functional reason. The scalarizer
pass only operates on llvm intrinsics that are `TriviallyVectorizable`. Further
it only converts the scalarized llvm intrinsics meaning there is no way for it
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it only converts the scalarized llvm intrinsics meaning there is no way for it
it only converts to(?) scalarized llvm intrinsics meaning there is no way for it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no the is correct here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further it only converts the scalarized llvm intrinsics

If this is correct, then I'm afraid I'm not sure I understand what it means. I would have thought that the scalarizer converts the vectorized intrinics to scalar ones. So I guess I've got a gap in my understanding here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when i say llvm intrinsics i mean intrinsics defined in intrinsics.td. So im trying to make a distinction between the intrinsics that are exposed to all backends in intrinsics.td and the direct x intrinsics. The distinction being that if we want this to work with our backend we need a way of exposing the direct x intrinsics to the scalarizer pass.

proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
proposals/NNNN-DXIL-Scalarization.md Outdated Show resolved Hide resolved
makes the most sense to use the Scalarizer pass.
- [Scalarizer.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/Scalarizer.cpp)

The `scalarizer` pass with the `-scalarize-load-store` and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the plan for static global variables with vector or vector array type and groupshared variables with vector array type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are handled by the same scalarizer pass with the load store flag. Here is a groushared example

groupshared float3 sharedData[2]; 
export void fn2() { 
    sharedData[0] = float3(1.0f, 2.0f, 3.0f); 
    sharedData[1] = float3(2.0f, 4.0f, 6.0f); 
} 

generates the following IR

@"sharedData" = local_unnamed_addr addrspace(3) global [2 x <3 x float>] zeroinitializer, align 16 
define void @fn2 () local_unnamed_addr { 
    store <3 x float> <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>, ptr addrspace(3) @"sharedData", align 16 
    store <3 x float> <float 2.000000e+00, float 4.000000e+00, float 6.000000e+00>, ptr addrspace(3)   getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), align 16 
    ret void
 }
opt.exe -S -passes=scalarizer -scalarize-load-store  scalarize_store.ll 
@"sharedData" = local_unnamed_addr addrspace(3) global [2 x <3 x float>] zeroinitializer, align 16 
define void @"?fn2@@YAXXZ"() local_unnamed_addr { 
store float 1.000000e+00, ptr addrspace(3) @"sharedData", align 16 
store float 2.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) @"sharedData", i32 1), align 4 
store float 3.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) @"sharedData", i32 2), align 8 
store float 2.000000e+00, ptr addrspace(3) getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), align 16 
store float 4.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), i32 1), align 4 
store float 6.000000e+00, ptr addrspace(3) getelementptr (float, ptr addrspace(3) getelementptr inbounds (i8, ptr addrspace(3) @"sharedData", i32 16), i32 2), align 8 
ret void 
} 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of @"sharedData" is still [2 x <3 x float>] after opt.
We need it to be [2 x [3 x float]] for groupshared.

For static global, we could split it to 3 [2 x float] globals if possible.

Copy link
Member Author

@farzonl farzonl Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous proposal covered global scope, but this one doesn't. i'll make that more explicit. This won't change the layout of global scope just how we access that data. The last piece that DXC does that this pass doesn't is to encode sharedData as a 6 element 1d array. All the store operations are correct.

!20 = !DIGlobalVariable(name: "sharedData", linkageName: "\01?sharedData@@3PAV?$vector@M$02@@A.v.1dim", scope: !0, file: !1, line: 1, type: !21, isLocal: false, isDefinition: true, variable: [6 x float] addrspace(3)* @"\01?sharedData@@3PAV?$vector@M$02@@A.v.1dim")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we'll have module pass to cover global variables?
Flat multiple dim array into 1d array should be a separate pass, not part of scalarization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will have to be another pass to handle data layout like this. Thats already something I'm considering because of memory intrinsics.

* Author(s): [Farzon Lotfi](https://github.com/farzonl)
* Sponsor: [Farzon Lotfi](https://github.com/farzonl)
* Status: **Under Consideration**
* Impacted Projects: Clang
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM here?

Copy link
Member Author

@farzonl farzonl Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our past proposals in HLSL specs we have used Clang to indicate the upstream LLVM project and DXC to indicate DXC, and Clang and DXC to indicate both. I'm keeping that same distinction, but open to changing it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably just remove the Impacted Projects line from the template in this repo. Nothing goes into this repo that isn't targeting LLVM/Clang.

@farzonl farzonl changed the title A proposal for DXIL Scalarization A proposal for DXIL Function Scalarization Sep 10, 2024

`DXILOpLowering` is also the last place for a functional reason. The scalarizer
pass only operates on llvm intrinsics that are `TriviallyVectorizable`. Further
it only converts the scalarized llvm intrinsics meaning there is no way for it
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further it only converts the scalarized llvm intrinsics

If this is correct, then I'm afraid I'm not sure I understand what it means. I would have thought that the scalarizer converts the vectorized intrinics to scalar ones. So I guess I've got a gap in my understanding here.

@farzonl farzonl merged commit 9fea720 into llvm:main Sep 11, 2024
@farzonl farzonl deleted the DXIL-Scalarization-Proposal branch September 22, 2024 19:30
@farzonl farzonl changed the title A proposal for DXIL Function Scalarization [0009] A proposal for DXIL Function Scalarization Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants