
Conversation

@dchigarev (Contributor) commented Oct 14, 2025

As suggested here, this PR adds an optional layout attribute to the LoadGather and StoreScatter ops.

For the load op the attribute describes the layout of the result (e.g. layout_result_0), and for the store op it describes the layout of the vector-to-store operand (e.g. layout_operand_0).

The PR also reworks the xegpu-unroll and wg-to-sg-distribute passes to take the new layout attribute into account and carry it through.

The helper utility function getDistributeLayoutAttr is reworked to return either layout_operand_0/layout_result_0 or layout for load/store ops (depending on which one is set), to preserve backward compatibility. The setDistributeLayoutAttr function now only sets the new layout attribute.
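
For illustration, a minimal sketch of the new attribute in use, assembled from the test forms quoted later in this thread; the value names (%src, %dst, %offsets, %mask) and the 1-D shapes and layout values are illustrative placeholders, not taken verbatim from the patch:

// the permanent layout attribute sits in the inherent-attribute dictionary <{...}>
%v = xegpu.load %src[%offsets], %mask <{layout = #xegpu.layout<lane_layout = [16], lane_data = [1]>}>
    : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16xf16>
xegpu.store %v, %dst[%offsets], %mask <{layout = #xegpu.layout<lane_layout = [16], lane_data = [1]>}>
    : vector<16xf16>, memref<256xf16>, vector<16xindex>, vector<16xi1>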

@dchigarev (Contributor, Author):

Some tests are failing after rebasing on the main branch. Fixing those...

// CHECK-NEXT: %[[T2:.*]] = xegpu.create_tdesc %[[ARG1]], %[[CST]] : memref<256xf16>, vector<16xindex> ->
// CHECK-SAME: !xegpu.tensor_desc<16x16xf16, #xegpu.scatter_tdesc_attr<chunk_size = 16 : i64>, #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>>
// CHECK-NEXT: %{{.*}} = xegpu.load %[[T2]], %[[CST0]] {layout_result_0 = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>}
// CHECK-NEXT: %{{.*}} = xegpu.load %[[T2]], %[[CST0]] <{layout = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>}>
Contributor:

I don't think you need to change the layout here.
It is checking whether the propagation set the temporary layout attribute for the load result correctly.

Contributor:

You may create a separate test that checks how the propagation honors the user's setting. Say, the user sets a different layout, like
layout = #xegpu.layout<lane_layout = [4, 4], lane_data = [1, 2]>
for the store, and expects it to propagate from the store to the load.

Once the user sets it, the propagation should honor the user's setting instead of using its default one.

Note that this xegpu.load variant is to be deprecated. Please focus on the xegpu.load variant that takes a memref as input.
Also, the test should not use chunk_size; we don't really expect users to use the chunked load (see the sketch below).
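
A sketch of the kind of test being suggested, assuming the memref-based load/store form without chunk_size; the shapes and the expected propagated layout are illustrative placeholders:

// the user pins a non-default layout on the store; propagation is then
// expected to carry it back to the load result instead of the default
// layout it would otherwise pick
%v = xegpu.load %src[%offsets], %mask
    : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16xf16>
xegpu.store %v, %dst[%offsets], %mask <{layout = #xegpu.layout<lane_layout = [16], lane_data = [2]>}>
    : vector<16xf16>, memref<256xf16>, vector<16xindex>, vector<16xi1>
// CHECK: xegpu.load {{.*}} {layout_result_0 = #xegpu.layout<lane_layout = [16], lane_data = [2]>}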

@dchigarev (Contributor, Author):

You may create a separate test that checks how the propagation honors the user's setting.

This case doesn't seem to be supported yet:

%3 = xegpu.load %src[%offset], %1 <{chunk_size=8,
  layout = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>
}> : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16x8xf16>
@Jianhui-Li (Contributor) commented Oct 20, 2025:

I would leave these two tests as is.

%offset = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>} dense<0> : vector<256x16xindex>
%mask = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>} dense<1> : vector<256x16xi1>
%load = xegpu.load %src[%offset], %mask {chunk_size = 1, layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>, l1_hint = #xegpu.cache_hint<cached>}
%load = xegpu.load %src[%offset], %mask {chunk_size = 1, layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>, l1_hint = #xegpu.cache_hint<cached>}
Contributor:

I don't think you need to touch these tests.
These tests assume the temporary layout attributes are being assigned, so they are not related to the permanent layout.
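
For context, a minimal sketch of that distinction (the shapes here are illustrative placeholders): the temporary layout lives in the op's discardable attribute dictionary, while the permanent layout added by this PR is an inherent attribute printed inside <{...}>:

// temporary form: discardable attribute assigned by the propagation passes
%a = xegpu.load %src[%offsets], %mask {layout_result_0 = #xegpu.layout<inst_data = [8]>}
    : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16xf16>
// permanent form introduced by this PR: inherent attribute inside <{...}>
%b = xegpu.load %src[%offsets], %mask <{layout = #xegpu.layout<inst_data = [8]>}>
    : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16xf16>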

@dchigarev dchigarev force-pushed the dchigarev/load_layout branch 2 times, most recently from feb4def to b58eb10 Compare October 23, 2025 08:54
…r ops

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
@dchigarev dchigarev force-pushed the dchigarev/load_layout branch from 1ed9416 to 5fd0ba1 Compare October 25, 2025 14:10
Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
/*l2_hint=*/xegpu::CachePolicyAttr{},
/*l3_hint=*/xegpu::CachePolicyAttr{});
/*l3_hint=*/xegpu::CachePolicyAttr{},
/*layout=*/nullptr);
@dchigarev (Contributor, Author):

Filed #163071 to fill the gap.

Comment on lines -288 to +303
// CHECK: xegpu.store %[[VAL]], %[[ARG0]][%[[CST]]], %[[MASK]] <{chunk_size = 1 : i64, l1_hint = #xegpu.cache_hint<cached>}>
// CHECK-SAME: {layout_operand_0 = #xegpu.layout<inst_data = [8]>, layout_operand_2 = #xegpu.layout<inst_data = [8]>,
// CHECK: xegpu.store %[[VAL]], %[[ARG0]][%[[CST]]], %[[MASK]] <{chunk_size = 1 : i64, l1_hint = #xegpu.cache_hint<cached>, layout = #xegpu.layout<inst_data = [8]>}>
// CHECK-SAME: {layout_operand_2 = #xegpu.layout<inst_data = [8]>,
@dchigarev (Contributor, Author):

setDistributeLayoutAttr now always sets the "permanent" layout attribute for store/load ops; that's why the temporary layout becomes "permanent" here. (I suppose that should be okay, since all the other passes can already handle this permanent attribute.)
