-
Notifications
You must be signed in to change notification settings - Fork 794
[ESIMD] Fix implementations of block_load(usm, ...) and block_load(acc) #11797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ESIMD] Fix implementations of block_load(usm, ...) and block_load(acc) #11797
Conversation
…c,...) 1) Fix the big mess in E2E test for block_load(). Test did not really check the mask variant. It also used wrong alignments. 2) Fix the comments for USM and ACC block_load implementations. 3) Minor optimization for ACC block_load functions that do not accept the byte_offset operand. We can assume align16 for them. Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
| static_assert(!PropertyListT::template has_property<cache_hint_L3_key>(), | ||
| "L3 cache hint is reserved. The old/experimental L3 LSC cache " | ||
| "hint is cache_level::L2 now."); | ||
| properties Props{cache_hint_L1<L1Hint>, cache_hint_L2<L2Hint>, alignment<16>}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we want to ignore the properties-given alignment in this and the other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buffers/accessors should reference aligned memory on device. If byte_offset is zero we load from aligned device-buffer.
If user says the alignment is 4 it is unnecessarily pessimistic, which will cause less efficient code-gen.
If alignment is more than 16, e.g. 256, it will not give any extra benefit to code-gen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
silently ignoring it seems somewhat strange to me, could we static assert if it is specified or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having something like:
static_assert(!IsUserAlignedmentSpecified || UserAlignment >= 16, "Alignment is too pessimistic, specify 16 or more for more efficient code-gen");seems too ... (annoying?).
Compiler reserves the right to optimize the code. This ignoring of smaller alignment is the optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My worry here is that we're silently ignoring explicit user specification that a user might expect to applied. We already silently ignore any other properties in the properties vector not related to ESIMD and just warn people in a comment but that seems reasonable to me since it's a totally different thing, if we have some ESIMD APIs that honor alignment and some that silently ignore it, that seems like it might be confusing to the user. If the alignment property spec was "aligned by at least X bytes" that's one thing, but I think the spec is exactly X bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing alignment means "the address is aligned at least to "N"-bytes", but it may NOT be aligned at less than N-bytes (only N-bytes aligned guaranteed).
It is same as if you call USM aligned_alloc(align=16, size); which means the requested memory must be aligned at at least 16-bytes, but the returned address may be 256-bytes aligned too (accidentally or intentionally, depending on implementation).
In this case user's call of block_load<float, N>(acc, {alignement<4>}); means that user guarantees only 4-byte alignment. If by some heuristics we can guarantee better alignment, then why not use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that convinces me, thanks
Fix the big mess in E2E test for block_load(). Test did not really
check the mask variant. It also used wrong alignments.
Fix the comments for USM and ACC block_load implementations.
Minor optimization for ACC block_load functions that do not accept
the byte_offset operand. We can assume align16 for them.