@sebpop (Contributor) commented Sep 3, 2025

Add detailed comments explaining each function's memory access patterns and why they should/shouldn't be unroll-and-jammed:

  • fore_aft_*: Dependencies between fore block and aft block

  • fore_sub_*: Dependencies between fore block and sub block

  • sub_aft_*: Dependencies between sub block and aft block

  • sub_sub_*: Dependencies within sub block

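  • *_less: Backward dependency (i-1) - safe for fore/aft, fore/sub, sub/aft; unsafe for sub/sub, because jamming interleaves the inner-loop iterations of adjacent outer iterations and reorders their accesses to the same location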

  • *_eq: Same-iteration dependency (i+0) - safe, because unroll-and-jam preserves the execution order of accesses within a single outer iteration

  • *_more: Forward dependency (i+1) - unsafe, because jamming reorders writes to the same location across unrolled iterations (write-after-write conflicts); the sub/sub case is unsafe too, through the same jamming conflicts as *_less
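The less/eq/more rules can be checked with a small simulation. This is a sketch under simplifying assumptions, not how LoopUnrollAndJam decides legality (the pass relies on dependence analysis): the kernels are hypothetical store-only loops in which one block writes A[i] and the other writes A[i + d], with d = -1, 0, +1 standing for *_less, *_eq and *_more; the outer loop is unrolled by 2; and the helper names `exec_order`, `write_history` and `is_safe` are made up for illustration. A variant counts as safe when every location receives its writes in the same order before and after jamming.

```python
from typing import Dict, List, Optional, Tuple

# One executed block instance: (block name, outer index i, inner index j).
# j is None for the fore and aft blocks, which sit outside the inner loop.
Inst = Tuple[str, int, Optional[int]]


def exec_order(n: int, m: int, jammed: bool) -> List[Inst]:
    """Order in which block instances run for the loop nest

        for i in 0..n-1:  fore(i); for j in 0..m-1: sub(i, j); aft(i)

    either as written, or unrolled by 2 in i with the two inner loops jammed.
    Assumes n is even so the jammed version needs no remainder loop.
    """
    step = 2 if jammed else 1
    assert n % step == 0, "sketch assumes an even outer trip count"
    order: List[Inst] = []
    for base in range(0, n, step):
        its = range(base, base + step)
        order += [("fore", i, None) for i in its]
        for j in range(m):
            order += [("sub", i, j) for i in its]
        order += [("aft", i, None) for i in its]
    return order


def write_history(first: str, second: str, d: int, n: int, m: int,
                  jammed: bool) -> Dict[int, List[Tuple[Inst, int]]]:
    """Per-location list of (writer instance, store number) in execution order.

    Hypothetical kernel: block `first` stores to A[i] (store 0) and block
    `second` stores to A[i + d] (store 1). For sub/sub both stores live in
    the same sub-block instance, store 0 first.
    """
    hist: Dict[int, List[Tuple[Inst, int]]] = {}
    for inst in exec_order(n, m, jammed):
        block, i, _ = inst
        if block == first:
            hist.setdefault(i, []).append((inst, 0))
        if block == second:
            hist.setdefault(i + d, []).append((inst, 1))
    return hist


def is_safe(first: str, second: str, d: int, n: int = 4, m: int = 3) -> bool:
    """Treat jamming as safe iff every location sees its writes in the same order."""
    return (write_history(first, second, d, n, m, jammed=False)
            == write_history(first, second, d, n, m, jammed=True))


if __name__ == "__main__":
    for first, second in [("fore", "aft"), ("fore", "sub"),
                          ("sub", "aft"), ("sub", "sub")]:
        verdicts = {name: is_safe(first, second, d)
                    for name, d in [("less", -1), ("eq", 0), ("more", 1)]}
        print(f"{first}_{second}:", verdicts)
```

After jamming, fore(i+1) runs before sub(i, *) and aft(i), sub(i+1, *) runs before aft(i), and sub(i+1, j) runs before sub(i, j+1). Run as a script, the sketch reports less: True, eq: True, more: False for fore_aft, fore_sub and sub_aft, and less: False, eq: True, more: False for sub_sub.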

@sjoerdmeijer (Collaborator) left a comment

Thanks, explaining these test-cases looks very useful to me, LGTM.

@Meinersbur (Member) left a comment

LGTM

Since you already reverse-engineered the functions, pseudocode for the loop nests would have been nice, but adding an analysis is already very helpful.
