
@sebpop (Contributor) commented Sep 3, 2025

Add detailed comments explaining why each function should/shouldn't be unroll-and-jammed based on memory access patterns and dependencies.

Fix loop bounds so that all array accesses stay in bounds:

  • sub_sub_less: j starts from 1 (not 0) to ensure j-1 >= 0
  • sub_sub_less_3d: k starts from 1 (not 0) to ensure k-1 >= 0
  • sub_sub_outer_scalar: j starts from 1 (not 0) to ensure j-1 >= 0
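To make the lower-bound fix concrete, here is a small hypothetical C model. The function name and the A[i][j + offset] access shape are illustrative assumptions; the actual test bodies are LLVM IR and are not quoted in this PR. It counts the inner iterations whose column index j + offset falls outside [0, n) for a given inner-loop start.

```c
#include <assert.h>

/* Hypothetical model, not taken from the test file: counts iterations of
 *   for (i = 0; i < n; ++i) for (j = j_start; j < n; ++j) ... A[i][j + offset] ...
 * whose column index j + offset lies outside [0, n). */
int count_out_of_bounds(int n, int j_start, int offset) {
  assert(n > 0 && j_start >= 0);
  int bad = 0;
  for (int i = 0; i < n; ++i)
    for (int j = j_start; j < n; ++j)
      if (j + offset < 0 || j + offset >= n)
        ++bad;
  return bad;
}
```

With offset = -1 (an A[i][j-1] access), starting j at 0 gives one out-of-bounds access per outer iteration, while starting j at 1 gives none.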

…ds (NFC)

@sjoerdmeijer (Collaborator) left a comment

Looks like a good fix to me.

Comment on lines +61 to +63
; No dependency conflict: A[i+1][j] from iteration (i,j) doesn't conflict with
; any A[i'][j'] from unrolled j iterations since j' values are different and
; i+1 from current doesn't overlap with i' from unrolled iterations.
Member

sum = 0;
for (int i = 0; i < N; ++i)   // N > 0; A needs N+1 rows
  for (int j = 0; j < N; ++j) {
    sum += i * B[j];
    A[i][j] = 1;
    A[i+1][j] = sum;
  }

> unrolled j iterations

The i (outer loop) iterations are unrolled with unroll-and-jam, not the j iterations.
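To illustrate that it is the outer i loop that gets unrolled, here is a hypothetical C sketch (names are made up for illustration) that records the (i, j) visiting order after unroll-and-jam by 2. It assumes N is even, so no remainder loop is needed.

```c
#include <assert.h>

typedef struct { int i, j; } Iter;

/* Fills order[] (capacity >= n*n) with the (i, j) visiting order of an
 * n x n loop nest after unroll-and-jam of the outer i loop by 2, and
 * returns the number of entries written. Assumes n is even. */
int jammed_order(int n, Iter *order) {
  assert(n > 0 && n % 2 == 0);
  int k = 0;
  for (int i = 0; i < n; i += 2)      /* outer loop steps by the unroll factor */
    for (int j = 0; j < n; ++j) {     /* inner loop is not unrolled */
      order[k++] = (Iter){i, j};      /* body copy for outer iteration i */
      order[k++] = (Iter){i + 1, j};  /* body copy for outer iteration i+1 */
    }
  return k;
}
```

For n = 4 the recorded order begins (0,0), (1,0), (0,1), (1,1), ...: the j loop keeps its trip count, and each j step runs the bodies of two consecutive i iterations.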

> since j' values are different and i+1 from current doesn't overlap with i' from unrolled iterations

A[i+1][j] and A[i'][j] with i'==i+1 (the next outer iteration, which ends up in the same loop body after unroll-and-jam) do point to the same memory, so there seems to be a hazard between the iterations. It is just that none of the iterations from [i,j+1] to [i,N-1] and from [i+1,0] to [i+1,j-1] (the iterations that execute between [i,j] and [i',j] in the original loop) access A[i+1][j] or A[i+2][j]. Maybe this is what the comment is saying, but I have difficulty understanding the explanation.
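This argument can be checked by brute force: unroll-and-jam is legal for these stores if every pair of iterations that store to the same element of A keeps its original relative order. The sketch below is a hypothetical C model, not LLVM's legality check. It only models the two stores to A (B is read-only and the scalar sum is ignored), and the column offset dj of the second store is an illustrative parameter, with dj = 0 matching the loop body quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Position of iteration (i, j) in the original order of an n x n nest. */
static int orig_pos(int n, int i, int j) { return i * n + j; }

/* Position of iteration (i, j) after unroll-and-jam of i by 2 (n even):
 * iterations 2k and 2k+1 interleave inside the j loop. */
static int jam_pos(int n, int i, int j) {
  int base = i - i % 2;
  return base * n + 2 * j + i % 2;
}

/* True if iterations (i1, j1) and (i2, j2) store to a common element,
 * where iteration (i, j) stores to A[i][j] and A[i+1][j+dj]. */
static bool conflict(int dj, int i1, int j1, int i2, int j2) {
  int r1[2] = {i1, i1 + 1}, c1[2] = {j1, j1 + dj};
  int r2[2] = {i2, i2 + 1}, c2[2] = {j2, j2 + dj};
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      if (r1[a] == r2[b] && c1[a] == c2[b])
        return true;
  return false;
}

/* True if every pair of conflicting iterations keeps its original
 * relative order after unroll-and-jam by 2. */
bool jam_preserves_dependences(int n, int dj) {
  assert(n > 0 && n % 2 == 0);
  for (int i1 = 0; i1 < n; ++i1)
    for (int j1 = 0; j1 < n; ++j1)
      for (int i2 = 0; i2 < n; ++i2)
        for (int j2 = 0; j2 < n; ++j2) {
          if (!conflict(dj, i1, j1, i2, j2))
            continue;
          bool before_orig = orig_pos(n, i1, j1) < orig_pos(n, i2, j2);
          bool before_jam = jam_pos(n, i1, j1) < jam_pos(n, i2, j2);
          if (before_orig != before_jam)
            return false;
        }
  return true;
}
```

With dj = 0, jam_preserves_dependences(4, 0) returns true. With dj = -1 it returns false: (i, j) and (i+1, j-1) both store to A[i+1][j-1], and the jammed loop runs (i+1, j-1) first.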

Factor-2 unroll-and-jam:

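sum0 = 0;
sum1 = 0;
for (int i = 0; i < N; i += 2)   // assumes N is even (no remainder loop)
  for (int j = 0; j < N; ++j) {
    // body of outer iteration i
    sum0 += i * B[j];
    A[i][j] = 1;
    A[i+1][j] = sum0;

    // body of outer iteration i+1, jammed into the same j loop
    sum1 += (i+1) * B[j];
    A[i+1][j] = 1;
    A[i+2][j] = sum1;
  }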
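One way to sanity-check the factor-2 sketch is to run it against the original nest and compare the final contents of A. This is a hypothetical C harness (MAXN, original, jammed and same_result are names made up for illustration). It assumes sum is reset at the start of each outer iteration: with a single sum carried across all i, splitting it into independent sum0/sum1 accumulators would not give the same values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXN 8  /* hypothetical size limit for this sketch */

/* Original nest; sum is reset per outer iteration (assumption). A needs n+1 rows. */
static void original(int n, const int *B, int A[MAXN + 1][MAXN]) {
  for (int i = 0; i < n; ++i) {
    int sum = 0;
    for (int j = 0; j < n; ++j) {
      sum += i * B[j];
      A[i][j] = 1;
      A[i + 1][j] = sum;
    }
  }
}

/* Unroll-and-jam by 2 of the nest above; assumes n is even. */
static void jammed(int n, const int *B, int A[MAXN + 1][MAXN]) {
  for (int i = 0; i < n; i += 2) {
    int sum0 = 0, sum1 = 0;
    for (int j = 0; j < n; ++j) {
      sum0 += i * B[j];
      A[i][j] = 1;
      A[i + 1][j] = sum0;

      sum1 += (i + 1) * B[j];
      A[i + 1][j] = 1;
      A[i + 2][j] = sum1;
    }
  }
}

/* Runs both versions on zero-initialised arrays and compares the results. */
bool same_result(int n, const int *B) {
  assert(n > 0 && n <= MAXN && n % 2 == 0);
  int A1[MAXN + 1][MAXN], A2[MAXN + 1][MAXN];
  memset(A1, 0, sizeof A1);
  memset(A2, 0, sizeof A2);
  original(n, B, A1);
  jammed(n, B, A2);
  return memcmp(A1, A2, sizeof A1) == 0;
}
```

Comparing final memory only shows equivalence for the inputs tried; the pairwise ordering argument is what makes it hold in general.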
