[herd] Non temporal memory accesses (AArch64) #492

@maranget maranget commented Jan 4, 2023

This PR handles non-temporal memory accesses. They have a noticeable impact on the memory models, as an address dependency that ends in a non-temporal read does not create order. See the Arm Architecture Reference Manual, section C.3.2.9 "Load/store scalar SIMD and floating-point":

In addition, there is an exception to the usual memory ordering rules. If an address dependency exists between two memory reads, and a load non-temporal pair instruction generated the second read, then in the absence of any other barrier mechanism to achieve order, those memory accesses can be observed in any order by the other observers within the shareability domain of the memory addresses being accessed.
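
As a rough illustration of the exception quoted above, the following Python sketch drops address-dependency edges whose target is a non-temporal read before they contribute to ordering. The `Event` type, the `non_temporal` flag, and `ordering_addr_deps` are hypothetical names used only for this illustration; they are not herd7 identifiers, and the actual change in this PR is made in the cat file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A memory event; all fields are illustrative, not herd7's representation."""
    name: str
    is_read: bool
    non_temporal: bool = False


def ordering_addr_deps(addr_deps):
    """Keep only the address dependencies that still provide ordering.

    An edge (src, dst) is dropped when dst is a non-temporal read,
    mirroring the exception quoted from the Arm ARM.
    """
    return {
        (src, dst)
        for (src, dst) in addr_deps
        if not (dst.is_read and dst.non_temporal)
    }


# Hypothetical events: one load feeding the address of two later loads.
ldr = Event("LDR W1,[X0]", is_read=True)
ldnp = Event("LDNP S0,S1,[X4]", is_read=True, non_temporal=True)
plain = Event("LDR W5,[X4]", is_read=True)

addr = {(ldr, ldnp), (ldr, plain)}
kept = ordering_addr_deps(addr)
```

Here `kept` retains only the edge into the ordinary load; the edge into the `LDNP` no longer orders the two reads.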

@maranget

This PR now includes a single commit that updates the cat file. All other operations on non-temporal accesses have been merged into master (cf. PR #549).

@jalglave

Thanks @maranget. Please could you add a test to illustrate the required change too?
thanks in advance,
Jade

@maranget

maranget commented Apr 18, 2023

Here you are. Please see test L070 and commit 71adffa.

According to the documentation, address dependencies that end
in a non-temporal read do not provide ordering:

  In addition, there is an exception to the usual memory ordering
  rules. If an address dependency exists between two memory reads, and a
  Load Non-temporal Pair instruction generated the second read, then in
  the absence of any other barrier mechanism to achieve order, the
  memory accesses can be observed in any order by the other observers
  within the shareability domain of the memory addresses being
  accessed.

The added tests illustrate the "non-temporal" exception to
address dependencies. More precisely, L070 illustrates
the exception, while L071 is a control.

@murzinv

murzinv commented Jan 11, 2024

Although the PR quotes only "Load/store scalar SIMD and floating-point", the same applies to the ordinary Load/store non-temporal pair [1]:

If an address dependency exists between two memory reads, and a Load Non-temporal Pair instruction generated the second read, then in the absence of any other barrier mechanism to achieve order, the memory accesses can be observed in any order by the other observers within the shareability domain of the memory addresses being accessed

and to SVE as well [2]:

If an address dependency exists between two Read Memory and an SVE non-temporal vector load instruction
generated the second read, then in the absence of any other barrier mechanism to achieve order, the memory
accesses can be observed in any order by the other observers within the shareability domain of the memory
addresses being accessed.

With #749 I can confirm that

$ diyone7 -arch AArch64 -variant neon "DMB.STdWWNePaP Rfe DpAddrdRPNePaN Fre" | tee T
AArch64 A
"DMB.STdWWNePaP Rfe DpAddrdRPNePaN FreNePaNNePa"
Generator=diyone7 (version 7.56+03)
Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
Com=Rf Fr
Orig=DMB.STdWWNePaP Rfe DpAddrdRPNePaN FreNePaNNePa
{
int x[2];

0:X0=x; 0:X2=y;
1:X0=y; 1:X3=x;
}
 P0                | P1                    ;
 MOVI V0.4S,#1     | LDR W1,[X0]           ;
 MOVI V1.4S,#2     | EOR W2,W1,W1          ;
 STP S0,S1,[X0,#0] | ADD X4,X3,W2,SXTW     ;
 DMB ST            | LDNP S0,S1,[X4,#0]    ;
 MOV W1,#1         | ADD V2.4S,V0.4S,V1.4S ;
 STR W1,[X2]       | FMOV W5,S2            ;
exists (1:X1=1 /\ 1:X5=0)

is now allowed:

$ herd7 -variant neon T
Test A Allowed
States 8
1:X1=0; 1:X5=0;
1:X1=0; 1:X5=1;
1:X1=0; 1:X5=2;
1:X1=0; 1:X5=3;
1:X1=1; 1:X5=0;
1:X1=1; 1:X5=1;
1:X1=1; 1:X5=2;
1:X1=1; 1:X5=3;
Ok
Witnesses
Positive: 1 Negative: 7
Condition exists (1:X1=1 /\ 1:X5=0)
Observation A Sometimes 1 7
Time A 0.07
Hash=50d7d966df3b6eff816a143e4a5eb710
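
To see where the eight states come from, here is a small Python sketch that enumerates P1's final registers under a simplifying assumption: each of P1's reads independently sees either the initial value or P0's store. This is an illustration only, not the cat model, and `final_states` is a hypothetical helper name.

```python
from itertools import product


def final_states():
    """Enumerate (1:X1, 1:X5) assuming every read is independent."""
    states = set()
    # LDR reads y: 0 initially, 1 after P0's STR.
    # LDNP reads x[0] (0 or 1) and x[1] (0 or 2) as separate elements.
    for y, x0, x1 in product((0, 1), (0, 1), (0, 2)):
        x1_reg = y        # 1:X1
        x5_reg = x0 + x1  # lane 0 of ADD V2.4S,V0.4S,V1.4S, moved by FMOV W5,S2
        states.add((x1_reg, x5_reg))
    return sorted(states)


states = final_states()
for x1_reg, x5_reg in states:
    print(f"1:X1={x1_reg}; 1:X5={x5_reg};")
```

The enumeration only accounts for the state space. Whether the state `1:X1=1; 1:X5=0;` is actually allowed is decided by the model, and it is the non-temporal exception to address dependencies that makes it observable here.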

Just my 2p. Hope that would help to justify the change.

[1] C3.2.4 Load/store non-temporal pair, ARM DDI 0487J.
[2] Rule RVMDYZ, ARM DDI 0487J.
