Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gen]DC CVAU and IC IVAU relaxations #707

Merged
merged 1 commit into from
Feb 2, 2024
Merged

Conversation

hugookeeffe
Copy link
Contributor

This pull request implements the ability to generate DC CVAU and IC IVAU instructions as a fence instruction. This allows the user to implement these instructions with more fine-grain control into litmus tests.

This PR also implements the ability for dependencies to be generated before Insert relaxations.

The relaxations for DC CVAU and IC IVAU are:

DC.CVAU<n/p>
IC.IVAU<n/p>

where n and p allow the user to choose whether DC/IC act on the previous location or the next location

examples that couldn't be generated previously:

diyone7 -moreedges true -arch AArch64 -variant self "PodWW DC.CVAUp DMB.ISH Rfe PodRR DSB.ISH IC.IVAUn DSB.ISH ISB Ifre"
AArch64 A
"PodWW DC.CVAUp DMB.ISH Rfe PodRR DSB.ISH IC.IVAUn DSB.ISH ISB Ifre"
Generator=diyone7 (version 7.56+03)
Com=Rf Ifr
Orig=PodWW DC.CVAUp DMB.ISH Rfe PodRR DSB.ISH IC.IVAUn DSB.ISH ISB Ifre
{
0:X0=NOP; 0:X1=P1:Lself00; 0:X3=x;
1:X0=x; 1:X2=P1:Lself00;
}
 P0          | P1          ;
 STR W0,[X1] | LDR W1,[X0] ;
 DC CVAU,X1  | DSB ISH     ;
 DMB ISH     | IC IVAU,X2  ;
 MOV W2,#1   | DSB ISH     ;
 STR W2,[X3] | ISB         ;
             | Lself00:    ;
             | B L00       ;
             | MOV W3,#2   ;
             | B L01       ;
             | L00:        ;
             | MOV W3,#1   ;
             | L01:        ;
exists (1:X1=1 /\ 1:X3=1)
diyone7 -moreedges true -arch AArch64 -variant self "[DpCtrlIsbdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre] DMB.ISHdWW Rfe"
AArch64 A
"DpCtrlIsbdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre DMB.ISHdWW Rfe"
Generator=diyone7 (version 7.56+03)
Com=Ifr Rf
Orig=DpCtrlIsbdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre DMB.ISHdWW Rfe
{
0:X0=x; 0:X2=P0:Lself00;
1:X0=NOP; 1:X1=P0:Lself00; 1:X3=x;
}
 P0           | P1          ;
 LDR W1,[X0]  | STR W0,[X1] ;
 CBNZ W1,LC00 | DMB ISH     ;
 LC00:        | MOV W2,#1   ;
 ISB          | STR W2,[X3] ;
 DC CVAU,X2   |             ;
 DSB ISH      |             ;
 IC IVAU,X2   |             ;
 DSB ISH      |             ;
 ISB          |             ;
 Lself00:     |             ;
 B L01        |             ;
 MOV W3,#2    |             ;
 B L02        |             ;
 L01:         |             ;
 MOV W3,#1    |             ;
 L02:         |             ;
exists (0:X1=1 /\ 0:X3=1)

@hugookeeffe hugookeeffe changed the title Di insert [gen]DC CVAU and IC IVAU relaxations Oct 26, 2023
@maranget maranget force-pushed the master branch 2 times, most recently from a23d589 to 5905deb Compare October 31, 2023 13:09
@hugookeeffe hugookeeffe force-pushed the DI_insert branch 2 times, most recently from de16555 to ea2295c Compare October 31, 2023 13:30
@hugookeeffe
Copy link
Contributor Author

This PR has also been tested using the following configuration file. This generates 500 tests for me and scanning through the tests I haven't been able to see anything clearly wrong.

Note that this conf file uses -stdout true in order not to generate files, make sure to remove this line if you desire for litmus test files to be generated instead of output to stdout.

-arch AArch64
-variant self
-safe DC.CVAUn DC.CVAUp IC.IVAUn IC.IVAUp DSB.ISHd** DpAddrdR DpAddrsR Ifre Rfe DpCtrlIsbdR PodWW PodRR PodRW PodWR PosRR PosWW PosWR PosRW [DC.CVAUn, Ifre, PodWW, DSB.ISH, IC.IVAUp] [Ifre PodW* DC.CVAUp DSB.ISH IC.IVAUp DSB.ISH ISB] [Ifre PosW* DC.CVAUp DSB.ISH IC.IVAUp DSB.ISH ISB] [ Pod*R DSB.ISH DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre] [ DpCtrlIsbdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre] [ Pod*R DSB.ISH IC.IVAUn DSB.ISH ISB Ifre PodW* DC.CVAUp DSB.ISH] [PodRR Ifre PodW* DC.CVAUp DSB.ISH IC.IVAUp DSB.ISH]
-moreedges true
-stdout true
-ins 10
-nprocs 2

@murzinv
Copy link

murzinv commented Nov 3, 2023

Hi @hugookeeffe
Thanks a lot for your work and example config file. I noticed that generated litmus test uses specialized syntax for NOP like 0:X0=NOP; can it be changed to unified syntax for instructions like 0:X0=instr:"NOP";?

@hugookeeffe
Copy link
Contributor Author

Hi @murzinv

Thanks a lot for your work and example config file. I noticed that generated litmus test uses specialized syntax for NOP like 0:X0=NOP; can it be changed to unified syntax for instructions like 0:X0=instr:"NOP";?

I completely agree, I have another PR open at the moment, #703, where I have made the change for this.

Copy link
Collaborator

@artkhyzha artkhyzha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Specifically, I looked through the code, the examples and some of the generated tests, which all seemed good.

@hugookeeffe hugookeeffe mentioned this pull request Nov 21, 2023
gen/AArch64Arch_gen.ml Outdated Show resolved Hide resolved
gen/AArch64Compile_gen.ml Outdated Show resolved Hide resolved
gen/AArch64Arch_gen.ml Outdated Show resolved Hide resolved
Comment on lines 2184 to 2191
| D.CTRL ->
let c,st = emit_ctrl_gen csel st vdep r1 in
let init,cs,st = emit_fence st p init n f in
None,init,insert_isb false c cs,st
| D.CTRLISYNC ->
let c,st = emit_ctrl_gen csel st vdep r1 in
let init,cs,st = emit_fence st p init n f in
None,init,insert_isb true c cs,st
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This two cases can be merged into one like so:

Suggested change
| D.CTRL ->
let c,st = emit_ctrl_gen csel st vdep r1 in
let init,cs,st = emit_fence st p init n f in
None,init,insert_isb false c cs,st
| D.CTRLISYNC ->
let c,st = emit_ctrl_gen csel st vdep r1 in
let init,cs,st = emit_fence st p init n f in
None,init,insert_isb true c cs,st
| D.CTRL | D.CTRLISYNC ->
let c,st = emit_ctrl_gen csel st vdep r1 in
let init,cs,st = emit_fence st p init n f in
None,init,insert_isb (is_ctrlisync dp) c cs,st

What do you think?

Although, I think CTRLISYNC is redundant (one can insert an ISB pseudo relation as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion! I have implemented the change

let vdep = node2vdep n1 in
match dp with
| D.ADDR
| D.DATA -> let init,cs, st = emit_fence st p init n f in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried generating some examples with DpAddr and DpCtrl but I didn't get what I was expecting. For example:

$> diyone7 -moreedges true -arch AArch64 -variant self "DMB.ISHdWW Rfe [DpAddrdR ISB DC.CVAUp DSB.ISH IC.IVAUp DSB.ISH ISB Ifre]"

Throws an error that there can't be an address dependency to code. Which makes sense. But then I think I don't know how to use DpAddr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that is strange, what behaviour would you expect here? Ctrl does seem to work, but not Addr. should we expect both to work or neither?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what are the meaningful tests we can produce with DpAddr*R then. For example, if we ignore the address dependency to code location:

$> diyone7 -moreedges true -arch AArch64 -variant self "DMB.ISHdWW Rfe [DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB] Fre"
AArch64 A
"DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Fre"
Generator=diyone7 (version 7.56+03)
Com=Rf Fr
Orig=DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Fre
{
0:X1=x; 0:X3=y;
1:X0=y; 1:X2=x;
}
 P0          | P1                  ;
 MOV W0,#1   | LDR W1,[X0]         ;
 STR W0,[X1] | DC CVAU,X2          ;
 DMB ISH     | DSB ISH             ;
 MOV W2,#1   | IC IVAU,X2          ;
 STR W2,[X3] | DSB ISH             ;
             | ISB                 ;
             | EOR W3,W1,W1        ;
             | LDR W4,[X2,W3,SXTW] ;
exists (1:X1=1 /\ 1:X4=0)

The DC doesn't have an address dependency from the 1st LDR in P1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring ifetch for a minute:

$> diyone7 -moreedges true -arch AArch64 -variant self "DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Fre"
AArch64 A
"DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Fre"
Generator=diyone7 (version 7.56+03)
Com=Rf Fr
Orig=DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Fre
{
0:X1=x; 0:X3=y;
1:X0=y; 1:X2=x;
}
 P0          | P1                  ;
 MOV W0,#1   | LDR W1,[X0]         ;
 STR W0,[X1] | DC CVAU,X2          ;
 DMB ISH     | DSB ISH             ;
 MOV W2,#1   | IC IVAU,X2          ;
 STR W2,[X3] | DSB ISH             ;
             | ISB                 ;
             | EOR W3,W1,W1        ;
             | LDR W4,[X2,W3,SXTW] ;
exists (1:X1=1 /\ 1:X4=0)

But I would expect:

AArch64 A
"DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Fre"
Generator=diyone7 (version 7.56+03)
Com=Rf Fr
Orig=DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Fre
{
0:X1=x; 0:X3=y;
1:X0=y; 1:X2=x;
}
 P0          | P1                  ;
 MOV W0,#1   | LDR W1,[X0]         ;
             | EOR W3,W1,W1        ;
             | ADD X5,X2,W3,SXTW   ;
 STR W0,[X1] | DC CVAU,X5          ;
 DMB ISH     | DSB ISH             ;
 MOV W2,#1   | IC IVAU,X5          ;
 STR W2,[X3] | DSB ISH             ;
             | ISB                 ;
             | LDR W4,[X5] ;
exists (1:X1=1 /\ 1:X4=0)

And if it won't be too hard, I think for ifetch we should get this test:

diyone7 -moreedges true -arch AArch64 -variant self "DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre"
AArch64 A
"DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre"
Generator=diyone7 (version 7.56+03)
Com=Rf Ifr
Orig=DMB.ISHdWW Rfe DpAddrdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB Ifre
{
0:X0=NOP; 0:X1=P1:Lself00; 0:X3=x;
1:X0=x; 1:X2=P1:Lself00;
}
 P0          | P1          ;
 STR W0,[X1] | LDR W1,[X0] ;
             | EOR W4,W1,W1 ;
             | ADD X3,X2,W4,SXTW ;
 DMB ISH     | DC CVAU,X3  ;
 MOV W2,#1   | DSB ISH     ;
 STR W2,[X3] | IC IVAU,X3  ;
             | DSB ISH     ;
             | ISB         ;
             | Lself00:    ;
             | B L00       ;
             | MOV W3,#2   ;
             | B L01       ;
             | L00:        ;
             | MOV W3,#1   ;
             | L01:        ;
exists (1:X1=1 /\ 1:X3=1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Nikos, I have implemented this change now for the normal case without ifetch, however the case with ifetch is a littl more complicated so for now I am throwing an error. Let me know if it is now doing what you expect it to.

I will now also rebase the patch on top of master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR has now been rebased. let me know if there is anything else you would like me to do for this

This change implements the ability to generate DC CVAU and IC IVAU
instructions as a fence instruction. This allows the user to implement
these instructions with more fine-grain control into litmus tests.

The relaxations for DC CVAU and IC IVAU are:

DC.CVAU<n/p>
IC.IVAU<n/p>

where n and p allow the user to choose whether the DC/IC instruction
will use the previous location or the next location.

Examples that couldn't be generated previously:

$> diyone7 -moreedges true -arch AArch64 -variant self "PodWW DC.CVAUp DMB.ISH Rfe PodRR DSB.ISH IC.IVAUn DSB.ISH ISB FrePI"
AArch64 A
"PodWWIP DC.CVAUp DMB.ISH Rfe PodRR DSB.ISH IC.IVAUn DSB.ISH ISB FrePI"
Generator=diyone7 (version 7.56+03)
Com=Rf Fr
Orig=PodWWIP DC.CVAUp DMB.ISH Rfe PodRR DSB.ISH IC.IVAUn DSB.ISH ISB FrePI
{
0:X0=instr:"NOP"; 0:X1=P1:Lself00; 0:X3=x;
1:X0=x; 1:X2=P1:Lself00;
}
 P0          | P1          ;
 STR W0,[X1] | LDR W1,[X0] ;
 DC CVAU,X1  | DSB ISH     ;
 DMB ISH     | IC IVAU,X2  ;
 MOV W2,herd#1   | DSB ISH     ;
 STR W2,[X3] | ISB         ;
             | Lself00:    ;
             | B .+12      ;
             | MOV W3,herd#2   ;
             | B .+8       ;
             | MOV W3,herd#1   ;
exists (1:X1=1 /\ 1:X3=1)

$> diyone7 -moreedges true -arch AArch64 -variant self "DMB.ISHdWW Rfe DpCtrlIsbdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB FrePI"
AArch64 A
"DMB.ISHdWWIP Rfe DpCtrlIsbdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB FrePI"
Generator=diyone7 (version 7.56+03)
Com=Rf Fr
Orig=DMB.ISHdWWIP Rfe DpCtrlIsbdR DC.CVAUn DSB.ISH IC.IVAUn DSB.ISH ISB FrePI
{
0:X0=instr:"NOP"; 0:X1=P1:Lself00; 0:X3=x;
1:X0=x; 1:X2=P1:Lself00;
}
 P0          | P1           ;
 STR W0,[X1] | LDR W1,[X0]  ;
 DMB ISH     | CBNZ W1,LC00 ;
 MOV W2,herd#1   | LC00:        ;
 STR W2,[X3] | ISB          ;
             | DC CVAU,X2   ;
             | DSB ISH      ;
             | IC IVAU,X2   ;
             | DSB ISH      ;
             | ISB          ;
             | Lself00:     ;
             | B .+12       ;
             | MOV W3,herd#2    ;
             | B .+8        ;
             | MOV W3,herd#1    ;
exists (1:X1=1 /\ 1:X3=1)
@relokin
Copy link
Member

relokin commented Feb 1, 2024

Thanks @hugookeeffe. This PR LGTM and I will merge tomorrow at the end of the day unless anyone objects or needs more time to review this.

@relokin relokin merged commit b8ad302 into herd:master Feb 2, 2024
@relokin
Copy link
Member

relokin commented Feb 2, 2024

Merged into master, thanks @hugookeeffe!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants