Fixes for Neon and SVE #864

murzinv · 2024-05-17T12:48:09Z

While looking at issue #860 I noticed a few other issues with Neon, so I bundled all the fixes together.

murzinv · 2024-05-17T16:04:35Z

herd/AArch64Sem.ml

-      let sxt_op sz =
-        match sz with
-        | MachSize.Quad when not morello  -> M.unitT
-        | _ ->
-           M.op1 (Op.Sxt sz)
-      and uxt_op sz =
-        match sz with
-        | MachSize.Quad when not morello  -> M.unitT
-        | _ ->
-           M.op1 (Op.Mask sz)
+      let sxt_op sz = M.op1 (Op.Sxt sz)
+      and uxt_op sz = M.op1 (Op.Mask sz)
      let sxtw_op = sxt_op MachSize.Word
      and uxtw_op = uxt_op MachSize.Word


Ok, I see that it broke other tests. Of course, I can do a fix local to write_reg_neon_sz, but I'm afraid that we could be bitten by that special-casing somewhere else. That makes me wonder why we have a special case for MachSize.Quad in the first place?

That's strange. I would have assumed that this was some optimisation. I'll check.

maranget · 2024-05-23T17:24:09Z

Hi @murzinv. The problem originates in considering that masking symbolic addresses with a 64-bit mask is illegal. The "optimisation" you have removed was hiding this. Allowing such neutral masking looks more robust than hiding the problem under the carpet. I'll push a fix.
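To make the idea of "neutral masking" concrete, here is a minimal sketch. It does not use herd's actual value types: `value`, `size`, `mask_of` and `mask` are hypothetical names, and the behaviour for masks narrower than 64 bits is an assumption made only for illustration. The point it shows is that a 64-bit mask applied to a symbolic address can simply return the address unchanged, provided addresses are assumed to be at most 64 bits wide.

```ocaml
(* Hypothetical model, not herd's types. *)
type size = Byte | Short | Word | Quad

type value =
  | Concrete of int64
  | Symbolic of string (* a symbolic address such as "x" *)

let mask_of = function
  | Byte -> 0xFFL
  | Short -> 0xFFFFL
  | Word -> 0xFFFFFFFFL
  | Quad -> -1L (* all 64 bits set *)

(* Masking a concrete value is a plain bitwise AND.
   A Quad mask on a symbolic address is treated as the identity.
   Narrower masks on symbolic addresses are rejected here (an assumption). *)
let mask (sz : size) (v : value) : (value, string) result =
  match v, sz with
  | Concrete n, _ -> Ok (Concrete (Int64.logand n (mask_of sz)))
  | Symbolic _, Quad -> Ok v
  | Symbolic s, _ -> Error ("cannot mask symbolic address " ^ s)
```

With this shape, callers no longer need a special case that skips the mask for `Quad`: they can always call `mask` and let it decide.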

maranget · 2024-05-23T17:30:25Z

By the way, have you thought about implementing vector values as arbitrary length integers from the Zarith library, as we do for ASL bitfields?

murzinv · 2024-05-28T09:18:42Z

> Hi @murzinv. The problem originates in considering that masking symbolic addresses with a 64-bit mask is illegal. The "optimisation" you have removed was hiding this. Allowing such neutral masking looks more robust than hiding the problem under the carpet. I'll push a fix.

Thanks for having a look into it @maranget! Please let me know when the fix is available, so I can rebase on top or include it in the series, whichever you prefer 😉

murzinv · 2024-05-28T09:24:56Z

> By the way, have you thought about implementing vector values as arbitrary length integers from the Zarith library, as we do for ASL bitfields?

Never thought about that, to be honest. Primarily because 128-bit values are already supported by herd and that was enough for me, and secondarily because I was not aware of the Zarith library. If there are advantages in moving to Zarith over the existing 128-bit support, that would definitely motivate me to have a closer look 😉

maranget · 2024-05-28T17:56:10Z

It's only a suggestion. One advantage would be the ability to have configurable vector lengths.

maranget · 2024-05-28T17:57:53Z

> Thanks for having a look into it @maranget! Please let me know when the fix is available, so I can rebase on top or include it in the series, whichever you prefer 😉

The quite simple fix is in the last commit I have pushed into your branch. Feel free to adopt or re-write it :)

murzinv · 2024-05-29T08:24:15Z

> > Thanks for having a look into it @maranget! Please let me know when the fix is available, so I can rebase on top or include it in the series, whichever you prefer 😉
>
> The quite simple fix is in the last commit I have pushed into your branch. Feel free to adopt or re-write it :)

Great, thanks! I've just put it at the front of the series since it is a prerequisite for the rest of the fixes.

maranget · 2024-06-03T11:05:14Z

herd/tests/instructions/AArch64.neon/V67.litmus.expected

@@ -0,0 +1,10 @@
+Test 67 Required


Suggested change

-Test 67 Required
+Test V67 Required

maranget · 2024-06-03T12:39:28Z

Hi @murzinv. LGTM. Ready for merge as soon as the typo in test V67 is fixed.

Commits 93ee43a ("[all] Update the LDRS[BH] instructions.") and b4aefa6 ("[all,aarch64] Exhaustive implementation of LD<OP>") added a special case for `MachSize.Quad` in `uxt_op` and `sxt_op`. Unfortunately, that breaks some operations for vector instructions. Consider the test

```
AArch64 T
{
uint8_t x[16];
0 : X0 = x;
}
P0;
MOVI V0.16B, #1 ; (* V0 = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} *)
MOVI V1.16B, #2 ; (* V1 = {2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2} *)
EOR V2.8B, V1.8B, V0.8B; (* V2 = {3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0} *)
ST1{V2.16B}, [X0];
locations[x;]
```

When run on hardware it produces

```
10000 :>x={3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0};
```

however, when run with `herd` it produces

```
States 1
x={3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3};
```

The reason is that the `EOR` instruction is modeled by `simd_op` as

```
...
| AArch64.EOR -> fun (v1,v2) -> M.op Op.Xor v1 v2
...
end >>= fun v -> write_reg_neon_sz sz r1 v ii
```

In other words, it exclusive-ors the 128-bit input values and leaves the decision of how much should be written to the output register to `write_reg_neon_sz`, which in turn performs zero extension as

```
uxt_op sz v >>= fun v ->
```

The problem is that the `8B` register size is `MachSize.Quad`, which means that it falls under the special case that does not apply the mask, so the full 128-bit value gets written to the output register.

Let's restore the intended behavior by removing the special case on `MachSize.Quad`.

Suggested-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
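The effect described in this commit message can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: `v128`, `size`, `splat`, `xor`, `uxt_special_cased` and `uxt_masked` are hypothetical names that do not exist in herd, and the 128-bit register is modelled as two `int64` halves rather than with herd's own value representation.

```ocaml
(* Hypothetical model, not herd's types: a 128-bit register as two halves. *)
type v128 = { hi : int64; lo : int64 }

(* Only the two sizes needed here: 64-bit (8B) and 128-bit (16B). *)
type size = Quad | S128

(* Replicate one byte across all 16 lanes, as MOVI Vd.16B, #imm does. *)
let splat (b : int) : v128 =
  let w = Int64.mul (Int64.of_int b) 0x0101010101010101L in
  { hi = w; lo = w }

(* Exclusive-or of the full 128-bit values, as the EOR model does. *)
let xor a b = { hi = Int64.logxor a.hi b.hi; lo = Int64.logxor a.lo b.lo }

(* Old behaviour: the Quad special case skipped the mask entirely. *)
let uxt_special_cased (_ : size) (v : v128) : v128 = v

(* New behaviour: always zero-extend from the requested size. *)
let uxt_masked (sz : size) (v : v128) : v128 =
  match sz with
  | Quad -> { v with hi = 0L }
  | S128 -> v

let () =
  let r = xor (splat 2) (splat 1) in
  let show v = Printf.sprintf "hi=%016Lx lo=%016Lx" v.hi v.lo in
  print_endline ("special-cased: " ^ show (uxt_special_cased Quad r));
  print_endline ("masked:        " ^ show (uxt_masked Quad r))
```

In this model the special-cased version keeps `03` in all sixteen bytes, matching the incorrect `herd` result, while the masked version clears the upper eight bytes, matching the hardware result.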

It was reported in herd#860 that the SVE vector ADD instruction does not operate element-wise. It turns out that the Neon vector ADD instruction has the same issue. Fix both by operating element-wise.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
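The difference between adding a vector as one wide integer and adding it element-wise can be seen in a short sketch. This is not the code from the commit: `add_lanes8` is a hypothetical helper, and it works on a single 64-bit value split into eight 8-bit lanes purely for illustration.

```ocaml
(* Hypothetical helper, not herd's code: add two 64-bit values as eight
   independent 8-bit lanes, so a carry out of one lane never reaches the next. *)
let add_lanes8 (a : int64) (b : int64) : int64 =
  let rec go i acc =
    if i = 8 then acc
    else
      let sh = 8 * i in
      (* Extract lane i of a 64-bit value. *)
      let lane x = Int64.logand (Int64.shift_right_logical x sh) 0xFFL in
      (* Add the two lanes and discard the carry. *)
      let sum = Int64.logand (Int64.add (lane a) (lane b)) 0xFFL in
      go (i + 1) (Int64.logor acc (Int64.shift_left sum sh))
  in
  go 0 0L

let () =
  let a = 0x00FF00FF00FF00FFL and b = 0x0001000100010001L in
  Printf.printf "whole-value add:  %016Lx\n" (Int64.add a b);
  Printf.printf "element-wise add: %016Lx\n" (add_lanes8 a b)
```

With these inputs the whole-value addition lets each `FF + 01` carry into the neighbouring lane and yields `0100010001000100`, whereas the element-wise addition wraps each lane to `00` and yields `0000000000000000`.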

maranget · 2024-06-03T14:08:04Z

Merged, thanks @murzinv.

murzinv force-pushed the fix-neon-sve branch from 7d4cc31 to c16bdfe Compare May 29, 2024 08:22

[herd] Allow "Quad" masks on symbolic addresses

3d6cc27

This amounts to assuming that those addresses are 64 bits wide (or less...).

murzinv force-pushed the fix-neon-sve branch from c16bdfe to cd8c6ff Compare June 3, 2024 10:37

maranget reviewed Jun 3, 2024

Vladimir Murzin added 2 commits June 3, 2024 14:42

murzinv force-pushed the fix-neon-sve branch from cd8c6ff to 10aefe7 Compare June 3, 2024 13:42

maranget merged commit 454d306 into herd:master Jun 3, 2024
3 checks passed

murzinv deleted the fix-neon-sve branch June 3, 2024 14:25

murzinv mentioned this pull request Jun 3, 2024

[herd] SVE Vector ADD instruction does not operate element-wise #860

Closed
