Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for Neon and SVE #864

Merged
merged 3 commits into from
Jun 3, 2024
Merged

Fixes for Neon and SVE #864

merged 3 commits into from
Jun 3, 2024

Conversation

murzinv
Copy link

@murzinv murzinv commented May 17, 2024

While looking at issue #860 I noticed few other issues with Neon, so bundled all fixes together.

Comment on lines -109 to 112
let sxt_op sz =
match sz with
| MachSize.Quad when not morello -> M.unitT
| _ ->
M.op1 (Op.Sxt sz)
and uxt_op sz =
match sz with
| MachSize.Quad when not morello -> M.unitT
| _ ->
M.op1 (Op.Mask sz)
let sxt_op sz = M.op1 (Op.Sxt sz)
and uxt_op sz = M.op1 (Op.Mask sz)
let sxtw_op = sxt_op MachSize.Word
and uxtw_op = uxt_op MachSize.Word
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see that it broke other tests. Of cause I can do fix local to write_reg_neon_sz but I'm afraid that we can be bitten by that special casing somewhere else. That makes me wonder why we have special case for MachiSize.Quad in the first place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's strange I would have assumed that this was some optimisation. I'll check.

@maranget
Copy link
Member

Hi @murzin. The problem originates in considering that masking symbolic addresses with a 64 bits mask is illegal. The "optimisation" you have removed was hiding this. Allowing such neutral masking looks more robust than hiding the problem under the carpet. I push a fix.

@maranget
Copy link
Member

By the way, have you thought about implementing vector values as arbitrary length integers from the Zarith library, as we do for ASL bitfields?

@murzinv
Copy link
Author

murzinv commented May 28, 2024

Hi @murzin. The problem originates in considering that masking symbolic addresses with a 64 bits mask is illegal. The "optimisation" you have removed was hiding this. Allowing such neutral masking looks more robust than hiding the problem under the carpet. I push a fix.

Thanks for having a look into it @maranget ! Please, let me know when fix is available, so I can re-base on top or include it into the series, whatever you prefer 😉

@murzinv
Copy link
Author

murzinv commented May 28, 2024

By the way, have you thought about implementing vector values as arbitrary length integers from the Zarith library, as we do for ASL bitfields?

Never thought about that to be honest. Primary because 128-bit value already supported by herd and was enough for me, secondary because I was not aware of Zarith library. If there are advantages in moving to Zarith over exiting 128-bit support it would definitely motivate me having a close look 😉

@maranget
Copy link
Member

It's only a suggestion. One advantage would be the ability to have configurable vector lengths.

@maranget
Copy link
Member

Thanks for having a look into it @maranget ! Please, let me know when fix is available, so I can re-base on top or include it into the series, whatever you prefer 😉

The quite simple fix is in the last commit I have pushed into your branch. Feel free to adopt or re-write it :)

@murzinv
Copy link
Author

murzinv commented May 29, 2024

Thanks for having a look into it @maranget ! Please, let me know when fix is available, so I can re-base on top or include it into the series, whatever you prefer 😉

The quite simple fix is in the last commit I have pushed into your branch. Feel free to adopt or re-write it :)

Great thanks! I've just put it in front of the series since it is prerequisite for the rest of the fixes.

This amount to assuming that those addresses
are 64bit wide (or less...).
@@ -0,0 +1,10 @@
Test 67 Required
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Test 67 Required
Test V67 Required

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@maranget
Copy link
Member

maranget commented Jun 3, 2024

Hi @murzinv. LGTM. Ready for merge as soon as the typo in test V67 is fixed.

Vladimir Murzin added 2 commits June 3, 2024 14:42
Commits 93ee43a ("[all] Update the LDRS[BH] instructions.") and
b4aefa6 ("[all,aarch64] Exhaustive implementation of LD<OP>")
added special case for `MachSize.Quad` in `uxt_op` and `sxt_op`.

Unfortunately, that breaks some operations for vector instructions.

Consider the test

```
AArch64 T
{
  uint8_t x[16];
  0 : X0 = x;
}
P0;
MOVI V0.16B, herd#1        ; (* V0 = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} *)
MOVI V1.16B, herd#2        ; (* V1 = {2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2} *)
EOR V2.8B, V1.8B, V0.8B; (* V2 = {3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0} *)
ST1{V2.16B}, [X0];

locations[x;]
```

When run on hardware it produces

```
10000 :>x={3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0};
```

howevere when tun with `herd` it prooduces
```
States 1
x={3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3};
```

The reason is that `EOR` instruction is modeled by `simd_op` as

```
 ...
 | AArch64.EOR -> fun (v1,v2) -> M.op Op.Xor v1 v2
 ...
 end >>=
   fun v -> write_reg_neon_sz sz r1 v ii
```

In other words it exclusive-or 128 bit input values and offload
destination how much should be written to the output register to
`write_reg_neon_sz`, which in turn would perform zero extension as

```
uxt_op sz v >>= fun v ->
```

The problem is that `8B` register size is `MachSize.Quad` which means
that it falls under special case which would not apply mask so full
128-bit value gets written to the output register.

Let's restore intended behavior by removing special case on
`Machsize.Quad`

Suggested-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
It was reported in herd#860 that SVE vector ADD instruction does not
operate element-wise. It tuns out that Neon vector ADD instruction has
the same issue. Fix both by operating element-wise.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
@maranget maranget merged commit 454d306 into herd:master Jun 3, 2024
3 checks passed
@maranget
Copy link
Member

maranget commented Jun 3, 2024

Merged, thanks @murzinv.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants