-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for Neon and SVE #864
Conversation
let sxt_op sz = | ||
match sz with | ||
| MachSize.Quad when not morello -> M.unitT | ||
| _ -> | ||
M.op1 (Op.Sxt sz) | ||
and uxt_op sz = | ||
match sz with | ||
| MachSize.Quad when not morello -> M.unitT | ||
| _ -> | ||
M.op1 (Op.Mask sz) | ||
let sxt_op sz = M.op1 (Op.Sxt sz) | ||
and uxt_op sz = M.op1 (Op.Mask sz) | ||
let sxtw_op = sxt_op MachSize.Word | ||
and uxtw_op = uxt_op MachSize.Word |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see that it broke other tests. Of cause I can do fix local to write_reg_neon_sz
but I'm afraid that we can be bitten by that special casing somewhere else. That makes me wonder why we have special case for MachiSize.Quad
in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's strange I would have assumed that this was some optimisation. I'll check.
Hi @murzin. The problem originates in considering that masking symbolic addresses with a 64 bits mask is illegal. The "optimisation" you have removed was hiding this. Allowing such neutral masking looks more robust than hiding the problem under the carpet. I push a fix. |
By the way, have you thought about implementing vector values as arbitrary length integers from the Zarith library, as we do for ASL bitfields? |
Thanks for having a look into it @maranget ! Please, let me know when fix is available, so I can re-base on top or include it into the series, whatever you prefer 😉 |
Never thought about that to be honest. Primary because 128-bit value already supported by |
It's only a suggestion. One advantage would be the ability to have configurable vector lengths. |
The quite simple fix is in the last commit I have pushed into your branch. Feel free to adopt or re-write it :) |
Great thanks! I've just put it in front of the series since it is prerequisite for the rest of the fixes. |
This amount to assuming that those addresses are 64bit wide (or less...).
@@ -0,0 +1,10 @@ | |||
Test 67 Required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test 67 Required | |
Test V67 Required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Hi @murzinv. LGTM. Ready for merge as soon as the typo in test |
Commits 93ee43a ("[all] Update the LDRS[BH] instructions.") and b4aefa6 ("[all,aarch64] Exhaustive implementation of LD<OP>") added special case for `MachSize.Quad` in `uxt_op` and `sxt_op`. Unfortunately, that breaks some operations for vector instructions. Consider the test ``` AArch64 T { uint8_t x[16]; 0 : X0 = x; } P0; MOVI V0.16B, herd#1 ; (* V0 = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} *) MOVI V1.16B, herd#2 ; (* V1 = {2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2} *) EOR V2.8B, V1.8B, V0.8B; (* V2 = {3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0} *) ST1{V2.16B}, [X0]; locations[x;] ``` When run on hardware it produces ``` 10000 :>x={3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0}; ``` howevere when tun with `herd` it prooduces ``` States 1 x={3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3}; ``` The reason is that `EOR` instruction is modeled by `simd_op` as ``` ... | AArch64.EOR -> fun (v1,v2) -> M.op Op.Xor v1 v2 ... end >>= fun v -> write_reg_neon_sz sz r1 v ii ``` In other words it exclusive-or 128 bit input values and offload destination how much should be written to the output register to `write_reg_neon_sz`, which in turn would perform zero extension as ``` uxt_op sz v >>= fun v -> ``` The problem is that `8B` register size is `MachSize.Quad` which means that it falls under special case which would not apply mask so full 128-bit value gets written to the output register. Let's restore intended behavior by removing special case on `Machsize.Quad` Suggested-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
It was reported in herd#860 that SVE vector ADD instruction does not operate element-wise. It tuns out that Neon vector ADD instruction has the same issue. Fix both by operating element-wise. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Merged, thanks @murzinv. |
While looking at issue #860 I noticed few other issues with Neon, so bundled all fixes together.