Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM Generates Wrong IndexOf Span code #14261

Closed
alexanderkyte opened this issue Apr 29, 2019 · 7 comments · Fixed by #14292

Comments

@alexanderkyte
Copy link
Member

@alexanderkyte alexanderkyte commented Apr 29, 2019

This issue is a duplicate of #14143 . Making this issue to better track the history of this.

Some deep digging of the TechEmpower benchmark shows that the observed cause happens when we mis-parse the HTTP header. This happens because the offsets returned by the IndexOf operator to split the header by newlines doesn't work as expected. It returns the index of the last newline, at offset 193 in the crashing reproduction.

A gist with the relevant Span fixture is linked with executable instructions here: https://gist.github.com/alexanderkyte/d7bc0222cbcf7135f2513f99556ddd80

@alexanderkyte

This comment has been minimized.

Copy link
Member Author

@alexanderkyte alexanderkyte commented Apr 29, 2019

@alexanderkyte

This comment has been minimized.

Copy link
Member Author

@alexanderkyte alexanderkyte commented Apr 29, 2019

And this performs as expected when using fullAOT without llvm:

akyte@Alexanders-MacBook-Pro-3 ~/Projects/SpanCrash/SpanCrash/bin/Debug (master*) $ rm -rf mono_aot* *dylib* ; MONO_VERBOSE_METHOD="ConstStrLen" MONO_EXECUTABLE="/Users/akyte/mono_llvm_opt/mono/mini/mono-sgen --" MONO_PATH=/Users/akyte/mono_llvm_opt/mcs/class/lib/net_4_x /Users/akyte/mono_llvm_opt/runtime/mono-wrapper --aot-path=/Users/akyte/mono_llvm_opt/mcs/class/lib/net_4_x --aot=keep-temps SpanCrash.exe
Mono Ahead of Time compiler - compiling assembly /Users/akyte/Projects/SpanCrash/SpanCrash/bin/Debug/SpanCrash.exe
AOTID 6CF9A9C8-7DBA-99C9-CCED-7730AF6C7B16
Compiled: 4/4
Executing the native assembler: "clang"  -c -x assembler -o /var/folders/00/v8q_733j1hd8br7nxhxs41d80000gn/T/mono_aot_xm0HTG.o /var/folders/00/v8q_733j1hd8br7nxhxs41d80000gn/T/mono_aot_xm0HTG
Executing the native linker: clang --shared -o /Users/akyte/Projects/SpanCrash/SpanCrash/bin/Debug/SpanCrash.exe.dylib.tmp  /var/folders/00/v8q_733j1hd8br7nxhxs41d80000gn/T/mono_aot_xm0HTG.o 
Executing dsymutil: dsymutil "/Users/akyte/Projects/SpanCrash/SpanCrash/bin/Debug/SpanCrash.exe.dylib"
Retained input file.
JIT time: 5 ms, Generation time: 1 ms, Assembly+Link time: 169 ms.
akyte@Alexanders-MacBook-Pro-3 ~/Projects/SpanCrash/SpanCrash/bin/Debug (master*) $ MONO_VERBOSE_METHOD="ConstStrLen" MONO_EXECUTABLE="/Users/akyte/mono_llvm_opt/mono/mini/mono-sgen --" MONO_PATH=/Users/akyte/mono_llvm_opt/mcs/class/lib/net_4_x /Users/akyte/mono_llvm_opt/runtime/mono-wrapper --aot-path=/Users/akyte/mono_llvm_opt/mcs/class/lib/net_4_x SpanCrash.exe 
Line Index Was 24
@alexanderkyte

This comment has been minimized.

@alexanderkyte

This comment has been minimized.

Copy link
Member Author

@alexanderkyte alexanderkyte commented Apr 29, 2019

So this seems to work when using some in-tree types but not when using a proper VS4M project with the nuget.

https://gist.github.com/alexanderkyte/c41042febf7d7511fb15bb997bf1c77b

@alexanderkyte

This comment has been minimized.

Copy link
Member Author

@alexanderkyte alexanderkyte commented Apr 29, 2019

So it looks like this is caused by mono miscomputing the inline of System.Numerics.Vector:Equals<byte> into SpanHelpers.IndexOf<byte> from System.Memory

@alexanderkyte

This comment has been minimized.

Copy link
Member Author

@alexanderkyte alexanderkyte commented Apr 29, 2019

image

@alexanderkyte

This comment has been minimized.

Copy link
Member Author

@alexanderkyte alexanderkyte commented Apr 29, 2019

Disabling this single inline makes it all work, surprisingly.

alexanderkyte added a commit to alexanderkyte/mono that referenced this issue Apr 30, 2019
Fixes mono#14261
Fixes mono#14143

This was causing an issue when trying to inline
System.Numerics.Vectors.Vector`1<byte>:Equals into System.Span.IndexOf
from System.Memory.dll as provided by nuget. I believe that it is
comparing the result of a SIMD operation on two bytes there with a
vector full of zero bytes.

When we produce the Equals call (no inline), we spill the SIMD <4 x i32>
vector to the stack in the caller, and we then later load it
with the widened type used here <16 x i8>.

When we inline, we end up making the args into XMOV opcodes,
and then doing the "pcmpeqb" mini opcode with the return values
from those operations. This becomes a register to register no-op
copy, and the original <4 x i32> gets compared against the <16 x i8>
type:

```
  %116 = icmp eq <16 x i8> zeroinitializer, <4 x i32> %115
```

This does not go the way we want it to. It reliably leads to IndexOf
breaking and only finding the desired byte in the portion not processed
by the SIMD code (in the last few elements of the array).

This fix checks if the input dimensions differ, and does a conversion.
This produces the following IR:

```
  %114 = bitcast <4 x i32> %99 to <16 x i8>
  %115 = icmp eq <16 x i8> %114, %113
  %116 = sext <16 x i1> %115 to <16 x i8>
  %117 = icmp eq <16 x i8> zeroinitializer, %116
```
alexanderkyte added a commit to alexanderkyte/mono that referenced this issue Apr 30, 2019
Fixes mono#14261
Fixes mono#14143

This was causing an issue when trying to inline
System.Numerics.Vectors.Vector`1<byte>:Equals into System.Span.IndexOf
from System.Memory.dll as provided by nuget. I believe that it is
comparing the result of a SIMD operation on two SIMD registers there with a
SIMD register full of zero bytes.

When we produce the Equals call (no inline), we spill the SIMD <4 x i32>
vector to the stack in the caller, and we then later load it
with the widened type used here <16 x i8>.

When we inline, we end up making the args into XMOV opcodes,
and then doing the "pcmpeqb" mini opcode with the return values
from those operations. This becomes a register to register no-op
copy, and the original <4 x i32> gets compared against the <16 x i8>
type:

```
  %116 = icmp eq <16 x i8> zeroinitializer, <4 x i32> %115
```

This does not go the way we want it to. It reliably leads to IndexOf
breaking and only finding the desired byte in the portion not processed
by the SIMD code (in the last few elements of the array).

This fix checks if the input dimensions differ, and does a conversion.
This produces the following IR:

```
  %114 = bitcast <4 x i32> %99 to <16 x i8>
  %115 = icmp eq <16 x i8> %114, %113
  %116 = sext <16 x i1> %115 to <16 x i8>
  %117 = icmp eq <16 x i8> zeroinitializer, %116
```
alexanderkyte added a commit that referenced this issue May 2, 2019
* [llvm] Widen vector equality (OP_PCMPEQx)

Fixes #14261
Fixes #14143

This was causing an issue when trying to inline
System.Numerics.Vectors.Vector`1<byte>:Equals into System.Span.IndexOf
from System.Memory.dll as provided by nuget. I believe that it is
comparing the result of a SIMD operation on two SIMD registers there with a
SIMD register full of zero bytes.

When we produce the Equals call (no inline), we spill the SIMD <4 x i32>
vector to the stack in the caller, and we then later load it
with the widened type used here <16 x i8>.

When we inline, we end up making the args into XMOV opcodes,
and then doing the "pcmpeqb" mini opcode with the return values
from those operations. This becomes a register to register no-op
copy, and the original <4 x i32> gets compared against the <16 x i8>
type:

```
  %116 = icmp eq <16 x i8> zeroinitializer, <4 x i32> %115
```

This does not go the way we want it to. It reliably leads to IndexOf
breaking and only finding the desired byte in the portion not processed
by the SIMD code (in the last few elements of the array).

This fix checks if the input dimensions differ, and does a conversion.
This produces the following IR:

```
  %114 = bitcast <4 x i32> %99 to <16 x i8>
  %115 = icmp eq <16 x i8> %114, %113
  %116 = sext <16 x i1> %115 to <16 x i8>
  %117 = icmp eq <16 x i8> zeroinitializer, %116
```

* [llvm] Widen vector greater-than

Here we do the same trick used in the above fix, but for the parallel greater-than operation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.