TechEmpower Platform Plaintext Hangs on Every Request Using LLVM JIT #14143

Closed
brianrob opened this issue Apr 19, 2019 · 23 comments · Fixed by #14292

@brianrob commented Apr 19, 2019

When running the .NET TechEmpower platform plaintext benchmark with the LLVM JIT, every request hangs. Things work properly with the built-in JIT.

Steps to Reproduce

  1. Build and run a container using this dockerfile: https://gist.github.com/brianrob/ba8214e89aef5406eaa8746ed83bf5f0#file-dockerfile. This will start the ASP.NET service.
  2. Exec into the container and run the following command to trigger a request:
curl --header "Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7" --header "Connection: keep-alive" http://localhost:8080/plaintext

Current Behavior

The HTTP request initiated by curl hangs.

Expected Behavior

The HTTP request initiated by curl succeeds and returns "Hello, World!"

On which platforms did you notice this

[ ] macOS
[ X ] Linux
[ ] Windows

Version Used:

Mono JIT compiler version 6.3.0.202 (tarball Fri Apr 12 18:28:05 UTC 2019)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       altstack
        Notifications: epoll
        Architecture:  amd64
        Disabled:      none
        Misc:          softdebug
        Interpreter:   yes
        LLVM:          yes(610)
        Suspend:       hybrid
        GC:            sgen (concurrent by default)

Stacktrace

I've captured crash logs during the hang by sending a SIGABRT signal to the process running with LLVM JIT. I've also captured crash logs after a successful request when using the built-in JIT:

LLVM-JIT
BuiltIn-JIT

This doesn’t look like a classic deadlock: there isn’t a bunch of threads backed up behind a single lock while one thread does a bunch of work, or a thread trying to take a second lock while holding the first. Comparing the two captures, there are the same number of threads in both cases, and all but one of the threads are stopped at the same location. The difference is that in the good case, the newest thread is a parked worker thread, waiting for more work:

Thread 14 (Thread 0x7f087dc4a700 (LWP 248)):
#0  0x00007f0886c5a720 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f087dc49d90, expected=0, futex_word=0x55d93d20f248 <worker+72>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55d93d20f248 <worker+72>, abstime=abstime@entry=0x7f087dc49d90) at sem_waitcommon.c:111
#2  0x00007f0886c5a7ef in __new_sem_wait_slow (sem=0x55d93d20f248 <worker+72>, abstime=0x7f087dc49d90) at sem_waitcommon.c:181
#3  0x00007f0886c5a8a2 in sem_timedwait (sem=sem@entry=0x55d93d20f248 <worker+72>, abstime=abstime@entry=0x7f087dc49d90) at sem_timedwait.c:36
#4  0x000055d93ce3f342 in mono_os_sem_timedwait (flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=<optimized out>, sem=0x55d93d20f248 <worker+72>) at ../../mono/utils/mono-os-semaphore.h:252
#5  mono_coop_sem_timedwait (sem=0x55d93d20f248 <worker+72>, flags=MONO_SEM_FLAGS_ALERTABLE, timeout_ms=<optimized out>) at ../../mono/utils/mono-coop-semaphore.h:55
#6  worker_park () at threadpool-worker-default.c:391
#7  worker_thread (unused=unused@entry=0x0) at threadpool-worker-default.c:490
#8  0x000055d93cddbe31 in start_wrapper_internal (stack_ptr=<optimized out>, start_info=0x0) at threads.c:1220
#9  start_wrapper (data=0x7f0868158a10) at threads.c:1293
#10 0x00007f0886c524a4 in start_thread (arg=0x7f087dc4a700) at pthread_create.c:456
#11 0x00007f08860f7d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:9

In the bad case, the new thread is a second timer thread:

Thread 14 (Thread 0x7f09e35db700 (LWP 35)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000056426279c8ab in mono_os_cond_timedwait (cond=cond@entry=0x564262b23180 <sleep_cond>, mutex=mutex@entry=0x564262b231c0 <sleep_mutex>, timeout_ms=timeout_ms@entry=500) at mono-os-mutex.c:75
#2  0x00005642627a7522 in mono_coop_cond_timedwait (cond=0x564262b23180 <sleep_cond>, mutex=0x564262b231c0 <sleep_mutex>, timeout_ms=500) at ../../mono/utils/mono-coop-mutex.h:100
#3  sleep_interruptable (alerted=0x7f09e35dadb4, ms=500) at mono-threads.c:1619
#4  mono_thread_info_sleep (ms=ms@entry=500, alerted=alerted@entry=0x7f09e35dadb4) at mono-threads.c:1651
#5  0x0000564262744e42 in monitor_thread (unused=unused@entry=0x0) at threadpool-worker-default.c:708
#6  0x00005642626e0e31 in start_wrapper_internal (stack_ptr=<optimized out>, start_info=0x0) at threads.c:1220
#7  start_wrapper (data=0x7f09dc0b5ce0) at threads.c:1293
#8  0x00007f09f09d14a4 in start_thread (arg=0x7f09e35db700) at pthread_create.c:456
#9  0x00007f09efe76d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:9

The first timer thread appears to be blocked inside a managed WaitHandle wait.

Thread 7 (Thread 0x7f09e3dff700 (LWP 14)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000056426279c8ab in mono_os_cond_timedwait (cond=cond@entry=0x564263ab00f0, mutex=mutex@entry=0x564263ab00c8, timeout_ms=timeout_ms@entry=999) at mono-os-mutex.c:75
#2  0x00005642626f9de5 in mono_coop_cond_timedwait (timeout_ms=999, mutex=0x564263ab00c8, cond=0x564263ab00f0) at ../../mono/utils/mono-coop-mutex.h:100
#3  mono_w32handle_timedwait_signal_naked (alerted=0x7f09e3dfe060, poll=0, timeout=999, mutex=0x564263ab00c8, cond=0x564263ab00f0) at w32handle.c:650
#4  mono_w32handle_timedwait_signal_handle (handle_data=0x564263ab00b8, timeout=999, alerted=alerted@entry=0x7f09e3dfe060, poll=0) at w32handle.c:765
#5  0x00005642626fb497 in mono_w32handle_wait_one (handle=<optimized out>, timeout=timeout@entry=999, alertable=alertable@entry=1) at w32handle.c:883
#6  0x00005642626fbb17 in mono_w32handle_wait_multiple (handles=handles@entry=0x7f09e3dfe878, nhandles=nhandles@entry=1, waitall=waitall@entry=0, timeout=timeout@entry=999, alertable=alertable@entry=1, error=error@entry=0x7f09e3dfe7c0) at w32handle.c:982
#7  0x00005642626dd7fb in ves_icall_System_Threading_WaitHandle_Wait_internal (handles=handles@entry=0x7f09e3dfe878, numhandles=numhandles@entry=1, waitall=waitall@entry=0 '\000', timeout=timeout@entry=999, error=error@entry=0x7f09e3dfe7c0) at threads.c:2242
#8  0x00005642626745e7 in ves_icall_System_Threading_WaitHandle_Wait_internal_raw (a0=0x7f09e3dfe878, a1=1, a2=<optimized out>, a3=999, error=0x7f09e3dfe7c0) at ../../mono/metadata/icall-def.h:1263
#9  0x0000000041e3eb4f in ?? ()
#10 0x0000000000000000 in ?? ()

I tried to get the managed frames below this point, but gdb couldn’t unwind any deeper.

From a very naïve review of the code, and looking at:

monitor_ensure_running (void)

I would have expected only one timer thread. This is nowhere near a smoking gun, just a difference I noticed at first glance; I'm wondering whether it's worth pursuing.

@alexanderkyte (Member) commented Apr 19, 2019

So @lambdageek made a good observation that this type of issue can be caused by linking in two copies of a library, each with different static addresses for locks and other singletons.

Given @directhex's recent work fixing all of that, I think it's worth ruling that out first.

@brianrob (Author) commented Apr 19, 2019

Great. Is there a specific build I should try that contains this fix?

@lambdageek (Member) commented Apr 19, 2019

> So @lambdageek made a good observation that this type of issue can be caused by linking in two copies of a library, each with different static addresses for locks and other singletons.

To clarify: we've seen (historically, not recently) that in some configurations we will occasionally get into a situation where a mono binary will load a shared library (like libmono-llvm.0.so) which will in turn (incorrectly) load libmonosgen-2.0.so.

So we end up with multiple copies of Mono symbols in a single process (one copy from the mono binary, the other copy from the shared library). At that point really bizarre things start to happen like locks being broken in impossible ways (because different symbols end up seeing different addresses for various global locks).

Given that we see two monitor threads - which shouldn't happen - this might be one of those instances.

@brianrob (Author) commented Apr 19, 2019

Here is the list of loaded modules from running info sharedlibrary in gdb. I don't see libmonosgen-2.0.so in either case. The only difference I see is that mscorlib.dll.so is loaded in the built-in JIT case but not in the LLVM case.

LLVM:

From                To                  Syms Read   Shared Object Library
0x00007faa140db1c0  0x00007faa140ebafe  Yes (*)     /lib/x86_64-linux-gnu/libz.so.1
0x00007faa13ed30e0  0x00007faa13ed5ecf  Yes         /lib/x86_64-linux-gnu/librt.so.1
0x00007faa13ccdd80  0x00007faa13cce94e  Yes         /lib/x86_64-linux-gnu/libdl.so.2
0x00007faa13ab5ab0  0x00007faa13ac2831  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
0x00007faa137b97e0  0x00007faa13861d49  Yes (*)     /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00007faa1342f680  0x00007faa1349b8da  Yes         /lib/x86_64-linux-gnu/libm.so.6
0x00007faa13215a90  0x00007faa13225895  Yes (*)     /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007faa12e93940  0x00007faa12fbd613  Yes         /lib/x86_64-linux-gnu/libc.so.6
0x00007faa142f3aa0  0x00007faa1430efb0  Yes         /lib64/ld-linux-x86-64.so.2
0x00007faa12c6d2b0  0x00007faa12c71745  Yes         /lib/x86_64-linux-gnu/libnss_compat.so.2
0x00007faa12a57fd0  0x00007faa12a63271  Yes         /lib/x86_64-linux-gnu/libnsl.so.1
0x00007faa1284a0e0  0x00007faa128503f7  Yes         /lib/x86_64-linux-gnu/libnss_nis.so.2
0x00007faa126381d0  0x00007faa1263de91  Yes         /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007faa10fb43b0  0x00007faa11d6b30f  Yes         /usr/lib/libmono-llvm.so
0x00007faa0b340570  0x00007faa0b389dc5  Yes         /usr/lib/../lib/libmono-native.so

Built-in JIT:

From                To                  Syms Read   Shared Object Library
0x00007fb4268f31c0  0x00007fb426903afe  Yes (*)     /lib/x86_64-linux-gnu/libz.so.1
0x00007fb4266eb0e0  0x00007fb4266edecf  Yes         /lib/x86_64-linux-gnu/librt.so.1
0x00007fb4264e5d80  0x00007fb4264e694e  Yes         /lib/x86_64-linux-gnu/libdl.so.2
0x00007fb4262cdab0  0x00007fb4262da831  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
0x00007fb425fd17e0  0x00007fb426079d49  Yes (*)     /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00007fb425c47680  0x00007fb425cb38da  Yes         /lib/x86_64-linux-gnu/libm.so.6
0x00007fb425a2da90  0x00007fb425a3d895  Yes (*)     /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007fb4256ab940  0x00007fb4257d5613  Yes         /lib/x86_64-linux-gnu/libc.so.6
0x00007fb426b0baa0  0x00007fb426b26fb0  Yes         /lib64/ld-linux-x86-64.so.2
0x00007fb4254852b0  0x00007fb425489745  Yes         /lib/x86_64-linux-gnu/libnss_compat.so.2
0x00007fb42526ffd0  0x00007fb42527b271  Yes         /lib/x86_64-linux-gnu/libnsl.so.1
0x00007fb4250620e0  0x00007fb4250683f7  Yes         /lib/x86_64-linux-gnu/libnss_nis.so.2
0x00007fb424e501d0  0x00007fb424e55e91  Yes         /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007fb41ef1f470  0x00007fb41f48a8d0  Yes         /usr/lib/mono/aot-cache/amd64/mscorlib.dll.so
0x00007fb41ea62570  0x00007fb41eaabdc5  Yes (*)     /usr/lib/../lib/libmono-native.so

@lambdageek (Member) commented Apr 19, 2019

> In the bad case, the new thread is a second timer thread:
> ...
> monitor_ensure_running
> ...
> I would have expected only one timer thread.

I don't think this is the right reading of what Threads 14 and 7 are doing in the bad case. I don't know why both of them are named "Timer-Scheduler" (that's unexpected).

  • Thread 14 is a threadpool monitor_thread. It's an implementation detail of the threadpool (it's responsible for spinning up new worker threads when there is demand). Indeed, there should be at most one of these running per process.
  • Thread 7 is just a normal managed thread (probably a Timer-Scheduler, as the thread name says) calling System.Threading.Monitor:Wait. I think there should be one of these per appdomain.

I don't know why Thread 14 also has the name "Timer-Scheduler" - that's definitely not what it is. (Maybe an appdomain shut down and the OS reused the pthread without clearing the thread name? Can that happen?)


@brianrob questions:

  1. Is ASP.NET using multiple appdomains? I don't know if our LLVM JIT has had testing with multiple domains (our AOT compiler doesn't support multiple appdomains, and the LLVM backend is mostly focused on AOT).
  2. Let's try to eliminate another broad possibility: could you try setting the environment variable MONO_THREADS_SUSPEND=preemptive and running with the LLVM JIT? Does it still hang?

@steveisok (Contributor) commented Apr 20, 2019

@lambdageek - As far as I know, ASP.NET will always create a child domain to run the web application.

It would be possible to get it to run in one domain if mono supported tweaking the primary domain before it loads. I tried to do that a while back, and it doesn't appear to be supported.

@brianrob (Author) commented Apr 22, 2019

ASP.NET is not using multiple appdomains. @steveisok, you are likely referring to the desktop implementation of ASP.NET. In this case, we are actually running ASP.NET Core, which runs on .NET Core, which does not support multiple appdomains.

I also tried running with MONO_THREADS_SUSPEND=preemptive, but the requests still hung. I am starting to suspect something in the realm of codegen, as the request needs to be very specifically crafted to include both of the headers that TechEmpower requires us to send, or the hang won't occur:

This request hangs:

curl --header "Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7" --header "Connection: keep-alive" http://localhost:8080/plaintext

This request succeeds:

curl http://localhost:8080/plaintext

@alexanderkyte (Member) commented Apr 23, 2019

@brianrob could you please upload the MONO_LOG_LEVEL=verbose output for this? I think that there's probably a series of JIT compilations that corrupts some state.

@brianrob (Author) commented Apr 23, 2019

Sure. Here are logs for the LLVM-JIT and mini-JIT cases.

LLVM
Mini-JIT

For the LLVM case, I started the application and then hit it with a single request. I waited a few seconds for the hang to settle and for the log to become as quiescent as it was going to get, and then captured the log.

For the mini-JIT case, I started the application and hit it with a single request. After the request succeeded, I captured the log.

@lambdageek (Member) commented Apr 23, 2019

Hm. Don't know what it means yet, but one difference is that in the "Mini-JIT" verbose log, I see some

Mono: io threadpool: wke
Mono: io threadpool: mod fd   8, operations = RD | WR | ...
Mono: io threadpool: wai
Mono: io threadpool: cal fd   8, events = .. | WR | ...

so (I think) workers are waking up due to some write events happening, but in the "LLVM" verbose log there are no writes, only reads. Not sure whether that's just timing-related or a symptom of something real.

@alexanderkyte (Member) commented Apr 23, 2019

I'm looking into this right now.

@brianrob (Author) commented Apr 23, 2019

Awesome. Thanks!

@alexanderkyte (Member) commented Apr 23, 2019

Worth noting: we aren't seeing mono itself hang.

root@7cc3548a02f2:/app# curl --header "Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7" --header "Connection: keep-alive" http://127.0.0.1:8080/plaintext
^C
root@7cc3548a02f2:/app# curl --header "" http://127.0.0.1:8080/plaintext
Hello, World!root@7cc3548a02f2:/app# 
root@7cc3548a02f2:/app# 

In the run where we don't see a response, Mono is just waiting on a socket with no work to do. We seem to be failing fast.

@alexanderkyte (Member) commented Apr 23, 2019

So if I run at the highest verbosity to see when the JIT runs, warm up, and then run the hanging curl command, I don't see the JIT run at all. That means we don't have any functions that are compiled for the failing command that aren't already JIT'ed by the successful command.

@brianrob (Author) commented Apr 24, 2019

> So if I run at the highest verbosity to see when the JIT runs, warm up, and then run the hanging curl command, I don't see the JIT run at all. That means we don't have any functions that are compiled for the failing command that aren't already JIT'ed by the successful command.

I'm quite surprised at this. From what I saw, there is a bunch of jitting that happens in the context of the first request, even if the request hangs (due to the presence of the headers).

@alexanderkyte (Member) commented Apr 24, 2019

When the server is run with --trace:


root@ab5f099a89fd:/app# curl --header "Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7" --header "Connection: keep-alive" http://localhost:8080/plaintext
Hello, World!root@ab5f099a89fd:/app# 

@alexanderkyte (Member) commented Apr 24, 2019

@brianrob I think there was a misunderstanding.

You JIT in response to sending the request with the empty header the first time. If you then re-send the request with the non-empty headers, you don't see any JIT output when running mono with "-v -v -v -v".

@alexanderkyte (Member) commented Apr 24, 2019

Following the hint that --trace changes the outcome, I looked at what --trace changes internally and reran with

ENV MONO_DEBUG gen-seq-points

in the Dockerfile. I saw it work correctly there as well.

curl --header "Accept: text/plain,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7" --header "Connection: keep-alive" http://127.0.0.1:8080/plaintext
Hello, World!root@a16e08c5ae51:/app# 

This implies that the added sequence points are the reason that it works with --trace.

Sequence points insert "padding" operations between other opcodes, which may disable certain optimizations.

@alexanderkyte (Member) commented Apr 24, 2019

In the failing case, it looks like the PlainText entry point isn't even executed. I'm trying to investigate what happens where we diverge; it's a bit difficult when reproducing requires no sequence points and requires the JIT.

@alexanderkyte (Member) commented Apr 24, 2019

at PlatformBenchmarks.BenchmarkApplication.PlainText (System.IO.Pipelines.PipeWriter pipeWriter) [0x00000] in <683564a5ad784b6aa21a0a61275a89ff>:0 
  at PlatformBenchmarks.BenchmarkApplication.ProcessRequestAsync () [0x00000] in <683564a5ad784b6aa21a0a61275a89ff>:0 
  at PlatformBenchmarks.BenchmarkApplication.ProcessRequestsAsync () [0x00000] in <683564a5ad784b6aa21a0a61275a89ff>:0 
  at System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.InvokeMoveNext (System.Object stateMachine) [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run () [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Runtime.CompilerServices.ValueTaskAwaiter+<>c.<.cctor>b__9_0 (System.Object state) [0x00000] in <db70987ec4f64c33bf9deb66e2bcbaba>:0 
  at System.IO.Pipelines.ThreadPoolScheduler+<>c.<Schedule>b__0_0 (System.Object s) [0x00000] in <285d42c15ab1455ca31b0389730dcae1>:0 
  at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context (System.Object state) [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Threading.ThreadPoolWorkQueue.Dispatch () [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 
  at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () [0x00000] in <514421d496a743db83a5c2a5e4098ad2>:0 ]

The managed stack we're seeing for the header-containing request is above. Since we aren't reaching the print in PlainText when we're failing, it indicates the failure is in one of the request-processing functions. If the request were being silently discarded, that would explain the server staying up while the client connection simply times out.

@alexanderkyte (Member) commented Apr 24, 2019

It looks like there's an issue with parsing the request. The shared state ends up corrupted, leading to the request not getting processed. This looks unrelated to the thread pool.

Evidence:

I added some printfs here:

https://gist.github.com/alexanderkyte/78a0496b89b514ba51dc783a2e530151
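
For illustration, the probes are essentially plain console writes around each parser step, reporting how far the parser got. Below is a hypothetical sketch of that style; the method names, the consumed-count bookkeeping, and the request handling are placeholders, not the gist's or the benchmark's actual code.

```
using System;
using System.Text;

class ParserProbeSketch
{
    // Hypothetical probe style: log a marker before each parse step and how far the
    // parser advanced afterwards. Names and structure are placeholders for illustration.
    static void ParseRequest(ReadOnlySpan<byte> buffer)
    {
        Console.WriteLine("Parse HTTP Request");

        Console.WriteLine("Check startline");
        int lineEnd = buffer.IndexOf((byte)'\n');                  // end of the request line
        int consumed = lineEnd >= 0 ? lineEnd + 1 : buffer.Length; // bytes consumed so far
        Console.WriteLine($"ParseRequestLine consumed {consumed}");

        Console.WriteLine("Check headers");
        // Headers run up to the blank line; for this sketch, just report the total consumed.
        Console.WriteLine($"ParseHeaders consumed {buffer.Length}");
    }

    static void Main()
    {
        var request = Encoding.ASCII.GetBytes("GET /plaintext HTTP/1.1\r\nHost: localhost\r\n\r\n");
        ParseRequest(request);
    }
}
```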

No sequence points (bad) with header:

(ProcessRequestsAsync )
(ProcessRequestsAsyncOne)
(ProcessRequestsAsyncOne) read completed
Parse HTTP Request
Not empty
Check startline
ParseRequestLine
ParseRequestLine consumed 194
Check headers
ParseHeaders
ParseHeaders consumed 194
Check body
End of not empty
State update
done
(ProcessRequestsAsyncOne)
(ProcessRequestsAsyncOne) read completed
Parse HTTP Request

Sequence points (good) with header:

(ProcessRequestsAsyncOne) read completed
Parse HTTP Request
Not empty
Check startline
ParseRequestLine
ParseRequestLine consumed 25
Check headers
ParseHeaders
ParseHeaders consumed 194
Check body
End of not empty
State update
done
(ProcessRequestsAsyncOne) Before process async

No sequence points (bad), no header:

Parse HTTP Request
Not empty
Check startline
ParseRequestLine
ParseRequestLine consumed 85
Check headers
ParseHeaders
ParseHeaders consumed 87
Check body
End of not empty
State update
done

Sequence points (good), no header:

(ProcessRequestsAsyncOne) read completed
Parse HTTP Request
Not empty
Check startline
ParseRequestLine
ParseRequestLine consumed 25
Check headers
ParseHeaders
ParseHeaders consumed 87
Check body
End of not empty
State update
done
(ProcessRequestsAsyncOne) Before process async

@alexanderkyte (Member) commented Apr 24, 2019

 LLVM failed for 'HttpParser`1.ParseRequestLine': opcode seq_point

Looks like seq-points actually just disable LLVM for the method.

alexanderkyte added a commit to alexanderkyte/mono that referenced this issue Apr 30, 2019

alexanderkyte added a commit to alexanderkyte/mono that referenced this issue Apr 30, 2019

alexanderkyte added a commit that referenced this issue May 2, 2019
* [llvm] Widen vector equality (OP_PCMPEQx)

Fixes #14261
Fixes #14143

This was causing an issue when trying to inline
System.Numerics.Vectors.Vector`1<byte>:Equals into System.Span.IndexOf
from System.Memory.dll as provided by nuget. I believe that it is
comparing the result of a SIMD operation on two SIMD registers there with a
SIMD register full of zero bytes.

When we produce the Equals call (no inline), we spill the SIMD <4 x i32>
vector to the stack in the caller, and we then later load it
with the widened type used here <16 x i8>.

When we inline, we end up making the args into XMOV opcodes,
and then doing the "pcmpeqb" mini opcode with the return values
from those operations. This becomes a register to register no-op
copy, and the original <4 x i32> gets compared against the <16 x i8>
type:

```
  %116 = icmp eq <16 x i8> zeroinitializer, <4 x i32> %115
```

This does not go the way we want it to. It reliably leads to IndexOf
breaking and only finding the desired byte in the portion not processed
by the SIMD code (in the last few elements of the array).

This fix checks if the input dimensions differ, and does a conversion.
This produces the following IR:

```
  %114 = bitcast <4 x i32> %99 to <16 x i8>
  %115 = icmp eq <16 x i8> %114, %113
  %116 = sext <16 x i1> %115 to <16 x i8>
  %117 = icmp eq <16 x i8> zeroinitializer, %116
```

* [llvm] Widen vector greater-than

Here we do the same trick used in the above fix, but for the parallel greater-than operation.
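
For readers following along, here is a rough C# sketch of the managed pattern the commit message describes: a vectorized IndexOf-style scan over bytes that builds a Vector<byte> match mask with Vector.Equals and then compares that mask against a zero vector. This is illustrative only and is not the System.Memory implementation; under the miscompile described above, the zero-vector comparison could report "no match" for chunks that did match, so the byte would only be found in the scalar tail.

```
using System;
using System.Numerics;
using System.Text;

class VectorIndexOfSketch
{
    // Simplified, illustrative vectorized IndexOf over bytes. This only mirrors the
    // shape of the comparison the commit message describes (Vector.Equals producing
    // a match mask, then an equality check of that mask against Vector<byte>.Zero).
    static int IndexOf(byte[] buffer, byte value)
    {
        int i = 0;
        if (Vector.IsHardwareAccelerated && buffer.Length >= Vector<byte>.Count)
        {
            var target = new Vector<byte>(value);
            for (; i <= buffer.Length - Vector<byte>.Count; i += Vector<byte>.Count)
            {
                var data = new Vector<byte>(buffer, i);      // load Vector<byte>.Count bytes
                var matches = Vector.Equals(data, target);   // 0xFF in lanes equal to 'value'

                // The zero-vector comparison below is the kind of vector equality the
                // Mono LLVM backend miscompiled when inlining Vector<byte>.Equals.
                if (!Vector<byte>.Zero.Equals(matches))
                {
                    for (int lane = 0; lane < Vector<byte>.Count; lane++)
                        if (buffer[i + lane] == value)
                            return i + lane;
                }
            }
        }

        // Scalar tail: per the commit message, under the bug this was the only part
        // that still found the byte.
        for (; i < buffer.Length; i++)
            if (buffer[i] == value)
                return i;

        return -1;
    }

    static void Main()
    {
        byte[] request = Encoding.ASCII.GetBytes("GET /plaintext HTTP/1.1\r\nAccept: text/plain\r\n\r\n");
        Console.WriteLine(IndexOf(request, (byte)'\n')); // index of the first '\n' (24)
    }
}
```

On a correct runtime this prints 24, the index of the first '\n'; with the broken vector equality, a scan of this shape could skip right past it, which matches the parser consuming far more bytes than the request line actually contains.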