Possible race condition under load #89

Closed
anjackson opened this issue Oct 6, 2020 · 8 comments

@anjackson
Contributor

anjackson commented Oct 6, 2020

I've been running some load tests on OutbackCDX, and as indicated in this comment, when I run 1000 threads (each running the same GET 100 times) I start seeing odd errors:

Premature end of chunk coded message body: closing chunk expected
Socket closed

It's rock solid at 750 threads, but at 1000 it goes wonky. The same client works fine at 1000 threads when running against an NGINX instance configured to respond as if it were OutbackCDX.

So, this seems like it might be a subtle race condition under load in OutbackCDX itself?

EDIT: Sorry, I should have mentioned that this happens irrespective of the number of threads OutbackCDX is configured to use (as long as it's plenty!) and doesn't seem to be related to ulimits (which manifest themselves differently).
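
For reference, the client side of the test is roughly the sketch below. This is not the actual harness: it assumes Apache HttpClient (which is what the stack traces further down come from), and the query URL and pool sizing are illustrative only.

// Rough sketch of the load generated: N threads, each issuing the same GET
// 100 times against a shared pooled client. Illustrative, not the real harness.
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class OutbackCdxLoadTest {
    public static void main(String[] args) throws Exception {
        int threads = 1000; // errors start appearing around this level
        String url = "http://localhost:9090/fc?limit=100&matchType=exact&sort=reverse"
                + "&url=http%3A%2F%2Facid.matkelly.com%2F";
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(threads);
        cm.setDefaultMaxPerRoute(threads);
        CloseableHttpClient client = HttpClients.custom().setConnectionManager(cm).build();

        Runnable task = () -> {
            for (int i = 0; i < 100; i++) {
                try (CloseableHttpResponse resp = client.execute(new HttpGet(url))) {
                    EntityUtils.consume(resp.getEntity()); // read the chunked body fully
                } catch (Exception e) {
                    System.err.println("GET failed: " + e);
                }
            }
        };
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) { workers[i] = new Thread(task); workers[i].start(); }
        for (Thread t : workers) t.join();
        client.close();
    }
}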

@anjackson
Contributor Author

Perhaps just an exception getting swallowed, maybe here.
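
Something along these lines is the pattern I mean: the IOException from the client disconnecting gets caught and dropped inside the code that streams the response body, so nothing shows up in the server logs. A made-up sketch, not OutbackCDX's actual code:

// Sketch of a swallowed exception while streaming a chunked response body.
// If the catch block stays empty, the client just sees the connection die
// mid-chunk and the server logs nothing.
import java.io.IOException;
import java.io.Writer;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

class StreamingResponse {
    private static final Logger log = Logger.getLogger("outbackcdx");

    static Consumer<Writer> body(Iterable<String> lines) {
        return writer -> {
            try {
                for (String line : lines) {
                    writer.write(line);
                    writer.write('\n');
                }
                writer.flush();
            } catch (IOException e) {
                // Logging here (rather than ignoring) at least makes the
                // server-side cause of a truncated response visible.
                log.log(Level.WARNING, "error streaming response body", e);
            }
        };
    }
}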

@ato
Member

ato commented Oct 7, 2020

Ah, yes, that sounds like a swallowed exception. galgeek recently added some more logging to WbCdxApi to help with that.

@anjackson
Contributor Author

Yeah, I dunno what's going on. Some clumsy handling of error conditions in my client was confusing things. Having tidied that up, I only get Socket closed errors:

SEVERE: GET http://localhost:9090/fc?limit=100&matchType=exact&sort=reverse&url=http%3A%2F%2Facid.matkelly.com%2F failed with error Socket closed
java.net.SocketException: Socket closed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:150)
	at java.net.SocketInputStream.read(SocketInputStream.java:121)
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:136)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:152)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:270)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:161)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:153)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
	at uk.bl.wap.util.OutbackCDXClient.getCDXLine(OutbackCDXClient.java:268)

Having added some logging to OutbackCDX, I can see:

java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at outbackcdx.NanoHTTPD$CountingOutputStream.write(NanoHTTPD.java:1092)
	at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
	at outbackcdx.NanoHTTPD$ChunkingOutputStream.write(NanoHTTPD.java:702)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)

	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
	at java.io.BufferedWriter.flush(BufferedWriter.java:254)
	at outbackcdx.WbCdxApi.lambda$queryIndex$0(WbCdxApi.java:71)
	at outbackcdx.NanoHTTPD$Response.send(NanoHTTPD.java:539)

	at outbackcdx.NanoHTTPD$1$1.run(NanoHTTPD.java:210)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

(warning: mixture of threads in that log)

So the error seems to be consistent with the client going away (broken pipe).
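
If that's what is happening, the usual server-side response is to treat a broken pipe during the write as "client went away" and close quietly, rather than surfacing it as a failure. A sketch of that distinction, not NanoHTTPD's actual handling:

// Sketch: tolerate "client went away" during the response write, but let
// other I/O failures propagate as genuine errors.
import java.io.IOException;
import java.io.OutputStream;
import java.net.SocketException;

class ResponseSender {
    void send(OutputStream out, byte[] chunk) throws IOException {
        try {
            out.write(chunk);
            out.flush();
        } catch (SocketException e) {
            // Broken pipe / connection reset: the client closed its end first,
            // so there is nothing useful left to send. Close our side quietly.
            try { out.close(); } catch (IOException ignored) { }
        }
    }
}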

@anjackson
Contributor Author

I note my NGINX comparison would not have exercised the chunked transfer decoding, for example, so it may not be a fair test. It's also possible I'm exhausting ephemeral ports on this machine, but it's an odd way for that error to show up.

@ato
Member

ato commented Oct 7, 2020

Yeah, that's consistent with the client closing the connection, although it could obviously be doing that in response to something the server does, like invalid chunked encoding. It can't be ephemeral port exhaustion, because the connection was already established (unless the client library responds to being unable to open new connections by closing random existing ones or something).

One thing you could try is running OutbackCDX with the undocumented and experimental '-u' command-line option. This makes it use Undertow instead of NanoHTTPD as the HTTP server. Chances are Undertow has more overhead, but it's more battle-tested and probably has better error handling.
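
For example (only the -u flag is the relevant part here; the jar name and the usual -d/-p options for index directory and port are just the normal invocation, adjust to your setup):

java -jar outbackcdx.jar -u -d /path/to/index -p 9090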

@anjackson
Contributor Author

Oof, well, to give you an idea of how it's going, I tried upping the threads some more and got this!

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000010c33ad1c, pid=36661, tid=111619
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b124) (build 1.8.0-ea-b124)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b66 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libnet.dylib+0x9d1c]  Java_java_net_SocketInputStream_socketRead0+0x23d

So perhaps there are deeper problems.

Thanks for the tip about Undertow. It does seem slightly happier, reporting some errors on the back-end but not in the client (?!).

java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
	at org.xnio.nio.NioSocketConduit.write(NioSocketConduit.java:164)
	at io.undertow.server.protocol.http.HttpResponseConduit.write(HttpResponseConduit.java:609)
	at io.undertow.conduits.ChunkedStreamSinkConduit.flush(ChunkedStreamSinkConduit.java:270)
	at org.xnio.conduits.ConduitStreamSinkChannel.flush(ConduitStreamSinkChannel.java:162)
	at io.undertow.channels.DetachableStreamSinkChannel.flush(DetachableStreamSinkChannel.java:119)
	at org.xnio.channels.Channels.flushBlocking(Channels.java:63)
	at io.undertow.io.UndertowOutputStream.close(UndertowOutputStream.java:348)
	at outbackcdx.UWeb$UServer.sendResponse(UWeb.java:71)
	at outbackcdx.UWeb$UServer.dispatch(UWeb.java:42)
	at io.undertow.server.Connectors.executeRootHandler(Connectors.java:376)
	at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:830)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

and then later on:

Oct 07, 2020 11:54:01 PM io.undertow.server.Connectors executeRootHandler
ERROR: UT005071: Undertow request failed HttpServerExchange{ GET /fc}
java.lang.IllegalStateException: UT000002: The response has already been started
	at io.undertow.server.HttpServerExchange.setStatusCode(HttpServerExchange.java:1406)
	at outbackcdx.UWeb$UServer.sendResponse(UWeb.java:54)
	at outbackcdx.UWeb$UServer.dispatch(UWeb.java:49)
	at io.undertow.server.Connectors.executeRootHandler(Connectors.java:376)
	at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:830)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
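
I think that second one just means the error path tried to set a status code after the handler had already started streaming the chunked body for that exchange, so the original failure gets masked by a secondary UT000002. A guard along these lines would avoid the secondary error (a sketch against Undertow's HttpServerExchange API, not what UWeb.java actually does):

// Sketch: only switch to an error status if the response hasn't started yet;
// otherwise just end the exchange instead of throwing UT000002 on top of the
// original failure.
import io.undertow.server.HttpServerExchange;
import io.undertow.util.Headers;

class ErrorGuard {
    static void sendError(HttpServerExchange exchange, int status, String message) {
        if (exchange.isResponseStarted()) {
            // Too late to change the status line or headers.
            exchange.endExchange();
            return;
        }
        exchange.setStatusCode(status);
        exchange.getResponseHeaders().put(Headers.CONTENT_TYPE, "text/plain");
        exchange.getResponseSender().send(message);
    }
}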

I upped the number of threads for OutbackCDX and these errors no longer showed up. Could be coincidence.

Very odd speed profile. The first column is the parallel thread count; the third is seconds per thread (where each thread has the same workload). It gets faster up to 750 threads, then there's a big slowdown at 1000. Not clear why.

10	1620	162.0	0	1000
50	818	16.36	0	5000
100	1209	12.09	0	10000
250	2286	9.144	0	25000
500	4664	9.328	0	50000
750	6665	8.886666666666667	0	75000
1000	63796	63.796	0	100000

I think the problem is likely not with OutbackCDX, unless maybe there's something subtle going on with keepalive and chunked encoding. I'll do some more experiments on a different platform.

@anjackson
Contributor Author

Using Undertow, through Docker:

10	1522	152.2	0	1000
50	1890	37.8	0	5000
100	3621	36.21	0	10000
250	8615	34.46	0	25000
500	18705	37.41	0	50000
750	27317	36.422666666666665	0	75000
1000	86674	86.674	0	100000

No errors!

Then back to NanoHTTPD, still under Docker:

10	5633	563.3	0	1000
50	5720	114.4	0	5000
100	8543	85.43	0	10000
250	19600	78.4	0	25000
500	37367	74.734	0	50000
750	47824	63.76533333333333	0	75000
...LOTS OF SOCKET CLOSED BLARFING...
1000	71197	71.197	0	100000

So, while the underlying reasons are unclear, Undertow appears to be faster in general and more stable in particular, so I'll try using Undertow mode in production.

@ato
Member

ato commented Oct 8, 2020

Good to know. Maybe I'll make Undertow mode the default then and deprecate NanoHTTPD.
