Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weird behavior during log wait condition on during container start #344

Open
jgangemi opened this issue Dec 3, 2015 · 30 comments
Open

weird behavior during log wait condition on during container start #344

jgangemi opened this issue Dec 3, 2015 · 30 comments

Comments

@jgangemi
Copy link
Collaborator

jgangemi commented Dec 3, 2015

so this is a fun one...

it seems like every so often, something is interrupting the stream returned from docker while waiting on a log condition. the annoying things is, i have no real test case that can easily reproduce this.

it seems to occur at random during a rather large monolithic build. each project that needs the container starts one, which in turns causes a bunch of sql scripts to run which take ~30s or so.

basically the read call is blocking on data it expects to be there, but isn't. i'm kind of at a loss as to what could be causing this b/c it just fails at random times on an operation that has been successful before. when running w/ the -X option, there it's also random as to where in the logs the stream cuts out.

any thoughts on this? i'm gonna play around in the logging code tomorrow and turn up docker's logging to see if it will tell me anything useful.

@rhuss
Copy link
Collaborator

rhuss commented Jan 20, 2016

@jgangemi could it be related to the concurrency issue we have in which in #259 ?

@jgangemi
Copy link
Collaborator Author

i think this problem was caused by a corrupt docker-machine instead b/c after re-creating it, the problem went away.

i do, however, have a different problem now related to stopping the container. stop will be issued and then the plugin will hang. i think it's waiting for the docker daemon to send something back that never comes.

[DEBUG] http-outgoing-5 << "HTTP/1.1 204 No Content[\r][\n]"
[DEBUG] http-outgoing-5 << "Server: Docker/1.9.1 (linux)[\r][\n]"
[DEBUG] http-outgoing-5 << "Date: Thu, 28 Jan 2016 15:42:40 GMT[\r][\n]"
[DEBUG] http-outgoing-5 << "[\r][\n]"
[DEBUG] http-outgoing-5 << HTTP/1.1 204 No Content
[DEBUG] http-outgoing-5 << Server: Docker/1.9.1 (linux)
[DEBUG] http-outgoing-5 << Date: Thu, 28 Jan 2016 15:42:40 GMT
[DEBUG] Connection can be kept alive indefinitely
[DEBUG] Connection [id: 5][route: {}->unix://127.0.0.1:1] can be kept alive indefinitely
[DEBUG] Connection released: [id: 5][route: {}->unix://127.0.0.1:1][total kept alive: 1; route allocated: 1 of 100; total allocated: 1 of 20]

and this is what i get when it hangs:

[DEBUG] Connection request: [route: {}->unix://127.0.0.1:1][total kept alive: 0; route allocated: 0 of 100; total allocated: 0 of 20]
[DEBUG] Connection leased: [id: 35][route: {}->unix://127.0.0.1:1][total kept alive: 0; route allocated: 1 of 100; total allocated: 1 of 20]
[DEBUG] Opening connection {}->unix://127.0.0.1:1
[DEBUG] Connecting to 0.0.0.0/0.0.0.0:1
[DEBUG] Connection established [family=PF_UNIX path=sock]<->[family=PF_UNIX path=/var/run/docker.sock]
[DEBUG] Executing request POST /v1.18/containers/5c30d1ac839c/stop?t=30 HTTP/1.1
[DEBUG] Target auth state: UNCHALLENGED
[DEBUG] Proxy auth state: UNCHALLENGED
[DEBUG] http-outgoing-35 >> POST /v1.18/containers/5c30d1ac839c/stop?t=30 HTTP/1.1
[DEBUG] http-outgoing-35 >> Accept: */*
[DEBUG] http-outgoing-35 >> Content-Type: application/json; charset=utf-8
[DEBUG] http-outgoing-35 >> Content-Length: 0
[DEBUG] http-outgoing-35 >> Host: 127.0.0.1:1
[DEBUG] http-outgoing-35 >> Connection: Keep-Alive
[DEBUG] http-outgoing-35 >> User-Agent: Apache-HttpClient/4.3.6 (java 1.5)
[DEBUG] http-outgoing-35 >> Accept-Encoding: gzip,deflate
[DEBUG] http-outgoing-35 >> "POST /v1.18/containers/5c30d1ac839c/stop?t=30 HTTP/1.1[\r][\n]"
[DEBUG] http-outgoing-35 >> "Accept: */*[\r][\n]"
[DEBUG] http-outgoing-35 >> "Content-Type: application/json; charset=utf-8[\r][\n]"
[DEBUG] http-outgoing-35 >> "Content-Length: 0[\r][\n]"
[DEBUG] http-outgoing-35 >> "Host: 127.0.0.1:1[\r][\n]"
[DEBUG] http-outgoing-35 >> "Connection: Keep-Alive[\r][\n]"
[DEBUG] http-outgoing-35 >> "User-Agent: Apache-HttpClient/4.3.6 (java 1.5)[\r][\n]"
[DEBUG] http-outgoing-35 >> "Accept-Encoding: gzip,deflate[\r][\n]"
[DEBUG] http-outgoing-35 >> "[\r][\n]"
[DEBUG] http-outgoing-34 << "8[\r][\n]"
[DEBUG] http-outgoing-34 << "[0x1][0x0][0x0][0x0][0x0][0x0][0x0]'[\r][\n]"
[DEBUG] http-outgoing-34 << "27[\r][\n]"
[DEBUG] http-outgoing-34 << "2016-01-28T15:57:10.338038105Z .......[\n]"
[DEBUG] http-outgoing-34 << "[\r][\n]"
[DEBUG] http-outgoing-34 << "HTTP/1.1 204 No Content[\r][\n]"
[DEBUG] http-outgoing-34 << "Server: Docker/1.9.1 (linux)[\r][\n]"
[DEBUG] http-outgoing-34 << "Date: Thu, 28 Jan 2016 15:57:10 GMT[\r][\n]"
[DEBUG] http-outgoing-34 << "[\r][\n]"

i haven't figured out if this is a docker issue or a plugin issue.

initially i thought it was a problem w/ it taking too long for docker to kill the container, so i added kill to the wait configuration (telling it to wait 30 seconds) but that didn't seem to resolve the issue.

thoughts?

@rhuss
Copy link
Collaborator

rhuss commented Jan 28, 2016

Sorry, no idea here so far. Could you create a thread dump when it hangs in order to see where the plugin is currently ?

@jgangemi
Copy link
Collaborator Author

maybe - let me see if i can get it to hang outside of jenkins and i can hit it w/ jstack

@jgangemi
Copy link
Collaborator Author

it looks like httpclient is waiting for input that is never going to come

"main" #1 prio=5 os_prio=0 tid=0x00007f306c00a000 nid=0x1caa runnable [0x00007f307271d000]
   java.lang.Thread.State: RUNNABLE
    at com.kenai.jffi.Foreign.invokeN3O1(Native Method)
    at com.kenai.jffi.Invoker.invokeN3(Invoker.java:1061)
    at jnr.enxio.channels.Native$LibC$jnr$ffi$1.read(Unknown Source)
    at jnr.enxio.channels.Native.read(Native.java:95)
    at jnr.enxio.channels.NativeSocketChannel.read(NativeSocketChannel.java:70)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:59)
    - locked <0x0000000789e87e70> (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    - locked <0x0000000789e8a0d0> (a sun.nio.ch.ChannelInputStream)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.http.impl.conn.LoggingInputStream.read(LoggingInputStream.java:87)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:136)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:152)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:270)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:161)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:153)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:214)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:160)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:136)
    at org.jolokia.docker.maven.access.hc.ApacheHttpClientDelegate.post(ApacheHttpClientDelegate.java:83)
    at org.jolokia.docker.maven.access.hc.ApacheHttpClientDelegate.post(ApacheHttpClientDelegate.java:89)
    at org.jolokia.docker.maven.access.hc.DockerAccessWithHcClient.stopContainer(DockerAccessWithHcClient.java:183)
    at org.jolokia.docker.maven.service.RunService.shutdown(RunService.java:372)
    at org.jolokia.docker.maven.service.RunService.stopStartedContainers(RunService.java:157)
    - locked <0x0000000787efa388> (a org.jolokia.docker.maven.service.ContainerTracker)
    at org.jolokia.docker.maven.StopMojo.executeInternal(StopMojo.java:44)
    at org.jolokia.docker.maven.AbstractDockerMojo.execute(AbstractDockerMojo.java:164)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)

i wonder if there's a way to force httpclient to timeout while waiting (which currently does not happen).

@jgangemi
Copy link
Collaborator Author

this is the output from a successful stop

[DEBUG] http-outgoing-2 >> Accept: */*
[DEBUG] http-outgoing-2 >> Content-Type: application/json; charset=utf-8
[DEBUG] http-outgoing-2 >> Content-Length: 0
[DEBUG] http-outgoing-2 >> Host: 127.0.0.1:1
[DEBUG] http-outgoing-2 >> Connection: Keep-Alive
[DEBUG] http-outgoing-2 >> User-Agent: Apache-HttpClient/4.3.6 (java 1.5)
[DEBUG] http-outgoing-2 >> Accept-Encoding: gzip,deflate
[DEBUG] http-outgoing-2 >> "POST /v1.18/containers/377cd265e73d/stop?t=30 HTTP/1.1[\r][\n]"
[DEBUG] http-outgoing-2 >> "Accept: */*[\r][\n]"
[DEBUG] http-outgoing-2 >> "Content-Type: application/json; charset=utf-8[\r][\n]"
[DEBUG] http-outgoing-2 >> "Content-Length: 0[\r][\n]"
[DEBUG] http-outgoing-2 >> "Host: 127.0.0.1:1[\r][\n]"
[DEBUG] http-outgoing-2 >> "Connection: Keep-Alive[\r][\n]"
[DEBUG] http-outgoing-2 >> "User-Agent: Apache-HttpClient/4.3.6 (java 1.5)[\r][\n]"
[DEBUG] http-outgoing-2 >> "Accept-Encoding: gzip,deflate[\r][\n]"
[DEBUG] http-outgoing-2 >> "[\r][\n]"
[DEBUG] http-outgoing-1 << "0[\r][\n]"
[DEBUG] http-outgoing-1 << "[\r][\n]"
[DEBUG] http-outgoing-2 << "HTTP/1.1 204 No Content[\r][\n]"
[DEBUG] http-outgoing-2 << "Server: Docker/1.9.1 (linux)[\r][\n]"
[DEBUG] http-outgoing-2 << "Date: Fri, 29 Jan 2016 01:33:35 GMT[\r][\n]"
[DEBUG] http-outgoing-2 << "[\r][\n]"
[DEBUG] http-outgoing-2 << HTTP/1.1 204 No Content
[DEBUG] http-outgoing-2 << Server: Docker/1.9.1 (linux)
[DEBUG] http-outgoing-2 << Date: Fri, 29 Jan 2016 01:33:35 GMT
[DEBUG] Connection can be kept alive indefinitely
[DEBUG] Connection [id: 2][route: {}->unix://127.0.0.1:1] can be kept alive indefinitely
[DEBUG] Connection released: [id: 2][route: {}->unix://127.0.0.1:1][total kept alive: 1; route allocated: 1 of 100; total allocated: 1 of 20]
[DEBUG] DOCKER> Shutdown: Wait 30 s after stopping and before killing container

so my current thought is the container is talking too long to shutdown and the .s ubuntu/upstart is outputting to let you know something is still happening is causing this behavior.

i'm still digging deeper

@brian98point6
Copy link

I'm running into same httpclient issue too. In my case, it's stuck on a GET to unix://127.0.0.1:1/v1.18/containers/23fb3c95d18c/json

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at com.kenai.jffi.Foreign.invokeN3O1(Foreign.java:-1)
      at com.kenai.jffi.Invoker.invokeN3(Invoker.java:1061)
      at jnr.enxio.channels.Native$LibC$jnr$ffi$1.read(Unknown Source:-1)
      at jnr.enxio.channels.Native.read(Native.java:95)
      at jnr.enxio.channels.NativeSocketChannel.read(NativeSocketChannel.java:70)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:59)
      - locked <0x16d9> (a java.lang.Object)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
      - locked <0x16c8> (a sun.nio.ch.ChannelInputStream)
      at java.io.FilterInputStream.read(FilterInputStream.java:133)
      at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:136)
      at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:152)
      at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:270)
      at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
      at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
      at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
      at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:161)
      at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:153)
      at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
      at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
      at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
      at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
      at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
      at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
      at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:214)
      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:160)
      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:136)
      at io.fabric8.maven.docker.access.hc.ApacheHttpClientDelegate.get(ApacheHttpClientDelegate.java:48)
      at io.fabric8.maven.docker.access.hc.DockerAccessWithHcClient.inspectContainer(DockerAccessWithHcClient.java:234)
      at io.fabric8.maven.docker.service.QueryService.getContainer(QueryService.java:45)
      at io.fabric8.maven.docker.StartMojo.exposeContainerProps(StartMojo.java:299)
      at io.fabric8.maven.docker.StartMojo.executeInternal(StartMojo.java:121)
      - locked <0x16b8> (a io.fabric8.maven.docker.StartMojo)
      at io.fabric8.maven.docker.AbstractDockerMojo.execute(AbstractDockerMojo.java:164)
      at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
      at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
      at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
      at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
      at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
      at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
      at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
      at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
      at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
      at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
      at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
      at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
      at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
      at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
      at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
      at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
      at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)

This is running on an EC2 instance, Ubuntu 14.04
Docker version 1.7.0, build 0baf609
Apache Maven 3.3.3
Java version: 1.8.0_45
docker-maven-plugin: 0.15.0

Please let me know if there is any other information which will help.

@brian98point6
Copy link

Here is the tail-end of maven-debug logs I got from a different repro.

[DEBUG] http-outgoing-2 >> GET /v1.18/images/rafastealth%2Fkb-service%3A0.1-SNAPSHOT-4ac4f550d1f5d038/json HTTP/1.1
[DEBUG] http-outgoing-2 >> Accept: */*
[DEBUG] http-outgoing-2 >> Content-Type: application/json; charset=utf-8
[DEBUG] http-outgoing-2 >> Host: 127.0.0.1:1
[DEBUG] http-outgoing-2 >> Connection: Keep-Alive
[DEBUG] http-outgoing-2 >> User-Agent: Apache-HttpClient/4.3.6 (java 1.5)
[DEBUG] http-outgoing-2 >> Accept-Encoding: gzip,deflate
[DEBUG] http-outgoing-2 >> "GET /v1.18/images/rafastealth%2Fkb-service%3A0.1-SNAPSHOT-4ac4f550d1f5d038/json HTTP/1.1[\r][\n]"
[DEBUG] http-outgoing-2 >> "Accept: */*[\r][\n]"
[DEBUG] http-outgoing-2 >> "Content-Type: application/json; charset=utf-8[\r][\n]"
[DEBUG] http-outgoing-2 >> "Host: 127.0.0.1:1[\r][\n]"
[DEBUG] http-outgoing-2 >> "Connection: Keep-Alive[\r][\n]"
[DEBUG] http-outgoing-2 >> "User-Agent: Apache-HttpClient/4.3.6 (java 1.5)[\r][\n]"
[DEBUG] http-outgoing-2 >> "Accept-Encoding: gzip,deflate[\r][\n]"
[DEBUG] http-outgoing-2 >> "[\r][\n]"
[DEBUG] http-outgoing-1 << "HTTP/1.1 200 OK[\r][\n]"
[DEBUG] http-outgoing-1 << "Content-Type: application/json[\r][\n]"
[DEBUG] http-outgoing-1 << "Date: Fri, 29 Apr 2016 18:10:34 GMT[\r][\n]"
[DEBUG] http-outgoing-1 << "Content-Length: 1739[\r][\n]"
[DEBUG] http-outgoing-1 << "[\r][\n]"

What's interesting to me is that the request is made in one thread, and received in another thread (in the case http-outgoing-2 vs. http-outgoing-1). I can see it is the case with the OP's example of a failed POST, where the GET is made on http-outgoing-35 and the 200 OK is received on http-outgoing-34. In the OP's example of a successful POST, the same thread makes the request and receives the response.

Not sure if this is pertinent information...

@rhuss
Copy link
Collaborator

rhuss commented May 3, 2016

I think this is normal behaviour of Apache's HcClient to send the request on one thread and receive it on another. At least this is what I alway see for every request.

@rhuss
Copy link
Collaborator

rhuss commented May 3, 2016

Interesting in your logs are though that the server responds with a content-length of 1739 but doesn't continue to send sth (as it seems). This could have two reasons: Either the client is not reading (but from the stacktrace it is blocking for this read, so dont think that is the case) or the server is not sending (which I think is more likely). IMO this here is an issue on the Docker daemon side, could you check the logs there, too ?

@rhuss
Copy link
Collaborator

rhuss commented May 3, 2016

@jgangemi

[DEBUG] http-outgoing-35 >> "[\r][\n]"
[DEBUG] http-outgoing-34 << "8[\r][\n]"
[DEBUG] http-outgoing-34 << "[0x1][0x0][0x0][0x0][0x0][0x0][0x0]'[\r][\n]"
[DEBUG] http-outgoing-34 << "27[\r][\n]"
[DEBUG] http-outgoing-34 << "2016-01-28T15:57:10.338038105Z .......[\n]"
[DEBUG] http-outgoing-34 << "[\r][\n]"
[DEBUG] http-outgoing-34 << "HTTP/1.1 204 No Content[\r][\n]"
[DEBUG] http-outgoing-34 << "Server: Docker/1.9.1 (linux)[\r][\n]"
[DEBUG] http-outgoing-34 << "Date: Thu, 28 Jan 2016 15:57:10 GMT[\r][\n]"
[DEBUG] http-outgoing-34 << "[\r][\n]"

What's strange here is that the response stream contains still some input from a log reading connection first before the HTTP response header is received. This could be the reason for the hang since the http client still wait for a valid response, I guess.

So the problem seems to be that connections are reused to early by HcClient (i.e. when still in progress of processing log output) ? You probably dont have the logs anymore, but when this happens the next time, could you check in the log where the blocking connection is also used (here it would be http-outgoing-34) ?

@rhuss
Copy link
Collaborator

rhuss commented May 3, 2016

I think this is normal behaviour of Apache's HcClient to send the request on one thread and receive it on another. At least this is what I alway see for every request.

@brian98point6 Sorry, I misunderstood the logs. What you see are not threads, but connection objects which are potentially reused because the plugin uses a pooling thread connector.

@rhuss
Copy link
Collaborator

rhuss commented May 3, 2016

If someone would have full log with debug enabled for the blocking case, this would help me quite a bit.

The sequence of connection leasing / using / releasing can be best seen with

 mvn -X install | grep " Connection "

rhuss added a commit that referenced this issue May 3, 2016
There seems to be issues with long running HTTP request like fetching logs with chunked encodings and pooling connections. It seems that connections get released to soon and then reused in a different context when there is still log data to read. Now logging uses a non-pooled, single connection client for doing async log fetching which hopefully resolves the issues reported in #344 and #259.
@brian98point6
Copy link

Thanks for clarifying threads vs. connection objects. You can tell I'm just guessing around :)

Output of grep " Connection" of the debug log I saved last week:
http://pastebin.com/tJUzrYKY

I noticed that there was a "route allocated: 0" and found that http-outgoing-1 was closed when my postgres container was ready, and then used again during the db-migrate container run. Is that odd?
http://pastebin.com/hSNDfxUW

@rhuss
Copy link
Collaborator

rhuss commented May 3, 2016

@brian98point6 Thanks a lot ;-). BTW, could you try 0.15.1 released just today ? I changed the log handling quite a bit to use a dedicated HttpClient, so it shouldnt interfere with the rest anymore. from my feeling, this should fix a lot of the issues we had.

would be happy if you could give it a try ;-)

@jgangemi
Copy link
Collaborator Author

jgangemi commented May 3, 2016

i haven't seen this problem happen for a little while. @brian98point6, if you still see this problem happening, can you try upgrading your version of docker?

unfortunately i didn't record the version i was using when i filed this issue so it's also possible some of this issue was on docker's end.

@brian98point6
Copy link

@rhuss I tried v0.15.1 with the older docker version. It's successful 5 for 5, looking good. Thanks!!

@jgangemi I tried v0.15.0 after upgrading docker from 1.7.0 to 1.9.1. It's still failing about half the time. Are you on the latest version of docker now?

Meanwhile, I'll use 0.15.1 in our CI pipeline and confirm that the issue is resolved by tomorrow.

@brian98point6
Copy link

I've run maybe another 20 times. No hanging. So I think the issue is resolved with v0.15.1. Thank you so much!

@rhuss
Copy link
Collaborator

rhuss commented May 9, 2016

@brian98point6 @jgangemi I found some issues probably within the Unix socket implementation. How do you connect to the Docker daemon, via TCP or via Unix socket ?

@jgangemi
Copy link
Collaborator Author

jgangemi commented May 9, 2016

both - on my mac it's tcp, on linux it's the unix socket.

@brian98point6
Copy link

Ditto, but we haven't been able to reproduce this issue on mac/tcp, only on linux/unix sockets. BTW, I'm on the same team as @thomasvandoren. Mentioning this because I assume this is related to #436 (comment)

@jgangemi
Copy link
Collaborator Author

i'm going to try and see if i can verify if this is a unix socket vs tcp thing myself right now b/c this issue has crept back up for us.

@jgangemi
Copy link
Collaborator Author

ok - tcp works fine, unix socket does not, however where it decides to hang in our rather large reactor build is non-deterministic.

@rhuss
Copy link
Collaborator

rhuss commented May 12, 2016

In my analysis it looked like that the unix socket get access gets confused sometimes when accessing from multiple threads. We use a second thread for checking the log wait condition asynchronously, so a prober solution would be to avoid this second thread and do the check on the main thread which also prints the loggigng.

Currently I don't see another solution except fixing the jcr-unix socket library.

@jgangemi
Copy link
Collaborator Author

is there another unix socket library we could try?

@rhuss
Copy link
Collaborator

rhuss commented May 12, 2016

not that I'm aware of.

@jgangemi
Copy link
Collaborator Author

@zymzxq
Copy link

zymzxq commented Jul 4, 2016

We are still seeing this problem with 0.15.9

@rhuss
Copy link
Collaborator

rhuss commented Jul 4, 2016

I tried junixsocket, but it had even more severe issues. I really think the single thread solution is the way to go, but it's not a small task. Hopefully we can tackle that soon, but we also love contributions ;-)

@jgangemi
Copy link
Collaborator Author

jgangemi commented Jul 4, 2016

the last reason of juds was in february, so perhaps that might be worth trying out.

in the meantime, using the the tcp socket is a work around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants