JENKINS-40825 close resource leak, fix broken pipe error #182

Merged
merged 11 commits on Aug 9, 2017

Conversation

@marvinthepa

marvinthepa commented Aug 3, 2017

resource leak

First of all, Launcher.launch is not called once but repeatedly, each call overwriting shared members of the enclosing class that a different thread may still be using (and cleanup only ever touched the values written last). That caused the thread and connection leak.

So, move them (watch and proc) to local variables, and close them
elsewhere:

  • watch was already cleaned up in ContainerExecProc
  • proc.kill() was not correctly executed anyway, and this is still broken. See discussion below.
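The shared-field problem and the local-variable fix can be sketched like this (illustrative names, not the plugin's actual code): when each launch stores its resource in a shared field, concurrent launches overwrite each other's references and only the last-written resource ever gets closed; a local variable gives each launch sole ownership.

```java
import java.util.ArrayList;
import java.util.List;

class LeakDemo {
    // Buggy pattern: a shared member that every launch overwrites.
    static AutoCloseable sharedProc;
    static final List<String> closed = new ArrayList<>();

    static AutoCloseable newResource(String name) {
        return () -> closed.add(name); // "closing" just records the name
    }

    // Buggy: the resource outlives the call in a shared field.
    static void launchShared(String name) {
        sharedProc = newResource(name);
    }

    static void cleanupShared() throws Exception {
        sharedProc.close(); // only ever closes whatever was written last
    }

    // Fixed: the resource is a local variable, owned by this launch alone.
    static void launchLocal(String name) throws Exception {
        try (AutoCloseable proc = newResource(name)) {
            // ... use proc ...
        } // always closed here, regardless of other concurrent launches
    }
}
```

Two calls to launchShared followed by one cleanup close only the second resource; two calls to launchLocal close both.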

pipe not connected

Second of all, waitQuietly on the started latch was what caused the "pipe not connected" errors: on InterruptedException it let the launcher continue as if started had already been counted down (i.e. as if the websocket connection had been established), when in reality the thread had just been interrupted because of a timeout (probably from DurableTaskStep.check()).
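A minimal reproduction of that failure mode (waitQuietly here mimics the plugin's helper; everything else is illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchDemo {
    // Buggy pattern: the interrupt is swallowed, so the caller cannot tell
    // "connection established" apart from "we were interrupted while waiting".
    static void waitQuietly(CountDownLatch latch) {
        try {
            latch.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag, but return normally
        }
    }
}
```

With a helper like this, a caller interrupted by e.g. a step timeout falls straight through to reading from a pipe that was never connected; propagating the InterruptedException (or returning a boolean) would let it fail fast instead.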

still TODO: proc.kill when step is interrupted

If the sh step inside container is interrupted, DefaultStepContext will create a new Launcher and DurableTaskStep will call kill on that - not on any existing procs, finished or not. If that launcher does not kill the process (which DurableTaskStep checks), ContainerStepExecution.stop will not be called either.

Keeping a single reference of proc around and calling kill on it when ContainerStepExecution.stop is called (that is what the current master does) would not work anyway.

This is because both ContainerExecDecorator.decorate and Launcher.launch are called multiple times (at least to start the actual script, and then to run a number of ps executions every 10 seconds to check on the process) - so we are very likely to call kill on the wrong instance.

Even if we keep track of all the procs we created - how will we know which is the right one?

The "right" solution is the one used in the Docker Pipeline Plugin: open a new connection to the container, find the right process to kill (using the environment variable JENKINS_SERVER_COOKIE=durable*), and kill it.
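The process-selection part of that approach could look roughly like this (a sketch only: in reality the environment of each candidate process would be read over a fresh exec connection into the container):

```java
import java.util.Map;

class CookieKillDemo {
    /**
     * Pick the pid of the durable-task shell: the process whose
     * JENKINS_SERVER_COOKIE starts with "durable" (the JENKINS_SERVER_COOKIE=durable*
     * convention described above). Returns -1 if no process matches.
     */
    static int findDurableTaskPid(Map<Integer, Map<String, String>> envByPid) {
        for (Map.Entry<Integer, Map<String, String>> entry : envByPid.entrySet()) {
            String cookie = entry.getValue().get("JENKINS_SERVER_COOKIE");
            if (cookie != null && cookie.startsWith("durable")) {
                return entry.getKey();
            }
        }
        return -1;
    }
}
```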

I can implement that solution, but as it has little to do with JENKINS-40825, do you want it on a new pull request?

Sorry for writing such a "novel" - I hope I have made myself clear.

@iocanel

The only thing I am not sure about is whether we should remove the call to close() from stop and from the BodyExecutionCallback.

@michaelajr

michaelajr commented Aug 3, 2017

So we just tried out the code for this PR. We had a single job running concurrent builds on exclusive agents with random labels, which ensured each running job was on its own label and agent. The podTemplate code simply switched to the JNLP container and did a sleep 300. We were running 17 agents. The first 7 jobs got this during the sleep:

[sleep] Running shell script
+ sleep 300
Cannot contact jenkins-agent-lm0wn-pk3k5: java.io.IOException: interrupted while waiting for websocket connection

The other jobs then immediately stopped printing output and appeared frozen at points before the sleep.

The jobs that had the websocket connection line completed successfully after the sleep, with:

[sleep] Running shell script
+ sleep 300
Cannot contact jenkins-agent-lm0wn-pk3k5: java.io.IOException: interrupted while waiting for websocket connection
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline
Finished: SUCCESS

(actually 6 got this and a 7th succeeded without the log line)

All the other jobs that were frozen failed with:

java.io.IOException: interrupted while waiting for websocket connection
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:169)
	at hudson.Launcher$ProcStarter.start(Launcher.java:384)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:157)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:167)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:224)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:150)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
	at sun.reflect.GeneratedMethodAccessor199.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:150)
	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:123)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:148)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:122)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:122)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:122)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:122)
	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:16)
	at WorkflowScript.run(WorkflowScript:63)
	at ___cps.transform___(Native Method)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.dispatch(CollectionLiteralBlock.java:55)
	at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.item(CollectionLiteralBlock.java:45)
	at sun.reflect.GeneratedMethodAccessor148.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:103)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:60)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:173)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:162)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:162)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Finished: FAILURE

@marvinthepa

marvinthepa commented Aug 3, 2017

@michaelajr

Thanks for the test. The exception you are getting now would likely have been the "pipe not connected" error before this change. It means that Jenkins takes more than 10 seconds to connect to Kubernetes and start executing the command. Unfortunately, Jenkins will just interrupt us in that case...

Are you able to paste a minimal example of what you are running? That might help, as I was not able to trigger this again just by running the same build 10 times...

We are probably hitting some other limit here; I can't imagine it actually takes that long to start executing the command.

Maybe the thread pool is exhausted, or maybe we should try re-using connections instead of opening a new one for each 'ps' that Jenkins wants us to execute... will investigate tomorrow.


@marvinthepa

marvinthepa commented Aug 4, 2017

Added the implementation of kill.

Will now investigate @michaelajr's issues.

Maybe it helps to add the suggested fix for waitUntilContainerIsReady in marvinthepa@996952c#commitcomment-23453964. I will try that first.


@marvinthepa

marvinthepa commented Aug 4, 2017

@michaelajr I can reproduce your problems, however only if I run A LOT of jobs at exactly the same time - e.g. 4 of 30 jobs fail.


@michaelajr

michaelajr commented Aug 4, 2017

@marvinthepa We saw a greater failure ratio yesterday. We definitely need to run/start multiple jobs at or around the same time. We'll test again this morning with the new code.


@marvinthepa

marvinthepa commented Aug 4, 2017

@iocanel:

I saw a few "Pumpers" hanging around without 1fafc92 (only about 2 per 100 builds or so - I am testing a lot), which is why I applied it. Seems to work so far.


@austinmoore-

austinmoore- commented Aug 4, 2017

@marvinthepa Hey, I work with @michaelajr. I just tested the new commits you've just added.

I have a pipeline job that just sleeps for 300 seconds: https://gist.github.com/austinmoore-/b26b61d65072400b64ec205ba31628cb

First, I ran 15 instances of this job. 3/15 of the jobs failed with the IOException and stacktrace mentioned above.

After that, I ran it again with 20 instances. 7/20 jobs failed with the same IOException and stacktrace.

> I can reproduce your problems, however only if I run A LOT of jobs at exactly the same time - e.g. 4 of 30 jobs fail.

I'm not sure, but the failures might only occur when running concurrent builds of the same job. When I ran through our deployment jobs (8 simultaneous jobs), I did not encounter this error. Were you running the same job, or different jobs?


@michaelajr

michaelajr commented Aug 4, 2017

To add to @austinmoore-'s comment: under normal circumstances there are no jobs that run concurrent builds. It was just our test job where we executed concurrent builds, each with its own EXCLUSIVE agent and label. The fact that we were able to run 8 of our REAL jobs at the same time without this issue is a huge step forward, so this is good news. It would be nice to understand why we are seeing such a high failure rate when we run concurrent builds of the same job.


@elementalvoid

elementalvoid commented Aug 4, 2017

EDIT: The following is based on a local build from ed766ca.

To add some more experiences to this: it still appears the issue is related to running a job that uses more than one container in the Pod Template. We've found that with a Pod Template that defines more than one container, you're okay to run many parallel jobs (even the same job) as long as you don't use the extra container. As soon as we run commands in the additional containers, we get the following exception and the job fails:

java.io.InterruptedIOException: interrupted while waiting for websocket connection
    at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:179)
    at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:108)

What we've been using for testing is the following Pipeline:

Integer delaySec = 60

timestamps {
    node {
        for (i in [1, 2, 3, 4, 5]) {
            println "starting sleep ${delaySec} iteration: ${i}"
            stage("sleep ${i}") {
                container('mk-foo') {
                    sh("sleep ${delaySec} && pwd")
                }
            }
            stage("ls ${i}") {
                container('mk-foo') {
                    sh('ls')
                }
            }
            println "done sleeping: iteration: ${i}"
        }
    }
}

mk-foo is our extra container -- it's just a node:6 container with no special config.

Then we click the build button ten or so times. In some cases the loop will work, but the ls command will take approximately one minute to execute (I suspect a timeout somewhere). In other cases the jobs fail as soon as the exception occurs.


@marvinthepa

marvinthepa commented Aug 7, 2017

I think I found another (probably big) part of the problem.

By default, OkHttpClient only runs 5 concurrent requests against the same host, which is probably too little when you are constantly opening websocket connections to that one host.

It seems I will have to open another pull request to make this configurable in the kubernetes-client, but I will verify my assumption first.
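The effect can be sketched with a plain semaphore standing in for OkHttp's per-host dispatcher limit (the real knob is Dispatcher.setMaxRequestsPerHost; the 5 comes from the default mentioned above):

```java
import java.util.concurrent.Semaphore;

class PerHostLimitDemo {
    // Stand-in for OkHttp's per-host request limit (default 5).
    static final int MAX_REQUESTS_PER_HOST = 5;
    static final Semaphore slots = new Semaphore(MAX_REQUESTS_PER_HOST);

    /** Try to open one more long-lived request (e.g. an exec websocket) to the API server. */
    static boolean tryOpenConnection() {
        return slots.tryAcquire(); // false once all per-host slots are held
    }

    static void closeConnection() {
        slots.release();
    }
}
```

Long-lived exec websockets hold a slot for their whole lifetime, so a handful of concurrent sh steps against the same API server exhausts the budget and every further request queues - which matches builds stalling "before the sleep".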


@iocanel

iocanel commented Aug 7, 2017

Given that each pod is meant to serve a single job, I can't see why we would possibly need more than 2 concurrent connections to a single pod (one for the sh part and one for the probe). If we don't have any leaking, we should be just fine.


@marvinthepa

marvinthepa commented Aug 7, 2017

@iocanel: It is not about connections to the pod - it is about connections to KubernetesCloud.serverUrl, which is the same for all builds.


@iocanel

iocanel commented Aug 7, 2017

You are right! I think I need more coffee!!!


@marvinthepa

marvinthepa commented Aug 7, 2017

Here you go: ☕️ .


@JeanMertz

JeanMertz commented Aug 7, 2017

Really excited to see the progress being made here, the investigation being done by @marvinthepa, and the help from @iocanel. Keep up the good work 👍

I'm a colleague of @soudy, who initially reported JENKINS-40825, so if there's anything we can do to help out, let us know.

Also: ☕️

JeanMertz commented Aug 7, 2017

Really excited to see the progress being made here, the investigation being done by @marvinthepa, and the help from @iocanel. Keep up the good work 👍

I'm a colleague of @soudy, who initially reported JENKINS-40825, so if there's anything that we can do to help out, let us know.

Also: ☕️

@marvinthepa

marvinthepa commented Aug 7, 2017

@iocanel: I added two new commits:

  • Added the changed waiting behavior client.pods().inNamespace(namespace).withName(podName).waitUntilReady(CONTAINER_WAIT_TIMEOUT_SECONDS, TimeUnit.SECONDS)
  • Made the number of concurrent requests to Kubernetes configurable

Could I kindly ask you to review those changes?

I am especially unsure about the configuration in the second one - don't know if it makes sense to make it configurable from the GUI.

I think it should definitely be added to the Config object in the kubernetes-client, as reaching into the KubernetesClient feels dirty:

client.getHttpClient().dispatcher().setMaxRequestsPerHost(maxRequestsPerHost);

Then we could also set it via Java system property or environment variable (instead of or in addition to the GUI).
In any case I think we should provide a more sensible default than 5 - I set it to 32 for now, as that worked reasonably well in my tests, and I guess it should not overwhelm the Kubernetes master.

Please let me know what you think. Always happy to incorporate improvements.
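The comment above proposes setting the limit via a Java system property in addition to (or instead of) the GUI. A rough sketch of how that wiring could look - the property name `kubernetes.client.maxRequestsPerHost` and the default of 32 are assumptions for illustration, not the plugin's actual configuration keys:

```java
/**
 * Illustrative sketch: resolve the max-requests-per-host limit from a system
 * property, falling back to a default more generous than OkHttp's 5.
 * Property name and default are assumptions, not real plugin keys.
 */
class MaxRequestsConfig {
    static final int DEFAULT_MAX_REQUESTS_PER_HOST = 32;

    static int maxRequestsPerHost() {
        // Integer.getInteger returns the parsed system property value, or the
        // supplied default when the property is unset or not a valid integer.
        return Integer.getInteger(
                "kubernetes.client.maxRequestsPerHost",
                DEFAULT_MAX_REQUESTS_PER_HOST);
    }
}
```

The resolved value would then be handed to OkHttp the way the comment shows, via `client.getHttpClient().dispatcher().setMaxRequestsPerHost(...)`.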

@austinmoore-

austinmoore- commented Aug 7, 2017

@marvinthepa @iocanel Hey, I just tried out the new changes. I ran 20 jobs, and they all succeeded with no issue! This seems to fix all of our issues. This is really great! We'll continue to test this.

@elementalvoid

elementalvoid commented Aug 7, 2017

We've also tested the latest changes with the pipeline I described before. 100 Jobs with no issues! We're getting ready to push this out to our main Jenkins instance. I'll let you know how it goes.

Thanks for the quick work on this @marvinthepa.

@elementalvoid

elementalvoid commented Aug 7, 2017

I have good news: We've upgraded our main Jenkins master with a plugin built from this PR and all is looking well. Prior to this we would begin to experience the broken pipe issue with just a handful of jobs. Since the upgrade we've been running approximately 15 jobs at minimum and about 22 max -- all have either succeeded or failed for legitimate reasons. We're no longer experiencing the broken pipe issue.

Thanks again for the quick fix!

@michaelajr

michaelajr commented Aug 7, 2017

We too have our production Jenkins running this fix. Working great.

@iocanel

iocanel commented Aug 7, 2017

That's excellent news.

I am cutting a kubernetes-client 2.6.0 release right now that allows you to configure the max concurrent requests per host.

@iocanel

iocanel commented Aug 8, 2017

2.6.1 was released, so we should align this PR accordingly.

@iocanel

iocanel commented Aug 8, 2017

Can you rebase, so that we can merge?

@mikesplain

mikesplain commented Aug 8, 2017

FYI we've tested this with the 2.6.1 changes as well and all our issues seem fixed! Great job!!

marvinthepa added some commits Aug 2, 2017

JENKINS-40825 close resource leak, fix broken pipe error
First of all, `Launcher.launch` is called not once, but again and again.
I.e. overwriting shared members of the enclosing class that a different
thread is still using (and only cleaning the ones that were written last).
That is a bad idea, and caused the thread and connection leak.

So, move them (`watch` and `proc`) to local variables, and close them
elsewhere:
* `proc.kill()` should be taken care of by the shell step (that needs to be
verified)
* `watch` was already cleaned up in `ContainerExecProc`
anyway.

Second of all, `waitQuietly` on the `started` latch was what caused
the "pipe not connected" errors:

In case of InterruptedException, it caused the launcher to just continue
as if `started` had already been counted down (i.e. the websocket connection
established), when in reality the thread  just interrupted itself because of a timeout
(probably from [`DurableTaskStep.check()`][1]).

[1]: https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/workflow-durable-task-step-2.13/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L303
Implement `kill`
Strongly inspired by [Docker Pipeline Plugin][1], but sending
just a single command to the container to be faster.

The command only uses linux standard tools and works on
debian and alpine, so it should be sufficiently portable.

Gets rid of one TODO :-).

[1]: https://github.com/jenkinsci/docker-workflow-plugin/blob/45f0c04/src/main/java/org/jenkinsci/plugins/docker/workflow/WithContainerStep.java#L249,
Update Kubernetes client version as suggested by @iocanel
Seems to completely get rid of the resource leak regarding
InputStreamPumper.

marvinthepa added some commits Aug 7, 2017

Make number of concurrent requests to Kubernetes configurable
This solves the actual problem causing JENKINS-40825 - OkHttp limiting
itself to 5 requests per hosts, which is not sufficient when running
20 builds at once.
Each build uses a websocket connection for the command it runs,
and additionally spawns more connections to check if the command
is still running.
prevent excessive string creation
if log level is not set to FINEST..
Be more defensive about freeing resources
Although this *should* not happen, just free up the watches when the
ContainerStep finishes for good measure.
silence findbugs error
we should be fine if we get deserialized - how are we supposed to clean
up those procs in that case?
@marvinthepa

marvinthepa commented Aug 9, 2017

@iocanel: done, please review again.

@iocanel

iocanel approved these changes Aug 9, 2017

@iocanel iocanel merged commit 1eb72f9 into jenkinsci:master Aug 9, 2017

1 check passed

continuous-integration/jenkins/pr-merge This commit looks good
Details
@JeanMertz

JeanMertz commented Aug 10, 2017

Unfortunately, with the latest Jenkins version, and the master branch version of this plugin, I am still seeing this issue occur:

ERROR: Execution failed
java.io.IOException: Pipe not connected
	at java.io.PipedOutputStream.write(PipedOutputStream.java:140)
	at java.io.OutputStream.write(OutputStream.java:75)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$ClosableLauncher.launch(ContainerExecDecorator.java:179)
	at hudson.Launcher$ProcStarter.start(Launcher.java:450)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:157)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:167)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:224)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:150)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
	at groovy.lang.MetaClassImpl.invokeMethodOnGroovyObject(MetaClassImpl.java:1278)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1172)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:810)
	at groovy.lang.GroovyObjectSupport.invokeMethod(GroovyObjectSupport.java:46)
	at groovy.lang.MetaClassImpl.invokeMethodOnGroovyObject(MetaClassImpl.java:1278)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1172)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:810)
	at groovy.lang.GroovyObjectSupport.invokeMethod(GroovyObjectSupport.java:46)
	at groovy.lang.MetaClassImpl.invokeMethodOnGroovyObject(MetaClassImpl.java:1278)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1172)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:19)
	at h.instance(/var/jenkins_home/jobs/blendle/jobs/core-api/branches/nodeport/builds/8/libs/jenkins-pipeline-scripts/vars/h.groovy:197)
	at ___cps.transform___(Native Method)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor285.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:173)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:162)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:162)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

This stack trace was preceded by:

ERROR: Execution failed
java.io.IOException: Failed to execute shell script inside container [instance-1] of pod [jenkins-slave-zzgsj-vq16h]. Timed out waiting for container to become ready!
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$ClosableLauncher.launch(ContainerExecDecorator.java:113)
	at hudson.Launcher$ProcStarter.start(Launcher.java:450)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:157)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:167)

This happened, even though the pod was clearly up and running (using kubectl describe).

@marvinthepa

marvinthepa commented Aug 11, 2017

@JeanMertz:
I don't think so.

at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$ClosableLauncher.launch(ContainerExecDecorator.java:179)

ClosableLauncher does not exist in this pull request. Are you sure you were building from jenkinsci/kubernetes-plugin/master, and not from some other branch? Looks like you are running a plugin built from #180, not this pull request.

@JeanMertz

JeanMertz commented Aug 11, 2017

Hmm. Good find @marvinthepa. In fact, I used the new Jenkins integration in this repo to download the hpi from here:

https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fkubernetes-plugin/detail/master/10/artifacts

But perhaps something didn't go right there. I'll check and get back to you.

@JeanMertz

JeanMertz commented Aug 11, 2017

I just checked on our CI, but it does look like the HPI is correctly built from the latest master branch:

Kubernetes plugin 0.13-SNAPSHOT (private-ec238be6-win2012-6b7d90$)

(not quite sure what the win2012 implies, though). I'll try to rebuild manually from my local machine and see if that changes things.

@marvinthepa

marvinthepa commented Aug 11, 2017

@JeanMertz stupid question, but did you actually restart your Jenkins?

@JeanMertz

JeanMertz commented Aug 13, 2017

UPDATE: actually, scratch that. The results below were from yesterday. I just ran 3 batches of 10 concurrent jobs, and none of them failed. Not sure what happened before, but I can't seem to reproduce it anymore.


I'm not sure what caused this, but it seems to be working now 👍

However, starting 20 jobs, I do still get errors, not sure if they are related though:

java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: ssh: rejected: connect failed (Connection refused)
	at io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener.onFailure(ExecWebSocketListener.java:215)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
@idogoren

idogoren commented Aug 14, 2017

Hi,

When do you expect to release a new version of the plugin?
I've tried using versions 0.11 & 0.12 but I'm still experiencing lots of "pipe not connected" and other issues.
When I've tested using 0.13-SNAPSHOT, it didn't reproduce for several runs so it looks much more stable.

@gm42

gm42 commented Aug 15, 2017

I have a reproducible cascade of "Pipe not connected" errors when many build jobs (~20) are triggered, or simply when Jenkins is "unthawed" from some time being down.

With the 0.13-SNAPSHOT version I can verify that the "Pipe not connected" problem is gone.

@gm42

gm42 commented Aug 18, 2017

I have to report that although the pipe not connected problem is gone, containers fail with "script returned exit code 2". It might be due to reasons unrelated to this though.

@marvinthepa

marvinthepa commented Aug 18, 2017

@gm42 if it is -2 not 2, then this might be because Jenkins is not able to get the exit code of the script,
see https://github.com/jenkinsci/durable-task-plugin/blob/0e4656cbd2b8fbd44e42cf2a67decdb202255605/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L208.

If you can reproduce this with a minimal example, it might be worthwhile to investigate this further, as it might point to another problem.
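For context on marvinthepa's explanation: the durable-task shell wrapper records the script's exit code in a result file that Jenkins polls, and when that file cannot be read the build fails with a negative sentinel exit code. A rough, hedged sketch of that general pattern - all names and the -2 sentinel here are illustrative assumptions, see the linked BourneShellScript for the real logic:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Illustrative sketch of the durable-task pattern: the wrapper shell writes
 * the script's exit code to a result file, and the controller polls it.
 * Names and the -2 sentinel are assumptions, not the plugin's actual code.
 */
class ResultFilePoller {
    static final int EXIT_CODE_UNREADABLE = -2;

    /** Parses the recorded exit code, or returns the sentinel if it is garbled. */
    static int parseExitCode(String content) {
        try {
            return Integer.parseInt(content.trim());
        } catch (NumberFormatException e) {
            return EXIT_CODE_UNREADABLE;
        }
    }

    /** Reads the result file; a missing file means Jenkins cannot tell how the script ended. */
    static int readExitCode(Path resultFile) {
        try {
            return parseExitCode(Files.readString(resultFile));
        } catch (IOException e) {
            return EXIT_CODE_UNREADABLE;
        }
    }
}
```

Under this reading, a cascade of "script returned exit code -2" failures across unrelated jobs points at the controller losing track of the wrapper processes, not at the scripts themselves failing.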

@gm42

gm42 commented Aug 18, 2017

@marvinthepa oh yes, it is actually script returned exit code -2. I also have cases of 2 but those are regular.

Unfortunately all the logs are gone, as I have recently rebuilt the Jenkins deployment, so I will not be able to reproduce this - sorry, we are moving away from the plugin for now.

@gm42

gm42 commented Aug 18, 2017

This is what the notifications of the build failures look like (5 different projects, obviously some cascade effect):

```
FAILED: Job 'some/project1/master [24]' (https://example.com/job/org/job/project1/job/master/24/) with
script returned exit code -2

[12:23]
FAILED: Job 'some/project2/master [21]' (https://example.com/job/org/job/project2/job/master/21/) with
script returned exit code -2

[12:23]
FAILED: Job 'some/project3/master [16]' (https://example.com/job/org/job/project3/job/master/16/) with
script returned exit code -2

[12:23]
FAILED: Job 'some/project4/master [23]' (https://example.com/job/org/job/project4/job/master/23/) with
script returned exit code -2

[12:23]
FAILED: Job 'some/project5/master [43]' (https://example.com/job/org/job/project5/job/master/43/) with
script returned exit code -2
```

All failing very close together.

@marvinthepa marvinthepa deleted the marvinthepa:bugfix/JENKINS-40825-broken-pipe-now-really branch Aug 21, 2017

idogoren commented Aug 23, 2017

Hi,

I've managed to reproduce the script returned exit code -2 issue with a very simple example.
I'm using the kubernetes plugin with a snapshot version of 0.13; the Jenkins version is 2.60.2.
My Jenkinsfile is:

```groovy
def a = [
    'test1': {
        getPod('Pod1') {
            node('Pod1') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test2': {
        getPod('Pod2') {
            node('Pod2') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test3': {
        getPod('Pod3') {
            node('Pod3') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test4': {
        getPod('Pod4') {
            node('Pod4') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test5': {
        getPod('Pod5') {
            node('Pod5') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test6': {
        getPod('Pod6') {
            node('Pod6') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test7': {
        getPod('Pod7') {
            node('Pod7') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test8': {
        getPod('Pod8') {
            node('Pod8') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test9': {
        getPod('Pod9') {
            node('Pod9') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test10': {
        getPod('Pod10') {
            node('Pod10') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test11': {
        getPod('Pod11') {
            node('Pod11') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test12': {
        getPod('Pod12') {
            node('Pod12') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test13': {
        getPod('Pod13') {
            node('Pod13') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test14': {
        getPod('Pod14') {
            node('Pod14') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test15': {
        getPod('Pod15') {
            node('Pod15') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test16': {
        getPod('Pod16') {
            node('Pod16') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    },
    'test17': {
        getPod('Pod17') {
            node('Pod17') {
                container('devx') {
                    sh "echo hello world"
                }
            }
        }
    }
]

parallel a
parallel a
parallel a

def getPod(podNum, body) {
    node('master') {
        podTemplate(name: jaasOwner + '-jaas', label: podNum, containers: [
            containerTemplate(name: 'jnlp',
                image: 'jnlp:1.0.1',
                args: '${computer.jnlpmac} ${computer.name}'
            ),
            containerTemplate(name: 'devx',
                image: 'ubuntu:16.04',
                ttyEnabled: true,
                privileged: true,
                command: 'cat',
                alwaysPullImage: true
            )
        ], body)
    }
}
```

I'm using an internal jnlp container, which is part of my organization, so you will need to change it to another one.
Other than that, you should be able to run it.

marvinthepa commented Aug 23, 2017

@i059304

I took the liberty of simplifying your example a bit:

```groovy
def a = (1..17).toList().collectEntries {
    def label = "test$it"
    [
            "test$it": {
                podTemplate(name: "pod_test_$it", label: label, containers: [
                        containerTemplate(name: 'devx',
                                image: 'ubuntu:16.04',
                                ttyEnabled: true,
                                command: 'cat',
                        )
                ]) {
                    node(label) {
                        container('devx') {
                            sh "echo hello world"
                        }
                    }
                }
            }
    ]
}

stage('a') {
    parallel a
}
stage('b') {
    parallel a
}
stage('c') {
    parallel a
}
```

However, this runs successfully for me.

lionelve commented Aug 24, 2017

This does seem to fix the "pipe not connected" error. It was getting really bad so thanks!

jamiecoleman92 commented Aug 31, 2017

Any ideas when an official 0.13 release that will include this bug fix will be made available?

mikesplain commented Sep 6, 2017

jamiecoleman92 commented Sep 7, 2017

@mikesplain Thanks for letting me know. I am currently testing the new plugin in our pipeline as I type!
