Fix logic for waitUntilContainerIsReady #95

Merged
merged 1 commit into from Dec 1, 2016

Projects

None yet

5 participants

@philk
philk commented Nov 30, 2016

We were running into the issue the wait is meant to resolve so I tried it out. Looks like the logic got reversed somewhere along the way.

@philk philk Fix logic for waitUntilContainerIsReady adcdbed
@volkangurel

👍

@carlossg
carlossg commented Dec 1, 2016

ping @iocanel

@iocanel
iocanel commented Dec 1, 2016

@philk: Thanks for catching this. I tried that and it looks good!

@carlossg: +1

@carlossg carlossg merged commit 30f1074 into jenkinsci:master Dec 1, 2016

1 check passed

Jenkins This pull request looks good
Details
@JeanMertz

@philk @iocanel @carlossg I'm not sure this is entirely fixed.

Before, I was also seeing the timeout issues as described in #93 (comment).

Now, I've got a situation where I've got some sh logic wrapped in a retry block, and I'm seeing the timeout error happen the first time the retry block is executed, and then once the block is retried, the script moves forward as expected:

Still waiting to schedule task
kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06 is offline
Running on kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06 in /home/jenkins/workspace/blendle_core-api_jenkins-JAM5PI6ZOUNKGTGYGMINDN4KNK2PFRGQPKWNXCRMGAOHJME4OVZA
[Pipeline] {
[Pipeline] container
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] retry
[Pipeline] {
[Pipeline] sleep
Sleeping for 1 sec
[Pipeline] sh
[blendle_core-api_jenkins-JAM5PI6ZOUNKGTGYGMINDN4KNK2PFRGQPKWNXCRMGAOHJME4OVZA] Running shell script
Waiting for container container [instance-1] of pod [kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06] to become ready.
[Pipeline] }
ERROR: Execution failed
java.io.IOException: Failed to execute shell script inside container [instance-1] of pod [kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06]. Timed out waiting for container to become ready!
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:89)
	at hudson.Launcher$ProcStarter.start(Launcher.java:383)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:147)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:61)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:158)
Retrying
[Pipeline] {
[Pipeline] sleep
Sleeping for 1 sec
[Pipeline] sh

... from here on out the build continued as expected
  • I can consistently reproduce this by rerunning the job.
  • I can confirm that the containers are actually ready rather quickly
  • However, I can imagine the container not being ready immediately when the first sh is triggered, so some waiting is to be expected.

It's almost as if the watcher logic here never triggers the MODIFIED event, and thus it just waits for the timeout to finish.

The timeout does take quite a long time to happen (maybe 60 seconds?) before my retry logic kicks in.

@iocanel
iocanel commented Dec 1, 2016

@JeanMertz: You are right! The watcher implementation does not check the pod reference by the event, but the pod in its previous state (before) the watch! That will never work!

Let me fix that!

@iocanel
iocanel commented Dec 1, 2016

@JeanMertz: It should look like this: #97

Do you want to give it a spin?

@philk philk deleted the philk:fix-container-wait-logic branch Dec 1, 2016
@philk
philk commented Dec 1, 2016

Nice catch, we had a retry in our Jenkinsfile from before (when there was a race for the pod to become ready) so we didn't notice this. Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment