
Fix logic for waitUntilContainerIsReady #95

Merged
merged 1 commit into jenkinsci:master from philk:fix-container-wait-logic on Dec 1, 2016

Conversation

@philk

philk commented Nov 30, 2016

We were running into the issue this wait is meant to resolve, so I tried it out. It looks like the logic got reversed somewhere along the way.
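A minimal sketch of what an inversion like this looks like (the helper names below are hypothetical; the real method is waitUntilContainerIsReady, presumably in ContainerExecDecorator per the stack traces further down, and the exact shape of the original condition is assumed here): with the check reversed, a container that is already ready makes the wait loop spin until the timeout, which matches the reported symptom.

```java
// Illustrative sketch only: assumes the bug was an inverted readiness
// check. waitBuggy/waitFixed and the polling shape are hypothetical,
// not the plugin's actual waitUntilContainerIsReady code.
import java.util.function.BooleanSupplier;

public class WaitSketch {
    // Reversed condition: treats "not ready" as success, and keeps
    // looping (until a spurious timeout) while the container IS ready.
    static boolean waitBuggy(BooleanSupplier isReady, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (!isReady.getAsBoolean()) {
                return true; // wrong: reports success while still unready
            }
        }
        return false; // "times out" even though the container was ready
    }

    // Corrected condition: returns as soon as the container is ready.
    static boolean waitFixed(BooleanSupplier isReady, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (isReady.getAsBoolean()) {
                return true;
            }
        }
        return false; // genuine timeout
    }

    public static void main(String[] args) {
        BooleanSupplier alwaysReady = () -> true;
        System.out.println(waitFixed(alwaysReady, 10)); // true
        System.out.println(waitBuggy(alwaysReady, 10)); // false: spurious timeout
    }
}
```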

@volkangurel


volkangurel commented Nov 30, 2016

👍

@carlossg


carlossg commented Dec 1, 2016

ping @iocanel

@iocanel


iocanel commented Dec 1, 2016

@philk: Thanks for catching this. I tried that and it looks good!

@carlossg: +1

@carlossg carlossg merged commit 30f1074 into jenkinsci:master Dec 1, 2016

1 check passed

Jenkins This pull request looks good
Details
@JeanMertz



JeanMertz commented Dec 1, 2016

@philk @iocanel @carlossg I'm not sure this is entirely fixed.

Before, I was also seeing the timeout issues as described in #93 (comment).

Now I've got some sh logic wrapped in a retry block, and the timeout error happens the first time the retry block executes; once the block is retried, the script moves forward as expected:

Still waiting to schedule task
kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06 is offline
Running on kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06 in /home/jenkins/workspace/blendle_core-api_jenkins-JAM5PI6ZOUNKGTGYGMINDN4KNK2PFRGQPKWNXCRMGAOHJME4OVZA
[Pipeline] {
[Pipeline] container
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] retry
[Pipeline] {
[Pipeline] sleep
Sleeping for 1 sec
[Pipeline] sh
[blendle_core-api_jenkins-JAM5PI6ZOUNKGTGYGMINDN4KNK2PFRGQPKWNXCRMGAOHJME4OVZA] Running shell script
Waiting for container container [instance-1] of pod [kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06] to become ready.
[Pipeline] }
ERROR: Execution failed
java.io.IOException: Failed to execute shell script inside container [instance-1] of pod [kubernetes-cbc4406893db41a9aef2d9e5d526fca1-6e9ef3076ae06]. Timed out waiting for container to become ready!
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:89)
	at hudson.Launcher$ProcStarter.start(Launcher.java:383)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:147)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:61)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:158)
Retrying
[Pipeline] {
[Pipeline] sleep
Sleeping for 1 sec
[Pipeline] sh

... from here on out the build continued as expected
  • I can consistently reproduce this by rerunning the job.
  • I can confirm that the containers actually become ready rather quickly.
  • However, I can imagine the container not being ready immediately when the first sh is triggered, so some waiting is to be expected.

It's almost as if the watcher logic here never reacts to the MODIFIED event, and thus just waits for the timeout to expire.

The timeout does take quite a long time to happen (maybe 60 seconds?) before my retry logic kicks in.
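For reference, the retry workaround visible in the log above corresponds roughly to a Jenkinsfile fragment like this (the container name comes from the log; the retry count and shell command are placeholders):

```groovy
container('instance-1') {
    retry(3) {          // retry count is a guess; the log shows at least two attempts
        sleep 1         // matches "Sleeping for 1 sec" in the log
        sh './build.sh' // placeholder for the actual shell step
    }
}
```

The first attempt fails with the readiness timeout, and the retried attempt then proceeds normally, as the log shows.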

@iocanel


iocanel commented Dec 1, 2016

@JeanMertz: You are right! The watcher implementation does not check the pod referenced by the event, but the pod in its previous state (before the watch started)! That will never work!

Let me fix that!
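The described fix can be illustrated with a small self-contained simulation (the Watcher shape loosely mirrors the fabric8 kubernetes-client interface the plugin uses; the Pod type and helper below are simplified stand-ins, not the plugin's code): the readiness check must run against the Pod object delivered with the MODIFIED event, not against the snapshot captured before the watch started.

```java
// Simplified stand-ins for the fabric8 Watcher<Pod> pattern; names
// and shapes here are illustrative, not the actual plugin code.
import java.util.concurrent.CountDownLatch;

public class WatcherSketch {
    static class Pod {
        final boolean ready;
        Pod(boolean ready) { this.ready = ready; }
    }

    interface Watcher { void eventReceived(String action, Pod current); }

    // Returns the latch count after delivering one MODIFIED event that
    // carries a ready pod: 0 means the waiter was released, 1 means it
    // would hang until the timeout.
    static long deliverEvent(boolean checkEventPod) {
        Pod before = new Pod(false);             // snapshot taken before the watch
        CountDownLatch latch = new CountDownLatch(1);
        Watcher watcher = (action, current) -> {
            // Buggy version consults the stale 'before' snapshot;
            // fixed version consults the pod carried by the event.
            Pod toCheck = checkEventPod ? current : before;
            if ("MODIFIED".equals(action) && toCheck.ready) latch.countDown();
        };
        watcher.eventReceived("MODIFIED", new Pod(true)); // pod is ready now
        return latch.getCount();
    }

    public static void main(String[] args) {
        System.out.println("buggy (stale snapshot): latch=" + deliverEvent(false)); // 1: still waiting
        System.out.println("fixed (event payload):  latch=" + deliverEvent(true));  // 0: released
    }
}
```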

@iocanel


iocanel commented Dec 1, 2016

@JeanMertz: It should look like this: #97

Do you want to give it a spin?

@philk philk deleted the philk:fix-container-wait-logic branch Dec 1, 2016

@philk


philk commented Dec 1, 2016

Nice catch, we had a retry in our Jenkinsfile from before (when there was a race for the pod to become ready), so we didn't notice this. Thanks for the fix!
