New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-42048] Fix durable pipeline PID NumberFormatException #157
Conversation
Fix the command execution so the pid is correctly stored in the pid file Use nohup so processes can survive disconnection Add test for ContainerExecDecorator
The warnings come from https://github.com/jenkinsci/durable-task-plugin/blob/master/src/main/java/org/jenkinsci/plugins/durabletask/ProcessLiveness.java#L87 where it executes In any case that command will always return 0 in busybox, and maybe the same in other distros. Something to have into account |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me!
proc = new ContainerExecProc(watch, alive, finished); | ||
return proc; | ||
} | ||
|
||
@Override | ||
public void kill(Map<String, String> modelEnvVars) throws IOException, InterruptedException { | ||
// String cookie = modelEnvVars.get(COOKIE_VAR); | ||
// TODO we need to use the cookie for something |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand that comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is what @jglick was referring to in the initial PR, IIUC we need to use the cookie to know what process to kill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iocanel do you know if there is a way to get the exit code of a command launched through the |
@carlossg: We could pass a listener when we create the exec: https://github.com/fabric8io/kubernetes-client/blob/80ef2ee12258fe3a3bddb5ea02c2033487c378d3/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/ExecListener.java |
we already do that. But we are just piping chars to the server and reading the output. |
@carlossg: can't we just grab the value from the |
I guess that this code, is related to websockets rather than the exec itself. |
…and execution Because ContainerExecDecorator is reused
@iocanel can you take a look again, this is working for me in multicontainer example. |
Will look again shortly! |
I just tried this on my test server I and i'm seeing this warning popup more that i think it should be.
Edit here is the full log from all the org.csanchez.jenkins.plugins.kubernetes.pipeline.* Classes
Jun 13, 2017 4:43:26 PM FINE org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerStepExecution
Starting container step.
Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator
Executing shell script inside container [nomad] of pod [jenkins-slave-pk7xk-hln0x]
Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator
Executing command: "nohup" "sh" "-c" "echo \$\$ > '/home/jenkins/workspace/nomad_test/sources@tmp/durable-0cf6fafa/pid'; jsc=durable-8933846f6b831b7e2e6e882ff1d98d5c; JENKINS_SERVER_COOKIE=\$jsc '/home/jenkins/workspace/nomad_test/sources@tmp/durable-0cf6fafa/script.sh' > '/home/jenkins/workspace/nomad_test/sources@tmp/durable-0cf6fafa/jenkins-log.txt' 2>&1; echo \$? > '/home/jenkins/workspace/nomad_test/sources@tmp/durable-0cf6fafa/jenkins-result.txt'"
printf "EXITCODE %3d" $?; exit
Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator Jun 13, 2017 4:43:37 PM FINEST org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator My pipeline is very simple, i only pulled out the build part, the scm is working fine. stage('build') {
podTemplate(label: "mobile", inheritFrom: "base",
containers: [
containerTemplate(name: 'mobile', image: "<my registry>/mobile-builder:latest", ttyEnabled: true, command: 'cat', args: '')
]) {
node("mobile") {
container('mobile') {
dir('sources') {
unstash "mobile-sources"
sh """
tar -xzf mobile.tgz -C ../
"""
// it never gets to here, the tar is abruptly ended, but if i comment out the tar it works just fine.
deleteDir()
}
sh """
/build.sh -b v5_0 -n ${env.BUILD_NUMBER} -s qa -r ./mobile
cd mobile; tar czf ../mobile.tar.gz mobile
"""
stash includes: "mobile.tar.gz", name: 'mobile'
}
}
}
} |
I started running this code to avoid https://issues.jenkins-ci.org/browse/JENKINS-40825. It's definitely improved, but today I ran into the Here's a sample stack:
|
The printed EXITCODE may not be the last string in the output
Any idea when this will get merged in and released? |
@carlossg any idea when a new version containing this pull request will be released? |
I rebuilt the It definitely seems that the rate of occurrence is related to the concurrency -- i.e. the larger number of ongoing builds and the more container ongoing containers seems to increase the likelihood of seeing this error. We use the pipeline plugin with many parallelized builds so we often have many containerized builds ongoing at once -- I believe that is part of the reason we see this error so frequently.
|
JENKINS-42048
Fix the command execution so the pid is correctly stored in the pid file
Use nohup so processes can survive disconnection
Add test for ContainerExecDecorator
The problem is that we still need a way to get the exit code from the commands to return it in ContainerExecProc#join.
I don't see any other way than to print it to the output and parse it.
In the meantime there are a lot of warnings in the log