-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-73056] cancel queue Item that really matches the pod causing the cancelling #1539
[JENKINS-73056] cancel queue Item that really matches the pod causing the cancelling #1539
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea seems right though implementation could be refined.
Deserves test coverage, which I think ought to be feasible here; it seems this method is called for several reasons, the most straightforward of which would be that the API request to create the pod in the first place was rejected, for example because you specified well-formed YAML which fails schema validation or some such constraint. So something like
parallel k8s: {
catchError(buildResult: 'SUCCESS') {
podTemplate(yaml: '…invalid…') {
node(POD_LABEL) {
sh 'false should never run'
}
}
echo 'cancelled pod item by now'
}
}, unrelated: {
node('special-agent') {}
}
should pass if the test waits for cancelled pod item by now
message and then creates a static agent named special-agent
, if I follow the bug report correctly.
String whyOnQueue = item.getWhy(); | ||
if(whyOnQueue.contains(metadata.getName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fragile. We can probably match based on the label request instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the assignedLabel value and the why String have the same value
and the issue where jenkins/label does not match the assignedLabel String is in the case when the pod Name is using the fallback DEFAULT_AGENT_PREFIX ( because the given label does not match the regex set here https://github.com/jenkinsci/kubernetes-plugin/blob/40d4fb21bd8046082ecd88fc5cf860f0db8e54ee/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesSlave.java#L313C14-L313C23 )
and the 'x' in the jenkins/label comes from here
Line 198 in 40d4fb2
input = input.replaceAll("[^_a-zA-Z0-9-]", "_").replaceFirst("^[^a-zA-Z0-9]", "x"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, the question I have is why the queue.item assignedLabel not the pod name but the value of the label in jenkins/label instead..
Hi @jglick to be honest I am not very familiar of how to run tests for this plugin, But this is the pipeline that I use to reproduce the bug: ( the failing pod is where the image cannot be pulled - wrong registry ) String incorrectImage = "does.not.exist/jenkins/inbound-agent:3206.vb_15dcf73f6a_9-3-jdk11"
// on dev
String correctImage = "jenkins/inbound-agent:3206.vb_15dcf73f6a_9-3-jdk11"
String getPodYaml(String image){
return """
apiVersion: v1
kind: Pod
spec:
containers:
- name: jnlp
image: ${image}
imagePullPolicy: "Always"
tty: false
workingDir: "/home/jenkins/agent"
resources:
requests:
memory: 100Mi
cpu: 500m
restartPolicy: "Never"
volumes:
- emptyDir:
medium: ""
name: "workspace-volume"
"""
}
parallel (
failingPod: {
stage("failing pod") {
podTemplate(yaml: getPodYaml(incorrectImage)) {
node(POD_LABEL) {
sh 'false should never run'
}
}
}
},
correctPods: {
// keep spawning pods until eventually one of them catches the cancel queue item from the failing pod
List l = (1..10).toList()
l.each { num ->
stage("correctPod-${num}") {
podTemplate(yaml: getPodYaml(correctImage)) {
node(POD_LABEL) {
echo("correctPod${num}")
}
}
}
}
}
) and when executed this is what happens: I need probably some assistance in writing and executing such a test scenario for the plugin, can you help? ( excluding the failing pod branch from the pipeline results in no failures in the "correct-pods" branches ) |
4c7d18c
to
56c8cb3
Compare
56c8cb3
to
c5889c6
Compare
* Add KubernetesPipelineTest#cancelOnlyRelevantQueueItem to cover the use case * Remove duplicated sanitization method from `PodTemplate#sanitizeLabel` * Extract labels and annotations to constant for reuse
Thank you for your contribution, made some follow-ups and took the opportunity to clean up sanitization. Implemented the test proposed by @jglick |
thanks for the help @Vlatombe it seems spotbugs is still complaining, and I tried to fix it here: https://github.com/ysmaoui/kubernetes-plugin/pull/1/files |
…-pointer-bug spotbugs-detected possible nullpointer-exception
…queue-item-when-a-pod-fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pinging @jglick for a second pair of eyes
* @throws AssertionError if the generated label is not valid | ||
*/ | ||
public static String sanitizeLabel(String input) { | ||
int max = /* Kubernetes limit */ 63 - /* hyphen */ 1 - /* suffix */ 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is actually from labelify
not from the function previous called sanitizeLabel
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think labelify and the previous sanitizeLabel methods are now merged into this new sanitizeLabel method of the PodTemplateUtils class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, both were doing very similar job, there is now only one method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually changes a behavior for labels that contains a .
. For example label release-1.2.3
resulted in jenkins/label: release-1.2.3
. Now it results in jenkins/label: release-1_2_3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the previous version handled [^_.a-zA-Z0-9-]
which is gone here. Seems like a regression; @Dohbedoh can you please make sure this is filed as an issue (if not a patch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes a behavior
But is that breaking anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, no, runUrl
Javadoc does not seem to match reality.
b294ac7
to
1ce44e4
Compare
1ce44e4
to
baa1004
Compare
JENKINS-73056
Testing done
Submitter checklist