Log events in build log during pod provisioning #742
Conversation
Need to gracefully degrade if the Jenkins service account is not able to watch pod events.
Looks like …
src/main/java/io/jenkins/plugins/kubernetes/TaskListenerEventWatcher.java
One thing that has long bothered me: the build log receives a dump of the YAML of the pod if and when the agent launches successfully, but if the launch fails you have to either enable fine logging or have `kubectl` access in order to see what pod definition Jenkins had tried to create. Is that still true here, and where does the YAML appear relative to these other messages?
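For illustration, a helper along these lines could print the attempted definition into the build log on failure. This is a hypothetical sketch, not part of the PR; `dumpPodYaml` and its placement are invented, though `Serialization.asYaml` is a real fabric8 utility:

```java
import hudson.model.TaskListener;
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.utils.Serialization;

// Hypothetical helper: dump the pod definition Jenkins attempted to create
// into the build log, so a failed launch can be diagnosed without kubectl
// access or FINE logging.
class PodDiagnostics {
    static void dumpPodYaml(Pod pod, TaskListener runListener) {
        runListener.getLogger().println("Attempted pod definition:");
        runListener.getLogger().println(Serialization.asYaml(pod));
    }
}
```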
src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesLauncher.java
```diff
         template.getWorkspaceVolume().createVolume(client, pod.getMetadata());
         watcher = new AllContainersRunningPodWatcher(client, pod, runListener);
-        try (Watch _w = client.pods().inNamespace(namespace1).withName(podName).watch(watcher)) {
+        try (Watch w1 = client.pods().inNamespace(namespace).withName(podName).watch(watcher);
+             Watch w2 = client.events().inNamespace(namespace).withField("involvedObject.name", podName).watch(new TaskListenerEventWatcher(podName, runListener))) {
             watcher.await(template.getSlaveConnectTimeout(), TimeUnit.SECONDS);
         } catch (InvalidDockerImageException e) {
```
Is this special case still needed? I am not really sure what it was doing to begin with; looks suspicious.
It fails the build if one of the Docker images is invalid (#497). Ideally I think it would deserve a rework, but it works.
But the build would fail anyway, without this clause; the exception would be thrown up out of `launch`. Right? And with this PR, you would see the appropriate diagnostics right in the log. But the current code here does more, by canceling some queue items (which seem to be any `node` blocks associated with this build, which is wrong by the way: it should only be canceling the `node` block associated with this pod). If that is needed in order to prevent the build from hanging, then it is needed in many other error cases too. So this does not add up for me.
No, the launcher would just fail, but this wouldn't affect the build, so the build would wait forever.

> should only be canceling the `node` block associated with this pod

Probably.
I do not think this is anything blocking the current PR but it looks like something to revisit.
@jglick I'm revisiting this; I'm not sure how to find out exactly which item should be cancelled.
The only correlation I can make seems to be based on the label, since at this point the queue item is not yet assigned to the node. There could be several items waiting for a node if they are inside a parallel block or using a static pod template. So, cancel the first one only?
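For illustration, that label-based correlation could look something like the sketch below. This is hypothetical (the helper name and placement are invented, and as discussed it is not clear that cancelling the first match is even correct):

```java
import hudson.model.Label;
import hudson.model.Queue;
import jenkins.model.Jenkins;

// Hypothetical sketch: before a queue item is assigned to a node, the pod
// template's label is the only link between a failed pod and a pending item,
// so cancel just the first matching item.
class QueueCancellation {
    static void cancelFirstMatchingItem(Label podLabel) {
        Queue queue = Jenkins.get().getQueue();
        for (Queue.Item item : queue.getItems()) {
            if (podLabel.equals(item.getAssignedLabel())) {
                queue.cancel(item);
                break; // several parallel branches may share the label; cancel only one
            }
        }
    }
}
```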
Not sure.
Force-pushed from 226b97b to a60d030.
Next build should be ok. https://ci.jenkins.io/job/Plugins/job/kubernetes-plugin/job/PR-742/8/display/redirect shows 3 failures without privilege to watch events, pretty much what I expected.
That part isn't touched by this PR, so the YAML is printed once you have an executor. But when using the …
```java
private Watch eventWatch(KubernetesClient client, String podName, String namespace, TaskListener runListener) {
    try {
        return client.events().inNamespace(namespace).withField("involvedObject.name", podName).watch(new TaskListenerEventWatcher(podName, runListener));
    } catch (KubernetesClientException e) {
        LOGGER.log(Level.FINE, e, () -> "Cannot watch events on " + namespace + "/" + podName);
    }
    return null;
}
```
Is it reasonable to ask that this log line be written more readily (less restricted than FINE) and that it include a note/link to instructions for the level of access required to watch pod events?
Yes, this seems like it could be at `INFO`.
Or printed to the `runListener`.
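A sketch combining both suggestions might look like this; it is illustrative only (the warning wording and the permission hint are invented, not from this PR):

```java
private Watch eventWatch(KubernetesClient client, String podName, String namespace, TaskListener runListener) {
    try {
        return client.events().inNamespace(namespace).withField("involvedObject.name", podName).watch(new TaskListenerEventWatcher(podName, runListener));
    } catch (KubernetesClientException e) {
        // Surface the degradation in the build log, and log at INFO rather than FINE.
        runListener.getLogger().println("WARNING: cannot watch events on " + namespace + "/" + podName
                + "; the Jenkins service account may lack permission to watch events");
        LOGGER.log(Level.INFO, e, () -> "Cannot watch events on " + namespace + "/" + podName);
    }
    return null;
}
```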
Vlatombe force-pushed
Broke incremental review, so I guess I will need to review from scratch.
Still have some questions but looks good.
```java
try (Watch w1 = client.pods().inNamespace(namespace).withName(podName).watch(watcher);
     Watch w2 = eventWatch(client, podName, namespace, runListener)) {
    if (watcher != null) {
        watcher.await(template.getSlaveConnectTimeout(), TimeUnit.SECONDS);
```
How could it be null? You just assigned it via a constructor. Does SpotBugs not reject this?
I guess SpotBugs thinks `eventWatch` could have a side effect.
AFAICT the `watcher` field is only written to in the one place above, so it should be definitely non-null here. You can always

```diff
-if (watcher != null) {
+assert watcher != null;
```

to make it behave.
🤷‍♂️
Thank you so much!
This listens to pod events during the agent launch and streams them into the build log. Streaming stops once the agent is up and running.
Happy case
Unhappy case
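For context, here is a minimal sketch of what an event-to-build-log watcher such as `TaskListenerEventWatcher` could look like. This is illustrative; the actual class in the PR may differ, and the log line format shown is an assumption:

```java
import hudson.model.TaskListener;
import io.fabric8.kubernetes.api.model.Event;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watcher;

// Illustrative sketch: forward each pod event (Scheduled, Pulling, Failed, ...)
// into the build log as it arrives.
public class TaskListenerEventWatcher implements Watcher<Event> {
    private final String name;
    private final TaskListener listener;

    public TaskListenerEventWatcher(String name, TaskListener listener) {
        this.name = name;
        this.listener = listener;
    }

    @Override
    public void eventReceived(Action action, Event event) {
        // e.g. "[Warning][mypod][FailedScheduling] 0/3 nodes are available ..."
        listener.getLogger().printf("[%s][%s][%s] %s%n",
                event.getType(), name, event.getReason(), event.getMessage());
    }

    @Override
    public void onClose(KubernetesClientException cause) {
        if (cause != null) {
            // Surface abnormal watch termination in the build log as well.
            cause.printStackTrace(listener.getLogger());
        }
    }
}
```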