Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-53790] PodTemplate Container Docker image check #497

Merged
merged 45 commits into from Feb 19, 2020

Conversation

pyieh
Copy link
Contributor

@pyieh pyieh commented Jun 10, 2019

Check on ContainerWaitingState if the Docker image can't be found, prints message to build console and cancels and removes job from build queue.

JENKINS-53790
Cleaned up and reforked, previous PR #440

…ass it down to AllContainersPodWatcher, which does a check for invalid docker images used, will print an error msg to the build console and cancel the job to prevent infinte looping
@Vlatombe Vlatombe changed the title PodTemplate Container Docker image check (revised) [JENKINS-53790] PodTemplate Container Docker image check Jun 11, 2019
CHANGELOG.md Outdated
@@ -11,6 +11,10 @@ Unreleased
----------
* Explicit inheritance should override implicit inheritance. Declarative k8s template do not inherit from parent pod template by default. [#480](https://github.com/jenkinsci/kubernetes-plugin/pull/480) [JENKINS-57548](https://issues.jenkins-ci.org/browse/JENKINS-57548)

1.15.7
------
* Check on ContainerWaitingState if the Docker image can't be found, prints message to build console and cancels and removes job from build queue. [#497](https://github.com/jenkinsci/kubernetes-plugin/pull/497) [JENKINS-53790](https://issues.jenkins-ci.org/browse/JENKINS-53790)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to be moved to Unreleased

Queue q = jenkins.getQueue();
for (Queue.Item item : q.getItems()) {
Label itemLabel = item.getAssignedLabel();
if (itemLabel != null && isCorrespondingLabels(itemLabel.getDisplayName(), pod.getMetadata().getName())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intent, but this is out of line here. Eventually could be added to exception handling in the launcher

setProblem(ex);
LOGGER.log(Level.WARNING, String.format("Error in provisioning; agent=%s, template=%s", slave, template), ex);
LOGGER.log(Level.FINER, "Removing Jenkins node: {0}", slave.getNodeName());
try {
slave.terminate();
} catch (IOException | InterruptedException e) {
LOGGER.log(Level.WARNING, "Unable to remove Jenkins node", e);
}
throw Throwables.propagate(ex);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that the KubernetesLauncher should be in charge of canceling the job? We've seen that unless a job is canceled, it'll get put back in the queue and infinitely fail due to that bad pod template.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a better place than in AllContainersRunningPodWatcher, certainly.

@pyieh pyieh changed the title [JENKINS-53790] PodTemplate Container Docker image check [JENKINS-53790] PodTemplate Container Docker image check (revised) Jun 11, 2019
@pyieh pyieh changed the title [JENKINS-53790] PodTemplate Container Docker image check (revised) [JENKINS-53790] PodTemplate Container Docker image check Jun 11, 2019
@Vlatombe Vlatombe added the enhancement Improvements label Jul 6, 2019
@pyieh
Copy link
Contributor Author

pyieh commented Aug 8, 2019

@jglick @Vlatombe please take a look at the new changes.
We introduced a cleaner way of matching the created faulty pod to the build item in the queue that needs to be canceled.
Also added new tests for this.

@jglick jglick self-requested a review August 9, 2019 18:44
CHANGELOG.md Outdated Show resolved Hide resolved
@markjacksonfishing
Copy link
Contributor

@jglick @Vlatombe I had a chat with the person who opened this PR at JWDW yesterday. I explained what they needed to do in order to get this PR correct

@pyieh
Copy link
Contributor Author

pyieh commented Dec 2, 2019

@jglick Looks like you made some changes and have fixed the issue I was seeing.

@jglick
Copy link
Member

jglick commented Dec 2, 2019

Looks like you made some changes

#655

@pyieh pyieh requested a review from jglick December 13, 2019 23:14
@pyieh
Copy link
Contributor Author

pyieh commented Jan 13, 2020

@jglick @Vlatombe can I get a review for this? It was originally showing as all checks passing, but now checks are failing.

@jglick
Copy link
Member

jglick commented Jan 13, 2020

Test failures seem to be infrastructural. Suggest git pull origin master:master && git push to trigger a new build.

Copy link
Member

@Vlatombe Vlatombe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I had several comments pending for a while, sorry for that

String waitingStateMsg = waitingState.getMessage();
if (waitingStateMsg != null && waitingStateMsg.contains("Back-off pulling image")) {
runListener.error("Unable to pull Docker image \""+containerStatus.getImage()+"\". Check if image name is spelled correctly");
throw new IllegalStateException("BAD_DOCKER_IMAGE");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a specific exception type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception is needed in the stack of AllContainersRunningPodWatcher.areAllContainersRunning()<- periodicAwait() <- await().

Creating a new exception type causes a clash in AllContainersRunningPodWatcher.eventReceived() which is overriding the parent method of Watcher.eventReceived()
To allow the aforementioned flow, it also causes the flow of areAllContainersRunning() <- updateState() <- eventReceived()
Would you suggest:

  1. catching the exception at updateState() or eventReceived() such that eventReceived() has to also pass it along.
  2. Remain with the branching usage of the IllegalStateException, which eventReceived() already throws
  3. Some alternative ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make your exception extend RuntimeException.

try (Watch _w = client.pods().inNamespace(namespace1).withName(podName).watch(watcher)) {
watcher.await(template.getSlaveConnectTimeout(), TimeUnit.SECONDS);
} catch (IllegalStateException e) {
if (e.getMessage().equals("BAD_DOCKER_IMAGE")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

/* itemTaskName is format of "part of <ORGANIZATION> <JOB NAME> >> <BRANCH> #<BUILD NUMBER> */
/* <ORGANIZATION> and <BRANCH> are only there if Pipeline created through BlueOcean, else just <JOB NAME> */
private String getJobName(String itemTaskName) {
final String partOfStr = "part of ";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very fragile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this method as the only use was logging in the case there was an unexpected job name and not actually used in the feature functionality.

@@ -14,6 +14,7 @@
import javax.annotation.Nonnull;

import com.google.common.annotations.VisibleForTesting;
import hudson.model.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set your IDE to avoid wildcard imports.

@Vlatombe
Copy link
Member

We have a flake (ContainerExecDecoratorTest.testCommandExecution) which can be ignored, but there are two other tests failing that need to be addressed.

kind / org.csanchez.jenkins.plugins.kubernetes.pipeline.KubernetesDeclarativeAgentTest.declarativeWithNonexistentDockerImage | 3 min 0 sec | 4
kind / org.csanchez.jenkins.plugins.kubernetes.pipeline.KubernetesDeclarativeAgentTest.declarativeWithNonexistentDockerImageLongLabel | 3 min 0 sec | 4
kind / org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecoratorTest.testCommandExecution

@pyieh
Copy link
Contributor Author

pyieh commented Feb 7, 2020

@Vlatombe I've fixed those tests. But the KubernetesPipelineTest.runInPodFromYaml is failing in Jenkins, but it's passing for me locally.

However, KubernetesPipelineTest.cascadeDelete is failing for me locally. Even a clean clone from the jenkinsci/kubernetes-plugin master is failing.
Any suggestions?

Copy link
Member

@Vlatombe Vlatombe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay overall.

@pyieh
Copy link
Contributor Author

pyieh commented Feb 10, 2020

@jglick please review. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements
Projects
None yet
6 participants