Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Robustness improvements #23

Merged
merged 4 commits into from Jan 5, 2017

Conversation

Projects
None yet
5 participants
@jglick
Copy link
Member

jglick commented Dec 29, 2016

@reviewbybees

jglick added some commits Dec 29, 2016

[FIXED JENKINS-38769] If getWorkspace returns null, best to fail the …
…step immediately.

If it seems to be live, but the process is still running after 10s despite a signal, fail the step also.
@reviewbybees

This comment has been minimized.

Copy link

reviewbybees commented Dec 29, 2016

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@recampbell

This comment has been minimized.

Copy link
Member

recampbell commented Dec 30, 2016

Thanks Jesse. What use-cases can be covered by tests here?

@jglick

This comment has been minimized.

Copy link
Member Author

jglick commented Jan 3, 2017

What use-cases can be covered by tests here?

Well that is the question. The problems relate somehow to hung Remoting channels (or some cases of JENKINS-37730 might relate to Timer thread starvation). But as I just noted in JENKINS-40613, it is not obvious how to reproduce the situation which requires this kind of fix. So this PR is intended to be defensive: by code inspection we hope that Jenkins would recover from the problems observed in the field more gracefully, and report more detailed information in the virtual thread dump that could lead to a more fundamental fix.

@jglick jglick referenced this pull request Jan 3, 2017

Merged

[JEP-210] Integration tests with DurableTaskStep #21

7 of 7 tasks complete
@stephenc

This comment has been minimized.

Copy link
Member

stephenc commented Jan 4, 2017

Would be good if you had a test that verified upgrading a system while a durable task is running does not lead to a zombie durable task ;-)

🐝

@jglick

This comment has been minimized.

Copy link
Member Author

jglick commented Jan 5, 2017

a test that verified upgrading a system while a durable task is running does not lead to a zombie durable task

Well, it depends on the nature of the upgrade. The most dangerous changes would be to the serialized form of a StepExecution (transitively, so including nested objects such as a Controller in this case). This patch should only be touching transient fields, though.

@rsandell

This comment has been minimized.

Copy link
Member

rsandell commented Jan 5, 2017

🐝

@jglick jglick merged commit 590da1a into jenkinsci:master Jan 5, 2017

1 check passed

Jenkins This pull request looks good
Details

@jglick jglick deleted the jglick:robustness-JENKINS-40613 branch Jan 5, 2017

@SuppressFBWarnings(value="SE_TRANSIENT_FIELD_NOT_RESTORED", justification="recurrencePeriod is set in onResume, not deserialization")
public static final class Execution extends AbstractStepExecutionImpl implements Runnable {
static final class Execution extends StepExecution implements Runnable {

This comment has been minimized.

Copy link
@jglick

jglick Jan 9, 2017

Author Member

Unfortunately, this does seem to cause a InvalidClassException:

Class does not extend stream superclass

Looking to see if there is some way around that which does not require going back to AbstractStepExecutionImpl.

This comment has been minimized.

Copy link
@jglick

jglick Jan 9, 2017

Author Member

Filed as JENKINS-40909.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.