Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JENKINS-45295 Swarm Client should update labels on the fly when labelsFile changes #110

Merged
merged 1 commit into from Jun 2, 2019

Conversation

basil
Copy link
Member

@basil basil commented Jun 2, 2019

Problem

In JENKINS-45295 and JENKINS-48247, users have reported that when the contents of the file specified by -labelsFile are changed, the client is blindly restarted, interrupting any running jobs. Needless to say, this is disruptive.

Evaluation

LabelFileWatcher has two modes of operation: "soft" updates (which attempt to modify the labels in place) and "hard" updates (which restart the client, interrupting any running jobs). A soft update is first tried, followed by a hard update if the soft update is unsuccessful. I verified experimentally that after changing the labels file, a soft update was attempted and failed. Then the hard update was done, which restarted the client.

Why did the soft update fail? I stepped through the failure and saw that the soft update process made a call to the backend, but the backend returned a 404, claiming that a node by that name could not be found. Looking into this further, I saw that the node created on the backend had a hash appended to it, while the node name being provided by the client did not have this hash. This provided the root cause: we were setting up the label watcher before the node had been named by the server. The requested name was not actually used by the server, but the label watcher thread was still trying to use it when communicating with the server.

Incidentally, the hash is only appended to the node name when -disableClientsUniqueId is not provided, and I was able to work around the issue by passing in -disableClientsUniqueId.

Solution

Create the node on the server before creating the label watcher thread, and use the name returned by the server when setting up the label watcher thread.

Implementation

The implementation was pretty straightforward. I had to wean our test framework off -disableClientsUniqueId in order to be able to test this. I started by removing -disableClientsUniqueId from TestUtils, which entailed writing a custom getComputer function to iterate through nodes with a possible hash in the name. That having been done, we were no longer using -disableClientsUniqueId in PipelineTest, which needed it for tests that restart the agent. I added it back to just those tests. That having been done, I removed disableClientsUniqueId from the label tests.

@basil basil merged commit 1c9e394 into jenkinsci:master Jun 2, 2019
@basil basil deleted the JENKINS-45295 branch June 2, 2019 18:44
@jimklimov
Copy link
Contributor

Nice research write-up. Got one comment though: you were removing this and that test with the option - so now future developers would not have a means of regression testing of how this plugin works WITH the option set? Shouldn't there be two sets of tests in place, with and without the option, that should act according to some set expectations for each case?

@basil
Copy link
Member Author

basil commented Jun 3, 2019

Shouldn't there be two sets of tests in place, with and without the option, that should act according to some set expectations for each case?

Good point. I implemented this in #113.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants