Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIXED JENKINS-26558] Clients should provide a unique ID to be used for ... #26

Merged
merged 9 commits into from Apr 27, 2015

Conversation

stephenc
Copy link
Member

...name collision avoidance

  • The current name collision avoidance uses the requests address, which could very likely be the same for all clients
    as they could be being routed through a HTTP proxy (or two) so that is not a good disambiguator
  • We use a digest of the client's interfaces and MAC addresses and the remoteFSRoot to try and give a consistent ID
  • We ALWAYS append the ID if we have it as otherwise during reconnect the slaves with the same name will shuffle around
    which defeats a lot of the login that Jenkins has internally based on slaves having a consistent name
  • In the event of legacy clients that do not have the ID we will let them connect with their name as long as there
    is no online slave with that name. This does mean that where there are multiple legacy swarm clients with the
    same name, only one can be on-line at any moment in time, but that is an improvement on the current where
    once a shuffle starts, none can stay on-line

@reviewbybees

…or name collision avoidance

- The current name collision avoidance uses the requests address, which could very likely be the same for all clients
  as they could be being routed through a HTTP proxy (or two) so that is not a good disambiguator
- We use a digest of the client's interfaces and MAC addresses and the remoteFSRoot to try and give a consistent ID
- We ALWAYS append the ID if we have it as otherwise during reconnect the slaves with the same name will shuffle around
  which defeats a lot of the login that Jenkins has internally based on slaves having a consistent name
- In the event of legacy clients that do not have the ID we will let them connect with their name as long as there
  is no online slave with that name. This does mean that where there are multiple legacy swarm clients with the
  same name, only one can be on-line at any moment in time, but that is an improvement on the current where
  once a shuffle starts, none can stay on-line
}

public static String getDigestOf(String text) {
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache commons has some nice util classes (org.apache.commons.codec.binary.Hex and
org.apache.commons.codec.digest.DigestUtils) that would make this a bit less bloated, and eliminate the need for toHexString

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that would bloat the swarm-client jar file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I assumed that the inclusion of org.apache.commons.httpclient would drag those with it as dependencies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client already has commons-codec as a dependency.

@@ -292,7 +307,9 @@ protected void createSwarmSlave(Candidate target) throws IOException, Interrupte
+ param("labels", labelStr)
+ param("toolLocations", toolLocationsStr)
+ "&secret=" + target.secret
+ param("mode", options.mode.toUpperCase()));
+ param("mode", options.mode.toUpperCase())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potential case issues in whacky locales... supply a defaultLocale.

@jenkinsadmin
Copy link
Member

Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests

@stephenc
Copy link
Member Author

@jtnord @rsandell I think I have addressed your comments

if (ia instanceof Inet4Address) {
buf.append(ia.getHostAddress()).append('\n');
} else if (ia instanceof Inet6Address) {
buf.append(ia.getHostAddress()).append('\n');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you don't want to include non IP adapters (it's just more entropy?) or collapse this into on if but not enough to fail the review.

@jtnord
Copy link
Member

jtnord commented Apr 27, 2015

Whilst I'm sure that name won't contain characters that are not the same in UTF-8 and ISO 8859-1 as its a digest this is essentially lying.

@stephenc
Copy link
Member Author

@jtnord should be fine now

@jtnord
Copy link
Member

jtnord commented Apr 27, 2015

👍

@mindjiver
Copy link
Contributor

@stephenc @jtnord so it this good enough to merge now?

@stephenc
Copy link
Member Author

@mindjiver you are the maintainer of this plugin, so that is your call. I'm not going to ask you to merge it until either:

  • I get a second +1 from a bee;
  • 12h has passed since my last commit and I have at least one +1

But that is our internal process which does not concern you as a maintainer of a Jenkins plugin. If you see a pull request that you are happy to merge, your plugin, your call.

@mindjiver
Copy link
Contributor

Thanks for the clarification 😃 I'm happy to wait, so any problems with the 1.23 release should trickle in first.

@christ66
Copy link
Member

👍

@stephenc
Copy link
Member Author

@mindjiver this pull request has now successfully completed the CloudBees internal processes and we are now asking you, as the maintainer of the plugin for any additional feedback with a view towards merging and cutting a new release with this fix.

mindjiver added a commit that referenced this pull request Apr 27, 2015
[FIXED JENKINS-26558] Clients should provide a unique ID to be used for ...
@mindjiver mindjiver merged commit 5f622da into jenkinsci:master Apr 27, 2015
@mindjiver
Copy link
Contributor

Lets go!

@stephenc stephenc deleted the jenkins-26558 branch April 29, 2015 10:05
@philbert
Copy link

philbert commented Jun 8, 2015

Can we make it configurable whether or not we want to append this randomised string to the slave name?

The premise of the PR works ok only if you are assuming something is not already being done to make sure the node name is unique (like using a hostname).

This change had broken all our jobs and made the node names long and ugly :(

@stephenc
Copy link
Member Author

stephenc commented Jun 8, 2015

@Phiche the issue here is that a client with the same name connecting causes the whole thing to blow up.

You can have multiple swarm clients connecting from the same host and often cloned machines or VMs can end up with the same hostname.

Due to how the swarm clients connect, if we went with something like "first one to connect gets the name, all the others use the hash" you would still get slave rotation as A-1 may connect first initially but after the restart A-2 connects first and grabs the name without the hash

We cannot assume that a name passed on the CLI guarantees uniqueness either as the command to launch the swarm client may be part of the cloned image's startup scripts.

The workaround for you is to use an older build of the client as legacy clients will connect using their name.

I would view relying on nodes names for ephemeral nodes, such as those provided by swarm, is a bad plan and it would be better to use labels.

However, if you feel strongly towards the other position you could create a pull request and see what @mindjiver feels. My personal opinion is that what you ask is rather dangerous and if I were the maintainer of this project I would reject such a request. However I am not the maintainer, @mindjiver is, and the easiest way to see what he feels is to suggest a pull request for the feature you want and see what he says

@stephenc
Copy link
Member Author

stephenc commented Jun 8, 2015

The simplest fix for your builds is just to add the old name to the --labels parameter when launching the client

If you do that, the builds will run on the correct nodes and the only remaining issue -for you- is the ugly display names.

@philbert
Copy link

philbert commented Jun 8, 2015

Unfortunately the ugly display names is not just a trivial aesthetic. The names are quite meaningful in our case. They are carried the way all through the toolchain including host names, slave build names, jenkins job names, tags on EC2, and in Route53.

I liked having full control over the names even if jenkins dies in a fiery implosion if there was a name collision (which we never saw in hundreds of slaves being created/destroyed).

Downgrading the plugin is the preferred option at the moment.

@mindjiver
Copy link
Contributor

I revisited this thread and I still think it's reasonable as it is. Hashes in node names is not very pretty but this was actually how we ended up doing it at my previous employer, so good that it's available straight in this plugin.

@dawidmalina
Copy link
Contributor

I must agree with @Phiche that this option should be configurable. If you prefer to enable it by default it ok but please add ability to disable this feature. It forcing us to stay with 1.23 version.

If you have hostname collision why don't you run swarm in docker. We are doing it and it is working as a charm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants