Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
check SSH host keys before progressing #6857
Conversation
jameinel
added some commits
Jan 22, 2017
jameinel
changed the title from
WIP: check SSH host keys before progressing
to
check SSH host keys before progressing
Jan 25, 2017
|
!!build!! |
jameinel
added some commits
Jan 26, 2017
|
!!build!! |
|
!!build!! |
jameinel
closed this
Jan 29, 2017
jameinel
deleted the
jameinel:2.1-ssh-keyscan-1646329
branch
Jan 29, 2017
jameinel
restored the
jameinel:2.1-ssh-keyscan-1646329
branch
Jan 29, 2017
|
!!build!! |
jameinel
reopened this
Jan 30, 2017
mjs
reviewed
Jan 30, 2017
Looks good. Just lots of small stuff.
If you haven't already, please review what the logs look like at DEBUG level in real world situations. We don't want a bunch of new log lines for each SSH related command.
| + if !c.noHostKeyChecks { | ||
| + publicKeys, err = c.apiClient.PublicKeys(entity) | ||
| + if err != nil { | ||
| + // We ignore NotFound errors, as we may not have finished registering |
mjs
Jan 30, 2017
Contributor
Would you mind clarifying who/what is the second "we" here? Do you mean the machine agent on the target?
jameinel
Jan 30, 2017
Owner
IIRC, the issue was that a test was expecting a particular context around what entity we were missing keys for. I can fix that by just changing the returned error instead of skipping NotFound.
| - | ||
| - s.testSSHCommandHostAddressRetry(c, true) | ||
| -} | ||
| +/// XXX(jam): 2017-01-25 do we need these functions anymore? We don't really |
jameinel
Jan 30, 2017
Owner
I realized when looking again, only half of them are testing v1. I can mimic what they were doing by changing it to force failing all ssh addresses, which gets them to pass instead. (which I've done)
| +/// s.setForceAPIv1(false) | ||
| +/// | ||
| +/// s.testSSHCommandHostAddressRetry(c, true) | ||
| +/// } | ||
| func (s *SSHSuite) testSSHCommandHostAddressRetry(c *gc.C, proxy bool) { |
| @@ -0,0 +1,14 @@ | ||
| +// Copyright 2016 Canonical Ltd. |
| @@ -0,0 +1,216 @@ | ||
| +// Copyright 2014 Canonical Ltd. |
jameinel
Jan 30, 2017
Owner
This was actually moved from network/reachable.go, so I think the original copyright applies.
| +type hostKeyChecker struct { | ||
| + // AcceptedKeys is a set of the Marshalled PublicKey content. | ||
| + AcceptedKeys set.Strings | ||
| + // Stop will be polled for whether we should stop trying to do any work |
| + // first, still exit. | ||
| + select { | ||
| + case h.Accepted <- h.HostPort: | ||
| + case <-h.Stop: |
mjs
Jan 30, 2017
Contributor
Doesn't this mean that hostKeyAccepted gets returned when Stop is closed? That seems wrong.
jameinel
Jan 30, 2017
Owner
Fair point. It really doesn't matter as any error will close our SSH connection. And if we got Stop then nobody is listening for whether this connection is successful because either another connection was successful or we got a timeout.
Anyway, I added a "hostKeyAcceptedButStopped".
| + } | ||
| + return hostKeyAccepted | ||
| + } | ||
| + logger.Debugf("host key for %s not in our accepted set: use --debug --log-level=TRACE to see actual key", debugName) |
mjs
Jan 30, 2017
Contributor
I'm not so sure about incorporating command line switching in the error message. That seems like the wrong level of detail for this area of the code. Perhaps use the fact that hostKeyNotInList was returned to trigger logging along these lines in the cmd code?
jameinel
Jan 30, 2017
Owner
I could just log it immediately. (I certainly have the public key right here), but I found those log messages are super noisy, cause its a whole SSH public key stanza like:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5f/PLMoq2Ub8av4iIY9kRFSwJasr7RiS8z3OGC3WOSGOayNX2ku7JpCfhn+kyETauCPjxT7064rA0O9YeRiAEOE8JxrWTSTOxU6QKuGqO0Ch0C/mkFL/Avd1L+EUuxvY2RSAPYX6dk46xRULsoogwBqYj64Tp5to8jiVRGY7JuyBH0wtIuFhIc1FT2NqQ6GXDBW76HzInKBUBfrihG/hfXraCCMm7F0GKx0NleFqUxC3pzo2mKa/cGPhI7EoSYJNajh8nxQOAWHUUjyqHeWxaceK5rdtbfnInLsySYueCRO/QltDaA7ME8lWST/z9FcyjSMIViji+4/597ORMeThr jameinel@Link
I was trying to give people a pointer to how they could find out what it was. I can just change this to "check TRACE logging for details" if you prefer.
The one caveat is that it is not trivial to figure out how to enable TRACE logging for "juju ssh". As it is both the above flags in order to enable it. (you need --debug to send log messages to stderr and you need the latter to drop it to TRACE level.)
jameinel
Jan 30, 2017
Owner
logger.Debugf("host key for %s not in our accepted set: log at TRACE to see raw keys", debugName)
better?
| + // TODO(jam): 2017-01-24 One limitation of our algorithm, is that we don't | ||
| + // try to limit the negotiation of the keys to our set of possible keys. | ||
| + // For example, say we only know about the RSA key for the remote host, but | ||
| + // it has been updated to use a ECDSA key as well. We |
| + // no need to log these two messages, that's already been done | ||
| + // in hostKeyCallback | ||
| + if !strings.Contains(err.Error(), hostKeyAccepted.Error()) && | ||
| + !strings.Contains(err.Error(), hostKeyNotInList.Error()) { |
mjs
Jan 30, 2017
Contributor
Why the string comparison instead of using errors.Cause and an equality check?
jameinel
Jan 30, 2017
Owner
So this is just to suppress yet-another log message. I figure it is worth logging a message if SSH fails for reasons I don't understand.
I can't use errors.Cause because the error is being returned to Gocrypto, which then reformats it into its own Error structure using:
fmt.Errorf("ssh: handshake failed: %v", err)
https://github.com/golang/crypto/blob/master/ssh/client.go#L77
I'd love to use Cause, but that's only for Juju errors.
| + // in hostKeyCallback | ||
| + if !strings.Contains(err.Error(), hostKeyAccepted.Error()) && | ||
| + !strings.Contains(err.Error(), hostKeyNotInList.Error()) { | ||
| + logger.Debugf("%v", err) |
jameinel
Jan 30, 2017
Owner
So either it got to the HostKeyCheck that we do above, and we get nice messages, or we don't get this far. The other messages are already at DEBUG level, so you don't see any of them unless you explicitly ask for --debug.
I gave an example of what 'juju ssh --debug 0' looks like in the pull request. Feedback on whether that is too verbose or not is more than welcome. It seemed a reasonable amount of information to actually be able to debug, without being so verbose as to be clutter/actually hard to parse.
| + timeout time.Duration | ||
| +} | ||
| + | ||
| +func (r *reachableChecker) FindHost(hostPorts []network.HostPort, publicKeys []string) (network.HostPort, error) { |
| @@ -0,0 +1,197 @@ | ||
| +// Copyright 2014 Canonical Ltd. |
| + c.Assert(err, jc.ErrorIsNil) | ||
| + c.Logf("listening on %q", hostPort) | ||
| + | ||
| + shutdown := make(chan struct{}, 0) |
| +// do Key exchange to set up the encrypted conversation. | ||
| +// We return the address where the SSH service is listening, and a channel | ||
| +// callers must close when they want the service to stop. | ||
| +func CreateSSHServer(c *gc.C, privateKeys ...string) (string, chan struct{}) { |
mjs
Jan 30, 2017
Contributor
This is quite similar in structure to testTCPServer. Could this take a flag or something to make it not do any key exchange so that it can be used for both SSH and plain TCP testing? (avoiding some duplicated code)
jameinel
Jan 30, 2017
Owner
I had considered sharing the TCP side of the connection, went ahead and finished that per your advice. Turned the TCP version into a callback function, and SSH just uses that callback to negotiate an SSH session.
|
I actively cleaned up a lot of the "SSH" related dialing statements, and trimmed it down to something nice. You can check with "juju ssh --debug 0" as mentioned earlier. It does list out what we are dialing, and when we are trying SSH handshaking, etc, but the messages are generally much more understandable than they used to be, and I avoided a lot of things like IP addresses being repeated 2x in the same message, etc. |
jameinel
added some commits
Jan 30, 2017
|
I'm pretty sure Menno intended that I should land it as long as I addressed his questions, so $$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Generating tarball failed |
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ Ping test failure seems Inconsistent |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ the only failure appears to be in 'grant' which seems we aren't prompting for a password, which seems a known bug in the representative-tests suite:
|
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
jameinel commentedJan 23, 2017
•
Edited 1 time
-
jameinel
Jan 25, 2017
Rather than just checking if we can get to port 22 on hosts that we'd like to connect to, when we know the remote hosts keys we can do the ssh handshake and assert that the key we see is in our list of acceptable host keys.
This should address LP:#1646329
To test this, you can use an old Juju to
from outside the machine should also show a 172.19.0.1 address which means we have one address we can't talk to and one that is duplicated with our host machine
With an old Juju trying to do
Should sometimes fail because it sees it can get to 172.19.0.1 but that is not the actual host we're looking for.
With the new Juju you can do:
And you should see something like:
I tried to find a fair trade on the strings so that it is useful. I'm not 100% sold on dumping the public key information by default, but I figured if it is failing, it is likely to be the most helpful information we can dump. If you do use:
juju ssh --debug --log-level=TRACE 0You'll see all of the public keys that we've found, as well as some of the other API call results.