check SSH host keys before progressing #6857

Merged
merged 20 commits into from Jan 30, 2017

Conversation

Projects
None yet
3 participants
Owner

jameinel commented Jan 23, 2017

Rather than just checking if we can get to port 22 on hosts that we'd like to connect to, when we know the remote hosts keys we can do the ssh handshake and assert that the key we see is in our list of acceptable host keys.

This should address LP:#1646329

To test this, you can use an old Juju to

  • bootstrap an lxd controller to work in
$ juju bootstrap lxd test-lxd
$ juju switch controller
  • create a device on your host machine that has the same IP address as a device in the container
$ sudo brctl addbr br-xxx
$ sudo ip a add 172.19.0.1 dev br-xxx
  • go into the container and configure its device in the same way
  • you must also restart the jujud agent for it to report the new IP address
$ juju ssh 0
$ sudo brctl addbr br-yyy
$ sudo ip a add 172.19.01 dev br-yyy
$ sudo brctl addbr br-zzz
$ sudo ip a add 172.18.0.1 dev br-zzz
$ sudo service jujud-machine-0 restart
  • now from the outside
$ juju show-machine 0
  • should show 3 IP addresses. Usually a 10.* address and now a 172.19.0.1 address and a 172.18.0.1
$ ip a s

from outside the machine should also show a 172.19.0.1 address which means we have one address we can't talk to and one that is duplicated with our host machine

With an old Juju trying to do

$ juju ssh 0

Should sometimes fail because it sees it can get to 172.19.0.1 but that is not the actual host we're looking for.
With the new Juju you can do:

$ juju ssh --debug 0

And you should see something like:

15:57:21 DEBUG juju.network.ssh reachable.go:140 dialing "172.19.0.1:22" to check host keys
15:57:21 DEBUG juju.network.ssh reachable.go:140 dialing "10.139.15.152:22" to check host keys
15:57:21 DEBUG juju.network.ssh reachable.go:153 connected to "172.19.0.1:22", initiating ssh handshake
15:57:21 DEBUG juju.network.ssh reachable.go:153 connected to "10.139.15.152:22", initiating ssh handshake
15:57:21 DEBUG juju.network.ssh reachable.go:140 dialing "172.18.0.1:22" to check host keys
15:57:21 DEBUG juju.network.ssh reachable.go:99 host key for 172.19.0.1:22 not in our accepted set:  use --debug --log-level=TRACE to see actual key
15:57:21 DEBUG juju.network.ssh reachable.go:86 accepted host key for: 10.139.15.152:22
15:57:21 INFO  juju.network.ssh reachable.go:200 found 10.139.15.152:22 has an acceptable ssh key
15:57:21 DEBUG juju.cmd.juju.commands ssh_common.go:369 using target "0" address "10.139.15.152"
...
15:57:22 DEBUG juju.network.ssh reachable.go:143 dial "172.18.0.1:22" failed with: dial tcp 172.18.0.1:22: i/o timeout

I tried to find a fair trade on the strings so that it is useful. I'm not 100% sold on dumping the public key information by default, but I figured if it is failing, it is likely to be the most helpful information we can dump. If you do use:
juju ssh --debug --log-level=TRACE 0
You'll see all of the public keys that we've found, as well as some of the other API call results.

jameinel added some commits Jan 22, 2017

start factoring out the 'ssh reachable' code.
We want to integrate with the golang crypto ssh library so that
we not only check that we can get a TCP connection to the port, but
also so that we can check that there is a valid SSH that is presenting
the right public key on the other side.
Also, our code was causing the goroutines to block indefinitely, as
they'd never be able to send on the channel once we find one that is
correct, so close a done channel to signal they have nothing to do.
(broken) very close to validating real ssh keys.
adding a front-end script to test the validation against live SSH servers.
Messing up something about how we're passing the data around, need to fix that.
ssh-keycheck now passes against real SSH hosts.
The test suites are broken, because now we need an actual SSH server running
on the remote side, since we aren't doing a trivial Dial test. However, the
underlying 'Reachable' primative has been tested against real servers and
does what we'd like it to do.
The tests are now running.
We have an SSH service that can run on a port, and actually properly
does a key exchange handshake.
Cleanup the function a lot by creating a method.
We already needed almost all the information on our struct, so
by turning it into a method we were able to write a nicer function
that just did the host key lookup. Making the logic easier to follow.
Get the test suite happy again.
Change the Command tests to just use a ReachableChecker that returns
a valid host from the list that is supplied. This means we can avoid
all of the Dial semantics. We have solid testing around whether
Reachable does its job in the reachable tests.

@jameinel jameinel changed the title from WIP: check SSH host keys before progressing to check SSH host keys before progressing Jan 25, 2017

Owner

jameinel commented Jan 26, 2017

!!build!!

Owner

jameinel commented Jan 26, 2017

!!build!!

Owner

jameinel commented Jan 29, 2017

!!build!!

@jameinel jameinel closed this Jan 29, 2017

@jameinel jameinel deleted the jameinel:2.1-ssh-keyscan-1646329 branch Jan 29, 2017

@jameinel jameinel restored the jameinel:2.1-ssh-keyscan-1646329 branch Jan 29, 2017

Owner

jameinel commented Jan 30, 2017

!!build!!

@jameinel jameinel reopened this Jan 30, 2017

Looks good. Just lots of small stuff.

If you haven't already, please review what the logs look like at DEBUG level in real world situations. We don't want a bunch of new log lines for each SSH related command.

cmd/juju/commands/ssh_common.go
+ if !c.noHostKeyChecks {
+ publicKeys, err = c.apiClient.PublicKeys(entity)
+ if err != nil {
+ // We ignore NotFound errors, as we may not have finished registering
@mjs

mjs Jan 30, 2017

Contributor

Would you mind clarifying who/what is the second "we" here? Do you mean the machine agent on the target?

@jameinel

jameinel Jan 30, 2017

Owner

IIRC, the issue was that a test was expecting a particular context around what entity we were missing keys for. I can fix that by just changing the returned error instead of skipping NotFound.

cmd/juju/commands/ssh_unix_test.go
-
- s.testSSHCommandHostAddressRetry(c, true)
-}
+/// XXX(jam): 2017-01-25 do we need these functions anymore? We don't really
@mjs

mjs Jan 30, 2017

Contributor

I think ditch these

@jameinel

jameinel Jan 30, 2017

Owner

I realized when looking again, only half of them are testing v1. I can mimic what they were doing by changing it to force failing all ssh addresses, which gets them to pass instead. (which I've done)

cmd/juju/commands/ssh_unix_test.go
+/// s.setForceAPIv1(false)
+///
+/// s.testSSHCommandHostAddressRetry(c, true)
+/// }
func (s *SSHSuite) testSSHCommandHostAddressRetry(c *gc.C, proxy bool) {
@mjs

mjs Jan 30, 2017

Contributor

If the above are removed then I think this is in use either

network/ssh/package_test.go
@@ -0,0 +1,14 @@
+// Copyright 2016 Canonical Ltd.
@@ -0,0 +1,216 @@
+// Copyright 2014 Canonical Ltd.
@mjs

mjs Jan 30, 2017

Contributor

2017?

@jameinel

jameinel Jan 30, 2017

Owner

This was actually moved from network/reachable.go, so I think the original copyright applies.

network/ssh/reachable.go
+type hostKeyChecker struct {
+ // AcceptedKeys is a set of the Marshalled PublicKey content.
+ AcceptedKeys set.Strings
+ // Stop will be polled for whether we should stop trying to do any work
@mjs

mjs Jan 30, 2017

Contributor

nit: blank lines before the docstrings please

network/ssh/reachable.go
+ // first, still exit.
+ select {
+ case h.Accepted <- h.HostPort:
+ case <-h.Stop:
@mjs

mjs Jan 30, 2017

Contributor

Doesn't this mean that hostKeyAccepted gets returned when Stop is closed? That seems wrong.

@jameinel

jameinel Jan 30, 2017

Owner

Fair point. It really doesn't matter as any error will close our SSH connection. And if we got Stop then nobody is listening for whether this connection is successful because either another connection was successful or we got a timeout.

Anyway, I added a "hostKeyAcceptedButStopped".

network/ssh/reachable.go
+ }
+ return hostKeyAccepted
+ }
+ logger.Debugf("host key for %s not in our accepted set: use --debug --log-level=TRACE to see actual key", debugName)
@mjs

mjs Jan 30, 2017

Contributor

I'm not so sure about incorporating command line switching in the error message. That seems like the wrong level of detail for this area of the code. Perhaps use the fact that hostKeyNotInList was returned to trigger logging along these lines in the cmd code?

@jameinel

jameinel Jan 30, 2017

Owner

I could just log it immediately. (I certainly have the public key right here), but I found those log messages are super noisy, cause its a whole SSH public key stanza like:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5f/PLMoq2Ub8av4iIY9kRFSwJasr7RiS8z3OGC3WOSGOayNX2ku7JpCfhn+kyETauCPjxT7064rA0O9YeRiAEOE8JxrWTSTOxU6QKuGqO0Ch0C/mkFL/Avd1L+EUuxvY2RSAPYX6dk46xRULsoogwBqYj64Tp5to8jiVRGY7JuyBH0wtIuFhIc1FT2NqQ6GXDBW76HzInKBUBfrihG/hfXraCCMm7F0GKx0NleFqUxC3pzo2mKa/cGPhI7EoSYJNajh8nxQOAWHUUjyqHeWxaceK5rdtbfnInLsySYueCRO/QltDaA7ME8lWST/z9FcyjSMIViji+4/597ORMeThr jameinel@Link

I was trying to give people a pointer to how they could find out what it was. I can just change this to "check TRACE logging for details" if you prefer.
The one caveat is that it is not trivial to figure out how to enable TRACE logging for "juju ssh". As it is both the above flags in order to enable it. (you need --debug to send log messages to stderr and you need the latter to drop it to TRACE level.)

@jameinel

jameinel Jan 30, 2017

Owner

logger.Debugf("host key for %s not in our accepted set: log at TRACE to see raw keys", debugName)

better?

network/ssh/reachable.go
+ // TODO(jam): 2017-01-24 One limitation of our algorithm, is that we don't
+ // try to limit the negotiation of the keys to our set of possible keys.
+ // For example, say we only know about the RSA key for the remote host, but
+ // it has been updated to use a ECDSA key as well. We
@mjs

mjs Jan 30, 2017

Contributor

Incomplete

+ // no need to log these two messages, that's already been done
+ // in hostKeyCallback
+ if !strings.Contains(err.Error(), hostKeyAccepted.Error()) &&
+ !strings.Contains(err.Error(), hostKeyNotInList.Error()) {
@mjs

mjs Jan 30, 2017

Contributor

Why the string comparison instead of using errors.Cause and an equality check?

@jameinel

jameinel Jan 30, 2017

Owner

So this is just to suppress yet-another log message. I figure it is worth logging a message if SSH fails for reasons I don't understand.
I can't use errors.Cause because the error is being returned to Gocrypto, which then reformats it into its own Error structure using:
fmt.Errorf("ssh: handshake failed: %v", err)
https://github.com/golang/crypto/blob/master/ssh/client.go#L77

I'd love to use Cause, but that's only for Juju errors.

+ // in hostKeyCallback
+ if !strings.Contains(err.Error(), hostKeyAccepted.Error()) &&
+ !strings.Contains(err.Error(), hostKeyNotInList.Error()) {
+ logger.Debugf("%v", err)
@mjs

mjs Jan 30, 2017

Contributor

Trace? It's likely there will be a few of these per SSH command right?

@jameinel

jameinel Jan 30, 2017

Owner

So either it got to the HostKeyCheck that we do above, and we get nice messages, or we don't get this far. The other messages are already at DEBUG level, so you don't see any of them unless you explicitly ask for --debug.
I gave an example of what 'juju ssh --debug 0' looks like in the pull request. Feedback on whether that is too verbose or not is more than welcome. It seemed a reasonable amount of information to actually be able to debug, without being so verbose as to be clutter/actually hard to parse.

network/ssh/reachable.go
+ timeout time.Duration
+}
+
+func (r *reachableChecker) FindHost(hostPorts []network.HostPort, publicKeys []string) (network.HostPort, error) {
@mjs

mjs Jan 30, 2017

Contributor

Doc string?

@@ -0,0 +1,197 @@
+// Copyright 2014 Canonical Ltd.
@mjs

mjs Jan 30, 2017

Contributor

2017?

network/ssh/reachable_test.go
+ c.Assert(err, jc.ErrorIsNil)
+ c.Logf("listening on %q", hostPort)
+
+ shutdown := make(chan struct{}, 0)
@mjs

mjs Jan 30, 2017

Contributor

the 0 is unnecessary

+// do Key exchange to set up the encrypted conversation.
+// We return the address where the SSH service is listening, and a channel
+// callers must close when they want the service to stop.
+func CreateSSHServer(c *gc.C, privateKeys ...string) (string, chan struct{}) {
@mjs

mjs Jan 30, 2017

Contributor

This is quite similar in structure to testTCPServer. Could this take a flag or something to make it not do any key exchange so that it can be used for both SSH and plain TCP testing? (avoiding some duplicated code)

@jameinel

jameinel Jan 30, 2017

Owner

I had considered sharing the TCP side of the connection, went ahead and finished that per your advice. Turned the TCP version into a callback function, and SSH just uses that callback to negotiate an SSH session.

mjs approved these changes Jan 30, 2017

Owner

jameinel commented Jan 30, 2017

I actively cleaned up a lot of the "SSH" related dialing statements, and trimmed it down to something nice. You can check with "juju ssh --debug 0" as mentioned earlier. It does list out what we are dialing, and when we are trying SSH handshaking, etc, but the messages are generally much more understandable than they used to be, and I avoided a lot of things like IP addresses being repeated 2x in the same message, etc.

jameinel added some commits Jan 30, 2017

Review feedback.
Lots of tweaks suggested by Menno, should clean things up.
Owner

jameinel commented Jan 30, 2017

I'm pretty sure Menno intended that I should land it as long as I addressed his questions, so $$merge$$

Contributor

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Jan 30, 2017

Build failed: Generating tarball failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10145

Owner

jameinel commented Jan 30, 2017

$$merge$$

Contributor

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Jan 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10146

Owner

jameinel commented Jan 30, 2017

$$merge$$ Ping test failure seems Inconsistent

Contributor

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Jan 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10147

Owner

jameinel commented Jan 30, 2017

$$merge$$ the only failure appears to be in 'grant' which seems we aren't prompting for a password, which seems a known bug in the representative-tests suite:

    # This scenario is pre-macaroon.
    # See https://bugs.launchpad.net/bugs/1621532
    child.expect('(?i)password')
    child.sendline(user.name + '_password_2')
    # end non-macaroon.
Contributor

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit c0643f8 into juju:2.1 Jan 30, 2017

1 check failed

github-check-merge-juju Built PR, ran unit tests, and tested LXD deploy. Use !!.*!! to request another build. IE, !!build!!, !!retry!!
Details

@jameinel jameinel deleted the jameinel:2.1-ssh-keyscan-1646329 branch Apr 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment