
check SSH host keys before progressing #6857

Merged
merged 20 commits into juju:2.1 from 2.1-ssh-keyscan-1646329 on Jan 30, 2017


jameinel
Member

@jameinel jameinel commented Jan 23, 2017

Rather than just checking whether we can get to port 22 on the hosts we'd like to connect to, when we know the remote host's keys we can do the SSH handshake and assert that the key we see is in our list of acceptable host keys.

This should address LP:#1646329

To test this, you can use an old Juju to

  • bootstrap an lxd controller to work in
$ juju bootstrap lxd test-lxd
$ juju switch controller
  • create a device on your host machine that has the same IP address as a device in the container
$ sudo brctl addbr br-xxx
$ sudo ip a add 172.19.0.1 dev br-xxx
  • go into the container and configure its device in the same way
  • you must also restart the jujud agent for it to report the new IP address
$ juju ssh 0
$ sudo brctl addbr br-yyy
$ sudo ip a add 172.19.0.1 dev br-yyy
$ sudo brctl addbr br-zzz
$ sudo ip a add 172.18.0.1 dev br-zzz
$ sudo service jujud-machine-0 restart
  • now from the outside
$ juju show-machine 0
  • should show 3 IP addresses: usually a 10.* address, plus the new 172.19.0.1 and 172.18.0.1 addresses
$ ip a s
  • run from outside the machine, this should also show a 172.19.0.1 address. That means the machine has one address we can't talk to (172.18.0.1) and one that duplicates an address on our host machine (172.19.0.1)

With an old Juju trying to do

$ juju ssh 0

should sometimes fail, because it finds it can reach 172.19.0.1, but that is not the actual host we're looking for.
With the new Juju you can do:

$ juju ssh --debug 0

And you should see something like:

15:57:21 DEBUG juju.network.ssh reachable.go:140 dialing "172.19.0.1:22" to check host keys
15:57:21 DEBUG juju.network.ssh reachable.go:140 dialing "10.139.15.152:22" to check host keys
15:57:21 DEBUG juju.network.ssh reachable.go:153 connected to "172.19.0.1:22", initiating ssh handshake
15:57:21 DEBUG juju.network.ssh reachable.go:153 connected to "10.139.15.152:22", initiating ssh handshake
15:57:21 DEBUG juju.network.ssh reachable.go:140 dialing "172.18.0.1:22" to check host keys
15:57:21 DEBUG juju.network.ssh reachable.go:99 host key for 172.19.0.1:22 not in our accepted set:  use --debug --log-level=TRACE to see actual key
15:57:21 DEBUG juju.network.ssh reachable.go:86 accepted host key for: 10.139.15.152:22
15:57:21 INFO  juju.network.ssh reachable.go:200 found 10.139.15.152:22 has an acceptable ssh key
15:57:21 DEBUG juju.cmd.juju.commands ssh_common.go:369 using target "0" address "10.139.15.152"
...
15:57:22 DEBUG juju.network.ssh reachable.go:143 dial "172.18.0.1:22" failed with: dial tcp 172.18.0.1:22: i/o timeout

I tried to find a fair trade-off in the log strings so that they are useful. I'm not 100% sold on dumping the public key information by default, but if the check is failing, it is likely to be the most helpful information we can dump. If you do use:
juju ssh --debug --log-level=TRACE 0
You'll see all of the public keys that we've found, as well as some of the other API call results.

We want to integrate with the golang crypto ssh library so that
we not only check that we can get a TCP connection to the port, but
also check that there is a valid SSH server on the other side that is
presenting the right public key.
Also, our code was causing the goroutines to block indefinitely, as
they'd never be able to send on the channel once we find one that is
correct, so close a done channel to signal they have nothing to do.
Adding a front-end script to test the validation against live SSH servers.
Something about how we're passing the data around is broken; need to fix that.
The test suites are broken, because now we need an actual SSH server running
on the remote side, since we aren't doing a trivial Dial test. However, the
underlying 'Reachable' primitive has been tested against real servers and
does what we'd like it to do.
We have an SSH service that can run on a port, and actually properly
does a key exchange handshake.
We already needed almost all the information on our struct, so
by turning it into a method we were able to write a nicer function
that just does the host key lookup, making the logic easier to follow.
Change the Command tests to just use a ReachableChecker that returns
a valid host from the list that is supplied. This means we can avoid
all of the Dial semantics. We have solid testing around whether
Reachable does its job in the reachable tests.
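Roughly, such a test double can be very small. This sketch simplifies the types (plain "host:port" strings instead of network.HostPort), and `reachableChecker`, `firstHostChecker` and `failingChecker` are hypothetical names; the failing variant is one way to force every SSH address to fail in tests.

```go
package main

import (
	"errors"
	"fmt"
)

// reachableChecker is a simplified, hypothetical stand-in for the dependency
// the ssh command uses to pick a host.
type reachableChecker interface {
	FindHost(hostPorts []string, publicKeys []string) (string, error)
}

// firstHostChecker always "finds" the first supplied host, so command tests
// never dial anything or need a real SSH server.
type firstHostChecker struct{}

func (firstHostChecker) FindHost(hostPorts []string, _ []string) (string, error) {
	if len(hostPorts) == 0 {
		return "", errors.New("no host ports supplied")
	}
	return hostPorts[0], nil
}

// failingChecker rejects every address, e.g. to exercise retry paths.
type failingChecker struct{}

func (failingChecker) FindHost([]string, []string) (string, error) {
	return "", errors.New("no reachable host with an accepted key")
}

func main() {
	checkers := []reachableChecker{firstHostChecker{}, failingChecker{}}
	for _, c := range checkers {
		host, err := c.FindHost([]string{"10.0.0.1:22", "10.0.0.2:22"}, nil)
		fmt.Printf("%q %v\n", host, err)
	}
}
```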
@jameinel jameinel changed the title WIP: check SSH host keys before progressing check SSH host keys before progressing Jan 25, 2017
@jameinel
Member Author

!!build!!

@jameinel
Member Author

!!build!!

@jameinel
Member Author

!!build!!

@jameinel jameinel closed this Jan 29, 2017
@jameinel jameinel deleted the 2.1-ssh-keyscan-1646329 branch January 29, 2017 14:27
@jameinel jameinel restored the 2.1-ssh-keyscan-1646329 branch January 29, 2017 14:27
@jameinel
Member Author

!!build!!

@jameinel jameinel reopened this Jan 30, 2017

@mjs mjs left a comment


Looks good. Just lots of small stuff.

If you haven't already, please review what the logs look like at DEBUG level in real world situations. We don't want a bunch of new log lines for each SSH related command.

if !c.noHostKeyChecks {
publicKeys, err = c.apiClient.PublicKeys(entity)
if err != nil {
// We ignore NotFound errors, as we may not have finished registering

Would you mind clarifying who/what is the second "we" here? Do you mean the machine agent on the target?

Member Author


IIRC, the issue was that a test was expecting a particular context around what entity we were missing keys for. I can fix that by just changing the returned error instead of skipping NotFound.


s.testSSHCommandHostAddressRetry(c, true)
}
/// XXX(jam): 2017-01-25 do we need these functions anymore? We don't really

I think ditch these

Member Author


I realized on a second look that only half of them are testing v1. I can mimic what they were doing by forcing all SSH addresses to fail, which gets them to pass instead (which I've done).

/// s.setForceAPIv1(false)
///
/// s.testSSHCommandHostAddressRetry(c, true)
/// }

func (s *SSHSuite) testSSHCommandHostAddressRetry(c *gc.C, proxy bool) {

If the above are removed, then I think this isn't in use either.

@@ -0,0 +1,14 @@
// Copyright 2016 Canonical Ltd.

2017

Member Author


👍

@@ -0,0 +1,216 @@
// Copyright 2014 Canonical Ltd.

2017?

Member Author


This was actually moved from network/reachable.go, so I think the original copyright applies.

// in hostKeyCallback
if !strings.Contains(err.Error(), hostKeyAccepted.Error()) &&
!strings.Contains(err.Error(), hostKeyNotInList.Error()) {
logger.Debugf("%v", err)

Trace? It's likely there will be a few of these per SSH command, right?

Member Author


So either it got to the HostKeyCheck that we do above, and we get nice messages, or we don't get this far. The other messages are already at DEBUG level, so you don't see any of them unless you explicitly ask for --debug.
I gave an example of what 'juju ssh --debug 0' looks like in the pull request description. Feedback on whether that is too verbose is more than welcome. It seemed a reasonable amount of information to actually be able to debug, without being so verbose that it becomes clutter and hard to parse.

timeout time.Duration
}

func (r *reachableChecker) FindHost(hostPorts []network.HostPort, publicKeys []string) (network.HostPort, error) {

Doc string?

Member Author


👍

@@ -0,0 +1,197 @@
// Copyright 2014 Canonical Ltd.

2017?

c.Assert(err, jc.ErrorIsNil)
c.Logf("listening on %q", hostPort)

shutdown := make(chan struct{}, 0)

the 0 is unnecessary

// do Key exchange to set up the encrypted conversation.
// We return the address where the SSH service is listening, and a channel
// callers must close when they want the service to stop.
func CreateSSHServer(c *gc.C, privateKeys ...string) (string, chan struct{}) {

This is quite similar in structure to testTCPServer. Could this take a flag or something to make it not do any key exchange so that it can be used for both SSH and plain TCP testing? (avoiding some duplicated code)

Member Author


I had considered sharing the TCP side of the connection, and went ahead and finished that per your advice. I turned the TCP version into one driven by a callback function, and the SSH version just uses that callback to negotiate an SSH session.

@jameinel
Member Author

I cleaned up a lot of the SSH-related dialing log statements and trimmed them down to something nicer. You can check with "juju ssh --debug 0" as mentioned earlier. It does list what we are dialing, when we are trying the SSH handshake, etc., but the messages are generally much more understandable than they used to be, and I avoided things like IP addresses being repeated twice in the same message.

Lots of tweaks suggested by Menno, should clean things up.
@jameinel
Member Author

I'm pretty sure Menno intended that I should land it as long as I addressed his questions, so $$merge$$

@jujubot
Collaborator

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot
Collaborator

jujubot commented Jan 30, 2017

Build failed: Generating tarball failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10145

@jameinel
Member Author

$$merge$$

@jujubot
Collaborator

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot
Collaborator

jujubot commented Jan 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10146

@jameinel
Member Author

$$merge$$ Ping test failure seems intermittent

@jujubot
Collaborator

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot
Collaborator

jujubot commented Jan 30, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10147

@jameinel
Member Author

$$merge$$ the only failure appears to be in 'grant', where it seems we aren't prompting for a password, which appears to be a known bug in the representative-tests suite:

    # This scenario is pre-macaroon.
    # See https://bugs.launchpad.net/bugs/1621532
    child.expect('(?i)password')
    child.sendline(user.name + '_password_2')
    # end non-macaroon.

@jujubot
Collaborator

jujubot commented Jan 30, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit c0643f8 into juju:2.1 Jan 30, 2017
@jameinel jameinel deleted the 2.1-ssh-keyscan-1646329 branch April 22, 2017 16:15
3 participants