Use dynamic HostKeyAlgorithms SSH option for unknown hosts #798

Merged
merged 1 commit into from Apr 7, 2017


fullyint
Contributor

Fixes the Bad protocol 2 host key algorithms error, like this example. Should also resolve #784.

Background

In recent sshd changes (#744), b24f074 made the server only offer secure keys based on ed25519 and rsa. However, many SSH clients default to asking for an ecdsa-based key (less secure). These conditions would result in this experience for users:

  • first connection to a new server pulls down an ecdsa key (SSH client's default preference)
  • server.yml configures the server to no longer be willing to use ecdsa key
  • later the user faces failed SSH connections due to changed host key (server offers key type other than ecdsa)

To avoid the connection failure due to changed host key, b24f074 also added the HostKeyAlgorithms option to Ansible's ssh_args (in ansible.cfg), causing the very first connection to a server to request an ed25519 key that would never need to change. It appeared that this would prevent the changed host key problem for new servers.

The problem

OpenSSH has accepted ed25519 in the HostKeyAlgorithms option (ssh_config) since 6.5p1 (Jan 2014). Although older OSs like Ubuntu 14.04 (April 2014) include a new enough version (OpenSSH 6.6p1) to handle ed25519, some Trellis users are on OSs with older OpenSSH, and the sentiment seems to be that we don't want to require them to update. For example, macOS 10.10.5 (Aug 2015) uses OpenSSH 6.2p2 (May 2013). For these users, OpenSSH fails whenever the HostKeyAlgorithms option includes ed25519: Bad protocol 2 host key algorithms.

Proposed solution

This PR enables Trellis to...

  • use the HostKeyAlgorithms option only for unknown hosts, when there is still a chance to influence which key type will be used
  • use only rsa-based HostKeyAlgorithms for machines with OpenSSH < 6.5
  • use ed25519 and rsa HostKeyAlgorithms for the rest
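The version-based choice in the list above could be sketched roughly as follows. This is only an illustration in Python, not the Jinja logic in this PR: the function names `parse_openssh_version` and `host_key_algorithms` are hypothetical, and the two algorithm lists are assumptions (the exact values Trellis passes may differ).

```python
import re

# Assumed algorithm lists for illustration; the lists Trellis uses may differ.
RSA_ALGORITHMS = ["ssh-rsa-cert-v01@openssh.com", "ssh-rsa"]
ED25519_ALGORITHMS = ["ssh-ed25519-cert-v01@openssh.com", "ssh-ed25519"]


def parse_openssh_version(banner):
    """Extract (major, minor) from text such as 'OpenSSH_6.2p2, ...'.

    Hypothetical helper; returns None when no version is found.
    """
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_key_algorithms(version):
    """Return a comma-separated HostKeyAlgorithms value for a client version."""
    # Unknown or pre-6.5 clients get rsa only, since they reject ed25519.
    if version is None or version < (6, 5):
        return ",".join(RSA_ALGORITHMS)
    # Newer clients prefer ed25519 and fall back to rsa.
    return ",".join(ED25519_ALGORITHMS + RSA_ALGORITHMS)
```

The banner would typically come from `ssh -V`, which writes its version string to stderr. Falling back to rsa when the version cannot be parsed is a conservative choice made for this sketch.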

Users may disable the feature altogether by defining this somewhere in group_vars:

dynamic_host_key_algorithms: false

The feature will disable itself if users specify the --ssh-extra-args CLI option.

Trellis will display this forthright message on the very first connection to a server
(and NOT on subsequent connections):

TASK [connection : Announce which user was selected] ***************************
Note: Ansible will attempt connections as user = root

Note: The host `12.34.56.78` was not detected in known_hosts
so Trellis prompted the host to offer a key type that will work with
the stronger key types Trellis configures on the server. This avoids future
connection failures due to changed host keys. Trellis used this SSH option:

  -o HostKeyAlgorithms=ssh-ed25519

To prevent Trellis from ever using this SSH option, add this to group_vars:

  dynamic_host_key_algorithms: false
ok: [12.34.56.78]

Implementation notes

ansible_ssh_extra_args is an Ansible magic var. It is always defined: either as an empty string or as the content of the --ssh-extra-args CLI option. This PR conditionally loads the var with the desired HostKeyAlgorithms.

ssh-keygen -F <hostname> can be used to check whether a host is in known_hosts. This check feeds the helper vars in roles/connection/defaults/main.yml.
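As a rough illustration of that check, here is a Python sketch (the PR itself does this through helper vars, not Python). The names `keygen_output_has_key` and `host_is_known` are hypothetical. The sketch assumes a host counts as known when `ssh-keygen -F` prints at least one non-comment line, and it deliberately does not rely on the command's exit status.

```python
import subprocess


def keygen_output_has_key(output):
    """True if `ssh-keygen -F` output contains at least one key line."""
    for line in output.splitlines():
        line = line.strip()
        # Lines starting with '#' are commentary, e.g. '# Host ... found: line N'.
        if line and not line.startswith("#"):
            return True
    return False


def host_is_known(host):
    """Ask ssh-keygen whether `host` has an entry in the default known_hosts."""
    try:
        result = subprocess.run(
            ["ssh-keygen", "-F", host],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        # ssh-keygen is missing or not executable: treat the host as unknown.
        return False
    return keygen_output_has_key(result.stdout)
```

Treating an unavailable `ssh-keygen` as "unknown host" is an assumption of this sketch.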

We need to know which hosts to check for status as known/unknown. Consider this example inventory file:

# hosts/production

aliasname ansible_host=12.34.56.78

[production]
aliasname

[web]
aliasname

Ansible's ansible_host magic var will include the most specific info it can find in the inventory file, e.g., the actual IP from the example above. The ansible_host_known helper var in this PR runs the ssh-keygen -F check on ansible_host and evaluates to a boolean.

Now consider an inventory file like the one above, but without the line indicating the IP. Suppose the IP is indicated instead in the local machine's ssh config like this:

Host aliasname
  HostName 12.34.56.78

Ansible can still connect to aliasname because the SSH client sorts out the IP. However, the ansible_host magic var will equal aliasname (not the IP). The known_hosts file will only have the IP, not aliasname, so the ssh-keygen -F on the ansible_host will suggest the host is unknown when it could in fact be known. If only we could get the IP out of the SSH config file...

The ssh_config_host helper var in this PR looks up the ansible_host in the SSH config, then the ssh_config_host_known boolean runs the ssh-keygen -F check on the returned ssh_config_host value. This is the same logic as the ansible_host_known var, but this time applied to the hostname from the SSH config file.
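One way to picture the SSH config lookup is parsing the output of `ssh -G <alias>`, which prints the client's effective configuration as `keyword value` lines. The Python below is a sketch under that assumption; the function name `hostname_from_ssh_g` is hypothetical and this is not the PR's actual helper var.

```python
def hostname_from_ssh_g(output, default):
    """Return the `hostname` value from `ssh -G <alias>` output, else `default`."""
    for line in output.splitlines():
        # Each line looks like 'keyword value'; split on the first whitespace run.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "hostname":
            return parts[1].strip()
    # No hostname line (e.g. `ssh -G` unsupported): fall back to the given name.
    return default
```

For the example config above, output containing the line `hostname 12.34.56.78` would resolve `aliasname` to the IP, which can then be passed to the known_hosts check.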

To zoom back out conceptually, the point of all this host checking is that we only want to specify HostKeyAlgorithms if the machine doesn't already have a key for the host. If the local machine already has an acceptable key but we specify a different HostKeyAlgorithm type, it will cause a host key change error.

So, we check whether there is a key for the host as per the Ansible inventory (ansible_host_known) and as per the ssh config (ssh_config_host_known). You'll notice the condition that both these booleans must be false for the set_fact task to run (the task that loads HostKeyAlgorithms into ansible_ssh_extra_args).
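The combined condition can be restated as a small Python predicate. This is only a paraphrase of the `when` conditions on the set_fact task; the function name `should_set_host_key_algorithms` and its parameter names are hypothetical.

```python
def should_set_host_key_algorithms(dynamic_enabled, ssh_extra_args,
                                   ansible_host_known, ssh_config_host_known):
    """Decide whether to load HostKeyAlgorithms into ansible_ssh_extra_args."""
    return bool(
        dynamic_enabled                 # dynamic_host_key_algorithms not disabled
        and ssh_extra_args == ""        # user did not pass --ssh-extra-args
        and not (ansible_host_known or ssh_config_host_known)  # no key on file yet
    )
```

If either lookup finds an existing key, the option is left out so the key already in known_hosts keeps working.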

Useful for testing

# roles/connection/tasks/main.yml

  - name: Specify preferred HostKeyAlgorithms for unknown hosts
    set_fact:
      ansible_ssh_extra_args: -o HostKeyAlgorithms={{ host_key_algorithms }}
    register: preferred_host_key_algorithms
    when:
      - dynamic_host_key_algorithms | default(true)
      - ansible_ssh_extra_args == ''
      - not (ansible_host_known or ssh_config_host_known)

+ - debug:
+     msg: |
+       ansible_host_known: {{ ansible_host_known }}
+       ssh_config_host: {{ ssh_config_host }}
+       ssh_config_host_known: {{ ssh_config_host_known }}
+       host_key_algorithms: {{ host_key_algorithms }}
+       ansible_ssh_extra_args: {{ ansible_ssh_extra_args }}

Q & A

Q. How does this affect...

  • ... servers last provisioned with Trellis pre-sshd-overhaul?
    A. Only difference from current master is that users will no longer get Bad protocol 2 host key algorithms error.
  • ... servers that HAVE BEEN provisioned with the sshd-overhauled version of Trellis?
    A. No change. If local machine already has ed25519 or rsa key, that key will continue to be used.
  • ... new servers?
    A. No change except users will no longer get Bad protocol 2 host key algorithms error.

Q. Why not just change to rsa-based for everyone, e.g., with an SSH config entry with Host *?

  • A. This explicit directive for HostKeyAlgorithms would require all hosts to send the rsa-type key, causing a host key change for any known_hosts that use a different key type (e.g., A LOT of host key changes for users' hosts not even related to Trellis).
  • A. it's just not necessary
  • A. we let people use the stronger ed25519 when they can

Q. Will some users' OpenSSH versions be too old for the rsa-based algorithms?
A. ssh-rsa-cert-v01@openssh.com appears in the ssh_config man page as far back as OpenSSH_5.9p1 (e.g., on macOS 10.8) and in the codebase for OpenSSH_5.6p1 (e.g., used in macOS 10.7, released July 2011). (Assuming these macOS and OpenSSH pairings are correct.) ssh-rsa is its predecessor and appears in all of the above. Of course, macOS isn't the standard, but the only issue reports have been from macOS 10.10 users. In any case, I doubt we want to support users with OpenSSH versions older than this.

Q. Could the ssh -G (available only in OpenSSH 6.8+) or the ssh-keygen -F fail on some systems?
A. The helper vars in defaults redirect most output to /dev/null (> /dev/null 2>&1) and typically use an || (or) fallback so that a non-zero exit status does not cause a failure.
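The same fail-safe idea, sketched in Python rather than shell: run a command, discard stderr, and return a default instead of raising when the command is missing or exits non-zero. The helper name `run_or_default` is hypothetical and only illustrates the pattern.

```python
import subprocess


def run_or_default(command, default=""):
    """Return the command's stdout, or `default` if it cannot run or fails."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,   # comparable to 2> /dev/null
            universal_newlines=True,
        )
    except OSError:
        # Command not found or not executable.
        return default
    if result.returncode != 0:
        # Comparable to the shell's `|| fallback`.
        return default
    return result.stdout.strip()
```

For example, `run_or_default(["ssh", "-G", "aliasname"], default="")` would yield an empty string on a client too old to support `-G`, assuming such a client exits non-zero for the unknown option.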

Q. Why not move these OpenSSH-related tasks into the connection role next to this new SSH-related set_fact task?
A. Because those tasks rely on sshd role vars available only in the next play, the play that actually has the sshd role. In addition, the connection role also runs in deploy.yml, which doesn't have the sshd role nor its vars.

@swalkinshaw
Member

🚀

Resulting HostKeyAlgorithms option...
- is omitted if host already in known_hosts
- is omitted if `dynamic_host_key_algorithms: false` (default: true)
- includes ed25519 types only if local machine has OpenSSH 6.5+
@fullyint fullyint force-pushed the dynamic-host-key-algorithms branch from 2ab35ce to b277316 Compare April 7, 2017 02:13
@fullyint fullyint merged commit c1371e3 into master Apr 7, 2017
@fullyint fullyint deleted the dynamic-host-key-algorithms branch April 7, 2017 02:18
Development

Successfully merging this pull request may close these issues.

SSH HostKey algorithms different from native SSH can cause issues