[GitHub] Handshake failed: knownhosts: key mismatch #490

Open
pkit opened this issue Nov 16, 2021 · 43 comments

@pkit

pkit commented Nov 16, 2021

Started getting these errors out of the blue on all clusters.

{"level":"error","ts":"2021-11-16T18:21:07.474Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/user/repository', error: ssh: handshake failed: knownhosts: key mismatch"}

Doing find -name known_hosts in the pod produces nothing.
Restarting the pod = same error immediately.
What's going on, where's the known_hosts file?

@stefanprodan
Member

What's going on, where's the known_hosts file?

The known_hosts file is in the same secret as the SSH key, please see the docs here https://fluxcd.io/docs/components/source/gitrepositories/#ssh-authentication

@stefanprodan
Member

I'm getting the same error on my cluster:

✗ GitRepository reconciliation failed: 'unable to clone 'ssh://git@github.com/stefanprodan/my-demo-fleet': ssh: handshake failed: knownhosts: key mismatch'

Looks like an issue with GitHub host keys.

@kmannuz

kmannuz commented Nov 16, 2021

I am also seeing this error in the last 30 minutes on 3 clusters that had been previously working fine

@kingdonb
Member

According to: https://github.blog/2021-09-01-improving-git-protocol-security-github/

Today is the day that host keys get rotated at GitHub. There are two new host keys in the blog post, one for ECDSA and another for Ed25519.

stefanprodan changed the title from "handshake failed: knownhosts: key mismatch" to "[GitHub] Handshake failed: knownhosts: key mismatch" on Nov 16, 2021

@stefanprodan
Member

stefanprodan commented Nov 16, 2021

Ok so rotating the SSH key fixes it.

Before:

$ k -n flux-system get secret flux-system -o json | jq '.data | map_values(@base64d)'
{
  "identity": "-----BEGIN PRIVATE KEY-----\n",
  "identity.pub": "ecdsa-sha2-nistp384 \n",
  "known_hosts": "github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ=="
}

After:

{
  "identity": "-----BEGIN PRIVATE KEY-----\n",
  "identity.pub": "ecdsa-sha2-nistp384 \n",
  "known_hosts": "github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg="
}
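
To see which host key types a secret currently pins, the decoded known_hosts value can be summarized per entry. The following is a minimal sketch, not part of Flux: the function name list_known_host_types is made up for illustration, and it assumes plain (unhashed) known_hosts lines of the form "host keytype key".

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads known_hosts content on stdin and prints
# "<host> <key type>" for every entry, skipping blank lines and comments.
list_known_host_types() {
  awk 'NF >= 3 && $1 !~ /^#/ { print $1, $2 }'
}

# Example usage against a cluster (assumes kubectl access and GNU base64):
#   kubectl -n flux-system get secret flux-system \
#     -o jsonpath='{.data.known_hosts}' | base64 -d | list_known_host_types
```

A result that lists only ssh-rsa for github.com corresponds to the "Before" state shown above.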

@pkit
Author

pkit commented Nov 16, 2021

The known_hosts file is in the same secret as the SSH key, please see the docs here https://fluxcd.io/docs/components/source/gitrepositories/#ssh-authentication

Cool, thanks, but I do see the "old" keys when doing keyscan on the nodes.
Somehow only the pods see the "new" ones.
It makes sense though.

@stefanprodan
Member

stefanprodan commented Nov 16, 2021

GitHub has changed its SSH host keys from DSA to ECDSA!
https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster: kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

@pkit
Author

pkit commented Nov 16, 2021

Updated known_hosts in flux-system secret manually everywhere.
Seems to work now.

@seh

seh commented Nov 16, 2021

If you'd like a short program to do it:

#!/usr/bin/env bash

set -e -u -o pipefail

# NB: The Ed25519-format key does not work with Flux.
for secret_name in flux-system repo-2 repo-3; do
  kubectl --namespace=flux-system \
          patch secret "${secret_name}" \
          --patch='
stringData:
  known_hosts: >
    github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg='
done

kubectl --namespace=flux-system rollout restart deployment source-controller
kubectl --namespace=flux-system rollout status deployment/source-controller --watch

@brianpham

Confirmed. Working for us now as well after deleting the secret and bootstrapping again.

@stefanprodan
Member

@seh the secret is not mounted inside source-controller; instead, the controller reads the secret from the Kubernetes API before each Git operation. I don't think you need rollout restart.

@seh

seh commented Nov 16, 2021

I was finding that it sits idle, apparently due to a backed-off timer, such that it won't try again for a while after several consecutive failures. Restarting it caused it to try again immediately.

@ellieayla

ellieayla commented Nov 16, 2021

Variant on the above script: https://gist.github.com/ellieayla/76352313c4f5939db6d2268fb70b0d48

Then either wait or request each GitRepository to reconcile.
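
Requesting reconciliation for each GitRepository could be scripted roughly as follows. This is a sketch under stated assumptions rather than a tested procedure: the helper name emit_reconcile_commands is hypothetical, and it only prints the flux commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<namespace> <name>" pairs on stdin and prints
# one "flux reconcile source git" command per GitRepository.
emit_reconcile_commands() {
  while read -r ns name; do
    # Skip empty or incomplete lines.
    [ -n "${ns}" ] && [ -n "${name}" ] || continue
    printf 'flux reconcile source git %s --namespace=%s\n' "${name}" "${ns}"
  done
}

# Example usage (assumes kubectl and the flux CLI are installed):
#   kubectl get gitrepositories.source.toolkit.fluxcd.io --all-namespaces \
#     --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
#     | emit_reconcile_commands | sh
```

Dropping the final "| sh" prints the commands without executing them.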

@poteat

poteat commented Nov 16, 2021

Confirm that we are getting this on our cluster as well suddenly.

@ellieayla

Note that with libgit2, the reported error is "unable to clone: Certificate", as in #397 and #433.

aledegano added a commit to aledegano/terraform-provider-flux that referenced this issue Nov 17, 2021
Update the documentation and example to use the
new SSH public key that GitHub deployed after Nov 16th 2021.

See also fluxcd/source-controller#490

Signed-off-by: Alessandro Degano <a.degano@gmail.com>
@ghost

ghost commented Nov 17, 2021

@stefanprodan maybe add to the comment that if you edit the secrets manually, you should restart the source-controller after updating the secret, otherwise source-controller might overwrite the secret with the old values.

We've stopped the source-controller before updating the secrets and then started it again just to be safe:

kubectl scale deploy/source-controller --replicas=0

update the secrets

kubectl scale deploy/source-controller --replicas=1

Edit: the old ssh-rsa value gets added back somehow. Maybe kustomize-controller also needs to be restarted.

@stefanprodan
Member

otherwise source-controller might overwrite the secret with the old values.

source-controller doesn't alter secrets. It can't even do that, our RBAC allows the controller read-only access to secrets.

@stefanprodan
Member

Edit: the old ssh-rsa value gets added back somehow. Maybe kustomize-controller also needs to be restarted.

You clearly don't use bootstrap or you've stored the SSH keys in Git. If so, then update the secret in Git as well.

@rtjfarrimond

rtjfarrimond commented Nov 17, 2021

Unfortunately, this was a predictable incident. It felt wrong to me, as a Flux user, to be providing a known hosts entry as part of the terraform bootstrap process (from this example) for precisely this reason.

To prevent another incident of similar scale in the future, why not give the source-controller the responsibility of maintaining the known hosts file? Given the URLs of the sources it has to reconcile, it should presumably be fairly straightforward to use something like ssh-keyscan to keep the file up to date.

@stefanprodan
Member

stefanprodan commented Nov 17, 2021

It felt wrong to me, as a Flux user, to be providing a known hosts entry as part of the bootstrap process for precisely this reason.

Bootstrap does no such thing, Flux itself generates the known_hosts entries. As a Flux user, you are never asked to provide host keys.

@sebastian-dyroff

Are multiple known_hosts with different algorithms supported by the go-git implementation?

@rtjfarrimond

Bootstrap does no such thing, Flux itself generates the known_hosts entries. As a Flux user, you are never asked to provide host keys.

@stefanprodan this example from the Flux Terraform provider examples certainly does.

@stefanprodan
Member

@rtjfarrimond I was referring to flux bootstrap not Terraform.

@rtjfarrimond

rtjfarrimond commented Nov 17, 2021

I understand, but to be clear, in my original comment I was referring to the terraform bootstrap process. Updated the original comment to reflect this.

@hiddeco
Member

hiddeco commented Nov 17, 2021

To prevent another incident of similar scale in the future, why not give the source-controller the responsibility of maintaining the known hosts file?

How can a known_hosts file, which is used as a trust store, be automatically maintained by a service? That would render the known_hosts useless and allow any MITM attack to happen.

@ghost

ghost commented Nov 17, 2021

We have two git sources, flux-system and flux-manifests. We've updated the known_hosts for both but for flux-manifests the known_hosts keeps getting replaced with the ssh-rsa key:

{
  "level": "debug",
  "ts": "2021-11-17T10:28:10.304Z",
  "logger": "events",
  "msg": "Normal",
  "object": {
    "kind": "Kustomization",
    "namespace": "flux-system",
    "name": "flux-system",
    "uid": "138b16f7-ca30-458e-a0b1-811b2900fa2c",
    "apiVersion": "kustomize.toolkit.fluxcd.io/v1beta2",
    "resourceVersion": "189896097"
  },
  "reason": "info",
  "message": "Secret/flux-system/flux-manifests configured"
}

Is known_hosts getting updated by the libgit2 callback?

@ghost

ghost commented Nov 17, 2021

Sorry, my bad. It looks like we have the secrets for flux-manifests in Git and flux is just reconciling the secrets.

@hiddeco
Member

hiddeco commented Nov 17, 2021

The Secret files are not managed or written to by any of the controllers, but only used for read operations. If something is overwriting your Secret, it must come from something within your configuration.
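
One way to look for such a source is to search a checkout of the Git repository that Flux reconciles for the old entry. The sketch below is illustrative only: find_stale_known_hosts is a made-up name, and it assumes the manifests keep known_hosts in plain text (for example under stringData). Base64-encoded or encrypted Secrets will not match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists files under a directory that still contain a
# plain-text "github.com ssh-rsa" known_hosts entry.
find_stale_known_hosts() {
  local dir="${1:-.}"
  # -r: recurse, -l: print file names only, -F: fixed-string match.
  # grep exits non-zero when nothing matches; treat that as "no stale files".
  grep -rlF 'github.com ssh-rsa ' "${dir}" || true
}

# Example usage from the root of a repository checkout:
#   find_stale_known_hosts .
```

Any file it reports is a candidate for the configuration that keeps restoring the old value.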

@rtjfarrimond

rtjfarrimond commented Nov 17, 2021

How can a known_hosts file, which is used as a trust store, be automatically maintained by a service? That would render the known_hosts useless and allow any MITM attack to happen.

If the process that updates known_hosts runs on the same box, as the same user that uses the known_hosts file, where would the vector for a MITM attack be?

@hiddeco
Member

hiddeco commented Nov 17, 2021

By it automatically accepting the offered keys.

If your network is compromised and hostname.com suddenly starts serving traffic from compromised.com with a different host key, which is then automatically accepted by the controller, checking the host key no longer has any value.

@rtjfarrimond

rtjfarrimond commented Nov 17, 2021

If your network is compromised and hostname.com suddenly starts serving traffic from compromised.com with a different host key, which is then automatically accepted by the controller, checking the host key no longer has any value.

Yep, that makes sense, I withdraw my bad idea! Thanks :)

@rtjfarrimond

@stefanprodan Here is a PR to update the known_hosts in the terraform example I linked earlier.

@seh

seh commented Nov 17, 2021

Two things lengthened my fixing of this problem across ~20 clusters:

  • I forgot that I generate Secret manifests for additional GitRepository objects with kustomize, and deploy them with Flux.
    The old known hosts entry was in my VCS, and I forgot that I needed to update it there too, as opposed to just in the Secret objects in the Kubernetes clusters.
  • Flux got stuck in a "deadly embrace" due to the Kustomization "spec.wait" field being true on some of my Kustomizations.
    • A Kustomization was waiting on the health of
      • a Kustomization it creates whose GitRepository needed
        • a Secret to be updated by the first Kustomization which was stuck waiting on the health of
          • ...

I had to patch the top-level Kustomization to set "spec.wait" to false, then force Flux to reconcile it. It took many tries before the health checking timeouts expired and Flux finally both updated and then started using the new Secret "data.known_hosts" field value.

aledegano added a commit to aledegano/terraform-provider-flux that referenced this issue Nov 17, 2021
Update the documentation and example to use the
new SSH public key that GitHub deployed after Nov 16th 2021.

See also fluxcd/source-controller#490

Also update the suggested algorithm for the SSH key to use with
GitHub to ensure [compatibility with libgit2](https://github.blog/2021-09-01-improving-git-protocol-security-github/#libgit2-and-other-git-clients).

Signed-off-by: Alessandro Degano <a.degano@gmail.com>
@devozs

devozs commented Nov 17, 2021

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster: kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

Thanks for the suggestion. In my case I also had to do the following:

  • Deleting the secret was not enough; I also had to delete the Git source:
    flux delete source git flux-system
  • If you also have an additional repo (for example, one referenced from the flux-infra repo): remember to bootstrap that repo as well, update the persisted Flux secret YAML and, as mentioned above, delete the secret and the Git source.

@gautamr

gautamr commented Nov 17, 2021

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster: kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

Worked for us.

@oscaromeu

oscaromeu commented Nov 18, 2021

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster: kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

Worked for me as well, thanks! 👯

@cbyad

cbyad commented Nov 18, 2021

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster: kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

Worked for us too, thanks!

@kaaboaye

kaaboaye commented Nov 18, 2021

In my case, bootstrap fails to create a new secret:

flux bootstrap github --owner=USER --repository=REPO --branch=flux2 --personal --path=clusters/CLUSTER --components-extra=image-reflector-controller,image-automation-controller

► connecting to github.com
► cloning branch "flux2" from Git repository "https://github.com/USER/REPO.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ component manifests are up to date
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
► generating source secret
✔ public key: ecdsa-sha2-nistp384 key 
✗ multiple errors occurred: 
- POST https://api.github.com/repos/USER/REPO/keys: 404 Not Found []
- the requested resource was not found

Switching from SSH to HTTPS helped.

@stefanprodan
Member

stefanprodan commented Nov 18, 2021

@kaaboaye your user token doesn’t have permission to create deploy keys, you need to be a repo admin.

@ninja9k1

I am having a very similar, if not the same, error while setting up gitops on my local kind cluster following this tutorial:
https://docs.gitops.weave.works/docs/getting-started/

{"level":"error","ts":"2021-11-16T18:21:07.474Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/user/repository', error: ssh: handshake failed: knownhosts: key mismatch"}

This is a brand new instantiation which I have just fired up a few minutes ago as of this writing. kubectl -n flux-system delete secret flux-system does not work as this is not done through flux bootstrap. Any ideas?

@sbernheim

I am having a very similar, if not the same, error while setting up gitops on my local kind cluster following this tutorial: https://docs.gitops.weave.works/docs/getting-started/

{"level":"error","ts":"2021-11-16T18:21:07.474Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/user/repository', error: ssh: handshake failed: knownhosts: key mismatch"}

This is a brand new instantiation which I have just fired up a few minutes ago as of this writing. kubectl -n flux-system delete secret flux-system does not work as this is not done through flux bootstrap. Any ideas?

@ninja9k1 - I assume by now that you've resolved this issue for your local gitops installation, but I'll add a response to this Issue in case anyone else finds it and needs the same solution.

The gitops CLI uses your local user's ~/.ssh/known_hosts file as the source for this key, and this error generally means that you need to remove the old RSA host key and add the new ECDSA host key in that file.

This command should remove the existing key:

ssh-keygen -R github.com

You can then either use this command to insert the new key without actually trying to SSH to GitHub:

ssh-keyscan -t ecdsa github.com >> ~/.ssh/known_hosts

Or start an SSH connection to github.com and let GitHub disconnect you after the connection succeeds:

ssh git@github.com
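
Both approaches accept whatever key the network returns. A more cautious variant compares the scanned key against an entry obtained over a trusted channel before appending it. The following is a minimal sketch of that idea, not something the gitops CLI or Flux provides: the helper name append_if_pinned and the PINNED variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads candidate known_hosts lines on stdin (for example
# from ssh-keyscan) and appends a line to the given file only if it is
# identical to the pinned entry. Returns non-zero if no candidate matched.
append_if_pinned() {
  local pinned="$1" file="$2" line matched=1
  while IFS= read -r line; do
    if [ "${line}" = "${pinned}" ]; then
      printf '%s\n' "${line}" >> "${file}"
      matched=0
    fi
  done
  return "${matched}"
}

# Example usage (assumes PINNED holds an entry obtained over a trusted channel,
# such as the ecdsa-sha2-nistp256 line quoted earlier in this thread):
#   ssh-keyscan -t ecdsa github.com 2>/dev/null \
#     | append_if_pinned "${PINNED}" ~/.ssh/known_hosts
```

If the scanned key differs from the pinned one, nothing is written and the function returns a failure status.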

@olivercp3

olivercp3 commented Dec 23, 2022

The error (Handshake failed: knownhosts: key mismatch) is still reported even after a new ECDSA host key is generated. My bootstrap command is:

flux bootstrap git --url=ssh://git@XXX.com/DP/k8s-deploy-2.git --private-key-file=/root/.ssh/id_ecdsa --branch dev

The secret is generated by bootstrap, so why is there still a knownhosts: key mismatch?

@braadaaay

I managed to get SFTP working, see here on #2948
