Building Flatcar SIG images on Azure with OpenSSH 9.0 fails #859

Closed
invidian opened this issue Apr 13, 2022 · 10 comments · Fixed by #1035
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@invidian
Member

What steps did you take and what happened:

Running FLATCAR_VERSION=current make build-azure-sig-flatcar on version a09b089 currently fails with the following error:

    sig-flatcar: Setting up proxy adapter for Ansible....
==> sig-flatcar: Executing Ansible: ansible-playbook -e packer_build_name="sig-flatcar" -e packer_builder_type=azure-arm --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url=https://github.com/containerd/containerd/releases/download/v1.6.1/cri-containerd-cni-1.6.1-linux-amd64.tar.gz containerd_sha256=e01da1ad4a41a71e0fef52b1f0ed08980b808f1d7c904c9956c24afb8236d6f0 pause_image=k8s.gcr.io/pause:3.6 containerd_additional_settings= containerd_cri_socket=/var/run/containerd/containerd.sock containerd_version=1.6.1 crictl_url=https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.23.0/crictl-v1.23.0-linux-amd64.tar.gz crictl_sha256=b754f83c80acdc75f93aba191ff269da6be45d0fc2d3f4079704e7d1424f1ca8 crictl_source_type=http custom_role= custom_role_names="" disable_public_repos=false extra_debs= extra_repos= extra_rpms= http_proxy= https_proxy= kubeadm_template=etc/kubeadm.yml kubernetes_cni_http_source=https://github.com/containernetworking/plugins/releases/download kubernetes_cni_http_checksum=sha256:https://storage.googleapis.com/k8s-artifacts-cni/release/v0.8.7/cni-plugins-linux-amd64-v0.8.7.tgz.sha256 kubernetes_http_source=https://dl.k8s.io/release kubernetes_container_registry=k8s.gcr.io kubernetes_rpm_repo=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64 kubernetes_rpm_gpg_key="https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg" kubernetes_rpm_gpg_check=True kubernetes_deb_repo="https://apt.kubernetes.io/ kubernetes-xenial" kubernetes_deb_gpg_key=https://packages.cloud.google.com/apt/doc/apt-key.gpg kubernetes_cni_deb_version=0.8.7-00 kubernetes_cni_rpm_version=0.8.7-0 kubernetes_cni_semver=v0.8.7 kubernetes_cni_source_type=http kubernetes_semver=v1.21.10 kubernetes_source_type=http kubernetes_load_additional_imgs=false kubernetes_deb_version=1.21.10-00 kubernetes_rpm_version=1.21.10-0 no_proxy= pip_conf_file= python_path=/opt/pypy/site-packages 
redhat_epel_rpm=https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm epel_rpm_gpg_key= reenable_public_repos=true remove_extra_repos=false systemd_prefix=/etc/systemd sysusr_prefix=/opt sysusrlocal_prefix=/opt load_additional_components=false additional_registry_images=false additional_registry_images_list= additional_url_images=false additional_url_images_list= additional_executables=false additional_executables_list= additional_executables_destination_path= build_target=virt --extra-vars ansible_python_interpreter=/opt/pypy/bin/pypy --extra-vars  -e ansible_ssh_private_key_file=/tmp/ansible-key1464663093 -i /tmp/packer-provisioner-ansible1359111898 /home/invidian/data/workspaces/clusterapi-flatcar/image-builder/images/capi/ansible/node.yml
    sig-flatcar:
    sig-flatcar: PLAY [all] *********************************************************************
    sig-flatcar:
    sig-flatcar: TASK [Gathering Facts] *********************************************************
    sig-flatcar: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via scp: Warning: Permanently added '[127.0.0.1]:40711' (RSA) to the list of known hosts.\r\nbash: line 1: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n", "unreachable": true}
    sig-flatcar:
    sig-flatcar: PLAY RECAP *********************************************************************
    sig-flatcar: default                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
<omitted>
Build 'sig-flatcar' errored after 5 minutes 873 milliseconds: Error executing Ansible: Non-zero exit status: exit status 4

What did you expect to happen:

The build should succeed.

Anything else you would like to add:

Environment:

Project (Image Builder for Cluster API):

Additional info for Image Builder for Cluster API related issues:

  • OS (e.g. from /etc/os-release, or cmd /c ver): Arch Linux
  • Packer Version: 1.8.0
  • Packer Provider:
  • Ansible Version: core 2.11.5
  • Cluster-api version (if using):
  • Kubernetes version: (use kubectl version):

/kind bug

@invidian
Member Author

Ooh, wait. It might be an issue with my local machine: I don't have the sftp-server binary either after the latest OpenSSH update to version 9.0p1-1...

@invidian
Member Author

Downgrading my local OpenSSH to 8.9p1-1 makes it work. I'll keep investigating.

@invidian invidian changed the title Building Flatcar SIG images on Azure fails Building Flatcar SIG images on Azure with OpenSSH 9.0 fails Apr 14, 2022
@kopiczko
Contributor

Based on: hashicorp/packer#11783 (comment)

Replacing:

      "extra_arguments": [
        "--extra-vars",
        "{{user `ansible_common_vars`}}",
        "--extra-vars",
        "{{user `ansible_extra_vars`}}"
      ],

with

      "extra_arguments": [
        "--scp-extra-args", "'-O'",
        "--extra-vars",
        "{{user `ansible_common_vars`}}",
        "--extra-vars",
        "{{user `ansible_extra_vars`}}"
      ],

did the trick for me.
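
The same change can be applied programmatically to a provisioner definition that has already been loaded from the Packer template. The following is only a minimal sketch: the function name `add_legacy_scp_flag` is hypothetical, and it assumes the provisioner is a plain dictionary with an optional `extra_arguments` list, as in the snippet above.

```python
import copy


def add_legacy_scp_flag(provisioner):
    """Return a copy of an Ansible provisioner dict with "--scp-extra-args '-O'" prepended.

    Hypothetical helper: it mirrors the manual edit shown above and is not
    part of Packer or image-builder.
    """
    patched = copy.deepcopy(provisioner)
    args = patched.setdefault("extra_arguments", [])
    # Only add the flag once, so the helper can be applied repeatedly.
    if "--scp-extra-args" not in args:
        args[:0] = ["--scp-extra-args", "'-O'"]
    return patched


provisioner = {
    "type": "ansible",
    "extra_arguments": [
        "--extra-vars",
        "{{user `ansible_common_vars`}}",
        "--extra-vars",
        "{{user `ansible_extra_vars`}}",
    ],
}
patched = add_legacy_scp_flag(provisioner)
print(patched["extra_arguments"][:2])
```

The input dictionary is left unmodified, so the original template data can still be written back unchanged for hosts with an older OpenSSH.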

@invidian
Member Author

So I think the root cause lies in the Ansible provisioner for Packer: hashicorp/packer-plugin-ansible#100.

As a workaround, we could try disabling the proxy for the provisioner, though I guess that may break some other scenarios. Alternatively, we could use the workaround proposed by @kopiczko above.

@manuh-L

manuh-L commented Jul 11, 2022

Hello guys,

Did you manage to fix this or find a solution?
I'm facing the same issue with VMware vSphere templates:

vsphere-clone.MGlobal: Setting up proxy adapter for Ansible....
==> vsphere-clone.MGlobal: Executing Ansible: ansible-playbook -e packer_build_name="MGlobal" -e packer_builder_type=vsphere-clone -e packer_http_addr=192.168.100.253:0 --ssh-extra-args '-o IdentitiesOnly=yes' -v -e ansible_ssh_private_key_file=/tmp/ansible-key3249663531 -i /tmp/packer-provisioner-ansible3844623631 /home/gitlab-runner/builds/NrNECaSf/0/manuh/vmug-demo-packer/default-config.yml
vsphere-clone.MGlobal: Using /etc/ansible/ansible.cfg as config file
vsphere-clone.MGlobal:
vsphere-clone.MGlobal: PLAY [all] *********************************************************************
vsphere-clone.MGlobal:
vsphere-clone.MGlobal: TASK [Create groups] ***********************************************************
vsphere-clone.MGlobal: failed: [default] (item={'name': 'local'}) => {"ansible_loop_var": "item", "item": {"name": "local"}, "msg": "Failed to connect to the host via scp: bash: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n", "unreachable": true}
vsphere-clone.MGlobal: failed: [default] (item={'name': 'admins'}) => {"ansible_loop_var": "item", "item": {"name": "admins"}, "msg": "Failed to connect to the host via scp: bash: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n", "unreachable": true}
vsphere-clone.MGlobal: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "All items completed", "results": [{"ansible_loop_var": "item", "item": {"name": "local"}, "msg": "Failed to connect to the host via scp: bash: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n", "unreachable": true}, {"ansible_loop_var": "item", "item": {"name": "admins"}, "msg": "Failed to connect to the host via scp: bash: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n", "unreachable": true}]}
vsphere-clone.MGlobal:
vsphere-clone.MGlobal: PLAY RECAP *********************************************************************
vsphere-clone.MGlobal: default : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0

@invidian
Member Author

@manuh-L, please give the changes from #907 a try. We would appreciate feedback :)

@kopiczko
Contributor

It looks like the newest scp versions use SFTP under the hood:

        -O Use the legacy SCP protocol for file transfers instead of the
           SFTP protocol. Forcing the use of the SCP protocol may be
           necessary for servers that do not implement SFTP, for
           backwards-compatibility for particular filename wildcard
           patterns and for expanding paths with a ‘~’ prefix for older
           SFTP servers.

It would be nice to get SFTP to work with Flatcar instead. I think that would be the ultimate solution for this issue.

@manuh-L

manuh-L commented Jul 18, 2022

Thanks

@manuh-L, please give the changes from #907 a try. We would appreciate feedback :)

Thanks, I had a demo to present, so the quick fix for me at the time was to downgrade. I'll try it ASAP.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 16, 2022
@invidian
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 16, 2022
invidian pushed a commit to kinvolk/image-builder that referenced this issue Dec 14, 2022
This allows a workaround for issue kubernetes-sigs#859 when the build host uses OpenSSH
version 9.0+, which uses the SFTP protocol for scp instead of the legacy SCP
protocol. This currently causes builds to fail with the error message
below when Ansible tries to copy files over to the remote host.

bash: line 1: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n

This commit allows users with a new OpenSSH version to specify
ANSIBLE_SCP_EXTRA_ARGS="-O" to fix their builds. I plan to automate this
in another commit, as it should be relatively simple and harmless.

Refs kubernetes-sigs#859.
invidian added a commit to kinvolk/image-builder that referenced this issue Dec 14, 2022
Since OpenSSH 9.0, 'scp' uses the SFTP protocol instead of the legacy SCP protocol,
which causes build errors like:

bash: line 1: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n

However, the -O option is not available in older OpenSSH versions, so we
cannot always set it. To provide a better out-of-the-box
experience for users with newer versions of OpenSSH, we conditionally ensure
-O is used when the OpenSSH version in use requires it.

See kubernetes-sigs#859 and
hashicorp/packer-plugin-ansible#100 for more details.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
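
The conditional behaviour described in the commit message above could be sketched roughly as follows. This is not the Makefile logic from the commit. It is a minimal Python illustration that assumes the version banner has the usual `ssh -V` shape (for example `OpenSSH_9.0p1, ...`) and uses the 9.0 threshold stated in the commit message. The function names are hypothetical.

```python
import re


def parse_openssh_version(banner):
    """Extract (major, minor) from an `ssh -V` style banner, or None if unrecognised."""
    match = re.search(r"OpenSSH_(?:for_Windows_)?(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def scp_extra_args(banner):
    """Return "-O" when the local OpenSSH is 9.0 or newer, otherwise an empty string.

    Assumption: an unrecognised banner is treated like an old OpenSSH,
    because passing -O to an scp that does not know the flag would fail.
    """
    version = parse_openssh_version(banner)
    if version is not None and version >= (9, 0):
        return "-O"
    return ""


print(repr(scp_extra_args("OpenSSH_9.0p1, OpenSSL 1.1.1n  15 Mar 2022")))
print(repr(scp_extra_args("OpenSSH_8.9p1, OpenSSL 1.1.1n  15 Mar 2022")))
```

The returned string is what a build wrapper would export as ANSIBLE_SCP_EXTRA_ARGS before invoking the build.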
invidian pushed a commit to kinvolk/image-builder that referenced this issue Dec 16, 2022
Below commit messages from squashed commits:

images/capi/packer: extract ansible common SSH args to a single place

This removes the repetition of '-o IdentitiesOnly=yes' so that it stays
consistent across all platforms, and reduces the amount of churn
when adding new default arguments, as we plan to do as part of mitigating
the issue with ssh-rsa keys (kubernetes-sigs#905).

images/capi/packer: allow specifying extra scp arguments for Ansible

This allows a workaround for issue kubernetes-sigs#859 when the build host uses OpenSSH
version 9.0+, which uses the SFTP protocol for scp instead of the legacy SCP
protocol. This currently causes builds to fail with the error message
below when Ansible tries to copy files over to the remote host.

bash: line 1: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n

This commit allows users with a new OpenSSH version to specify
ANSIBLE_SCP_EXTRA_ARGS="-O" to fix their builds. I plan to automate this
in another commit, as it should be relatively simple and harmless.

Refs kubernetes-sigs#859.

images/capi/packer: allow using ssh-rsa keys with OpenSSH 8.8+

Since OpenSSH 8.8, the ssh-rsa key algorithm is disabled by default,
which currently causes builds to fail for builders which use OpenSSH
version 8.8+.

The problematic keys are generated by the Ansible plugin for Packer, and the
problem is currently being discussed in issue
hashicorp/packer-plugin-ansible#69.

An alternative would be to consider using the `use_proxy=false` option in
the plugin; however, we are not sure what the implications could be.
Given that building a machine should be a rather short process, the
workaround seems acceptable and allows images to be built successfully
out of the box on more distributions.

In implementation, 'PubkeyAcceptedKeyTypes' is used instead of
'PubkeyAcceptedAlgorithms', as it provides better backward
compatibility, since 'PubkeyAcceptedAlgorithms' is only available since
OpenSSH version 8.4.

See issue kubernetes-sigs#905 for more details.

Co-authored-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>

images/capi/Makefile: set ANSIBLE_SCP_EXTRA_ARGS="-O" when needed

Since OpenSSH 9.0, 'scp' uses the SFTP protocol instead of the legacy SCP protocol,
which causes build errors like:

bash: line 1: /usr/lib/sftp-server: No such file or directory\nscp: Connection closed\r\n

However, the -O option is not available in older OpenSSH versions, so we
cannot always set it. To provide a better out-of-the-box
experience for users with newer versions of OpenSSH, we conditionally ensure
-O is used when the OpenSSH version in use requires it.

See kubernetes-sigs#859 and
hashicorp/packer-plugin-ansible#100 for more details.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
Co-authored-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
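
The squashed commit above describes keeping the common SSH arguments for Ansible in a single place and adding 'PubkeyAcceptedKeyTypes' to them. A minimal sketch of that idea follows. The function name is hypothetical, and the value `+ssh-rsa` is an assumption: the commit message names the option but does not quote the value it uses.

```python
def common_ssh_args(allow_ssh_rsa=True):
    """Build one string suitable for ansible-playbook's --ssh-extra-args.

    Hypothetical helper; the option value "+ssh-rsa" is assumed, not taken
    from the commit.
    """
    # '-o IdentitiesOnly=yes' is the argument already visible in the build logs.
    options = ["IdentitiesOnly=yes"]
    if allow_ssh_rsa:
        # The older option name is used, as the commit message explains,
        # for better backward compatibility.
        options.append("PubkeyAcceptedKeyTypes=+ssh-rsa")
    return " ".join(f"-o {option}" for option in options)


print(common_ssh_args())
```

Keeping the list in one function means a new default argument only has to be added once rather than per platform, which is the motivation the first squashed commit gives.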