Node provisioning stuck on Waiting for ssh when provisioning from an airgapped environment with proxy #28411

Closed
bmdepesa opened this issue Aug 14, 2020 · 7 comments
Labels
area/provisioning-rke1 Provisioning issues with RKE1 kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement release-note Note this issue in the milestone's release notes team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support

@bmdepesa
Member

bmdepesa commented Aug 14, 2020

rancher/rancher:v2.4.5

Expected:

  • The node provisions successfully

Notes:

  • The Rancher instance is accessible through a public NLB to external machines
  • Outbound connections from Rancher are configured to go through the proxy
  • curl commands from within the Rancher container go through the proxy successfully
  • ssh from within the Rancher container fails unless using a jumphost
root@35d3c0b70171:/var/lib/rancher# ssh -i mykey.pem -o ProxyCommand="ssh -W %h:%p host.compute.amazonaws.com:3131" ubuntu@host
root@host.compute.amazonaws.com: Permission denied (publickey).
ssh_exchange_identification: Connection closed by remote host

# this example above fails due to the key error, but typically this would time out 

root@35d3c0b70171:/var/lib/rancher# ssh ubuntu@host
ssh: connect to host <host> port 22: Connection timed out
  • The same configuration in a non-airgapped setup (but still using a proxy) allows a cluster to be provisioned successfully, though no ssh traffic is seen traversing the proxy
@bmdepesa bmdepesa self-assigned this Aug 14, 2020
@sangeethah sangeethah added the kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement label Aug 15, 2020
@sangeethah sangeethah added this to the v2.4.x milestone Aug 18, 2020
@maggieliu maggieliu modified the milestones: v2.4.x, v2.4.6 Aug 20, 2020
@maggieliu maggieliu modified the milestones: v2.4.6, v2.3.9 Aug 20, 2020
@maggieliu maggieliu modified the milestones: v2.3.9, v2.4.x Aug 20, 2020
@deniseschannon deniseschannon modified the milestones: v2.4.x2, v2.4 - Backlog, v2.x - Backlog Jan 27, 2021
@Jono-SUSE-Rancher
Contributor

Does this now work with the proxy changes that were made in 2.5.6?

@bmdepesa
Member Author

bmdepesa commented Mar 4, 2021

@slickwarren can you retest this?

@slickwarren
Contributor

On rancher/rancher:v2.5.6, I was able to reproduce this issue:

Expected:

  • The node provisions successfully

Notes:

  • The Rancher instance is accessible through a public NLB to external machines
  • Outbound connections from Rancher are configured to go through the proxy
  • curl commands from within the Rancher container go through the proxy successfully
  • ssh from within the Rancher container fails when using public IP of other nodes
root@35d3c0b70171:/var/lib/rancher# ssh ubuntu@host
ssh: connect to host <host> port 22: Connection timed out
  • The same configuration in a non-airgapped setup (but still using a proxy) allows a cluster to be provisioned successfully, though no ssh traffic is seen traversing the proxy

@deniseschannon deniseschannon added the team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support label Dec 1, 2021
@jiaqiluo
Member

Root cause

Rancher invokes the embedded rancher-machine binary to provision nodes in the cluster providers, but rancher-machine lacks the ability to use a proxy to connect to those providers.

What was fixed, or what changes have occurred

  • in rancher-machine, add support for using the proxy when proxy-related env vars are detected
  • in Rancher, update the rancher-machine binary to the new version, and package the new dependencies into the image
  • in Rancher, update the command so that rancher-machine uses an external ssh binary, which can use the proxy
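
To illustrate the last bullet: OpenSSH does not read `http_proxy`/`https_proxy` on its own, but an external `ssh` binary can be routed through an HTTP proxy with a `ProxyCommand`. The following is a minimal sketch of that idea, not the actual rancher-machine change. The helper name `proxy_command_opt` is hypothetical, it assumes OpenBSD netcat (`nc -X connect -x`) is installed, and it does not handle proxies that require credentials.

```shell
# Hypothetical helper: derive an ssh ProxyCommand option from the usual
# proxy environment variables. Prints nothing when no proxy is set.
proxy_command_opt() {
    # Prefer HTTPS_PROXY, then fall back to the other common spellings.
    proxy="${HTTPS_PROXY:-${https_proxy:-${HTTP_PROXY:-${http_proxy:-}}}}"
    [ -n "$proxy" ] || return 0

    # Strip an optional scheme and any trailing path, leaving host:port.
    hostport="${proxy#*://}"
    hostport="${hostport%%/*}"

    # OpenBSD netcat: -X connect selects HTTP CONNECT, -x names the proxy.
    # ssh itself expands %h and %p to the target host and port.
    printf '%s\n' "ProxyCommand=nc -X connect -x $hostport %h %p"
}
```

The printed value can be passed to ssh as `-o "<value>"`. The proxy must also permit CONNECT to port 22, otherwise the tunnel is refused before ssh ever reaches the node.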

Areas or cases that should be tested

  • provision node-driver clusters when Rancher is in an airgapped environment and behind a proxy

What areas could experience regressions?

none, this is a new feature

Are the repro steps accurate/minimal?

yes

@slickwarren
Contributor

I'm still seeing this issue on v2.6-head (e7f03fd):

reproduction steps:

  • bring up a proxy-gapped rancher server
  • provision an rke1 cluster (digital ocean, all default settings)
    results in this error:
Notifying bugsnag: [Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded]

I am able to provision an rke2 cluster with default settings in the same setup.

@jiaqiluo
Member

The failure is caused by a misconfiguration of the proxy server.
Moved it back to to-test.

@zube zube bot removed the [zube]: Reopened label Jan 27, 2022
@jiaqiluo jiaqiluo added the release-note Note this issue in the milestone's release notes label Jan 27, 2022
@slickwarren
Contributor

After adding the following to the proxy config, I was able to continue testing:

acl SSL_ports port 22
acl SSL_ports port 2376

acl Safe_ports port 22      # ssh
acl Safe_ports port 2376    # docker port 
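
The ACL lines above look like Squid syntax, although the thread does not name the proxy software, so treat that as an assumption. In Squid's default configuration, CONNECT requests to ports outside `SSL_ports` and any request to ports outside `Safe_ports` are denied, which would explain why ssh (22) and the Docker port (2376) had to be listed. A minimal sketch for checking a config file before retesting follows. The helper name `acl_has_port` is hypothetical, and it only recognizes the one-port-per-line form shown above, not port ranges or several ports on one line.

```shell
# Hypothetical check: does a Squid-style config list PORT under ACL NAME?
# Usage: acl_has_port <config-file> <acl-name> <port>
acl_has_port() {
    awk -v name="$2" -v port="$3" '
        # Match lines of the form: acl <name> port <port> [# comment]
        $1 == "acl" && $2 == name && $3 == "port" && $4 == port { found = 1 }
        END { exit !found }   # exit status 0 only when a match was seen
    ' "$1"
}
```

For example, `acl_has_port /etc/squid/squid.conf SSL_ports 22` (the path is illustrative) succeeds only if the config contains a matching `acl SSL_ports port 22` line.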

on v2.6-head (e7f03fd) using a proxy-gapped rancher server with the above configuration of the proxy:

  • provision an rke1 cluster (digital ocean, single node all roles) -- pass
  • provision rke2 cluster (digital ocean, single node all roles) -- pass
  • provision rke1 cluster with dedicated roles (aws) -- pass
    • deploy and test workloads and services -- pass
  • provision rke2 cluster with dedicated roles (aws) -- pass
    • deploy and test workloads and services -- pass


9 participants