
UI: Need to clarify field ssh user in node template #17995

Closed

yasker opened this issue Feb 12, 2019 · 8 comments

@yasker
Member

yasker commented Feb 12, 2019

Currently there is no explanation of the SSH user option in the node template. A new user can easily mistake this field for something customizable, e.g. expecting the VM instance to be created with whatever username they choose. In fact, it is determined by the VM template and has only one correct value. Most of the time the UI will determine the correct value, but a user may not realize it must match the value predefined inside the VM image and change it manually, resulting in `Too many retries waiting for SSH to be available` and failed provisioning later.

Searching for `Too many retries waiting for SSH to be available`, I found two issues caused by the SSH user not being set correctly.

#3345
#17284

And there is no Rancher documentation about how to fill in this field, except for one line mentioned in:

https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/node-pools/ec2/
Make sure you configure the correct SSH User for the configured AMI.

DigitalOcean mentions the SSH user name in https://www.digitalocean.com/docs/droplets/how-to/connect-with-ssh/

The default username is root on most operating systems, like Ubuntu and CentOS. Exceptions to this include CoreOS, where you’ll log in as core, Rancher, where you’ll log in as rancher, and FreeBSD, where you’ll log in as freebsd.
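
For example, a quick way to check which user an image actually expects before putting it in the template (the key path, user, and address below are placeholders; a wrong user is typically rejected with "Permission denied (publickey)"):

```sh
# Placeholder key path, user, and IP: log in with the candidate user and print it.
# A wrong SSH user usually fails here long before Rancher's SSH wait times out.
ssh -i ~/.ssh/id_rsa ubuntu@203.0.113.10 'whoami'
```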

It would be helpful if we could make the purpose of this field clearer.

E.g. a prompt like:
"This field must match the SSH user used inside the VM image. If in doubt, don't modify it. See <url> for details.", where <url> points to the cloud vendor's documentation.
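
To illustrate why the value is fixed by the image rather than free-form: under the hood the node template drives a docker-machine driver, and the SSH user ends up as that driver's ssh-user flag. A rough sketch with the EC2 driver (the AMI ID, region, and node name are placeholders; credentials are omitted):

```sh
# Sketch only, not what Rancher runs verbatim. The SSH user passed here must match
# the user baked into the AMI, e.g. "ubuntu" for Ubuntu images, "ec2-user" for
# Amazon Linux. AMI ID, region, and node name are placeholders.
docker-machine create -d amazonec2 \
  --amazonec2-region us-west-2 \
  --amazonec2-ami ami-0123456789abcdef0 \
  --amazonec2-ssh-user ubuntu \
  my-test-node
```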

@atleag

atleag commented Jun 27, 2019

In case anyone new is looking at this: we have attempted to get a 3-node Rancher 2.2 HA cluster to create a 4-node cluster on OpenStack with the following template input:

| sshUser | privateKeyFile |
| --- | --- |
| ubuntu | [contents of id_rsa] |
| ubuntu | /home/ubuntu/.ssh/id_rsa |
| ubuntu | ----- BEGIN RSA PRIVATE KEY------ [contents of id_rsa] |
| rancher | [contents of id_rsa] |
| rancher | /home/ubuntu/.ssh/id_rsa |
| rancher | ----- BEGIN RSA PRIVATE KEY------ [contents of id_rsa] |
| root | [contents of id_rsa] |
| root | /home/ubuntu/.ssh/id_rsa |
| root | ----- BEGIN RSA PRIVATE KEY------ [contents of id_rsa] |

The id_rsa has been manually inserted into the /home/ubuntu/.ssh folder of the VM, so this should be correct.

No matter what we insert into these fields, the cluster creation fails with the error:
Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded

In OpenStack, all instances are spawned successfully, and SSH-ing into them manually with the same key works perfectly.

It is important to note that the id_rsa key worked fine when creating a cluster manually from the Rancher master (workstation), which in the end created the cluster we are now working with.
These problems only appear when using the Rancher UI to create clusters.

We are currently looking at which pod is responsible for the SSH step and will attempt to mount the privateKeyFile.
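
For reference, the same sshUser/privateKeyFile combination can also be reproduced with docker-machine's openstack driver directly, outside the Rancher UI, which helps isolate whether the credentials themselves are at fault. Every value below is a placeholder for our environment; auth settings can also come from the usual OS_* environment variables:

```sh
# Sketch only: feed the node template values straight to docker-machine.
# If this succeeds but the Rancher UI does not, the problem is in how Rancher
# passes the key along, not in the key or the SSH user.
docker-machine create -d openstack \
  --openstack-auth-url https://keystone.example.com:5000/v3 \
  --openstack-image-name ubuntu-18.04 \
  --openstack-flavor-name m1.medium \
  --openstack-ssh-user ubuntu \
  --openstack-keypair-name my-keypair \
  --openstack-private-key-file /home/ubuntu/.ssh/id_rsa \
  test-node
```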

@shaneharger

shaneharger commented Jul 5, 2019

@Moskoskos I'm having the exact same problem and have tried all of the above combinations in the node template. Manual SSH works, but creating the cluster using the Rancher UI just isn't working.

How did you work around this exactly? Also have you found a way to use the UI?

@atleag

atleag commented Jul 5, 2019

The field does not work. I'll elaborate more in 20 min.

@atleag

atleag commented Jul 5, 2019

EDIT: This looks to be solved in a future release. See issue #20787.

This went a bit longer than first anticipated, but here goes.

TL;DR: Entering the private key in the privateKeyFile field will place its content in an id_rsa file inside the rancher/rancher container running on one of your nodes, but the content ends up malformed. This can be avoided by supplying an image with the key pre-installed, in which case there is no need to enter either keyPair or privateKeyFile in the OpenStack node template.


The field correctly states that the content of the private key file should be entered. However, the input gets malformed regardless of formatting, escape characters, and so on.
When you enter, say, something like:
----- BEGIN RSA ------
MSD92838smsdo28m23
2390ms8SMS2m902m3m
[etc]
----- END RSA ----

The contents of the field will be written to an id_rsa file inside a rancher/rancher Docker container running on one of your nodes.
However, the newline encoding (LF) as well as any explicit escape characters (\n) will be ignored.
Your input will end up looking like this:
----- BEGIN RSA ----- MSD92838smsdo28m23 2390ms8SMS2m902m3m [etc] ----- END RSA-----

I do not understand cryptography and certificates all that well, but apparently the missing newlines and added spaces cause the file to be read incorrectly.
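
One quick way to confirm that it is the written file that is broken (assuming ssh-keygen is available where the file lives; the path is a placeholder):

```sh
# Prints the matching public key if the private key file is intact; fails with a
# load/format error if the newlines were stripped. Path is a placeholder.
ssh-keygen -y -f /path/to/id_rsa
```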

I tested this as follows:

1. List all the containers running on each node:
   docker ps -a
2. Find the rancher/rancher image in the list by its description (furthest to the right).
3. Enter the container with:
   docker exec -it <NAME OF CONTAINER> bash
4. Find the directory named after the node you're setting up. For example, I use <NAMESPACE>-w for the worker node. Try the following command with the name of the node:
   find / -name <NODE-NAME / NAMESPACE>
5. The search will probably return a result with an abnormally long path, something like /etc/rancher/nodes/[an extreme amount of numbers]/<NODE_NAME>

Install nano inside the container, open the id_rsa file, delete the content, and paste your key in the proper format.
Rancher will then be able to get past the "Waiting SSH..." stage and start setting up the nodes. Just watch out: the node will be cleared after the 60 attempts fail, and the folder you're working in will be gone.

You can test this inside the container by SSH-ing before and after this operation.

I had a look at the source code but was not able to determine how to fix this bug.
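
A variant of the same workaround that avoids installing nano inside the container: copy a well-formed key in from the node with docker cp. The container name and destination path are placeholders taken from the docker ps -a and find steps above:

```sh
# Overwrite the malformed id_rsa with a well-formed copy from the node.
# <NAME OF CONTAINER> and the destination path are placeholders found via the
# steps above; this has to happen before the 60 SSH retries run out.
docker cp /home/ubuntu/.ssh/id_rsa "<NAME OF CONTAINER>:/etc/rancher/nodes/<LONG ID>/<NODE_NAME>/id_rsa"
```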

@DJAyth

DJAyth commented Jul 16, 2019

I can't comment on the bug, but I've had no issues spinning up a cluster in OpenStack. I mainly use Ubuntu distros, so the sshUser is set to ubuntu.

As for the privateKeyFile entry, if you leave it blank Rancher will create its own keypair in OpenStack and assign it to the node. You can then download said keypair from the UI once the cluster is available.

@atleag

atleag commented Jul 17, 2019

I will give it a try when spinning up our next cluster. I recall attempting to leave the fields untouched, but I'm unsure what error kept us from succeeding in that regard.

As of now, Rancher / Kubernetes has assigned new keys in OpenStack successfully, while our prepackaged key is used for external access.

@rafise

rafise commented Oct 14, 2019

I'm having the same issue. Is there any workaround?

@deniseschannon deniseschannon removed this from the Unscheduled milestone Jul 2, 2021
@stale

stale bot commented Sep 1, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Sep 1, 2021
@stale stale bot closed this as completed Sep 15, 2021
@zube zube bot removed the [zube]: Done label Dec 14, 2021