Cannot ssh into v0.5.0 instance on AWS #1090
This is happening to us too. Upgrading from 0.4.2 and 0.4.5 to 0.5.0 on 4 hosts, all the same: cannot SSH into the machines anymore (…)
Basically kills the … We've tried with an interactive password too, but that does not work either.
Having looked into this, I believe our error is due to the fact that RancherOS now somewhere does a …
We managed to move away from changing the uid of …
I managed to use the … Then ran …
@pulberg I have been unable to reproduce. I have tried upgrading from v0.4.5 in both the default console and the ubuntu console. Both times, after a couple of minutes, I was able to ssh in without any issues. Did you make any changes like @michaellopez did in terms of users? Could you provide me with the results of …
@deniseschannon I don't have any customizations or changes to the AMI; here is the config export:
…
@pulberg When you started the AMI, did you pass in any cloud-config under user data, or did you only use the key pair through AWS?
@deniseschannon I didn't pass in any cloud-config, I just used the key pair through AWS.
@pulberg I was finally able to get a box to reproduce this issue! :) But it took many attempts, and the steps were exactly the same as on my other boxes that never hit this issue. We need to look further into it.
@deniseschannon I just love how you and your colleagues never give up making your products better. It is very motivating to use your products knowing that they are backed by a fantastic team. Thank you so much and keep up the exemplary work! You are an inspiration. Here, have some cake 🍰
This seems to be some condition where the kernel gets stuck during boot. I have only been able to hit this issue once out of 20-30 times. The workaround would be to manually reboot the host.
Actually, after reviewing this further: the "Get Instance Screenshot" can end up showing "Booting the kernel." even after the kernel has booted, so it is not conclusive. We need to capture the "Get System Log" output when this occurs. Note: we have had users report the issue, but we have yet to reproduce it consistently. We will keep investigating.
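For anyone who hits this: the same "Get System Log" data can also be pulled with the AWS CLI, which is easier to attach to an issue than a screenshot. A small sketch (not official tooling; the instance ID is a placeholder):

```shell
#!/bin/sh
# Build the AWS CLI invocation that retrieves the serial console log
# ("Get System Log" in the EC2 console). We emit the command rather than
# running it, so it can be reviewed or piped to sh on a machine that
# actually has AWS credentials configured.
get_system_log() {
  echo "aws ec2 get-console-output --instance-id $1 --output text"
}

# Placeholder instance ID; substitute your own.
get_system_log "i-0123456789abcdef0"
```

Running the emitted `aws ec2 get-console-output` command requires the AWS CLI and credentials with `ec2:GetConsoleOutput` permission.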
This is happening to me relatively frequently, running in us-west-2 on m3.medium instance types. I checked the system log for the most recent failure and this is what I got:
(system log output not captured)
Awesome… looks like a Docker bug. We'll look into it. Thanks.
Using v0.6.0-rc4, I launched 25 instances from the AMI and was able to ssh into all of them.
I think I somehow ran into this as well; is there a way to recover the console? First I changed rancher-server to stable and followed the instructions for a single node with a bind-mounted volume. That part worked (including removing the old container). Then I did a … Setting up this machine again is not a big deal, but how do you do this when the computer is not next to you but in a datacenter? That's why I wanted to ask about recovering the console. My initial cloud-init:
…
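The cloud-config itself did not survive above. Purely as an illustration (not the poster's actual file), a minimal RancherOS cloud-config that only authorizes an SSH key would look like:

```yaml
#cloud-config
# Authorize a key for the default `rancher` user; the key below is a
# truncated placeholder, not a real key.
ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2E... user@example
```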
RancherOS Version: (ros os version) v0.5.0
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) AWS
I noticed that after upgrading from v0.4.5 to v0.5.0 I could not ssh into the host; all attempts failed with a “connection refused” message.
I had to reboot the host from the AWS console; after it came back up I was able to ssh again. This was not a one-time incident: it has now happened on 4 consecutively upgraded hosts.
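After forcing a reboot from the AWS console, it can help to poll the SSH port so you know when the host is reachable again. A hedged sketch (the host below is a documentation-only placeholder address):

```shell
#!/bin/sh
# Poll TCP port 22 until sshd answers, or give up after N retries.
# Args: host, retries (default 30), delay in seconds (default 10).
wait_for_ssh() {
  host=$1; retries=${2:-30}; delay=${3:-10}
  i=0
  while [ "$i" -lt "$retries" ]; do
    # /dev/tcp is a bash feature; use `nc -z -w 2 "$host" 22` instead
    # if bash is not available.
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/22" 2>/dev/null; then
      echo "sshd on $host is accepting connections"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up waiting for $host" >&2
  return 1
}

# Usage: wait_for_ssh 203.0.113.10
```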
Upgrading from RancherOS v0.4.5, AMI - ami-812ec0ec
Command used to upgrade host - sudo ros os upgrade