[Bug]: hcloud/setRescue: hcclient/WaitForActions: action _ failed: Unknown Error (unknown_error) #505

Closed
mnencia opened this issue Feb 17, 2022 · 12 comments

@mnencia

mnencia commented Feb 17, 2022

What happened?

Sometimes Hetzner cloud servers fail to start.

If you run hcloud server list, you will see that the newly created server is off.

See kube-hetzner/terraform-hcloud-kube-hetzner#49

What did you expect to happen?

The server starts up normally.

Please provide a minimal working example

resource "hcloud_server" "first_control_plane" {
  name = "k3s-control-plane-0"

  image              = "ubuntu-20.04"
  rescue             = "linux64"
  server_type        = "cpx11"
  location           = "eu-central"
  ssh_keys           = [hcloud_ssh_key.k3s.id]
  firewall_ids       = [hcloud_firewall.k3s.id]
  placement_group_id = hcloud_placement_group.k3s.id

  connection {
    user           = "root"
    private_key    = local.ssh_private_key
    agent_identity = local.ssh_identity
    host           = self.ipv4_address
  }

  provisioner "file" {
    content = templatefile("${path.module}/templates/config.ign.tpl", {
      name           = self.name
      ssh_public_key = local.ssh_public_key
    })
    destination = "/root/config.ign"
  }

  provisioner "remote-exec" {
    inline = local.microOS_install_commands
  }

 ...
}
@mnencia mnencia added the bug label Feb 17, 2022
@mysticaltech

This happens to me too, many times!

@mnencia
Author

mnencia commented Feb 17, 2022

[Screenshot: 2022-02-17 at 10 02 07]

@mysticaltech

I believe the server sometimes fails to start when rescue mode is requested.

@mysticaltech

This has become a huge problem for us at https://github.com/kube-hetzner/kube-hetzner. It happens in almost 50% of deploys, because we use rescue mode to install a third-party OS on many nodes at once.

@LKaemmerling, please do something about it. If there are any logs we can provide, just tell us how to get them for you. Thanks!

@LKaemmerling
Member

Hey @mysticaltech,

Would it be possible for you to give me the IDs of the servers where this happened?

@mysticaltech

@LKaemmerling Here you go, this one just happened: 18212271

I will leave it in my project for you and your team to investigate more. Thanks! 🙏

[Screenshot: ksnip_20220223-081649]

@mnencia
Author

mnencia commented Feb 23, 2022

I just tried to provision a cluster with five nodes, and one remained off (id: 18212924).

[Screenshot: 2022-02-23 at 08 53 45]

@LKaemmerling
Member

Hey,

We just released v1.33.1, which contains an improvement for this situation. Can you please test it? It will be available in the Terraform Registry in the next couple of minutes.

@mysticaltech

mysticaltech commented Feb 25, 2022

@LKaemmerling Thank you so much. However, I tested it, and on the second try one server, #18275324, stayed off, as before. Here are the details. I will not delete it, so you can have a look if you want.

The initiating code is https://github.com/kube-hetzner/kube-hetzner

terraform --version
Terraform v1.1.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hetznercloud/hcloud v1.33.1
+ provider registry.terraform.io/integrations/github v4.20.0
+ provider registry.terraform.io/tenstad/remote v0.0.23

Output of hcloud server list, 5 minutes in:

ID         NAME                  STATUS    IPV4              IPV6                      DATACENTER
18275322   k3s-control-plane-2   running   78.46.194.108     2a01:4f8:c010:a0b1::/64   fsn1-dc14
18275323   k3s-agent-1           running   78.47.82.48       2a01:4f8:1c17:c7ac::/64   fsn1-dc14
18275324   k3s-agent-0           off       49.12.10.178      2a01:4f8:c17:8b1a::/64    fsn1-dc14
18275325   k3s-control-plane-0   running   116.202.98.33     2a01:4f8:1c17:f936::/64   fsn1-dc14
18275326   k3s-control-plane-1   running   142.132.188.100   2a01:4f8:c010:5d7f::/64   fsn1-dc14

[Screenshot: ksnip_20220225-125443]

[Screenshot: ksnip_20220225-125454]

@LKaemmerling
Member

Hey @mysticaltech,

Your server is online ;) Only the server status is incorrect. I passed it on to the responsible teams. Thanks!
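
For anyone who wants to cross-check a reported status of off against what the machine is actually doing, here is a minimal sketch. It assumes the table layout matches the hcloud server list output pasted earlier in this thread, that every column is filled in, and that SSH (port 22) is open on the servers. The function names are hypothetical helpers, not part of the hcloud CLI or the Terraform provider.

```python
import socket

# Sample copied from the `hcloud server list` output posted in this thread.
SAMPLE = """\
ID         NAME                  STATUS    IPV4              IPV6                      DATACENTER
18275322   k3s-control-plane-2   running   78.46.194.108     2a01:4f8:c010:a0b1::/64   fsn1-dc14
18275323   k3s-agent-1           running   78.47.82.48       2a01:4f8:1c17:c7ac::/64   fsn1-dc14
18275324   k3s-agent-0           off       49.12.10.178      2a01:4f8:c17:8b1a::/64    fsn1-dc14
18275325   k3s-control-plane-0   running   116.202.98.33     2a01:4f8:1c17:f936::/64   fsn1-dc14
18275326   k3s-control-plane-1   running   142.132.188.100   2a01:4f8:c010:5d7f::/64   fsn1-dc14
"""


def parse_server_list(text):
    """Turn the whitespace-separated table into a list of dicts.

    Assumption: no column is empty, so splitting on whitespace is enough.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    keys = [h.lower() for h in lines[0].split()]
    return [dict(zip(keys, ln.split())) for ln in lines[1:]]


def not_running(servers):
    """Return the servers whose reported status is anything but 'running'."""
    return [s for s in servers if s["status"] != "running"]


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for server in not_running(parse_server_list(SAMPLE)):
        state = "reachable" if is_reachable(server["ipv4"]) else "unreachable"
        print(f'{server["id"]} {server["name"]}: reported {server["status"]}, {state} on port 22')
```

A server that is reported as off but accepts a connection on port 22 points to a stale status rather than a failed boot. An unreachable result is not conclusive on its own, since a firewall rule can block the port.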

@mysticaltech

In that case, fantastic! Thank you so much... :) 🙏

@apricote
Member

I am going to close this issue as the problem has been resolved. If this still occurs, feel free to reopen the issue.
