
MicroOS snapshot stage 2 #582

Merged

Conversation

ifeulner (Contributor)

Hi, I don't know why, but the last pull request has vanished.

Anyway, this now removes the initial snapshot and just uses the packer-based snapshot for all created servers, including the cluster autoscaler nodes.
Please have a look and test it; I haven't had much time for it yet.

But since the request recently came up to also make this work in Terraform Cloud, I see the following alternatives:

  1. Fall back to having the packer step outside of the terraform script (the initial idea). Needs some additional "glue" scripting.
  2. Avoid packer and do the snapshot with terraform, either via a separate script or included here. Needs some additional "glue" scripting.
  3. Include the snapshot creation in the "host" module. Should be possible (when no image_id is given, create a snapshot first, otherwise use the given image_id), but it increases the complexity of the script.

For me, using packer or a separate terraform script to create the snapshot, and then assuming the image_id is available in the terraform script, would be the simplest, cleanest way, as we can get rid of a lot of code in the main script.
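As a rough sketch of that approach (the label selector and the module wiring below are illustrative assumptions; this PR itself only adds the data.hcloud_image lookup and passes its id to the node modules):

data "hcloud_image" "microos_image" {
  # assumes the packer build labels the resulting snapshot; key/value are illustrative
  with_selector = "microos-snapshot=true"
  most_recent   = true
}

module "agents" {
  # ...
  microos_image_id = data.hcloud_image.microos_image.id
}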

What do you think?

Comment on lines +122 to +127
# reboot!
power_state:
  delay: now
  mode: reboot
  message: MicroOS rebooting to reflect changes
  condition: true
Collaborator

@ifeulner Why is this needed? It already boots into the correct btrfs snapshot.

ifeulner (Contributor, Author) commented Feb 13, 2023

But the install of k3s happens within the first boot, and it includes a new transactional-update; that's also why the k3s service is only enabled and not started.
The reboot also ensures that the hostname etc. is set correctly.

Collaborator

@ifeulner Yes, but what I do not understand is why the reboot at the end of cloud-init, since it's not even booted yet.

Collaborator

Our install of k3s does not do a transactional-update.

most_recent = true
}

resource "null_resource" "packer" {
Collaborator

Maybe the backup terraform mechanism can happen here?!

@@ -8,6 +8,7 @@ module "agents" {
for_each = local.agent_nodes

name = "${var.use_cluster_name_in_node_name ? "${var.cluster_name}-" : ""}${each.value.nodepool_name}"
microos_image_id = data.hcloud_image.microos_image.id
Collaborator

Since snapshot_id is used in other places, we should keep the naming consistent; easy fix.

mysticaltech (Collaborator) commented Feb 13, 2023

@ifeulner About packer and terraform, I'm all for option 1, and it would be a pleasure working on a small bash script to make the magic happen.

So we would need pure terraform logic that kicks in if no snapshot is detected, meaning the run did not go through the init.sh script that calls packer.

This is great because we can do plenty of things in that script, like ensuring that all the required CLIs are present: terraform, packer, and ideally also hcloud.

But first, let's keep it simple! Just saving people from typing both the packer and the terraform commands is a great start.
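For the CLI check part, even a tiny pre-flight step inside terraform could fail the run early when a tool is missing (a minimal sketch, not code from this PR; the resource name and command are assumptions):

resource "null_resource" "preflight" {
  provisioner "local-exec" {
    # abort early if packer or hcloud are not installed locally
    command = "command -v packer >/dev/null && command -v hcloud >/dev/null"
  }
}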

mysticaltech (Collaborator)

About the previous PR: it's here, #557, and you closed it yourself.

And thanks for this new one, looking great! Will test ASAP 🙏 🚀

mysticaltech (Collaborator)

@ifeulner Let's forget the bash proposal above for now and instead use a snapshot_deploy_method variable that defaults to the integrated packer build called via local-exec, as you already do, but also offers a pure terraform version like in the current master.
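A possible shape for that variable and the toggle on the existing null_resource.packer (a minimal sketch; the default, the validation, and the count expression are assumptions based on this discussion):

variable "snapshot_deploy_method" {
  type        = string
  default     = "packer"
  description = "How the MicroOS snapshot is created: packer (default, via local-exec) or terraform."

  validation {
    condition     = contains(["packer", "terraform"], var.snapshot_deploy_method)
    error_message = "Must be either packer or terraform."
  }
}

# only build via packer when that method is selected
resource "null_resource" "packer" {
  count = var.snapshot_deploy_method == "packer" ? 1 : 0
  # ... existing local-exec provisioner running packer build ...
}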

@@ -60,33 +59,6 @@ resource "hcloud_server" "server" {
EOT
}

# Install MicroOS
Collaborator

@ifeulner So this logic could come back and execute based on the value of snapshot_deploy_method, if it's equal to "terraform" for instance.
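For example, the removed install block could return behind that same toggle (a minimal sketch; the resource name and connection details are placeholders, not code from this branch):

resource "null_resource" "install_microos" {
  count = var.snapshot_deploy_method == "terraform" ? 1 : 0

  connection {
    host = hcloud_server.server.ipv4_address
    # ... user and private key as in the existing host module ...
  }

  provisioner "remote-exec" {
    # ... the previous "Install MicroOS" steps from master ...
  }
}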

mysticaltech (Collaborator) commented Feb 16, 2023

@ifeulner Please let me know what you think of the above proposals. Also, to keep things well compartmentalized, I propose moving the packer logic to the host module, or a new module of its own, along with the terraform snapshot creation logic.

Basically, making sure that each concern is well compartmentalized, while staying compatible with terraform cloud when the right options are chosen (in our case snapshot_deploy_method="terraform", for instance).

mysticaltech (Collaborator) commented Feb 16, 2023

One last thing, and maybe we can clarify this in a subsequent PR: we need to document which variables can be changed after the initial apply, with or without terraform destroy -target null_resource.kustomization && terraform apply.

ifeulner (Contributor, Author)

> @ifeulner Please let me know what you think of the above proposals. Also, to keep things well compartmentalized, I propose moving the packer logic to the host module, or a new module of its own, along with the terraform snapshot creation logic.
>
> Basically, making sure that each concern is well compartmentalized, while staying compatible with terraform cloud when the right options are chosen (in our case snapshot_deploy_method="terraform", for instance).

Hi @mysticaltech, there are multiple options now, but it's more about "packaging" and not functionality anymore.
I would prefer to keep it as simple as possible, though. So option 1 would be my preference.
But I understand you want to keep the terraform-only way, right?

mysticaltech (Collaborator)

@ifeulner Yes, option one is fine, but definitely with support for a terraform-only path as well! 🙏

gthieleb (Contributor) commented Feb 20, 2023

Using packer is a great idea. I suggest keeping the packer and terraform logic decoupled; I tried tying them together in the past, and it makes things more complex. I also think a lot of the current remote-exec tf code could be refactored to be part of the image and glued together with cloud-init code. What do you think?
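As an illustration of that kind of glue, terraform could hand the per-node configuration to the prebuilt image via cloud-init user_data instead of remote-exec (a minimal sketch; the template path and variables are invented for this example):

resource "hcloud_server" "server" {
  # ...
  image = data.hcloud_image.microos_image.id

  # work previously done via remote-exec moves into the image and cloud-init
  user_data = templatefile("${path.module}/templates/cloud-init.yaml.tpl", {
    hostname = var.name
  })
}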

otavio (Contributor) commented Feb 20, 2023

Using external logic for the snapshot allows for decoupling and easier code maintenance, but more critically, it simplifies updating the snapshot without needing to recreate the cluster.

mysticaltech changed the base branch from master to snapshot-deploy on February 20, 2023 at 16:22
mysticaltech merged commit 255fdf7 into kube-hetzner:snapshot-deploy on Feb 20, 2023
mysticaltech mentioned this pull request on Feb 20, 2023
mysticaltech (Collaborator)

@ifeulner I created the new snapshot-deploy branch and opened #598; we can continue there, and more people can participate if they want to.

@gthieleb Thanks for sharing your experience; I like your ideas. Reducing the remote-execs where possible would be ideal, so please don't hesitate to open a PR targeting the new branch mentioned above.

@aleksasiriski Same for you, man; if inspired, the new branch above should be the target.

Let's make this awesome, folks! 🙏
