Microos snapshot stage 2 #582
Conversation
…at the nodes. Still work to do...
# reboot!
power_state:
  delay: now
  mode: reboot
  message: MicroOS rebooting to reflect changes
  condition: true
@ifeulner Why is this needed? The server already boots into the correct btrfs snapshot.
But during the first boot the install of k3s happens, which includes a new transactional-update. That's also why the k3s service is only enabled and not started.
This also ensures that the hostname etc. is set correctly.
@ifeulner Yes, but what I do not understand is why there is a reboot at the end of cloud-init, since it's not even fully booted yet.
Our install of k3s does not do a transactional-update.
  most_recent = true
}

resource "null_resource" "packer" {
Maybe the backup terraform mechanism can happen here?!
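A minimal sketch of what that fallback could look like, assuming a hypothetical packer template path and a local-exec provisioner (the trigger and file name are illustrative, not this PR's actual layout):

```hcl
# Hypothetical fallback: build the MicroOS snapshot from Terraform itself
# when it was not pre-built via init.sh. The template path is an assumption.
resource "null_resource" "packer" {
  triggers = {
    # re-run the build whenever the (hypothetical) template changes
    template_hash = filesha256("${path.module}/packer/microos.pkr.hcl")
  }

  provisioner "local-exec" {
    command = "packer build ${path.module}/packer/microos.pkr.hcl"
  }
}
```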
@@ -8,6 +8,7 @@ module "agents" {
  for_each = local.agent_nodes

  name             = "${var.use_cluster_name_in_node_name ? "${var.cluster_name}-" : ""}${each.value.nodepool_name}"
  microos_image_id = data.hcloud_image.microos_image.id
Since snapshot_id is used in other places, we should keep the naming consistent; easy fix.
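To illustrate the naming point, a sketch of how the lookup could consistently expose a snapshot id (the selector label is an assumption, not the PR's actual label):

```hcl
# Hypothetical lookup of the packer-built snapshot; the label used in
# with_selector is illustrative.
data "hcloud_image" "microos_snapshot" {
  with_selector = "microos-snapshot"
  most_recent   = true
}

locals {
  # referenced everywhere as local.microos_snapshot_id for consistency
  microos_snapshot_id = data.hcloud_image.microos_snapshot.id
}
```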
@ifeulner About packer and terraform, I'm all for #1, and it would be a pleasure working on a small bash script to make the magic happen. So we would need pure terraform logic that would kick in if no snapshot is detected, meaning it did not run via the init.sh script that calls packer. This is great because we can do plenty of things, like ensuring that all the required CLIs are present: terraform, packer, and ideally also hcloud. But first, let's keep it simple! Just saving folks from typing both the packer and terraform commands is a great start.
About the previous PR, it's here #557 and you yourself closed it. And thanks for this new one, looking great! Will test ASAP 🙏 🚀
@ifeulner Let's forget the bash proposal above for now and instead use a
@@ -60,33 +59,6 @@ resource "hcloud_server" "server" {
  EOT
}

# Install MicroOS
@ifeulner So this logic could come back and execute based on the value of snapshot_deploy_method, if it's equal to "terraform" for instance.
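A sketch of how such a gate could look, assuming a snapshot_deploy_method variable as proposed (names and values are illustrative):

```hcl
# Hypothetical switch between packer-built and terraform-installed snapshots.
variable "snapshot_deploy_method" {
  type        = string
  default     = "packer"
  description = "Either \"packer\" or \"terraform\" (illustrative values)"
}

# The removed "Install MicroOS" remote-exec logic could be gated like this:
resource "null_resource" "install_microos" {
  count = var.snapshot_deploy_method == "terraform" ? 1 : 0

  # ... the former installation provisioners would live here ...
}
```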
@ifeulner Please let me know what you think of the above proposals. Also, to keep things well compartmentalized, I propose moving the packer logic to the host module, or a new module of its own, along with the terraform snapshot creation logic. Basically, making sure that each concern is well compartmentalized, while staying compatible with terraform cloud when the right options are chosen (in our case
One last thing, and maybe we can clarify this in a subsequent PR, is that we need to indicate which variables can be changed after the initial apply, with or without
Hi @mysticaltech, there are multiple options now, but it's more about "packaging" and not functionality anymore.
@ifeulner Yes, option one is fine, but let's definitely go with support for terraform only! 🙏
Using packer is a great idea. I suggest keeping the packer and terraform logic decoupled. I tried tying them together in the past, and it makes things more complex. I also think a lot of the current remote-exec tf code could be refactored to be part of the image and glued together with cloud-init code. What do you think?
Using external logic for the snapshot allows for decoupling and easier code maintenance, but more critically, it simplifies updating the snapshot without needing to recreate the cluster.
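One way to get that decoupled snapshot update, sketched under the assumption that servers reference the snapshot by image id (attribute names follow the hcloud provider; the resource body is abridged and the names are illustrative):

```hcl
# Sketch: existing servers keep running their current image even after a
# newer snapshot is built; only newly created servers pick up the new id.
resource "hcloud_server" "agent" {
  name        = "agent-1"                              # illustrative
  server_type = "cx21"                                 # illustrative
  image       = data.hcloud_image.microos_snapshot.id  # hypothetical lookup

  lifecycle {
    ignore_changes = [image]
  }
}
```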
@ifeulner I created this new branch snapshot-deploy and opened #598; we can continue there and more people can participate if they want to. @gthieleb Thanks for sharing your experience, I like your ideas; reducing the remote-execs if possible would be ideal, so please don't hesitate to open a PR targeting the new branch mentioned above. @aleksasiriski Same for you man, if inspired, the new branch above should be the target. Let's make this awesome, folks! 🙏
Hi, I don't know why, but the last pull request has vanished.
Anyway, this now removes the initial snapshot and just uses the packer-based snapshot for all created servers, including the cluster autoscaler nodes.
Please have a look and test it; I haven't had much time for it yet.
But as the request recently came up to make this also work in terraform cloud, I see the following alternatives:
1. packer step outside of the terraform script (initial idea). Needs some additional "glue" scripting
2. if no image_id is given, first create a snapshot, otherwise use the image_id. But this increases the complexity of the script
For me, using packer or a separate terraform script to create the snapshot and assuming the image_id in the terraform script would be the simplest, cleanest way, as we can get rid of a lot of code in the main script.
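Alternative 2 could be sketched roughly like this, with an optional image_id variable deciding whether a snapshot lookup happens first (all names and labels are illustrative):

```hcl
# Hypothetical: an empty image_id means "find the pre-built snapshot first",
# otherwise the given id is used directly.
variable "image_id" {
  type    = string
  default = ""
}

data "hcloud_image" "microos_snapshot" {
  count         = var.image_id == "" ? 1 : 0
  with_selector = "microos-snapshot" # illustrative label
  most_recent   = true
}

locals {
  microos_image_id = var.image_id != "" ? var.image_id : data.hcloud_image.microos_snapshot[0].id
}
```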
What do you think?