Recreate kind cluster on every evg host reboot #32

lucian-tosa · 2025-04-23T14:33:10Z

Summary

Evergreen hosts reboot every day or every weekend, depending on your configuration. After a reboot, inter-cluster connectivity may be broken; the cause is not yet known, and recreating the clusters is the only fix we have so far.
This PR adds a systemd service that runs on every boot and recreates all clusters (including the kind-kind cluster used for single-cluster tests).
Our tunnel command now also fetches the kubeconfig from the host; otherwise the tunnel would not open to the (new) ports of the recreated clusters.

The systemd service is not created by default. It must be enabled explicitly by running:

evg_host.sh configure --auto-reboot
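
For readers unfamiliar with the mechanism, a boot-time oneshot unit along these lines is one way to run a recreation script after every reboot. This is a minimal sketch, not the unit this PR installs: the function name `render_unit`, the unit contents, the placeholder script path, and the assumption that Docker runs as `docker.service` are all illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: prints a systemd unit that runs a script once per boot.
# The unit contents and the script path are assumptions, not the unit
# created by `evg_host.sh configure --auto-reboot`.
set -eu

render_unit() {
  local script_path="$1"
  cat <<EOF
[Unit]
Description=Recreate kind clusters after boot (illustrative)
After=docker.service network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=${script_path}
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit to stdout. Installing it would mean writing the output to a
# file under /etc/systemd/system/ and enabling it with `systemctl enable`.
render_unit "/path/to/recreate_kind_clusters.sh"
```

`Type=oneshot` fits a script that runs to completion and exits, and `RemainAfterExit=yes` keeps the unit reported as active afterwards, so `systemctl status` shows whether the last run succeeded.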

Removed the architecture flag; the architecture is now inferred from uname.
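
Inferring the architecture from uname usually means mapping the machine name reported by `uname -m` to the name used in download URLs. The sketch below shows that mapping under assumptions: the function name `normalize_arch` and the set of supported architectures are hypothetical, not taken from this PR's scripts.

```shell
#!/usr/bin/env bash
# Sketch only: map `uname -m` output to the architecture names commonly used
# in release artifact names. normalize_arch is a hypothetical helper.
set -eu

normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Detect the architecture of the current host.
if arch="$(normalize_arch "$(uname -m)")"; then
  echo "detected architecture: ${arch}"
fi
```

Linux reports 64-bit ARM as `aarch64` while macOS reports `arm64`, which is why both spellings map to the same value.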

Proof of Work

Tested locally, but it would be nice if someone could check out this branch and try it for themselves.

Checklist

Have you linked a jira ticket and/or is the ticket in the title?
Have you checked whether your jira ticket required DOCSP changes?
Have you checked for release_note changes?

SimonBaeumer

I am split on this change: it risks losing data or setups. I understand the desire to fix this, but on the other hand I don't like it when my environment gets cleaned up automatically while it is still in use. For example, if I had run patches against deployments or single-cluster deployments, my environment would be reset.

Can you add an env var to opt in to this re-creation? Then each engineer can decide whether they want re-creation or not.
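
An env-var opt-in of the kind requested here could be a small guard at the top of the recreation script. This is only a sketch: `RECREATE_KIND_ON_BOOT` and `should_recreate` are hypothetical names, and the PR summary describes an explicit `--auto-reboot` flag rather than an environment variable.

```shell
#!/usr/bin/env bash
# Sketch only: opt-in guard for boot-time cluster recreation.
# RECREATE_KIND_ON_BOOT is a hypothetical variable, not defined by this PR.
set -eu

should_recreate() {
  # Recreate only when the engineer has explicitly opted in.
  [ "${RECREATE_KIND_ON_BOOT:-false}" = "true" ]
}

if should_recreate; then
  echo "opt-in set: clusters would be recreated here"
else
  echo "opt-in not set: leaving existing clusters untouched"
fi
```

Defaulting to "false" keeps the destructive path off unless someone asks for it, which matches the concern about environments that are still in use.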

github-actions · 2025-10-09T13:13:47Z

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.5.0 Release Notes

New Features

Improve automation agent certificate rotation: the agent now restarts automatically when its certificate is renewed, allowing seamless certificate updates without manual Pod restarts.

Bug Fixes

MongoDBMultiCluster: fix resource stuck in Pending state if any clusterSpecList item has 0 members. After the fix, a value of 0 members is handled correctly, as in the MongoDB resource.
MultiClusterSharded: block removing a non-zero member cluster from the MongoDB resource. This prevents scaling down a member cluster without its current configuration available, which could lead to unexpected issues.

Added flag

lsierant

LGTM! Trying it out immediately!

lucian-tosa added 3 commits April 23, 2025 17:32

Recreate kind cluster on every evg host reboot

5781a79

Fix dependency

5569ff0

Remove duplicate delete OM project

d56d502

lucian-tosa requested a review from a team as a code owner April 23, 2025 14:33

lucian-tosa requested review from SimonBaeumer and anandsyncs April 23, 2025 14:33

anandsyncs approved these changes May 14, 2025

SimonBaeumer previously requested changes May 14, 2025

lucian-tosa added 2 commits October 9, 2025 14:53

Merge branch 'refs/heads/master' into restart-kind-on-boot

8395f87

# Conflicts: # scripts/dev/recreate_kind_clusters.sh

Update paths

4b9374b

lucian-tosa added the skip-changelog label (Use this label in Pull Request to not require new changelog entry file) Oct 9, 2025

lucian-tosa added 9 commits October 9, 2025 15:34

Add flag

6771611

Revert changes

17e5edf

Cleanup script

9047c31

Remove download_kube_tools

07f049c

Verbose curl

2faec30

Revert "Remove download_kube_tools"

ab70f3f

This reverts commit 07f049c.

Fix linting errors

c83aeb7

Aggressive curl retries

1d47e0f

Remove --retry-all-errors

b1b0484

lucian-tosa added 2 commits October 10, 2025 11:56

Remove curl verbosity

326b75f

Merge branch 'refs/heads/master' into restart-kind-on-boot

7f06510

lsierant approved these changes Oct 10, 2025

lucian-tosa merged commit 7ca5fc8 into master Oct 10, 2025
33 of 37 checks passed

lucian-tosa deleted the restart-kind-on-boot branch October 10, 2025 12:39

5 participants