Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Fix self-hosted kubelet on bare metal platform #436

Merged
merged 11 commits into from
May 29, 2020

Conversation

invidian
Copy link
Member

This PR fixes crashing self-hosted kubelet DaemonSet on bare metal and enables the tests for it on this platform, to make sure that it is actually fixed.

The PR use newly introduced Terraform modules from PR #392, to avoid the differences between the Ignition configuration across platforms, which caused this exact bug.

Closes #376

@invidian invidian force-pushed the invidian/bare-metal-fix-self-hosted-kubelet branch 8 times, most recently from fe36593 to f2f58a2 Compare May 18, 2020 12:33
@invidian invidian force-pushed the invidian/bare-metal-fix-self-hosted-kubelet branch 9 times, most recently from d7d8b0f to 1e2e08d Compare May 19, 2020 11:09
@invidian invidian marked this pull request as ready for review May 19, 2020 11:43
@invidian invidian force-pushed the invidian/bare-metal-fix-self-hosted-kubelet branch from 1e2e08d to dcebfdc Compare May 19, 2020 11:43
@iaguis iaguis force-pushed the invidian/bare-metal-fix-self-hosted-kubelet branch from dcebfdc to af2e06d Compare May 20, 2020 09:34
Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small nit but LGTM after that.

So in the end we're not using the new Terraform modules here?

@invidian
Copy link
Member Author

So in the end we're not using the new Terraform modules here?

No, because it changes 10 other things, it seems too much of a burden for me. I created the issues for all those things instead. You can see those changes here though: https://github.com/kinvolk/lokomotive/compare/invidian/bare-metal-fix-self-hosted-kubelet-backup?expand=1

@invidian invidian force-pushed the invidian/bare-metal-fix-self-hosted-kubelet branch from af2e06d to 5e2a72e Compare May 20, 2020 10:13
iaguis
iaguis previously approved these changes May 20, 2020
Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@invidian invidian requested a review from iaguis May 28, 2020 08:11
iaguis
iaguis previously approved these changes May 28, 2020
Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: lgtm

In principle I agree with @rata, I value more consistency especially if it's exposed to the user.

However, I see that we now have:

  • Packet: string
  • AWS: nothing
  • AKS: map[string]string
  • bare-metal (this PR): []string

So even if we choose string we won't be consistent across platforms. So I don't mind either way in this PR, but with #489 we should make these consistent.

@invidian
Copy link
Member Author

I'd change it to map[string]string, as this is also how you define labels in Kubernetes.

@invidian invidian force-pushed the invidian/bare-metal-fix-self-hosted-kubelet branch 4 times, most recently from 00d9ae9 to 2e5f1c9 Compare May 29, 2020 09:48
@invidian invidian requested a review from iaguis May 29, 2020 11:20
@invidian invidian requested a review from iaguis May 29, 2020 14:55
@invidian
Copy link
Member Author

@iaguis I implemented the test, please have a look.

Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit and a question.

To be able to test fix for #376 in the CI and to make sure we don't make
a regression there.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Now we have some baremetal e2e tests, so they should be built and linted
as part of the CI process.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This commit adds 'configure-kubelet-cgroup-driver' script to bare-metal
nodes, which is then executed before kubelet systemd unit starts to
generate kubelet configuration file with automatically detected cgroup
driver to be used by kublelet.

This allows to use FCL Edge channel on bare metal nodes, as Edge channel
has different default cgroup driver than kubelet.

Generating mentioned configuration file also fixes crashing self-hosted
kubelet on bare metal platform, as it expects the file to be present on
host filesystem.

This feature is already implemented in the same way on other platforms.

Closes #376.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This is required for self-hosted kubelet tests to pass as well.

Part of #376.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Part of #376.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This commit disables self-hosted kubelet tests for picking up correct
node labels, as it is flaky because of runc bug in current release of
Flatcar stable, which makes kubelet pod to never terminate when the pod
is scheduled for removal.

Once the fix of runc reaches Flatcar stable, this test should be
re-enabled.

See also #110

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
So the template syntax is checked by unit tests.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This commit changes the template method we use in baremetal Terraform
code responsible for generating workers Ignition configs from
template_file data source coming from 3rd party Terraform provider
to built-in 'templatefile' function, which is available from
Terraform 0.12, as it provides the exact same functionality, but
do not require downloading 3rd party provider.

Also 'template' provider recommends using this function:
https://www.terraform.io/docs/providers/template/d/file.html.

This also allows passing parameters to the template in complex types
(like lists) and call functions like 'join' inside the template, which
better separates data from how they are rendered.

Part of #196

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Part of #440.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Now that we don't run disruptive test for self-hosted kubelet, we should
have a test, that the labels configured by the user for the worker pool
are actually applied on the node objects.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
@invidian invidian force-pushed the invidian/bare-metal-fix-self-hosted-kubelet branch from ff236ce to 6a2c4a8 Compare May 29, 2020 15:20
Copy link
Contributor

@iaguis iaguis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@invidian invidian requested a review from rata May 29, 2020 15:43
@invidian invidian dismissed rata’s stale review May 29, 2020 15:48

Dismissing as discussed.

@invidian invidian merged commit e849203 into master May 29, 2020
@invidian invidian deleted the invidian/bare-metal-fix-self-hosted-kubelet branch May 29, 2020 15:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Self-hosted kubelet does not work on bare-metal, missing /etc/kubernetes/kubelet.config
3 participants