Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add toleration for fluent-bit to run on masters #2671

Merged
merged 1 commit into from
Aug 7, 2020

Conversation

einfachnuralex
Copy link
Contributor

@einfachnuralex einfachnuralex commented Aug 6, 2020

How to categorize this PR?

/area logging
/kind enhancement
/priority normal

What this PR does / why we need it:
This PR adds tolerations to the fluent-bit daemonset, which runs on all seeds. For the gardener initial cluster these tolerations are needed to also collect logs from master nodes.

Which issue(s) this PR fixes:
Fixes #2638

Special notes for your reviewer:

Release note:

The fluent-bit DaemonSet is now tolerating the taint `node-role.kubernetes.io/master` with effect `NoSchedule`.

@einfachnuralex einfachnuralex requested a review from a team as a code owner August 6, 2020 06:09
@CLAassistant
Copy link

CLAassistant commented Aug 6, 2020

CLA assistant check
All committers have signed the CLA.

@gardener-robot gardener-robot added area/logging Logging related kind/enhancement Enhancement, improvement, extension priority/normal labels Aug 6, 2020
@gardener-robot
Copy link

@einfachnuralex Thank you for your contribution.

@gardener-robot-ci-1
Copy link
Contributor

Thank you @einfachnuralex for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below.

@einfachnuralex einfachnuralex changed the title Add toleration for fluent-bit to tun on masters Add toleration for fluent-bit to run on masters Aug 6, 2020
@rfranzke
Copy link
Member

rfranzke commented Aug 6, 2020

/assign @amshuman-kr @vpnachev

@rfranzke
Copy link
Member

rfranzke commented Aug 6, 2020

/reviewed ok-to-test

vpnachev
vpnachev previously approved these changes Aug 6, 2020
Copy link
Member

@vpnachev vpnachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/invite @amshuman-kr

rfranzke
rfranzke previously approved these changes Aug 6, 2020
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link

@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the same thing be achieved by deploying gardener extensions gardener/kupid and a suitable ClusterPodSchedulingPolicy to inject tolerations specific to the particular masters into fluent-bit daemonsets rather than bake generic tolerations into seed-bootstrap manifests?

For example,

apiVersion: kupid.gardener.cloud/v1alpha1
kind: ClusterPodSchedulingPolicy
metadata:
  name: fluent-bit
spec:
  namespaceSelector: # this policy only applies to the garden namespace
    matchLabels:
      role: garden
  podLabels:         # this policy only applies to the fluent-bit daemonset and its pods
    matchLabels:
      app: fluent-bit
      role: logging
      gardener.cloud/role: logging
  tolerations:       # any specific tolerations can go here
  - key: node-role.kubernetes.io/master
    operator: Equal
    effect: NoSchedule
  - key: CriticalAddonsOnly
    operator: Exists

BTW there is also another gardener extension in the works (node-affinity-and-tolerations) to deploy specific ClusterPodSchedulingPolicy instances. It already supports injecting node affinity and/or tolerations though there is an assumption about the toleration key, value and effect based on the node labels which could easily be enhanced to support explicit tolerations specification.

For example, the node-affinity-and-tolerations extension's ControllerRegistration could be used as below to tolerate specific taints.

piVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: node-affinity-and-tolerations
spec:
  deployment:
    type: helm
    policy: Always
  seedSelector: {} # optional to target specific seeds
  providerConfig:
    chart: ...
    values:
      targets:
        fluent-bit:
          nodeAffinity: false
            tolerations: true
            namespaceLabels:
              role: garden
            podLabels:
              app: fluent-bit
              role: logging
              gardener.cloud/role: logging
            nodeLabels:
              key: value    # As of now, these are used as the toleration key and value with NoExecute effect

@rfranzke
Copy link
Member

rfranzke commented Aug 6, 2020

IMO, the kupid extension is dedicated for more sophisticated use-cases and an unnecessary overhead just to get the fluent-bit deployed on master VMs of the seed.

@amshuman-kr
Copy link

IMO, the kupid extension is dedicated for more sophisticated use-cases and an unnecessary overhead just to get the fluent-bit deployed on master VMs of the seed.

Not disputing that. The point was only that there might be genuine use-cases for using NoSchedule and NoEffect taints to block all pods (including fluent-bit) from being scheduled or run on some nodes (say, for debugging). A generic NoSchedule or NoEffect taint would rule it out.

@rfranzke
Copy link
Member

rfranzke commented Aug 6, 2020

Hm, yes, agreed. Would it make sense to change the tolerations to

tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule

then?

@rfranzke
Copy link
Member

rfranzke commented Aug 7, 2020

@einfachnuralex can you please change the tolerations as mentioned in the comment earlier?
/status author-action
/needs changes

@gardener-robot
Copy link

@einfachnuralex The pull request was assigned to you under author-action. Please unassign yourself when you are done. Thank you.

@einfachnuralex
Copy link
Contributor Author

@amshuman-kr Yes, you are right. If we would use the gardener/kupid extension such a rule would be the right thing. ATM we are not using kupid so i also prefer a simpler solution, like adding this toleration.
Also i think for the default behavior of fluent-bit it make sense to include the master nodes for log aggregation.
@rfranzke Yes i will change it as you mentioned.

Changed to only master noschedule toleration
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks for your contribution!

Copy link
Member

@vpnachev vpnachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/reviewed ok-to-test

@rfranzke rfranzke merged commit c2c9873 into gardener:master Aug 7, 2020
@nschad nschad deleted the add-fluentd-master-taint branch May 30, 2023 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/logging Logging related kind/enhancement Enhancement, improvement, extension
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fluent-bit not collecting logs from master nodes
8 participants