Add support for tolerating tainted nodes #584

Closed
paalkr opened this issue May 31, 2019 · 16 comments
Labels
component/longhorn-manager Longhorn manager (control plane) kind/feature Feature request, new feature

@paalkr

paalkr commented May 31, 2019

Ref
#583
#574

I have added dedicated nodes intended for Longhorn storage to my cluster; these nodes have EBS volumes attached. Other nodes do not have these extra storage volumes. To make sure that the dedicated Longhorn nodes are not used for general-purpose workloads, they are tainted (and labeled).

Name:               ip-10-1-43-24.eu-west-1.compute.internal
Roles:              node,storage
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5a.large
                    beta.kubernetes.io/os=linux
                    cputype=normal
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1a
                    kube-aws.coreos.com/autoscalinggroup=k8s6-Storage-1PWKNJJPPE5BU-Workers-C7BU4PXGMV6R
                    kube-aws.coreos.com/cluster=k8s6
                    kube-aws.coreos.com/launchconfiguration=
                    kube-aws.coreos.com/role=storage
                    kubernetes.io/hostname=ip-10-1-43-24.eu-west-1.compute.internal
                    kubernetes.io/role=node
                    node-role.kubernetes.io/node=
                    node-role.kubernetes.io/storage=
                    node.kubernetes.io/role=storage
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"5a:be:3d:18:cc:89"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.1.43.24
                    kube-aws.coreos.com/securitygroups: k8s6-Network-1AJ9I9AGCOR35-SecurityGroupWorker-5M4RWB63CGHZ
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 25 May 2019 16:42:42 +0200
Taints:             kube-aws.coreos.com/role=storage:NoSchedule
Unschedulable:      false

Currently there is no way to specify a toleration for the engine-image DaemonSet. As a result, Longhorn doesn't start all its components unless the DaemonSet created by the operator is edited manually.

@yasker yasker added component/longhorn-manager Longhorn manager (control plane) kind/feature Feature request, new feature labels May 31, 2019
@yasker yasker added this to the v0.6.0 milestone May 31, 2019
@yasker
Member

yasker commented Jun 10, 2019

Let's revisit the issue after the engine process refactoring is done.

@yasker yasker changed the title Add support for tainted nodes Add support for tolerations tainted nodes Jul 24, 2019
@yasker yasker changed the title Add support for tolerations tainted nodes Add support for tolerating tainted nodes Jul 24, 2019
@yasker yasker modified the milestones: v0.6.0, v0.7.0 Jul 24, 2019
@yasker
Member

yasker commented Jul 24, 2019

It's unlikely we will have time to work on this during the v0.6.0 release, so I'm pushing it to the v0.7.0 release. We can pick it up from v0.7.0 if we decide to do a v0.6.1 release instead.

@yasker yasker modified the milestones: v0.7.0, v0.6.0 Aug 8, 2019
@jonstelly

jonstelly commented Aug 13, 2019

For anyone else that finds themselves here, I just added a toleration (tolerate everything, just for testing) and it worked. I added the toleration to the longhorn-manager, engine-image, and longhorn-csi-plugin daemonsets.

(kubernetes dashboard / json)

"tolerations": [
  {
    "operator": "Exists"
  }
]

(yaml)

tolerations:
- operator: "Exists"
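
To illustrate why a bare `operator: Exists` toleration matches every taint, here is a minimal Python sketch of the matching rule as described in the Kubernetes documentation. The function name `tolerates` and the dict layout are made up for this example and are not part of any Kubernetes or Longhorn API; only the `Exists` and `Equal` operators are modeled.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # A toleration without an effect matches taints with any effect.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    # A toleration without a key matches taints with any key.
    key = toleration.get("key")
    if key and key != taint["key"]:
        return False
    # "Equal" is the default operator when none is given.
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    # Any other operator is not modeled in this sketch.
    return False


# The taint from the node description at the top of this issue.
storage_taint = {
    "key": "kube-aws.coreos.com/role",
    "value": "storage",
    "effect": "NoSchedule",
}

tolerate_everything = {"operator": "Exists"}
tolerate_storage_only = {
    "key": "kube-aws.coreos.com/role",
    "operator": "Equal",
    "value": "storage",
    "effect": "NoSchedule",
}
```

The tolerate-everything form is convenient for testing, but it also lets the pods land on nodes tainted for unrelated reasons, so a key-specific toleration is probably the safer choice outside of a test cluster.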

@paalkr
Author

paalkr commented Aug 13, 2019

Absolutely, it's possible to make post deployment edits to make tainted nodes work, as I described in this comment #574 (comment)

@jonstelly

Ah, I hadn't seen that, I saw that linked issue but didn't see your post with the toleration. Out of curiosity, is the engine-image the only place I needed to add the toleration? i.e. not longhorn-manager or csi-plugins?

The longhorn documentation is well written and this was really helpful, easier to digest than some of the other container attached storage options. But I have to admit I'm not quite sure yet what function/responsibility maps to which kubernetes component (csi-plugin, etc...)

@paalkr
Author

paalkr commented Aug 14, 2019

It depends on how you plan to operate the Longhorn storage system and how your Kubernetes nodes are organized. I have dedicated nodes intended as the Longhorn storage backend, which are tainted and labeled accordingly. In that case you would also add tolerations to the manager DaemonSet. This is easier, as the manager DaemonSet is something you deploy to your cluster yourself, and you can modify the manifest before deployment.

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  labels:
    app: longhorn-manager
  name: longhorn-manager
  namespace: longhorn-system
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  template:
    metadata:
      labels:
        app: longhorn-manager
    spec:
      # nodeSelector:
        # kube-aws.coreos.com/role: storage    
      tolerations:
      - key: kube-aws.coreos.com/role
        operator: Equal
        value: storage
        effect: NoSchedule         
      containers:
...

The engine-image DaemonSet, on the other hand, is created by the Longhorn controller/operator, which is why you have to edit that one after deployment.
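
For the post-deployment edit, one option is to generate a patch body rather than editing the DaemonSet by hand. The following is a rough sketch, not an official Longhorn procedure: the helper name `toleration_patch` is hypothetical, and the toleration is the one from the manifest above.

```python
import json


def toleration_patch(tolerations: list) -> str:
    """Build a JSON merge patch that sets the pod template's tolerations."""
    patch = {"spec": {"template": {"spec": {"tolerations": tolerations}}}}
    return json.dumps(patch)


patch_body = toleration_patch([
    {
        "key": "kube-aws.coreos.com/role",
        "operator": "Equal",
        "value": "storage",
        "effect": "NoSchedule",
    }
])
print(patch_body)
```

The printed string could then be passed to something like `kubectl -n longhorn-system patch daemonset <engine-image-daemonset-name> --type merge -p '<patch_body>'`, where the DaemonSet name is a placeholder you would look up in your own cluster. Note that a JSON merge patch replaces the whole `tolerations` list, so any tolerations already present need to be included in the list as well.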

@paalkr
Author

paalkr commented Aug 14, 2019

@yasker @shuo-wu , will this longhorn/longhorn-manager#385 go into 0.6? I certainly hope so ;)

@yasker
Member

yasker commented Aug 14, 2019

@paalkr Yes, this issue is targeted for v0.6.0.

@paalkr
Author

paalkr commented Aug 14, 2019

That is absolutely brilliant. I think that the 0.6 release will be great!

@meldafrawi
Contributor

meldafrawi commented Aug 27, 2019

Validation: PASSED

Steps to test:

  • Taint the Longhorn nodes with nodetype=storage:NoSchedule and storage=longhorn:NoSchedule, then on the Settings/General page set Kubernetes Taint Toleration to nodetype=storage:NoSchedule,storage=longhorn:NoSchedule

Expected Result: Longhorn components will be restarted.
Any deployed workload will be scheduled to non-tainted nodes.
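
The setting value used in the test above is a comma-separated list of `key=value:effect` entries. As an illustration of how such a string maps onto Kubernetes toleration objects, here is a small Python sketch. It is an assumption about the format based only on the examples in this thread, not Longhorn's actual parser, and `parse_toleration_setting` is a hypothetical name.

```python
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_toleration_setting(setting: str) -> list:
    """Turn 'key=value:effect,...' into a list of toleration dicts."""
    tolerations = []
    for item in setting.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        key_value, sep, effect = item.rpartition(":")
        if not sep or effect not in VALID_EFFECTS:
            raise ValueError(f"invalid toleration entry: {item!r}")
        key, eq, value = key_value.partition("=")
        if eq:
            # key=value:effect tolerates a taint with exactly this value
            tolerations.append(
                {"key": key, "operator": "Equal", "value": value, "effect": effect}
            )
        else:
            # key:effect (no value) is treated here as "key exists"
            tolerations.append({"key": key, "operator": "Exists", "effect": effect})
    return tolerations


parsed = parse_toleration_setting(
    "nodetype=storage:NoSchedule,storage=longhorn:NoSchedule"
)
```

Each resulting dict has the same shape as an entry under `tolerations:` in a pod spec, which is what the Longhorn components would need in order to be scheduled on the tainted nodes.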

@meldafrawi
Contributor

Validation: PASSED

Steps to test:

  • Deploy a 3-node cluster, and install Longhorn
  • taint a node with storage=longhorn:NoExecute

Expected Result: Longhorn components will be removed from the tainted node.

  • In General/Settings page, set Kubernetes Taint Toleration to storage=longhorn:NoExecute

Longhorn components will be restarted and rescheduled on the tainted node.

@virus2016

This solution doesn't work in v1.

kubectl taint nodes pool-pieg1vnk9-3o6z5 storage=longhorn:NoExecute

Set Kubernetes Taint Toleration to 'storage=longhorn:NoExecute'

Is there another way to disable the node from running longhorn?

@shuo-wu
Contributor

shuo-wu commented Jul 13, 2020

disable the node from running longhorn

Do you mean disabling nodes from running Longhorn? This requires you to set the taint on the nodes but not set the toleration for Longhorn.
The toleration setting in Longhorn is meant to allow some nodes to run Longhorn only.

@virus2016

Taint the node but leave the longhorn field blank? I’ll add another comment soon with the taints etc and screenshots. I’m sure others are having issues with this?

@virus2016

disable the node from running longhorn

Do you mean disabling nodes from running Longhorn? This requires you to set the taint on the nodes but not set the toleration for Longhorn.
The toleration setting in Longhorn is meant to allow some nodes to run Longhorn only.

Yes, I am looking to only have some nodes running longhorn.

@shuo-wu
Contributor

shuo-wu commented Jul 21, 2020

Taint the node but leave the longhorn field blank? I’ll add another comment soon with the taints etc and screenshots. I’m sure others are having issues with this?

Yeah. I think you can check this post. It explains why Longhorn will be deployed on all worker nodes by default and why it's what we expect.


6 participants