Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Enhance scope of druid to create/manage additional resources done elsewhere #505

Closed
Tracked by #2
unmarshall opened this issue Jan 9, 2023 · 7 comments
Labels
area/control-plane Control plane related kind/enhancement Enhancement, improvement, extension status/closed Issue is closed (either delivered or triaged)

Comments

@unmarshall
Copy link
Contributor

unmarshall commented Jan 9, 2023

Feature (What you would like to be added):
Druid is an Operator for an etcd cluster. To that end today it creates and manages the following resources:

  • ConfigMap
  • Lease
  • Service
  • StatefulSet
  • PDB

It still depends on gardener to create the following resources:

  • Netpol for etcd members
  • Secrets which contains the keys/CA certificates
  • VPA/HVPA
  • Setting defragmentation and backup schedule
  • Deployment and Deletion of etcd-main and etcd-events resources

Since CA rotation and generation of keys is typically done for many components, we can keep that out for now. Similarly creation of etcd CR for etcd-main and etcd-events should be created by g/g. However, rest of the things that are done by g/g today should be revisited and some of it should be moved to druid. This makes it a complete operator which can then be easily used outside the gardener context as well.

So in a nutshell, any resource that is created for an etcd cluster should be done via druid operator. Any configuration or CA certificates + keys should be passed as configuration to the respective etcd CR so that druid can use these when creating the etcd cluster.

Motivation (Why is this needed?):
There seems to be 2 actors which are managing the etcd cluster. The intent of this feature is to delegate any creation/management task for any k8s resource created for an etcd cluster to druid as its purpose is to be an operator for any etcd cluster.

Approach/Hint to the implement solution (optional):

@unmarshall unmarshall added the kind/enhancement Enhancement, improvement, extension label Jan 9, 2023
@shreyas-s-rao
Copy link
Contributor

Netpol and HVPA/VPA can be deployed by druid, based on input received from the Etcd resource. However, values for defragmentation and backup schedules need to be provided by Gardener, given that these are either based on shoot maintenance window or randomized respectively. Deployment and Deletion of etcd-main and etcd-events is of course done by Gardener or any user of druid, since druid is an operator that simply acts on the deployed Etcd resource.

@unmarshall unmarshall added the area/control-plane Control plane related label Jan 9, 2023
@rfranzke
Copy link
Member

rfranzke commented Jan 9, 2023

I think network policies should be deployed by the "user" (gardener in our case). Only they know which other components exist and who might need to communicate with whom. The Etcd resource should just allow specifying labels for the components which the user can also use in their network policies.

VPA/HVPA is similar since etcd-druid should probably not make any assumptions about whether these components are installed to the cluster.

Regarding the schedules and the creation of the Etcd resources, I agree with Shreyas.

@unmarshall
Copy link
Contributor Author

unmarshall commented Jan 9, 2023

@rfranzke Thanks for your feedback!

VPA/HVPA is similar since etcd-druid should probably not make any assumptions about whether these components are installed to the cluster.

It will not make any assumption but it is possible to enhance etcd resource to define these declaratively. Would that make sense?

Regarding the schedules and the creation of the Etcd resources, I agree with Shreyas.

Yes that will not be done by druid.

I think network policies should be deployed by the "user" (gardener in our case). Only they know which other components exist and who might need to communicate with whom.

The idea was to make it declarative and include it as part of the etcd spec. For a multi-node etcd cluster druid could create Netpol for enabling communication amongst members (peer network policy). Any other network policy (client) could be created outside druid.

Would it make sense? Think of usage of druid as an operator outside gardener.

@rfranzke
Copy link
Member

It will not make any assumption but it is possible to enhance etcd resource to define these declaratively. Would that make sense?

I'm not sure whether it's a good idea to "hard-code" this into the API since there are also scaling means other than VPA/HVPA (e.g., custom metrics). If I were a user of this, would I open an issue/PR here to introduce my scaling mean as well in the Etcd API?
I'm not strongly objecting here, just saying that it seems to be a cleaner contract if the Etcd resource just provides which other resources it created (StatefulSet) in its status. Then, the user can decide what to do with it (create VPA/HVPA/whatever for it).

The idea was to make it declarative and include it as part of the etcd spec. For a multi-node etcd cluster druid could create Netpol for enabling communication amongst members (peer network policy). Any other network policy (client) could be created outside druid.

Would it make sense? Think of usage of druid as an operator outside gardener.

This would force people to use network policies also for other components that want to communicate with ETCD. A declarative label approach (which already exists) and the freedom for the user to decide what is needed sounds more appealing to me.

@unmarshall
Copy link
Contributor Author

just saying that it seems to be a cleaner contract if the Etcd resource just provides which other resources it created (StatefulSet) in its status. Then, the user can decide what to do with it (create VPA/HVPA/whatever for it).

Ok this is better and will also allow consumers to not be limited by VPA/HVPA.
I looked at one of the etcd resources and currently the status has the following:

Status:                                                                                                                                                                                                                                                           
│   Cluster Size:  1                                                                                                                                                                                                                                                │
│   Conditions:                                                                                                                                                                                                                                                     
│     Last Transition Time:  2023-01-10T02:30:51Z                                                                                                                                                                                                                   │
│     Last Update Time:      2023-01-10T08:10:41Z                                                                                                                                                                                                                   │
│     Message:               All members are ready                                                                                                                                                                                                                  │
│     Reason:                AllMembersReady                                                                                                                                                                                                                        │
│     Status:                True                                                                                                                                                                                                                                   │
│     Type:                  AllMembersReady                                                                                                                                                                                                                        │
│     Last Transition Time:  2023-01-10T02:35:47Z                                                                                                                                                                                                                   │
│     Last Update Time:      2023-01-10T08:10:41Z                                                                                                                                                                                                                   │
│     Message:               Snapshot backup succeeded                                                                                                                                                                                                              │
│     Reason:                BackupSucceeded                                                                                                                                                                                                                        │
│     Status:                True                                                                                                                                                                                                                                   │
│     Type:                  BackupReady                                                                                                                                                                                                                            │
│     Last Transition Time:  2023-01-10T02:30:51Z                                                                                                                                                                                                                   │
│     Last Update Time:      2023-01-10T08:10:41Z                                                                                                                                                                                                                   │
│     Message:               The majority of ETCD members is ready                                                                                                                                                                                                  │
│     Reason:                Quorate                                                                                                                                                                                                                                │
│     Status:                True                                                                                                                                                                                                                                   │
│     Type:                  Ready                                                                                                                                                                                                                                  │
│   Current Replicas:        1                                                                                                                                                                                                                                      │
│   Etcd:                                                                                                                                                                                                                                                           
│     API Version:  apps/v1                                                                                                                                                                                                                                         │
│     Kind:         StatefulSet                                                                                                                                                                                                                                     │
│     Name:         etcd-main                                                                                                                                                                                                                                       │
│   Members:                                                                                                                                                                                                                                                        
│     Id:                    af77f229f390c575                                                                                                                                                                                                                       │
│     Last Transition Time:  2023-01-10T02:30:51Z                                                                                                                                                                                                                   │
│     Name:                  etcd-main-0                                                                                                                                                                                                                            │
│     Reason:                LeaseSucceeded                                                                                                                                                                                                                         │
│     Role:                  Leader                                                                                                                                                                                                                                 │
│     Status:                Ready                                                                                                                                                                                                                                  │
│   Observed Generation:     11                                                                                                                                                                                                                                     │
│   Ready:                   true                                                                                                                                                                                                                                   │
│   Ready Replicas:          1                                                                                                                                                                                                                                      │
│   Replicas:                1                                                                                                                                                                                                                                      │
│   Service Name:            etcd-main-client                                                                                                                                                                                                                       │
│   Updated Replicas:        1   

Currently we do not capture all resources that were created by druid for a given etcd custom resource. We could enhance that.

@rfranzke
Copy link
Member

There is already .status.etcd which is about the StatefulSet. Not sure whether it's necessary to "pollute" the status with all the other (probably less-interesting) resources.

@unmarshall
Copy link
Contributor Author

/close as per the discussion above this is not required.

@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/control-plane Control plane related kind/enhancement Enhancement, improvement, extension status/closed Issue is closed (either delivered or triaged)
Projects
None yet
Development

No branches or pull requests

4 participants