-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Open
Labels
area/etcdpriority/important-longtermImportant over the long term, but may not be staffed and/or may need multiple releases to complete.Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Description
We have a "managed" cluster datastore driver interface, but it isn't well used used. The interface is defined here:
k3s/pkg/cluster/managed/drivers.go
Lines 15 to 29 in 3d0e0e2
| type Driver interface { | |
| SetControlConfig(config *config.Control) error | |
| IsInitialized() (bool, error) | |
| Register(handler http.Handler) (http.Handler, error) | |
| Reset(ctx context.Context, reboostrap func() error) error | |
| IsReset() (bool, error) | |
| ResetFile() string | |
| Start(ctx context.Context, clientAccessInfo *clientaccess.Info) error | |
| Restore(ctx context.Context) error | |
| EndpointName() string | |
| Snapshot(ctx context.Context) (*SnapshotResult, error) | |
| ReconcileSnapshotData(ctx context.Context) error | |
| GetMembersClientURLs(ctx context.Context) ([]string, error) | |
| RemoveSelf(ctx context.Context) error | |
| } |
There are many issues with this interface:
- We have only one implementation of this interface: embedded etcd. At one point dqlite was planned to provide another implementation, but this never went anywhere.
- The interface does not cover all the things we need to do with the cluster datastore, and a large number of things just call around it into the
etcdpackage directly. - Cluster bootstrap code assumes that the presence of a managed driver indicates managed etcd, and tries to either communicate with another etcd cluster member, or start up a temporary etcd instance to extract the bootstrap data. This should all be handled by the driver interface.
Examples of the leakiness of the abstraction abound in the cluster startup code, for example:
Lines 50 to 59 in 3d0e0e2
| if c.managedDB != nil { | |
| if c.config.DisableETCD { | |
| // secondary server with etcd disabled, start the etcd proxy so that we can attempt to use it | |
| // when reconciling. | |
| if err := c.startEtcdProxy(ctx); err != nil { | |
| return pkgerrors.WithMessage(err, "failed to start etcd proxy") | |
| } | |
| } else if isInitialized && !clusterReset { | |
| // For secondary servers with etcd, first attempt to connect and reconcile using the join URL. | |
| // This saves on having to start up a temporary etcd just to extract bootstrap data. |
We should either use this interface, or get rid of it
From a future-proofing perspective, I think that the best path forward would be to enforce its use:
- Add a mock driver implementation that can be used in tests to validate that we are not bypassing the interface anywhere
- Rework kine to function as a managed driver implementation, returning NotImplemented errors as necessary.
- Rework managed etcd (including snapshot save/reset/restore) to function only through the interface, and remove anything that pokes directly at etcd
Metadata
Metadata
Assignees
Labels
area/etcdpriority/important-longtermImportant over the long term, but may not be staffed and/or may need multiple releases to complete.Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Type
Projects
Status
Accepted