Meta issue: Pluggable backend #10321

Open
philips opened this Issue Dec 13, 2018 · 6 comments

philips commented Dec 13, 2018

etcd supports an on-disk backend via bbolt. However, there is growing interest in supporting other backends. For example, cloud providers might have a hosted database with appropriate semantics to act as the backend, so that etcd can run in a "serverless" environment.

As a concrete example: Microsoft announced a CosmosDB backend for etcd to support their AKS project. I think we should work to see if we can get that backend integrated into etcd which might prove out the model. @khenidak @mboersma WDYT?

xiang90 commented Dec 13, 2018

Trying out badger as an alternative backend here: #10311

philips commented Jan 2, 2019

Post holiday bump @khenidak @mboersma.

🌲

justinsb commented Jan 15, 2019

I've tried this a few ways. While it's possible to replace only boltdb, the alternative is to reimplement the proto API to make something that is wire-compatible with etcd but doesn't share much/any code. Most backends that people are considering themselves support distributed consistency, so the raft implementation, which is the bulk of what's here, is not particularly applicable.

I've found that reimplementing the API produces a much better result, so I think we should prioritize efforts to make it easier for people to do that e.g. conformance tests, #10324, data export tools.
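To illustrate what a conformance test for an API reimplementation might look like, here is a minimal sketch in Go. Everything in it is an assumption made for illustration: the `KV` interface is a hypothetical, heavily reduced stand-in for the real etcd API (which is defined in protobuf and is much larger), and `memKV` is a toy in-memory backend that exists only to exercise the checks.

```go
package main

import (
	"errors"
	"fmt"
)

// KV is a hypothetical, heavily reduced stand-in for the etcd KV API.
type KV interface {
	// Put stores value under key and returns the new store revision.
	Put(key, value string) (rev int64)
	// Get returns the value, the revision of the key's last modification,
	// and whether the key exists.
	Get(key string) (value string, modRev int64, ok bool)
}

// entry is one stored value together with the revision that wrote it.
type entry struct {
	value  string
	modRev int64
}

// memKV is a toy in-memory backend used only to exercise the checks.
type memKV struct {
	rev  int64
	data map[string]entry
}

func newMemKV() *memKV { return &memKV{data: map[string]entry{}} }

func (m *memKV) Put(key, value string) int64 {
	m.rev++ // every write bumps the store-wide revision
	m.data[key] = entry{value: value, modRev: m.rev}
	return m.rev
}

func (m *memKV) Get(key string) (string, int64, bool) {
	e, ok := m.data[key]
	return e.value, e.modRev, ok
}

// CheckConformance runs a few behavioural checks against any KV backend.
func CheckConformance(kv KV) error {
	if _, _, ok := kv.Get("missing"); ok {
		return errors.New("get on a missing key reported ok")
	}
	r1 := kv.Put("a", "1")
	r2 := kv.Put("a", "2")
	if r2 <= r1 {
		return fmt.Errorf("revision did not increase: %d then %d", r1, r2)
	}
	v, mod, ok := kv.Get("a")
	if !ok || v != "2" {
		return fmt.Errorf("get returned %q (ok=%v), want %q", v, ok, "2")
	}
	if mod != r2 {
		return fmt.Errorf("mod revision %d does not match last put revision %d", mod, r2)
	}
	return nil
}

func main() {
	if err := CheckConformance(newMemKV()); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("conformance checks passed")
}
```

The point of the shape is that `CheckConformance` only depends on the interface, so the same checks could be pointed at any backend that implements it.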

philips commented Jan 15, 2019

xiang90 commented Jan 15, 2019

> Most backends that people are considering themselves support distributed consistency, so the raft implementation which is the bulk of what's here is not particularly applicable.

What can be reused is the mvcc package, which supports the etcd data model, and the gRPC-layer RPC implementation, which handles stream demultiplexing. etcdserver itself can be stateless, and raft will not even be involved.

> I've found that reimplementing the API produces a much better result,

Have you given it a try? Do you have a reference implementation that we can take a look at?

justinsb commented Jan 16, 2019

The backend efforts that I'm aware of are CosmosDB (existing), DynamoDB (stopped?) and FoundationDB (speculative). The DynamoDB efforts took place at the k8s storage layer, and I think it was in response to this that CosmosDB re-implemented the etcd protocol instead, as that is our recommended direction per kubernetes/kubernetes#53162. I don't know of anyone specifically that I could cc, but of course the "implement the etcd protocol" strategy can be done without contributing to the OSS repos, so we don't have full visibility.

I've put together a few prototypes - one where I effectively integrated the storage layer into apiserver (big wins, but architecturally difficult), one where I just implemented the etcd API, and one where I replaced boltdb in the etcd source code with one of the cloud key-value stores. The storage layer is kubernetes/kubernetes#37536, and I'll see if I can open source the other two. But I found there was a lot of overhead from reusing the existing etcd code (both code overhead and efficiency overhead), and I was surprised in comparison how easy it was to just shim to an existing distributed storage system. It gets even easier if you are willing to leader-elect a single node to run the watches, which is likely less scalable than a fully scale-out implementation but is probably comparable to anything Raft based.

My view was that it wasn't clear that my reimplementations were going to be that much better than etcd (as I was doing single-node anyway), and it wasn't clear what the effect would be on the etcd project. But none of these were more than a few hours' work, and I learned a lot about the tradeoffs by doing it, so I'd encourage people to give it a try!
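For readers who want a feel for the "shim to an existing store and run the watches on a single node" idea, here is a rough Go sketch. It is not taken from any of the prototypes mentioned in this thread. The `Store` interface, `mapStore`, `Shim`, and `Event` are all hypothetical names introduced for illustration, and leader election is left out entirely: the sketch simply assumes this process is already the one node responsible for assigning revisions and fanning out watch events.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical minimal interface to an existing key-value service.
type Store interface {
	Set(key, value string)
}

// mapStore is an in-memory stand-in for that external service.
type mapStore struct{ m map[string]string }

func (s *mapStore) Set(key, value string) { s.m[key] = value }

// Event describes one change, tagged with the revision that produced it.
type Event struct {
	Rev   int64
	Key   string
	Value string
}

// Shim assigns revisions and fans out watch events from a single node.
type Shim struct {
	mu       sync.Mutex
	store    Store
	rev      int64
	watchers []chan Event
}

func NewShim(s Store) *Shim { return &Shim{store: s} }

// Watch registers a watcher. The channel is buffered so that Put does not
// block on it in this sketch.
func (s *Shim) Watch(buf int) <-chan Event {
	ch := make(chan Event, buf)
	s.mu.Lock()
	s.watchers = append(s.watchers, ch)
	s.mu.Unlock()
	return ch
}

// Put writes through to the store, bumps the revision, and notifies watchers.
func (s *Shim) Put(key, value string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rev++
	s.store.Set(key, value)
	ev := Event{Rev: s.rev, Key: key, Value: value}
	for _, ch := range s.watchers {
		select {
		case ch <- ev:
		default:
			// Buffer full: the event is dropped here. A real implementation
			// would need a policy for slow watchers.
		}
	}
	return s.rev
}

func main() {
	shim := NewShim(&mapStore{m: map[string]string{}})
	events := shim.Watch(8)
	shim.Put("foo", "bar")
	shim.Put("foo", "baz")
	for i := 0; i < 2; i++ {
		ev := <-events
		fmt.Printf("rev=%d %s=%s\n", ev.Rev, ev.Key, ev.Value)
	}
}
```

Because all writes and watch registrations go through one mutex on one node, event ordering is trivial, which is the simplification the single-node approach buys at the cost of scale-out.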
