Etcd plugin / middleware proposal #4965

Closed
mqliang opened this issue Apr 5, 2016 · 16 comments

Comments

@mqliang
Contributor

mqliang commented Apr 5, 2016

Etcd plugin / middleware proposal


Motivation

  1. As a user, I'd like etcd to support criteria queries and secondary indexing.

    In some cases we'd like etcd to support criteria queries. For example, in k8s the Scheduler only wants to list&watch unscheduled pods (pod.spec.nodeName=""), and each Kubelet only wants to list&watch the pods assigned to itself (pod.spec.nodeName="XXX"). Currently, k8s addresses this in the API server: the API server lists and watches all pods, filters the results by ListOption, and returns only the matching ones. Moving the filter logic into etcd would reduce network traffic and decoding work.

  2. As a user, I'd like to integrate etcd's authentication and authorization with my own KeyStone service.

    I already run a KeyStone service and would like to use it for etcd authentication and authorization: who may only read certain keys, and who may both read and write them.

  3. As a user, I'd like to do server side data sharding, so for example I may want a federation with 3 clusters, where:

    • Cluster A contains hash slots from 0 to 100.
    • Cluster B contains hash slots from 100 to 200.
    • Cluster C contains hash slots from 200 to 300.

    A client could send a request to any one of the three clusters. The cluster serves the request if the key falls in its own slot range, and otherwise forwards the request to the owning cluster in the federation.
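
The slot-based routing in item 3 could be sketched roughly as follows. This is only an illustration, not an etcd feature: the CRC32 hash, the total of 300 slots, the half-open ranges, and the names `slotFor`, `ownerOf`, and `route` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// numSlots and the ranges below are illustrative assumptions, not etcd settings.
const numSlots = 300

// cluster owns the half-open slot range [lo, hi).
type cluster struct {
	name   string
	lo, hi uint32
}

var federation = []cluster{
	{"A", 0, 100},
	{"B", 100, 200},
	{"C", 200, 300},
}

// slotFor maps a key to a hash slot in [0, numSlots).
func slotFor(key string) uint32 {
	return crc32.ChecksumIEEE([]byte(key)) % numSlots
}

// ownerOf returns the name of the cluster whose range contains the key's slot.
func ownerOf(key string) string {
	s := slotFor(key)
	for _, c := range federation {
		if s >= c.lo && s < c.hi {
			return c.name
		}
	}
	return "" // not reached while the ranges cover [0, numSlots)
}

// route reports whether the local cluster serves the key or forwards it.
func route(local, key string) string {
	if owner := ownerOf(key); owner != local {
		return "forward to " + owner
	}
	return "serve locally"
}

func main() {
	for _, k := range []string{"/pods/a", "/pods/b", "/nodes/n1"} {
		fmt.Printf("%s: slot %d, cluster A would %s\n", k, slotFor(k), route("A", k))
	}
}
```

Using half-open ranges avoids the ambiguity of slots 100 and 200 belonging to two clusters at once.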

One possible approach to meet the above requirements

  1. Support third-party plugins.
    Expose some hook functions to third-party developers. For example, for each of the Put(), Delete(), and Get() methods, third-party developers could define their own preXXX and postXXX methods in a plugin. For popular formats such as JSON, third-party developers could provide a good, widely used plugin that supports criteria queries and secondary indexing.
  2. Introduce middleware, or something similar, when serving requests.
    If etcd introduced middleware into request serving, third-party developers could easily write their own middleware to authenticate and authorize requests, and to forward requests based on sharding rules.
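
To make the hook idea in item 1 concrete, here is a minimal sketch assuming a hypothetical `Plugin` interface with `PrePut` and `PostPut` methods. Nothing here is an existing etcd API: `Store` is a toy in-memory stand-in for the key-value store, and `valueIndex` is an invented plugin that maintains a secondary index from value to keys.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Plugin is a hypothetical hook interface; etcd exposes nothing like it.
type Plugin interface {
	PrePut(key, value string) error // may reject the write
	PostPut(key, value string)      // runs after a successful write
}

// Store is a toy in-memory stand-in for the key-value store.
type Store struct {
	data    map[string]string
	plugins []Plugin
}

func NewStore(plugins ...Plugin) *Store {
	return &Store{data: map[string]string{}, plugins: plugins}
}

// Put runs every PrePut hook, writes the value, then runs every PostPut hook.
func (s *Store) Put(key, value string) error {
	for _, p := range s.plugins {
		if err := p.PrePut(key, value); err != nil {
			return err
		}
	}
	s.data[key] = value
	for _, p := range s.plugins {
		p.PostPut(key, value)
	}
	return nil
}

// valueIndex is a toy secondary index from a value to the keys holding it.
type valueIndex struct {
	current map[string]string          // key -> value currently indexed
	byValue map[string]map[string]bool // value -> set of keys
}

func newValueIndex() *valueIndex {
	return &valueIndex{current: map[string]string{}, byValue: map[string]map[string]bool{}}
}

// PrePut rejects empty values, purely to show a hook vetoing a write.
func (ix *valueIndex) PrePut(key, value string) error {
	if value == "" {
		return errors.New("empty value rejected by plugin")
	}
	return nil
}

// PostPut drops the key's previous index entry, if any, and adds the new one.
func (ix *valueIndex) PostPut(key, value string) {
	if old, ok := ix.current[key]; ok {
		delete(ix.byValue[old], key)
	}
	if ix.byValue[value] == nil {
		ix.byValue[value] = map[string]bool{}
	}
	ix.byValue[value][key] = true
	ix.current[key] = value
}

// Keys returns the keys whose current value equals value, sorted.
func (ix *valueIndex) Keys(value string) []string {
	keys := make([]string, 0, len(ix.byValue[value]))
	for k := range ix.byValue[value] {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ix := newValueIndex()
	s := NewStore(ix)
	_ = s.Put("/pods/a", "node1")
	_ = s.Put("/pods/b", "node1")
	_ = s.Put("/pods/a", "node2") // overwrite moves /pods/a in the index
	fmt.Println(ix.Keys("node1"), ix.Keys("node2"))
	fmt.Println(s.Put("/pods/c", "")) // rejected by PrePut
}
```

A real design would also need hooks for Delete() and some answer to how hooks interact with replication, which this sketch ignores.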

As an open source project, even if we don't want to support some features because we want to keep etcd "simple, easy to use, focusing on reliable key-value store", allowing third-party developers to write plugins and middleware that implement the features they want does not seem like a bad thing.

@mqliang
Contributor Author

mqliang commented Apr 5, 2016

@xiang90 @hongchaodeng @magicwang-cn @HardySimpson

Just my random thoughts for discussion. And there may be other better approaches to address the requirements.

@xiang90
Contributor

xiang90 commented Apr 5, 2016

@mqliang What prevents you from doing 1, 2, 3 externally? It seems like all you need is a smart proxy.

@mqliang
Contributor Author

mqliang commented Apr 5, 2016

@xiang90 Yes, using a proxy could also address the use cases. Is that what "Horizontally scalable proxy layer" in the roadmap means? If so, I'd like the proxy layer to be pluggable and extensible: plugins to support criteria queries, and middleware to support authentication/authorization and sharding.

@magicwang-cn
Contributor

+1.

Yes, under heavy load the client gets error 401 frequently, because etcd has a default watch window (1000 events for all keys) and the etcd client does not support streaming results. So filtering on the server side would be useful.

In some case, we'd like etcd support criteria query, for example: in k8s, Scheduler just want list&watch unscheduled pods(pod.spec.nodeName=""), Kubelet just want list&watch the pods assigned to itself(pod.spec.nodeName="XXX"). Current, k8s address this in API server: API server list&watch all pods, but will filter the results by ListOption and just return "valid" ones. Moving the filter logic to etcd brings some network and decoding benefits.

@HardySimpson
Contributor

But a server-side criteria query is not something a proxy can do.
It can save a lot of network cost and etcd memory, and prevent etcd from OOM.

Does etcd v3 already support this? I'm not sure.

@mqliang
Contributor Author

mqliang commented Apr 5, 2016

Server-side criteria queries could only be supported by some plugin-like mechanism, since etcd requires both keys and values to be strings. Only users who know their data is in some format (e.g. JSON, YAML) can write their own parsing logic to support criteria queries.
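
As a rough illustration of such user-supplied parsing logic, the sketch below filters string key/value pairs whose values happen to be JSON. The `kv` and `pod` types and the `filterByNode` function are hypothetical, and the JSON shape is assumed from the pod.spec.nodeName example earlier in the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kv is a plain string key/value pair, the only shape the store itself knows.
type kv struct{ Key, Value string }

// pod mirrors only the field the filter needs; the shape is assumed for illustration.
type pod struct {
	Spec struct {
		NodeName string `json:"nodeName"`
	} `json:"spec"`
}

// filterByNode keeps the pairs whose JSON value has spec.nodeName == node.
// Values that do not decode into pod are skipped. An object without
// spec.nodeName decodes to "", so it matches the "unscheduled" query.
func filterByNode(kvs []kv, node string) []kv {
	var out []kv
	for _, e := range kvs {
		var p pod
		if err := json.Unmarshal([]byte(e.Value), &p); err != nil {
			continue
		}
		if p.Spec.NodeName == node {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	kvs := []kv{
		{"/pods/a", `{"spec":{"nodeName":""}}`},
		{"/pods/b", `{"spec":{"nodeName":"node1"}}`},
		{"/pods/c", `not json`},
	}
	for _, e := range filterByNode(kvs, "") {
		fmt.Println("unscheduled:", e.Key)
	}
	for _, e := range filterByNode(kvs, "node1") {
		fmt.Println("on node1:", e.Key)
	}
}
```

The same function could run in a proxy or in a server-side hook; where it runs only changes how much data crosses the network before filtering.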

@heyitsanthony
Contributor

  1. Would there be additional API features etcd would have to support?
  2. A plug-in architecture means supporting another client interface. Is it worth the developer effort and codebase ossification to maintain compatibility?
  3. Is it very difficult to write an etcd proxy as a separate project? (in my experience, it is straightforward to write a kv proxy using the etcd3 client, although I haven't tried etcd/etcd proxying yet)

I think a proxy would be enough for most of these use cases and an embedded client (#4709) proxy would be enough for all of them.

@mqliang
Contributor Author

mqliang commented Apr 5, 2016

@heyitsanthony

Would there be additional API features etcd would have to support?

A plug-in architecture means supporting another client interface. Is it worth the developer effort and codebase ossification to maintain compatibility?

IMO, criteria queries do NOT need a plug-in architecture; request/response middleware could do the whole job. Only secondary indexing may need plug-in architecture support (hook functions). But there seem to be no other requests for that feature, so we can evaluate it and decide whether to support it.

Is it very difficult to write an etcd proxy as a separate project?

To be clear, I don't object to a proxy; we can definitely implement all of those features (except secondary indexing) using a proxy. But putting the criteria query logic on the server side makes sense, and all we need for that is request/response middleware.

If we eventually decide to implement those features (or some of them) using a proxy, I'd like the proxy to be pluggable and extensible, so that third-party developers can easily write their own middleware/plugins. I'd also be glad to help with such a smart proxy.

@heyitsanthony
Contributor

@mqliang you're going to have to be more specific with what you mean by "middleware". Anything that has to interact with the internals of the etcd server instead of through RPCs is a non-starter since the internals can't be public interfaces.

@mqliang
Contributor Author

mqliang commented Apr 5, 2016

@heyitsanthony Sorry for the confusing term; I should have been more specific. "Middleware" here means request/response middleware used to filter, transform, or forward requests and responses, as widely used in web frameworks, for example the HTTP middleware in the PHP Laravel framework (see https://laravel.com/docs/master/middleware). Kubernetes has a similar middleware-like mechanism, which it calls admission controllers.

@xiang90
Contributor

xiang90 commented Apr 5, 2016

@magicwang-cn

Yes, under heavy load the client gets error 401 frequently, because etcd has a default watch window (1000 events for all keys) and the etcd client does not support streaming results. So filtering on the server side would be useful.

This has nothing to do with server filter in my opinion. etcd3 has already solved this by maintaining a user defined history.

@xiang90
Contributor

xiang90 commented Apr 5, 2016

@HardySimpson

But a server-side criteria query is not something a proxy can do. It can save a lot of network cost and etcd memory, and prevent etcd from OOM.

A proxy does not have to involve the network. We can co-locate the proxy with the etcd server, or we can embed etcd into a proxy; check #4435 for more details.

I am not sure how it will prevent etcd from OOM. As long as you store more keys or serve more concurrent requests than your machine can handle, it will OOM. The application developer now has to prevent this from happening. We are working on a "smart" rate limiter though, but it will take us a while to finish it.

@xiang90
Contributor

xiang90 commented Apr 5, 2016

If we eventually decide to implement those features (or some of them) using a proxy, I'd like the proxy to be pluggable and extensible, so that third-party developers can easily write their own middleware/plugins. I'd also be glad to help with such a smart proxy.

The proxy will be a library, so you can import it and make it do whatever you want. A "middleware" will just be another http.Handler or gRPC interceptor placed in front of the proxy library.
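
A minimal sketch of what "another http.Handler in front of the proxy" could look like, using only the Go standard library. The `requireToken` wrapper, the `X-Auth-Token` header check, the `backend` stand-in for the proxy handler, and the `statusWith` helper are all assumptions for illustration; a real deployment would validate the token against an identity service rather than compare strings.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireToken wraps next and rejects requests that lack the expected token.
// The header name and the plain string comparison are placeholders for a
// real authentication backend.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Auth-Token") != token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// backend stands in for the handler a proxy library would provide.
var backend = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "ok")
})

// statusWith sends one in-process GET through h and returns the status code.
func statusWith(h http.Handler, token string) int {
	req := httptest.NewRequest(http.MethodGet, "/v2/keys/foo", nil)
	if token != "" {
		req.Header.Set("X-Auth-Token", token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireToken("secret", backend)
	fmt.Println(statusWith(h, "secret"), statusWith(h, "")) // prints "200 401"
}
```

Because each wrapper takes and returns an http.Handler, several of them (authentication, sharding, filtering) can be stacked in any order without the proxy knowing about them.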

@xiang90
Contributor

xiang90 commented Apr 5, 2016

@mqliang

For example, the read-only proxy is a so-called "middleware" for the actual proxy handler here: https://github.com/coreos/etcd/blob/master/proxy/proxy.go#L43-L61.
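
The general shape of a read-only wrapper can be sketched as below. This is not a copy of the linked etcd code; `readOnly`, `backend`, `statusFor`, and the choice of status code for rejected methods are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readOnly passes GET requests to next and rejects every other method.
// The rejection status used here is an assumption of this sketch.
func readOnly(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.WriteHeader(http.StatusNotImplemented)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// backend stands in for the actual proxy handler.
var backend = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// statusFor sends one in-process request with the given method through h.
func statusFor(h http.Handler, method string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/v2/keys/foo", nil))
	return rec.Code
}

func main() {
	h := readOnly(backend)
	fmt.Println(statusFor(h, http.MethodGet), statusFor(h, http.MethodPut)) // prints "200 501"
}
```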

@mqliang
Contributor Author

mqliang commented Apr 8, 2016

The ideas in this proposal deserve more specific issues, so I'm closing this. @xiang90 feel free to ping me if you are planning to implement or improve the proxy layer, or if there are any related issues.

@mqliang mqliang closed this as completed Apr 8, 2016
@xiang90
Contributor

xiang90 commented Apr 8, 2016

@mqliang Sure. We probably will start a design doc for the v3 proxy early May.

5 participants