Initial proposal for node-local services #28637
Conversation
@therc it seems nice to me, and the goal is something I really think needs to be solved. But what are the other alternatives? For example, DaemonSet + host network (the pod then connects to the node IP, which can be known; the port is a problem with this, and unacceptable, but just to say something). Or why not extend DaemonSet in some way? I'm not saying that any of these should be done, I'm just curious about the alternatives not mentioned in the proposal. Service is becoming something that really does tons of different things depending on the params. I'm not saying it's not worth it, in fact I think it probably is and there is no other alternative now, just that it makes me wonder harder about the alternatives :) Thanks! |
The alternative I currently employ is a DaemonSet with hostPort (not the whole host network), but there is no way at the moment to obtain the node IP, so I use a magic external IP address. The nodes are set up outside of Kubernetes with a PREROUTING DNAT rule so that the magic address is redirected to the host. Applications are compiled with the magic IP address built in. DaemonSet could be extended, but then it would just duplicate most of the logic already handled for services: VIP allocation in the controller manager, DNS exporting in kube-dns, endpoint watching and iptables management in kube-proxy. Those are code paths that only worry about services right now and would have to start caring about DaemonSets too, while trying to avoid race conditions and clashes. And what hostnames would kube-dns have to return, if not X.svc.*, since it's no longer a service? |
@therc Ohh, I see. Yeah, that alternative sucks :-/. I think you can get the node name via the downward API, as it is in the pod yaml, and it resolves to the IP in my AWS cluster at least. But I'm not sure the downward API would do that and, in any case, it doesn't really solve the issue. Yeah, very good points! I'm more convinced now! Sorry for disturbing, I was honestly curious :-) |
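For reference, a minimal sketch of the downward API approach mentioned above, written with the Go API types rather than the actual proposal's mechanism; the container name, image, and environment variable are illustrative assumptions only:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// nodeAwareContainer shows how a pod can learn which node it runs on via the
// downward API (spec.nodeName), so a client could derive a node-local address
// without baking in a "magic" IP. Names here are hypothetical.
func nodeAwareContainer() corev1.Container {
	return corev1.Container{
		Name:  "app",
		Image: "example/app:latest", // placeholder image
		Env: []corev1.EnvVar{{
			Name: "NODE_NAME",
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "spec.nodeName"},
			},
		}},
	}
}

func main() {
	c := nodeAwareContainer()
	fmt.Printf("container %q reads its node name from %s\n",
		c.Name, c.Env[0].ValueFrom.FieldRef.FieldPath)
}
```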
I think we have consensus; I just need to unify the downward API. I would like the service to be able to handle locality cleanly. |
Could we make an affinity rule instead for services that prefer local endpoints (vs. remote ones)? |
Something like adding a new ServiceAffinity, RequireNodeLocal? Or even a second one, PreferNodeLocal. Because we're talking about potentially long-lived connections, some applications might prefer being told temporarily that there's no endpoint, rather than sticking with a suboptimal one for an indefinite amount of time. The latter scenario is guaranteed to happen when a daemonset gets updated. |
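To make the shape concrete, here is a hedged sketch of what such ServiceAffinity values could look like. The node-local constants are hypothetical and do not exist in the API; they only illustrate the two options discussed:

```go
package main

import "fmt"

// ServiceAffinity mirrors the string type the Service API already uses for
// session affinity ("None", "ClientIP"). The node-local values below are
// proposed here for illustration only.
type ServiceAffinity string

const (
	ServiceAffinityNone     ServiceAffinity = "None"
	ServiceAffinityClientIP ServiceAffinity = "ClientIP"

	// Proposed: route only to endpoints on the caller's node; report
	// "no endpoints" instead of falling back to a remote backend.
	ServiceAffinityRequireNodeLocal ServiceAffinity = "RequireNodeLocal"

	// Proposed: prefer same-node endpoints, but fall back to remote ones
	// when none are available locally.
	ServiceAffinityPreferNodeLocal ServiceAffinity = "PreferNodeLocal"
)

func main() {
	fmt.Println(ServiceAffinityRequireNodeLocal, ServiceAffinityPreferNodeLocal)
}
```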
I agree there's potentially value in two affinity settings.
|
I updated the document to use a new ServiceAffinity type and mention more use cases. No more "magic". |
Replaced all "hostlocal" with "nodelocal" (if even I can't keep them straight...), rebased and squashed. |
I started to implement a prototype and the main issue seems to be that kube-proxy has no clue what the node's IP is. There's also a TODO about at least one optimization where plumbing the IP address into |
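As an illustration of the plumbing the prototype needs, here is a minimal sketch (not actual kube-proxy code) of filtering a service's endpoints down to the local node once the proxy knows its own node name or IP; the types and field names are assumptions:

```go
package main

import "fmt"

// endpoint is a simplified stand-in for what kube-proxy tracks per backend.
type endpoint struct {
	IP       string
	Port     int
	NodeName string // where the backing pod runs; assumed to be available
}

// localOnly keeps just the endpoints whose pods run on this node. With
// RequireNodeLocal semantics, an empty result means "no endpoints" rather
// than falling back to remote backends.
func localOnly(nodeName string, eps []endpoint) []endpoint {
	var out []endpoint
	for _, ep := range eps {
		if ep.NodeName == nodeName {
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	eps := []endpoint{
		{IP: "10.0.1.5", Port: 24224, NodeName: "node-a"},
		{IP: "10.0.2.9", Port: 24224, NodeName: "node-b"},
	}
	fmt.Println(localOnly("node-a", eps)) // only the node-a endpoint remains
}
```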
I like this. One thought: the client of a node-local service should not care whether it's a node-local service, a "classic" service, or some other kind of implementation. To the client, it should just look like a service that does the right thing. |
The assumption that the service and DaemonSet are in the same namespace seems restrictive. What motivates this assumption? |
The namespace assumption was first mentioned when I had the magic selector, I think. Still, even with a regular service, pointing to pods in a different namespace is not trivial, right? I'll just strike that sentence out. And yes, for clients, this should look just like any other service. |
Also, this needs to move to |
/cc @marun |
ref #15675 |
Could someone please clarify if annotating a clusterIP service with externalTraffic=OnlyLocal is good enough to ensure that client pods will only be directed to same-node service endpoints? If not, what is the currently recommended way to achieve that in 1.5? Daemonset + hostPort + getting status.hostIP from the downward API, and pointing the client to that? My use case is simply to send metrics from application pods to their local statsd/Datadog agent, launched on every node by a daemonset. |
OnlyLocal does not affect pod clients, only external traffic. What you're asking for is not well supported yet. The best answer is probably to get the node's IP and connect to a hostPort there.
|
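For anyone landing here with the same question, a hedged sketch of that interim workaround using the Go API types: a DaemonSet container exposing a hostPort, and a client container reading the node's IP through the downward API (status.hostIP). Names, images, ports, and the STATSD_HOST variable are illustrative assumptions, not anything this proposal defines:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// agentContainer is the per-node agent (e.g. a statsd/Datadog agent run by a
// DaemonSet) listening on a hostPort so it is reachable on the node's IP.
func agentContainer() corev1.Container {
	return corev1.Container{
		Name:  "statsd-agent",
		Image: "example/statsd-agent:latest", // placeholder image
		Ports: []corev1.ContainerPort{{
			ContainerPort: 8125,
			HostPort:      8125, // exposed on every node by the DaemonSet
			Protocol:      corev1.ProtocolUDP,
		}},
	}
}

// clientContainer learns the node's IP via the downward API and points its
// metrics at <hostIP>:8125.
func clientContainer() corev1.Container {
	return corev1.Container{
		Name:  "app",
		Image: "example/app:latest", // placeholder image
		Env: []corev1.EnvVar{{
			Name: "STATSD_HOST", // hypothetical variable name
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "status.hostIP"},
			},
		}},
	}
}

func main() {
	fmt.Println(agentContainer().Name, clientContainer().Env[0].Name)
}
```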
- logging agents such as fluentd
- authenticating proxies such as [aws-es-proxy](https://github.com/kopeio/aws-es-proxy),
  [kube2iam](https://github.com/jtblin/kube2iam) or loasd ([#2209](https://github.com/kubernetes/kubernetes/issues/2209))
Also: - VM-launching pods speaking to a libvirtd pod on the same node, as used by [KubeVirt](https://github.com/kubevirt)
|
## Detailed discussion

Node-local services can reuse most of the existing plumbing.
+1 on taking more than the host as a locality measure. I also like that it references a label to use as the locality value.
|
and happens to be crazy (if it can be made to work reliably at all).

## Implementation plan
If we only supported DNS-based lookups, couldn't this be solved at the node level if the kubelet or some other component acted as a DNS resolver? The node-local DNS resolver would resolve an FQDN to a local address whenever the backing service is considered node-local; otherwise it would just dispatch the request to kube-dns.
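A minimal sketch of that idea, purely illustrative since no such kubelet component exists: a node-level resolver answers node-local service names from a locally maintained table and forwards everything else to kube-dns. All names and data here are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// nodeLocalServices maps service FQDNs to the pod IP of the endpoint running
// on this node. A real implementation would keep this up to date by watching
// services and endpoints; here it is hypothetical static data.
var nodeLocalServices = map[string]string{
	"fluentd.kube-system.svc.cluster.local.": "10.0.3.17",
}

// forwardToKubeDNS stands in for proxying the query to the cluster DNS.
func forwardToKubeDNS(fqdn string) (string, error) {
	return "", errors.New("would forward to kube-dns: " + fqdn)
}

// resolve answers node-local service names locally and forwards the rest.
func resolve(fqdn string) (string, error) {
	if ip, ok := nodeLocalServices[fqdn]; ok {
		return ip, nil
	}
	return forwardToKubeDNS(fqdn)
}

func main() {
	ip, _ := resolve("fluentd.kube-system.svc.cluster.local.")
	fmt.Println(ip) // 10.0.3.17
}
```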
@fabiand there is nothing right now in kubelet that works as the kind of DNS proxy that you describe, although it's an interesting idea.
Should this be limited to
Before I switched to k8s I just used [1] |
@klausenbusk DaemonSets make the most sense, as the proposal mentions. But your example shows that it could work also for regular pods with taints and tolerations. |
We are still missing a LGTM. Do we have consensus on whether the scope should be increased to address some of the open questions? |
Might we benefit from keeping the scope more narrow for the initial implementation and experimentation? |
I'm an outsider to this conversation but I vote in favor of a narrow scope for initial implementation and feedback. At least to determine how useful this is (I need it, for one). |
I fully ack that maybe something sooner and less ideal would be acceptable. Interestingly, there's work in storage-land that is touching on this topology stuff, too, and so far it still seems to hold water and it looks, dare I say, somewhat clean...
|
@thockin do you have a pointer to the work in storage-land you are thinking of? |
@thockin are you hinting at kubernetes/community#306 ? |
Also #44640 - look for topology key. The idea is that the persistent volume specifies a label that indicates the topology in which it is available, beyond just region and node, and the scheduling will respect it.
|
  [pgbouncer](https://pgbouncer.github.io/) and
  [synapse](https://github.com/airbnb/synapse)
- logging agents such as fluentd
- authenticating proxies such as [aws-es-proxy](https://github.com/kopeio/aws-es-proxy),
For this use case and others, some form of caller ID would be very useful.
This appears to have fallen silent. Is there a volunteer from sig-network, sig-apps, or sig-node who can help push this forward? The use case is pretty clear (maybe not in the top tier of complaints, but increasingly commonly mentioned). It appears that there isn't a ton of resistance to the simpler initial implementation. This has missed the window for 1.8, but getting the proposal marshaled would give it a chance to hit 1.9 as an alpha. |
Adding do-not-merge/release-note-label-needed because the release note process has not been followed. |
This would be very useful for a sharded service that keeps per-node local information in each shard. |
@smarterclayton I am from sig-network. I like this idea and have some bandwidth to help push it forward if possible. @therc I am familiar with IPVS-based kube-proxy and can take care of IPVS proxier changes if you are happy. |
There is a proposal that wants to figure out a more generic way; see kubernetes/community#1551. We need more use cases to refine the API, so please feel free to add your comments there :) Thanks! |
I'm closing this in favor of the discussion around #41442 |
First step in #28610
cc @thockin