Set pod conditions from a container probe #28658
This is not uncommon in enterprise pod balancers. On the other hand, should petsets always make the leader pod-0?
This is for pets with innate failover. Making pet-0 the leader is easy; we can give pet-0 to all the clients of the db, but if it gets partitioned we need to either:
I'm going for 2 with this. Yeah, we could force people to set up a load balancer in front which would fail over to one of the backups, effectively achieving the same thing. The only caveat is we can't start sending writes to the old master as soon as it starts responding to health checks. Nginx and HAProxy take a list of servers; we can mark the first as primary and all the others as backup. When a new master is elected, we need to rewrite the config (I don't think there's an automatic way).
I mean, pets with innate failover for master/slave. For example, we don't need this for zookeeper because you can write to any voting member and it waits for an ack from a majority. With mysql master/slave you must write to the current master. I think the usual way to handle this is keepalived and a floating VIP.
A service clusterIP can manage this pretty effectively if we could figure out a way to have the endpoints controller select on the 'leader' signal. It's somewhat like readiness, except only a single pod is ever marked ready. Could we do this today with readiness checks and services as-is? I.e., a readiness check for each member that only returns true if it is the leader?
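For concreteness, here is a minimal sketch of that readiness-based variant, assuming each member serves a hypothetical /is-leader endpoint (the path, port, and names are illustrations, not anything that exists today):

```yaml
# Sketch only: repurpose readiness so a pod reports Ready only while it believes
# it is the master. The /is-leader endpoint and its port are assumptions; they
# would have to be served by the database container or a sidecar.
apiVersion: v1
kind: Pod
metadata:
  name: mysql-0
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    readinessProbe:
      httpGet:
        path: /is-leader    # hypothetical: returns 200 only on the current master
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
---
# Because at most one pod is ever Ready, this Service resolves to at most one
# endpoint: the current master.
apiVersion: v1
kind: Service
metadata:
  name: mysql-master
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
```

The obvious cost, discussed below, is that this overloads readiness: a perfectly healthy replica is reported NotReady simply because it is not the leader.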
What's interesting is that with active/passive there should only ever be one "ready" item. But readiness is a pod condition, not a port condition, so we can't necessarily apply different readiness rules per container / port per service.

Another thought: most active/passive sets are for things that are not innately HA and need fencing / safety guarantees in place. If we assume fencing is solved at the volume level (either by innate behavior in the storage provider, like attach/detach, or by locks), then the guarantee still needs some sort of "terminate old, make sure data is flushed, then do something to the new". That problem sounds a lot like pet set reconfiguration, but the decision to do the failover is triggered externally (probably by a deletion of the pod by the node controller).
Let me sketch out two active/passive use cases that are related to this and to pet sets: a plain single-active workload (the VM-like case), and a clustered master/slave store.
Leadership coming from the pods seems strange here, if only because I don't necessarily know which one to trust as the petset controller.
We'd have to come up with a way to take a majority (i.e. pods respond with leader=pod-1), which gets complicated because of epochs. The simple answer is to assume a partitioned master knows it is partitioned and responds negative. I'm not sure we can solve this problem completely without doing something really complicated. I think the best we can do is describe a simple protocol and guarantee that if pods obey it, the Service will point at the new master. The Service will probably go through some period where it has 0 endpoints, and clients should retry (or we can offer the normal CP vs AP choice).

An example of such a protocol might be to put the petset controller in the failover loop. So the current master failing a probe leads to the petset controller delivering a "failover" event to each non-master pod, one at a time, and waiting for its leader probe result before sending the event to the next pod. The potential masters need to yield to the most advanced slave by responding negative.

I guess the difference between your 2 cases is clustering. It does sound like simple readiness is enough for the VM case. As you noted, I don't know if volume fencing is enough to completely repurpose readiness in the clustered case.
The other way to solve this is to avoid a probe altogether and go with TTL'd leases. Pet-0 will 9 times out of 10 get the master lease; every other pet sets a watch on the lease, and if they ever see a TTL expiration they all try to grab it. The petset controller just applies the label to the name of the pet in the lease. This feels like the lease API, and requires all pods to understand the apiserver, though, not to mention correctly handling a watch etc. I feel like such complications shouldn't be necessary just to receive a failover event, and we can resolve the lease-taking race by just delivering this event serially.
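As a rough illustration of what that "master lease" could look like, here is a sketch using the coordination.k8s.io Lease API that was added to Kubernetes well after this comment was written; the object name and the labeling flow are assumptions made only to make the shape concrete:

```yaml
# Sketch: pet-0 normally holds this lease; the other pets watch it and race to
# take holderIdentity once renewTime + leaseDurationSeconds has elapsed. The
# petset controller would label whichever pod is named in holderIdentity so a
# Service can select it.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: mysql-master        # hypothetical lease name, one per pet set
  namespace: default
spec:
  holderIdentity: mysql-0   # the pet currently acting as master
  leaseDurationSeconds: 15
  renewTime: "2016-07-08T04:30:00.000000Z"
```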
The conditions idea sounds very similar to custom metrics.
/cc @crimsonfaith91
Is this still relevant? Do we think that custom metrics could support this?
/assign
This is still coming up in some contexts. Let's leave it open.
/unassign
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Has anyone figured out a way to handle the initial use case? It still seems relevant. /reopen
@rfer: you can't re-open an issue/PR unless you authored it or you are assigned to it.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Yes, this seems unresolved. We are also looking for a solution to a similar problem. Has anyone been able to solve this?
I'd like a way to export some key pieces of information from a pod, periodically, into the apiserver, without invoking kubectl or linking against the Kubernetes API. One way to do this is through a new type of HTTP "condition" probe that returns a JSON struct of fields to set in either the PodCondition or Pod annotations under a special key.
One can use this to implement a PetSet of "type=master/slave" by having a probe on all the slaves return "isLeader=false". The PetSet controller can create a private master Service with just a single endpoint that matches the one pet in the set that returns "isLeader=true". Users would hand out the DNS name/IP of this master Service to clients, knowing that it will always redirect to the master.
@kubernetes/sig-apps @thockin
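To make the proposal concrete, here is a purely hypothetical sketch of what such a probe could look like in a pod spec. Neither the conditionProbe field nor the /conditions path exists in the Kubernetes API; they are illustrations of the idea only:

```yaml
# Purely hypothetical: the kubelet would poll the endpoint and copy the returned
# JSON fields into a PodCondition (or into annotations under a reserved key).
apiVersion: v1
kind: Pod
metadata:
  name: mysql-1
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    conditionProbe:           # hypothetical field, analogous to readinessProbe
      httpGet:
        path: /conditions     # would return e.g. {"isLeader": "false"}
        port: 8080
      periodSeconds: 10
```

A pet answering {"isLeader": "true"} would then surface a condition (or annotation) that the PetSet controller could select on when building the private master Service described above.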