Update dual stack KEP metadata to include status and PRR fields #2327
Conversation
Add PRR related fields in preparation for beta Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
Tracking enhancement issue #563
Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
Here's how I could reproduce the failure:
I think this is how we correct this test - update keps/prod-readiness/sig-network/563.yaml (but not with just any name):
We put one of the approved people in this file:
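For illustration, a minimal sketch of what that file could contain (the approver handle shown is just one of the listed PRR approvers; whoever takes the review would go here):

```yaml
# keps/prod-readiness/sig-network/563.yaml (sketch; approver is illustrative)
kep-number: 563
beta:
  approver: "@johnbelamaric"
```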
Then this check completes:
I'm not sure who is the correct person to list in this case.
I don't believe that Tim is a PRR approver; I believe @deads2k, @johnbelamaric, and @wojtek-t are currently the only PRR approvers. It looks like the PRR approver field is empty, so I'd recommend adding one of them there.
@jeremyrickard I do indeed see that they are the only ones listed. Of course, we'll start with our SIG leads and then get a PRR approval. Thanks!
I've connected with @johnbelamaric via the prod-readiness channel and he has agreed to review this once ready. I will push a commit to update.
Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
PRR sections added to KEP here - #2325
Add implementation history Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
@johnbelamaric this is ready for review.
I skimmed the checklist and I think it's accurate.
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
suggesting updates to KEP PRR
Hi. I don't think the latest round of revisions really answers my questions. I think to answer them you need to shift mindset a little. These questions aren't about "how can this specific code fail" but about "what could go wrong now that this change is out there". That is, they really aren't narrowly focused on just your changes, but rather on how those changes interact with the system as a whole. What are the downstream consequences of enabling this feature?
Really put yourself in the shoes of someone who is responsible for 10s or 100s of thousands of workloads running on a fleet of clusters, and ask yourself: Am I willing to enable this feature? If I do, how can I make sure that I haven't broken the cluster control plane? How can I make sure I haven't broken user workloads? Is there any planning I should do before I enable it? How can I go home and not worry about getting paged? What do I have to check, and what would inform me that I need to roll it back?
1. For services:
Services are not in the path of pod creation. Thus, any malfunction or …
I'm not sure why you're only talking about pod creation. What about service creation? Are there metrics there that could be useful? Do we have metrics around IP allocations? What about endpoint selection/creation? Do we have metrics about endpoints created?
- Components exposing the metric:
- [ ] Other (treat as last resort)
- Details:
Dual-stack networking is a functional addition, not a service with SLIs.
It's not a service itself, but it can fail, right? You can fail to allocate the IP, you can fail to assign the IP to a Pod, you can fail to wire up the Pod sandbox, etc. These are likely already covered by existing metrics. Can you list them here, to inform operators what metrics they should be looking at? If they all of a sudden see a bunch of failures then they may want to turn it off.
What about watching for workload failures as I mentioned in my last review?
The point here is just to provide guidance and documentation. If you upgraded your cluster with 10,000 workloads on it, what would you look for to see that everything is going OK?
* **What happens if we reenable the feature if it was previously rolled back?**
Similar to enabling it the first time on a cluster. We don't load balance …
Without my questions this is hard to follow; it needs more context. Even with my questions I am still not clear on how the answer addresses the "imbalance during re-enablement" case.
If that case is really possible (and I haven't seen anything to say whether or not it is), we just need to document it so people know what to expect and what to look for.
Once people upgrade, this goes on by default. If something goes wrong, and they turn it off, and then later want to re-enable it, we need to make sure they can plan appropriately. They may need to find any dual-stack services with bazillions of endpoints and make them single-stack first, for example. Just to avoid that imbalance situation.
I think the statement "we don't load balance across families" is trying to say that the imbalance is not possible. But I don't see how that invalidates the scenario I described. In fact, if we did load balance across families and did do protocol translation, then the scenario I described would not be a problem.
- 99% percentile over day of absolute value from (job creation time minus expected
job creation time) for cron job <= 10%
- 99,9% of /health requests per day finish with 200 code
N/A
I don't think this is true. If the SLIs above are existing SLIs, you might just be able to say something like: existing kubelet pod creation or service creation SLOs (and whatever others may apply).
Dual-stack components are not in the path of pod creation. They are in the path
of reporting pod IPs. So pod creation will not be affected; if it is
affected, then it is a CNI issue.
So, does CNI have any input from the kubelet as to whether Pods should be single or dual stack? Or is that completely out-of-band configuration that doesn't come from this feature at all? How does CNI know today which family of address to allocate?
Dual-stack components are in the path of PodIPs reporting, which affects the
kubelet. If there is a problem (or if there are persistent problems)
then disabling the featuregate on api-server will mitigate it.
On the apiserver or on the kubelet? Shouldn't it be everywhere? Otherwise I imagine really bad stuff can happen.
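If the answer ends up being "everywhere", here is a rough sketch of what that looks like on the node side, assuming the IPv6DualStack gate name this KEP uses (the kube-apiserver equivalent would be its --feature-gates flag):

```yaml
# Sketch: turning the gate off in a kubelet configuration file.
# The component must be restarted for the change to take effect.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  IPv6DualStack: false
```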
* **Will enabling / using this feature result in any new calls to the cloud
provider?**
No. Errors are surfaced when the user makes a call. We don't focus on …
This question isn't about metrics server; it's about calls to the cloud provider. I believe in some cases IP allocation is done by the cloud provider? In GKE we have native IPs assigned to Pods. I don't know where that allocation is done exactly, maybe @bowei knows. But it seems feasible that dual stack would require an extra call to the cloud provider to allocate one of those native IPs.
* **How does this feature react if the API server and/or etcd is unavailable?**
This feature will not be operable if either kube-apiserver or etcd is unavailable.
* **What are other known failure modes?**
For each of them, fill in the following information by copying the below template:
I think this doesn't address workload failures or the "imbalance"; we probably want to discuss those.
2. kind dual-stack iptables: https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20dual,%20master
3. kind dual-stack ipvs: https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20ipvs,%20master
### Rollout, Upgrade and Rollback Planning
* **How can a rollout fail? Can it impact already running workloads?**
Try to be as paranoid as possible - e.g., what if some components will restart
mid-rollout?
What about workload failures? If I upgrade and suddenly 10% of my workloads start crashlooping because they are stupid and don't know what to do with an AAAA record from DNS, I want to disable and/or rollback. Again, this probably doesn't apply much in the upgrade case because there are no dual-stack services, but in the re-enablement case it does.
If you change the IPFamilyPolicy to SingleStack, you will be back to the initial situation.
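As a concrete, illustrative example of that revert using the dual-stack Service fields (the name, selector, and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-large-service          # placeholder
spec:
  ipFamilyPolicy: SingleStack     # revert from PreferDualStack/RequireDualStack
  ipFamilies:
    - IPv4                        # keep only the primary family
  selector:
    app: my-large-service
  ports:
    - port: 80
      targetPort: 8080
```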
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Additional updates
Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Additional updates to PRR docs
Looking really good. A couple minor points.
Pods and Services will remain single-stack until cli flags have been modified
as described in this KEP. Existing and new services will remain single-stack
until user requests otherwise. Pods will become dual-stack once CNI is
configured for dual-stack.
A little confusing:
Pods and Services will remain single-stack until cli flags have been modified
as described in this KEP.
and
Pods will become dual-stack once CNI is configured for dual-stack.
are contradictory. I think the distinction is api vs pod runtime environment. Please clear that up.
Thanks - edited.
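For readers who want a concrete picture of the "cli flags" the KEP text above refers to, here is a rough sketch expressed as a kubeadm ClusterConfiguration; the CIDR values and the feature-gate entry are illustrative, not prescriptive:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
featureGates:
  IPv6DualStack: true                            # illustrative: enables the gate cluster-wide
networking:
  podSubnet: 10.244.0.0/16,fd00:10:244::/56      # dual-stack pod CIDRs (placeholders)
  serviceSubnet: 10.96.0.0/16,fd00:10:96::/112   # dual-stack service CIDRs (placeholders)
```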
the controller manager will automatically update endpoints and endpointSlices
to match the service IP families. When the feature is reenabled, kube-proxy
will automatically start updating iptables/ipvs rules for the alternative
ipfamily, for existing and new dual-stack services.
Here is where you should mention:
DNS will immediately begin returning the secondary IP family, while endpoints, endpointSlices, and iptables programming may take some time. This can lead to large or very busy services receiving excessive traffic on the secondary-family address until the endpoints, endpointSlices, and iptables rules are fully populated.
I wish we had a decent mitigation for this. It can happen anytime a larger service goes from single -> dual stack.
Thanks - edited.
1. (preferred) Do not create dual-stack services until the rollout of the
dual-stack feature across the cluster is complete.
or
2. Cordon and drain the node(s) where the feature is not enabled
Thank you for adding these.
IPv4/IPv6 dual-stack](https://kubernetes.io/docs/tasks/network/validate-dual-stack/)
to ensure that node addressing, pod addressing, and services are configured
correctly. If dual-stack services are created, they have passed validation.
Metrics to check could include pods stuck in pending; look in the event logs …
+1, nice
Signed-off-by: Bridget Kromhout <bridget@kromhout.org>
Additional edits for PRR
Thank you, looking good! /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dcbw, johnbelamaric, lachie83
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
And /lgtm
Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com>