GEP-2014: Declarative Policy #2015
Signed-off-by: Flynn <emissary@flynn.kodachi.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kflynn The full list of commands accepted by this bot can be found here.
Thanks, this is a useful perspective/doc/etc.
While I agree with a lot of the concerns, I am not sure I agree with most of the conclusions. I think a lot of this is personal preference territory, so I am certainly not saying you are wrong - just that our preferences may differ. Some of the below is also just playing devil's advocate, to some extent - I probably agree with you more than it seems like.
First, I would argue this is not a new problem. As long as there is more than 1 type in Kubernetes, this problem exists. Similar arguments could be made for almost any <insert 2 Kubernetes types>. Service and Deployment could have been one object (or HPA, PDB, ResourceQuota, ...) but they aren't, and users either adopt the resource model, consume higher level abstractions over these resources, or use simpler products (docker-compose for example). It's not clear to me this is fundamentally a new problem.
Julian should not be building tooling, that is our job as the gateway-api community (or extended community - but certainly not each user).
This may just be a language thing, but I am not sure what makes this not declarative and it isn't described in the GEP. Seems declarative to me - they declared their intent to add retries at the namespace level and that intent was actuated by the system.
All of this feels answerable by just making policy attachment easy to grok for users, rather than fundamentally changing the concepts.
In any case, without a concrete proposal I am not sure what the next steps here are. I/We can agree policy attachment sucks, but also that extensibility is essential to the API. Without a better option, the bad option is still better than no option... That's not to say there isn't value in calling it out as bad, but without alternatives proposed there doesn't seem to be a path forward other than encoding this as "Informational - Policy attachment has problems but we are doing it anyways"?
I'm curious if you feel any of the ideas from #2012 help alleviate any concerns. IMO, giving some visibility (via Status etc.) into what *Policies are in effect on particular resources would help a lot with the problems described here.
Thanks for writing this up @kflynn! I'll be the first to agree that our existing policy attachment model is complex. In my opinion, the primary issue with it is that it is difficult for users to understand which policies are available and what they affect. We've also had some previous discussions that covered the complexity involved here, for example #1715. So I think we're roughly agreeing on the problem statement here, and many in the community likely have similar concerns.
Having successful extension mechanism(s) is absolutely key to the success of the API. One of our key goals for this API has been to provide a better extension model than the Ingress and Service APIs so we don't end up with yet another explosion of annotations.
With all of that said, I'm not sure what the proposal is here. It seems like this may be leading towards a suggestion that all policy needs to be directly included in or referenced by the resources it affects. For many cases, this is easier to reason about, and so we definitely tried to make this work in the initial policy attachment proposals. Ultimately we avoided it for the following reasons:
In the broader world of Gateway API extensions, we do already have custom HTTPRoute filters and GatewayClass params which are both direct refs from Gateway API resources to custom extensions. I think those have worked reasonably well. We may want to consider adding something like Gateway or Listener level (custom) filters to build on this model. This is decidedly not policy attachment but can be very useful for some extensions. As far as policy attachment itself, I think there are a lot of relatively small things we can do that will dramatically improve the experience. We're collecting ideas here: #2012. Wherever possible I'd like to build on what we have here with incremental improvements instead of introducing significant changes to the model. |
Okay. I've obviously got a lot of thoughts about this, so I'm going to take some time and address general things first, and then more specific things.
Firstly, I really appreciate the writing out of the story of how things can go wrong. It's useful to have a standard example for us to talk about.
I think it's important to remember that Policy Attachment as it stands is not complete. It still needs to solve the exact problems that this story raises, I definitely agree. We have been putting this off because it's a very complex problem that has a lot of dimensions that aren't obvious at first glance (and are difficult to write down as well). So I think there is context that we've discussed in the past missing from the written documentation as well.
To get specific: I strongly disagree that the Policy Attachment we have is not declarative. It's definitely declarative in that a user's intent is expressed, and the system as a whole is ensuring that the intent is represented. It's not easily discoverable on a single-resource level, which is where the action-at-a-distance feeling comes from.
There are multiple pieces of feedback that we need but have not defined for Policy Attachment as it stands:
Reading between the lines and from out-of-band discussions, I believe that the likely proposal is something like: "We're going to work on adding something to To talk about this more concretely in relation to the given story, the problem arises because Jane has no way to know that there's a @robscott's discussion issue already has a couple of options here, and I have a couple of things I'd like to add, but I think the critical thing for us to discuss is about what makes something declarative and what doesn't. Because I think that's the critical impedance mismatch here. |
-- boilerplate: @howardjohn's first 2-3 paragraphs here --
We (Kuadrant) have been going through the challenges associated with the visibility/discoverability of the policies (status reconciliation, etc), which I too believe some of the ideas in #2012 could help mitigate significantly, and with the complexity involved in having yet more layers of configuration to set up permissions for extending the behaviour of the Gateway API resources (more RBAC, ReferenceGrants, etc). So I won’t deny those issues of course!
On the other hand, having a framework to guide the implementations, so they don’t descend into a madness of multiple different ways to structure such a common pattern as referencing a resource whose behaviour one intends to extend in a declarative way, is a good thing, IMO.
When I think of metaresources as not much different from any other CRD, I could lean in favour of ditching Policy Attachment for a moment. Any provider could do Policy Attachment ignoring guidelines of any sort offered by the base framework. However, I suppose that is true for pretty much anything in Kubernetes nowadays. It is also where things start getting really ugly. No standardisation == madness.
I could also echo here the several advantages of Policy Attachment (in contrast to route filters and extension properties):
I also think that Policy Attachment may help justify the standardisation of other APIs that I now see growing in importance, such as the ReferenceGrant API. If the But if there's one aspect of Policy Attachment that I feel strongly about, in a positive way, it is how it opens up a lot more room for third-party policy functionality providers, which are not necessarily gateway providers, to participate, with compliance. The extensibility it enables really shines here. Any other approach that involves declaring the behaviour as part of the affected resource would require a lot more direct support by the gateway implementation itself. It feels like, as a provider, without Policy Attachment, the only way into Gateway API is wrapping a full gateway implementation, even if that's only to add a minor functionality extension.
At Kuadrant, we are aiming to add policy functionality (multicluster gateway consistency, rate limiting, auth) based on Policy Attachment while keeping it gateway-agnostic (as much as we can). Whether it's Envoy Gateway, Istio, etc, we believe that, if we can deliver the intended behavior to the user, the underlying infrastructure detail of which provider implements the gateway should not matter – or at least it could be transparent to the user at this point. In this sense, the metaresource represents a contract strictly between the user and the party that provides that policy functionality. The problem of whether the user has jurisdiction over the affected resource is a matter of authorisation, not of declaring the intent in itself; and the problem of visibility of the effects of the policy is, IMO, a status reporting issue.
Thanks for the comments, all! Rather than try to answer line-by-line or any such madness, let me instead try to tease out some commonalities across several responses and try to speak to them.
I also want to explicitly acknowledge that there's a lot to unpack here. GEP-713 represents an enormous amount of work (thanks for all that, @youngnick and @robscott), and it's revealed a lot of new problems that simply can’t be solved without many of us in the community working together to get the best results. So I very much appreciate the willingness of everyone to pitch in and try to constructively get to the bottom of these things -- many thanks, especially, to @youngnick for taking the lion's share of the work and doing it gracefully and good-humoredly.
1. Policy attachment is definitely declarative; the issue is really one of discoverability.
This is both a really important thing to dig into and a slippery slope where we could end up spending massive amounts of time on words about words. I’d really like to avoid the latter, so: the problem I see is that policy attachment can give Jane the sense that what she declares is being ignored and Kubernetes won’t tell her why. I used “not declarative” to summarize that, but I’m happy to use a different phrase if that’s more helpful; I care more about solving the problem than about being the one to give it a name. 🙂
If more explanation is useful: what we teach new folks coming to Kubernetes is that they declare what they want, Kube gives it to them, and they can always read the current state of the world directly from the cluster. There are boundaries, of course (RBAC, etc.), and in practice people understand that maybe they'll only get something close to what they want, but it's still presented as a contract that Kube will honor. If Jane declares that she wants a particular routing behavior and then gets something bewilderingly different, she will feel that this contract has been broken. If she can’t even read the cluster to find out why, she’ll feel it’s been broken twice. I feel like honoring that contract is an important part of being “declarative”: expressing the API in YAML docs rather than function calls is only part of it.
Let me quote @youngnick for a moment:
There's a lot I agree with in here, but I want to call out explicitly that in the situation of the parable, Jane is going to feel like her intent is being ignored. That's why I think that this may be worse than a discoverability problem (though I absolutely agree that the discoverability issue is critical). As noted in the Gateway API meeting yesterday, policy attachment to date has largely focused on the cluster provider and below; this leaves Jane, as an application developer, in a difficult spot (especially as ops teams try to shift concerns closer to the developers).
2. This isn't a fundamentally new problem in Kubernetes. There are other areas where this is already a thing.
Definitely, and this was one of our key considerations while discussing this GEP: we think it’s possible for the work we do in Gateway API to have an effect on the Kubernetes project as a whole if we’re successful here. These existing areas are actually some of the places where we see the most difficulty when teaching folks about Kubernetes: for example, think about teaching a novice about label selector linkage between Deployments, Pods, and Services. Even better, think back to learning it as a novice yourself. I would love to not make that worse.
3. Maybe the Gateway API can provide higher-level constructs to make things better.
I definitely agree that there’s value in exploring that idea, though I’m concerned that getting to consensus on what those higher-level constructs would look like will take a very long time. Higher-level tooling or abstractions necessarily involve opinionation that feels tricky to make portable. (Tooling also has the interesting challenge that if we e.g. do a
4. Policy attachment is complex, but we can't change the Kubernetes core so we're stuck with the complexity.
I have a lot of conflicting reactions to this one. First, I do fully believe that changes to the core are worse than anyone ever expects, even when you take that fact into account. 😕
Second, and most importantly, I have nothing but respect and appreciation for all the work that @youngnick and @robscott have put into this so far (again, props especially to Nick for being the one to take all the complaints!). I'm happy to defend them for the enormous amount of blood, sweat, and tears they've poured into this, and I'm deeply appreciative that they dove into this problem in the first place.
Bearing that in mind... it’s very clear that necessity mothered the invention of the current policy attachment mechanism, and also very clear that as we experiment with it, we’re seeing problems that necessity didn’t reveal at the start. (This is a success story: I’m delighted and grateful that the experimentation has been happening!) I think we should dig into these things now, before we mark the API GA and people start throwing policies all over the place.
If we can't start with core modifications, maybe we can start with giving all the Gateway API resources a consistent extension mechanism (rather than just the GatewayClass paramsRef and custom HTTPRoute filters), and see how that goes. If it's NP-hard to fully tackle everything all at once, is there a way we can restrict things and do something simpler while not painting ourselves into horrible corners? (The discussions in #2012 are most definitely relevant here.)
Those are discussions that I think need careful attention before GA. (And as I've said before, I actually feel bad for raising all this stuff now, after not being around at the start of the Gateway API effort -- I'm sure that some of these have been explored in some context, and I apologize for that.)
Thanks for the comment @kflynn, I just edited it to fix my GitHub handle so that some other guy didn't get a bunch of notifications. Still processing, will respond tomorrow.
Thanks for that, @youngnick! 🤦‍♂️ I can't believe I did that. 🙁 🙁
/easycla (since this should be fixed now... 🤞)
Thanks for the great response @kflynn!
Note that nothing that Jane declared was ignored, additional information/config was added on top though. I think #2012 has some compelling options to help with the discoverability aspect of this statement though.
To me adding retries does not feel like the contract has been broken unless Jane explicitly specified that there should not be retries.
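The distinction drawn here, between a policy adding settings on top of what Jane declared and a policy ignoring what she declared, can be illustrated with a minimal sketch. It assumes a defaults-only policy whose values apply only to fields the Route author left unset; the function name `apply_defaults` and the setting names are hypothetical, chosen for illustration, and are not part of any Gateway API type.

```python
from typing import Any, Dict


def apply_defaults(declared: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return effective settings: policy defaults fill gaps, declared values win."""
    effective = dict(defaults)   # start from the policy's defaults
    effective.update(declared)   # anything the user declared takes precedence
    return effective


# Hypothetical namespace-level retry defaults (illustrative values only).
policy_defaults = {"retries": 3, "delay": 5, "timeout": 10}

# Jane said nothing about retries, so the default is added on top of her config.
silent_route = apply_defaults({"timeout": 30}, policy_defaults)

# Jane explicitly asked for no retries, so a default must leave that alone.
explicit_route = apply_defaults({"retries": 0}, policy_defaults)

print(silent_route)
print(explicit_route)
```

Under this assumption nothing Jane wrote is discarded; whether the result still surprises her depends on whether she can see where the extra `retries` value came from, which is the discoverability question.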
Any tooling that we own as a project will be OSS and should be widely usable as a library for any other tools that want to build on top of these concepts.
What would we do with other resources in the meantime? I think almost everyone, likely including the project itself with Gateway -> Backend TLS config, will need to extend Service and/or ServiceImport somehow. I'd hate to have one way for Gateway API resources and a separate way to attach policy to everything else. I also think that direct refs from target resource to policy/extension would prevent us from some important use cases:
Although I'll be the first to admit that the current model was influenced by the limitations of the APIs we were targeting, it was also heavily influenced by the use cases described above. Even if we could have direct refs from all Kubernetes resources to Gateway API policies/extensions tomorrow, I'm not convinced that would actually be better. We would lose out on some significant and important use cases with that approach. With that said, it's a moot point because we're at least 16 months away (and a lot of effort) from that even being a possibility. I personally would rather focus that time on improving our current model with some of the ideas in #2012. |
This GEP is a request to consider the ability to "explain policy to all users with a role in the cluster" the lowest bar we would set for further graduation. If we worked together to define more explicitly what that bar means, I believe we may stand to better serve all future Gateway API users.
In effect this is a request to define new requirements to govern the solutions to our problems, but my take is that most of the response here so far has been focused on improving the current solution. I think it might be helpful to step back briefly from iterating on the existing solution in favor of iterating on the requirements. I have a sense that the momentum of the current solution is driving us only deeper into it, and that there would be general resistance to a pivot or a redo.
I've carefully paid attention to the comments and conversations around this request and I believe it important that as a community we remind ourselves that the work so far has been I feel it would be in the best interest of future Gateway API users if we reject foregone conclusions due to current investment, and be open to the consideration that early adopters may need to pay some of the costs associated with progress rather than the costs being set upon other implementations, and the users over the next several years. At the same time as a project we enabled investments without strong clarity about the possible path we might take to
Flynn and I talked about this last week, and I think we agreed on a few things. Flynn, please correct me if I say anything we don't actually agree on. 😃 Also, I've tried to keep this focussed on the parable, and using the parable to draw out new requirements for Policy Attachment, in the interest of moving forward like @shaneutt suggested above. The problem here is that Jane ends up having Policy affecting her resources without any way to know that ahead of time. Also, Julian had no way to know how many places the Namespace Policy was affecting (or even if it was affecting anything). This is what Flynn called "not declarative" (Because Jane has her own services being affected by configuration she doesn't own and doesn't know about), and I called "not discoverable". The naming is not as important as agreeing that it's a problem. Another way to say this problem uses something I've talked about before - we want to keep the troubleshooting distance for a Route affected by Policy as close as possible to 1. That is, Jane should be able to understand the state of her Route by doing a number of There are three things Jane needs to be able to know about her Route:
There's a relatively obvious solution here, that's worth showing and then talking about.

    status:
      effectivePolicy:
      - group: example.com
        version: v1alpha1
        kind: RetryPolicy
        name: default-retry
        namespace: system
        attachedTo:
          group: gateway.networking.k8s.io
          version: v1beta1
          kind: Gateway
        policySettings:
          defaults:
            retries: 3
            delay: 5
            timeout: 10

Here This is the most information we could possibly supply to Jane, and it would completely stop the parable from happening. It also meets the third goal above - the full resultant set of Policy for an object.
However, this creates significant scalability concerns in that this status would need to be updated every time the Policy or the Gateway it attaches to changed, and will need to be updated on every relevant object (both the Gateway and any attached HTTPRoute). This means that an update in a single object (the Some other problems I can see:
This problem arises from the number of relationships there are between a single Policy and a number of objects it affects, not in the way those relationships are created. For Direct Attached Policy, this is only ever This is why I don't think that changing Policy so that affected objects reference the Policy that should affect them is useful enough to justify the effort, because while it improves the experience for direct-attached policy, it doesn't for inherited policy. I also suspect that over time, inherited Policy will be as used as Direct if not more so. The above example is also only possible on objects that we control the This is why @robscott's discussion is important, I think we will need to use multiple options out of there, based on things like scalability, what apigroup the object is in, and so on. We should be aiming to make it straightforward for every implementation to do at least one of the pieces of information I listed above: That there's a Policy, which Policy is effective, and what settings that Policy provides. We also need to accept that different target objects may end up with different available information (because they're different objects!), and we can't control all the possible ways that this may be used. I think we're okay here because Policy Attachment is a pattern, not an API. So we're trying to give implementations direction on ways to avoid the problems in the parable, since we can't be too prescriptive about rules, because in the end, many implementations of Policy will be ImplementationSpecific anyway. So, to summarize this very long comment:
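To make the status idea and its fan-out cost from the comment above concrete, here is a rough sketch rather than a proposal. Every type and function in it (`Policy`, `Route`, `effective_policy_status`, `objects_needing_status_update`) is hypothetical, namespaces and cross-namespace rules are ignored, and the `name` field under `attachedTo` is an assumption added for the lookup. It builds the per-Route `effectivePolicy` entries and counts how many objects would need a status write when a single Gateway-level Policy changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    kind: str
    name: str
    namespace: str
    target_kind: str   # kind of the object the policy attaches to
    target_name: str   # name of that object (assumed field, for illustration)
    defaults: Dict[str, int] = field(default_factory=dict)


@dataclass
class Route:
    name: str
    parent_gateway: str


def effective_policy_status(route: Route, policies: List[Policy]) -> List[dict]:
    """Build the effectivePolicy entries a Route's status would carry."""
    entries = []
    for policy in policies:
        inherited = (policy.target_kind == "Gateway"
                     and policy.target_name == route.parent_gateway)
        direct = (policy.target_kind == "HTTPRoute"
                  and policy.target_name == route.name)
        if inherited or direct:
            entries.append({
                "kind": policy.kind,
                "name": policy.name,
                "namespace": policy.namespace,
                "attachedTo": {"kind": policy.target_kind,
                               "name": policy.target_name},
                "policySettings": {"defaults": dict(policy.defaults)},
            })
    return entries


def objects_needing_status_update(policy: Policy, routes: List[Route]) -> List[str]:
    """List every object whose status must be rewritten when the policy changes."""
    if policy.target_kind != "Gateway":
        # Direct attachment: only the single target is affected.
        return [f"{policy.target_kind}/{policy.target_name}"]
    touched = [f"Gateway/{policy.target_name}"]
    touched += [f"HTTPRoute/{r.name}" for r in routes
                if r.parent_gateway == policy.target_name]
    return touched


retry = Policy("RetryPolicy", "default-retry", "system", "Gateway", "prod-gw",
               {"retries": 3, "delay": 5, "timeout": 10})
routes = [Route("jane-route", "prod-gw"), Route("cart", "prod-gw"),
          Route("search", "prod-gw"), Route("internal", "other-gw")]

jane_status = effective_policy_status(routes[0], [retry])
fan_out = objects_needing_status_update(retry, routes)
print(jane_status)
print(len(fan_out))
```

In this toy setup one edit to the Gateway-level policy means a status write on the Gateway plus every Route attached to it, while a directly attached policy touches a single object. That asymmetry is the reason the comment treats inherited policy as the hard case.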
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This has been subsumed by #2128.
This GEP is a follow-up to GEP-713 Metaresources and Policy Attachment to recommend that we consider changing the "attachment" part of "policy attachment" in favor of something that is declarative at the affected resource level.
(Fixes #2014)
/kind gep