Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResolvedRefs condition for GatewayClass parametersRef #2911

Open
sjberman opened this issue Mar 29, 2024 · 6 comments
Open

ResolvedRefs condition for GatewayClass parametersRef #2911

sjberman opened this issue Mar 29, 2024 · 6 comments
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@sjberman
Copy link
Contributor

sjberman commented Mar 29, 2024

A discussion was had in Slack about adding a ResolvedRefs condition to a GatewayClass status to reflect the state of a parametersRef object that is attached. This led to a couple different concerns.

  1. How would this coexist with the current InvalidParameters reason that exists for the Accepted condition on a GatewayClass?
    Potential flows and their status outcomes:

    • Create GatewayClass with bad parametersRef target: GatewayClassStatus[{ type: Accepted, status: False?, reason: InvalidParameters? }, { type: ResolvedRefs, status: False, reason: ResolvedRefs? }]
    • Create GatewayClass with valid parametersRef target, but invalid values: GatewayClassStatus[{ type: Accepted, status: False?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]
    • Create GatewayClass with valid parametersRef target and values -> delete parametersRef resource: GatewayClassStatus[{ type: Accepted, status: True?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]
      a. Gateway created prior to this change expected to be okay unless dynamic changes are allowed, if so what happens?
      b. Gateway created referencing this GatewayClass after this change should set GatewayStatus[{ type: Accepted, status: False?, reason: ? }]
      c. Does GatewayClass move to Accepted: False?
    • Create GatewayClass with valid parametersRef target and values -> change parametersRef resource content so one field becomes invalid: GatewayClassStatus[{ type: Accepted, status: True?, reason: InvalidParameters? }, { type: ResolvedRefs, status: True, reason: ResolvedRefs }]
      a. Gateway created prior to this change expected to be okay unless dynamic changes are allowed, if so what happens?
      b. Gateway created referencing this GatewayClass after this change should set GatewayStatus[{ type: Accepted, status: False?, reason: ? }]
      c. Does GatewayClass move to Accepted: False?
  2. Another concern that was raised is what the expectations are if dynamic changes are allowed to a parametersRef (by either adding/removing the reference, deleting the resource, or updating the fields within it). Specifically, if a GatewayClass goes from being Accepted to not Accepted due to an invalid update to the parametersRef, how does that affect the entire config tree below it? By following a true declarative system, all Gateways under this class should no longer be Accepted, but this would nullify the entire configuration and be very disruptive. The other option is what was mentioned in the examples above, where previous Gateways live on, while new ones are not Accepted. Issue here, however, is that if the control plane restarts, it loses the previous state of the valid parametersRef, so the original Gateways that were left alone will no longer receive those valid params, and the config is void. Note that this is technically an issue today with the existing InvalidParameters reason on the Accepted condition, so nothing new with the proposed ResolvedRefs condition.

  3. Finally, what does conformance look like? While not stated in the spec, the existing ResolvedRefs condition for HTTPRoutes is required in the conformance tests. Would that make this condition a requirement for a GatewayClass?

@sjberman
Copy link
Contributor Author

In relation to point 2, one proposal our implementation has come up with is to set Accepted: true, reason: InvalidParameters, message: GatewayClass is Accepted, but params are invalid for X reason and will be ignored. This means we ignore the params and allow the existing configuration to continue on, while giving a clear message to users as to what is happening.

The risk here is that if paramsRef goes from valid to invalid, any settings that were in there would be reverted to defaults, leading to a change in behavior or leading to downtime. The impact of this is obviously dependent on the implementation and the settings that exist in that paramsRef. Could be a better alternative than tearing down the entire config tree due to a non-Accepted GatewayClass, though.

@shaneutt shaneutt added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 2, 2024
@mikemorris
Copy link
Contributor

mikemorris commented Apr 2, 2024

I believe the current fuzziness in the spec around "snapshot/templating" behavior when creating Gateways referencing a configured GatewayClass combined with the difficulty handling of "dynamic" changes to parameters (by modifying either the reference or the referenced object) helps create the justification for adding a parametersRef field under infrastructure directly on Gateways, and trying to clean up this behavior to be better-defined.

Adding a parametersRef field as part of the infrastructure stanza was originally proposed and discussed extensively in #1868 and #1757 but was ultimately not included in the implementation in #2440 - I can't find the precise comment where it was blocked/dropped though (I think it happened after the GEPs merged, during API review).

/cc @howardjohn @robscott

@sjberman
Copy link
Contributor Author

sjberman commented Apr 2, 2024

Doesn't that just move the blast radius down one level? Unless we are considering the impact of an invalid policy attachment on a Gateway to be implementation-specific, versus stating in the spec that the Accepted condition should be false.

@mikemorris
Copy link
Contributor

mikemorris commented Apr 2, 2024

Doesn't that just move the blast radius down one level?

At a minimum, yes, and that alone feels like a good improvement.

I'd expect that it could be more reasonable for a Gateway to stay { type: Accepted, status: True } and just be missing some configured parameters (expressed through its existing { type: ResolvedRefs, status: False } or adding { type: PartiallyInvalid, status: True, reason: UnsupportedValue } similar to #2429 for HTTPRoute) than to figure out how to deal with a dynamic class/template becoming invalid in any more standardized way that doesn't have a huge blast radius.

@robscott
Copy link
Member

robscott commented Apr 3, 2024

Doesn't that just move the blast radius down one level?

That was also one of my original concerns. The proposal by @howardjohn in #2924 does limit the blast radius by making paramsRefs namespace-local, so the blast radius of a change would also be limited to the local namespace.

Unless we are considering the impact of an invalid policy attachment on a Gateway to be implementation-specific, versus stating in the spec that the Accepted condition should be false.

Maybe we should take these options and apply them more broadly?

// When this happens, implementations MUST take one of the following
// approaches:
//
// 1) Drop Rule(s): With this approach, implementations will drop the
// invalid Route Rule(s) until they are fully valid again. The message
// for this condition MUST start with the prefix "Dropped Rule" and
// include information about which Rules have been dropped. In this
// state, the "Accepted" condition MUST be set to "True" with the latest
// generation of the resource.
// 2) Fall Back: With this approach, implementations will fall back to the
// last known good state of the entire Route. The message for this
// condition MUST start with the prefix "Fall Back" and include
// information about why the current Rule(s) are invalid. To represent
// this, the "Accepted" condition MUST be set to "True" with the
// generation of the last known good state of the resource.

@sjberman
Copy link
Contributor Author

sjberman commented Apr 3, 2024

@robscott That's similar to the approach that I mentioned in my first comment for the GatewayClass. Basically just keep things in a good state, but fall back on the paramsRef to the default values (our implementation may just have to use this approach since we are intending to add infra configuration at the top level and the GC paramsRef is the way to do that right now). It's almost the lesser of two evils. Your paramsRef config will be reverted to a valid state, which could cause disruption downstream since values are changing, but it's less disruptive than nullifying the entire downstream config.

This obviously is the big issue to figure out, but I also want to make sure we touch on the questions around the existence of a ResolvedRefs condition and the conformance for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
None yet
Development

No branches or pull requests

4 participants