Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject new Shoots with internal apiVersion in infrastructureConfig #4927

Merged

Conversation

voelzmo
Copy link
Member

@voelzmo voelzmo commented Oct 27, 2021

How to categorize this PR?

/area control-plane
/kind bug

What this PR does / why we need it:
New Shoots which specify __internal as apiVersion for their InfrastructureConfig are rejected. Updates to existing Shoots are still allowed for compatibility reasons.

Which issue(s) this PR fixes:
Fixes #4874

Special notes for your reviewer:
A similar issue seems to exist for ControlPlaneConfig, should I be adding a new PR with similar change?

Release note:

New Shoots can no longer specify `__internal` for the apiVersion in their InfrastructureConfig. For compatibility reasons, existing Shoots with this configuration can still be updated.

@voelzmo voelzmo requested a review from a team as a code owner October 27, 2021 16:10
@gardener-robot gardener-robot added area/control-plane Control plane related kind/bug Bug needs/review size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 27, 2021
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected such validation in the provider extensions instead of in gardener/gardener. Generally, Gardener does not know or make any assumptions of the content in *runtime.RawExtension fields, so it doesn't quite fit here.

@voelzmo
Copy link
Member Author

voelzmo commented Oct 28, 2021

I would have expected such validation in the provider extensions instead of in gardener/gardener. Generally, Gardener does not know or make any assumptions of the content in *runtime.RawExtension fields, so it doesn't quite fit here.

Yeah, I was stuck between a rock and a hard place here. Doing this in the extensions has the benefit that decoding is easy and therefore you can do this check with less hassle. On the other hand it requires pretty much the same code in all extensions, new extension releases, and so on. This check also isn't really provider specific, denying access to the internal version feels like a thing that shouldn't be possible ever.

I'm not sure if you were thinking of a better way of doing this? If you feel this should not belong in g/g, I'm happy to make this individual PRs in the extensions.

@rfranzke
Copy link
Member

Yes, that's all correct, but I fear that this is pretty much the price for having such extensibility.
The check itself is specific to the providers as such as it assumes that there is a Kubernetes API-like decodeable object in the first place (which might be true for the extensions we know, however, not in general). I see the check is optional and it would work for our case, but it's still very confusing (we use the core Kubernetes scheme and the corev1 SchemeGroupVersion). Personally, I would vote to indeed move it into the extension repos (maybe we can have a library function in our extensions library that the extensions can then call individually while validating to prevent a good portion of the duplication).

plugin/pkg/shoot/validator/admission.go Outdated Show resolved Hide resolved
@stoyanr
Copy link
Contributor

stoyanr commented Nov 1, 2021

@rfranzke I am also a bit struggling with your comment. Although I see your point, I would vote for a pragmatic approach here.

The check itself is specific to the providers as such as it assumes that there is a Kubernetes API-like decodeable object in the first place (which might be true for the extensions we know, however, not in general).

Isn't RawExtension supposed to be used exactly for this, especially in Kubernetes APIs?

I see the check is optional and it would work for our case, but it's still very confusing (we use the core Kubernetes scheme and the corev1 SchemeGroupVersion).

Why does this matter? From the tests I see that the decoding succeeds and the check works? What exactly is confusing?

Personally, I would vote to indeed move it into the extension repos (maybe we can have a library function in our extensions library that the extensions can then call individually while validating to prevent a good portion of the duplication).

We would need to modify each and every extension (not only provider extensions, every extension that uses any ProviderConfig). And some extensions don't even have an admission webhook, so essentially there is no place to host this check at the moment. Couldn't we save ourselves all this trouble?

@rfranzke
Copy link
Member

rfranzke commented Nov 4, 2021

Isn't RawExtension supposed to be used exactly for this, especially in Kubernetes APIs?

I don't think so, RawExtension can host any data blob and AFAIK there are no assumptions whatsoever that this is an inlined Kubernetes API-like object.

Why does this matter? From the tests I see that the decoding succeeds and the check works? What exactly is confusing?

I find the usage of kubernetesscheme.Scheme and corev1.SchemeGroupVersion confusing since the APIs of the extensions are neither part of this scheme nor of this SchemeGroupVersion in corev1.

Couldn't we save ourselves all this trouble?

We could, of course, and I am not strongly insisting on not having such validation centrally. Obviously, I see the arguments for this pragmatic approach. However, I still think that it is conceptually wrong and rather "hacky" doing it in this admission plugin. Hence, I'm personally rather tending towards implementing it "cleanly" (even if this is much more additional work and takes longer, however, the urgency of this fix is probably rather low and we are not under pressure).
But again, I'm not insisting and if you prefer this approach then please go ahead. Maybe you can collect some more thoughts from others like @timuthy @timebertt @BeckerMax etc. so that we at least have a consensus for the chosen approach.

@BeckerMax
Copy link
Contributor

BeckerMax commented Nov 5, 2021

For me the answer to this question comes down to whether RawExtension is defined as being an inlined kubernetes object or whether it can be an arbitrary blob. Because all g/g knows about the InfrastructureConfig is that it is a rawExtension.

(Please correct me if I read that wrong) But it seems here rawExtension is used to not store a kubernetes object : https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/daemon/update.go#L481

Following how rawExtension is used in the k8s codebase I tend to implement the validation logic in the extensions.

@timuthy
Copy link
Contributor

timuthy commented Nov 8, 2021

FMPOV, we miss unambiguous information about the usage of runtime.RawExtension. Let me try to reason the decision about this validation with the help of the doc string:

// RawExtension is used to hold extensions in external versions.

-> System-wide rule that internal versions should not be used. This is independent from the client/controller which knows which concrete type is anticipated.

// To use this, make a field which has RawExtension as its type in your external, versioned
// struct, and Object in your internal struct.

-> runtime.Object ought to be used as the internal counterpart to runtime.Extension. Hence, implementations must fulfill GetObjectKind() schema.ObjectKind which makes it clear that we talk about serializable Kubernetes objects.

Although I also first saw this validation rather to be part of every extension, I don't have any objections doing it in gardener/gardener when reading the statements above. We would just follow the best practices of runtime.RawExtension.

@voelzmo
Copy link
Member Author

voelzmo commented Nov 9, 2021

Thanks for all the comments, this is a very insightful discussion for me! When trying to take a step back to look at the arguments, here's what I'm understanding

Doing it in the Extensions:

  • runtime.RawExtension could contain anything, it does not need to have a GVK. Things which aren't objects get parsed into UnknownObject and remain accessible as json. This also happens in the k8s codebase.
  • If we want to do validation "right", there is only one place where this can be done: the extensions themselves know what they expect from their InfrastructureConfig and can therefore validate in detail if it is correct
  • If we don't do it in the extensions, we can only do it "hacky": using a Scheme and SchemeGroupVersion that we don't care about with the goal to parse some GVK out of a json
  • We don't know what kind of extensions people write/have written, therefore, we should not tighten the assumptions we make about how they are implemented (i.e. change the contract between Gardener and the extensions)

Doing it in Gardener:

  • Allows having validation in one central place instead of in every extension. This has multiple benefits:
    • No need to change extensions (and sometimes even add an admission webhook just for this purpose)
    • We get the validation even for extensions which we don't know/control
    • We can control when the is enabled and by which feature gate
    • re-usability of this check for other things than just InfrastructureConfig (see Stoyan's list above)
  • The thing we test here isn't really provider specific: We want to avoid that people use an inline json object to access an internal API. Which means: if this is not an object with a GKV, we don't have a problem and don't care from the Gardener perspective. If it is, we want to check that an external API version is used.

It seems to me that we have an argument on two different levels here: one side is based on principles (e.g. 'an extension can contain basically anything'), the other based on on concrete observations of actual usage and behavior (i.e. 'the extensions we know are using k8s objects in their InfrastructureConfig and there's more places where we would need this check').

Neither argument is wrong or incorrect here, I guess the main question is: Is this one of the issues where concrete needs should allow us to violate principles? It feels to me there are more drawbacks if we would stick to the principles here, I'm leaning towards being pragmatic with this validation. WDYT?

@stoyanr
Copy link
Contributor

stoyanr commented Nov 9, 2021

I don't think we are violating any principles. We can do this check in g/g and the extensions are still free to use whatever JSON blobs they want and do further checking and validation, as they see it appropriate. The "principle" that matters to me here is DRY ("don't repeat yourself") and the need to stay pragmatic having in mind our limited resources to invest in such improvements.

@rfranzke
Copy link
Member

Thanks @timuthy for digging this up. Fair enough then, as already said above, I'm not insisting on anything and fine with doing it in this admission plugin for the very valid pragmatic reasons. Let's just have some code comments explaining why we do such validation here and why we use apparently "wrong" schemes/scheme-group-versions so that readers of this code get a chance to understand, please.

@voelzmo voelzmo force-pushed the fix/internal-api-in-infrastructureconfig branch from abb0481 to 0dc1e0b Compare November 26, 2021 11:48
@gardener-robot gardener-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 26, 2021
Copy link
Contributor

@stoyanr stoyanr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with a few of the raw extension kinds, the validation seems to be working as expected in general.

plugin/pkg/shoot/validator/admission.go Outdated Show resolved Hide resolved
plugin/pkg/shoot/validator/admission.go Outdated Show resolved Hide resolved
plugin/pkg/shoot/validator/admission.go Outdated Show resolved Hide resolved
plugin/pkg/shoot/validator/admission.go Outdated Show resolved Hide resolved
plugin/pkg/shoot/validator/admission_test.go Outdated Show resolved Hide resolved
@gardener-robot gardener-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs/second-opinion and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 30, 2021
Copy link
Contributor

@stoyanr stoyanr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks!

@rfranzke rfranzke merged commit 14bdc52 into gardener:master Dec 3, 2021
krgostev pushed a commit to krgostev/gardener that referenced this pull request Dec 7, 2021
…ardener#4927)

* Reject new Shoots with internal apiVersion in infrastructureConfig

* Make RawExtension internal version check re-usable

* Allow RawExtensions to contain arbitrary json

* Allow RawExtensions to be empty without failing API check

* Reject new Shoots with internal apiVersion in controlPlaneConfig

* Reject new Shoots with internal apiVersion in NetworkConfig

* Reject new Shoots with internal apiVersion in OperatingsystemConfig

* Reject new Shoots with internal apiVersion in MachineImageConfig

* Reject new Shoots with internal apiVersion in ContainerRuntimeConfig

* Extract GVK check into function

* Extract error message constant and path variable

* Declutter RawExtensions used in tests
krgostev pushed a commit to krgostev/gardener that referenced this pull request Apr 21, 2022
…ardener#4927)

* Reject new Shoots with internal apiVersion in infrastructureConfig

* Make RawExtension internal version check re-usable

* Allow RawExtensions to contain arbitrary json

* Allow RawExtensions to be empty without failing API check

* Reject new Shoots with internal apiVersion in controlPlaneConfig

* Reject new Shoots with internal apiVersion in NetworkConfig

* Reject new Shoots with internal apiVersion in OperatingsystemConfig

* Reject new Shoots with internal apiVersion in MachineImageConfig

* Reject new Shoots with internal apiVersion in ContainerRuntimeConfig

* Extract GVK check into function

* Extract error message constant and path variable

* Declutter RawExtensions used in tests
krgostev pushed a commit to krgostev/gardener that referenced this pull request Jul 5, 2022
…ardener#4927)

* Reject new Shoots with internal apiVersion in infrastructureConfig

* Make RawExtension internal version check re-usable

* Allow RawExtensions to contain arbitrary json

* Allow RawExtensions to be empty without failing API check

* Reject new Shoots with internal apiVersion in controlPlaneConfig

* Reject new Shoots with internal apiVersion in NetworkConfig

* Reject new Shoots with internal apiVersion in OperatingsystemConfig

* Reject new Shoots with internal apiVersion in MachineImageConfig

* Reject new Shoots with internal apiVersion in ContainerRuntimeConfig

* Extract GVK check into function

* Extract error message constant and path variable

* Declutter RawExtensions used in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/control-plane Control plane related kind/bug Bug size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not allow creating a Shoot with .spec.provider.infrastructureConfig using an internal API version
9 participants