Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-share InitContainer type from Container #115362

Closed
SergeyKanzhelev opened this issue Jan 27, 2023 · 31 comments
Closed

De-share InitContainer type from Container #115362

SergeyKanzhelev opened this issue Jan 27, 2023 · 31 comments
Labels
area/api Indicates an issue on api area. area/code-organization Issues or PRs related to kubernetes code organization kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@SergeyKanzhelev
Copy link
Member

SergeyKanzhelev commented Jan 27, 2023

In order to simplify future KEPs and allow more flexibility on adding new fields and behaviors on Containers I suggest we fork the type of InitContainer and regular Container.

This de-sharing will be very helpful with the Sidecar containers KEP kubernetes/enhancements#3759 where we want to introduce a new field on Init containers, but not on any other containers.

There will be more requests for change of these types going forward and de-sharing of types will simplify discussions and improve velocity of features development.

It will also allow us to have a better documentation of Init Containers type - today the same documentation for each field is used.

Existing approach

We already have a precedence in code where EphemeralContainer is a separate type from Container. The way it's implemented is by introducing a collection EphemeralContainerCommon that contains all the same fields as a regular Container so it can be casted to the Container type. See the type definition and a test.

This approach simplified the development in some places, but made types heavily interdependent so no new fields can be added to the Container type without updating EphemeralContainer as well.

Proposed

The proposal is to completely split these types making them a separate types that cannot be casted to one another.

This will require some changes in interfaces, mostly for the various resource managers. Those interfaces will be extended to provide separate API methods for InitContainer and a regular Container.

There are a few places that may need to iterate over all containers making similar changes. If those patterns common across the code, we will implement helper functions that would simplify this iterating exposing only common fields, to the iterator function, not the entire Container class.

Risks

There are two types of risks - risk of having separate types going forward vs. risk for breaking existing integrations.

  1. Since types will be separated, some new feature development may cause errors when, for example, InitContainers were not iterated over for some sort of a verification or update. This risk exists today, but may be less than after de-share.

  2. Existing integrations that are using the same types as declared in k/k would need to be rewritten to use new types. In most cases it will be code duplication. This risk is high, but since it's one time change and only affects code that likely will be re-built, the risk is not huge.

Steps to split types

The work of splitting types will require modifications all over the places. Thus the suggestion is to do it in stages to simplify review process and avoid constant rebases:

  1. Do some refactoring in places that can be simply changed to avoid type sharing.

  2. Introduce the InitContainer type alias and rename all obvious places across the codebase.

    type InitContainer = Container
  3. Change interfaces that recieve Container type by extending it for InitContainer as well.

  4. Implement InitContainer as a separate type with all the same initial fields as Container has.

  5. Do the split for EphemeralContainer type.

/sig node
/sig architecture
/area code-organization
/area api
/priority important-longterm
/kind cleanup

/cc @thockin @jpbetz

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. area/code-organization Issues or PRs related to kubernetes code organization area/api Indicates an issue on api area. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 27, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thockin
Copy link
Member

thockin commented Jan 27, 2023

I am generally in favor of de-sharing, but we should be clear about HOW MUCH work this is (and risk) and why we want to do it.

I also want to call out the need for API docs to not be totally muddied by this. The fact that we use Go code as our IDL means that changes like this (and embedding types) can expose details that no consumer of our API should care about, and can make the docs hard to navigate.

@SergeyKanzhelev
Copy link
Member Author

Since this change will keep all the existing fields in place, I'd say we can start doing it now. I updated the motivational part of this issue description. Also added risks section.

In terms of HOW MUCH work this is - it doesn't look overwhelmingly a lot. Definitely not a single PR. I can try to get a single big PR ready as POC, but would really love to split it into steps.

Would you be open to temporary type aliasing like #115364 to simplify reviews process?

@liggitt
Copy link
Member

liggitt commented Jan 27, 2023

The proposal is to completely split these types making them a separate types that cannot be casted to one another.

what are the implications for things like:

  • VisitContainers, which rely on being able to iterate over all container definitions of all types in a pod
  • validateContainerCommon, which has a lot of validation common to all container definitions

@liggitt
Copy link
Member

liggitt commented Jan 27, 2023

We already have a precedence in code where EphemeralContainer is a separate type from Container. The way it's implemented is by introducing a collection EphemeralContainerCommon that contains all the same fields as a regular Container so it can be casted to the Container type. See the type definition and a test.

EphmeralContainers were implemented that way so internal API validation, external admission policy, and callers of VisitContainers could continue to operate on the v1.Container schema for all container definitions.

@SergeyKanzhelev
Copy link
Member Author

SergeyKanzhelev commented Jan 27, 2023

  • VisitContainers, which rely on being able to iterate over all container definitions of all types in a pod

Some (most?) VisitContainers calls in codebase can be replaced with VisitContainerResources or similar like here: #115366 (not a complete PR, just for demo). Others with the code duplication. Not that many places.

  • validateContainerCommon, which has a lot of validation common to all container definitions

Code duplication wouldn't be too much. Most of validators operate on fields of a container. Those can be extracted and passed as arguments instead of a container object

EphmeralContainers were implemented that way so internal API validation, external admission policy, and callers of VisitContainers could continue to operate on the v1.Container schema for all container definitions.

Do external validators use types from types.go or have their own casting of types? Alternative for the sidecar containers KEP is to apply the same pattern for InitContainer as we use for EphemeralContainer. Would it be less controversial or will it require equal amount of work for external admission admission policy?

xref: https://kubernetes.slack.com/archives/CJUQN3E4T/p1674844985206829

@liggitt
Copy link
Member

liggitt commented Jan 30, 2023

Code duplication wouldn't be too much. Most of validators operate on fields of a container. Those can be extracted and passed as arguments instead of a container object

The point of the VisitContainers helper was to avoid duplicating code that iterates over containers in a pod... when initContainers and ephemeralContainers were added, there were many ad hoc iterations that missed those container types and had to be updated.

Do external validators use types from types.go or have their own casting of types?

I see three main approaches:

  1. use types.go directly (only go-based validators would do this)
  2. visit containers themselves using their own version of VisitContainers
  3. write untyped validation rules against the v1.Container schema in something like rego

Diverging schemas would be likely to cause issues for approaches 2 and 3

@thockin
Copy link
Member

thockin commented Jan 30, 2023

Is validation the primary concern? I assume there's lots of other code which wants to handle real/init/ephemeral containers congruently.

The ephemeral container code asserts that 2 types must be identical so pointers can be safely cast between them (I don't understand the history of why it's not one type embedded in both places). Last I looked, our doc generation didn't handle this as well I wanted.

e.g.

type EphemeralContainer struct {                                                                                                                                                 // 
    // Ephemeral containers have all of the fields of Container, plus additional fields
    // specific to ephemeral containers. Fields in common with Container are in the
    // following inlined struct so than an EphemeralContainer may easily be converted
    // to a Container.
    EphemeralContainerCommon `json:",inline" protobuf:"bytes,1,req"`

and

// EphemeralContainerCommon is a copy of all fields in Container to be inlined in
// EphemeralContainer. This separate type allows easy conversion from EphemeralContainer
// to Container and allows separate documentation for the fields of EphemeralContainer.
// When a new field is added to Container it must be added here as well.
type EphemeralContainerCommon struct {                              

This seems VERY tedious and fragile, we must not do it as a regular thing. Once any interior type changes, it breaks.

We have historically avoided adding methods to our types - preferring to keep them "dumb data", but this is what they are good for.

Something like https://go.dev/play/p/mMaLdurF7dg means we don't have to keep structs in exact sync, but it is a lot of boilerplate and another level of abstraction.

Or we could rip off the bandaid - just break apart the types and fully drop the idea that you can use the same code on different flavors of container.

I never thought I would be wishing for C++ templates or C macros :)

@SergeyKanzhelev
Copy link
Member Author

Specifically for sidecar containers KEP we want to introduce a field for InitContainer, but not for Container. We potentially may want this field on regular Container later, but maybe even with the different values allowed. I haven't heard any requests to have this field for EphemeralContainer.

Easiest we can do it to introduce this field now for all three types of containers. This will require no additional work for any external validators and very little change for the existing code.

Any change that will introduce a field on InitContainer-only or on regular&init containers, but not Ephemeral will require changes with these three validator patterns:

  1. use types.go directly (only go-based validators would do this)
  2. visit containers themselves using their own version of VisitContainers
  3. write untyped validation rules against the v1.Container schema in something like rego

Diverging schemas would be likely to cause issues for approaches 2 and 3

Since we will guarantee that the first few fields will match exactly for back compat, validators may be rewritten in reasonably-generic way with the patterns like one @thockin suggested above. But for sure there will be change needed in those.

The fork of types doesn't seem to be breaking any API contracts, but likely IS a breaking change de facto. With the sidecar containers and future changes we need to choose between confusing end users vs. ask one time change from various components owners.

I am generally against breaking changes. But in this case I'm on a fence - I have a feeling this is not the last discussion and postponing the types fork will only make problem worse.

What data may help to make a decision? If forking types is a way forward, we may want to do it early in 1.27 cycle.

@jpbetz
Copy link
Contributor

jpbetz commented Jan 30, 2023

The fork of types doesn't seem to be breaking any API contracts, but likely IS a breaking change de facto. With the sidecar containers and future changes we need to choose between confusing end users vs. ask one time change from various components owners.

Given the history of breaking changes to client-go and how that has impacted clients, I'm not excited about another breaking change. One option to consider is that we defer a breaking change such that we can bundle it up with other client-go breaking changes into a "major" release of client-go at some point in the future.

@thockin
Copy link
Member

thockin commented Jan 30, 2023 via email

@SergeyKanzhelev
Copy link
Member Author

One more data point here - with the proposals like this: https://github.com/obiTrinobiIntel/enhancements/tree/master/keps/sig-node/3675-resource-plugin-manager which include new interfaces between kubelet and plugins and take a dependency on a fact that there is a single Container type, we will be accumulating more debt.

@jpbetz
Copy link
Contributor

jpbetz commented Feb 1, 2023

@lavalamp WDYT?

@lavalamp
Copy link
Member

lavalamp commented Feb 1, 2023

From an API perspective this is fine and at least in some cases would let you give better documentation.

In this particular case I'm not sure that's true (duplicating types will give users a lot of redundant documentation to wade through, actually obscuring the differences rather than making them easy to see), and it definitely seems like it'd be a ton of work to compensate, e.g. I'm sure kubelet would have to have a common interface to hide all possible containers behind.

@SergeyKanzhelev
Copy link
Member Author

If we will go ahead with the types split, one more decision to make is whether we need to split ContainerStatus types at the same time. They are already in different collections.

I went ahead and reviewed the ContainerStatus type: #115463

@liggitt
Copy link
Member

liggitt commented Feb 2, 2023

it definitely seems like it'd be a ton of work to compensate, e.g. I'm sure kubelet would have to have a common interface to hide all possible containers behind.

agree, splitting types seems like it will make it much harder to operate on all the lists of containers consistently, and require adding a lot of code to try to mask any differences introduced to get back to common handling

@SergeyKanzhelev
Copy link
Member Author

it definitely seems like it'd be a ton of work to compensate, e.g. I'm sure kubelet would have to have a common interface to hide all possible containers behind.

agree, splitting types seems like it will make it much harder to operate on all the lists of containers consistently, and require adding a lot of code to try to mask any differences introduced to get back to common handling

How would one approach weighting on changes needed and amount of code duplication vs. user confusion having unrelated fields on certain types? What additional information needed?

If we can confirm that rego will keep working on common types without changes, will it be a good indication that we can fork the types?

Specifically for sidecar containers KEP kubernetes/enhancements#3761 - it would be great to make this decision soon. I'm happy to work on gathering more information if needed.

@thockin
Copy link
Member

thockin commented Feb 2, 2023 via email

@liggitt
Copy link
Member

liggitt commented Feb 2, 2023

I think everyone is saying "this will be a lot of work" but nobody is saying "don't do it"

I mean… I don't think we should do it… I think the result will be harder to maintain and understand for pretty much everyone.

@lavalamp
Copy link
Member

lavalamp commented Feb 2, 2023

Don't do it.

@lavalamp
Copy link
Member

lavalamp commented Feb 2, 2023

(By which I mean, it seems like the effort is out of proportion with the benefit.)

@thockin
Copy link
Member

thockin commented Feb 2, 2023

The problem I have with @liggitt answer "the result will be harder to maintain and understand" is that the tradeoff is having fields in schema that do not apply in some contexts, which is also "hard to maintain and understand", just on a different axis.

(By which I mean, it seems like the effort is out of proportion with the benefit.)

Who am I to tell other people how to spend their time? :)

If I could wave a wand and de-share EVERYTHING I probably would, even though it will result in net more LOC. The fact that our "official" client exposes this level of implementation detail just makes me cry.

@SergeyKanzhelev
Copy link
Member Author

(By which I mean, it seems like the effort is out of proportion with the benefit.)

Who am I to tell other people how to spend their time? :)

Internally in k/k repo or externally total hours spent by people adopting the change? Internally it doesn't look to be too huge.

@thockin
Copy link
Member

thockin commented Feb 3, 2023

Here's the fulcrum for me: it's all about client-go.

We have, in the past, make incompatible changes, and we don't currently version it. That's not really an excuse to wantonly break people.

How impactful do we think a change like this would be on people? Do most of them do things like:

for _, c := range pod.Spec.Containers {

or do most of them do things like visitAllContainers(pod, myFunc) ?

I want to believe it's the former, which would make this client-go change bad but not BAD bad. But I don't have data to back that up.

@SergeyKanzhelev
Copy link
Member Author

Do most of them do things like:

sometimes like this:

for _, container := range append(pod.Spec.InitContainers, pod.Spec.Containers...) {

which is worst of two. See #115458 for example

@jpbetz
Copy link
Contributor

jpbetz commented Feb 4, 2023

I'm not in favor of this given the impact to client-go. Downstream projects that consume not just k8s, but also other packages that depend on k8s and use the corev1 types get into a bind where they need all the projects that depend on k8s updated before they can upgrade.

@thockin
Copy link
Member

thockin commented Feb 7, 2023

As a project maintainer, I really want us to do the type split, and I really don't want "you can't use this here" fields.

To do the type split properly we either hugely impact client-go or we write a NEW client-lib that doesn't expose all of our internal gunk, and actually has API guarantees. I think we should do that, but it's REALLY not a side-quest.

That said, I am not willing to block the development of sidecars KEP on this HUGE, unstaffed, and guaranteed contentious effort.

So with great regret, I think the best course of action here is to add the field to Container and ensure that it doesn't get used outside of InitContainers, with a link back to this issue explaining why we are self-flagellating.

If someone wants to undertake the type-sharing problem, I am happy to offer my time to help figure out how.

@SergeyKanzhelev
Copy link
Member Author

@thockin thank you!

One last comment, would it be OK to introduce a type alias like this: type InitContainer = Container? This will allow to write code a bit cleaner while no back combat will be broken. See this PR: #115364

@thockin
Copy link
Member

thockin commented Feb 8, 2023 via email

@SergeyKanzhelev
Copy link
Member Author

/close

we will proceed with shared types

@gjkim42
Copy link
Member

gjkim42 commented Feb 22, 2023

we will proceed with shared types

As part of the KEP, init containers and regular containers will be split into two different types.

We'll need updates on the KEP. Is it ok to defer the KEP update to the next release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/code-organization Issues or PRs related to kubernetes code organization kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

No branches or pull requests

7 participants