Working group proposal for image compatibility #128

Merged
merged 1 commit into from Oct 23, 2023

mfranczy
Contributor

This PR proposes a new working group to create image compatibility specification.

References:

@mfranczy mfranczy changed the title Working group proposal for image compatibility spec [WIP] Working group proposal for image compatibility spec Sep 12, 2023
Member

@tianon tianon left a comment

I'm concerned that the language here makes it sound like we are proposing to create a whole new OCI specification, when this really sounds IMO like a subset or intersection of the existing image and runtime specifications to me (exactly which one is probably a good topic for the WG, but my guess/gut is that image is most appropriate).

I also think one of the most important lessons we in the OCI need to have learned from previous WGs is to make sure we have relevant maintainers involved and committed before approving a WG (unfortunately I'm currently on extended leave, so this can't really be me, but also me alone wouldn't be enough for quorum). 👀

(I'm not on the TOB, so my comments aren't binding here by any means, but I've been part of the OCI since it was created and I'm currently a maintainer on both image and runtime specifications.)

@mfranczy
Contributor Author

mfranczy commented Sep 12, 2023

> I'm concerned that the language here makes it sound like we are proposing to create a whole new OCI specification, when this really sounds IMO like a subset or intersection of the existing image and runtime specifications to me (exactly which one is probably a good topic for the WG, but my guess/gut is that image is most appropriate).

One of our ideas is to provide the image compatibility specification as an artifact (similar to an SBOM). I think it would be best if we didn't have to change the image and runtime specifications too much. If the working group agrees and the TOB is OK with the artifact approach, then yes, we will define a new OCI spec for image compatibility, delivered as an artifact.

> I also think one of the most important lessons we in the OCI need to have learned from previous WGs is to make sure we have relevant maintainers involved and committed before approving a WG (unfortunately I'm currently on extended leave, so this can't really be me, but also me alone wouldn't be enough for quorum). 👀

Fair point.

@mfranczy
Contributor Author

> when this really sounds IMO like a subset or intersection of the existing image and runtime specifications to me

Also, I don't think it's going to be a subset or intersection of the existing specifications. We have to express requirements on kernel configuration, cmdline, modules, and drivers. This kind of information is not available in the existing specs.
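
To make the kind of data discussed here more concrete, the following is a minimal Python sketch of a structure that could carry such requirements. The class name, the field names (`kernel_config`, `cmdline`, `modules`, `drivers`), and the JSON layout are all hypothetical; nothing like this has been defined by the working group. The `gtp5g` module name is borrowed from the free5GC example in the proposal, and the other values are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class HostRequirements:
    """Hypothetical host-side requirements of a containerized application."""
    kernel_config: Dict[str, str] = field(default_factory=dict)  # e.g. {"CONFIG_KVM": "m"}
    cmdline: List[str] = field(default_factory=list)             # required boot arguments
    modules: List[str] = field(default_factory=list)             # in-tree or out-of-tree modules
    drivers: Dict[str, str] = field(default_factory=dict)        # driver name -> version constraint


def to_json(req: HostRequirements) -> str:
    """Serialize the requirements so they could be stored alongside an image."""
    return json.dumps(asdict(req), indent=2, sort_keys=True)


# Illustrative values only; none of them come from a real image.
example = HostRequirements(
    kernel_config={"CONFIG_VFIO": "y"},
    cmdline=["intel_iommu=on"],
    modules=["gtp5g"],
)
print(to_json(example))
```

None of these four categories has a counterpart in the current platform object, which is the point made in the comment above.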

@mfranczy
Contributor Author

mfranczy commented Sep 12, 2023

> I also think one of the most important lessons we in the OCI need to have learned from previous WGs is to make sure we have relevant maintainers involved and committed before approving a WG (unfortunately I'm currently on extended leave, so this can't really be me, but also me alone wouldn't be enough for quorum). 👀

Fair point.

However, if we were extending the image or runtime specification, we would certainly need the relevant maintainers as owners.
But if we are working on a completely new specification, I don't think a quorum of image or runtime maintainers should be a hard requirement.

Don't get me wrong, I would be more than happy to have them in the group if they are interested.

@mfranczy mfranczy changed the title [WIP] Working group proposal for image compatibility spec Working group proposal for image compatibility spec Sep 12, 2023
@mfranczy mfranczy marked this pull request as draft September 12, 2023 17:18
proposals/wg-image-compatibility.md
even for the same device different versions of drivers may be used. Therefore, different operating system distributions are not identical.
Even within the same OS distro, different releases are also not identical.

An image compatibility specification would help container image authors describe compatibility requirements in a standard way.

Contributor

Love it!

proposals/wg-image-compatibility.md

* Define image compatibility spec that describes special host OS requirements of containerized application.
* The spec should describe container requirements for Linux, FreeBSD and Windows.
* The final list of supported fields (like kernel modules, configuration, out-of-tree drivers etc.)

Love it! Over in HPC land we are also concerned about optimised binaries. We thought of putting the optimisation description in annotations so that the runtime can check (and fail if the image is optimised for a non-compliant micro-arch) and a registry can pick the right image. Blog post about that
FOSDEM talk

At first, I also thought about using annotations/labels to describe the compatibility requirements. But I found that in some scenarios the annotation approach is not flexible: annotations are built into the image by the author, so if the image turns out to be compatible with a new kernel release or new device drivers and the annotations need to be updated, the author has to release a new version of the image. If an artifact is used instead, the image can stay unchanged; you just replace the artifact (or attach a new version of it). This is quite useful in production: the environment can keep the application running while introducing a new OS distribution, with no need to test a new image. Another point is that an artifact makes it possible to describe compatibility at the application level (a set of images), whereas annotations work at single-image granularity. This topic needs more discussion after the working group is created.
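
As a rough illustration of the artifact approach, the sketch below builds an OCI image manifest that points at an existing image through the `subject` field, so the compatibility data can be replaced without rebuilding the image. It assumes the `subject` and `artifactType` fields and the empty-config convention from image-spec v1.1. The `artifactType` value, the layer media type, and the payloads are made up for illustration, and the "image manifest" is just a stand-in blob.

```python
import hashlib
import json

EMPTY_JSON = b"{}"  # payload of the image-spec v1.1 "empty descriptor"

# Hypothetical media type; no such type has been registered or agreed on.
COMPAT_TYPE = "application/vnd.example.image-compatibility.v1+json"


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def compatibility_manifest(image_manifest: bytes, payload: bytes) -> dict:
    """Manifest of a compatibility artifact attached to an image manifest."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": COMPAT_TYPE,
        "config": descriptor("application/vnd.oci.empty.v1+json", EMPTY_JSON),
        "layers": [descriptor(COMPAT_TYPE, payload)],
        # The image is only referenced here; its own bytes never change.
        "subject": descriptor("application/vnd.oci.image.manifest.v1+json", image_manifest),
    }


image = json.dumps({"schemaVersion": 2}).encode()  # stand-in for a real image manifest
v1 = compatibility_manifest(image, b'{"modules": ["gtp5g"]}')
v2 = compatibility_manifest(image, b'{"modules": ["gtp5g"], "cmdline": []}')
```

Both `v1` and `v2` carry the same `subject` digest while their layer digests differ: the image stays untouched and only the attached compatibility data is replaced.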

Contributor

@ChristianKniep I was going to email you about this effort this morning! I'm so glad you found it!

Contributor Author

@ChristianKniep would you like to join the wg?

Indeed I would :)

Contributor Author

@mfranczy mfranczy Sep 13, 2023

@mfranczy I am interested too, but I need to get a sense of the meeting time and frequency.

@vsoch once a week should be enough. I don't know the time yet: first the TOB will have to agree to form this working group, then we will have to find common ground on the time. We will span multiple time zones, so I plan to create a poll to see what works.

@vsoch, @ChristianKniep let me add you to the owners. What about stakeholders, what should I put in there?

Contributor

Stakeholders are our employers? If so I’d need to ask first.

Contributor Author

> Stakeholders are our employers? If so I’d need to ask first.

https://github.com/opencontainers/tob/pull/128/files/7e2ea045dbbe650c66a249124bf7750ee7374fec#r1324498485

As suggested in the comment, stakeholders are the likely adopters and implementers of the change. If you need the compatibility spec to influence a specific product, we can add it. Additionally, if you have an agreement with your employer to work on this project, I think it would be fine to list them too. If you are doing this on your own, we don't have to list them.

Every domain/field can go as shallow or as deep into specific annotations as it likes. Our idea with annotations was that we can bake them into OCI manifests AND indexes.
We can even carry them as LABELs within Docker images. They can be used to smoke-test a potential image candidate to see if the node matches all requirements and expectations.
And even before that: I need to pick a manifest from a given index that holds manifests of the same platform but with different annotations.
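
A minimal sketch of that selection step, assuming an OCI image index whose entries share a platform and differ only in annotations. The annotation key `org.example.microarch` is hypothetical, and the descriptors are trimmed (placeholder digests, no `mediaType` or `size`) to keep the example short.

```python
from typing import Optional

# Hypothetical annotation key; not defined by any OCI specification.
MICROARCH_KEY = "org.example.microarch"

# Trimmed index: real descriptors also carry mediaType and size,
# and the digests below are placeholders.
index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaa",
         "platform": {"architecture": "amd64", "os": "linux"},
         "annotations": {MICROARCH_KEY: "skylake"}},
        {"digest": "sha256:bbb",
         "platform": {"architecture": "amd64", "os": "linux"},
         "annotations": {MICROARCH_KEY: "zen2"}},
    ],
}


def pick_manifest(index: dict, os_name: str, arch: str, wanted: dict) -> Optional[dict]:
    """Return the first entry matching the platform and all wanted annotations."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") != os_name or platform.get("architecture") != arch:
            continue
        annotations = entry.get("annotations", {})
        if all(annotations.get(k) == v for k, v in wanted.items()):
            return entry
    return None  # the caller can fail early instead of pulling an unsuitable image


chosen = pick_manifest(index, "linux", "amd64", {MICROARCH_KEY: "zen2"})
```

Returning `None` when nothing matches lets a client refuse to pull at all, which corresponds to the "check and fail" behaviour described earlier in the thread.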

How to define the spec is getting a little ahead of ourselves; the first step is to create the working group :)

Contributor

Individuals that want to participate would be listed under "Proposed Owners". "Stakeholders" are the projects that would implement it.

proposals/wg-image-compatibility.md
* Define image compatibility spec that describes special host OS requirements of containerized application.
* The spec should describe container requirements for Linux, illumos, FreeBSD and Windows.
* The final list of supported fields (like kernel modules, configuration, out-of-tree drivers etc.) is to be determined by the working group for each supported OS.
* Make the change as minimally invasive as possible to the image, runtime and distribution specs - that could be achieved over delivering compatibility artifact type.

Contributor

I think I might be missing something -- how are the goals of an "image compatibility" spec different from image-spec, especially re: the platform object described here?

My first thought would be that the output of this WG would be to standardize some values and/or new fields there, not produce a new artifact type.

Contributor

@estesp estesp Sep 14, 2023

My personal opinion (although I haven't spent time with the stakeholders/proposers yet) is that this WG should flesh out the use cases and determine/propose what is the best model to carry this kind of information alongside and/or within an image definition. Given I was involved in the original Docker v2 discussions about the platform object when multiplatform images were first evolving, the features flag was meant to potentially hold the kind of descriptive information that runtimes could use to determine some of the compatibility issues noted in this WG proposal. However, there has never (to my knowledge) been any attempt to define how to standardize the set of disparate information (CPU flags/features? GPU available? certain kernel modules available?) and encode it in that object entity. I would hope this WG can discuss the tradeoffs of fields available versus new fields versus a new artifact before pre-determining the specific solution that is necessary to meet all the use cases/requirements.

Contributor

For a quick summary of my thoughts: compatibility is much more than platform, and the data can actually be enormous. The reason to have a separate spec would be to 1. account for that, 2. allow its use to be optional (without burdening tools to support it or users to download data they don't need), and 3. allow for (possible) extension to other aspects of image compatibility, such as at the level of the library. When you consider that third point especially, and realize that assessing ABI compatibility means binary analysis and storing not just high-level metadata but also (potentially) symbols, names, types, and metadata about compilers, hardware, and the kernel, then the assessment becomes a much harder (and larger) problem. I'm not saying that will be the ultimate outcome here, but it's such an important issue that I don't think we want to close the door on supporting it (e.g., if someone made a custom compatibility spec for it). I also don't think we want to, off the bat, mix up the purposes of these two (imho) very different things. The platform-specific tags are used to determine, at a high level, which image to pull, not to assess compatibility of those detailed things.

Contributor Author

@mfranczy mfranczy Sep 14, 2023

> I think I might be missing something -- how are the goals of an "image compatibility" spec different from image-spec, especially re: the platform object described here?
>
> My first thought would be that the output of this WG would be to standardize some values and/or new fields there, not produce a new artifact type.

One of the arguments against extending the image spec for compatibility was to avoid making a container image's release lifecycle dependent on third-party software versions in all cases. For example, whenever a new CUDA driver is released, the image author would have to test the driver and build a new image just to update the compatibility spec. This is not optimal unless changes to the user application are required: users would be forced to release a new version only to update metadata.

We thought of using an artifact to have a distinct entity that has a self-contained structure and is easily interchangeable to describe image compatibility.

Additionally, as we explored the use cases, we realized that the compatibility spec may need a slightly more complex structure. We will need to let users create some sort of abstraction that covers different vendors (with different configurations) enabling similar features.

> I would hope this WG can discuss the tradeoffs of fields available versus new fields versus a new artifact before pre-determining the specific solution that is necessary to meet all the use cases/requirements.

That's fair. Let me adjust the scope to reflect this. Looks like me, @ChaoyiHuang and @vsoch are in the artifact camp, but let's have a discussion within the wg to hear others.

> I would hope this WG can discuss the tradeoffs of fields available versus new fields versus a new artifact before pre-determining the specific solution that is necessary to meet all the use cases/requirements.

totally +1

Contributor Author

@estesp I removed any suggestions for potential implementation from the document and focused only on the goals.

@mfranczy mfranczy changed the title Working group proposal for image compatibility spec Working group proposal for image compatibility Sep 15, 2023
## Scope

* Define image compatibility that describes special host OS requirements of containerized application.
* The compatibility should describe container requirements for Linux, illumos, FreeBSD and Windows.

Contributor Author

@cpuguy83 would you be interested in this working group? I'm wondering if there is a need for image compatibility on Windows beyond the os.version and os.features fields. Maybe someone at Microsoft would be interested in this?

These are usually described in the release notes or installation guides.
We believe we have identified the missing puzzle in the OCI standard, the image compatibility, which is a key feature that describes the requirements of special containerized applications that have hard requirements for specific kernel versions, configurations, modules, or out-of-tree drivers.

For example, a container application free5GC UPF requiring the gtp5g kernel module will only be compatible with kernel 5.0.0-23-generic or 5.4.x due to the module's hard kernel version requirements - https://free5gc.org/guide/3-install-free5gc/.

This is not correct. The application depends on the feature, not on a specific kernel version. While the feature may have first appeared in some arbitrary kernel version, that does not imply that the feature is absent from older downstream kernel versions. Feature backports happen on a day-to-day basis in the teams that build distributions.

Contributor Author

First, let me clarify that setting kernel version will not be mandatory. You’re right that trying to describe compatible kernel versions (upstream/downstream) is a rabbit hole. Generally speaking, compatibility should be described based on features.

However, if the author of an out-of-tree module, for whatever reason, states that the module is compatible only with a set of kernel versions, then a container that requires the module must follow that constraint.

This kind of situation should not be common, but when it happens we have to make it possible to reflect it in the compatibility schema.
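
A small sketch of what "features first, kernel version optional" could look like in a checker. The function name, the feature strings, and the use of glob patterns for kernel versions are assumptions made for this example and do not come from any proposed schema. The version patterns mirror the gtp5g constraint quoted in the proposal, with `5.4.*` as a crude stand-in for "5.4.x".

```python
from fnmatch import fnmatch
from typing import Iterable, List, Set


def unmet(required_features: Iterable[str],
          host_features: Set[str],
          host_kernel: str,
          kernel_patterns: Iterable[str] = ()) -> List[str]:
    """List unmet requirements; an empty list means the host is compatible.

    Features are always checked. Kernel version patterns are optional and
    are applied only when the image author supplied some.
    """
    problems = [f"missing feature: {f}" for f in required_features
                if f not in host_features]
    patterns = list(kernel_patterns)
    if patterns and not any(fnmatch(host_kernel, p) for p in patterns):
        problems.append(f"kernel {host_kernel} matches none of {patterns}")
    return problems


# Constraint taken from the free5GC example in the proposal.
gtp5g_kernels = ["5.0.0-23-generic", "5.4.*"]
host = {"module:gtp5g"}  # hypothetical feature naming

ok = unmet(["module:gtp5g"], host, "5.4.0-42-generic", gtp5g_kernels)
too_new = unmet(["module:gtp5g"], host, "6.2.0", gtp5g_kernels)
feature_only = unmet(["module:gtp5g"], host, "6.2.0")
```

Leaving `kernel_patterns` empty gives the purely feature-based behaviour, so a distribution that backports the feature to a different kernel version is not excluded.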

So this is about for example a specific out-of-tree kernel driver/feature that must be present, or else the workload will not work? And not the more generic case of having a fallback solution/code path should the specific driver/feature not be present (I'm thinking along the lines of all the dozens of checks Linux kernel RAID code does on startup to select the speediest method of operation)?

Contributor Author

> So this is about for example a specific out-of-tree kernel driver/feature that must be present, or else the workload will not work?

Yes. That's the general idea. However, within the working group we will have to discuss what makes sense and what does not.


An image compatibility would help container image authors describe compatibility requirements in a standard way.
The specification will be uploaded with the image to the image registry.
This makes hard container compatibility requirements discoverable, programmable, and will support different consumers and cover use cases where the application requires a specific compatible environment.

This appears to follow the idea that exists outside the container world today. Where ISV X certifies their application on a distribution Y, or multiple distributions U, V, and W; runs tests against all those distros and then declares kernel-A, kernel-B, and kernel-C versions are supported. Effectively cutting out all other distributions. Recreating such a "restrictive" environment via a specification may not be in the best interest of the ecosystem as a whole.

Said differently, if I provide an application container and the only means I have to specify requirements are via versions, i.e. kernel-6.2, then there is a very good chance that at least for a period of time I may be cut out of certain platforms until the hosts of those platforms, thinking of EKS, GKE, AKS for example, catch up to my requirement.

On the contrary, if I can specify a feature or interface requirement in some way, shape or form, then I may still be cut out for some time, but I as the container vendor am not forcing the host environment to do a wholesale upgrade of the kernel, in this example. A wholesale upgrade may be more painful than a feature backport such as a specific network driver.

As such the question presents itself if this is not an opportunity to work at the very least with the Linux kernel community on a feature definition system rather than depending on arbitrary version numbers.

Contributor Author

@mfranczy mfranczy Sep 22, 2023

> This appears to follow the idea that exists outside the container world today. Where ISV X certifies their application on a distribution Y, or multiple distributions U, V, and W; runs tests against all those distros and then declares kernel-A, kernel-B, and kernel-C versions are supported. Effectively cutting out all other distributions. Recreating such a "restrictive" environment via a specification may not be in the best interest of the ecosystem as a whole.
> Said differently, if I provide an application container and the only means I have to specify requirements are via versions, i.e. kernel-6.2, then there is a very good chance that at least for a period of time I may be cut out of certain platforms until the hosts of those platforms, thinking of EKS, GKE, AKS for example, catch up to my requirement.

I think this is a slightly different use case. It’s just about expressing minimal mandatory requirements for containers in terms of kernel configuration/features that are crucial for containers to run on the host. It’s not about restricting containers to run in the environment in which they were tested, or creating some kind of “certification” chain for different distributions or kernel versions.

> On the contrary if I can specify a feature or interface requirement in some way shape or form then I may still be cut out for some time, but I as the container vendor am not forcing the host environment to do a wholesale upgrade of the kernel, in this example. A wholesale upgrade may be more painful than a feature backport such as a specific network driver.

If a container vendor requires specific configuration or kernel modules that are critical to running a containerized application, then the environment provider has no choice but to meet those requirements if they want the container to run there.

Such a requirement could be to enable KVM modules on the host for containers that run a VM inside.
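
A minimal sketch of how a consumer might check such a module requirement on a Linux host, assuming the text of `/proc/modules` is passed in as a string. The function names and the sample content are illustrative. Modules compiled into the kernel do not appear in `/proc/modules`, so a real check would need an additional source for those.

```python
from typing import Iterable, List, Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Parse /proc/modules content; the first column is the module name."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(required: Iterable[str], proc_modules_text: str) -> List[str]:
    """Return the required modules that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [m for m in required if m not in loaded]


# Sample content in /proc/modules format; values are illustrative.
sample = (
    "kvm_intel 368640 0 - Live 0x0000000000000000\n"
    "kvm 1028096 1 kvm_intel, Live 0x0000000000000000\n"
)
missing = missing_modules(["kvm", "kvm_intel", "vhost_net"], sample)
```

On a real host the input would come from reading `/proc/modules`; taking the text as a parameter keeps the sketch testable anywhere.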

> As such the question presents itself if this is not an opportunity to work at the very least with the Linux kernel community on a feature definition system

The work with the kernel community could be done in parallel, but IMO it's a much more complex topic that would take a lot more time than introducing an optional image compatibility schema. We can already express the features through configuration, boot args, etc.
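
To illustrate expressing features through configuration and boot args, here is a sketch that compares a requirement against kernel config text (in the format of files such as `/boot/config-<release>`) and a kernel command line (as found in `/proc/cmdline`). The function names, the requirement layout, and the sample values are hypothetical. Splitting the command line on whitespace ignores quoted arguments, which is a simplification.

```python
from typing import Dict, Iterable, List


def parse_kernel_config(text: str) -> Dict[str, str]:
    """Parse kernel config text into a dict of option -> value."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines are treated as absent
        key, _, value = line.partition("=")
        options[key] = value
    return options


def check_host(config_text: str, cmdline_text: str,
               want_config: Dict[str, str], want_args: Iterable[str]) -> List[str]:
    """Return human-readable mismatches between host state and requirements."""
    config = parse_kernel_config(config_text)
    args = set(cmdline_text.split())
    problems = [f"{k}: want {v}, have {config.get(k, 'not set')}"
                for k, v in want_config.items() if config.get(k) != v]
    problems += [f"boot arg missing: {a}" for a in want_args if a not in args]
    return problems


# Illustrative host state and requirements.
host_config = "CONFIG_KVM=m\n# CONFIG_VFIO is not set\n"
host_cmdline = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on"
problems = check_host(host_config, host_cmdline,
                      {"CONFIG_KVM": "m", "CONFIG_VFIO": "y"},
                      ["intel_iommu=on", "hugepages=16"])
```

Nothing in this check refers to a kernel version, so a backported feature that shows up in the host configuration would satisfy it.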

> rather than depending on arbitrary version numbers.

Kernel version will not be a mandatory field to set, but may apply to containers requiring modules that have very strict restrictions on kernel version.

I get the impression that you think we want to make the compatibility spec a hard requirement for images, or completely change the way containers are launched. We don’t. We simply want to give image authors a chance to describe the critical requirements (if any exist) that containerized applications need in order to run on the host. What container consumers do with this information will be up to them.

Contributor Author

@rjschwei would you like to join the working group?

## Purpose

In mission-critical industries, applications often require special features provided by host operating systems.
These are usually described in the release notes or installation guides.

Has Node Feature Discovery been investigated for this particular issue? It provides detection of hardware features available on nodes (https://kubernetes-sigs.github.io/node-feature-discovery/v0.14/get-started/introduction.html) and seems to be suited to this problem statement. Device plugins are another possible solution whose applicability should be investigated.

Contributor Author

NFD describes the node side. We want to provide a possibility to describe container requirements, the container side.

NFD also mentioned here https://docs.google.com/document/d/1lzwh8DGMu5vXXHwJmnewYIMffkcOEvH8owX4UYjRcw0/edit?disco=AAAA5ncnzqM

Image compatibility is meant to let the image author describe, from the container application's perspective, what the application specifically needs from the host OS, not which features exist on the node. For example, free5GC UPF (https://free5gc.org/guide/3-install-free5gc/) requires the gtp5g kernel module and specific kernel versions to run normally.

Yes, there may be some overlap with NFD reporting in concepts (or objects). For the k8s use case, NFD and image compatibility could be combined to make appropriate scheduling decisions if needed.

But image compatibility is not only for Kubernetes use cases. Whether some fields in NFD are generic enough to be reused in image compatibility can be discussed in the working group.

Kata Containers use cases are the best example: https://github.com/kata-containers/kata-containers/tree/main/docs/use-cases.


The described incompatibility issue applies to all use cases where specific host configuration is required, from bare-metal (e.g. high-performance computing) to distributed systems.

> from bare-metal (e.g. high-performance computing) to distributed systems.

How about updating it to "from bare-metal (e.g. high-performance computing) to distributed systems, from non-kubernetes to kubernetes orchestrated applications."?

the image compatibility can be used in various non-kubernetes involved use cases too.

Contributor

> the image compatibility can be used in various non-kubernetes involved use cases too.

That's a very good point, although it does increase the scope of the problem discussion quite a bit.

Contributor Author

I think it's worth it. I will update the PR. This may be a long-term goal; in the initial phase we can narrow the scope.

@mfranczy mfranczy marked this pull request as ready for review October 10, 2023 09:14
@mfranczy
Contributor Author

mfranczy commented Oct 10, 2023

@opencontainers/tob I kindly ask you to vote. Please express your opinion or approve the working group.

(see vote in #128 (comment))

@mfranczy
Contributor Author

I added an additional owner and stakeholder. The PR is finished; please continue voting.

Contributor

@sudo-bmitch sudo-bmitch left a comment

I have questions but I think they are best resolved in the working group itself and not the proposal. Based on the scope, variety of owners, and stakeholders, this LGTM.

@neersighted

I have a combination of concerns/thoughts on the scope and ambition of such an effort. However, they're probably best for actual technical and design discussions and not for here.

If it's not too late, I'd be happy to be a stakeholder for a major runtime.

@mfranczy
Contributor Author

> I have a combination of concerns/thoughts on the scope and ambition of such an effort. However, they're probably best for actual technical and design discussions and not for here.
>
> If it's not too late, I'd be happy to be a stakeholder for a major runtime.

It's not too late. I added you to the working group.

Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>

Contributor

@estesp estesp left a comment

LGTM. While I don't have the bandwidth to commit to being a stakeholder in this WG, I'm very interested in this space per my initial involvement with "manifest lists" originally in Docker v2 image format and multi-platform support. I will try and follow along and provide feedback when possible even if I can't fully participate.

Contributor

@jonjohnsonjr jonjohnsonjr left a comment

LGTM.

@samuelkarp
Member

samuelkarp commented Oct 23, 2023

Moving the TOB vote to a comment that can only be edited by the TOB 😂

2/3 vote is required, so 6/9 TOB members.

@cyphar
Member

cyphar commented Oct 23, 2023

And with that, the motion passes. 🎉

@samuelkarp samuelkarp merged commit e935c85 into opencontainers:main Oct 23, 2023
1 check passed
@mfranczy
Contributor Author

FYI for those who are not subscribed to the OCI dev mailing list and are not on Slack but are watching this PR:
I created the #wg-image-compatibility channel on opencontainers.slack.com.
