RFE: Re-think Kata Containers releases #9064
So my two pence: I've not led a release (I was signed up to shadow one, but there haven't been many to go around), but the thing that intimidates me about it is the process of confirming all the things that need to be backported, backporting them and getting the CI passing (which, in fairness, should be much easier without the separate tests repo and after the GHA changes). So I can see clear benefits in this approach. I don't have enough information to know whether any users of the stable release would be impacted, but given that they haven't had anything to use since Oct 23rd, I don't think I've heard any complaints at AC meetings asking for more, or any offers to get involved in the release. I would like to hear more from those users, though. I personally have some concerns about our CI (our nightly hasn't passed for 5 months, but I think that's mostly due to non-required jobs failing), but it feels like any resource diverted from backporting to maintaining and fixing the CI might be a net positive for the project, and I strongly agree that we should have the goal to maintain the … I also think the vPTG is the place to discuss this further, and we've already put it on the agenda.
Thanks @fidencio for writing up the proposal. As @stevenhorsman mentioned the vPTG, I added the link to this issue to the planning etherpad: https://etherpad.opendev.org/p/kata-ptg-planning-april-2024
I would love to see more frequent releases. We're adding so many new features per month that we should make them part of a release sooner rather than later. Otherwise, downstream projects will carry the features they need themselves anyway, because those features were not released. Also, I agree with Steve about the state of the CI.
Having led several stable releases, I agree with @fidencio that the backport effort to the stable branch is a huge burden. It requires close inspection of a lot of PRs and manual backporting of the ones that are considered worthy. It is error prone and often takes several rounds to complete. @zvonkok, we can certainly do rolling releases on main, but that would not match the current definition of stable, since the branch would get new features, not just fixes. I don't think we'd be able to release at any point in time. Another option, currently being tried out by https://github.com/openshift/sandboxed-containers-operator/ , is a so-called "single stream" release process where the project has two branches:
When releasing a major or minor version, i.e. when the content of the development branch matches what was planned for the release, it is merged into the release branch and the release branch is tagged. If a hot fix is needed between two releases, it can be fixed in the development branch and applied to the release branch. A fix release can then be cut. With this model, the project only has a single version to support at any given time and is able to release a hot fix quickly.
Well, I am only concerned about another … Our goal should be that our main is stable and rock solid. Then it does not matter whether we do point releases or rolling releases.
Based on that, we could have additional "stable" releases in a separate release branch, if we have the manpower to do that.
@gkurz, I sincerely fail to see how your suggestion differs much from what we're doing; it'd still require backports and people working on both fronts, which is exactly the problem we currently have. Let me expand a little bit here. The main problem we have is getting someone to commit to actually take the bullet and cut the release. With a stable branch and a development branch (which, from my point of view, is not much different from what we have), we'd still need someone to adapt tests and similar things in case a hot fix is needed on the stable branch, and that commitment continues to be one of the biggest problems, which would not be addressed.
Another thing I'm suggesting, from the original proposal, is to NOT have a stable branch:
This reduces the backport load to only hot fixes, e.g. mostly CVEs, where we want to release right away without risking any interference coming from the current development effort. Non-hot fixes would come with the next usual release, just like the rest.
The release branch isn't exactly a stable branch. No development happens there and it should ideally only be a collection of merges from the development branch. It doesn't require commitment from anyone to track all changes in the development branch and decide whether they are valuable fixes to be released. It just leaves an opportunity to ship a critical fix if the need arises.
It requires the commitment of backporting test changes to cover the proposed fix, which may or may not be a big chunk of work depending on how much the CI changes. I'd not be willing to change what we have to add something that has a chance (and not a small one) of having exactly the same issues we currently have. There's also the commitment to test both branches and make sure things don't rot in the stable branch. That's also something we currently struggle to do, and I don't see how your proposal would improve it.
I'm not sure that's going to be true. The dev branch is always going to be ahead of the release branch (or, very briefly, at the same point as it). But that means we have two options for handling legacy environments:
Here's my take: let's first define "what is our stable API" and "how quickly do we promise to act on CVEs". CVEs: if we say CVEs will be patched within 1 month and we do monthly releases, then there is no problem: we ship CVE fixes with the next release. If we find that monthly releases are a bit too much and we switch to releasing every 2 months, then doing what @gkurz proposes makes sense (leave open the possibility of a security release based on the latest release). Stable API: I think rolling, time-based releases make the most sense for a non-library project like Kata. But we need to clearly define expectations for users/downstreams. Is the stable API the CRI interface or the Kata config file (I prefer including the Kata config in the stability promise)? "Stable" != "no new features/fields": it just means we don't break existing configurations, and we handle additional features by having sane defaults. How much work is a release currently, and how much work could it be with automation? Are there artifacts that need to be published to destinations that can't be automated?
Let me touch on one specific part here:
Currently the release is partially automated, but backports to the stable branch are not. So, on every release someone has to go through what's been merged in the last 4 weeks and do the backports, assuming the release schedule works as planned (which is not the case). With what I'm proposing, the release will be done entirely by a manually triggered GitHub action: apart from "clicking a button", nothing else is needed from the maintainer.
Hi @fidencio, I have some questions about this proposal.
@studychao, exactly. And the reason I'm proposing this is that we're really not maintaining the stable release branch at all. Right now it's just an unmaintained burden for releases, which IMHO gives the wrong impression to the consumers of those branches.
Hi @fidencio, first of all, I strongly support ending maintenance of stable versions, which will save us a lot of bug fix backporting. As an open source community product, it is reasonable for downstreams to be responsible for stable versions. But if versions are named after the year and month, unlike the previous major version numbers such as 1.x, 2.x, 3.x, won't it be harder to tell which major features each major version brings? So, can we keep the previous version naming? Of course, this will indeed make automatic publishing more difficult.
Thanks, I think that is a good idea, since it relieves us of the burden of maintaining a stable branch.
So, a few things to mention about this.
With all that said, keeping our current version numbering is still doable, and dropping the stable branch alone is still a big improvement, so I think we can make this work. WDYT?
Hi @fidencio, if we use time-based version numbers for releases, will the 4.0 version that we are all anticipating or planning now cease to exist? If so, do we need to plan at the beginning of each year which major features will be released in the coming year?
No, that's not accurate. We'd do time-based releases bumping the minor version until we decide that we need a major one. @lifupan, does that sound reasonable?
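Under the rule described above, each time-based release bumps MINOR, and MAJOR is only bumped when the project decides so. A minimal sketch of that rule, assuming plain `MAJOR.MINOR.PATCH` tags with no prefix or pre-release suffix; `next_release` is a hypothetical helper, not part of the Kata tooling:

```python
def next_release(current: str, major: bool = False) -> str:
    """Return the next tag under a "bump MINOR on every release" policy.

    Hypothetical helper: assumes `current` is a plain MAJOR.MINOR.PATCH string.
    """
    major_n, minor_n, _patch_n = (int(part) for part in current.split("."))
    if major:
        # A deliberate major release resets MINOR and PATCH.
        return f"{major_n + 1}.0.0"
    # Regular time-based release: bump MINOR, reset PATCH.
    return f"{major_n}.{minor_n + 1}.0"


print(next_release("3.2.0"))
print(next_release("3.10.0", major=True))
```

Parsing each field as an integer keeps `3.9.0` followed by `3.10.0` rather than anything string-based.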
Hi all! First of all, thanks @fidencio for bringing up this topic. My personal opinion, not necessarily my employer's: I agree with the proposal of dropping the stable branch, as we are clearly over-committed. An automation to backport pull requests was implemented, but it wasn't broadly advertised and isn't maintained. Even if we fix that automation, I still believe that downstream consumers should keep stable branches (except for CVEs, which we own). I wanted to go back to @jepio's comment on the stable API, related to @lifupan's point on time-based versioning: inevitably, at some point we will have to break backward compatibility (API and configuration files). How do we properly advertise those breakages with time-based versioning?
This is a very good question, and the answer is orthogonal to whether we make this change or not.
Just a summary from the AC meeting on February 20th. Copying and pasting from the etherpad:
Yes, agreed. And it seems we will stick with semantic versioning, so breakages should be discussed on another occasion.
I agree that dropping the stable branch and having a consistent release cycle would make things easier for downstreams. Following up on @jepio's question: how exactly do we address hotfixes/CVEs with this approach? I think we'd still want to branch out of main in those cases, otherwise we'd end up releasing more changes than just the fixes.
I think the proposal is that we would release more changes than just the fix (hence no more patch releases, only minor ones), with the thought being that if our CI is good enough (which is a big if), there shouldn't be extra risk with a maximum of ~24 days of code going into the release. Based on the discussion from the AC call, my understanding is that under this proposal, if people want only CVE fixes on top of a previous release, they'd have to fork it downstream and handle that themselves.
The automated release workflow starts with the creation of the release in GitHub. This is followed by the build and upload of the various artifacts, which can take a very long time (hours). During this period, the release appears to be fully available in https://github.com/kata-containers/kata-containers/ even though it lacks all the artifacts. This might be confusing for users or automation consuming the release. Create the release as draft and clear the draft flag when all jobs are done. This ensures that the release will only be tagged and made public when it is fully usable. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>
The automated release workflow starts with the creation of the release in GitHub. This is followed by the build and upload of the various artifacts, which can take a very long time (hours). During this period, the release appears to be fully available in https://github.com/kata-containers/kata-containers/ even though it lacks all the artifacts. This might be confusing for users or automation consuming the release. Create the release as draft and clear the draft flag when all jobs are done. This ensures that the release will only be tagged and made public when it is fully usable. If some job fails, the workflow will get canceled and the draft release is left behind. This draft release will be ignored if the workflow is restarted and a new one will be created. The leaked release has no impact apart from polluting the release page at [1]. Until we find a way to have it deleted automatically, this will need to be done by hand. [1] https://github.com/kata-containers/kata-containers/releases Fixes kata-containers#9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>
Now that the version is an invariant for the entire workflow, it isn't required to obtain it with an environment variable. Just rely on the content of the `VERSION` file like other actions. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>
For a prettier rendering in the web UI. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>
The automated release workflow starts with the creation of the release in GitHub. This is followed by the build and upload of the various artifacts, which can take a very long time (hours). During this period, the release appears to be fully available in https://github.com/kata-containers/kata-containers/ even though it lacks all the artifacts. This might be confusing for users or automation consuming the release. Create the release as draft and clear the draft flag when all jobs are done. This ensures that the release will only be tagged and made public when it is fully usable. If some job fails because of a network timeout or any other transient error, the correct action is to restart the failed jobs until they eventually all succeed. This is by far the quickest path to complete the release process. If the workflow is *canceled* for some reason, the draft release is left behind. A new run of the workflow will create a brand new draft release with the same name (not an issue with GitHub). The draft release from the previous run should be manually deleted. This step won't be automated as it looks safer to leave the decision to a human. [1] https://github.com/kata-containers/kata-containers/releases Fixes kata-containers#9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>
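The draft-then-publish ordering described in this commit can be modelled in a few lines. This is an in-memory sketch only: `Release`, `run_release`, the `registry` list, the `3.3.0` tag and the artifact name are hypothetical stand-ins, not the actual workflow or the GitHub API. It shows that users only see a release once every artifact is attached, and that a canceled run leaves a stale draft with the same name behind for a human to delete.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Release:
    tag: str
    draft: bool = True  # releases start as drafts, hidden from users
    assets: List[str] = field(default_factory=list)


def run_release(registry: List[Release], tag: str, artifacts: List[str],
                build: Callable[[str], str]) -> Release:
    """Create a draft, attach every artifact, then clear the draft flag."""
    release = Release(tag)
    registry.append(release)
    for name in artifacts:
        # If build() raises (e.g. the run is canceled), the draft stays behind.
        release.assets.append(build(name))
    release.draft = False  # only now does the release become public
    return release


def public(registry: List[Release]) -> List[Release]:
    return [r for r in registry if not r.draft]


def stale_drafts(registry: List[Release]) -> List[Release]:
    return [r for r in registry if r.draft]


def canceled_build(name: str) -> str:
    raise RuntimeError(f"run canceled while building {name}")


registry: List[Release] = []
try:
    run_release(registry, "3.3.0", ["artifact.tar.xz"], canceled_build)
except RuntimeError:
    pass  # first run canceled: nothing public, one draft left behind

# A new run creates a brand new draft with the same name and publishes it.
run_release(registry, "3.3.0", ["artifact.tar.xz"], lambda name: name)
print(len(public(registry)), len(stale_drafts(registry)))
```

The stale draft from the first run is deliberately left in `registry`, mirroring the commit's choice to leave its deletion to a human.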
This doesn't make sense anymore with the re-design of the release process described in kata-containers#9064. Remove this file and all its references in the repo. Fixes kata-containers#9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>
Most of the content of `docs/Stable-Branch-Strategy.md` was de facto deprecated by the re-design of the release process described in kata-containers#9064. Remove this file and all its references in the repo. The `## Versioning` section has some useful information though. It is moved to `docs/Release-Process.md`. The documentation of the `PATCH` field is adapted to the new workflow. Fixes kata-containers#9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>
This release is a special case, as we've slacked for 6 months and the release content is way too long ... long enough to exceed the allowed limit for the release notes. With this in mind we'll just remove the `--generate-notes` for now, and then revert this commit as soon as the release is out, as releases should be happening every month and, ideally, we won't ever reach this situation again. Fixes: kata-containers#9064 - part V Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 0fa59ff, as now we'll be able to use the `--generate-notes`, hopefully, without blowing the allowed limit. Fixes: kata-containers#9064 - part VI Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
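For illustration only, assuming the workflow drives the release through the GitHub CLI: `gh release create` accepts `--draft` and `--generate-notes`, and `gh release edit <tag> --draft=false` publishes a draft. The helper names, the tag and the title format below are hypothetical; the sketch only shows where the `--generate-notes` toggle from these two commits fits into the command line.

```python
import shlex
from typing import List


def create_cmd(tag: str, generate_notes: bool = True) -> List[str]:
    """Build a `gh release create` invocation for a draft release (hypothetical helper)."""
    cmd = ["gh", "release", "create", tag, "--draft", "--title", f"Kata Containers {tag}"]
    if generate_notes:
        # Dropped for the oversized catch-up release, restored by the revert.
        cmd.append("--generate-notes")
    return cmd


def publish_cmd(tag: str) -> List[str]:
    """Build the `gh release edit` invocation that clears the draft flag."""
    return ["gh", "release", "edit", tag, "--draft=false"]


# Print shell-quoted command lines instead of running them.
print(shlex.join(create_cmd("3.3.0", generate_notes=False)))
print(shlex.join(publish_cmd("3.3.0")))
```

Keeping the commands as argument lists means they could be passed to `subprocess.run` without shell quoting issues.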
Folks,
I'm taking the time to write down something that's been on my mind for a while, and I'd really like to gather feedback from the @kata-containers/architecture-committee, @kata-containers/kata-containers-maintainer, and users of our project.
Kata Containers has had, in the past, the commitment to do one release per month on each maintained branch, whether an alpha or a stable release. In the past we used to maintain 2 stable branches; currently we maintain only one.
However, the last release we had was 5 months ago, and the last commit backported to the stable branch is from 3 months ago. This is not because we haven't had content to release; quite the opposite. We had content, but we didn't have the people power to actually commit to backporting patches and cutting a release.
With this said, and knowing that historically we've struggled to get our releases out of the door, I'd like to propose a few things here:
It's important to say that if a CVE needs to be fixed, we'd be able to cut a YYYY.MM.n+1 release at any moment.
For instance, if a CVE shows up in January 2025, after our release, we'd be able to cut a new release, 2025.01.1, as soon as the patches land on the main branch.
Theoretically, this can help us focus on running one simple script, which would take whatever we have passing CI and consider it a release. This would also eliminate the burden of maintaining stable releases (which we currently struggle quite hard to do anyway), and it'd also greatly simplify all the tools around the project (such as kata-deploy and other installers).
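The YYYY.MM.n scheme proposed here can be sketched as follows, assuming the first release of a month uses n = 0 and months are zero-padded (as in 2025.01.1); `next_calver` is a hypothetical helper. The discussion on this issue leaned towards keeping semantic versioning, so this only illustrates the original proposal.

```python
from datetime import date
from typing import Optional


def next_calver(last_tag: Optional[str], today: date) -> str:
    """Next tag under the proposed YYYY.MM.n scheme (hypothetical helper)."""
    if last_tag is not None:
        year, month, n = (int(part) for part in last_tag.split("."))
        if (year, month) == (today.year, today.month):
            # Already released this month, e.g. a CVE fix: bump n.
            return f"{year}.{month:02d}.{n + 1}"
    # First release of the month.
    return f"{today.year}.{today.month:02d}.0"


print(next_calver("2024.12.0", date(2025, 1, 15)))  # 2025.01.0
print(next_calver("2025.01.0", date(2025, 1, 20)))  # 2025.01.1
```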
Please, I'd like to hear from everyone who's interested in the topic, and we can have a final discussion about this during the vPTG, but the sooner we start discussing this the better.
It's also important to note that what I have in mind for the release process is a manually triggered GitHub action that would automagically do the release for us, including tagging and everything else that's done manually today.