[PROPOSAL] Introduce in automatically generated release notes #2677

Closed
ZilongX opened this issue Oct 27, 2022 · 20 comments · Fixed by #2732

Comments

@ZilongX
Collaborator

ZilongX commented Oct 27, 2022

Proposal in one sentence

Enable automated release notes generation for the OpenSearch-Dashboards repo

What problem are you trying to solve?

Currently, both the Release Notes and the CHANGELOG are updated mostly by hand.

These manual updates add an unnecessary burden to every single release and PR, which does not serve the best interests of the OSD project or the wider community.
We need to improve our current process and make sharing release/change information smoother and less painful for both our consumers and contributors.

Why should we solve it?

Release Notes / CHANGELOG are important. We definitely need a clear and concise way of telling people what has changed, when the changes happened, and more details (like a PR link) whenever needed. Yet this information should be collected in a much smarter, or in other words, automated way. The current manual process should be revised ASAP, for specific reasons including:

  • The current manual way is Time-consuming

  • The current manual way is prone to Human Errors

    • We are all human, and humans make errors. While we stay inclusive when human errors happen, it's better to find an automated way to minimize the chance of them happening. I recently learned this story from @ashwin-pc and would like to share it here as an example of how we could build and maintain a better community as a whole:
      • For this PR, Ability to start OS Dashboards with a newer and compatible nodejs version #2091, our contributor had all the changes ready, but an entry in the CHANGELOG was required later. The contributor made that change but didn't sign off on that commit (we usually ask for all commits to be signed off). We decided this was OK since the squash commit would have the sign-off.
      • Digging deeper, though: why can't we just let each contributor stay focused on their actual changes and leave the maintenance work of the changelog to us (maintainers, release managers and, uhh, enthusiastic contributors)?

How do you propose to solve it?

One word - Automation

Specifically, enable the automation of release notes generation, which is a native GitHub offering (ref doc: https://docs.github.com/en/repositories/releasing-projects-on-github/automatically-generated-release-notes).

Kudos to @seraphjiang, who first proposed and implemented this feature in the dashboards-anywhere repo: opensearch-project/dashboards-anywhere#90

Here are the proposed improvement steps in detail:

  • Create a new file .github/release.yml to enable the automation for release notes generation
  • Define and update the label template in .github/release.yml (we can reuse the format in our current CHANGELOG file)
  • Create / update label suites if needed
  • Test the updates (in a fork) by creating test releases and making sure release notes are generated as expected

Once the automation is tuned as expected, the proposed next steps are:

  • Remove the CHANGELOG update requirement in each individual PR
  • Update the CHANGELOG based on the automatically generated release notes
  • Release Notes preparation is still a mandatory step in the release process, yet this time it would fully depend on the automation process instead of manual efforts.
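
To make the label template idea concrete, here is a minimal Python sketch of how a `.github/release.yml`-style configuration could group merged PRs into sections. It is a simplified model, not GitHub's implementation: the category titles, label names, and PR fields are hypothetical, and it assumes a PR lands in the first category whose labels match, with `*` acting as a catch-all.

```python
from collections import OrderedDict

# Hypothetical configuration mirroring the shape of .github/release.yml.
CONFIG = {
    "exclude": {"labels": ["ignore-for-release"], "authors": ["octocat"]},
    "categories": [
        {"title": "Features", "labels": ["enhancement"]},
        {"title": "Bug Fixes", "labels": ["bug"]},
        {"title": "Other Changes", "labels": ["*"]},
    ],
}


def group_pull_requests(pull_requests, config):
    """Group PRs into release-note sections by label (simplified model)."""
    excluded_labels = set(config["exclude"]["labels"])
    excluded_authors = set(config["exclude"]["authors"])
    sections = OrderedDict((c["title"], []) for c in config["categories"])

    for pr in pull_requests:
        labels = set(pr["labels"])
        if labels & excluded_labels or pr["author"] in excluded_authors:
            continue  # excluded PRs never appear in the notes
        for category in config["categories"]:
            wanted = set(category["labels"])
            if "*" in wanted or labels & wanted:
                sections[category["title"]].append(
                    f"* {pr['title']} by @{pr['author']} in #{pr['number']}"
                )
                break  # assumption: first matching category wins
    return sections


def render(sections):
    """Render the non-empty sections as plain-text release notes."""
    lines = []
    for title, entries in sections.items():
        if entries:
            lines.append(f"### {title}")
            lines.extend(entries)
    return "\n".join(lines)
```

In this model the quality of the output depends entirely on PR titles and labels, which is why the labeling step matters before the CHANGELOG requirement is dropped.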
@ashwin-pc
Member

Having seen it used in the other repos, I can attest that this is much better! Thanks @seraphjiang and @ZilongX. But since I've never actually had to generate release notes or a changelog for an OSD release, I'd love to hear from @ananzh and others who have actually gone through the process of generating one, to see if this meets the requirements. And if not, how can we adapt it so that it does?

@mihirsoni
Contributor

To make it super productive, we need to ensure that we have a very high bar for commit messages when we merge PRs, and ensure that we use squash merge to reduce noisy commits. In general, these automated generators take the commit messages.

@seraphjiang
Member

To make it super productive, we need to ensure that we have a very high bar for commit messages when we merge PRs, and ensure that we use squash merge to reduce noisy commits. In general, these automated generators take the commit messages.

Agreed, we need to ensure a high bar for commit messages and ensure we use squash merge.
From my observation, the automated generator takes its entries from the PR (commit).

The effort to ensure a high-quality commit message for a PR and a high-quality changelog message is the same. To save effort, we should focus on ensuring a high bar for commit messages anyway.

@seraphjiang
Member

@ZilongX @ashwin-pc @ananzh @AMoo-Miki

I suggest we add release.yml and give it a try in the 2.4.0 release, then compare the auto-generated notes vs. the manual changelog so we know the gap.

@zhongnansu
Member

@ZilongX Great proposal. The changelog today is causing lots of backport PRs to fail, because different PRs touch the same file and, even worse, the same section. Backport automation bots fail a lot. I've had a nightmare working on backport PRs. I like the idea of targeting automation. The key seems to be correctly labeling PRs and, as mentioned, having a high bar for commit messages. Again, we can't rely on humans; we should rely on automation or checks in CI to prevent "bad" PRs (poorly labeled, or with poor commit messages). I wonder if you have explored any existing solutions, or GitHub workflows/actions.

Also, I think we still have these 2 open issues discussing release automation. They talk about using https://github.com/release-drafter/release-drafter. It seems pretty similar to GitHub's native release notes automation. I wonder if you have thought about using that, and what the pros and cons are.

Just noticed that in the next major plan for release-drafter, they have plans to onboard a "Pull Request Label Checker". release-drafter/release-drafter#1173

@kristenTian
Contributor

Thanks Zilong. Resolving conflicts on the changelog has really been painful while it is manual work, and people also need to request approval again X times, based on the X times they need to rebase.

I would also like to try this new approach, and if it is not sufficient for our use case we can migrate to other tools like the one @zhongnansu mentioned. In that case, given that these tools only affect newly created PRs, even the migration should be backward compatible.

@kavilla
Member

kavilla commented Oct 31, 2022

Thanks for creating this issue @ZilongX,

I think there was discussion in another repo but I am not able to find it so I will recapture some of the comments I posted there.

Historically, any automation was still really painful for the release owner because the commits were not descriptive enough. As @mihirsoni stated, we should have a high bar for commit messages, and at that point we are still requiring contributors to ensure they are doing things properly. Which I don't think is a bad thing.

in real we are just shifting the burden from release managers to each individual PR contributors

I think this should be the case, in an ideal state there should be no release managers. Release managers are required because releasing is still a very manual process but the OpenSearch project is a shared governance model. Not to say that this is a reason we should keep the CHANGELOG but I wouldn't consider it a bad thing if we raise the bar for commits (ie increase the burden on contributors).

Another thing that has historically made this painful is that our release notes purposefully did not include commits that were already released. We can revisit this if it is too hard, but how do we plan on addressing it? Will a single person now have to generate the release notes and revisit the previous releases to ensure that there is no duplication? That seems to put us back to making release notes generation a blocker that costs the person in charge of the release a couple of hours, instead of requiring contributors or maintainers to handle a conflict.

The conflicts seem bad right now, but I think that's because the CHANGELOG was really not intended for feature development. A lot of the conflicts I have seen have been due to the increased speed of contributions into the repo, which makes me want to reflect on whether it was a good idea to keep the original merge for the feature even though it had a lot of fast follows and bug fixes prior to an official release. If a feature required an increase in commits that then caused multiple updates to the CHANGELOG daily, then reverting the original commit that caused this influx would also have relieved this pain point. There also would not have been any backport PRs required for MD, or multiple reviewers, if we had reverted the initial commit. That is in the past, but I would be hesitant to delete the CHANGELOG based on issues that were probably caused by us not following best practices.

The CHANGELOG verifier action is being updated to be able to skip files. If another feature development effort comes along in the future, another proposal is to just skip the changelog check when the feature hasn't even been released. In reality, for the MD project we could have just added a single bullet point for MD and then kept adding PRs to that single bullet point.

One other point is that the CHANGELOG stuff is an OpenSearch project thing. I can see the benefit of the repos matching the general schema. If I work in the OpenSearch repo to add a feature and then it requires me to add a CHANGELOG and then want to add the corresponding OpenSearch Dashboards repo feature but don't have to add a CHANGELOG I would be slightly confused at the lack of standard for contributor requirements. So perhaps this discussion is better held within the .github repo and picked up by all teams.

But I think the biggest driving factor for why we moved away from automated release notes was the issue of release cadence: knowing when commits get released and avoiding duplication in the release notes, which was a huge blocker for every release, as stated above.

@seraphjiang
Member

seraphjiang commented Nov 1, 2022

@kavilla

One other point is that the CHANGELOG stuff is an OpenSearch project thing.

This is a per-repo thing for now. Only OpenSearch and OpenSearch Dashboards have tried it, and no other repo is adopting it. We have discussed this with @dblock and he agreed to experiment in different repos.

opensearch-project/OpenSearch#4769

But I think the biggest driving factor for why we moved away from automated release notes was the issue of release cadence: knowing when commits get released and avoiding duplication in the release notes, which was a huge blocker for every release, as stated above.

GitHub's release notes are generated from PRs, not commits, and it supports exclude labels to skip PRs that are not needed.

e.g.

changelog:
  exclude:
    labels:
      - ignore-for-release
    authors:
      - octocat

One strong motivation is that the existing CHANGELOG file approach causes backport conflicts almost 100% of the time, even for a simple CVE fix. Maintainers lack the bandwidth to handle them manually, so contributors have to raise separate backport PRs. I'm sure it won't scale with a limited maintainer group.

There has been a lot of discussion. Why don't we give it a try in 2.4.0 and compare it to the changelog created manually, to see what the real gap is? What do we lose by trying?

It also supports a REST API and other ways to integrate with existing automation:
https://docs.github.com/en/rest/releases/releases

In case we see some gaps, we could provide feedback to GitHub:

community/community#5962

This would benefit not only the OpenSearch project, but all GitHub projects.
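
As a rough illustration of the REST integration mentioned above, the sketch below builds (but does not send) a request to GitHub's `generate-notes` endpoint. The endpoint and its `tag_name`, `target_commitish`, `previous_tag_name`, and `configuration_file_path` parameters come from the GitHub REST documentation; the tag names, branch, and token are placeholders chosen for illustration.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_generate_notes_request(owner, repo, tag_name, previous_tag_name,
                                 target_commitish, token):
    """Build a POST request for the generate-notes endpoint without sending it."""
    payload = {
        "tag_name": tag_name,                    # tag the new release will use
        "target_commitish": target_commitish,    # branch or SHA to release from
        "previous_tag_name": previous_tag_name,  # baseline for the comparison
        "configuration_file_path": ".github/release.yml",
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/repos/{owner}/{repo}/releases/generate-notes",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_generate_notes_request(
    owner="opensearch-project",
    repo="OpenSearch-Dashboards",
    tag_name="2.4.0",
    previous_tag_name="2.3.0",
    target_commitish="2.x",
    token="<personal-access-token>",
)
# To actually call the API, pass the request to urllib.request.urlopen() and
# read the "name" and "body" fields from the JSON response.
```

Passing `previous_tag_name` explicitly pins the baseline of the comparison, which is one way to keep PRs from an earlier release out of the generated notes.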

@bandinib-amzn
Member

Let's have discussion in main thread

@seraphjiang
Member

Let's have discussion in main thread

@bandinib-amzn we have reached agreement that this is a per-repo discussion. That thread is only for OpenSearch. We should focus on driving the best approach for the OpenSearch Dashboards repo.

@seraphjiang
Member

@ZilongX @AMoo-Miki @kavilla @ashwin-pc @dblock

Could we try the release notes automation provided by GitHub?
I love automation, but there's no need to reinvent the wheel.

We already tried it on other repos, and the results are elegant. I really want to know what the concerns are about giving it a try, which could be done in 5 minutes.

https://github.com/opensearch-project/dashboards-anywhere/releases/tag/v0.8.0
https://github.com/opensearch-project/opensearch-dashboards-functional-test/releases/tag/v2.4.0-rc2

@ashwin-pc
Member

Could we try the release notes automation provided by GitHub?
I love automation, but there's no need to reinvent the wheel.

We already tried it on other repos, and the results are elegant. I really want to know what the concerns are about giving it a try, which could be done in 5 minutes.

My biggest concern here is how useful the changelog is. (FYI, the one we have now isn't much better, but since it's manual, we can have entries that are very different from the commit or PR titles.) I went through the actual changelog entries and, as a user of the FTR repo, many of those changes don't make sense to me (except for the new contributors section, that's just chef's kiss). After taking a closer look at all the conversations around this issue, I'm not so sure we can wash our hands of this with some automation alone.

@bandinib-amzn we have reached agreement that this is a per-repo discussion. That thread is only for OpenSearch. We should focus on driving the best approach for the OpenSearch Dashboards repo.

@seraphjiang, while that thread is for OpenSearch, we face similar problems. In fact this seems to be a problem faced by other large open-source repos too (see my link at the end). I think the discussion there is very relevant to our situation as well. While I was previously in favor of this proposal, after reading some of the comments there I think we need to ask ourselves: what do we actually want in the changelog?

For example.

In each of these cases, what we seem to need is less automation, not more, since changelogs are after all for humans and not machines. As devs, we should easily be able to tell what changes were introduced between versions. However, the fundamental flaw with our approach right now is twofold:

  1. Every PR needs a changelog entry. We should really not be enforcing this, since many changes fix issues introduced in the same version.
  2. When 2 or more PRs edit the same changelog file section, we have a conflict. This is the big pain point right now.

An interesting solution I came across is documented by the GitLab team: https://about.gitlab.com/blog/2018/07/03/solving-gitlabs-changelog-conflict-crisis/

I like this approach because

  1. It keeps the changelog human, where we are intentional about what we want to add to the changelog
  2. Has room for automations (not as sick as this 5 min autogeneration, but imo, more use than it)
  3. Gets rid of the changelog conflict hell that we are currently facing.
  4. Is already in use by a mature open-source project that faced a similar issue. Does not need a changelog entry
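
A minimal sketch of the one-file-per-entry idea from the GitLab post, assuming each PR drops a small entry file into an `unreleased` directory and a release script later folds them into a single changelog section. The directory layout, the JSON format (GitLab's own tooling uses YAML), the field names, and the sample entries are all assumptions, chosen to keep the example within the Python standard library.

```python
import json
import tempfile
from pathlib import Path


def add_entry(unreleased_dir, slug, title, pr_number, kind):
    """Write one changelog entry to its own file, so PRs never edit a shared file."""
    unreleased_dir = Path(unreleased_dir)
    unreleased_dir.mkdir(parents=True, exist_ok=True)
    path = unreleased_dir / f"{pr_number}-{slug}.json"
    path.write_text(json.dumps({"title": title, "pr": pr_number, "type": kind}))
    return path


def compile_release(unreleased_dir, version):
    """Fold all pending entry files into one changelog section and delete them."""
    paths = sorted(Path(unreleased_dir).glob("*.json"))
    grouped = {}
    for path in paths:
        entry = json.loads(path.read_text())
        grouped.setdefault(entry["type"], []).append(entry)

    lines = [f"## {version}"]
    for kind in sorted(grouped):
        lines.append(f"### {kind}")
        for entry in sorted(grouped[kind], key=lambda e: e["pr"]):
            lines.append(f"- {entry['title']} (#{entry['pr']})")

    for path in paths:
        path.unlink()  # entries are consumed once they are released
    return "\n".join(lines)


# Hypothetical usage: two PRs each add their own file, then a release is cut.
with tempfile.TemporaryDirectory() as tmp:
    pending = Path(tmp) / "changelogs" / "unreleased"
    add_entry(pending, "new-widget", "Add a new widget", 101, "Features")
    add_entry(pending, "fix-typo", "Fix a typo in the header", 102, "Bug Fixes")
    notes = compile_release(pending, "2.4.0")
    remaining = list(pending.glob("*.json"))
```

Because every PR writes a distinct file, two PRs (or a PR and its backport) never touch the same lines, and the pending directory doubles as a list of what is still unreleased on a branch.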

@dblock
Member

dblock commented Nov 7, 2022

With this proposal, how can I find out what is unreleased on 2.x? This is a major ask and a big part of why the manual CHANGELOG was introduced in the first place. If I cannot tell what changed since the last release before the release, I don't have an opportunity to provide input before something ships, which is a core feature of open-source collaboration.

@dblock
Member

dblock commented Nov 7, 2022

@kotwanikunal I believe you've resolved much of the backport conflict problem in OpenSearch, what do you recommend doing here?

@kotwanikunal
Member

@kotwanikunal I believe you've resolved much of the backport conflict problem in OpenSearch, what do you recommend doing here?

Sorry for the delayed response. I was away for a while.
The way we resolved many of the issues in the OpenSearch repo was by not requiring a changelog entry for every PR.
Release notes need entries from contributors for any major change, and one of the key things @andrross pointed out was that PRs can have an M:1 relation, where we can have multiple PRs for a feature or any other category for that matter.
This simplification led to a general approach where maintainers reviewing PRs can mark a PR as skip-changelog, a feature which no-ops the workflow.
By default, the workflow requires a changelog entry, which the maintainers can essentially override. This resolved a ton of conflicts in terms of all the issues listed above -

  • Reduces burden on the contributor for every PR
  • Reduces conflicts when backporting the said PRs
  • Generates changelog which is usable and succinct

The only challenge we face right now is PRs with changelog require conflict resolution when backported. I did have the GHA updated before I left for my vacation, and I will get that going as soon as I get a chance, which should also fix this last hiccup.

More on the process here: https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#changelog
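
The rule described above can be modeled in a few lines. This is only a sketch of the decision logic, not the actual OpenSearch workflow: the function name and the `CHANGELOG.md` path are assumptions, while the `skip-changelog` label comes from the description above.

```python
CHANGELOG_PATH = "CHANGELOG.md"  # assumed file name
SKIP_LABEL = "skip-changelog"    # label maintainers apply to opt a PR out


def changelog_check(labels, changed_files):
    """Return (passed, reason) for a PR, requiring a changelog entry by default."""
    if SKIP_LABEL in labels:
        return True, "skipped: maintainer applied skip-changelog"
    if CHANGELOG_PATH in changed_files:
        return True, "changelog updated"
    return False, "add a changelog entry or ask a maintainer for skip-changelog"
```

For example, `changelog_check(["bug"], ["src/app.ts"])` fails the check, while the same PR with the `skip-changelog` label passes without touching the changelog.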

@ashwin-pc
Member

Thanks for the context @kotwanikunal. I generally agree with the approach you guys have taken. We have the same issue with M:1 relationship changes, and I'd also add changes that fix issues that were created in the same release cycle. I too think that a lot fewer entries actually need to go into the changelog. Some questions I have about the approach you guys took are:

  1. How much of a burden is the changelog process now for maintainers, since it's still quite manual?
  2. How does it work during releases? Are there usually a lot of changes made to the release notes/changelog to clean them up before a release?
  3. There still seem to be changelog conflicts with this approach. Are there any proposed plans to resolve that issue?

@kotwanikunal
Member

Thanks for the context @kotwanikunal. I generally agree with the approach you guys have taken. We have the same issue with M:1 relationship changes, and I'd also add changes that fix issues that were created in the same release cycle. I too think that a lot fewer entries actually need to go into the changelog. Some questions I have about the approach you guys took are:

  1. How much of a burden is the changelog process now for maintainers, since it's still quite manual?
  2. How does it work during releases? Are there usually a lot of changes made to the release notes/changelog to clean them up before a release?
  3. There still seem to be changelog conflicts with this approach. Are there any proposed plans to resolve that issue?

  1. Yes, the process is still manual, since the ultimate goal of the release notes is to be readable by humans.
  2. During a release, we essentially copy the changelog file over into a release notes file and clear out the changelog for that branch, which is a quick process for the release manager.
  3. I was trying to look for an answer to this one, and it looks like we have a way forward. The main changelog needs to maintain a section for the previous major version (as in here). As long as the changelog on the previous version branch and the previous-version section on main are in sync, the backport actually moves it over without any conflict (sample PR)

@joshuarrrr
Member

Should be considered for opensearch-project/.github#148

@dblock
Member

dblock commented Apr 27, 2023

With this proposal, how can I find out what is unreleased on 2.x? This is a major ask and a big part of why the manual CHANGELOG was introduced in the first place. If I cannot tell what changed since the last release before the release, I don't have an opportunity to provide input before something ships, which is a core feature of open-source collaboration.

I still have this question. I don’t see how the automation understands the baseline.

My 0.02c is that commit messages are a non-starter because they can’t be changed. Cue spelling mistakes. We also squash. PR titles would be better.

@joshuarrrr
Member

@ZilongX Thanks for providing this proposal - after considering some of the tradeoffs, we've decided to move forward with this alternative instead: opensearch-project/.github#156
