Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
Better tooling for releases and development #220
Let's have an issue for collecting discussion about improving our tooling for release automation and similar.
I think it would be nice to move to towncrier, and possibly adopt the twisted rule that non-trivial PRs should always close an issue. (The idea is that the issue records what the problem is and what was considered for fixing it, while PRs are used for discussing a particular candidate patch.) Basically the way this works is that whenever you submit a PR, you make a new file in a special directory that contains a description of that change. This way you can generate your user-oriented release notes as you go, in a standard format, without hitting merge conflicts. (Then you also need the rule that PRs have to actually contain one of these files, possibly enforced by a linting bot.)
Alternatives include openstack's reno which seems substantially more complicated (it involves yaml and a special tool that contributors need to use to generate the file, etc.). CPython had a long discussion about this and decided to go with their own tool that they can tweak to match the peculiarities of their workflow, but it's a useful survey (if very long).
[Edit: #262 set things up to use towncrier for release notes on master; let's see how it goes.]
The Rust community has this great bot that does their merging called "homu" (or sometimes "bors", but the modern implementation is "homu"). The way it works is that when a change has passed review, the reviewer tells the bot to go ahead and merge it, and then the bot makes sure the tests pass on current master before merging. (With github status checks, the normal thing is: (1) PR submitted, (2) status checks test that it works against current master, (3) other stuff gets merged into master, (4) PR gets merged into master. So the actual commits on master are generally not tested until after they get merged, which is kind of absurd if you think about it.)
Unfortunately homu itself doesn't currently support our setup with appveyor+travis. It looks like this is the PR for that: barosl/homu#134 (sitting unmerged for ~16 months now, woo. maybe the servo/homu fork is the real one now?)
(I'd also be nervous about using the hosted version of homu while this bug remains unfixed.)
What I'd really love is a bot that works like homu, but whose sequence goes like:
Rationale for the formatting thing: that it's lovely to just not have to worry about formatting during code review, and it's not reasonable to expect casual contributors to set up precommit hooks. But you shouldn't merge code you haven't tested! So the reformatting should be done before the gating tests.
Rationale for having the bot handle releases: traditionally, F/OSS projects handle trust issues by having an inner circle of devs who have commit bits and have a lot of power over the project (e.g., the technical ability to quietly insert backdoors!), and so join this circle there's a long period of building trust. With trio we're trying a model of development where instead of having strict barriers to who's allowed to commit, we give out commit bits like candy, but use branch protection to make sure everything happens in the open through PRs, so if anything nefarious is going on there are lots of people who will notice. But... there's then a second level of special privileges, which is who has permissions on PyPI. For obvious reasons I don't want to just give this out to everyone. But I don't want to be the single point of failure, either, and for several reasons it would be nice not to have to drop this responsibility on some carefully-selected group either. Instead, it would be great if releases could somehow go through our usual PR workflow.
I'm not sure what the "appropriate condition" would be: something like
...I guess thinking a bit more, it might make more sense to do something like, you file an issue called "v0.2.0 release tracking" and say
This stuff is extremely not the biggest priority right now, when we're still trying to figure out basic APIs and it's not clear whether trio will get traction. But since I had some thoughts, I figured I'd record them here where they can be picked up again later :-).
I'd be happy if we copied the continuous release tooling and process from Hypothesis: https://hypothesis.works/articles/continuous-releases/
TLDR; this wouldn't scale up to CPython or Rust but it works very well for Hypothesis and I expect would for Trio too, if you like the style.
@Zac-HD That's an interesting idea! I really like Hypothesis's approach, and also I'm really excited to see someone actually trying it for real :-). But I see two big differences between Trio and Hypothesis, which may or may not affect the viability of this approach:
These are both more like, vague worries than an actual argument either way. And this is very much a question of like, what works better for us, so there's no one true right answer. I'd definitely be interested to hear what other contributors think. Would auto-releasing have freaked you out when you first got the commit bit? Would more frequent releases make you more excited to contribute? Do you think users would prefer deprecations to trickle out one day at a time, or to get a chunk of them every month or two in a larger release?
I'd be nervous about anything unseen being merged. I think it would be better if a bot tested each commit and automatically issued PRs against the PR branch to fix any formatting problems, or just pushed directly to the PR branch so no intervention was necessary. At least that way the changes could be reviewed
A lot of PRs include a lot of rubbish
I like Zac's suggestion, and I'm excited by the prospect of dropping a formal release ceremony, but I share some of the already explained worries, especially on this:
If a merged PR becomes to mean "a change worth releasing", does that mean that every single PR is supposed to improve the overall quality of the project? Because that's what I expect from subsequent releases. But maybe my expectations are off and this is the kind of assumption that I need to change. Still... it seems hard not to suggest the latest release to a newcomer, and PyPI/pip are biased towards the latest release.
It just seems to put a little more pressure on contributors. If I'm worrying about this then probably someone else would worry too. Does that mean, for example, that a new feature cannot be merged if it causes a performance regression (that maybe can be fixed by another pending PR)? Or when it alters the behavior of something else that was already planned to be removed? What if again it introduces a performance bug, but only on platforms that I have no access to test myself? And the big question is still: what if it's an API change?
In my experience this kind of continuous release works very well for product releases, but I know very little about adoption by open source projects. Hypothesis experience is very encouraging, and I would love to hear more from other projects!
Really excited to see some discussion on this one
Re: formatting, I'd leave this alone in Trio for now - not least because Hypothesis will probably try using Black soon. Basically we fail CI if formatting again produces a git diff. It's possible to have Travis commit the diff and try to push it back to the user branch if they accept maintainer pushes (the default), but this risks nasty problems for inexperienced users of Git and IMO isn't worth it.
Re: commit bit, merging, and problematic code... no clear answer from me. In Hypothesis we don't do speculative merges (where a maintainer might need to fix it later); to compensate we try to start review very quickly, give ongoing feedback, and offer to finish any PR that gets too frustrating (while continuing to credit the original contributor). A network library has a very different perspective on security to an offline fuzzing tool!
Re: size of features, you eventually get used to the idea that minor and patch releases just aren't a big deal - every API change (feature, deprecation, or bugfix) gets a minor release, and everything else gets a patch release. For the really big changes - Trio's 0.2, Hypothesis 4.0 - we set up another branch and accept PRs into that, and maintainers merge that one into master later. It doesn't come up often though!
Re: @sorcio's excellent questions on contributor pressure, I'm without a clear answer again - some people probably will worry, but I don't think it's insurmountable. To answer the specifics: no merging features with performance impacts that we don't want to release (but maintainers could address that in the PR). Any changes to behavior in Hypothesis are guided by backwards compatibility, not planned deprecation, but we are unlikely to accept new feature that we know we want to remove. For performance bugs that weren't caught before merge, we issue a patch release (it turns out that this is a great way of finding corporate users). API changes are handled in the usual way - project-specific in the 0.x phase, then minor release for compatible changes and major releases for incompatible changes.
The only bit I really worry about is vandalism; everything else is like the current system but faster - albeit with correspondingly more frequent demands on reviewers.
I mean... PRs are always supposed to improve the overall quality of the project, right? :-)
I believe that in Hypothesis's setup, they distinguish between two kinds of PRs: trivial ones that don't have any news entry and don't trigger a release, and non-trivial ones that have a news entry and do trigger a release. So that means that e.g. a commit that just tweaks
Now, as a general principle, I think this is what we want to strive for anyway. If a PR is known to cause a significant performance regression, then the options are (1) decide the change is important enough to justify the regression, (2) block merging that PR until the regression is fixed, (3) merge the PR anyway, and then block releasing until the regression is fixed. The main difference between (2) and (3) is that in (2), this one change is blocked from reaching users, while in (3), all changes are blocked from reaching users. Plus it's just awkward and frustrating to try to keep track of which known issues are currently lurking in master, and if you have a steady stream of PRs like this then you can even end up in a state where master is never releasable; this is how you end up with the cliche of an open-source project that makes like, one release every 5 years or something. So in theory, we should prefer (1) or (2) and avoid (3) entirely.
But I've known all this theory for like a decade, and yet 0.2.0 still happened – I think because the new features were large and complicated enough that I needed multiple PRs worth of iterations to figure them all out. There is a way around this though: make a branch, submit multiple PRs against that branch, and then once the branch is in a nice complete state, merge it into master.
There's also another case, where a PR causes a significant performance regression but we don't realize it until after it's merged. In this case I think it's fine if we let the release out – if we didn't catch it before merging then it's not likely that letting it sit in master for a few weeks, with no-one using it, is going to help us catch it either. The sooner we release it, the sooner someone will notice the problem, file a bug, and then we can fix it and improve our CI so that the next time this happens we can catch it in the PR stage.
Oh, we've been using yapf in this mode since before black existed :-). The musing in my original post was about whether we can make things even simpler for contributors.
All else being equal, I don't either. But the number 1 priority for a project that wants to successful and attract contributors is to make contributions as easy and frictionless as possible (esp. early contributions) – see e.g. and surrounding thread. Slightly messy revision histories have much less effect on project success, so I don't worry about them as much :-).
Any PR to Hypothesis may trigger a release by including a changelog entry in
The distinction between (1) and (2) is just a matter of timing: if you think the PR with regression is a big enough upgrade to release now instead of later, do so! And of course the whole point of the system is to avoid (3), with exactly that feature-branch workaround for major releases.
There's a proposal for the Trio deprecation period - at least one month between first release with deprecation and removal - over in #500.
I... am tempted to agree unconditionally to the general idea. But I can't just say "yes" because there is no single "quality" metric. Not all patches are Pareto improvements :) Maybe "quality" is a red herring in this case and I should instead say "every PR is supposed to further the project goals". Every release is one step closer to the goal, and the path can be made of tradeoffs (let's just assume that the project governance is able to decide what is an acceptable tradeoff). My point then becomes: a contributor proposing a PR will feel the pressure of making a decision.
The rest of your answer makes it pretty clear that you have already thought through the potential issues. And I shall thank you for listening and writing up on this. Trio may be young but it's an excellent example of a project that fosters real collaboration. Yes, with this kind of environment, support from the community, and good tools (e.g. feature branches, mandatory reviews, open discussion on decisions), then the worries I'm exposing fade.
@Zac-HD thank you for your answer, it's precious input :)
Regarding the whole release automation thing, I think I've convinced myself that the main issues to worry about are:
Brainstorming about how to address these, I came up with two ideas:
Option 1: implement release-after-merge, and figure out how to make it work
For new maintainers: I think the crucial thing is to have some way to feel like if you did make some major error, there's some chance for someone else to catch it -- but without actually requiring "someone else" to sign off. Maybe if there were a way to merge on a timer? Like we could have a bot where if someone with commit access marks a PR as "approved", and after 24 hours no-one has "requested changes", then the bot goes ahead and merges it.
For users: So it's probably a good idea in any case to pin your trio version in your project, and then use a service like pyup.io to send a PR when a new release comes out. And if you're using pyup.io, then you can pick how often it sends these updates, e.g. you can tell it that no matter how often someone releases, it should only nag you once a week. But... as @pquentin pointed out to me, using pyup.io inside companies may be impossible because for private repos it costs money and many companies are dysfunctional about this kind of thing. On the other hand, maybe those companies are already dysfunctional about upgrades and only upgrade once a year or something, I don't know?
Option 2: Do automated releases on a fixed schedule, e.g. once a week. (Unless there's a bug labeled "release blocker", I guess, which should hopefully never happen, but maybe you want the option.)
I can't decide if this idea is super clever or would just be kind of annoying :-). It keeps that key "releases just magically happen" flavor, so you're still forced to figure out how to automate things, people can count on timely updates, etc. But for maintainers, it breaks that feeling that every time you hit merge you're immediately pushing code out to the world, and increases how long it takes bug fixes to get out to users.
There's also some question about what to do with all the other python-trio projects. For the ones that only get small and occasional changes (like
There's also still the concern about vandalism, but it's so vanishingly rare that maybe we should wait for it to actually happen before worrying about it.
(And again, would love to hear feedback from other maintainers and users, because really I'm just guessing about what would help you all :-).)
Re: "merges can be scary" - I'd stick to the simple system! The timer approach is both more complicated, and would make me scared to approve some things at all. For example: when I started reviewing PRs on Hypothesis, I'd often go through and make my comments, and then ping David to check that I'd found all the important issues - and in fact on complex PRs I'll still often approve it with a 'looks good to me', but ping David to do a final review of the scary bits before clicking merge. TLDR: we're not operating on a deadline, and wanting more eyes on something should be encouraged - it's a great way to learn too!
Re: user upgrade issues - users should pin their versions. If your project or company is disfunctional, that's not Trio's problem. (though many of us offer individual consulting services
Re: fixed schedule - this mostly defeats the point, and is much more difficult to implement robustly. IMO if it's worth doing, we should jump directly to releasing every merge.
We were definitely not comfortable with the idea at first - HypothesisWorks/hypothesis#537
Starting by implementing this for the smaller packages sounds great to me though - both as a confidence-builder and to work out what the "Trio standard" is for continuous deployment. Maybe we could eventually add it to the cookiecutter...
I had an idea for this, and started a conversation here: https://forum.bors.tech/t/running-custom-code-before-and-after-the-merge-for-continuous-deployment-or-other-uses/315
The bors-ng maintainer has accepted the proposal to add hooks for autoformatting and continuous deployment. So the next step is for someone to implement it.
The stuff we need to implement is relatively straightforward, but there's an interesting wrinkle: bors-ng is written in Elixir. Elixir is pretty cool! It's a kind of friendly modernized Erlang (runs on the Erlang VM, interoperates with Erlang libraries, but you don't have to put
So... anyone looking for an excuse to learn Elixir, and/or implement something real in Elixir?
@auvipy Basically implementing this spec as a PR for bors-ng: https://bors.tech/rfcs/0322-pre-test-and-pre-merge-hooks.html
The alternative plan would be to write our own implementation of bors-ng's features in Python, perhaps in snekomatic. I can see arguments either way... bors-ng is an existing thing with some sophisticated features and community. But a local bot in a more familiar language is a lot easier for us to hack on if we have issues, plus we could just implement the hooks we want directly instead of having to do this complex thing involving bouncing back and forth between multiple cloud services.