Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a section for Team Standards in the team compass #454

Closed
choldgraf opened this issue Sep 27, 2021 · 15 comments · Fixed by #464
Closed

Add a section for Team Standards in the team compass #454

choldgraf opened this issue Sep 27, 2021 · 15 comments · Fixed by #464

Comments

@choldgraf
Copy link
Member

Proposed change

Following a conversation in #453 - I think it'd be helpful if we defined some "official team standards" for the repositories under the jupyterhub/ org. We've had a number of conversations about things like black, pyupgrade, pre-commit, documentation conventions, etc. Rather than discussing on a case-by-case and informal basis, how about we define a few basic things that we "approve of" as a community-wide practice.

These standards should be relatively lightweight and applicable in most repositories. We could host them in a "Team Standards" section in the team guides part of our documentation. Anything that was listed there is "pre-approved" for implementation in one of the repositories. If something isn't there and a person wants to implement it (especially for a core repository like jupyterhub/, then they should open an issue to discuss.

Alternative options

The main alternative is to keep our current informal process, which isn't working too badly, though I've noticed that it can lead to some confusion about what kinds of PRs will be "easy to merge" when it comes to things like adding new pre-commit hooks.

Who would use this feature?

The main beneficiary is developers in our repositories. I think it could have a few benefits:

  • Define standards explicitly, and a process for adopting new ones, reduces the amount of discussion around these issues at the margins of the JupyterHub org (e.g. as a part of a PR that does other stuff as well)
  • Standards in general will make the developer experience more consistent across our repositories
  • Standards that follow best practices will in general improve aspects of our codebase in general.

(Optional): Suggest a solution

I think that we should try to keep the "community-wide standards" relatively lightweight and restrict it as much as possible to things that can be automated. So something like:

  • Use pre-commit to enforce standards (to start, we can use the hooks in the jupyterhub repo minux pyupgrade as accepted standards)
    • Use black + flake8 + reorder-imports for Python code
    • Use prettier for JS/TS code
    • Generally accept any of the "native" pre-commit hooks like "end-of-file-fixer"
  • Use Sphinx for documentation
  • Use pytest for testing infrastructure

(I might be missing something, this is just top-of-my-head)

And an example of an issue that wishes to extend these standards would be #453 or #379. So the conversations there might play out like "first try it out on a single repository (like how JupyterHub already uses pyupgrade, and then after 2 months decide if we want to adopt it broadly).

@minrk
Copy link
Member

minrk commented Sep 28, 2021

Adopting auto-format PRs is often very disruptive to open PRs and ~never a pressing issue, so I think it's a good idea to be circumspect any time we adopt a new tool, and try to prioritize open PR review whenever such a PR is likely to cause lots of conflicts

I like both having standards and consistency and being informal and letting subprojects organically develop their own practices. Most jupyterhub repos are on a pretty thin maintenance resources right now, and 'just do what we do everywhere else' is often the lightest burden. I like having a documented central place for what we've generally agreed upon makes sense, and the proposed collection sounds good.

I like the 'trial on one repo, then spread if we like it' approach. I think that matches positive experiences we've had so far with e.g. adopting pre-commit itself.

A question, though: is it better to have a 'default' pre-commit-config.yaml that we ship with the docs, or should we have e.g. the file in jupyterhub/jupyterhub be the 'common' starting point? It might be nice to not maintain a duplicate, but not everyone has developed the knowledge to necessarily recognize what parts of jupyterhub/jupyterhub's config are generic and which are peculiar.

One more thing to add here might be publishing releases from GHA.

@manics
Copy link
Member

manics commented Sep 28, 2021

I like @minrk's general approach of "both having standards and consistency and being informal and letting subprojects organically develop their own practices".

I'd prefer pointing to example pre-commit configs in one or more existing repos as we can guarantee they're regularly tested, whereas anything embedded ina doc is liable to go out of date.

I'd still prefer the configs to be suggestions rather than mandatory though. If you're converting an existing large repo you may want to be more lenient with things like a flake8 config due to the amount of manual work invovled in fixing all warnings, whereas if it's a new repo or you've already invested time in applying strict coding standards it'd be a shame to apply a lax config and for the code quality to drift, or to throw away other pre-commit hooks such as mypy.

From a developers point of view I don't think it makes much difference if different repos have different pre-commit configs. They all use a common tool so you just blindly run pre-commit run -a (or use a git hook) and if it passes eveyone's happy. If there's a failure you'll need to investigate why, but that's the same as a CI test failure- we've got a wide variety of testing tools and frameworks.

One more thing to add here might be publishing releases from GHA.

There's are two related issues:

@choldgraf
Copy link
Member Author

choldgraf commented Sep 28, 2021

I agree w/ both of your takes that we don't want a heavy hand here - my goal is to create guidelines and make it clear when we don't need to have lots of discussion before moving forward. Also to provide some general encouragement for these practices by documenting them. (I also think it probably matters more for the "core repositories" than for the ones that are not as heavily used)

I'm thinking of scenarios like:

  • Can I apply black to XXX repository? A: yep it's in our guidelines so feel free to merge and add a pre-commit hook without needing to ask the team.
  • Can I apply xxx new thing to jupyterhub/? A: jupyterhub is used by many and this isn't part of our team standards guidelines, so you should open an issue to discuss first

And scenarios like:

  • I just set up a new repository, what pre-commit hooks could I use again? Ah I know I'll just check the team-compass to get a quick list"

@minrk
Copy link
Member

minrk commented Sep 28, 2021

Yeah, I think that's a good level to do it at, and write down "we're pretty sold on pre-commit, black, correctness-level flake8, etc." and "we're still trying out pyupgrade"

@manics
Copy link
Member

manics commented Sep 28, 2021

Ok! How does this sound:


JupyterHub projects should take advantage of pre-commit to run autoformatters and code linters. This helps to maintain code standards, and improves maintainability.

The following pre-commit hooks can be added to any repo when convenient, though please communicate with anyone who has an open pull request if it will lead to major conflicts:

  • black
  • prettier
  • flake8
  • standard pre-commit hooks ....

Use your judgement as to whether to apply a strict or lenient configuration for linting code. See REPO-A and REPO-B for example configurations.

Other linters, autoformatters and tools can be added to other repos on an ad-hoc basis if it's not too disruptive- this is a good way to try out new tools. In general big changes should not be made to high profile repos without prior discussion.

If the new tool is useful across the organisation please propose it in a new team-compass GitHub issue, outlining the advantages and disadvantages.

@choldgraf
Copy link
Member Author

That looks nice to me, thanks @manics for writing it up :-)

@consideRatio
Copy link
Member

❤️ thanks for processing this! I'm very happy about a standards section in the team compass to suggest additions to.

Prettier: markdown, yaml, json, js, html - it does a lot!

Note that prettier is a quite complicated hook because it can autoformat a lot. I love it for its markdown and YAML formatting, but I have not used it for .js/.ts formatting, and know it can cause issues when used as a HTML formatter at least in jinja .html template files and helm template .yaml files.

Also note one may consider running prettier specifically for YAML and then once agains specifically for JS etc, because perhaps YAML indentation should be 2 spaces, while JS should be 4.

Where to locate .pre-commit-config.yaml examples?

I'm positive to @manics idea about referencing a repo, but I note that there will be a need to reference multiple repo's as no single repo contains all relevant hooks for all code bases. Due to that, I'm overall also supporting the idea of having a team-compass maintained config file that can be copy-pasted, even though it absolutely will add some maintenance burden. I consider it to be a quite even trade off, so I'm fine with either referencing 2-3 repo's for example configs or adding a dedicated example config in team-compass.

Misc things

I also like these misc hooks, of which end-of-file-fixer and requirements-txt-fixer are autoformatters.

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: end-of-file-fixer
      - id: check-executables-have-shebangs
      - id: requirements-txt-fixer

@manics
Copy link
Member

manics commented Sep 29, 2021

You can have different configs for different file extensions in your prettier config file, it's not necessary to run it multiple times: https://prettier.io/docs/en/configuration.html

However I'd probably stick with the prettier defaults unless there's a strong reason not to, e.g. I find JavaScript tends to be more nested than some other languages so 2 spaces is better 😄. It looks like https://github.com/jupyterlab/extension-cookiecutter-ts uses 2 spaces.

I don't think we need to get too proscriptive now about the configuration here though- we'll probably have to iterate as we modify more repos.

@betatim
Copy link
Member

betatim commented Oct 1, 2021

I like consistent code and file formatting as it makes life easier when reading. I believe (:D) in the two lines about consistency from the "Zen of Python".

I think the time to introduce and enforce consistency/tooling is when a project is young. Once a project gets older, has more users and more developers it becomes harder and more disruptive to do so. This means I value local consistency over global consistency. As an example: if all the code around where I am editing something uses single quotes for strings, use single quotes (even if I think double quotes are the right thing to do). The cost of reformatting all the quotes or having a mix is much higher than the benefit of improved readability. If a repo uses tabs for indentation, use tabs when sending PRs. Even if the rest of the GH organisation uses spaces. And so on.

This means that I think it is useful to have a recommended standard set of things to add to new repositories, especially because it helps with questions like "so how do I setup pre-commit black again?". For projects with more users and developers I think retroactively introducing rules that are strictly enforced (by a tool) is rarely a good use of our small amount of developer time. These changes are often disruptive, tricky and need a lot of attention to detail to get right. In addition mature and popular projects have kinda proven that despite their inconsistent formatting/mix of code styles/etc they are useful to a large group of users. Otherwise they would not have become large and mature. For me this means more consistency/tooling slide towards a "nice to have" for more mature projects.

TL;DR: having a shared, minimal set of tooling to help us quickly setup new projects and increase consistency across repos is a good thing. Applying them retroactively to existing, large projects is a much more difficult task, so I would not support automatically/by default/without discussion adding what we recommend for new repos to these repos.

@choldgraf
Copy link
Member Author

choldgraf commented Oct 2, 2021

Can you go into more detail how it is a lot of work to apply standards via pre-commit to pre existing codebases ? The only thing I can think of is that it would introduce conflicts for pre-existing PRs (and I agree there we should go through extra efforts to get PRs merged or closed before we applied anything like this). Other than that, it'd just be a single PR that:

  1. Adds pre-commit to the dev deps with the packages we agree on in community standards
  2. Runs it on everything

Then we merge quickly with minimal review before other PRs get opened. Personally, that doesn't seem too bad, as long as there isn't some long-standing PR that you really don't want to close, but that you don't expect to be merged soon.

Note: I think that this does become more complex when the "standard libraries" you adopt are themselves more complex or niche. But if it were just "black, flake8, and isort" or something, I feel like it wouldn't be too bad. Pyupgrade feels more niche to me (though as others have noted it often results in a no-op diff)

@betatim
Copy link
Member

betatim commented Oct 4, 2021

It is more work because in a new project it is nearly no work :)

Looking at and deciding about open PRs, installing the extra CI step, maintaining the extra CI step, updating the docs to reflect a mixture of the possibly existing style/contribution guide and the new one that involves the pre-commit tools, deal with vendored code that should/shouldn't be reformatted, fixing things that the tools flag but can't automatically fix, adding exemptions from flake8 rules, "training" the contributors to write so that the tools are happy (I think it is a bug if the bot has to come and fix up stuff for you on every PR, would suggest to me that it is somehow "unnatural" to code the way the bot wants you to code or that the style changes aren't perceived as providing value), figure out why on different contributors computers the tools produce different results (yes I just spent far too much time with a team that chased prettier config settings ...), etc. Each one isn't a big step or a lot of work but they all need doing and they can be a bit tedious. Most of these don't apply if you are starting a new project because a lot of legacy stuff doesn't exist.


Side-thought: I think you could deal with all the existing PRs by producing a short bit of instructions for them on how to run the pre-commit tooling in their PR. I think after you apply the pre-commit tools to "main" and then also in the PRs ,you should not have any more (formatting related) conflicts and only see the actual changes of that PR in the PR's diff. But that is just an idea, I've not tested it. Would be nice to save on effort dealing with existing PRs.

@manics
Copy link
Member

manics commented Oct 4, 2021

Working autoformatters have only really come about in the past few years. Before that code style was enforced by whoever reviewed the PR, so some people are quite lax, others are strict, but since it was the only option everyone just dealt with it.

You're right that adding pre-commit to existing repositories involves work on our part, and you've correctly pointed out several potential problems. However it's hopefully obvious that several of us are willing to do that work, and my suggestion above of Use your judgement as to whether to apply a strict or lenient configuration for linting code is specifically aimed at this, we can choose what level of strictness to enforce. In a commercial context there's a cost-benefit trade off, but that doesn't really apply here since everyone is pretty much free to choose what they want to work on, so I don't see this additional work as a problem.

Once it's added to CI there should be no or minimal maintenance. The pre-commit config file pins exact versions of tools, so they should behave identically across all systems, if they don't it's a bug. We can choose to bump tool versions or not.

That leaves the problem of ensuring contributors follow the style and lint rules. With an autoformatter it's easy: pre-commit run -a. For linters it can be annoying, but in my experience they generally point to code that can be improved. In any case having CI moan about your code is no worse than someone like me commenting on your PR 😂. The difference is pre-commit has a better chance of automatically fixing it for you, and also it'll tell you within a few minutes instead of a few days later when someone has time to review a PR.

It's also worth bearing in mind that many developers use IDEs that automatically format code, I've seen a few JupyterHub PRs where a whole file has been autoformatted. As you expect in those cases we've had to ask for just the relevant changes to be pulled out, but I think it's fair to say autoformatters are pretty standard these days.

@willingc
Copy link
Contributor

willingc commented Oct 9, 2021

Some additional wording added to @manics (thanks!) suggested text:

JupyterHub projects should take advantage of pre-commit and tools to help maintain consistent formatting within a repo to improve overall code quality, review efficiency, and readability of code. Sensible use of pre-commit to run formatting tools and code linters can add consistency and improve maintainability.

Preferred tools and pre-commit hooks

The team has found the following tools and their pre-commit hooks to be useful. The following pre-commit hooks can be added to any repo when convenient, though please communicate with anyone who has an open pull request if it will lead to major conflicts:

  • black
  • prettier
  • flake8
  • standard pre-commit hooks, such as isort, etc. (To be determined and listed here)

Applying to repos

When creating a new repo, please use any pre-commit hooks and tools that are useful.

When working with an existing repo, please balance the benefits of adding a tool or pre-commit hook with considerations such as 1) the amount of code churn, 2) how it will improve code maintainability, 3) the time it may add to CI runs.

Configuration of a tool

Use your judgment as to whether to apply a strict or lenient configuration for linting code. See REPO-A and REPO-B for example configurations.

Other linters, autoformatters and tools can be added to other repos on an ad-hoc basis if it's not too disruptive- this is a good way to try out new tools. In general, big changes should not be made to high profile repos without prior discussion.

Proposing organization-wide use of a tool

If the new tool is useful across the organisation please propose it in a new team-compass GitHub issue, outlining the advantages and disadvantages.

@choldgraf
Copy link
Member Author

I like @willingc 's version above. I'd be +1 on adding that language in and iterating from there.

Also thanks to @betatim for keeping in mind for all of our sake that lots of little annoyances often do contribute to a lot of total annoyance 🙂

@minrk
Copy link
Member

minrk commented Oct 13, 2021

That looks great, thanks @willingc!

I think you could deal with all the existing PRs by producing a short bit of instructions for them on how to run the pre-commit tooling in their PR. I think after you apply the pre-commit tools to "main" and then also in the PRs ,you should not have any more (formatting related) conflicts and only see the actual changes of that PR in the PR's diff.

I think it's a little more complicated than that, because it matters how/when you apply those changes. I think you'll still get conflicts if you try to rebase a PR, even if it's a single commit that's been autoformatted. However, applying autoformatting and then merging from main ought to have fewer issues. Probably not zero, though.

The way to avoid conflicts, I think, is to 'rebase' manually, without the actual rebase or merge commands, but this is a pretty git-expert-level sequence of operations:

  1. identify the commits on main immediately before and after the autoformatting change (git log --merges)
  2. rebase the PR branch on main immediately before the autoformatting
  3. apply autoformatting to the PR branch (I think this is actually optional, since formatting will run again)
  4. reset the branch to main after autoformatting
  5. checkout files from the version in the PR branch (git checkout pr-branch-ref -- binderhub/)
  6. make a new commit with the 'same' changes (will run pre-commit again, assuming hooks are installed)

This will always reduce a PR to just one new commit, but it's impossible for there to be merge conflicts with this process, since git isn't making any decisions about how to reapply changes. I feel safe doing this kind of history rewriting because I can always dig through the reflog to start over, but I'm not sure how comfortable I am recommending it to others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants