Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git: format only files that changed #311

Open
zimbatm opened this issue Jun 3, 2024 · 8 comments
Open

git: format only files that changed #311

zimbatm opened this issue Jun 3, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@zimbatm
Copy link
Member

zimbatm commented Jun 3, 2024

Is your feature request related to a problem? Please describe.

When introducing a "dirty" project to treefmt, it can create a lot of code churn as files that previously weren't formatted are now getting changed. You often end up with a fat commit that formats the whole repo and breaks git blame.

Describe the solution you'd like

When the file backend is git, only format the untrack and changed files by default.

Add a --all-files flag to override that default.

Describe alternatives you've considered

Additional context

@zimbatm zimbatm added the enhancement New feature or request label Jun 3, 2024
@brianmcgee brianmcgee self-assigned this Jul 9, 2024
brianmcgee added a commit that referenced this issue Jul 9, 2024
If the modified time has not changed when compared with the git index we do not emit the file for processing.

This allows users to introduce treefmt to a repository without suffering an initial large formatting commit.

Instead, files can be formatted incrementally as they are changed.

Closes #311

Signed-off-by: Brian McGee <brian@bmcgee.ie>
brianmcgee added a commit that referenced this issue Jul 23, 2024
If the modified time has not changed when compared with the git index we do not emit the file for processing.

This allows users to introduce treefmt to a repository without suffering an initial large formatting commit.

Instead, files can be formatted incrementally as they are changed.

Closes #311

Signed-off-by: Brian McGee <brian@bmcgee.ie>
@brianmcgee
Copy link
Member

Be sure to take into account the ideas from #78

@jfly
Copy link
Collaborator

jfly commented Oct 12, 2024

Enabling incremental adoption is a noble goal! I have a couple of thoughts on the scheme proposed here:

  1. IIUC, it can't be enforced in CI. That feels unfortunate, as it's relying upon every engineer in a large project to just "do the right thing".
  2. If I am the unfortunate soul who is the first one to touch some ancient, large file in a long time, this scheme will cause treefmt to reformat the whole file.

I don't actually know how to fix either of these problems. Thoughts:

  1. To be able to enforce this in CI, we'd need to be able to invoke treefmt in some way where we can tell it about the diff relevant to the PR. This feels tricky for people to set up, and would be tightly coupled with the code review/CI system.
    • Idea: another version of this that's more deterministic/declarative. Basically, try to solve this with some version controlled state. Invent some sort of "grandfathered files" list that treefmt creates when you first set it up in an old git repo, and remove things from the list as people touch files.
  2. This reminds me of when some well-meaning souls at Arista tried to start enforcing that we don't have any trailing whitespace. They went to the effort to only fix the lines that you touched in your change, so you wouldn't trigger a large, unrelated diff.
    • This is finicky and annoying to build, but at least possible to do with trailing whitespace. It's not clear to me that it would be possible to do with the general ast-preserving sort of munging that formatters do though (where 1 line can turn into 3, or vice versa).
    • I haven't thought about how to make this play nicely with 1).

@jfly
Copy link
Collaborator

jfly commented Oct 12, 2024

Sorry for the possibly-negative sounding wall of text. I'm skeptical that the scheme proposed here would actually be useful for the large, active projects. That may not be the goal of this feature, though, in which case, feel free to ignore my comment!

@brianmcgee
Copy link
Member

I am okay with re-formatting a whole file, and I don't think we should ever get into trying to format files partially. However, the issue you highlighted about CI is complicated, which I had yet to consider.

There's no suitable mechanism I can think of that would allow CI to verify that formatting was applied correctly on a given branch unless we make treefmt somehow aware of the last N commits from which to derive the list of files that should be traversed, which is cursed in a few ways.

@zimbatm thoughts?

@zimbatm
Copy link
Member Author

zimbatm commented Oct 16, 2024

Whether you should do a treewide format-everything commit or this kind of slow burn is a matter of preference. In nixpkgs, people are already reformatting things they touch to nixfmt-rfc-style.

Regarding unformatted files slipping by, this is the same currently. You can create an intermediate commit that isn't formatted, and GitHub Actions will not be the wiser. But that's an issue with GHA. Gerrit can run CI on every commit.

The last thing that remains is getting the list of files that changed. For this, the CI should pass a --base <branch|commit> so treefmt can figure out the aggregate file list of the N last commits.

@jfly
Copy link
Collaborator

jfly commented Oct 16, 2024

For this, the CI should pass a --base <branch|commit> so treefmt can figure out the aggregate file list of the N last commits

That makes sense. We should name/document in so that it's clear it's only needed for when treefmt is being used in "transitional mode". On projects that are using treefmt to fully enforce formatting, there's no need to use this.

But that's an issue with GHA. Gerrit can run CI on every commit.

Neat! Is that an additional "feature" for us to build? Specifically: enforce formatting on the most recent commit? Or do you see this as a specific example of how to use --base (something like --base=HEAD~)?


I still suspect that having an explicit, version controlled "burndown list" of files that are "grandfathered" into their current formatting style would be preferable. My current thinking:

  • Pro: We'd have a clear indicator of when treefmt is being used in "transitional mode" or not. If the list of grandfathered files is empty, we can give clear error messages saying stuff like "It looks like you don't have any grandfathered files. Did you intend to add some? If not, you should not use --base as it's only for transitional mode". EDIT: while writing this up I realized that this scheme would save us from having to introduce the --base flag. But there still might be scenarios where there's some useful instruction we can give when the "burndown list" is empty.
  • Pro: Enforcing formatting would still do the right thing even if you extracted a source tarball and used --walk=filesystem.
  • Pro: we wouldn't need to add a new --base flag that people need to configure in CI. Instead, the grandfathered list would include the commit (EDIT: commit id might not be ideal, as they don't survive rebases. perhaps a magic commit message?) where we introduced treefmt to the project. In CI, we can now just do a diff between the current commit and that commit that introduced treefmt. Any files in that diff must not be in the grandfathered list.
    • Related thought: does treefmt have an existing mechanism for ignoring files? I think that would be useful for the "yes, I changed this file, but really, I need treefmt to ignore it". I think that's often a key piece of slowly adopting autoformatting in large repos. Old projects tend to have a few comically large code hotspot files (think pkgs/top-level/all-packages.nix in nixpkgs. it's either going to go away, or someone is going to format it, either way, it would be nice for treefmt to ignore it until its dealt with).
  • To ensure the right thing happens locally, I'd propose that if treefmt is invoked explicitly on a file that's in the "burndown list", we remove it from the burndown list and then format it. This would do the right thing in the "my editor automatically invokes treefmt/nix fmt for me" scenario.

@jfly
Copy link
Collaborator

jfly commented Oct 20, 2024

I am okay with re-formatting a whole file, and I don't think we should ever get into trying to format files partially.

For the record, here's evidence that large projects adopting auto-formatting are interested in partial formatting: NixOS/nixfmt#223.

@brianmcgee
Copy link
Member

brianmcgee commented Oct 20, 2024

🤔

If a walker is capable of identifying line ranges that have changed within a file, we could allow them to set that on walk.File. The question then becomes how do we pass that to a formatter?

We could invoke the formatter once per line range, assuming a formatter API that looks something like --line-start 2 --line-end 12. We could provide the start and end as well known env variables, allowing customisation in the options config for a formatter.

We would have to ensure we do this in a sequential manner per file, one range at a time. That would be an adjustment to the batch key / scheduler, but I don't think a difficult one.

This is just spit balling. Are there any examples of formatters which allow range formatting, it would be good to get a sense of what their apis look like.

Do they tend to accept multiple ranges in one invocation or is it a single range per invocation?

Multiple ranges per invocation means less changes to the scheduling and more work on how we pass parameters to the formatter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants