Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

treesitter: syntax-aware structured diff #15064

Open
krishnakumarg1984 opened this issue Jul 12, 2021 · 11 comments
Open

treesitter: syntax-aware structured diff #15064

krishnakumarg1984 opened this issue Jul 12, 2021 · 11 comments
Labels
Milestone

Comments

@krishnakumarg1984
Copy link

I love neovim and have been using it since 2017 (just prior to version 0.2). The features introduced in the latest 0.5 release (lua, LSP, treesitter etc.) make it even more awesome.

I'd like to request the devs to consider a structure-based diff implementation (to complete the existing suite of line-based diffing algorithms already present i.e. Myers, patience, longest subsequence distance, histogram etc.). I recently came across difftastic, a structure-based diffing tool (even though this is currently a rough, unfinished product). To my untrained eye, structure-based diffing looks like a nice orthogonal idea that is worthwhile to implement. The creator of difftastic further points out some interesting structure-based diff tools in the project's wiki page.

Here are some further related readings/tools (originally linked in the aforementioned wiki):

  1. https://blog.trailofbits.com/2020/08/28/graphtage/
  2. https://fosdem.org/2021/schedule/event/sexpressiondiff/
  3. https://fazzone.github.io/autochrome.html
  4. https://github.com/yinwang0/psydiff
  5. https://github.com/yinwang0/ydiff
  6. https://github.com/VictorCMiraldo/hdiff/tree/v0.0.5
  7. https://github.com/GumTreeDiff/gumtree

I gather that structure-based diffing might be a reasonably hot topic in academia, with a few good PhD theses and papers being published e.g. https://victorcmiraldo.github.io/data/MiraldoPhD.pdf.

I am just a user of neovim, often using it for writing notes, papers and application-level code, and do not have the background in data structures required to implement a sophisticated idea like this. I hope that the community here might be able to have a healthy debate on the pros and cons of considering a structure-based diff implementation within neovim.

@krishnakumarg1984 krishnakumarg1984 added the enhancement feature request label Jul 12, 2021
@bfredl
Copy link
Member

bfredl commented Jul 12, 2021

https://github.com/afnanenayet/diffsitter this also looks interesting as it is based on tree-sitter.

@clason
Copy link
Member

clason commented Jul 12, 2021

Yes, diffsitter looks like the way to go here. It shouldn't be too hard to port that to an nvim-treesitter module (starting from https://github.com/nvim-treesitter/module-template), which would be the right place for this at the moment.

@krishnakumarg1984
Copy link
Author

One thing that is not clear from diffsitter's README.md page is whether it supports a nice-looking split view? All1 the examples in that page return classical diff markers.

@clason
Copy link
Member

clason commented Jul 12, 2021

We wouldn't use diffsitter directly but use the same algorithm to generate a diff using tree-sitter. The display would still use the standard vim functions (although they may have to be adapted to account for such different files being marked identical).

@bfredl bfredl changed the title Consider implementing a syntax-aware structured diff implementation tree-sitter: implement a syntax-aware structured diff Jul 12, 2021
@matu3ba
Copy link

matu3ba commented Jul 13, 2021

It is not sufficient to use the more optimal graph-based implementation, because "Performance. Difftastic scales poorly on files with a large number of changes. This might be solved by A* search."

A* is quite good explained here.
So either difftastic includes A* (or another performant algorithm on too many individual changes) or user experience will suffer on code that others write (and tree-sitter can not recommend when to commit changes).

Unfortunately difftastic does not provide examples how much worse/better their approach works vs the A* algorithm.

Since semantic changes are necessary for optimal incremental compilation (and incremental proof invariant collection etc), the best approach would be to use the editor + lsp for collecting semantic changes between code. However that is even more complex and not worth it for non-compiled languages. It also requires compiler + lsp + editor support (and treesitter for performance to detect changes?).

@justinmk justinmk changed the title tree-sitter: implement a syntax-aware structured diff treesitter: syntax-aware structured diff Sep 24, 2021
@justinmk justinmk added this to the unplanned milestone Sep 24, 2021
@krishnakumarg1984
Copy link
Author

So, this is in the unplanned milestone currently. Any interest from other devs?

@lewis6991
Copy link
Member

No interest from me, and tbh this can easily be implemented via a plugin. There's no point adding this directly in core when the premise/utility of the feature is unproven.

Let's demonstrate the concept via a plugin first, then we can assess whether it's worth baking this in.

@theHamsta
Copy link
Member

Mentioning https://github.com/Wilfred/difftastic for another approach to implement this as CLI tool.

@krishnakumarg1984
Copy link
Author

Well my OP mentions difftastic.

@knubie
Copy link

knubie commented Jun 8, 2022

I think instead of implementing diffing within neovim, it would be better to update diffexpr to somehow handle columns within lines in addition to whole lines. That way users can bring their own diff tools, and neovim can keep up with advances in diffing without having to make any changes to the source code.

@hacker-DOM
Copy link

Fwiw, difftastic is also tree-sitter-based. It looks like the most developed and advanced of these structural differs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants