Would you consider upstreaming the folds.scm and indents.scm files to grammar repos? #3944
Hi! Nice to hear from you, and thanks for the kind words! I can't speak for the whole team (and invite @theHamsta @vigoux @stsewd to weigh in), but I personally would prefer if queries were owned by the parser devs -- adapting queries to breaking parser changes is a big chunk of work. Some parsers (close to Neovim, like Viml and Vimdoc) already do this, although we sync these updates manually. There are two reasons we aggregate (and probably will keep aggregating) the queries here:
Of course, if tree-sitter officially recommended this and we could rely on these (and more?) queries being part of the grammar repos, we'd look at automating this. In any case, I'd personally welcome these queries being useful and used more widely, and I'd be fine with grammar maintainers upstreaming them (the Apache 2.0 license allows this) -- maybe tag the listed maintainer in the PR for complete transparency. What I'm not going to do is open PRs in the 100+ repos myself, though ;)
Agree with what @clason said. Maybe the fold and indent queries could be reused across editors, but I guess that would depend on how each editor implements those features. If there were a "core" implementation of those features that we could follow, I think that would be great and would help standardize and reuse those queries.
@patrickt And while I have you: It would be great if tree-sitter had an official distribution mechanism for binary parsers and matching(!) queries -- say, some sort of GitHub Action grammar maintainers can just add to their workflows ;) (Seriously: One of the biggest pain points for us is the fact that queries are only compatible with a specific parser revision, with no way of telling whether they match before trying to run them and getting a bunch of errors. If parsers had a version field that could be inspected before loading a query -- which could have a corresponding
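To make the wished-for "inspect before loading" check concrete, here is a minimal sketch. It assumes, hypothetically, that a parser exposes a `MAJOR.MINOR.PATCH` version string and that each query file declares the range of parser versions it was written against. Neither field exists in tree-sitter today; `ParserInfo`, `QueryFile`, `min_version` and `max_version_exclusive` are illustrative names only.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


@dataclass
class ParserInfo:
    # Hypothetical metadata a compiled parser could expose.
    language: str
    version: str


@dataclass
class QueryFile:
    # Hypothetical metadata a query file could declare in a header.
    language: str
    min_version: str            # inclusive lower bound
    max_version_exclusive: str  # exclusive upper bound, e.g. the next breaking release


def is_compatible(parser: ParserInfo, query: QueryFile) -> bool:
    """Return True if the query declares support for this parser's version."""
    if parser.language != query.language:
        return False
    version = parse_version(parser.version)
    return (
        parse_version(query.min_version)
        <= version
        < parse_version(query.max_version_exclusive)
    )


print(is_compatible(ParserInfo("ruby", "0.20.1"), QueryFile("ruby", "0.20.0", "0.21.0")))  # True
print(is_compatible(ParserInfo("ruby", "0.21.0"), QueryFile("ruby", "0.20.0", "0.21.0")))  # False
```

An editor could run such a check once per language at load time and skip (or warn about) mismatched queries instead of surfacing a burst of query errors.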
Perhaps we can add an entry in
We're very open to upstreaming the query files whenever a repo wants to maintain them. We could just copy them during our parser installation process to provide them to end users or rely on some official distribution mechanism for the parsers. We already reference
@patrickt what were your plans for the next steps? When the official
Hey, everyone. Happy new year! Sorry for my radio silence, and thank you all for chiming in on this. @clason mentioned that

On the other hand, it does strike me that

Oh, and @clason, your point about parser compatibility is very much on the money. I’m cc’ing @maxbrunsfeld, the tree-sitter author, to see if he has any thoughts on the issue. It seems like this is an issue that more and more people are going to encounter.

@theHamsta: My next steps are to talk with the internal GitHub teams that would have some sort of material interest (there are several aside from us on Semantic Code) in using tree-sitter to extract these data. Tree-sitter parsers and queries scale to GH traffic extremely well, so I think there’s a compelling business case to be made there, but I don’t know what other technologies are under consideration, or if they’ve got working prototypes already. It’s possible this all might not pan out, but it’s worth a try, right? 😄

I’m intrigued by the way Helix references parent languages… will definitely dig into that, too. (As we offer more features for JS/TS, effective reuse pays dividends, as I’m sure you all know well.)

I think I’m going to close this issue out now so that it doesn’t clog up your issues board, but I’ll leave further comments on this issue if and when I have an update. Thank you all very much for being so helpful!
@patrickt We have been thinking more about centralizing queries, in particular highlight queries. While I still don't believe it's possible to completely share queries verbatim, I'm very interested in minimizing the divergence where possible and collaborating on a single source of truth that can be used by multiple editors with only minimal customization (like adding our

As GitHub is a major player in this field (albeit for a rather limited set of languages, compared with the ~170 we maintain here), I'd be interested in hearing your thoughts and plans on this. I'm also tagging @the-mikedavis for Helix, which has a similarly broad language support strategy (but a somewhat different capture naming scheme, which I would love to synchronize with).

(Other editors using tree-sitter I'm aware of are Zed -- which is closed-source, and I couldn't find any info on its queries and captures -- and Emacs -- which only just added tree-sitter support in core and doesn't yet bundle queries, while the older https://github.com/emacs-tree-sitter/tree-sitter-langs/tree/master/queries is comparatively limited and has low activity.)
Being able to share queries would be really nice! I'm not sure how to accomplish it, though, because the queries tend to be implementation-specific. Indentation is a good example: the Neovim and Helix systems work differently, so the queries end up looking different as well. Some straightforward features like folding would probably be easy to share (although folding is not implemented in Helix, so I can't say this confidently).

There's a big hurdle to sharing queries, though: Neovim's queries have reversed precedence compared to Helix, the tree-sitter CLI and, I think, Emacs as well as GitHub's syntax highlighting tool (based on the order of queries here). The last stanza in an nvim-treesitter query file overrides any earlier stanzas that match the same pattern, but in Helix query files the first matching stanza overrides any later stanzas that also match. For example, two stanzas overlap in the Elixir highlights (`nvim-treesitter/queries/elixir/highlights.scm`, lines 25 to 29 at commit e9fb90d).
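To make the precedence difference concrete, here is a minimal Python sketch with two hypothetical overlapping stanzas (a generic `variable` rule and a more specific `constant.builtin` rule; these are not the actual Elixir lines). Each "pattern" is a simplified stand-in for a query stanza: a predicate over a node plus the capture it assigns. The last-wins resolver models the visible result of the draw-order behaviour described for nvim-treesitter, not its actual code.

```python
from typing import Callable, Optional

# Simplified stand-in for one stanza of a highlights.scm file:
# a predicate over a node plus the capture name the stanza assigns.
Pattern = tuple[Callable[[dict], bool], str]


def resolve_first_wins(node: dict, patterns: list[Pattern]) -> Optional[str]:
    """Tree-sitter CLI / Helix style: the earliest matching stanza wins."""
    for matches, capture in patterns:
        if matches(node):
            return capture
    return None


def resolve_last_wins(node: dict, patterns: list[Pattern]) -> Optional[str]:
    """Draw-order style: later matching stanzas paint over earlier ones."""
    result = None
    for matches, capture in patterns:
        if matches(node):
            result = capture
    return result


# Two hypothetical overlapping stanzas: a generic one, then a more specific one.
PATTERNS: list[Pattern] = [
    (lambda n: n["type"] == "identifier", "variable"),
    (lambda n: n["type"] == "identifier" and n["text"].startswith("__"), "constant.builtin"),
]

NODE = {"type": "identifier", "text": "__MODULE__"}
print(resolve_first_wins(NODE, PATTERNS))  # variable
print(resolve_last_wins(NODE, PATTERNS))   # constant.builtin
```

Porting a file between the two conventions therefore roughly means reversing the order of overlapping stanzas, which is why this difference is a hurdle for sharing queries verbatim.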
In nvim-treesitter, the identifier gets the

It would be nice to align on the captures we use for syntax highlighting as well. That would be tricky to do in practice, though, because any scope that changes would break themes: if Helix switched from
Well, the first step is talking about it seriously and taking stock of what would have to be done (and then doing it, if it turns out to be feasible) ;) One thing to keep in mind is that this repo was meant as a prototype for implementations that eventually end up in Neovim core. We are now in the process of (incrementally) moving things that work well into core -- which is a chance for changing things, so this is now a good time to have this discussion. (Tree-sitter integration in Neovim is still marked as experimental, so we are free to make breaking changes for good reasons -- and sharing queries would definitely be one!)
I haven't used it much myself, but my impression is that the system here (not in Neovim core!) is not working too well, at least for languages like Python. If Helix's works better, we should take a look at it and see if we can align when we implement it in core. If nothing else, indents and folds should be writable in an editor-agnostic way.
That is a good point. If Neovim is the odd one out, that's a strong incentive to change things. I suspect that the reason we did it this way is for easier user customization: we concatenate all queries on the runtime path so users can override individual queries in their own config. Does Helix support extending/overriding queries like that? (We use

EDIT: Turns out that this is incorrect; we are just missing the sort of "early bail" logic from the tree-sitter CLI (and Helix, which uses the same code), so we always highlight all matches and rely on draw order to give more specific matches precedence (see neovim/neovim#22495).

Another issue, I believe, is the way we do injections (needing the

EDIT: An upstream-compatible implementation will be added in neovim/neovim#22518.
We already did this once; we have no problem doing it again ;) (But that should be the last time...) I personally don't mind switching (mostly) to Helix's scheme, which I believe aligns (more) closely with Atom/TextMate naming? If you were ready to adapt some captures to ours (for example, I prefer the way we do LaTeX

One issue is that we make heavy use of Atom-style fallbacks (

At the very least we could discuss naming and try to be as consistent as possible -- the more names are shared, the less work it takes to adapt queries between us or from upstream! (If you prefer, we could discuss this and other things on Matrix; I can open a chat room and maybe try to find some people from Emacs to loop in, too?)
(reopening this for more visibility and to get back to it more easily)
@the-mikedavis another point of divergence is predicates and directives. It's clear that other editors won't support

Or is adding new predicates so much harder for you than for us (where it's fairly trivial)?
Ah excellent! We try to minimize breaking changes in general in Helix but I think we can afford to make some changes on our end for the sake of compatibility. We always end up with a handful of breaking changes per release anyways. Better compatibility of queries, themes and approaches to tree-sitter features seems to me like a win for everybody 🙂
I think that the Python indentations have some rough edges in Helix as well, although I don't typically write Python myself. There are also some other improvements we want to make to our indentation, like outdenting automatically. It seems like potentially a lot of work, but maybe we can collaborate on an indentation approach that we can share? (cc @Triton171, who authored most of the indentation code for Helix)
We have support for
Yep, we try to adhere to TextMate when possible. I'm definitely open to changing up some scopes so our queries and themes can be closer, especially some of the lesser-used ones like

We also use the fallbacks for captures extensively. I think our systems work the same way:
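The Atom-style capture fallback both editors describe can be sketched like this: a dotted capture name is looked up in the theme, and if it is not defined, the last segment is dropped and the lookup retried. This is a minimal illustration, not either editor's code; the theme contents and the `resolve_capture` name are made up for the example.

```python
from typing import Optional


def resolve_capture(capture: str, theme: dict[str, str]) -> Optional[str]:
    """Find a theme style for a dotted capture name, falling back to its parents.

    "function.builtin.static" tries "function.builtin.static", then
    "function.builtin", then "function"; returns None if nothing matches.
    """
    parts = capture.split(".")
    while parts:
        style = theme.get(".".join(parts))
        if style is not None:
            return style
        parts.pop()
    return None


# Hypothetical theme: only some capture names are styled explicitly.
THEME = {"function": "blue", "function.builtin": "cyan", "keyword": "purple"}
print(resolve_capture("function.builtin.static", THEME))  # cyan
print(resolve_capture("function.method", THEME))          # blue
print(resolve_capture("string.special", THEME))           # None
```

Because of this fallback, queries can use fine-grained names while themes only need to style the coarse parents, which is what makes gradually aligning capture names between editors less disruptive for themes.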
Matrix would be perfect - we use Matrix for a lot of Helix discussion as well. If you make a room, would you mind inviting me (
Yeah, the regular expression ones seem like a hard problem to solve very consistently between editors. Our

Coming up with a list we can both support sounds doable 👍. There are some from here that we already want to support, like

One other compatibility point: does Neovim support combined injections (the
From what I understand, the implementation in Neovim is inherently different from the tree-sitter Rust crate or Helix, so even if we have compatible scopes, it's not guaranteed to work well across editors. In particular, there is the ordering of queries (from least to most specific vs. the other way around).

Edit: Sorry, I see @the-mikedavis already clarified most of this! In general, I think it's OK for editors to use different scopes, since features work differently.
This only applies to highlighting. The TS implementation will check queries in the same order we do, but will stop adding highlights if it has already added a highlight for a specific range. Neovim, on the other hand, will just apply all the matches and stack the highlights. This allows one match to just set underline, another to set the background, and yet another to set the foreground. We've talked about adjusting our highlighter to add the same range check, and using a custom directive to support highlight stacking.
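The two behaviours can be contrasted in a small sketch. It deliberately simplifies the range check to *exact* byte ranges (the real tree-sitter highlighter works on overlapping spans), and `Highlight`, `first_wins` and `stacked` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    start: int                          # byte offset, inclusive
    end: int                            # byte offset, exclusive
    attrs: tuple[tuple[str, str], ...]  # e.g. (("fg", "red"),)


def first_wins(highlights: list[Highlight]) -> dict[tuple[int, int], dict[str, str]]:
    """Keep only the first highlight seen for each exact range (CLI-style bail-out)."""
    result: dict[tuple[int, int], dict[str, str]] = {}
    for hl in highlights:
        key = (hl.start, hl.end)
        if key not in result:
            result[key] = dict(hl.attrs)
    return result


def stacked(highlights: list[Highlight]) -> dict[tuple[int, int], dict[str, str]]:
    """Merge every highlight for a range; later ones override conflicting attributes."""
    result: dict[tuple[int, int], dict[str, str]] = {}
    for hl in highlights:
        result.setdefault((hl.start, hl.end), {}).update(hl.attrs)
    return result


# Three matches on the same range, each contributing a different attribute.
HLS = [
    Highlight(0, 5, (("underline", "true"),)),
    Highlight(0, 5, (("bg", "grey"),)),
    Highlight(0, 5, (("fg", "red"),)),
]
print(first_wins(HLS))  # {(0, 5): {'underline': 'true'}}
print(stacked(HLS))     # {(0, 5): {'underline': 'true', 'bg': 'grey', 'fg': 'red'}}
```

With the bail-out, only the first match for a range contributes; with stacking, independent attributes from several matches combine, which is the behaviour a custom directive would have to opt into.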
Yes, this is supported. I've also raised a PR to support the same injection formats as the ones you use: neovim/neovim#22518
@archseer yes, just to be clear: the whole purpose of this discussion is to find out which differences are necessary and which are not; the goal is then to remove the latter (and document the former) so that more (most) of the queries can be shared -- ideally, we can have an upstream source of truth we both (and Emacs) can pull from with minor modifications where necessary. Just like we do with Vim, we'd ask contributors to make "common" improvements upstream instead of in our repo. We are also happy to make breaking changes in our code to facilitate that. To summarize: the big differences we have identified so far are
I think this would be a good idea: we should be able to share the logic (and capture names, of course) even if the implementation is different. A fresh implementation in core would be the ideal occasion.
Same in Neovim.
Yes, I think that will be the biggest design issue -- user extensibility is baked into the DNA of (Neo)vim, so this is a primary concern for us.
This is exactly how it (now) behaves in Neovim as well; we just have an additional
Yep, that's exactly how it works (except that we have an additional implicit fallback
Yeah,
At the risk of beating a dead horse: It will very soon, thanks to Lewis' PR that he linked. This will become our default, so injection queries should be fully compatible going forward.
Done!
Since I've written a large part of Helix's current indentation system, I'd be very interested in collaborating to create a specification and reference implementation for tree-sitter indentation. It's definitely not easy to find a set of rules that work for all languages (especially since it'll be harder to change later if the system is used in multiple editors), but sharing the queries would eliminate a lot of duplicate work. If someone from nvim-treesitter wants to work on this as well, feel free to ping me or message me on Matrix (
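As a starting point for what a shared specification would have to pin down, here is a deliberately simplified indentation rule in Python. It is *not* Helix's or Neovim's actual algorithm; the `Capture` type, the `"indent"`/`"outdent"` names and the two rules are assumptions chosen for illustration: each `@indent` node that starts on an earlier line and still covers the current line adds one level (several such nodes starting on the same line count once), and an `@outdent` node starting on the current line removes one level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    name: str        # "indent" or "outdent" in this sketch
    start_line: int  # 0-based, inclusive
    end_line: int    # 0-based, inclusive


def indent_level(line: int, captures: list[Capture]) -> int:
    """Compute an indent level for `line` from simplified @indent/@outdent captures."""
    # Indent nodes that opened on an earlier line and still span `line`;
    # collecting start lines in a set makes same-line nodes count once.
    indent_start_lines = {
        c.start_line
        for c in captures
        if c.name == "indent" and c.start_line < line <= c.end_line
    }
    level = len(indent_start_lines)
    # A closing token (e.g. "}") at the start of the line dedents it.
    if any(c.name == "outdent" and c.start_line == line for c in captures):
        level -= 1
    return max(level, 0)


# Hypothetical captures for this JavaScript snippet:
# 0  function f() {
# 1    if (x) {
# 2      g();
# 3    }
# 4  }
CAPTURES = [
    Capture("indent", 0, 4),   # function body block
    Capture("indent", 1, 3),   # if-statement block
    Capture("outdent", 3, 3),  # closing brace of the if block
    Capture("outdent", 4, 4),  # closing brace of the function body
]
print([indent_level(n, CAPTURES) for n in range(5)])  # [0, 1, 2, 1, 0]
```

Even this toy version shows the kind of decisions a common spec would need to settle, such as how nested nodes opening on the same line are counted.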
What do you think of using ECMAScript regex for matches? That's something everybody can agree on, as Neovim and Emacs could vendor libregexp from
Sorry, no. We have Lua for performance and Vim for maximal flexibility; we are not going to add a third regex(-like) engine in core. The real solution is not to use
How does https://github.com/bellard/quickjs/blob/master/libregexp.c compare to Vim regex? Would it be possible to take the union? Surely the basics
Yes, I think the vast majority of captures will only require basic "standard" regex features, which should be documented. (A common design document for queries -- including documentation of the remaining necessary divergences as a Rosetta Stone -- is precisely one of the goals of this discussion.)
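One way to keep shared `#match?` patterns within such a documented "basic" subset would be a lint step that flags engine-specific constructs. The sketch below is hypothetical: the list of constructs treated as non-portable is an assumption for illustration, not an agreed-upon spec, and the detection is naive (for example, an escaped backslash followed by a digit would be misreported as a backreference).

```python
import re

# Hypothetical constructs treated as non-portable in this sketch.
NON_PORTABLE = {
    "lookaround": re.compile(r"\(\?<?[=!]"),         # (?=  (?!  (?<=  (?<!
    "backreference": re.compile(r"\\[1-9]"),         # \1 .. \9
    "named group": re.compile(r"\(\?P?<[A-Za-z_]"),  # (?<name> or (?P<name>
    "vim very-magic": re.compile(r"\\v"),            # Vim's \v mode switch
    "vim word boundary": re.compile(r"\\[<>]"),      # Vim's \< and \>
}


def non_portable_features(pattern: str) -> list[str]:
    """Return the names of non-portable constructs found in a regex pattern."""
    return [name for name, rx in NON_PORTABLE.items() if rx.search(pattern)]


print(non_portable_features(r"^[A-Z][A-Z0-9_]*$"))  # []
print(non_portable_features(r"^(?<=foo)bar"))       # ['lookaround']
print(non_portable_features(r"(a)\1"))              # ['backreference']
```

Running a check like this in CI over shared query files would make the "basic features only" convention enforceable rather than just documented.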
|
@jcs090218 (https://github.com/emacs-tree-sitter/ts-fold) and @meain (https://github.com/meain/evil-textobj-tree-sitter/) are working on Emacs tree-sitter integration; would you be interested in aligning on #3944 (comment)? Are there others in the Emacs space we should contact?
Just hit me up on Matrix if you want to be part of the (ongoing) discussion!
I've been passively following this thread. It would be great to have this upstreamed (though distribution might still have to happen within each repo, unless we can sort out something better). I'll ping you on Matrix (@meain:matrix.org). Tagging a few other folks from Emacs working on tree-sitter: @casouri @ubolonton
👋 I wrote Tree-sitter and I now work on the Zed code editor, which is not yet open-source, but will be (probably by the end of this year). I'd be interested in an effort to standardize some queries, though I think it makes sense to be somewhat conservative about what gets included in the main Tree-sitter grammar repositories. All of these editors' approaches to syntax highlighting, code folding, and auto-indent are still evolving, so it would be a shame to codify a flawed or limited system into the main language repos.

Highlighting

Syntax highlighting seems like the lowest-hanging fruit for standardization. Having now implemented Tree-sitter-based syntax highlighting twice (once in the
I'm not sure how to move forward on changing any of these things, but I'd be curious to hear other people's opinions on these issues. AFAIK, GitHub is the biggest stakeholder in the current system.

Injection

For what it's worth, Zed's language injection scheme uses the same query format as the upstream

Folding, Auto-indent

These systems are still evolving in Zed, and from what I can tell, they're still evolving in the other editors as well. Personally, I'd be reluctant to try to define a standardized format for indents and folds at this point, especially when even syntax highlighting queries are still not shared between different applications. Possibly I'm being too conservative, though 🤷.
Thanks for chiming in! It's always great to hear that Neovim Did It Right (first try \o/) :)

More seriously, it's very interesting to hear about Zed's approach, which is a) the closest thing to "author intent" and b) a closed book to the general community right now. I agree that we should not ossify things for the sake of it right now. The point is more to share ideas and experiences so people don't have to reinvent the wheel. It's also clear that significant effort has gone into creating queries for new parsers, and the more ecosystems profit from that effort, the better for everyone involved -- like LSP, economy of scale is one of the major (albeit not only) value propositions of using tree-sitter in an editor. So the goal is less the standardization and more the communication (and documentation across editor silos).

(Also, reopening -- again -- since the good comments just keep coming.)
@maxbrunsfeld Another related issue is parser versioning. As far as I understand it, the parsers under the

So my question to you (and the rest of the Tree-sitter Team): Would you consider decoupling parser releases -- and making more frequent releases -- for the grammars you maintain? Ideally, using semantic versioning along the lines of
This could be automated by requiring conventional commits ( |
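Here is a minimal sketch of how such automation could derive the next parser version from commit messages. It assumes the mapping from the Conventional Commits specification (a `!` after the type or a `BREAKING CHANGE:` footer means a major bump, `feat` a minor bump, `fix` a patch bump); which semantic-versioning policy the grammars would actually adopt is left open above, and the helper names are hypothetical. It does not special-case `0.x` versions, where projects often treat a minor bump as breaking.

```python
import re
from typing import Optional

# Conventional Commits header: type, optional (scope), optional "!", then ": ".
HEADER = re.compile(r"^(?P<type>[A-Za-z]+)(?:\([^)]*\))?(?P<bang>!)?: ")


def bump_kind(messages: list[str]) -> Optional[str]:
    """Return "major", "minor", "patch", or None for a batch of commit messages."""
    kinds = set()
    for message in messages:
        if "BREAKING CHANGE:" in message or "BREAKING-CHANGE:" in message:
            kinds.add("major")
        match = HEADER.match(message)
        if match is None:
            continue
        commit_type = match.group("type").lower()
        if match.group("bang"):
            kinds.add("major")
        elif commit_type == "feat":
            kinds.add("minor")
        elif commit_type == "fix":
            kinds.add("patch")
    for kind in ("major", "minor", "patch"):
        if kind in kinds:
            return kind
    return None


def next_version(current: str, messages: list[str]) -> str:
    """Apply the bump implied by `messages` to a MAJOR.MINOR.PATCH version."""
    major, minor, patch = (int(part) for part in current.split("."))
    kind = bump_kind(messages)
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


print(next_version("1.4.2", ["fix: handle heredocs"]))                 # 1.4.3
print(next_version("1.4.2", ["fix: x", "feat(queries): add folds"]))  # 1.5.0
print(next_version("1.4.2", ["feat!: rename node kinds"]))            # 2.0.0
```

Because renamed or removed node kinds are exactly what breaks queries, tagging such grammar changes as breaking would let downstream query maintainers see at a glance which parser releases need query updates.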
Should there just be a page in the tree-sitter documentation that describes the expected attributes and behavior? I don't see anything in upstream tree-sitter, but I may be able to start one (@Triton171 maybe you had something here, from #3944 (comment)). Fold seems pretty straightforward with

Indenting is trickier. Helix's indent queries are much better documented; based on the comments above, it seems like this could be a good starting point? A universal text syntax would also be great, so it's easy to test implementations against one another (similar to what exists for highlighting). Roughly something like:
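For folds, the behaviour a shared spec would describe can be stated in a few lines. The sketch below is an assumption about reasonable semantics, not any editor's implementation: it takes the (start line, end line) spans of nodes captured as `@fold`, skips single-line nodes, and keeps only the outermost fold when several captured nodes start on the same line. `fold_ranges` is a hypothetical name.

```python
def fold_ranges(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Turn (start_line, end_line) spans of @fold captures into fold ranges.

    Single-line nodes are skipped, and when several captured nodes start on
    the same line only the outermost one (largest end line) is kept.
    """
    outermost: dict[int, int] = {}
    for start, end in spans:
        if end > start:  # a fold needs at least two lines
            outermost[start] = max(end, outermost.get(start, end))
    return sorted(outermost.items())


print(fold_ranges([(0, 10), (0, 4), (2, 3), (5, 5)]))  # [(0, 10), (2, 3)]
```

Tie-breaking rules like "outermost wins on the same start line" are exactly the kind of detail that would need to be written down for fold queries to behave identically across editors.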
Other random thoughts:
|
No, mixing locals and highlights is a mistake; leave semantic highlighting to language servers (which can actually do it properly). Locals are used for selection and movement in Neovim, and this will not change.
They need to be manually set, and from a brief check, many parsers don't bother with that and just have the default (incorrect) file. If we can't 100% rely on these, they're not really useful. That would be nice if it worked, though!
No, that is wrong. The maintainer here is the person responsible for the queries and the Neovim integration, which often is not the same person as the grammar maintainer (who may care little for Neovim).
I'm not sure that's feasible; how do you tell which parser is "active"? I'm already tracking more than 400 parsers, and I'm sure I'm missing quite a few.
Hi there 👋🏻 I work on tree-sitter’s core, as well as various language grammars, as part of my work on GitHub’s Semantic Code team. I was really blown away by where the nvim-treesitter community has gone with tree-sitter as a technology: it hadn’t occurred to me to use tree-sitter queries for indentation and code folding, but it makes so much sense. Well done.
I write because the team and I would love to bring the indentation and folding features into Tree-sitter as a first-class citizen. We at GitHub are hoping to use queries like these to power more advanced features in our new code view, and I, though I use the other editor that shall not be named here, would love to bring what you’ve done to my editor ecosystem. I could very plausibly see a world where these powered `tree-sitter indent` and `tree-sitter fold` commands in the CLI.

My question to the team is this: is such an undertaking possible? I would love to see the folds.scm and indents.scm files living in their official grammar repositories, but looking at your project hierarchy, it’s clear that you already have a lot of processes in place. Is this an insurmountable amount of work? (We on the tree-sitter side of things would of course help however we can.) Would it be too much to ask of you as a maintenance team, or would it slow your development too much? (I see in the README that languages are divvied up across contributors; we would of course hand out commit bits to the relevant repositories.) What would you think about trying it with, say, Ruby and seeing to what degree that would affect your workflow?
Again, thanks for using tree-sitter so excitingly, and I hope we can put your work in even more people’s hands!