treesitter distribution strategy (tree-sitter) #22313

Open
justinmk opened this issue Feb 18, 2023 · 4 comments
Labels
distribution (packaging and distributing Nvim to users), enhancement (feature request), needs:discussion (for PRs that propose significant changes to some part of the architecture or API), treesitter

@justinmk
Member

justinmk commented Feb 18, 2023

Problem

Current situation: Nvim ships a few parsers (C, Lua, vimdoc, vimscript) in its runtime. If a user wants more parsers, they must build the parser and put it on their 'runtimepath', or use a project like https://github.com/nvim-treesitter, which tries to automatically build parsers on the user's machine.

Nvim can't ship hundreds of parsers in its runtime because

  1. the total file size approaches gigabytes (GB)
  2. undue burden on package maintainers
  3. updating parsers should not require updating Nvim itself?

Ideal case

Ideally, tree-sitter upstream would solve some problems for all tree-sitter consumers by:

  • providing makefiles
  • exposing an introspectable parser version that is set through tree-sitter generate
  • having parser authors maintain their own queries and bump said version every time a parser update requires changes to them

Potential Solutions

Do nothing

Distribute queries

The main problem is lack of query and parser versioning.

  • Ship queries, but not parsers. Queries are relatively tiny text files.
    • Problem: parsers and queries are tightly coupled, so a new parser version could break an existing query.
      • tree-sitter upstream does not provide tools that could help us (like parser introspection, query introspection (version or metadata))
  • Enforce versioned parser names
    • Problem: how?
    • Right now, we only have the commit hash => can't reason about version range.
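
To illustrate why a commit hash is not enough, here is a minimal sketch of the compatibility check that versioned parsers would make possible. Everything in it is hypothetical: tree-sitter provides no such metadata today, and `QueryRequirement`, `parse_version`, and `is_compatible` are names invented for this example. It assumes parsers carry a three-part numeric version and that a query declares the range it was written against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequirement:
    """Hypothetical metadata a query file could declare about its parser."""

    language: str
    min_version: tuple  # inclusive lower bound, e.g. (1, 2, 0)
    max_version: tuple  # exclusive upper bound, e.g. (2, 0, 0)


def parse_version(text: str) -> tuple:
    """Turn "1.4.3" into (1, 4, 3) so versions compare numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible(req: QueryRequirement, language: str, parser_version: str) -> bool:
    """Return True if the parser version falls inside the declared range."""
    if language != req.language:
        return False
    return req.min_version <= parse_version(parser_version) < req.max_version


# Assumed example: queries written against Lua parser versions 1.2.0 up to (not including) 2.0.0.
lua_queries = QueryRequirement("lua", (1, 2, 0), (2, 0, 0))
```

With only commit hashes, the equivalent check collapses to an exact-match lookup, because hashes have no ordering: a query can be pinned to one known-good commit but cannot state "any parser from this point until the next breaking change".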

Distribute parsers (.so/.dll)

  • Develop CI that builds .so/.dll files for every OS. Then Nvim can fetch those on-demand.
    • Benefit: useful for all text editors, not just Nvim.
    • Problem: Where to put (200 * 3) build artifacts? Could use GitHub packages like homebrew?
    • Problem: similar maintenance burden as nvim-treesitter.
      • Mitigation: strictly refuse to support parsers that don't easily build.
        • Users to nudge the parser maintainer to "fix" their build steps.
  • Develop CI that builds "universal" libs via cosmopolitan c
    • Problem: "fat" libraries are costly: TS .so files are 90%+ data and 10% actual code (just the scanner part). Converting that 10% to WASM is less invasive.
  • Integrate nvim-treesitter's logic for "build the parser locally and put it into rtp"
    • Benefit: gives us a "happy path" answer for users to avoid needing nvim-treesitter.
    • Problem: Nvim becomes a package manager, which is a slippery slope.
      • Mitigation: strictly refuse to support anything but the happy path.
        • Don't try to find compilers in weird places.
        • Don't support configuration.
    • Problem: maintenance burden: many parsers have quirky build steps! May require C++ compiler.
      • Mitigation: strictly refuse to support parsers that don't easily build.
        • Users to nudge the parser maintainer to "fix" their build steps.
      • Alternative: distribute zig binary as a compiler and use that as the toolchain to build on the user's machine.
  • Outsource the problem to installers like mason.
  • Wait for upstream to support WASM parsers
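
As a rough sketch of what "strictly the happy path" could mean for the build-locally option: the function below only assembles a compiler command and refuses anything unusual. It assumes the common parser repo layout (`src/parser.c`, optionally `src/scanner.c`), a `cc` on `PATH`, and a plain set of flags. The function name and the refusal policy are illustrative assumptions, not nvim-treesitter's actual logic.

```python
from pathlib import Path


def build_command(repo: Path, output: Path, cc: str = "cc") -> list:
    """Return the compiler invocation for a parser repo, or raise if it is off the happy path."""
    src = repo / "src"
    parser_c = src / "parser.c"
    if not parser_c.exists():
        raise FileNotFoundError(f"no generated parser at {parser_c}")

    # Happy path only: a C++ scanner would need a C++ compiler, so refuse it.
    if (src / "scanner.cc").exists():
        raise RuntimeError("C++ scanner found; not supported on the happy path")

    sources = [parser_c]
    scanner_c = src / "scanner.c"
    if scanner_c.exists():
        sources.append(scanner_c)

    # No compiler discovery and no user configuration: one fixed set of flags.
    return [cc, "-shared", "-fPIC", "-Os", f"-I{src}",
            *[str(s) for s in sources], "-o", str(output)]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the refusal rules easy to test without a compiler installed.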
@justinmk justinmk added enhancement feature request distribution packaging and distributing Nvim to users treesitter labels Feb 18, 2023
@justinmk justinmk added this to the backlog milestone Feb 18, 2023
@justinmk justinmk added the needs:discussion For PRs that propose significant changes to some part of the architecture or API label Feb 18, 2023
@theHamsta
Member

theHamsta commented Feb 18, 2023

Nvim can't ship hundreds of parsers in its runtime because
the total file size approaches gigabytes (GB)

That's not entirely true. Shipping all ~150 parsers of nvim-treesitter (nvim-treesitter/nvim-treesitter#3688) requires 9 to 15 MB, depending on the OS. Helix ships with all of its parsers in its release (https://github.com/helix-editor/helix/releases); the Linux release is 10.6 MB. But parsers will inflate to >100 MiB when decompressed (binaries have a very repetitive structure). For a long time I have had the idea of keeping parsers compressed on disk and decompressing them only when needed, so that nvim could transparently load compressed parsers (since dlopen from memory is a bit complicated, you'd probably create a tempfile for that). This is without any judgement on whether this is a good idea; it's just an experiment of mine.
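
A minimal sketch of the compressed-on-disk idea, assuming gzip as the compression format (the comment above does not name one) and a hypothetical helper name. It decompresses the parser into a temporary file, which can then be loaded by an ordinary path-based `dlopen`.

```python
import gzip
import os
import shutil
import tempfile


def decompress_to_tempfile(compressed_path: str, suffix: str = ".so") -> str:
    """Decompress a gzip-compressed parser into a temp file and return its path.

    The caller is responsible for deleting the file once the library is loaded
    (or at exit); this sketch does no cleanup.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out, gzip.open(compressed_path, "rb") as src:
        shutil.copyfileobj(src, out)  # stream, so the whole parser is never held in memory
    return tmp_path
```

From Python the resulting path could be loaded with `ctypes.CDLL(path)`; Nvim would use its own loader instead. The trade-off is a one-time decompression cost per parser load in exchange for the smaller on-disk footprint.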

Distributing parsers (.so/.dll)

Making parsers distributable has been on tree-sitter's 1.0 list for a long time (tree-sitter/tree-sitter#930). It would be great if most of the challenges you're mentioning could be solved by https://github.com/tree-sitter providing infrastructure to parser repos so that editors can consume them. Offering release workflows for parser repos was one of the ideas (could be parsers or parser+queries). Parser repos could offer dedicated editor-specific queries.

I discussed with @clason moving more maintenance of queries out of nvim-treesitter and into parser repos (nvim-treesitter/nvim-treesitter#4279 (comment)). Of course, this only works for repositories whose maintainers care about Neovim support. I was thinking as a first step to at least have the built-in parsers vim/lua/help

Could use Github packages like homebrew?

It seems that at the moment, GH packages only supports container images, Ruby gems, pip packages, cargo crates. I suspect homebrew might use Ruby gems. I didn't find a way to store versioned binary blobs without the need for a package manager (I might also be missing something). Since tree-sitter is associated with GH, they might extend this to support tree-sitter parsers or plain binary blobs with versions and metadata.

Installation via curl would be my favorite. If the tree-sitter organization could somehow standardize parser packages with a central registry, then Neovim could provide an API function that curls a parser given its name, version tag, and the current OS/arch combination. Contributors to Neovim, Helix, and Emacs with good ideas will probably need to get active and contribute to a solution, to avoid having too much complexity in editor repos or on end users' machines. In the long run, the installer logic in nvim-treesitter should become obsolete.

EDIT: URLs like https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:ae5d8e9148068e001b5ca7bbc2aa8663aa13b9995245f7655772725add67454c?se=2023-02-19T14%3A20%3A00Z&sig= suggest that homebrew is using the container registry storage to store binary blobs.
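
To make the "curl a parser given its name, version tag, and OS/arch" idea concrete, here is a sketch of the lookup step only. No such registry exists: the host `parsers.example.invalid`, the URL layout, and the function name are all made up for illustration.

```python
import platform

# Hypothetical registry root; tree-sitter has no central parser registry today.
REGISTRY = "https://parsers.example.invalid"


def parser_url(name: str, version: str, system: str = "", machine: str = "") -> str:
    """Build the download URL for a prebuilt parser (assumed URL layout).

    `system` and `machine` default to the current platform, as reported by
    platform.system() and platform.machine().
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    ext = "dll" if system == "windows" else "so"
    return f"{REGISTRY}/{name}/{version}/{name}-{system}-{machine}.{ext}"
```

An editor would then fetch that URL (for example with `urllib.request.urlretrieve`) and place the file in a `parser/` directory on its runtime path. The point of the sketch is that name, version tag, and OS/arch are the only inputs the editor would need if a registry standardized the layout.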

@clason
Member

clason commented Feb 18, 2023

Just to make this obvious:

  1. Much of this is an issue for all editors using tree-sitter (including Zed), so the way forward here is definitely more coordination with other editors and with upstream. (This includes capture names so we can use upstream highlight queries; custom stuff like @conceal and @spell can live in separate query extension files.)
  2. But Neovim is pretty unique in treating parsers/queries as user-swappable runtime files rather than fixed parts of the editor release, so some difficulties are our own and may need custom solutions; the challenge will be to minimize those.

@justinmk
Member Author

justinmk commented Jul 1, 2023

Plan

Notes on the treesitter plan ("migration from legacy vim syntax") from a chat with @clason:

Short-term (around 0.10):

  • we already have treesitter highlighting by default for query files
  • we can enable treesitter highlighting by default for vimdoc files, and possibly for Lua files
  • we could default foldexpr to treesitter folding for filetypes with a parser
  • I wouldn't go further yet.

Medium-term (around 0.11):

Long-term (1-2 years, not 3+ years...):

  • Fully switch to treesitter: vim upstream runtime files are mostly ignored.
    • We should not spend time thinking about migrating upstream files from vim9script, except specific critical cases. (Basically never for "syntax" files.)
  • Performance issues: work around them by providing an auto-degraded mode which disables slow features if the file is too big, has long lines, etc.
  • Work on "bridge" for :syntax and friends. They are still useful for ad-hoc highlights, matching, etc. We just don't want to care about the treadmill of upstream vim9script runtime files.
    • Better composability (example: running legacy syntax for plugin features without legacy highlighting).
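
The auto-degraded mode mentioned above could be decided by a check along these lines. This is only a sketch: the thresholds and the function name are invented for illustration, and the plan does not specify which limits or features would be involved.

```python
# Assumed thresholds; the plan above does not give concrete numbers.
MAX_BYTES = 1_000_000
MAX_LINE_LENGTH = 5_000


def should_degrade(text: str,
                   max_bytes: int = MAX_BYTES,
                   max_line_length: int = MAX_LINE_LENGTH) -> bool:
    """Return True if slow features should be disabled for this buffer."""
    if len(text.encode("utf-8")) > max_bytes:
        return True  # file too big
    # A single very long line (e.g. minified code) is enough to degrade.
    return any(len(line) > max_line_length for line in text.splitlines())
```

A caller would run this once when a buffer is loaded and skip the expensive features when it returns True, rather than turning treesitter off entirely.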
