treesitter distribution strategy (tree-sitter) #22313

@justinmk

Problem

Current situation: Nvim ships a few parsers (C, Lua, vimdoc, vimscript) in its runtime. Users who want more parsers must build each parser and put it on their 'runtimepath', or use a project like https://github.com/nvim-treesitter, which tries to build parsers automatically on the user's machine.

Nvim can't ship hundreds of parsers in its runtime because:

  1. the total file size approaches gigabytes (GB)
  2. it places an undue burden on package maintainers
  3. updating parsers should not require updating Nvim itself?

Ideal case

Ideally, tree-sitter upstream would solve some problems for all tree-sitter consumers by:

  • providing makefiles
  • providing an introspectible parser version that is set through tree-sitter generate
  • having parser authors maintain their own queries and bump that version every time a parser update requires changes to them

Potential Solutions

Do nothing

Distribute queries

The main problem is lack of query and parser versioning.

  • Ship queries, but not parsers. Queries are relatively tiny text files.
    • Problem: parsers and queries are tightly coupled, so a new parser version could break an existing query.
      • tree-sitter upstream does not provide tools that could help us, such as parser or query introspection (version or metadata).
  • Enforce versioned parser names
    • Problem: how?
    • Right now, we only have the commit hash, so we can't reason about version ranges.
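
To illustrate why a commit hash is not enough, here is a minimal sketch of the range check a query could declare if parsers exposed an ordered version. Everything in it is an assumption for illustration: tree-sitter does not provide such a version today, and the names `parse_version` and `query_supports` as well as the `major.minor.patch` scheme are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a hypothetical 'major.minor.patch' parser version string."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def query_supports(parser_version: str, minimum: str, below: str) -> bool:
    """Return True if minimum <= parser_version < below.

    Tuples of ints compare component by component, so 1.10.0 sorts
    after 1.9.0 (a plain string comparison would get this wrong).
    """
    return parse_version(minimum) <= parse_version(parser_version) < parse_version(below)


def same_commit(parser_commit: str, pinned_commit: str) -> bool:
    """With only commit hashes, exact equality is the only possible check."""
    return parser_commit == pinned_commit
```

With ordered versions, a query file could state "works with parsers from 1.2.0 up to, but not including, 2.0.0". With hashes, every parser update needs a new pin, even when nothing relevant to the query changed.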

Distribute parsers (.so/.dll)

  • Develop CI that builds .so/.dll files for every OS. Then Nvim can fetch those on-demand.
    • Benefit: useful for all text editors, not just Nvim.
    • Problem: Where to put (200 * 3) build artifacts? Could use GitHub Packages, like Homebrew does?
    • Problem: similar maintenance burden as nvim-treesitter.
      • Mitigation: strictly refuse to support parsers that don't easily build.
        • Users can nudge the parser maintainer to "fix" their build steps.
  • Develop CI that builds "universal" libs via Cosmopolitan Libc
    • Problem: "fat" libraries are costly: TS .so files are 90%+ data and 10% actual code (just the scanner part). Converting that 10% to WASM is less invasive.
  • Integrate nvim-treesitter's logic for "build the parser locally and put it into rtp"
    • Benefit: gives us a "happy path" answer for users to avoid needing nvim-treesitter.
    • Problem: Nvim becomes a package manager, which is a slippery slope.
      • Mitigation: strictly refuse to support anything but the happy path.
        • Don't try to find compilers in weird places.
        • Don't support configuration.
    • Problem: maintenance burden: many parsers have quirky build steps, and some may require a C++ compiler.
      • Mitigation: strictly refuse to support parsers that don't easily build.
        • Users can nudge the parser maintainer to "fix" their build steps.
      • Alternative: distribute a Zig binary as a compiler and use that as the toolchain to build on the user's machine.
  • Outsource the problem to installers like mason.
  • ✅ Wait for upstream to support WASM parsers
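
As a rough illustration of the "build the parser locally and put it into rtp" option with its "happy path only" mitigations, the sketch below plans a single compiler invocation and refuses anything unusual. It is not nvim-treesitter's logic. It assumes the common parser layout of `src/parser.c` plus an optional `src/scanner.c`, a Unix-like `cc` found on `PATH`, and GCC/Clang-style flags; `plan_build` and `UnsupportedParser` are hypothetical names.

```python
import shutil
import sys
from pathlib import Path
from typing import List, Optional


class UnsupportedParser(Exception):
    """Raised when a parser falls outside the happy path."""


def shared_lib_suffix(platform: str = sys.platform) -> str:
    """Return '.dll' on Windows and '.so' everywhere else."""
    return ".dll" if platform.startswith("win") else ".so"


def plan_build(lang: str, src_dir: Path, out_dir: Path,
               cc: Optional[str] = None) -> List[str]:
    """Return the compiler command for one parser, or raise UnsupportedParser."""
    # Happy path only: take the compiler from PATH, never search elsewhere.
    cc = cc or shutil.which("cc")
    if cc is None:
        raise UnsupportedParser("no C compiler named 'cc' on PATH")

    parser_c = src_dir / "parser.c"
    if not parser_c.is_file():
        raise UnsupportedParser(f"{parser_c} not found")

    # A C++ scanner needs a C++ toolchain, so it is refused outright.
    if (src_dir / "scanner.cc").is_file():
        raise UnsupportedParser("C++ scanner is not supported")

    sources = [str(parser_c)]
    scanner_c = src_dir / "scanner.c"
    if scanner_c.is_file():
        sources.append(str(scanner_c))

    # out_dir is meant to be a 'parser' directory on 'runtimepath'.
    out = out_dir / f"{lang}{shared_lib_suffix()}"
    return [cc, "-shared", "-fPIC", "-O2", f"-I{src_dir}", *sources, "-o", str(out)]
```

The function only returns the command; a caller would pass it to something like `subprocess.run(cmd, check=True)`. Keeping planning separate from execution makes the "refuse" cases easy to test without a compiler installed.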

Labels: distribution, needs:discussion, treesitter