Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate tree-sitter parsers into the toplevel build system #29042

Open
thestr4ng3r opened this issue May 27, 2024 · 19 comments
Open

Integrate tree-sitter parsers into the toplevel build system #29042

thestr4ng3r opened this issue May 27, 2024 · 19 comments
Labels
build building and installing Neovim using the provided scripts enhancement feature request

Comments

@thestr4ng3r
Copy link

thestr4ng3r commented May 27, 2024

Problem

During packaging for MacPorts (macports/macports-ports#24120 (comment)), the question came up why the cmake build system (under cmake.deps/) is strictly separate from the toplevel CMakeLists.txt, requiring either multiple manual steps or using the additional wrapper Makefile. This has become especially cumbersome since as of v0.10.0, neovim expects multiple builtin tree-sitter parsers to be available at <prefix>/lib/nvim/parser or otherwise will throw runtime errors on common tasks like opening :help.

Even worse is the fact that the toplevel cmake silently does or does not use the parsers in .deps depending on whether they were built manually before, so forgetting to build them can easily go through unnoticed, which is what initially happened in MacPorts. Using the wrapper Makefile is not a good option there.

CMake has extensive builtin functionality for embedding other cmake roots, from e.g. a simple add_subdirectory to ExternalProject/FetchContent. Having cmake.deps/ as a standalone-buildable project makes sense because you can point multiple neovim builds to it. But I was not able to find reasons in the codebase or past issues why there had to be a wrapper-Makefile around the toplevel cmake for tasks that cmake itself should be able to handle just fine.
So my question is: Was this a deliberate design decision and what was the reasoning behind it?

Expected behavior

I could imagine a build system like this:

  • cmake.deps/ stays mostly as-is: A standalone project for bundled dependencies
  • The toplevel CMakeLists.txt can (using options) be manually pointed to a .deps folder, told to not use bundled dependencies at all, or it will embed cmake.deps/ itself using something like ExternalProject and build and manage a specified subset of dependencies in its own build dir. It should fail the build if building any of the dependencies fails.
  • Makefile can be removed, or at least stripped down a lot to only go through the toplevel cmake.
@thestr4ng3r thestr4ng3r added the enhancement feature request label May 27, 2024
@wookayin wookayin added the build building and installing Neovim using the provided scripts label May 27, 2024
@dundargoc
Copy link
Member

This is basically impossible without nasty hacks. An attempt of this was made in #22054 and the problem is that FetchContent only works with projects who use cmake and we have many dependencies who don't use cmake. We can't use ExternalProject either as that is run during build time instead of configure time which will make the main build error as dependencies (that now don't exist) need to be resolved in configure time. It's a nasty situation cmake has explicitly expressed they will not try to fix as "cmake is not a package manager". A workaround for this is to fork all our dependencies that don't use cmake and add a top level Cmakelists.txt file or something, but it didn't seem popular when I brought this up. The best hope I can give is that we may change the build system in the future: #28344.

With that background out of the way: I don't quite understand why this is a problem. Distros are usually expected to package everything themselves and should able to just run the main build since all dependencies are already available. I don't quite get what blocks you from just using the main file (assuming you've already packaged all dependencies).

@thestr4ng3r
Copy link
Author

Thank you for linking the pr, this is what I was looking for. Good to see this was already approached.

We can't use ExternalProject either as that is run during build time instead of configure time which will make the main build error as dependencies (that now don't exist) need to be resolved in configure time.

This can probably be worked around by predicting the exact paths of installed files by the external project and putting dependencies on the target created by ExternalProject_Add instead of using a find module. But I understand this may be considered hacky and it also adds another code path to resolve each dependency.

With that background out of the way: I don't quite understand why this is a problem. Distros are usually expected to package everything themselves and should able to just run the main build since all dependencies are already available. I don't quite get what blocks you from just using the main file (assuming you've already packaged all dependencies).

This is the case for MacPorts, except for the tree-sitter parsers, which are expected to be installed in a neovim-specific directory. In that sense:

  • Should those parsers really be considered an external dependency or rather a part of neovim itself?
  • Could generic packages of those parsers, like https://ports.macports.org/port/tree-sitter-c/details/ be used reliably with neovim or is the ABI/API too unstable so neovim needs the exact version from its own build system?
  • Or, since they are already being built with a custom injected CMakeLists.txt, would it be reasonable to only integrate those into the main cmake somehow?

@clason
Copy link
Member

clason commented May 27, 2024

Thank you for asking these questions. Yes, these parsers should be considered part of Neovim, just like the other runtime files, and you cannot rely on "semantic version" stability (the exact same version should work, but no guarantees -- and there is a virtual guarantee that no two consumers (Helix, Emacs, Neovim) will depend on the same version.

@dundargoc
Copy link
Member

dundargoc commented May 27, 2024

This can probably be worked around by predicting the exact paths of installed files by the external project and putting dependencies on the target created by ExternalProject_Add instead of using a find module.

We don't use a find module for these because they are runtime dependencies. It is not cmake that locates these but neovim itself. Neovim relies on these being in the correct location during runtime.

Should those parsers really be considered an external dependency or rather a part of neovim itself?

I don't know if this is a philosophical question or a practical one. They are dependencies because we don't maintain them. If you mean that we should treat them as they are part of neovim (like with runtime files) then that is a different question. It's probably not a popular suggestion as other distros will throw a hissy fit, although I don't remember exactly why. They don't like this though.

Could generic packages of those parsers, like ports.macports.org/port/tree-sitter-c/details be used reliably with neovim or is the ABI/API too unstable so neovim needs the exact version from its own build system?

Exact

@dundargoc
Copy link
Member

Or, since they are already being built with a custom injected CMakeLists.txt, would it be reasonable to only integrate those into the main cmake somehow?

I'll need to think about this. Normally this would be out of the question for dependencies that have other dependencies relying on them (such as luajit), but it might be possible for dependencies that don't have other dependencies relying on them (such as parsers IIRC). I will need to experiment first to say for sure.

@dundargoc
Copy link
Member

I still don't understand why packaging the parsers are not an option. Is it too much work to package the parsers or what's the problem?

@dundargoc dundargoc added the needs:response waiting for reply from the author label May 29, 2024
@thestr4ng3r
Copy link
Author

I don't know if this is a philosophical question or a practical one. They are dependencies because we don't maintain them. If you mean that we should treat them as they are part of neovim (like with runtime files) then that is a different question.

The latter. It seems more logical to me that in the specific case of the parsers, which need to be an exact version and installed in an nvim-specific location, they are part of the core nvim build itself.
It would also be more convenient if I build manually since right now I am forced to build cmake.deps/ for the parsers, even if I try to supply all dependencies from my distribution because the parsers can't be reliably used from generic packages.

It's probably not a popular suggestion as other distros will throw a hissy fit, although I don't remember exactly why. They don't like this though.

They usually want a library to be installed in a single location by a single package, so it can be patched and updated for bug and security fixes without having to dig through all downstream packages that use it and where it might be bundled somehow with an outdated version.
This is absolutely reasonable for most dependencies, but not so much for the neovim parsers as those have to be an exact version. Of course one could split it into e.g. two packages neovim and neovim-parsers, but that does not bring much benefit as neovim-parsers has no purpose other than being a dependency of neovim and can't be updated independently anyway.

I still don't understand why packaging the parsers are not an option. Is it too much work to package the parsers or what's the problem?

It is an option, but it is not perfect. Basically there are two possibilities:

@github-actions github-actions bot removed the needs:response waiting for reply from the author label May 29, 2024
@dundargoc
Copy link
Member

I think it's possible to move the parsers download/installation to main build. Only question is if it makes sense to do in general and if other distributions have a problem with this (there are others).

@neovim/automation thoughts?

@jamessan
Copy link
Member

Speaking with my Debian hat on, I'm packaging the parsers independently. This is in the (possibly misguided) hope that the tree-sitter story around parser distribution / versioning / interface stability improves in the long run, so there doesn't need to be such strict association between parsers and users of the parsers.

It seems more logical to me that in the specific case of the parsers, which need to be an exact version and installed in an nvim-specific location, they are part of the core nvim build itself.

The parser packages in Debian aren't shipping pre-built parsers, partly for the reasons you mention here. The "build" of the parser packages re-generates the src/ directory using tree-sitter-cli and then builds a package from that. This package contains the code needed to build the .so as well as a helper Makefile to hide the details of the build itself. The intent is that from the perspective of packages needing parsers, they have a standard mechanism to build any of the packaged parsers.

Then neovim (and eventually other packages) have build dependencies on, e.g., tree-sitter-c-src, and use the respective helper Makefile to build the version of the parser they need. They then install the resulting shared object where needed (in &rtp/parser/<foo>.so for neovim).

It's all structured so that, if needed, multiple versions of a parser can be available and each dependent package can build the one it needs, although neovim is currently the only consumer so I haven't actually started providing multiple versions.

Yes, this was a lot of up front work, but it ensures Debian can regenerate all parsers, if needed, and provides a standard interface (which I can simplify going forward) that any package in Debian can use. Adding new parsers is pretty simple. Updating the tree-sitter tooling is much more involved.


All that being said, I also understand that given the current state of the tree-sitter ecosystem, it's much more attractive to just make Neovim build everything itself. At a minimum, that would require vendoring the parsers, since many distributions don't allow network access during the build.

@dundargoc
Copy link
Member

dundargoc commented May 30, 2024

We can't vendor parsers atm as they'd be system dependent. If vendoring parsers is the approach we wanna take then we need wasmtime up and running first.

@thestr4ng3r
Copy link
Author

We can't vendor parsers atm as they'd be system dependent. If vendoring parsers is the approach we wanna take then we need wasmtime up and running first.

If I understand the initial request right, this would be about vendoring the parser sources, not the binaries. Only to avoid the download.
I could imagine this to happen as part of an automated build process of a release source tarball for neovim that the parser sources are also bundled in, and the build system takes these pre-cached sources if available and skips the download.

@clason
Copy link
Member

clason commented May 30, 2024

If I understand the initial request right, this would be about vendoring the parser sources, not the binaries. Only to avoid the download.

Have you seen the size of some of these parsers? We do not want them in our source tree and blowing up the repo.

@thestr4ng3r
Copy link
Author

Have you seen the size of some of these parsers? We do not want them in our source tree and blowing up the repo.

Absolutely. This is why I suggested to put them into release tarballs only, not into the repo. If you have the tarball and want to build it, you would then need to download them sooner or later anyway so there it makes no difference anymore.

@clason
Copy link
Member

clason commented May 30, 2024

Our tarballs are (intentionally!) identical to the source tree, though. (Insert xz reference here.)

@dundargoc
Copy link
Member

dundargoc commented May 30, 2024

Absolutely. This is why I suggested to put them into release tarballs only, not into the repo.

Then that is not vendoring in neovim(?) lingo, vendoring is including external source code into the neovim repo. Your suggestion is also not any different than just moving the parser installation from the deps cmake build to the main cmake build, which is basically what I proposed. This approach would help macports, but not tryhard distros like debian and fedora as they can't use internet connection.

@jamessan's suggestion of vendoring parsers would overall help more distros (including macports), but is also more ambitious. That is probably a decent long term goal/workaround.

@thestr4ng3r
Copy link
Author

Then that is not vendoring in neovim(?) lingo, vendoring is including external source code into the neovim repo. Your suggestion is also not any different than just moving the parser installation from the deps cmake build to the main cmake build, which is basically what I proposed. This approach would help macports, but not tryhard distros like debian and fedora as they can't use internet connection.

There are two suggestions here. We should not mix them:

  • Move parser installation from deps cmake to main cmake: helps macports and the overall build process with other dependencies provided by the system
  • Include parser sources in neovim release tarballs (not git): helps building without allowing network access

@clason
Copy link
Member

clason commented May 31, 2024

Include parser sources in neovim release tarballs (not git): helps building without allowing network access

Completely out of question. Don't waste your time trying to argue about that.

@thestr4ng3r
Copy link
Author

Alright, then I think this entire issue now boils down to just the question whether moving the tree-sitter parser handling from cmake.deps to the main cmake build is possible and sensible.

@thestr4ng3r thestr4ng3r changed the title Properly integrate cmake.deps into toplevel build system Integrate tree-sitter parsers into the toplevel build system May 31, 2024
@thestr4ng3r
Copy link
Author

As a PoC, macports now uses this patch to integrate the parsers into the toplevel cmake: https://github.com/macports/macports-ports/blob/77f8d99816f5e9ac3ef7869a772f79f1d26f8f24/editors/neovim/files/embed-parsers-build.diff
This patch is made to be small and easily adaptable to future neovim versions, so not meant to be upstreamed as-is of course, but at least it shows that it can theoretically work.
Also, if possible FetchContent + add_subdirectory may be better than the current ExternalProject since this would avoid having to manually pass through cmake args.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build building and installing Neovim using the provided scripts enhancement feature request
Projects
None yet
Development

No branches or pull requests

5 participants