Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Tree-sitter step 1: vendor runtime lib + add lua API #10124

Open
wants to merge 43 commits into
base: master
from

Conversation

@bfredl
Copy link
Member

commented Jun 5, 2019

#9219 has already grown pretty large. To get something reviewable and mergable in a resonable timeframe, I will try to divide it into smaller PRs. My suggestion for a MVP is the following: vendor the tree-sitter runtime lib into our source-tree, and expose syntax trees as a lua API. More specific features such as syntax highlighting and multi-language parsing will be follow up PRs.

Vendoring the runtime allows us to always build the lua tree-sitter bindings and support code (there will be no FEAT_TREESITTER), without making the build any more complicated for distributors. The runtime lib without parsers is not comparatively large, about as big as nvim's syntax.c in its entirety. Though I expect libutf8proc to be common in distros and "native binary" package solutions already, so we should be able to treat that as an ordinary dependency.

On the other hand, the tree-sitter parsers for specific languages will initially not be built by the nvim build system. Instead we expect those who experiment with the MVP to e.g. build the parsers they want themselves for their respective platform, and nvim will provide an interface to load them as dynamic libraries. This means we can discuss and work on a reasonable final distribution strategy in parallel with other tasks (as mentioned above).

CC @maxbrunsfeld if you have any comments of our vendoring strategy.

@bfredl bfredl changed the title [WIP] Tree-sitter step 1: vendor tree-sitter runtime + add lua API [WIP] Tree-sitter step 1: vendor runtime lib + add lua API Jun 5, 2019

@marvim marvim added the WIP label Jun 5, 2019

@maxbrunsfeld

This comment has been minimized.

Copy link

commented Jun 5, 2019

Vendoring the runtime

nvim will provide an interface to load them as dynamic libraries.

👍 This makes sense to me.

In case it's useful as a reference, the Tree-sitter CLI has logic for automatically compiling Tree-sitter parsers into dynamic libraries and dynamically loading them when you run the tree-sitter test and tree-sitter parse commands.

Great to see this happening! Let me know if I can help with anything.

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

In case it's useful as a reference, the Tree-sitter CLI has logic for automatically compiling Tree-sitter parsers into dynamic libraries and dynamically loading them when you run the tree-sitter test and tree-sitter parse commands.

That sounds nice. Maybe we can use it to automate building of parsers for our own binary packages (like nvim.appimage) later on.

@phodge

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

Does this main tree-sitter will be doing all its work in the main thread? And if so, would it be hard to push this processing work out into a child process using buffer update events and RPC in the future?

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

@phodge the idea is is that most parses would be really quick. Thus we benefit from the simplicity of accessing the tree-sitter node API in-process with a thin lua wrapper. I think atom uses a hybrid method, it parses in the main thread up to an operation limit, but then pushes the remaining parsing job to a separate thread if it is too slow. It would be interesting to see if we can do something similar.

@phodge

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

the idea is is that most parses would be really quick. Thus we benefit from the simplicity of accessing the tree-sitter node API in-process with a thin lua wrapper.

I understand that Tree-sitter is designed to be fast enough for running in the main thread, and that it makes the APIs much simpler if we do it in the main thread. But I think we should be cautious about this approach because these are both reasons why Vim failed us and we had to fork Neovim. Too many things competing for a single CPU core, and no way to distribute the workload because IPC is hard. But tree-sitter in a separate process is probably a problem for the future.

If it's not too much to ask, I'd love to see a help text explaining how to use the Lua functions to access the syntax tree. It looks like currently all I can do is turn on tree-sitter for a buffer and then ask TS to highlight the node under the cursor?

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

If it's not too much to ask, I'd love to see a help text explaining how to use the Lua functions to access the syntax tree. It looks like currently all I can do is turn on tree-sitter for a buffer and then ask TS to highlight the node under the cursor?

Sure, apart form fixing the build system, the plan is to add docs and tests for part of the API we will expose in this first PR :)

@maxbrunsfeld

This comment has been minimized.

Copy link

commented Jun 5, 2019

it parses in the main thread up to an operation limit, but then pushes the remaining parsing job to a separate thread if it is too slow. It would be interesting to see if we can do something similar.

Yeah, Atom uses this API to set a timeout of ~1 millisecond for main thread parsing. And if that timeout expires, it resumes parsing on a background thread. The syntax trees are persistent, so they can be cheaply copied for use an a background thread.

I'm curious how this would translate to Neovim. In Atom's case, it relies on the text buffer having a snapshot API that allows you to safely read from a certain version of the buffer in a background thread, even if the user is mutating the buffer on the main thread. Does the vim buffer have a way to do something like that?

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 5, 2019

I'm curious how this would translate to Neovim. In Atom's case, it relies on the text buffer having a snapshot API that allows you to safely read from a certain version of the buffer in a background thread, even if the user is mutating the buffer on the main thread. Does the vim buffer have a way to do something like that?

There is some form of "locking" mechanism to lock parts of the buffer in the backing storage (memline.c), but I don't think it was ever intended for safe multithreaded usage :P We probably have to make a straight copy of the modified part of the buffer in that case. But ts_tree_copy would still be a reason to consider multi-threading and not only multi-processing.

@justinmk

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

Too many things competing for a single CPU core, and no way to distribute the workload because IPC is hard

#9943 is exploring ways to make IPC easier.

@bfredl bfredl force-pushed the bfredl:tree-sitter-api branch 2 times, most recently from 9e974ff to d611726 Jun 6, 2019

@teto teto referenced this pull request Jun 7, 2019

Merged

vimPlugins.semshi: init at 2018-12-05 #62787

6 of 10 tasks complete

@bfredl bfredl force-pushed the bfredl:tree-sitter-api branch 2 times, most recently from 808ad4d to 1c600f6 Jun 7, 2019

@justinmk

This comment has been minimized.

Copy link
Member

commented Jun 9, 2019

bikeshed: would be nice if "tree sitter" is spelled treesitter everywhere (filenames etc.), so that searching for it is more predictable.

@bfredl bfredl force-pushed the bfredl:tree-sitter-api branch 2 times, most recently from a8607a4 to ce4d57d Jun 9, 2019

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 9, 2019

Update: basic tree-sitter test now works on travis (64-bit linux and OS X). windows/appveyor still requires some work...

@bfredl bfredl force-pushed the bfredl:tree-sitter-api branch 2 times, most recently from e876bae to 46a3f4e Jun 15, 2019

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 15, 2019

Building the parser and running tests now work on 32-bit Appveyor (MINGW_64 nvim also compiles, but the parser .dll built by tree-sitter-cli is 32-bit and won't load).

@bfredl bfredl force-pushed the bfredl:tree-sitter-api branch 2 times, most recently from 272bb26 to 4ae02cb Jun 17, 2019

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 21, 2019

@justinmk would it be possible to install 64-bit node instead on the 64-bit appveyor builds? Or should we just skip tree-sitter tests on MINGW_64 for now? (We can look into compiling the languages by ourselves later, node is not really required).

@maxbrunsfeld

This comment has been minimized.

Copy link

commented Jun 22, 2019

the parser .dll built by tree-sitter-cli is 32-bit and won't load

would it be possible to install 64-bit node instead on the 64-bit appveyor builds?

If for some reason you don't want to change the Node installation, you could also download the 64-bit tree-sitter binary directly from GitHub, instead of installing it with npm. When you install it with npm, it just detects the platform and architecture and downloads the appropriate binary from the GitHub release.

@bfredl

This comment has been minimized.

Copy link
Member Author

commented Jun 22, 2019

@maxbrunsfeld Thanks. that would cover all travis/appveyor builds except 32-bit linux. I guess we could bulid that binary ourselves and keep it in the cached dependencies, for now.

@bfredl bfredl force-pushed the bfredl:tree-sitter-api branch from d4278d0 to 8f734a3 Jun 22, 2019

@bfredl bfredl force-pushed the bfredl:tree-sitter-api branch from 45d4ef0 to 32aeb0f Aug 5, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.