Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] tree-sitter step 2: query API and basic highlighing #11113

Open
wants to merge 35 commits into
base: master
from

Conversation

@bfredl
Copy link
Member

bfredl commented Sep 28, 2019

This PR will wrap the query API tree-sitter/tree-sitter#444 which will form the basis of tree-sitter highlighting, and will be simpler and cleaner than the previous approach #9219. More of the matching/cursor logic is now handled by tree-sitter itself, but what we need to implement ourselves is the handling of predicates tree-sitter/tree-sitter#444 (comment) , as it involves equality/regex matching of the buffer text.

@bfredl

This comment has been minimized.

Copy link
Member Author

bfredl commented Sep 28, 2019

currently this implements a very limited API, you can parse a query and iterate over all captured nodes:

vim.treesitter.add_language("/path/to/bin/c.so", "c")
parser = vim.treesitter.get_parser(1) -- assuming buffer 1 is a C buffer
root = parser:parse():root()

-- match any function with a storage specifier like "static"
query = vim.treesitter.parse_query("c", "(function_definition (storage_class_specifier) @funcclass) @funcdef")
for k,v in root:query(query) do
  print(k, vim.inspect({v:range()}))
  --print(v:sexpr())
end
@oblitum

This comment has been minimized.

Copy link

oblitum commented Sep 28, 2019

How many steps are expected?

@bfredl

This comment has been minimized.

Copy link
Member Author

bfredl commented Sep 28, 2019

As many as turn out to be needed. I've come to dislike long-running feature branches, now that we have a separate "stable" branch, we should use master more for long run in-progress feature development. I will try as much as possible to work with smaller, incremental PRs, of course using the full battery of regression testing and such.

That being said, this PR will also contain a first working prototype of syntax highlighting, as the most important constraint for getting the query API right is that it works effectively for syntax highlighting (though expected features such as efficient multi-lang highlighting, tight performance optimization, etc, might come with later PRs).

@oblitum

This comment has been minimized.

Copy link

oblitum commented Sep 28, 2019

Ah, OK. Thanks for the response. I asked because I got the impression "steps" were part of a definite step plan already.

@bfredl bfredl changed the title [WIP] tree-sitter step 2: query API [WIP] tree-sitter step 2: query API and basic highlighing Sep 29, 2019
@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch from de91cd9 to 25f4fbb Oct 1, 2019
int nextpred = 1;
int nextitem = 1;
for (size_t k = 0; k < len; k++) {
if (step[k].type == TSQueryPredicateStepTypeDone) {

This comment has been minimized.

Copy link
@resolritter

resolritter Oct 4, 2019

what do you think about coding this for loop as a switch statement? I don't usually suggest "equivalent replacements", but this block of code looks a lot like a switch statement on the matches it's doing. A switch might make it more intuitive in its intentions.

This comment has been minimized.

Copy link
@bfredl

bfredl Oct 4, 2019

Author Member

Perhaps, but I will consider matter of style after the complete implementation is done. It might turn out lua is to slow for this, then we will do predicate matching in C instead.

src/nvim/lua/treesitter.c Outdated Show resolved Hide resolved
@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch 3 times, most recently from 962518f to 22a1323 Oct 5, 2019
@bfredl

This comment has been minimized.

Copy link
Member Author

bfredl commented Oct 6, 2019

Just as a heads-up: query based highlighting is now basically working. Only (eq? @capture "text") and (eq? @aa @bb) predicates have been implemented, but I think is good enough to get started. Though it doesn't react correctly/efficiently to changes always (we need ts_tree_get_changed_ranges for that)

I've temporarily included a test file e5c8d0d#diff-fe66df02e13447bca8b7d16efe1a114e to try out a simple query for current C buffer (the usual caveats apply, you have to compile and load the c parser yourself). I will soon get this to a state good enough for dogfooding and some basic benchmarking.

The main open question if if we can get it "fast enough" with a lua based implementation, or if we need to rewrite parts/most of it in C. If we merge anything lua-based, we will need to include the nvim_buf_set_luahl interface (callbacks into lua code during win_update/win_line). Though we can communicate clearly that this interface is subject to change (or even removal) during the 0.5 cycle, in order to make it efficient.

@resolritter

This comment has been minimized.

Copy link

resolritter commented Oct 6, 2019

@bfredl

Though it doesn't react correctly/efficiently to changes always (we need ts_tree_get_changed_ranges for that).

I had a look on where Atom handles updates and found those places:

bufferDidChange({ oldRange, newRange, oldText, newText }) {
https://github.com/atom/atom/blob/master/src/tree-sitter-language-mode.js#L89

handleTextChange
https://github.com/atom/atom/blob/master/src/tree-sitter-language-mode.js#L657

const rangesWithSyntaxChanges = this.tree.getChangedRanges(tree);
https://github.com/atom/atom/blob/master/src/tree-sitter-language-mode.js#L772

_treeEditForBufferChange(start, oldEnd, newEnd, oldText, newText) {
https://github.com/atom/atom/blob/master/src/tree-sitter-language-mode.js#L922

Just for clarification on the behalf of other people trying to follow like me, is this API not conforming or not benefiting from with the changes in tree-sitter/tree-sitter#444 ? Because I notice the PR was merged in Sep 18, but the last update in that Atom file is from Aug 1.

@bfredl

This comment has been minimized.

Copy link
Member Author

bfredl commented Oct 6, 2019

@resolritter I think they are mostly orthogonal (from an API design standpoint). ts_tree_get_changed_ranges tells us what lines have syntax changes after a reparse. tree-sitter/tree-sitter#444 offers an API to calculate the highlights for a given line range, which can be influenced by the former, but also other events, like scrolling (which already is handled in this PR).

@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch from e5c8d0d to a317811 Oct 6, 2019
@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch 3 times, most recently from 9d54067 to 8de8f46 Oct 25, 2019
@bfredl

This comment has been minimized.

Copy link
Member Author

bfredl commented Oct 26, 2019

Update: redrawing highlights to changes to multi-line constructs like /* ... */ now works properly for basic cases (like if the start /* is removed, the entire region needs to be redrawn).

This should now be in a good enough shape that I can dogfood it as the highlighter for C buffers at least. Though there are many smaller bugs and glitches that need to be fixed before I can recommend it for more general use (like highlights disappear temporarily after a long jump, need to investigate).
Screenshot from 2019-10-26 20-04-56

The screenshot shows some examples of tree-based highlighting, like highlighting user types, marking a binary expression with identical operands in red color, and highlighting the name of static functions specifically.

@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch from 8de8f46 to 414113e Nov 3, 2019
@bfredl

This comment has been minimized.

Copy link
Member Author

bfredl commented Nov 3, 2019

Did some initial profiling with unchanged code only. Of the total time spent in the win_line lua call and below, 90% is spent in tree-sitter C code, i e maximally 10% overhead of using lua, which should be acceptable (though it would become more with more involved predicate logic).

However, to fairly benchmark against regex highlighting we need to also include reparsing cost on change, by emulating changes to the buffer by some systemic process or perhaps by logging change events from a real edit session and replay it.

@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch 2 times, most recently from b7210dd to 5cf1653 Nov 4, 2019
@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch 2 times, most recently from e89ddbb to 4684d02 Nov 15, 2019
@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch from 4684d02 to 38460dd Nov 24, 2019
@bfredl bfredl force-pushed the bfredl:tree-sitter-query branch from 57a3f94 to 7de231a Dec 1, 2019
@lgtm-com

This comment has been minimized.

Copy link

lgtm-com bot commented Dec 1, 2019

This pull request introduces 1 alert when merging 7de231a into a17ccb0 - view on LGTM.com

new alerts:

  • 1 for Comparison of narrow type with wide type in loop condition
@neovim neovim deleted a comment from lgtm-com bot Dec 1, 2019
@lgtm-com

This comment has been minimized.

Copy link

lgtm-com bot commented Dec 1, 2019

This pull request introduces 1 alert when merging b442c46 into 70b6061 - view on LGTM.com

new alerts:

  • 1 for Comparison of narrow type with wide type in loop condition
@maxbrunsfeld

This comment has been minimized.

Copy link

maxbrunsfeld commented Dec 10, 2019

@bfredl Just a heads up that you may want to upgrade to the latest version of the Tree-sitter library on this branch, because I've published new versions of all of the parsers that provide a new ABI version which is incompatible with older versions of the library, so the new parsers will fail to load with the old library. Fortunately, the new version of the library is still capable of loading old versions of the parsers, so there should be no downside to upgrading.

The main benefit of the new ABI is that parsers have much smaller binary sizes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.