fix(treesitter): correctly handle query quantifiers (#24738)
Query patterns can contain quantifiers (e.g. (foo)+ @bar), so a single
capture can map to multiple nodes. The iter_matches API cannot handle
this situation because the match table incorrectly maps capture indices
to a single node instead of to an array of nodes.

The match table should be updated to map capture indices to an array of
nodes. However, this is a massively breaking change, so must be done
with a proper deprecation period.

`iter_matches`, `add_predicate` and `add_directive` must opt in to the
correct behavior for backward compatibility. This is done with a new
"all" option. This option will become the default and be removed after
the 0.10 release.

Co-authored-by: Christian Clason <c.clason@uni-graz.at>
Co-authored-by: MDeiml <matthias@deiml.net>
Co-authored-by: Gregory Anders <greg@gpanders.com>
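
The change in match-table shape described in the message above can be sketched in plain Lua. This is only an illustration under stated assumptions: the strings stand in for real TSNode objects, `capture_id` is a hypothetical capture index, and `nodes_of` is a hypothetical helper, not part of Neovim.

```lua
-- Stand-ins for TSNode objects (assumption: real nodes are userdata, not strings).
local comment_nodes = { "-- TODO: This is a", "-- very long", "-- comment" }
local capture_id = 1 -- hypothetical index of @comment in query.captures

-- Legacy shape (all = false): one node per capture, so only the last node survives.
local legacy_match = { [capture_id] = comment_nodes[#comment_nodes] }

-- New shape (all = true): each capture index maps to a list of nodes.
local full_match = { [capture_id] = comment_nodes }

-- Hypothetical helper: accept either shape and always return a list.
-- Relies on the list being a Lua table while a single node is not.
local function nodes_of(match, id)
  local value = match[id]
  if value == nil then
    return {}
  elseif type(value) == "table" then
    return value
  end
  return { value }
end

print(#nodes_of(legacy_match, capture_id)) -- 1
print(#nodes_of(full_match, capture_id))   -- 3
```

A caller that must support both behaviors during the deprecation period could normalize through a helper like this before iterating.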
4 people committed Feb 16, 2024
1 parent 1ba3500 commit bd5008d
Showing 9 changed files with 787 additions and 219 deletions.
14 changes: 13 additions & 1 deletion runtime/doc/news.txt
@@ -427,6 +427,18 @@ The following changes to existing APIs or features add new behavior.

|nvim_buf_call()| and |nvim_win_call()| now preserve any return value (NB: not multiple return values)

• Treesitter
|Query:iter_matches()|, |vim.treesitter.query.add_predicate()|, and
|vim.treesitter.query.add_directive()| accept a new `all` option which
ensures that all matching nodes are returned as a table. The default option
`all=false` returns only a single node, breaking captures with quantifiers
like `(comment)+ @comment`; it is only provided for backward compatibility
and will be removed after Nvim 0.10.
|vim.treesitter.query.add_predicate()| and
|vim.treesitter.query.add_directive()| now accept an options table rather
than a boolean "force" argument. To force a predicate or directive to
override an existing predicate or directive, use `{ force = true }`.

==============================================================================
REMOVED FEATURES *news-removed*

@@ -480,7 +492,7 @@ release.

`vim.loop` has been renamed to |vim.uv|.

• vim.treesitter.languagetree functions:
• vim.treesitter functions:
- |LanguageTree:for_each_child()| Use |LanguageTree:children()| (non-recursive) instead.

• The "term_background" UI option |ui-ext-options| is deprecated and no longer
109 changes: 89 additions & 20 deletions runtime/doc/treesitter.txt
@@ -223,6 +223,10 @@ The following predicates are built in:
((identifier) @variable.builtin (#eq? @variable.builtin "self"))
((node1) @left (node2) @right (#eq? @left @right))
<
`any-eq?` *treesitter-predicate-any-eq?*
Like `eq?`, but for quantified patterns only one captured node must
match.

`match?` *treesitter-predicate-match?*
`vim-match?` *treesitter-predicate-vim-match?*
Match a |regexp| against the text corresponding to a node: >query
@@ -231,15 +235,28 @@ The following predicates are built in:
Note: The `^` and `$` anchors will match the start and end of the
node's text.

`any-match?` *treesitter-predicate-any-match?*
`any-vim-match?` *treesitter-predicate-any-vim-match?*
Like `match?`, but for quantified patterns only one captured node must
match.

`lua-match?` *treesitter-predicate-lua-match?*
Match |lua-patterns| against the text corresponding to a node,
similar to `match?`

`any-lua-match?` *treesitter-predicate-any-lua-match?*
Like `lua-match?`, but for quantified patterns only one captured node
must match.

`contains?` *treesitter-predicate-contains?*
Match a string against parts of the text corresponding to a node: >query
((identifier) @foo (#contains? @foo "foo"))
((identifier) @foo-bar (#contains? @foo-bar "foo" "bar"))
<
`any-contains?` *treesitter-predicate-any-contains?*
Like `contains?`, but for quantified patterns only one captured node
must match.

`any-of?` *treesitter-predicate-any-of?*
Match any of the given strings against the text corresponding to
a node: >query
@@ -265,6 +282,32 @@ The following predicates are built in:
Each predicate has a `not-` prefixed predicate that is just the negation of
the predicate.

*lua-treesitter-all-predicate*
*lua-treesitter-any-predicate*
Queries can use quantifiers to capture multiple nodes. When a capture contains
multiple nodes, predicates match only if ALL nodes contained by the capture
match the predicate. Some predicates (`eq?`, `match?`, `lua-match?`,
`contains?`) accept an `any-` prefix to instead match if ANY of the nodes
contained by the capture match the predicate.

As an example, consider the following Lua code: >lua

-- TODO: This is a
-- very long
-- comment (just imagine it)
<
using the following predicated query:
>query
(((comment)+ @comment)
(#match? @comment "TODO"))
<
This query will not match because not all of the nodes captured by @comment
match the predicate. Instead, use:
>query
(((comment)+ @comment)
(#any-match? @comment "TODO"))
<
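
The ALL versus ANY semantics described above can be sketched in plain Lua. This is a minimal illustration, not Neovim's implementation: `all_match` and `any_match` are hypothetical helpers, the strings stand in for the text of each captured node, and a plain substring search replaces the regex matching done by `match?`.

```lua
-- Hypothetical helper: true only if every text contains `needle`
-- (mirrors the default behavior of predicates on quantified captures).
local function all_match(texts, needle)
  for _, text in ipairs(texts) do
    if not string.find(text, needle, 1, true) then
      return false
    end
  end
  return true
end

-- Hypothetical helper: true if at least one text contains `needle`
-- (mirrors the `any-` prefixed predicates).
local function any_match(texts, needle)
  for _, text in ipairs(texts) do
    if string.find(text, needle, 1, true) then
      return true
    end
  end
  return false
end

-- Text of the three comment nodes captured by `(comment)+ @comment`.
local comment_lines = {
  "-- TODO: This is a",
  "-- very long",
  "-- comment (just imagine it)",
}

print(all_match(comment_lines, "TODO")) -- false
print(any_match(comment_lines, "TODO")) -- true
```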

Further predicates can be added via |vim.treesitter.query.add_predicate()|.
Use |vim.treesitter.query.list_predicates()| to list all available predicates.

@@ -923,36 +966,50 @@ register({lang}, {filetype}) *vim.treesitter.language.register()*
Lua module: vim.treesitter.query *lua-treesitter-query*

*vim.treesitter.query.add_directive()*
add_directive({name}, {handler}, {force})
add_directive({name}, {handler}, {opts})
Adds a new directive to be used in queries

Handlers can set match level data by setting directly on the metadata
object `metadata.key = value`, additionally, handlers can set node level
object `metadata.key = value`. Additionally, handlers can set node level
data by using the capture id on the metadata table
`metadata[capture_id].key = value`

Parameters: ~
{name} (`string`) Name of the directive, without leading #
{handler} (`function`)
• match: see |treesitter-query|
• node-level data are accessible via `match[capture_id]`

• pattern: see |treesitter-query|
• match: A table mapping capture IDs to a list of captured
nodes
• pattern: the index of the matching pattern in the query
file
• predicate: list of strings containing the full directive
being called, e.g. `(node (#set! conceal "-"))` would get
the predicate `{ "#set!", "conceal", "-" }`
{force} (`boolean?`)
{opts} (`table<string, any>`) Optional options:
• force (boolean): Override an existing directive of the
same name
• all (boolean): Use the correct implementation of the
match table where capture IDs map to a list of nodes
instead of a single node. Defaults to false (for backward
compatibility). This option will eventually become the
default and be removed.

*vim.treesitter.query.add_predicate()*
add_predicate({name}, {handler}, {force})
add_predicate({name}, {handler}, {opts})
Adds a new predicate to be used in queries

Parameters: ~
{name} (`string`) Name of the predicate, without leading #
{handler} (`function`)
• see |vim.treesitter.query.add_directive()| for argument
meanings
{force} (`boolean?`)
{opts} (`table<string, any>`) Optional options:
• force (boolean): Override an existing predicate of the
same name
• all (boolean): Use the correct implementation of the
match table where capture IDs map to a list of nodes
instead of a single node. Defaults to false (for backward
compatibility). This option will eventually become the
default and be removed.
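
As a rough sketch of how the new options table might be used, the snippet below registers a predicate with `{ force = true, all = true }`. The predicate name `every-contains?` and the helper `every_text_contains` are hypothetical. The handler argument order `(match, pattern, source, predicate)` is an assumption about Neovim's handler signature, since the documentation above lists only match, pattern and predicate. Registration is guarded so the file also loads outside Neovim.

```lua
-- Hypothetical helper: true only if every text contains `needle` (plain search).
local function every_text_contains(texts, needle)
  for _, text in ipairs(texts) do
    if not string.find(text, needle, 1, true) then
      return false
    end
  end
  return true
end

-- Handler sketch. With `all = true`, match[capture_id] is a list of nodes.
-- Assumption: predicate[2] is the capture ID and predicate[3] the string argument.
local function every_contains_handler(match, _pattern, source, predicate)
  local nodes = match[predicate[2]] or {}
  local texts = {}
  for _, node in ipairs(nodes) do
    texts[#texts + 1] = vim.treesitter.get_node_text(node, source)
  end
  return every_text_contains(texts, predicate[3])
end

-- Register only when running inside Neovim.
if vim and vim.treesitter then
  vim.treesitter.query.add_predicate(
    "every-contains?", -- hypothetical predicate name
    every_contains_handler,
    { force = true, all = true }
  )
end
```

Passing `force = true` in the table replaces the former boolean third argument, and `all = true` opts the handler in to the list-valued match table.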

edit({lang}) *vim.treesitter.query.edit()*
Opens a live editor to query the buffer you started from.
@@ -1102,18 +1159,25 @@ Query:iter_matches({node}, {source}, {start}, {stop}, {opts})
Iterate over all matches within a {node}. The arguments are the same as
for |Query:iter_captures()| but the iterated values are different: an
(1-based) index of the pattern in the query, a table mapping capture
indices to nodes, and metadata from any directives processing the match.
If the query has more than one pattern, the capture table might be sparse
and e.g. `pairs()` method should be used over `ipairs`. Here is an example
iterating over all captures in every match: >lua
for pattern, match, metadata in cquery:iter_matches(tree:root(), bufnr, first, last) do
for id, node in pairs(match) do
local name = query.captures[id]
-- `node` was captured by the `name` capture in the match
indices to a list of nodes, and metadata from any directives processing
the match.

local node_data = metadata[id] -- Node level metadata
WARNING: Set `all=true` to ensure all matching nodes in a match are
returned, otherwise only the last node in a match is returned, breaking
captures involving quantifiers such as `(comment)+ @comment`. The default
option `all=false` is only provided for backward compatibility and will be
removed after Nvim 0.10.

-- ... use the info here ...
Example: >lua
for pattern, match, metadata in cquery:iter_matches(tree:root(), bufnr, 0, -1, { all = true }) do
for id, nodes in pairs(match) do
local name = query.captures[id]
for _, node in ipairs(nodes) do
-- `node` was captured by the `name` capture in the match

local node_data = metadata[id] -- Node level metadata
-- ... use the info here ...
end
end
end
<
@@ -1129,9 +1193,14 @@ Query:iter_matches({node}, {source}, {start}, {stop}, {opts})
• max_start_depth (integer) if non-zero, sets the maximum
start depth for each match. This is used to prevent
traversing too deep into a tree.
• all (boolean) When set, the returned match table maps
capture IDs to a list of nodes. Older versions of
iter_matches mapped capture IDs to a single node, which
is incorrect behavior. This option will eventually
become the default and be removed.

Return: ~
(`fun(): integer, table<integer,TSNode>, table`) pattern id, match,
(`fun(): integer, table<integer, TSNode[]>, table`) pattern id, match,
metadata

set({lang}, {query_name}, {text}) *vim.treesitter.query.set()*
4 changes: 2 additions & 2 deletions runtime/lua/vim/treesitter/_meta.lua
@@ -39,15 +39,15 @@ local TSNode = {}
---@param start? integer
---@param end_? integer
---@param opts? table
---@return fun(): integer, TSNode, any
---@return fun(): integer, TSNode, TSMatch
function TSNode:_rawquery(query, captures, start, end_, opts) end

---@param query TSQuery
---@param captures false
---@param start? integer
---@param end_? integer
---@param opts? table
---@return fun(): integer, any
---@return fun(): integer, TSMatch
function TSNode:_rawquery(query, captures, start, end_, opts) end

---@alias TSLoggerCallback fun(logtype: 'parse'|'lex', msg: string)
28 changes: 15 additions & 13 deletions runtime/lua/vim/treesitter/_query_linter.lua
@@ -122,28 +122,30 @@ local parse = vim.func._memoize(hash_parse, function(node, buf, lang)
end)

--- @param buf integer
--- @param match table<integer,TSNode>
--- @param match table<integer,TSNode[]>
--- @param query Query
--- @param lang_context QueryLinterLanguageContext
--- @param diagnostics Diagnostic[]
local function lint_match(buf, match, query, lang_context, diagnostics)
local lang = lang_context.lang
local parser_info = lang_context.parser_info

for id, node in pairs(match) do
local cap_id = query.captures[id]
for id, nodes in pairs(match) do
for _, node in ipairs(nodes) do
local cap_id = query.captures[id]

-- perform language-independent checks only for first lang
if lang_context.is_first_lang and cap_id == 'error' then
local node_text = vim.treesitter.get_node_text(node, buf):gsub('\n', ' ')
add_lint_for_node(diagnostics, { node:range() }, 'Syntax error: ' .. node_text)
end
-- perform language-independent checks only for first lang
if lang_context.is_first_lang and cap_id == 'error' then
local node_text = vim.treesitter.get_node_text(node, buf):gsub('\n', ' ')
add_lint_for_node(diagnostics, { node:range() }, 'Syntax error: ' .. node_text)
end

-- other checks rely on Neovim parser introspection
if lang and parser_info and cap_id == 'toplevel' then
local err = parse(node, buf, lang)
if err then
add_lint_for_node(diagnostics, err.range, err.msg, lang)
-- other checks rely on Neovim parser introspection
if lang and parser_info and cap_id == 'toplevel' then
local err = parse(node, buf, lang)
if err then
add_lint_for_node(diagnostics, err.range, err.msg, lang)
end
end
end
end
28 changes: 18 additions & 10 deletions runtime/lua/vim/treesitter/languagetree.lua
@@ -784,7 +784,7 @@ end
---@private
--- Extract injections according to:
--- https://tree-sitter.github.io/tree-sitter/syntax-highlighting#language-injection
---@param match table<integer,TSNode>
---@param match table<integer,TSNode[]>
---@param metadata TSMetadata
---@return string?, boolean, Range6[]
function LanguageTree:_get_injection(match, metadata)
@@ -796,14 +796,16 @@ function LanguageTree:_get_injection(match, metadata)
or (injection_lang and resolve_lang(injection_lang))
local include_children = metadata['injection.include-children'] ~= nil

for id, node in pairs(match) do
local name = self._injection_query.captures[id]
-- Lang should override any other language tag
if name == 'injection.language' then
local text = vim.treesitter.get_node_text(node, self._source, { metadata = metadata[id] })
lang = resolve_lang(text)
elseif name == 'injection.content' then
ranges = get_node_ranges(node, self._source, metadata[id], include_children)
for id, nodes in pairs(match) do
for _, node in ipairs(nodes) do
local name = self._injection_query.captures[id]
-- Lang should override any other language tag
if name == 'injection.language' then
local text = vim.treesitter.get_node_text(node, self._source, { metadata = metadata[id] })
lang = resolve_lang(text)
elseif name == 'injection.content' then
ranges = get_node_ranges(node, self._source, metadata[id], include_children)
end
end
end

@@ -844,7 +846,13 @@ function LanguageTree:_get_injections()
local start_line, _, end_line, _ = root_node:range()

for pattern, match, metadata in
self._injection_query:iter_matches(root_node, self._source, start_line, end_line + 1)
self._injection_query:iter_matches(
root_node,
self._source,
start_line,
end_line + 1,
{ all = true }
)
do
local lang, combined, ranges = self:_get_injection(match, metadata)
if lang then