Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): extracts get_range_text from get_node_text #20158

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AckslD
Copy link
Sponsor Contributor

@AckslD AckslD commented Sep 11, 2022

Currently we have get_node_text which gets the text corresponding to a node using the values from node:start() and node:end_(). When #offset! is used in a query (for example for injections or text objects) the range of the match will not equal the range of the node.

In nvim-treesitter/nvim-treesitter#3486 for example the metadata is now included in the matches of the query. So with this change here one could then simply do get_range_text(match.metadata.range) to get the text using the same logic as get_node_text.

Tagging @clason since I briefly asked about this on matrix.

@clason
Copy link
Member

clason commented Sep 11, 2022

I think we want to park this until after 0.8, when we get a cleaner picture of which utilities are generally useful and which only in the context of nvim-treesitter.

This should also be a part of a more general refactor on how ranges are handled; ideally, we can reduce the surface API of these utility functions, not increase them (e.g., by marking some of them as private, or combining them).

In any case, this behavior should be controlled by an option as you may want the original node range.

return M.get_range_text({start_row, start_col, start_byte, end_row, end_col, end_byte}, source, opts)
end

--- Gets the text corresponding to a given rangen
Copy link
Member

@justinmk justinmk Sep 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • should there be some (extremely brief) hint about the fact that a "range" maps to the input text (which is usually a buffer)? Or is that understood? Maybe we need a tag like treesitter-range to link to?
  • typo : "rangen"

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be good to also specify why this is different than nvim_buf_get_text, at least if source is a number? I wonder if it's more clear if instead there is a get_vim_range as in nvim-treesitter and then we use that with nvim_buf_get_text directly?

Fixed the typo at least.

@AckslD
Copy link
Sponsor Contributor Author

AckslD commented Sep 12, 2022

Thanks for your feedback! After sleeping on this, I'm wondering if the better approach (at least for the usecase I had) would be to support a third optional argument to eg get_node_text which is the metadata of the node. If this is not nil and has an entry range, this is then used instead of node:start and node:end_?

Then one could do for example:

for id, node, metadata in query:iter_captures(tree:root(), bufnr, first, last) do
  local text = get_node_text(node, bufnr, metadata) -- text here then takes #offset! into account
end

How does that sound? Then the user does not need to explicitly touch the ranges anyway.

@clason
Copy link
Member

clason commented Sep 12, 2022

We usually treat optional arguments as an opts keyword-value table. This should be the pattern here if we go that route.

But, yes, adding options is preferable to adding another function.

@AckslD
Copy link
Sponsor Contributor Author

AckslD commented Sep 12, 2022

Why make it only the pattern? Could be both but seems like it would be useful to also include the metadata since it already have the offset range computed. Ie pattern would be just

{ "offset!", 1, "0", "1", "0", "-1" }

whereas metadata would include

{range = { 14, 7, 14, 14 }}

@clason
Copy link
Member

clason commented Sep 12, 2022

I meant "design pattern".

@AckslD
Copy link
Sponsor Contributor Author

AckslD commented Sep 12, 2022

Does it makes sense if I make an update to this MR with a suggestion for an optional opts which can include the metadata to get_node_text, or would you rather not make such changes at this point?

@clason
Copy link
Member

clason commented Sep 12, 2022

This would be necessary in any case, but I don't think we want to merge this before 0.8 (end of month) either way. (Full transparency: which doesn't guarantee that we'll merge it afterwards, although I'm not opposed to this PR on principle.)

@AckslD
Copy link
Sponsor Contributor Author

AckslD commented Sep 15, 2022

Based on what we discussed in nvim-treesitter/nvim-treesitter#3487 I no longer think it makes sense to add metadata as part of an optional opts to get_node_text. The reason is that this would get the text corresponding to a capture and not the node as the name of the function indicates.

@justinmk
Copy link
Member

I wonder if it's more clear if instead there is a get_vim_range as in nvim-treesitter and then we use that with nvim_buf_get_text directly?

That sounds like a potentially more powerful/generalized approach if it will avoid lots of extra functions.

@clason
Copy link
Member

clason commented Oct 16, 2022

Now that Nvim 0.8 is released and the dust has settled somewhat, we should circle back to this (in particular because the next step is to update nvim-treesitter to use the newly upstreamed core APIs, and I want to do this only once, so any significant refactoring should happen before).

But before we make any changes, we need to take a step back and discuss what interface we want here. The various node_or_range functions are mixing abstractions to a degree I'm uncomfortable with. There's also some commits from nvim-treesitter/playground#99 which should be included in this discussion.

Basically, what we want is to be able to extract the text corresponding to some tree-sitter object. For this, we need to find the buffer positions corresponding to the object. In particular, this requires translating from "tree-sitter coordinates" to "buffer coordinates". The last is related to similar functions for LSP and Lua in general (especially involving multibyte characters), so we should treat this as part of a general-purpose utility ("find buffer range from coordinates", extending vim.region -- naming should be made consistent and, if differences are important (e.g., contiguous vs. block-shaped), documented).

Regarding objects, we want to handle (at least) these individually:

  1. nodes (which are wrappers for objects held by the tree-sitter library);
  2. captures (which are Lua tables extracted from nodes via a query and can be modified by directives stored in its metadata field);
  3. matches (which are mappings from node to capture, as one node could be part of multiple captures and vice versa) -- although iter_matches only seems used once in the code and shares much code with iter_captures, so maybe this level is not necessary (and can in fact be removed?)
  4. patterns?

The goal would be to compose the current features via a minimal set of functions (and possibly fill out the "feature matrix"). For commonly used functions (like get_{node,captures}_at_cursor) that users may want to call often, we could still add a convenient wrapper.

In general, refactoring everything to return/accept a range (tuple of {start_row,start_col,end_row,end_col}) only is good; whether these ranges should be corrected using metadata attached to a matching capture should be either done through a separate function or controlled through an opts field. A useful first step would be to see how these functions are used in the nvim-treesitter ecosystem to make sure we are able to cover the important use cases without undue friction.

@neovim/treesitter

@clason
Copy link
Member

clason commented Oct 16, 2022

(patterns/matches are related to predicates and directives, so that's more of question towards generalizing those @lewis6991 )

@zeertzjq zeertzjq removed the lua stdlib label Mar 22, 2023
@justinmk
Copy link
Member

justinmk commented Jul 4, 2023

@AckslD do you plan to continue this?

@justinmk justinmk added the needs:response waiting for reply from the author label Jul 4, 2023
@github-actions github-actions bot requested a review from lewis6991 July 4, 2023 12:55
@AckslD
Copy link
Sponsor Contributor Author

AckslD commented Jul 5, 2023

@justinmk, I'd be happy to continue this but I'm not fully sure if there are more recent views on what this api should look like. Happy to discuss this though. However, I won't really have any time the coming month or so.

@github-actions github-actions bot removed the needs:response waiting for reply from the author label Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants