@TheLeoP
Contributor

@TheLeoP TheLeoP commented Sep 15, 2025

Also addresses #2006

This is an alternative to #2009. Instead of querying all of the language_trees, the parent language tree is queried until a textobject is found or the root language_tree has been queried. Just like #2009, it can be modified to skip the querying unless the current language is inside of an allow list.
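
The fallback described above can be sketched in plain Lua, independent of Neovim. The tree objects here are hypothetical stand-ins (tables with `lang`, `matches`, and `parent` fields), not the real `LanguageTree` API; the sketch only illustrates the shape of the loop, not the code in this PR.

```lua
-- Hypothetical stand-in for a language tree: plain tables, not Neovim's API.
local function new_tree(lang, matches, parent)
  return { lang = lang, matches = matches, parent = parent }
end

-- Walk from the innermost tree towards the root, stopping at the first tree
-- that yields at least one match.
local function find_matches(tree)
  local visited = {}
  while tree ~= nil do
    table.insert(visited, tree.lang)
    if #tree.matches > 0 then return tree.matches, visited end
    tree = tree.parent
  end
  return {}, visited
end

-- An injected python tree with no code block textobject of its own,
-- nested inside a markdown tree that has one.
local markdown = new_tree('markdown', { 'code_block' }, nil)
local python = new_tree('python', {}, markdown)

local matches, visited = find_matches(python)
print(table.concat(visited, ' -> '), matches[1])
```

Starting from the root tree instead visits only that tree, so sibling injections are never queried.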

Member

@echasnovski echasnovski left a comment

I think I like this approach more than #2009, mostly because querying all language trees every time seems like a bad design. Yes, this approach might not work for the next / last search method when it comes to unrelated injections (like separate code blocks), but I think it is more expected (i.e. each injected tree should be treated as if it is self-contained).

After addressing comments for 'mini.ai', would you mind also making changes that are as similar as possible to 'mini.surround' (`gen_spec.input.treesitter()`)? They should have as much common code structure as possible.

@TheLeoP TheLeoP changed the title from "feat(ai): query parent language_tree until a textobject is found" to "feat: query parent LanguageTree until a textobject is found" Sep 16, 2025
@TheLeoP
Contributor Author

TheLeoP commented Sep 16, 2025

Ok, all tests are finally passing. Let me know if there is some other change that you would like me to make and/or if you would like this PR to also address adding a configuration option for allowing only certain languages for each filetype @echasnovski

@TheLeoP TheLeoP marked this pull request as ready for review September 16, 2025 16:22
@echasnovski
Member

Thanks!

If this works for markdown use case (like "select next code block" when inside regular text) right now, then probably no allow list needed yet.

I'll daily drive it for a couple of days and will then take a closer look (probably Thursday).

@TheLeoP
Contributor Author

TheLeoP commented Sep 16, 2025

I squashed the commits into two, one for each module affected.

> If this works for markdown use case (like "select next code block" when inside regular text) right now, then probably no allow list needed yet.

Yes, this is working for #2006. The code block textobject works when the cursor is inside a python, markdown or markdown_inline tree.

While working on this, I also found a weird edge case when using treesitter to get textobjects. If the textobject's end is at the end of the file, its `to` is `{ line = last_line + 1, col = 0 }` (where `last_line` is the last line of the buffer), instead of being `{ line = last_line, col = last_line_length }`. This causes the example on #2006 to fail for `vac`, because the range described by treesitter is considered to be outside of the region (unless a blank line is added after the code block).

I tried to look into the treesitter documentation, but there doesn't seem to be any formal definition of how the start/end positions of a node behave. :h treesitter mentions that they are 0-based, but that's about it. The first mention of this in the Neovim docs comes from neovim/neovim#10124.

https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h is the official C API for treesitter. It also doesn't mention anything specific about the ranges (although the implementation makes them 0-based).

I'm not well-versed in C, so take the following information with a grain of salt. The C implementation of :range (https://github.com/neovim/neovim/blob/cbfa7f0d7b55c5329e6ffd36451b41b7f41b645c/src/nvim/lua/treesitter.c#L981-L1005) uses ts_node_end_point for the end of the node. ts_node_end_point's implementation (https://github.com/tree-sitter/tree-sitter/blob/339bad2de4696e79aaec629b9ef569f7b0b22f0e/lib/src/node.c#L453-L455) gets the end point by adding some offset to the starting point. Said offset comes from ts_subtree_size (https://github.com/tree-sitter/tree-sitter/blob/339bad2de4696e79aaec629b9ef569f7b0b22f0e/lib/src/subtree.h#L286-L293), which seems to always create a point with { row = 0, col = size_bytes}. For two points a and b, if b.row is 0 (which it should be when getting the end point of a node), point_add seems to create the new point by { row = a.row, col = a.col + b.col }. That's as far as I could get, but I don't understand how the range of the root node (and also the code block node) in the example of #2006 has {row = 7, col = 0} as its end (which is functionally the same as {row = 6, col = 3}).
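
One possible explanation for the `{row = 7, col = 0}` end is that point addition also has a branch for offsets that span more than one row. The following plain Lua sketch mirrors that behavior as understood from tree-sitter's `lib/src/point.h`. Treat it as an assumption rather than a verified transcription. The extents used for the code block (three full rows with the trailing newline, two rows plus three columns without it) are also assumptions chosen to match the #2006 text.

```lua
-- Sketch of point addition: an offset spanning rows resets the column,
-- otherwise columns are added on the same row.
local function point_add(a, b)
  if b.row > 0 then
    return { row = a.row + b.row, col = b.col }
  end
  return { row = a.row, col = a.col + b.col }
end

-- The code block from #2006 starts at row 4 (0-based), column 0.
local start = { row = 4, col = 0 }

-- Assumed extent when the node includes its trailing newline.
local with_newline = point_add(start, { row = 3, col = 0 })
-- Assumed extent when the node stops right after the closing fence.
local without_newline = point_add(start, { row = 2, col = 3 })

print(with_newline.row, with_newline.col)
print(without_newline.row, without_newline.col)
```

Under these assumptions the two results correspond to the `{row = 7, col = 0}` and `{row = 6, col = 3}` positions mentioned above.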

For reference, this is the text from #2006

# Some document

This document contains some text:

```python
x = 10
```

Fixing this issue would be a small change. Changing

  local ts_range_to_region = function(r)
    -- The `master` branch of 'nvim-treesitter' can return "range four" format
    -- if it uses custom directives, like `#make-range!`. Due to the fact that
    -- it doesn't fully mock the `TSNode:range()` method to return "range six".
    -- TODO: Remove after 'nvim-treesitter' `master` branch support is dropped.
    local offset = #r == 4 and -1 or 0
    return { from = { line = r[1] + 1, col = r[2] + 1 }, to = { line = r[4 + offset] + 1, col = r[5 + offset] } }
  end

to

  local ts_range_to_region = function(r)
    local last_line = vim.fn.line('$')

    -- The `master` branch of 'nvim-treesitter' can return "range four" format
    -- if it uses custom directives, like `#make-range!`. Due to the fact that
    -- it doesn't fully mock the `TSNode:range()` method to return "range six".
    -- TODO: Remove after 'nvim-treesitter' `master` branch support is dropped.
    local offset = #r == 4 and -1 or 0
    local reg = { from = { line = r[1] + 1, col = r[2] + 1 }, to = { line = r[4 + offset] + 1, col = r[5 + offset] } }
    if reg.to.line == last_line + 1 then
      reg.to.line = last_line
      reg.to.col = #vim.fn.getline('$')
    end
    return reg
  end

but this would be an ad-hoc solution without some kind of formal specification for how treesitter ranges are supposed to behave.
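
To see the effect of this change on the #2006 text, here is a self-contained variant that takes the buffer lines as a plain table instead of calling `vim.fn`. The `lines` parameter and the name `range_to_region` are hypothetical, used only so the sketch runs outside Neovim, and only the "range six" format is handled.

```lua
-- `r` is a 0-based "range six": { row, col, byte, end_row, end_col, end_byte }.
-- `lines` is a hypothetical stand-in for the buffer contents.
local function range_to_region(r, lines)
  local last_line = #lines
  local reg = {
    from = { line = r[1] + 1, col = r[2] + 1 },
    to = { line = r[4] + 1, col = r[5] },
  }
  -- An end at (last_line + 1, 0) points past the buffer;
  -- clamp it to the end of the last line.
  if reg.to.line == last_line + 1 then
    reg.to.line = last_line
    reg.to.col = #lines[last_line]
  end
  return reg
end

local lines = {
  '# Some document',
  '',
  'This document contains some text:',
  '',
  '```python',
  'x = 10',
  '```',
}

-- Code block node as reported above: rows 4..7 (0-based), end column 0.
-- Byte offsets are not used by this sketch, so they are left as 0.
local reg = range_to_region({ 4, 0, 0, 7, 0, 0 }, lines)
print(reg.from.line, reg.from.col, reg.to.line, reg.to.col)
```

Ranges that end before the last line pass through unchanged, so the clamp only affects the end-of-file case.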

Actually, I could find neovim/neovim#29762 in the Neovim repo. So, I guess this is an issue with the parser (?). This seems to happen in all of the parsers that I've tried, though (lua, vim, markdown and query).

@TheLeoP
Contributor Author

TheLeoP commented Sep 16, 2025

I asked in the "Neovim treesitter" matrix channel and clason explained to me that, if a node contains a trailing newline, its end_ will change from row-inclusive, col-exclusive to row-exclusive, col-0. So, I'll open a different PR with the changes I mentioned earlier.

@echasnovski echasnovski changed the base branch from main to backlog September 18, 2025 08:59
@echasnovski echasnovski merged commit fbf24dc into nvim-mini:backlog Sep 18, 2025
11 checks passed
@echasnovski
Member

echasnovski commented Sep 18, 2025

Thanks again for the PR!

I've made some small tweaks and merged into main. Here is a summary for the future:

  • Adjusted comments to be shorter. Generally try to fit comments within 79 column width.

  • Moved H.append_ranges() after it is used and formatted to use H.append_ranges = function() form.

  • In H.error_treesitter() added sorting languages for more robust testing (as it is the error code path, performance doesn't matter much here) and used vim.inspect instead of function(lang) return string.format('"%s"', lang) end.

  • Directly mentioned issue in commit messages and replaced "textobjects"->"surroundings" for 'mini.surround' commit.
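
The sorting tweak can be illustrated outside Neovim. This sketch uses `string.format('%q', ...)` as a stand-in for `vim.inspect` (an assumption made only to keep it runnable in plain Lua; for simple language names both give a double-quoted value), and `format_langs` is a hypothetical name.

```lua
-- Build a deterministic, human-readable list of language names.
local function format_langs(langs)
  local sorted = {}
  for i, lang in ipairs(langs) do sorted[i] = lang end
  table.sort(sorted)
  for i, lang in ipairs(sorted) do sorted[i] = string.format('%q', lang) end
  return table.concat(sorted, ', ')
end

-- Keys of a set-like table come out of `pairs()` in unspecified order,
-- so sorting keeps the resulting message stable across runs.
local missing = { vimdoc = true, vim = true }
local langs = {}
for lang, _ in pairs(missing) do table.insert(langs, lang) end

print(format_langs(langs))
```

With a stable order, a test can match the message literally instead of using a pattern that accepts either ordering.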

Here is the diff:

'mini.ai' adjustments
diff --git a/lua/mini/ai.lua b/lua/mini/ai.lua
index d155d4e9..f85a8511 100644
--- a/lua/mini/ai.lua
+++ b/lua/mini/ai.lua
@@ -1552,21 +1552,6 @@ H.get_matched_ranges_plugin = function(captures)
   return res
 end
 
-function H.append_ranges(res, buf_id, query, captures, lang_tree)
-  -- Compute ranges of matched captures
-  local capture_is_requested = vim.tbl_map(function(c) return vim.tbl_contains(captures, '@' .. c) end, query.captures)
-
-  for _, tree in ipairs(lang_tree:trees()) do
-    -- TODO: Remove `opts.all`after compatibility with Neovim=0.10 is dropped
-    for _, match, metadata in query:iter_matches(tree:root(), buf_id, nil, nil, { all = true }) do
-      for capture_id, nodes in pairs(match) do
-        local mt = metadata[capture_id]
-        if capture_is_requested[capture_id] then table.insert(res, H.get_nodes_range_builtin(nodes, buf_id, mt)) end
-      end
-    end
-  end
-end
-
 H.get_matched_ranges_builtin = function(captures)
   -- Get buffer's parser (LanguageTree)
   local buf_id = vim.api.nvim_get_current_buf()
@@ -1580,7 +1565,7 @@ H.get_matched_ranges_builtin = function(captures)
 
   local missing_query_langs = {}
   local res = {}
-  -- Recursively query parent LanguageTree as fallback (important for injected languages)
+  -- Maybe go up parent trees to work with injected languages
   while vim.tbl_isempty(res) and lang_tree ~= nil do
     local lang = lang_tree:lang()
     -- Get query file depending on the local language
@@ -1590,7 +1575,7 @@ H.get_matched_ranges_builtin = function(captures)
     if query == nil then missing_query_langs[lang] = true end
 
     -- `LanguageTree:parent()` was added in Neovim=0.10
-    -- TODO: Change to `lang_tree:parent()` after compatibility with Neovim=0.9 is dropped
+    -- TODO: Drop extra check after compatibility with Neovim=0.9 is dropped
     lang_tree = lang_tree.parent and lang_tree:parent() or nil
   end
   if vim.tbl_isempty(res) and not vim.tbl_isempty(missing_query_langs) then
@@ -1600,6 +1585,21 @@ H.get_matched_ranges_builtin = function(captures)
   return res
 end
 
+H.append_ranges = function(res, buf_id, query, captures, lang_tree)
+  -- Compute ranges of matched captures
+  local capture_is_requested = vim.tbl_map(function(c) return vim.tbl_contains(captures, '@' .. c) end, query.captures)
+
+  for _, tree in ipairs(lang_tree:trees()) do
+    -- TODO: Remove `opts.all`after compatibility with Neovim=0.10 is dropped
+    for _, match, metadata in query:iter_matches(tree:root(), buf_id, nil, nil, { all = true }) do
+      for capture_id, nodes in pairs(match) do
+        local mt = metadata[capture_id]
+        if capture_is_requested[capture_id] then table.insert(res, H.get_nodes_range_builtin(nodes, buf_id, mt)) end
+      end
+    end
+  end
+end
+
 H.get_nodes_range_builtin = function(nodes, buf_id, metadata)
   -- In Neovim<0.10 `Query:iter_matches()` has `match` map to single node.
   -- TODO: Remove `opts.all`after compatibility with Neovim=0.9 is dropped
@@ -1620,12 +1620,13 @@ end
 H.error_treesitter = function(failed_get, langs)
   local buf_id, ft = vim.api.nvim_get_current_buf(), vim.bo.filetype
   if langs == nil then
-    local ok, ft_lang = pcall(vim.treesitter.language.get_lang, ft)
-    -- `vim.treesitter.language.get_lang()` defaults to `ft` only on Neovim>0.11
-    -- TODO: Remove `and ft_lang ~= nil` after compatibility with Neovim=0.10 is dropped
-    langs = (ok and ft_lang ~= nil) and { ft_lang } or { ft }
+    local has_lang, ft_lang = pcall(vim.treesitter.language.get_lang, ft)
+    -- `vim.treesitter.language.get_lang()` defaults to `ft` on Neovim>0.11
+    -- TODO: Drop check after compatibility with Neovim=0.10 is dropped
+    langs = (has_lang and ft_lang ~= nil) and { ft_lang } or { ft }
   end
-  local langs_str = table.concat(vim.tbl_map(function(lang) return string.format('"%s"', lang) end, langs), ', ')
+  table.sort(langs)
+  local langs_str = table.concat(vim.tbl_map(vim.inspect, langs), ', ')
   local langs_noun = #langs == 1 and 'language' or 'languages'
   local msg = string.format('Can not get %s for buffer %d and %s %s.', failed_get, buf_id, langs_noun, langs_str)
   H.error(msg)
diff --git a/tests/test_ai.lua b/tests/test_ai.lua
index 9bb5ee1a..61079c1e 100644
--- a/tests/test_ai.lua
+++ b/tests/test_ai.lua
@@ -920,15 +920,15 @@ T['gen_spec']['treesitter()']['validates builtin treesitter presence'] = functio
     '%(mini%.ai%) Can not get parser for buffer 3 and language "my_aaa"%.'
   )
 
-  if child.fn.has('nvim-0.10') == 0 then return end
   -- - Should show each language
+  if child.fn.has('nvim-0.10') == 0 then return end
   child.cmd('enew')
   child.bo.filetype = 'help'
   set_lines({ '>vim', '    set cursorline', '<' })
   set_cursor(2, 0)
   expect.error(
     function() child.lua('MiniAi.find_textobject("a", "F")') end,
-    '%(mini%.ai%) Can not get query for buffer 3 and languages "vimd?o?c?", "vimd?o?c?"%.'
+    '%(mini%.ai%) Can not get query for buffer 3 and languages "vim", "vimdoc"%.'
   )
 
   -- - Should show each language once
'mini.surround' adjustments
diff --git a/lua/mini/surround.lua b/lua/mini/surround.lua
index db0b448f..3c1b0803 100644
--- a/lua/mini/surround.lua
+++ b/lua/mini/surround.lua
@@ -1539,6 +1539,7 @@ H.get_matched_range_pairs_builtin = function(captures)
 
   local missing_query_langs = {}
   -- Compute matched ranges for both outer and inner captures
+  -- Maybe go up parent trees to work with injected languages
   local outer_ranges, inner_ranges = {}, {}
   while (vim.tbl_isempty(inner_ranges) or vim.tbl_isempty(outer_ranges)) and lang_tree ~= nil do
     local lang = lang_tree:lang()
@@ -1555,7 +1556,7 @@ H.get_matched_range_pairs_builtin = function(captures)
     if query == nil then missing_query_langs[lang] = true end
 
     -- `LanguageTree:parent()` was added in Neovim=0.10
-    -- TODO: Change to `lang_tree:parent()` after compatibility with Neovim=0.9 is dropped
+    -- TODO: Drop extra check after compatibility with Neovim=0.9 is dropped
     lang_tree = lang_tree.parent and lang_tree:parent() or nil
   end
 
@@ -1618,12 +1619,13 @@ end
 H.error_treesitter = function(failed_get, langs)
   local buf_id, ft = vim.api.nvim_get_current_buf(), vim.bo.filetype
   if langs == nil then
-    local ok, ft_lang = pcall(vim.treesitter.language.get_lang, ft)
-    -- `vim.treesitter.language.get_lang()` defaults to `ft` only on Neovim>0.11
-    -- TODO: Remove `and ft_lang ~= nil` after compatibility with Neovim=0.10 is dropped
-    langs = (ok and ft_lang ~= nil) and { ft_lang } or { ft }
+    local has_lang, ft_lang = pcall(vim.treesitter.language.get_lang, ft)
+    -- `vim.treesitter.language.get_lang()` defaults to `ft` on Neovim>0.11
+    -- TODO: Drop check after compatibility with Neovim=0.10 is dropped
+    langs = (has_lang and ft_lang ~= nil) and { ft_lang } or { ft }
   end
-  local langs_str = table.concat(vim.tbl_map(function(lang) return string.format('"%s"', lang) end, langs), ', ')
+  table.sort(langs)
+  local langs_str = table.concat(vim.tbl_map(vim.inspect, langs), ', ')
   local langs_noun = #langs == 1 and 'language' or 'languages'
   local msg = string.format('Can not get %s for buffer %d and %s %s.', failed_get, buf_id, langs_noun, langs_str)
   H.error(msg)
