Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: empty metadata caused treesitter directives like offset to be ignored #80

Merged
merged 2 commits into from
Jan 15, 2024
Merged

fix: empty metadata caused treesitter directives like offset to be ignored #80

merged 2 commits into from
Jan 15, 2024

Conversation

rollincuberawhide
Copy link
Contributor

@rollincuberawhide rollincuberawhide commented Jan 12, 2024

this commit fixes the vim.treesitter.get_node_text from ignoring the offset and possibly other directives like gsub as well.

the reason for this change is that metadata in query:iter_captures was only available if the capture_id is 1 and was reset to an empty table for the captures after that but wasn't replenished and remained empty.

assume that you have a treesitter injection in a rust macro. here's what it would look like

(macro_invocation
(scoped_identifier
path: (identifier) @path (#eq? @path "sqlx")
name: (identifier) @name (#match? @name "^query.*")
)

(token_tree
(raw_string_literal) @injection.content
(#set! injection.language "sql")
(#set! injection.include-children)
)
(#offset! @injection.content 0 3 0 -2)
)

a raw string in rust looks like this r#"select * from "user";"#

in the #offset! directive raw string start and end identifiers (#r" and "#) are offsetted and what remains is the inner sql.

this offset information is set to the metadata for query.get_node_text to use.

however, it's not available when it needs to be used. in query:iter_captures you get @path capture first then you get @name capture and finally @injection.content capture. metadata is only available in @path capture. which is when the capture_id is 1

this commit sets this to an outside variable and makes it always available.

maybe treesitter needs to fix this as a bug. I don't know honestly.

makes #79 redundant.
probably fixes #72 and #37

…nored

this commit fixes the vim.treesitter.get_node_text from ignoring the
offset and possibly other directives like gsub as well.

the reason for this change is that metadata in query:iter_captures was
only available if the capture_id is 1 and was reset to an empty table
for the captures after that but wasn't replenished and remained empty.

assume that you have a treesitter injection in a rust macro. here's what
it would look like

(macro_invocation
 (scoped_identifier
    path: (identifier) @path (#eq? @path "sqlx")
    name: (identifier) @name (#match? @name "^query.*")
 )

 (token_tree
    (raw_string_literal) @injection.content
    (#set! injection.language "sql")
    (#set! injection.include-children)
 )
 (#offset! @injection.content 0 3 0 -2)
)

a raw string in rust looks like this r#"select * from "user";"#

in the #offset! directive raw string start and end identifiers
(#r" and "#) are offsetted and what remains is the inner sql.

this offset information is set to the metadata for query.get_node_text
to use.

however, it's not available when it needs to be used.
in query:iter_captures you get @path capture first then you get @name
capture and finally @injection.content capture. metadata is only
available in @path capture. which is when the capture_id is 1

this commit sets this to an outside variable and makes it always available.

maybe treesitter needs to fix this as a bug. I don't know honestly.
@rollincuberawhide rollincuberawhide changed the title fix: empty metadata caused treesitter directives like offset to be ig… fix: empty metadata caused treesitter directives like offset to be ignored Jan 12, 2024
@jmbuhr
Copy link
Owner

jmbuhr commented Jan 12, 2024

Good catch, and thank you for your contribution! I think we are close to getting the complete fix.

It looks like the node that has the metadata does not necessarily have id 1. For example, if I take the test.ts example from the other issue:

const vert = glsl`
 void main() {
   gl_Position = vec4(0.0, 0.0, 0.0, 1.0);
 }
`;

const foo = html`
 <div>
   <p>hello</p>
 </div>
`;

const bar = css`
 .foo {
   color: rgb(0, 0, 0);
 }
`;

const lua = lua`
local foo = "bar"
print(foo)
`;

with :lua require'otter'.activate({'html', 'css', 'lua'}) and print in each loop iteration

    vim.print(id)
    vim.print(name)
    vim.print(maybemetadata)
    vim.print("----")

:messages                                                                                                
3                                                                                                        
injection.language                                                                                       
{                                                                                                        
  [2] = {                                                                                                
    range = { 0, 18, 4, 0 }                                                                              
  }                                                                                                      
}                                                                                                        
----                                                                                                     
2                                                                                                        
injection.content                                                                                        
{}                                                                                                       
----                                                                                                     
3                                                                                                        
injection.language                                                                                       
{                                                                                                        
  [2] = {                                                                                                
    range = { 6, 17, 10, 0 }                                                                             
  }                                                                                                      
}                                                                                                        
----                                                                                                     
2                                                                                                        
injection.content                                                                                        
{}                                                                                                       
----                                                                                                     

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 12, 2024

using the metadata any time the table is not empty with

    if next(maybemetadata) ~= nil then
      metadata = maybemetadata
    end

instead of checking the id works for the example, but like you pointed out on reddit may be a problem for cases where it actually has to reset to be empty. Need to check if we have access to those groups of captures that should share the same metadata.

@rollincuberawhide
Copy link
Contributor Author

I think we can reset the metadata on id = 1 and keep adding each key from maybemetadata to metadata.

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 12, 2024

I think I got it, in the nvim docs there is Query:iter_matches() right below iter_caputures. So if we go by matches we should have the captures belonging to the same match in one iteration of the outer loop and the captures in the inner loop of the example in the docs:

        for pattern, match, metadata in cquery:iter_matches(tree:root(), bufnr, first, last) do
          for id, node in pairs(match) do
            local name = query.captures[id]
            -- `node` was captured by the `name` capture in the match

            local node_data = metadata[id] -- Node level metadata

            -- ... use the info here ...
          end
        end

@rollincuberawhide
Copy link
Contributor Author

I think every match starts with id=1 and goes on. we can just reset at id = 1. I might be wrong though.

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 12, 2024

In the example the id with the metadata seems to be 3, followed by the node that needs the data with id 2 🙈

@rollincuberawhide
Copy link
Contributor Author

  local metadata = {}
  for id, node, maybemetadata in query:iter_captures(root, main_nr) do
    if id == 1 then
      metadata = maybemetadata
    end
    vim.tbl_extend("force", metadata, maybemetadata)

could you add the last line and try that test again?

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 12, 2024

maybe using the matches properly will also resolve some hacks I have in determine_language, where I'm getting elements from the metadata with a hardcoded 2, which feels brittle.

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 12, 2024

that does not work, as there is never and id 1 in the example, it starts with 3 :D

@rollincuberawhide
Copy link
Contributor Author

well that's odd :D

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 12, 2024

It this section from the default nvim-treesitter js/ts injections.scm

; html(`...`), html`...`, sql(...) etc
(call_expression
 function: ((identifier) @injection.language)
 arguments: [
             (arguments
              (template_string) @injection.content)
             (template_string) @injection.content
            ]
     (#offset! @injection.content 0 1 0 -1)
     (#not-eq? @injection.language "svg"))

but I have no idea how this leads to the weird numbering. It does tell me that should be wary of those numbers, though.

@rollincuberawhide
Copy link
Contributor Author

rollincuberawhide commented Jan 12, 2024

I think I've fixed it. not much by editing existing code in otter.nvim but rather rewriting treesitter's iter_captures method.

edit: i've re-rewrite it and this time it works in my sqlx injection and in that ts file. but i'm getting an lsp warning in the ts file. I don't know why. may be unrelated to this.(edit: tried it with current version of otter and getting the same error so i'm ignoring it). so here's how I've changed iter_captures method:

local function iter_captures(node, source, query)
  if type(source) == "number" and source == 0 then
    source = api.nvim_get_current_buf()
  end

  local raw_iter = node:_rawquery(query.query, true, 0, -1)
  local metadata = {}
  local function iter(end_line)
    local capture, captured_node, match = raw_iter()

    if match ~= nil then
      local active = query:match_preds(match, match.pattern, source)
      match.active = active
      if not active then
        return iter(end_line) -- tail call: try next match
      else
        -- if it has an active match, reset the metadata. 
        -- apply_directives can fill the metadata
        metadata = {}
      end
      query:apply_directives(match, match.pattern, source, metadata)
    end
    return capture, captured_node, metadata
  end
  return iter
end

and here's how it can be used

  for id, node, metadata in iter_captures(root, main_nr, query) do

it's not a query method but a function that takes query as input.

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 14, 2024

very nice. I have also confirmed that it still works for the existing cases. Is this ready to be merged from your side?

@rollincuberawhide
Copy link
Contributor Author

I'd say yes, I'm okay with it if you're okay with it. It's been working in embedded sql in rust pretty good so far. Is there any changes you want me to do?

@jmbuhr
Copy link
Owner

jmbuhr commented Jan 15, 2024

ok, I'll merge then and maybe refactor some code later to be less reliant on treesitter internals under the assumption that those are less stable under future releases than the public api.

@jmbuhr jmbuhr merged commit 0fd09ca into jmbuhr:main Jan 15, 2024
3 checks passed
@jmbuhr
Copy link
Owner

jmbuhr commented Jan 15, 2024

oh and can you send me an example rust file with the embeds working, such that I can also validate it when I do a refactor?

@rollincuberawhide
Copy link
Contributor Author

rollincuberawhide commented Jan 15, 2024

well rust doesn't really have an official sql embed but I added this injection:

;queries/rust/injections.scm

(macro_invocation
 (scoped_identifier
    path: (identifier) @path (#eq? @path "sqlx")
    name: (identifier) @name (#match? @name "^query.*")
 )
 (token_tree
   (raw_string_literal) @injection.content
 ) 
 (#set! injection.language "sql")
 (#set! injection.include-children)
 (#offset! @injection.content 0 3 0 -2)
)

and although this is not a real rust file, treesitter is happy enough for otter to work:(provided that you set sqls with a database table named "user")

fn main()
    let username = "otter";
    let email = "nvim";
    let password_hash = "pw";

    let user_id = sqlx::query_scalar!(
        r#"
        INSERT INTO "user" (username, email, password_hash)
            VALUES ($1, $2, $3)
        RETURNING
            user_id;
        "#,
        username,
        email,
        password_hash
    )
}

but I think you can write a unit test that just sees if the input and output is as expected, output being the file with just the embedded part and rest just white space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hidden buffers sometimes include extra characters from source buffer
2 participants