Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(treesitter): add filetype -> lang API #22207

Merged
merged 1 commit into from
Feb 21, 2023

Conversation

lewis6991
Copy link
Member

@lewis6991 lewis6991 commented Feb 10, 2023

Problem

vim.treesitter does not know how to map a specific filetype to a parser.

This creates problems since in a few places (including in vim.treesitter itself), the filetype is incorrectly used in place of lang.

Solution

Add an API to enable this:

  • Add vim.treesitter.language.add() as a replacement for vim.treesitter.language.require_language().
    • Optional arguments are now passed via an opts table.
    • Also takes a filetype (or list of filetypes) so we can keep track of what filetypes are associated with which langs.
    • Deprecated vim.treesitter.language.require_language().
  • Add vim.treesitter.language.get_lang() which returns the associated lang for a given filetype.
  • Add vim.treesitter.language.register() to associate filetypes to a lang without loading the parser.

Note: This change works on the principle that filetype -> lang is a many to one relationship.

nvim-treesitter downstream PR: nvim-treesitter/nvim-treesitter#4296

local fname = 'parser/' .. lang .. '.*'
local paths = a.nvim_get_runtime_file(fname, false)
if #paths == 0 then
for _, l in ipairs {lang, symbol_name} do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have the user decide what they want instead of nvim trying out multiple paths. If the SO-filename should be configurable, it could be an extra parameter.

As I understood nvim-treesitter/nvim-treesitter#4296 (comment), this to enable ft to lang mapping like the Lua table that we have in nvim-treesitter. Please note that when you're moving the functionality of mapping ft to parser to the treesitter.c hashmap, you will load the same parser multiple times as the same SO-parser file can be loaded for multiple langs (lang being the key for a loaded parser). Isn't this something that the caller of require_language can handle via a simple Lua table mapping? At nvim-treesitter, we always tried to have ft and lang be separate concepts.

Copy link
Member Author

@lewis6991 lewis6991 Feb 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have the user decide what they want instead of nvim trying out multiple paths. If the SO-filename should be configurable, it could be an extra parameter.

It already is as the second path argument.

Please note that when you're moving the functionality of mapping ft to parser to the treesitter.c hashmap, you will load the same parser multiple times as the same SO-parser file can be loaded for multiple langs (lang being the key for a loaded parser).

I'm aware of this, and this is just a small implementation deficiency that should be easy enough to resolve. In practice, this will just mean that users with a bash and zsh buffer open in the same session will have the sh parser loaded multiple times. For the majority, this shouldn't be an issue since filetype usually maps directly to lang.

At nvim-treesitter, we always tried to have ft and lang be separate concepts.

I've seen no benefit in keeping this distinction, and for me has only ever caused problems.

For example, I've currently had to introduce some awkward logic in #22191 because get_parser defaults to filetype for lang despite a parser already being associated for a buffer.

The end goal I'm working to is that the whole filetype -> lang relationship is encapsulated to parser installation and management. Everywhere else in any treesitter logic, only filetype should be considered.

In this PR I've only looked in both paths (for when path isn't provided) just to maintain some compatibility for users who will continue to pass lang as the first argument. Once downstream has adapted, which we won't give much time for, we then only search for parsers in parser/{symbol_name}.so

@theHamsta
Copy link
Member

theHamsta commented Feb 12, 2023

I think the distinction betweenlang and ft is meaningful. lang is the decided by the parser developers outside of the Neovim-context and ft is decided by Neovim. We had many weird bugs in the beginning of nvim-treesitter when parts of our code referred to ft and other to lang. This was especially useful when the file type was dynamically changed but could still use the same parser (js->react or bash ->PKGBUILD, long loading times when plugins changed 3x ft on file open). Multiple ft could have the same lang or you could switch the lang for one ft. Also injections could be something like markdown_inline which doesn't correspond to any ft. This would map nicely to the internal C hashmap which holds one parser for each <lang>

I think the problem here could be the understanding what lang refers to. For me, lang refers to

  • the grammar repo name tree-sitter-<lang>
  • the grammar name in grammar.js and therefore the SO symbol
  • parser/<lang>.so and queries/<lang>/*.scm

So basically the parser identification goes by the name the grammar author has chosen and have nvim-treesitter a configurable mapping from ft to lang. Of course you could also have

  • parser/<ft>.so and queries/<ft>/*.scm

and configure a mapping to the .SO symbol name which seems to be your understanding. If we want to to make <lang> and <ft> the same thing it would avoid confusion when we renamed lang in documentation and codebase. I think both approaches would work. It just has to be clear what lang means.

I would be glad in nvim-treesitter when I wouldn't need to care about filetypes. I don't want to create ; inherit xxx for every alternative name for a filetypes.

@lewis6991
Copy link
Member Author

lewis6991 commented Feb 12, 2023

  • parser/<ft>.so and queries/<ft>/*.scm

I've not been clear about the direction yet, but this is not the intention. The lang is strictly tied to the parser and the symbol is actually completely configurable but 100% of the time is the same as lang currently. Most treesitter applications do not need to care or even have the information of lang and get_parser() is usually only used with a buffer number, and should always work. However right now it is broken because it defaults lang to the buffers filetype. To resolve this I believe we need require_language() to also be passed the filetypes the language is expected to handle.

Initially I planned to keep this filetype knowledge in the treesitter.c map, however on reflection (with help of your comment) I realised we can maintain this mapping in Lua and just tweak the signature of require_language.

I've updated the PR to add a get_lang(filetype) function which uses an internal map populated in require_language.

This means get_parser can default lang correctly and users of require_language can do:

add('bash', { filetype = { 'sh', 'csh', 'tcsh', 'bash', 'zsh' } }

As opposed to just:

require_language('bash')

@lewis6991 lewis6991 changed the title feat(treesitter): try symbol name for lang when loading parsers feat(treesitter): make get_parser default lang correctly Feb 12, 2023
@lewis6991 lewis6991 changed the title feat(treesitter): make get_parser default lang correctly feat!(treesitter): make get_parser default lang correctly Feb 12, 2023
@lewis6991 lewis6991 force-pushed the feat/ts_ft_lang branch 7 times, most recently from 94e3b4e to 4db8621 Compare February 13, 2023 15:52
@lewis6991 lewis6991 changed the title feat!(treesitter): make get_parser default lang correctly feat(treesitter): add vim.treesitter.language.add() Feb 13, 2023
@lewis6991 lewis6991 force-pushed the feat/ts_ft_lang branch 5 times, most recently from ad7a572 to a068813 Compare February 13, 2023 16:19
@clason
Copy link
Member

clason commented Feb 14, 2023

Should the new function be used by vim.treesitter.start()? That is the point where an automatic filetype->lang lookup would be most useful.

Also, is the plan to deprecate the "old" functions?

@lewis6991
Copy link
Member Author

Should the new function be used by vim.treesitter.start()? That is the point where an automatic filetype->lang lookup would be most useful.

Sure.

Also, is the plan to deprecate the "old" functions?

The plan is to at least deprecated require_language. Not sure about inspect_language.

@lewis6991
Copy link
Member Author

I've added a register function so filetypes can be linked to a lang without loading a parser. This will means frameworks like nvim-treesitter can register the filetypes on setup/config without having to maintain their own API on top of core (which is aware of these mappings).

@lewis6991 lewis6991 force-pushed the feat/ts_ft_lang branch 3 times, most recently from 85bd563 to 38ee119 Compare February 14, 2023 15:17
@lewis6991 lewis6991 changed the title feat(treesitter): add vim.treesitter.language.add() feat(treesitter): add filetype -> lang API Feb 14, 2023
@lewis6991 lewis6991 marked this pull request as ready for review February 14, 2023 15:25
@lewis6991
Copy link
Member Author

lewis6991 commented Feb 14, 2023

Marking ready, but I'm sure there are some aspects that might need iterating based on feedback.

@lewis6991 lewis6991 force-pushed the feat/ts_ft_lang branch 2 times, most recently from f4b4740 to 36f4088 Compare February 14, 2023 16:56
@lewis6991 lewis6991 force-pushed the feat/ts_ft_lang branch 4 times, most recently from 6ca05f8 to 6487a8b Compare February 21, 2023 16:42
Problem:

  vim.treesitter does not know how to map a specific filetype to a parser.

  This creates problems since in a few places (including in vim.treesitter itself), the filetype is incorrectly used in place of lang.

Solution:

  Add an API to enable this:

  - Add vim.treesitter.language.add() as a replacement for vim.treesitter.language.require_language().
    - Optional arguments are now passed via an opts table.
    - Also takes a filetype (or list of filetypes) so we can keep track of what filetypes are associated with which langs.
    - Deprecated vim.treesitter.language.require_language().
  - Add vim.treesitter.language.get_lang() which returns the associated lang for a given filetype.
  - Add vim.treesitter.language.register() to associate filetypes to a lang without loading the parser.
@clason
Copy link
Member

clason commented Feb 27, 2023

@lewis6991 I think this introduced a regression in :checkhealth:

vim.treesitter: require("vim.treesitter.health").check()

- Nvim runtime ABI version: 14
- ERROR Parser "c" failed to load (path: /usr/local/lib/nvim/parser/c.so): ?
- ERROR Parser "help" failed to load (path: /usr/local/lib/nvim/parser/help.so): ?
- ERROR Parser "lua" failed to load (path: /usr/local/lib/nvim/parser/lua.so): ?
- ERROR Parser "vim" failed to load (path: /usr/local/lib/nvim/parser/vim.so): ?

(The parsers work, and show up as OK after they are loaded by vim.treesitter.start().)

Can you reproduce this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants