Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TreesitER #279

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from
Draft

TreesitER #279

wants to merge 6 commits into from

Conversation

adimit
Copy link

@adimit adimit commented Mar 14, 2023

Emacs 29 will bring treesit—a new way to incrementally parse files using treesitter grammars. It's fantastic 🎉

This is supposed to bring treesit-based expansions to expand-region. Since we can use the AST directly, there is no need for more than one expansion rule, which makes the implementation almost trivial.

The only problem (and the reason this PR is still a draft) is that I can't figure out how to add the grammar files to the Emacs runtime used by ecukes. Any pointers are appreciated, but I'll try to tackle it over the weekend.


TODO:

  • Don't enumerate modes
  • Keep old expansions
  • Fix over-eager treesitter expansions for treesitter nodes that are delimited by brackets
  • Fix ecukes tests (probably a Cask configuration issue)

@adimit adimit changed the title Treesit er TreesitER Mar 15, 2023
@magnars
Copy link
Owner

magnars commented Mar 15, 2023

That is indeed fantastic! This looks great. Two points:

  • If the blocker is tests, I suggest we consider skipping them for now. Please do give it a fair try, but let us not be blocker by it.

  • Is there not some overarching mode or hook we could use, so that we don't have to enumerate all the ts-modes ourselves?

Cheers for doing this!

@adimit
Copy link
Author

adimit commented Mar 15, 2023

That is indeed fantastic! This looks great. Two points:

  • If the blocker is tests, I suggest we consider skipping them for now. Please do give it a fair try, but let us not be blocker by it.

I'd like to give it a shake, but I'll stop if it complicates the test setup too much (which it may).

  • Is there not some overarching mode or hook we could use, so that we don't have to enumerate all the ts-modes ourselves?

Yeah, I was looking for one, but couldn't find anything. Things are still in flux so perhaps it's just not there yet.

We could advise treesit-major-mode-setup but I opted to keep it simple.

@ckruse
Copy link

ckruse commented Mar 15, 2023

Sadly, there isn't. Maybe it is worth to propose it to emacs-devel? Dunno if it is to late for 29, though.

@adimit
Copy link
Author

adimit commented Mar 15, 2023

I've opened a bug to see if we can't get some sort of hook for treesit.

@daanturo
Copy link

daanturo commented Mar 18, 2023

  • About the exhaustive enumeration on all of the -ts-mode major modes, I don't think that nor the need of treesit-major-mode-hook is necessary, we can just add er/treesit-er-parent-node as one of the default expansions globally. er--expand-region-1 already has built-in ignore-errors guard to suppress expansion errors, in fact I have tested that approach just fine on Emacs 28 after forking your branch: adimit@a6ac1d0.
    Or checking (and (fboundp #'treesit-available-p) (treesit-available-p)), or (require 'treesit nil t) first before adding globally.

  • Also, I don't think we can aggressively discard other expansions and use only er/treesit-er-parent-node (https://github.com/adimit/expand-region.el/blob/8090469c65269b3b5f940aca952b584cecad12cd/treesit-er-expansions.el#L54), as it doesn't work inside (most) strings and comments. I can think of 2 approaches:

    • Keep old non-treesit expansions
    • Use the old non-treesit expansions for strings and comments, and the treesit one for code, but that's probably too complicated
      And it's generally undesirable to override users's config (if any).
  • Currently unlike other expansions, er/treesit-er-parent-node also can't mark the region inside (and excluding) brackets:

(❚arg: String❚)

I think we can improve the current implementation by prefer marking the region between the first named child's beginning and the last named child's end first, compared to the entire parent node. (I may come up with that later, if possible.)

@adimit
Copy link
Author

adimit commented Mar 19, 2023

  • About the exhaustive enumeration on all of the -ts-mode major modes, I don't think that nor the need of treesit-major-mode-hook is necessary, we can just add er/treesit-er-parent-node as one of the default expansions globally. er--expand-region-1 already has built-in ignore-errors guard to suppress expansion errors, in fact I have tested that approach just fine on Emacs 28 after forking your branch: adimit@a6ac1d0.
    Or checking (and (fboundp #'treesit-available-p) (treesit-available-p)), or (require 'treesit nil t) first before adding globally.

Well, no global ts-mode-hook is going to come out of my feature request. However, we could have our cake and eat it, too, by hooking into prog-mode (or some other general hook, as text-mode could conceivably be parsed by tree-sitter, too) and checking for tree-sitter there. Something like that:

(defmacro add-ts-mode-hook (f)
  "Add mode hook that only executes in ts-enabled modes"
  (when (and (fboundp 'treesit-available-p)
             (treesit-available-p))
    `(add-hook 'prog-mode-hook
               (lambda ()
                 (when (treesit-language-at (point))
                   (,f))))))

Note that checking treesit-available-p is not enough, as that isn't buffer-local. It just tells you whether Emacs was compiled with this feature. That's why the above macro expands to nothing when the feature isn't there. I'm using (treesit-language-at (point)) here to check whether it's enabled in the buffer, though there's bound to be a better way.

In general, I'd like to find a way to avoid having to make a check on every invocation, and only install the expander when necessary. It makes things clearer to the user: imagine the user inspecting er/try-expand-hook and noticing our treesitter expansion, only to wonder why it doesn't actually do anything (or worse, to think it's misbehaving).

It's also the only way to remove other expansions when tree-sitter is enabled.

  * Keep old non-treesit expansions

I'll try that out, if it works, it might be the simplest option.

  * Use the old non-treesit expansions for strings and comments, and the treesit one for code, but that's probably too complicated

Finding strings and comments could be difficult, because AFAIU, not all grammars give the same kind of node names to these. E.g. in rust-ts-mode running treesit-node-type gives " and line_comment and in tsx-ts-mode you'll get string_fragment and comment.

Also, some languages might have symbols, or atoms etc, on which these expansions should also work.

    And it's generally undesirable to override users's config (if any).

I agree in principle, though we're overriding in any case, by adding ours.

* Currently unlike other expansions, `er/treesit-er-parent-node` also can't mark the region inside (and excluding) brackets:

Yes, that's true, thanks for calling it to my attention. I'll try to think of a fix.

I think we can improve the current implementation by prefer marking the region between the first named child's beginning and the last named child's end first, compared to the entire parent node. (I may come up with that later, if possible.)

IIUC, we'd have to check whether we're on a delimiter, like ( [ { to do this, because otherwise it wouldn't be useful in all other cases (word expansions).

This has a drawback: the user could have the feature
enabled (i.e. Emacs compiled with treesitter support) but never
actually load a treesitter mode during their editing session. We would
still execute (require 'treesit-er-expansions)! I don't know of a good
way to avoid that.
Instead of enumerating treesitter modes, we just ask every buffer
whether it's a treesitter buffer in its hook.
@adimit
Copy link
Author

adimit commented Mar 20, 2023

@daanturo Thanks for your input. I've decided to append treesitter expansions to er/try-expand-list. I think this behaviour would be more expected by users of expand-region. In case someone wants the pure tree-sitter experience, setting er/try-expand-list manually is possible.

In my trials, this mostly solved the problems with brackets. The exception are brackets that the existing expansions don't find, but treesitter's expansion does find. I can't think of a mode-independent fix for that. At the very least you'd have to specify which brackets your ts-sub-mode expects.


@magnars I've removed the enumeration of different modes. Sadly, my feature request for a hook for ts-modes has been met with little enthusiasm. Currently, the code has two weaknesses:

  • we can't hold back on (require treesit-er-expansions) until a ts-mode is actually loaded. We need to require the file on Emacs startup. That's because:
  • we have to check whether treesitter is available at all in every text-mode and prog-mode buffer, and then conditionally add tressitter expansions. Without a hook to subsume all ts-modes, I think that's the only option. Eli Zaretskii seemed to agree (see my above feature request).

I've not yet solved the ecukes problem. That one's next on the list.

@magnars
Copy link
Owner

magnars commented Mar 21, 2023

Seems like a good middle ground. 👍

@adimit
Copy link
Author

adimit commented Mar 21, 2023

@magnars Yeah, as expected, running the tests is a mess. I didn't push the commit here, but made it available here on a separate branch.

tl;dr: it's not really too difficult, but there's a hard requirement to point Emacs to a binary file (I chose libtree-sitter-rust.so). But that's an ELF 64-bit LSB shared object according to file. Very much not portable.

The test works, it's just that Emacs won't actually load tree-sitter grammars without binary blobs, which aren't (yet) included in the executable or default Emacs distribution and perhaps never will be.

@daanturo
Copy link

daanturo commented Apr 1, 2023

Thank you for taking my previous comment to account , after a while I have noticed a few more things.

  • There is a little problem with the current er/treesit-er--get-node-between: because it searchs for a parent node whose ends extends beyond the region-end, it will fail to find an appropriate parent node whose end is the same but with a beginning extends before region-beginning.

For example

fn some_func(arg0: String, arg1: b|ool)

In this case, after marking "bool", its parent node to mark after that should be "arg1: bool" instead of the whole parameters list including the brackets.

  • To check if treesit is available in the buffer, IMO treesit-parser-list maybe a better choice than treesit-language-at, since the latter makes use of the former so it should be more lightweight. treesit-parser-list also doesn't need querying and passing a state-sensitive information such as (point).

  • Maybe add expansions to conf-mode too to cover all text files? Although currently there I haven't know a -ts-mode who derives from it yet.

  • About marking the "inner list", I have written one expander namely er/treesit-mark-bigger-list-or-node at daanturo@31a0556 (in another file just as a suggestion so that it doesn't conflict with yours) on branch expand-region.el/tree/treesit. The recorded preview is for Rust and Elixir. Note that in addition to marking inside brackets, using tree sitter also helps us marking list of statements between "begin/do ... end". Hope we can discuss more about this because currently my implementation depends on a bunch of heuristics.

er-treesit.webm

@LionyxML
Copy link

I am thrilling to see this being developed!

Just manually added the https://github.com/magnars/expand-region.el/pull/279/files#diff-698e32988bb3d04dacdb2cb8f9b74ffe5371bec409c5fe594ce14d68371dd7f2 changes to my local package.

Working beautifully on emacs 30. Specially good with those JSX (html inside javascript) tags.

@EgorDuplensky
Copy link

I have also applied the patch locally.
Working perfectly, great improvement for the example-region!

@twmr
Copy link
Contributor

twmr commented Aug 18, 2023

Note that there is also a new package called https://github.com/casouri/expreg (I guess it will soon be added to ELPA, because its author asked for adding this package to ELPA)

@wyuenho
Copy link

wyuenho commented Sep 16, 2023

There's also https://github.com/mickeynp/combobulate too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants