Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: find closest pairs using tree-sitter #8294

Merged
merged 2 commits into from Apr 24, 2024

Conversation

woojiq
Copy link
Contributor

@woojiq woojiq commented Sep 15, 2023

Closes #8325
closes #3615
closes #3432
closes #4475
closes #7012

Finding the closest surround pairs (mim, mam) was done without being aware of the syntax. This PR adds a tree-sitter implementation that may be faster (untested), but most important it's more correct.
If the syntax distinct strings than mim and mam will work inside strings (didn't work with plaintext impl), matching should also work inside injections.

OLD `find_nth_closest_pairs_pos` produces unsorted tuple. The tuple must be sorted before passing it to the change Transaction. Otherwise, this statement will fail: https://github.com/helix-editor/helix/blob/13d4463e4146473391e37b7f75e99b731aa55878/helix-core/src/transaction.rs#L582-L585

MRE:

  1. hx
  2. repeat exact keys (two spaces after "("): ( <esc>hvhmdm. Make sure auto-pairs is turn on.

@woojiq woojiq marked this pull request as draft September 15, 2023 10:28
@woojiq

This comment was marked as outdated.

@woojiq woojiq marked this pull request as ready for review September 15, 2023 10:47
@pascalkuthe
Copy link
Member

pascalkuthe commented Sep 15, 2023

I haven't revied this too much yet but it strikes me as a bit odd that mdm actually uses the codepath for mi<x> instead of the codepath of mim/mam (which uses TS).

Perhpahs instead of fixing the broken code here it might be more worthwhile to change that? Thaughts @the-mikedavis

@the-mikedavis
Copy link
Member

I think it makes sense historically: surround_* functions were all introduced together before textobjects: #320 vs #728, but I think it makes more sense to use tree-sitter if possible (ideally the same code path as mm?)

@woojiq
Copy link
Contributor Author

woojiq commented Sep 21, 2023

mim/mam (which uses TS).

Under the hood, both commands use

'm' => textobject::textobject_pair_surround_closest(
which uses function that doesn't use TS
None => surround::find_nth_closest_pairs_pos(slice, range, count),

Though, as @the-mikedavis mentioned, mm uses TS to find the matching brackets

I think we need to rewrite find_nth_closest_pairs_pos to use/reuse something like find_pair (but n times) when TS is present and fall back to the current implementation otherwise. Is it a good one?

@woojiq woojiq marked this pull request as draft September 24, 2023 15:43
@woojiq woojiq force-pushed the fix-panic-mdm branch 2 times, most recently from 7c6d268 to 844baa1 Compare September 28, 2023 14:39
@woojiq
Copy link
Contributor Author

woojiq commented Sep 28, 2023

I think I'm done. I've created a new function find_nth_closest_pairs_ts which under the hood uses find_matching_bracket_fuzzy - a tree-sitter version of finding nearest closing bracket. I added some tests:

  • surround_delete checks for crashes (linked issue) on mdm.
  • surround_replace_ts tests a new version of finding nearest pairs using tree-sitter.

Now "mim" or "mrm" seems more natural than before. If you have a selection that crosses multiple brackets, a pair of brackets will be found that completely covers the selection, which was not the case with the plain implementation (which assumed that the selection was just the cursor head).

@woojiq woojiq changed the title fix: crash on surround delete when direction is backward Add tree-sitter version of finding closest brackets Sep 28, 2023
@woojiq woojiq marked this pull request as ready for review September 28, 2023 18:57
@pascalkuthe pascalkuthe added E-medium Call for participation: Experience needed to fix: Medium / intermediate S-waiting-on-review Status: Awaiting review from a maintainer. A-command Area: Commands labels Sep 30, 2023
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to do some testing but from reading trough the changes this looks great.

helix-core/src/surround.rs Outdated Show resolved Hide resolved
@pascalkuthe pascalkuthe linked an issue Oct 12, 2023 that may be closed by this pull request
pascalkuthe
pascalkuthe previously approved these changes Oct 13, 2023
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look go to me. I will do some testing in my daily driver branch (I use mim/mam quite a bit)

@woojiq
Copy link
Contributor Author

woojiq commented Oct 13, 2023

I found one discrepancy. Since I combined the pairs into one list, quotes are now considered as pairs too. But if you use a file without the tree-sitter syntax, it will still skip mim|mam for quotes:

Try mim on this text { " cursor here " }

because current impl of nearest brackets first checks if the char is open and continues to look for a closed char if it is true.

if is_open_pair(ch) {

The code used to work correctly because we had a different set of pairs (without the quotes). But now that the pairs are merged into one, I believe it should also accept quotes as pairs so that all pairs from PAIRS are treated the same (they do with tree-sitter, this "bug" only applies to non-tree-sitter files). The solution is simple - give the check of the closing element a higher priority. Do I need to do this?

@pascalkuthe
Copy link
Member

Hmm good qustion we might wnat to just not support quotes for plaintext files since its not clear whether we are currently "inside" the pair or not. We do the same for match_bracket.

@woojiq
Copy link
Contributor Author

woojiq commented Oct 13, 2023

we might wnat to just not support quotes for plaintext

If so then it works as expected, but for clarity I might add something like if is_quote(ch) to the plain implementation to make it more obvious to readers what's going on.

@pascalkuthe
Copy link
Member

we might wnat to just not support quotes for plaintext

If so then it works as expected, but for clarity I might add something like if is_quote(ch) to the plain implementation to make it more obvious to readers what's going on.

I think a comment would be good 👍

@woojiq
Copy link
Contributor Author

woojiq commented Oct 14, 2023

So I took a closer look at the behavior of the code with gdb and found that this is not true:

We do the same for match_bracket.

On the master (this is not a side effect of my PR🙂), it's still going through all the text trying to find closing quote. But it just never finds it (redundant cpu cycles), because there is no opposite state for quotes. I don't want to do it in this PR, but I'll probably make another one when I have some free time to clean up match_brackets.rs and surround.rs a bit.
A few things need to be done:

  1. Name functions and variables accordingly. Some have "bracket" in the name but actually check all pairs, etc.
  2. Distinguish between quotes and brackets so there is no hidden flows when matching on plaintext.

@pascalkuthe
Copy link
Member

hmm I think you are right reading the code again. I think that is an oversight tough the original PR stated it worked that way.

I think I assumed from the naming of is_valid_bracket that it would exclude quotes (and therefor bail here:

if !is_valid_bracket(bracket) {
) during review so it would be good to clean that up.

It doesn't actually work tough if you test it (pressing mm on quotes in plaintext files does nothing).

@pascalkuthe
Copy link
Member

would really love to see this land and think its almost there. Are you interested in finishing this @woojiq?

@pascalkuthe pascalkuthe added this to the next milestone Apr 8, 2024
@woojiq
Copy link
Contributor Author

woojiq commented Apr 8, 2024

would really love to see this land and think its almost there. Are you interested in finishing this @woojiq?

Sure, afair this pr is almost ready, I'm going to look again and rebase in 1-2 days.

@woojiq
Copy link
Contributor Author

woojiq commented Apr 9, 2024

I've rebased and changed this PR a bit. To me it looks much better than before. I wrote few tests for mam.
Let me know if I need to explain more in the description or commit messages what this PR does (as it looks pretty obvious to me because I implemented this😀).

@woojiq woojiq changed the title Add tree-sitter version of finding closest brackets feat: find closest pairs using tree-sitter Apr 9, 2024
pascalkuthe
pascalkuthe previously approved these changes Apr 9, 2024
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for finishing that this will be great to have!

helix-core/src/match_brackets.rs Show resolved Hide resolved
helix-term/tests/test/commands.rs Outdated Show resolved Hide resolved
the-mikedavis
the-mikedavis previously approved these changes Apr 24, 2024
Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, looks great! Could you rebase / merge master to resolve the conflicts?

@woojiq woojiq dismissed stale reviews from the-mikedavis and pascalkuthe via c5d5d9e April 24, 2024 19:35
@woojiq woojiq force-pushed the fix-panic-mdm branch 2 times, most recently from c5d5d9e to d9ba95d Compare April 24, 2024 19:42
@the-mikedavis the-mikedavis merged commit 839ec4a into helix-editor:master Apr 24, 2024
6 checks passed
@woojiq woojiq deleted the fix-panic-mdm branch April 25, 2024 05:56
@thomasaarholt
Copy link
Contributor

thomasaarholt commented Apr 26, 2024

@woojiq, I found that this doesn't work for quotation symbols (") in Python, and only on the opening quotation mark in Rust (for let foo = "hello world", mim only works with the cursor positioned on the first ", not the last one).

If you have the time and energy, would you mind taking a look?
See my comment here for more details.

@pascalkuthe
Copy link
Member

#4475 (comment)

Only the rudt case is abug either this PR, the pythpm case is an existing bug with mm that I wanted to fix for a while and you don't need to look into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-command Area: Commands E-medium Call for participation: Experience needed to fix: Medium / intermediate S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
4 participants