Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent quote matching behaviour with multiple selections #7012

Closed
aral opened this issue May 10, 2023 · 9 comments · Fixed by #8294
Closed

Inconsistent quote matching behaviour with multiple selections #7012

aral opened this issue May 10, 2023 · 9 comments · Fixed by #8294
Labels
C-bug Category: This is a bug

Comments

@aral
Copy link
Contributor

aral commented May 10, 2023

Summary

See screenshot:

This is the result of a ' on a multiple selection.

On some lines, only a single quote character is added (correct). On some, two.

(It looks like if the line ends with a dot, two single quotes are added. If there is no dot, just one.)

image

Reproduction Steps

See summary.

Helix log

n/a

Platform

Linux

Terminal Emulator

WezTerm

Helix Version

22.12-489-g786c8c8b

@aral aral added the C-bug Category: This is a bug label May 10, 2023
@pascalkuthe
Copy link
Member

cc @dead10ck

I skimmed the code and it looks like auto pairs doesn't actually look for pairs in case of same-char pairs (like ') and instead simply always inserts a pair for non-alphanumeric chars (which matches your example where sentences that end with . receive a closing char). I ma not sure if we can do better without syntax aware matching (so TS based). I wonder how other editors handle this

@mmaslyukov
Copy link

What if count the number of "same-char" symbols in a line? If a number is odd, then put only one char.

@pascalkuthe
Copy link
Member

That would be a pretty bad heuristic and would often endup to autpairs not firing when they should.

@dead10ck
Copy link
Member

Yeah this is unfortunately behaving as designed currently. It intentionally does not auto pair when the preceding char is an alphanumeric in order to handle apostrophes in prose. Using TS for more language aware heuristics might help in some cases, but would be a maintenance heavy approach, since every grammar will have a different set of nodes to look for. I'm not sure what the solution is here.

@pascalkuthe
Copy link
Member

pascalkuthe commented May 15, 2023

There seems to be some prior art around using TS in nvim: https://github.com/windwp/nvim-autopairs#treesitter

It seems they implemented the following rules:

  • for quotes (or really all evenly matched pairs) they use the rule proposed by @mmaslyukov above, I might have been a bit too quick to dismiss that. I played around with it a bit and it actually works better than I expected. Although their handling is a bit more complex (for example you need to somehow handle liftimes 'foo for rust, they do this with a simple regex that matches [^&]' but that's not a great solution since &'f' is valid rust and represents a char reference).
  • slightly smarter regex patterns. These are somewhat language specific tough. (see https://github.com/windwp/nvim-autopairs/blob/7747bbae60074acf0b9e3a4c13950be7a2dff444/lua/nvim-autopairs/rules/basic.lua#L39)
  • they simply specify a list of TS nodes inside of which autoparis are disabled. These are typically the string literal node of the corresponding language. This works pretty well because an unterminated string literal usually spans the entire file (or at least until the next ").

So my idea for solving this would be:

  • Add a language-specific config option that contains a list of nodes inside which auto pairs are disabled. This should not be too much work to maintain since its usually just string or string-literal
  • Add the heuristic suggested by @mmaslyukov but with one twist: also ignore quotes inside the previously specified TS scopes. This would allow us to specify liftime and reference_type as an ignored scope for rust (and make ' pairs work as a result)

@dead10ck
Copy link
Member

The problem with TS is that it works very inconsistently depending upon whether it can recognize the node you're starting to type when it's incomplete. Specifying to ignore auto pairs inside line-wise comments or an already-complete string literal would work fine, but not for lifetime nodes. For example, if we wanted to type this line:

let foo: &'static str = "foo";

The complete line parses to:

(let_declaration
   pattern: (identifier)
   type: (reference_type
     (lifetime
       (identifier))
     type: (primitive_type))
   value: (string_literal))

But as we're typing it, e.g. once we get here:

let foo: &'

It does not recognize that we're starting a lifetime specifier

  (ERROR
    (identifier))

We should also consider that this problem also affects plaintext files, as in the example given in this issue, where arguably the omission of the pair is more frequently desired than it is in code, since plaintext is more often prose.

Moreover, I suspect the simple even/odd check will not work well in cases of plaintext prose since any apostrophes will throw off the count, and the result would be that sometimes it omits the auto pair and sometimes it doesn't, seemingly at random depending on how many apostrophes are used within that sentence.

@pascalkuthe
Copy link
Member

Hmm yeah for plaintext I think we can't do much better than we are currently doing. Altough the even/odd matching could still be used for double quotes since those are not really used like an apastrope. ' is just overused I gues :D

For programming languages tree sitter could just be added as an enhancement (basically do better if we can). Regarding error nodes TS actually gets this right most of the time:

fn bar(){
    let foo : &
    let foo = 2;
}

here & does have the reference_type node. It only doesn't work for the last let binding in a function:

fn bar(){
    let foo = 2;
    let foo : &
}

we don't have an auto pair for ' in rust at all right now so we could just leave it disabled. But for string literals, this tactic should work really well tough since those will greedily parse all text following them.

@dead10ck
Copy link
Member

Oh interesting. Yeah agreed, I think it makes sense to use TS where it can concretely improve the situation, like comments.

But yeah, I share your fear that we can't really do much better for plaintext than we do now. There's simply not enough information to divine when a single quote should be paired and when it's just an apostrophe.

@pascalkuthe
Copy link
Member

I think its fair to say that #8294 closes this. There are cases where having a plaintext version is useful so I would like to keep the current mi" unchanegd.

For most cases you will be able to use mam/mim with #8294 this will work both on and in quotes. It will also be unambigous since there are usually not other TS nodes inside string literals (so there wouldn't be brackets that would be selected first instead for example)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants