Inconsistent quote matching behaviour with multiple selections #7012

aral · 2023-05-10T15:37:32Z

Summary

See screenshot:

This is the result of a ' on a multiple selection.

On some lines, only a single quote character is added (correct). On some, two.

(It looks like if the line ends with a dot, two single quotes are added. If there is no dot, just one.)

Reproduction Steps

See summary.

Helix log

n/a

Platform

Linux

Terminal Emulator

WezTerm

Helix Version

22.12-489-g786c8c8b

pascalkuthe · 2023-05-10T15:45:21Z

cc @dead10ck

I skimmed the code and it looks like auto pairs doesn't actually look for pairs in case of same-char pairs (like ') and instead simply always inserts a pair for non-alphanumeric chars (which matches your example where sentences that end with . receive a closing char). I ma not sure if we can do better without syntax aware matching (so TS based). I wonder how other editors handle this

mmaslyukov · 2023-05-12T07:50:37Z

What if count the number of "same-char" symbols in a line? If a number is odd, then put only one char.

pascalkuthe · 2023-05-12T10:09:18Z

That would be a pretty bad heuristic and would often endup to autpairs not firing when they should.

dead10ck · 2023-05-15T14:23:10Z

Yeah this is unfortunately behaving as designed currently. It intentionally does not auto pair when the preceding char is an alphanumeric in order to handle apostrophes in prose. Using TS for more language aware heuristics might help in some cases, but would be a maintenance heavy approach, since every grammar will have a different set of nodes to look for. I'm not sure what the solution is here.

pascalkuthe · 2023-05-15T15:02:40Z

There seems to be some prior art around using TS in nvim: https://github.com/windwp/nvim-autopairs#treesitter

It seems they implemented the following rules:

for quotes (or really all evenly matched pairs) they use the rule proposed by @mmaslyukov above, I might have been a bit too quick to dismiss that. I played around with it a bit and it actually works better than I expected. Although their handling is a bit more complex (for example you need to somehow handle liftimes 'foo for rust, they do this with a simple regex that matches [^&]' but that's not a great solution since &'f' is valid rust and represents a char reference).
slightly smarter regex patterns. These are somewhat language specific tough. (see https://github.com/windwp/nvim-autopairs/blob/7747bbae60074acf0b9e3a4c13950be7a2dff444/lua/nvim-autopairs/rules/basic.lua#L39)
they simply specify a list of TS nodes inside of which autoparis are disabled. These are typically the string literal node of the corresponding language. This works pretty well because an unterminated string literal usually spans the entire file (or at least until the next ").

So my idea for solving this would be:

Add a language-specific config option that contains a list of nodes inside which auto pairs are disabled. This should not be too much work to maintain since its usually just string or string-literal
Add the heuristic suggested by @mmaslyukov but with one twist: also ignore quotes inside the previously specified TS scopes. This would allow us to specify liftime and reference_type as an ignored scope for rust (and make ' pairs work as a result)

dead10ck · 2023-05-15T15:45:23Z

The problem with TS is that it works very inconsistently depending upon whether it can recognize the node you're starting to type when it's incomplete. Specifying to ignore auto pairs inside line-wise comments or an already-complete string literal would work fine, but not for lifetime nodes. For example, if we wanted to type this line:

let foo: &'static str = "foo";

The complete line parses to:

(let_declaration
   pattern: (identifier)
   type: (reference_type
     (lifetime
       (identifier))
     type: (primitive_type))
   value: (string_literal))

But as we're typing it, e.g. once we get here:

let foo: &'

It does not recognize that we're starting a lifetime specifier

  (ERROR
    (identifier))

We should also consider that this problem also affects plaintext files, as in the example given in this issue, where arguably the omission of the pair is more frequently desired than it is in code, since plaintext is more often prose.

Moreover, I suspect the simple even/odd check will not work well in cases of plaintext prose since any apostrophes will throw off the count, and the result would be that sometimes it omits the auto pair and sometimes it doesn't, seemingly at random depending on how many apostrophes are used within that sentence.

pascalkuthe · 2023-05-15T16:21:50Z

Hmm yeah for plaintext I think we can't do much better than we are currently doing. Altough the even/odd matching could still be used for double quotes since those are not really used like an apastrope. ' is just overused I gues :D

For programming languages tree sitter could just be added as an enhancement (basically do better if we can). Regarding error nodes TS actually gets this right most of the time:

fn bar(){
    let foo : &
    let foo = 2;
}

here & does have the reference_type node. It only doesn't work for the last let binding in a function:

fn bar(){
    let foo = 2;
    let foo : &
}

we don't have an auto pair for ' in rust at all right now so we could just leave it disabled. But for string literals, this tactic should work really well tough since those will greedily parse all text following them.

dead10ck · 2023-05-15T20:57:12Z

Oh interesting. Yeah agreed, I think it makes sense to use TS where it can concretely improve the situation, like comments.

But yeah, I share your fear that we can't really do much better for plaintext than we do now. There's simply not enough information to divine when a single quote should be paired and when it's just an apostrophe.

pascalkuthe · 2024-04-14T01:26:36Z

I think its fair to say that #8294 closes this. There are cases where having a plaintext version is useful so I would like to keep the current mi" unchanegd.

For most cases you will be able to use mam/mim with #8294 this will work both on and in quotes. It will also be unambigous since there are usually not other TS nodes inside string literals (so there wouldn't be brackets that would be selected first instead for example)

aral added the C-bug Category: This is a bug label May 10, 2023

pascalkuthe mentioned this issue Apr 14, 2024

feat: find closest pairs using tree-sitter #8294

Merged

pascalkuthe mentioned this issue Apr 14, 2024

Improve Matching Behavior on Quotation Marks #8253

Closed

the-mikedavis closed this as completed in #8294 Apr 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent quote matching behaviour with multiple selections #7012

Inconsistent quote matching behaviour with multiple selections #7012

aral commented May 10, 2023 •

edited

pascalkuthe commented May 10, 2023

mmaslyukov commented May 12, 2023

pascalkuthe commented May 12, 2023

dead10ck commented May 15, 2023

pascalkuthe commented May 15, 2023 •

edited

dead10ck commented May 15, 2023

pascalkuthe commented May 15, 2023

dead10ck commented May 15, 2023

pascalkuthe commented Apr 14, 2024

Inconsistent quote matching behaviour with multiple selections #7012

Inconsistent quote matching behaviour with multiple selections #7012

Comments

aral commented May 10, 2023 • edited

Summary

Reproduction Steps

Helix log

Platform

Terminal Emulator

Helix Version

pascalkuthe commented May 10, 2023

mmaslyukov commented May 12, 2023

pascalkuthe commented May 12, 2023

dead10ck commented May 15, 2023

pascalkuthe commented May 15, 2023 • edited

dead10ck commented May 15, 2023

pascalkuthe commented May 15, 2023

dead10ck commented May 15, 2023

pascalkuthe commented Apr 14, 2024

aral commented May 10, 2023 •

edited

pascalkuthe commented May 15, 2023 •

edited