Skip to content

Conversation

@jrwats
Copy link
Contributor

@jrwats jrwats commented Oct 10, 2020

The old regex was missing a significant number of legitimate spaces per the JSON spec: https://spec.json5.org/#white-space. In effect this meant, we wouldn't strip trailing commas that were followed by these valid characters. Make the regex exhaustive.

I'm sprinkling all the relevant space characters in the tests as defined by the spec to ensure our regex catches them all.

NOTE: if you try manually testing dap-launch-sanitize-json in json-mode or js-mode, you're gonna have a bad time. Test this stuff in fundamental-mode if you find yourself so inclined...

@nbfalcon
Copy link
Member

Here is the relevant spec: https://spec.json5.org/#white-space.

@nbfalcon
Copy link
Member

Note also "any other character in the unicode whitespace category".

@kiennq
Copy link
Member

kiennq commented Oct 11, 2020

The regex for [:space:] states that

[:space:] a whitespace character, as defined by the syntax table, but typically [ \t\r\n\v\f], which includes the newline character

If just space doesn't match new line, perhaps we can manually include those characters then?

@jrwats
Copy link
Contributor Author

jrwats commented Oct 11, 2020

If just space doesn't match new line...

Yeah so... it does match newline in the unit test. It doesn't when I try running it manually, and similarly when testing interactively with simpler regexes like [[:space:]\n] - I haven't had time to debug why it's behaving that way.

Ahaaaa.... ok - figured it out partially. It's definitely context-dependent. I happened to be in js-mode(as this is what I have setup for the auto-mode-alist for json files). When in fundamental-mode, [[:space:]]+ works just like we want, matching newlines and contingent spaces. When in js-mode, it doesn't include newlines in its match. Now, what js-mode is doing to cause this, I don't quite know yet...

I'll update to remove the gratuitous ?\C-j.

, perhaps we can manually include those characters then?

I'm hesitant to try hand-coding ranges for unicode. [:blank:] looks like it catches all the ones mentioned
https://www.compart.com/en/unicode/category/Zs h/t @nbfalcon

It looks like both :blank: and :space: miss vertical tab "\v" (ASCII 11), so that will have to be hard-coded.

@jrwats jrwats force-pushed the better_regex branch 3 times, most recently from f297db4 to 6056c84 Compare October 11, 2020 04:17
@jrwats
Copy link
Contributor Author

jrwats commented Oct 11, 2020

Okay... Lesson learned - I should have RTFM, :P
From https://www.emacswiki.org/emacs/RegularExpression:

The syntax table depends on the current mode

@jrwats
Copy link
Contributor Author

jrwats commented Oct 11, 2020

So [:space:] covers line feed (0xA), form feed (0xC), and carriage return (0xD), which are all missed by [:blank:].
[:blank:] covers a number of unicode spaces not covered by [:space:]: nbsp, Ogham, NNBSP, MMSP, ideographic.

@kiennq
Copy link
Member

kiennq commented Oct 11, 2020

Btw, the json-mode by MELPA is kind of broken since it's using js-mode syntax table, which doesn't allow - as a symbol in property.
You should probably use https://github.com/kiennq/json-mode for json, which will provide both json-mode and jsonc-mode which supports correct syntax table.

@jrwats
Copy link
Contributor Author

jrwats commented Oct 12, 2020

@kiennq I see you accepted... are you able to merge?

@kiennq
Copy link
Member

kiennq commented Oct 12, 2020

@nbfalcon @yyoncho

@yyoncho yyoncho merged commit 2d861a3 into emacs-lsp:master Oct 13, 2020
@yyoncho
Copy link
Member

yyoncho commented Oct 13, 2020

@jrwats thank you!

@jrwats jrwats deleted the better_regex branch October 13, 2020 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants