Support comments in JSON #2049

kurtmckee · 2022-01-25T04:49:28Z

This adds support for comments in JSON. Valid JSON documents are not affected, but if comments appear in the document they will be parsed as comments now.

This added support doesn't claim to support any JSON supersets like JSONC or JSON5. It's just comments. 😄

Example, rendered in Monokai:

Please let me know if there's anything that I need to cleanup or improve. Thanks!

This deviates from the JSON specification but doesn't affect valid JSON.

not-my-profile · 2022-01-26T06:35:25Z

I would expect the JSON lexer to parse strictly according to the JSON spec (with comments resulting in Error tokens). So I think it would be better to introduce a new JSON+comments lexer, based on the JSON lexer.

Anteru · 2022-01-26T08:19:35Z

I'd like to better understand why. We're not here to educate people on correct JSON -- what's the advantage of splitting it out in yet another flavor of JSON? If people don't know if it's JSON or not, they can always use the JavaScript lexer, but having to select between 5 flavors of JSON hardly seems like the most user-friendly solution (and it would require us to implement JSON5, JSON C, etc.) I'd like to understand your motivation and why you think we should flag JSON with comments as Error.

not-my-profile · 2022-01-26T09:09:48Z

We're not here to educate people on correct JSON

Sure but I'd argue that we're also not here to misguide people (into thinking that comments are part of JSON).

it would require us to implement JSON5, JSON C, etc

This PR turns the lexer for JSON into a lexer for "JSON with Comments" (as established by Microsoft's VS Code). Introducing "JSON with comments" as a separate lexer changes nothing about whether or not we need to support JSON5. In any case supporting JSON5 would require a dedicated lexer. (Unless you're really advocating that the JSON lexer should handle all extensions of JSON ... but I don't think you are).

having to select between 5 flavors of JSON hardly seems like the most user-friendly solution

I agree but I think the proper way to address this would be to implement proper lexer guessing. I suggested a solution for that in #2005 (btw haven't heard back from you yet regarding my last reply).

kurtmckee · 2022-01-26T13:53:20Z

@not-my-profile,

I'd argue that we're also not here to misguide people (into thinking that comments are part of JSON).

I hear you. Many years ago I bought an MP3 player and eagerly started testing to confirm that it was following the MP3 spec...but I discovered it was able to play an animated GIF as cover art. That offended me, so I threw it in the trash. Other people might not have been so lucky, and might have assumed that ~~comments are part of JSON~~ animated GIFs are part of the MP3 spec.

Look, I'm not going to be drawn into a pharisaical discussion about spec purity. People, companies, and software products are adding comments to JSON documents, and this patch will address those real-world situations. It's okay if the patch isn't merged, but I'm not going to defend the lexer on this one point while you gloss over the fact that the existing lexer also doesn't validate that there are no trailing commas after final array items.

not-my-profile · 2022-01-26T14:24:33Z

Thanks for the sarcasm, I think it really adds to the discussion.

I never said that we shouldn't address the fact that people use extensions of JSON, I just think that there's a better way to do so.

while you gloss over the fact that the existing lexer also doesn't validate that there are no trailing commas after final array items

I indeed wasn't aware of that. I also think that emitting error tokens for such trailing commas would be preferable.

Anteru · 2022-01-26T16:07:48Z

Hey, please don't get too wound up about this. I very much appreciate the contributions both of you have made and I would not want this to end in bad blood.

Ultimately, the way Pygments is working is that it's not validating any input language, nor was it ever a goal of Pygments to do so (unless @birkenfeld has different goals in mind.) We have plenty of lexers which accept slightly more relaxed code for the sake of user convenience. It's also a question of discoverability -- a naive user might try JSON, see it works, move along, but not know how to find JSON+comments or select it if it doesn't. I'm leaning towards accepting this change as-is to reduce user pain at no (obvious) downside. I do hear your concern about highlighting invalid JSON but I'm afraid that ship has sailed unfortunately. As with every project used in the real-world, we'll have to make some decisions which will leave folks unhappy. I hope you can accept that and continue to contribute (and yes, I'm aware I didn't reply on the other thread yet -- I don't have a good solution for that either, nor did I spent sufficient time on said issue to give a good reply :/.)

not-my-profile · 2022-01-26T16:15:29Z

I hope you can accept that

Of course, no worries. If the JsonLexer handles more than JSON, I think it should be documented in its docstring.

birkenfeld · 2022-01-26T16:17:03Z

@Anteru is correct - this library is not meant to validate. Marking unrecognized things as error is not an attempt at validation, it is a refusal to guess more than anything else.

So since we have a good, and in many cases valid, interpretation of this syntax, it is worth supporting it without having another slightly different lexer, IMO.

Of course, it should be prominent in the docstring.

kurtmckee · 2022-01-26T16:27:16Z

@not-my-profile I apologize for the sarcasm. That stemmed from something unrelated to this discussion, and I'm sorry for it.

@birkenfeld I'll update the doc string to reflect that comments are also supported.

Anteru · 2022-01-26T19:11:46Z

Thanks everyone!

kurtmckee · 2022-01-27T22:06:26Z

It looks like some random input triggered an issue in the Elpi lexer:

! KeyboardInterrupt: interrupted ElpiLexer.get_tokens(): test_content='v{+MKj-AwTQIn:HAZiJ|23C;BwQHBoG)bOjlWg0ts5\'vRzu2+<i=v.fddOcMj+lQ8Bp$-V9rsephX(osSZPkMMX[#H_OLY\\GB]bVN2Df((TOqx}$&xF%@~I8Z\';&}{:uv_6`PGRbaVp/,fvK!;}t&D:Nx&0~Enn`a6\x7f3l(wP>-3w#%G]:#\x7f)-4F?^{@D3$1YUUt6kJ>_z@egLZ[jp6Y9&{^w`yKmy<sC*<cNS?x=#7q\x7f!/4|3j?11P5dN@GimkS?\'90^e=7Ba$2iU@fRu[\\W0IcJn*^)$\x7f\\ZY}C|\\pAaU<bulP]!H|L6;S/m.+,inoaE*\'C="Ho5?T"V5hAK)fWF5\'QM>JocLF`A+z\x7f~.Ju_q-_8.F%Szhy^R"q9`zk(9CUr%s!y=mEy4~\\grT)D1hh,e78/Wkldr|,"/>E2:%E"<8gQq#KdWctXTg7]x*1VNYt04;[DrR!bLOX7[m}~IIX4*e{]>.,\n' !

I don't think this is related to the JSON changes but I'll try to get this addressed.

kurtmckee · 2022-01-28T13:21:18Z

Got it. The Elpi lexer has pathological backtracking that can be triggered with the following minimal input:

from pygments.lexers.elpi import ElpiLexer

list(ElpiLexer().get_tokens("a" * 30))

Only lowercase characters will trigger pathological backtracking. Uppercase characters and digits are not a problem.

I'll open a ticket regarding this issue, but would like to address it in a separate PR.

Edit: Opened #2053 to track this.

Anteru · 2022-01-28T17:09:40Z

Thanks (for both!)

kurtmckee added 4 commits January 24, 2022 08:38

Support comments in JSON

832113a

This deviates from the JSON specification but doesn't affect valid JSON.

Update the JSON parser and tests to reach 100% code and branch coverage

233b452

Note work on JSON lexer

4461c4e

Fix a roundtrip error identified by a random test input in CI

325ccbc

This was referenced Jan 25, 2022

Request: lex JSON even if it contains comments #1739

Closed

Lexer request: add jsonc lexer #1680

Closed

Note that comments are supported, and that no validation is performed

f19c171

Anteru merged commit 22dca20 into pygments:master Jan 28, 2022

Anteru added this to the 2.12.0 milestone Jan 28, 2022

Anteru added the changelog-update Items which need to get mentioned in the changelog label Jan 28, 2022

Anteru self-assigned this Jan 28, 2022

kurtmckee deleted the json-comments branch January 31, 2022 14:42

Anteru removed the changelog-update Items which need to get mentioned in the changelog label Feb 20, 2022

birkenfeld mentioned this pull request Mar 7, 2022

Comments in JSON cannot be highlighted correctly #2083

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support comments in JSON #2049

Support comments in JSON #2049

kurtmckee commented Jan 25, 2022

not-my-profile commented Jan 26, 2022

Anteru commented Jan 26, 2022

not-my-profile commented Jan 26, 2022

kurtmckee commented Jan 26, 2022

not-my-profile commented Jan 26, 2022

Anteru commented Jan 26, 2022 •

edited

not-my-profile commented Jan 26, 2022

birkenfeld commented Jan 26, 2022

kurtmckee commented Jan 26, 2022 •

edited

Anteru commented Jan 26, 2022

kurtmckee commented Jan 27, 2022

kurtmckee commented Jan 28, 2022 •

edited

Anteru commented Jan 28, 2022

Support comments in JSON #2049

Support comments in JSON #2049

Conversation

kurtmckee commented Jan 25, 2022

not-my-profile commented Jan 26, 2022

Anteru commented Jan 26, 2022

not-my-profile commented Jan 26, 2022

kurtmckee commented Jan 26, 2022

not-my-profile commented Jan 26, 2022

Anteru commented Jan 26, 2022 • edited

not-my-profile commented Jan 26, 2022

birkenfeld commented Jan 26, 2022

kurtmckee commented Jan 26, 2022 • edited

Anteru commented Jan 26, 2022

kurtmckee commented Jan 27, 2022

kurtmckee commented Jan 28, 2022 • edited

Anteru commented Jan 28, 2022

Anteru commented Jan 26, 2022 •

edited

kurtmckee commented Jan 26, 2022 •

edited

kurtmckee commented Jan 28, 2022 •

edited