Support comments in JSON #2049
Conversation
This deviates from the JSON specification but doesn't affect valid JSON.
I would expect the JSON lexer to parse strictly according to the JSON spec (with comments resulting in `Error` tokens).
I'd like to better understand why. We're not here to educate people on correct JSON -- what's the advantage of splitting it out into yet another flavor of JSON? If people don't know whether it's JSON or not, they can always use the JavaScript lexer, but having to select between 5 flavors of JSON hardly seems like the most user-friendly solution (and it would require us to implement JSON5, JSONC, etc.). I'd like to understand your motivation and why you think we should flag JSON with comments as `Error`.
Sure, but I'd argue that we're also not here to mislead people into thinking that comments are part of JSON.
This PR turns the lexer for JSON into a lexer for "JSON with Comments" (as established by Microsoft's VS Code). Introducing "JSON with comments" as a separate lexer changes nothing about whether or not we need to support JSON5. In any case supporting JSON5 would require a dedicated lexer. (Unless you're really advocating that the JSON lexer should handle all extensions of JSON ... but I don't think you are).
I agree, but I think the proper way to address this would be to implement proper lexer guessing. I suggested a solution for that in #2005 (btw, haven't heard back from you yet regarding my last reply).
I hear you. Many years ago I bought an MP3 player and eagerly started testing to confirm that it was following the MP3 spec... but I discovered it was able to play an animated GIF as cover art. That offended me, so I threw it in the trash. Other people might not have been so lucky, and might have assumed that animated GIF cover art was part of the MP3 spec.

Look, I'm not going to be drawn into a pharisaical discussion about spec purity. People, companies, and software products are adding comments to JSON documents, and this patch addresses those real-world situations. It's okay if the patch isn't merged, but I'm not going to defend the lexer on this one point while you gloss over the fact that the existing lexer also doesn't validate that there are no trailing commas after final array items.
Thanks for the sarcasm, I think it really adds to the discussion. I never said that we shouldn't address the fact that people use extensions of JSON; I just think that there's a better way to do so.
I indeed wasn't aware of that. I also think that emitting error tokens for such trailing commas would be preferable.
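For reference, strict parsers do reject such trailing commas; a quick check with Python's standard-library `json` module (used here purely for illustration, it is unrelated to the Pygments lexer):

```python
import json

# Valid JSON parses fine...
assert json.loads('[1, 2, 3]') == [1, 2, 3]

# ...but a trailing comma after the final array item is rejected
# by a spec-conforming parser.
try:
    json.loads('[1, 2, 3,]')
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```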
Hey, please don't get too wound up about this. I very much appreciate the contributions both of you have made, and I would not want this to end in bad blood.

Ultimately, the way Pygments works is that it doesn't validate any input language, nor was that ever a goal of Pygments (unless @birkenfeld has different goals in mind). We have plenty of lexers which accept slightly more relaxed code for the sake of user convenience. It's also a question of discoverability: a naive user might try JSON, see that it works, and move along, but not know how to find JSON+comments or how to select it if plain JSON doesn't work. I'm leaning towards accepting this change as-is to reduce user pain at no (obvious) downside.

I do hear your concern about highlighting invalid JSON, but I'm afraid that ship has sailed, unfortunately. As with every project used in the real world, we'll have to make some decisions which will leave folks unhappy. I hope you can accept that and continue to contribute. (And yes, I'm aware I didn't reply on the other thread yet -- I don't have a good solution for that either, nor did I spend sufficient time on said issue to give a good reply :/.)
Of course, no worries. If the …
@Anteru is correct - this library is not meant to validate. Marking unrecognized things as error is not an attempt at validation; it is a refusal to guess, more than anything else. So since we have a good, and in many cases valid, interpretation of this syntax, it is worth supporting it without having another slightly different lexer, IMO. Of course, it should be prominent in the docstring.
@not-my-profile I apologize for the sarcasm. That stemmed from something unrelated to this discussion, and I'm sorry for it. @birkenfeld I'll update the docstring to reflect that comments are also supported.
Thanks everyone!
It looks like some random input triggered an issue in the Elpi lexer:
I don't think this is related to the JSON changes but I'll try to get this addressed. |
Got it. The Elpi lexer has pathological backtracking that can be triggered with the following minimal input:

```python
from pygments.lexers.elpi import ElpiLexer

list(ElpiLexer().get_tokens("a" * 30))
```

Only lowercase characters will trigger the pathological backtracking; uppercase characters and digits are not a problem. I'll open a ticket regarding this issue, but would like to address it in a separate PR.

Edit: Opened #2053 to track this.
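The blow-up behind such hangs can be illustrated with a toy backtracking matcher. This is not Pygments code, just a minimal model of how an ambiguous pattern along the lines of `(a|aa)*b` forces a naive engine to retry exponentially many splits of an all-lowercase input:

```python
def attempts(s: str) -> int:
    """Count the recursive attempts a naive backtracking matcher for the
    ambiguous pattern (a|aa)*b makes before giving up on `s`.

    A run of 'a's can be split into one-'a' and two-'a' chunks in
    exponentially many ways, so a failing match (no trailing 'b') forces
    the matcher to try them all -- Fibonacci-like growth in len(s).
    """
    calls = 0

    def match(i: int) -> bool:
        nonlocal calls
        calls += 1
        if i < len(s) and s[i] == "b":
            return True          # found the terminator
        if i < len(s) and s[i] == "a":
            if match(i + 1):     # try consuming one 'a'
                return True
            if i + 1 < len(s) and s[i + 1] == "a" and match(i + 2):
                return True      # try consuming two 'a's
        return False             # dead end: backtrack

    match(0)
    return calls

# Each extra lowercase character multiplies the work by ~1.6x, which is
# why a mere "a" * 30 is enough to stall a real backtracking engine.
print(attempts("a" * 10), attempts("a" * 20))  # → 232 28656
```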
Thanks (for both!) |
This adds support for comments in JSON. Valid JSON documents are not affected, but any comments that appear in a document will now be parsed as comments.
This added support doesn't claim to support any JSON supersets like JSONC or JSON5. It's just comments. 😄
Please let me know if there's anything that I need to clean up or improve. Thanks!
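As a side note for anyone who needs to feed such comment-bearing documents to a strict parser: stripping the comments first is straightforward. The following is only a sketch under that assumption; `strip_json_comments` is a hypothetical helper, not part of this PR or of Pygments:

```python
import json

def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments from a JSON-with-comments document.

    Minimal sketch: walks the text once, skipping comment spans while
    leaving string literals (which may themselves contain "//") intact.
    """
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:      # keep escaped chars, e.g. \"
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":  # skip to end of line
                i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2   # skip the block comment
        else:
            out.append(ch)
            i += 1
    return "".join(out)

doc = '{\n  // a line comment\n  "url": "https://example.com", /* block */\n  "n": 1\n}'
print(json.loads(strip_json_comments(doc)))
```

Note that the "//" inside the `"url"` string value survives, because comment markers are only recognized outside string literals.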