JSON Lexer produces Error tokens when whitespace surrounds ":" char #2010

@jamesls

The JSON lexer will produce Error tokens if whitespace surrounds the : char. Repro:

from pygments.lexers.data import JsonLexer
from pprint import pprint


lexer = JsonLexer()
pprint(list(lexer.get_tokens('{"foo" : "bar"}')))

Current behavior:

[(Token.Punctuation, '{'),
 (Token.Name.Tag, '"foo"'),
 (Token.Error, ' '),                 # <---- causes an exception to be raised
 (Token.Punctuation, ':'),
 (Token.Text.Whitespace, ' '),
 (Token.Literal.String.Double, '"bar"'),
 (Token.Punctuation, '}'),
 (Token.Text.Whitespace, '\n')]

Previous behavior:

[(Token.Punctuation, '{'),
 (Token.Name.Tag, '"foo"'),
 (Token.Text, ' '),
 (Token.Punctuation, ':'),
 (Token.Text, ' '),
 (Token.Literal.String.Double, '"bar"'),
 (Token.Punctuation, '}'),
 (Token.Text, '\n')]

I believe this was introduced in b4f0583. Now that whitespace is emitted as a Whitespace token instead of Text, the `_token is Text` check no longer matches it, so the lexer hits the fallback case that produces an Error token:

elif character == ':':
    # Yield from the queue. Replace string token types.
    for _start, _token, _text in queue:
        if _token is Text:
            yield _start, _token, _text
        elif _token is String.Double:
            yield _start, Name.Tag, _text
        else:
            yield _start, Error, _text
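
One possible adjustment, sketched in isolation: let Whitespace tokens pass through the same branch as Text. This is only an illustration of the idea and not necessarily how it should be fixed upstream. `flush_key_queue` is a hypothetical standalone helper (the real logic lives inside the lexer), and the sample queue is an assumption about what gets queued for `"foo" ` before the `:` is reached.

```python
from pygments.token import Error, Name, String, Text, Whitespace


def flush_key_queue(queue):
    """Re-emit queued tokens after a ':' marks the queued string as an object key.

    `flush_key_queue` is a hypothetical standalone helper, not a Pygments API.
    """
    for start, token, text in queue:
        if token is Text or token is Whitespace:
            # Whitespace between the key and ':' passes through unchanged.
            yield start, token, text
        elif token is String.Double:
            # A double-quoted string before ':' is an object key.
            yield start, Name.Tag, text
        else:
            # Anything else in the queue is unexpected.
            yield start, Error, text


# Assumed queue contents for '"foo" ' just before the ':' is seen.
queue = [(1, String.Double, '"foo"'), (6, Whitespace, ' ')]
retagged = list(flush_key_queue(queue))
```

With this change the space before `:` stays a Whitespace token and the key is still retagged as `Name.Tag`.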

I noticed this through Sphinx, where any JSON code block that has whitespace around : produces this warning:

myfile.rst:Could not lex literal_block as "json". Highlighting skipped.
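
To check a JSON snippet outside of Sphinx, a small helper like the following can list whatever the lexer tags as an error. It is a rough sketch: `error_tokens` is a name made up for this example, and it only uses `JsonLexer.get_tokens` from the repro above plus `pygments.token.Error`.

```python
from pygments.lexers.data import JsonLexer
from pygments.token import Error


def error_tokens(source):
    """Return the text of every Error token the JSON lexer emits for `source`."""
    return [text for token, text in JsonLexer().get_tokens(source) if token is Error]


# No whitespace around ':' versus whitespace on both sides.
compact = error_tokens('{"foo":"bar"}')
spaced = error_tokens('{"foo" : "bar"}')
```

On an affected version, `spaced` should contain the stray space shown in the current-behavior output above, while `compact` stays empty.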
