Issues with .properties format using whitespace delimited key #2224

Open
mjustin opened this issue Sep 2, 2022 · 0 comments

While looking over the Wikipedia example for .properties files, I noticed some cases that Pygments doesn't highlight correctly. I've copied the example into a Pygments demo in case the Wikipedia page changes. This ticket covers the issues I noticed with key parsing when the key is delimited from the value by whitespace.

Per the Java Properties documentation:

[…] In addition to line terminators, this format considers the characters space (' ', '\u0020'), tab ('\t', '\u0009'), and form feed ('\f', '\u000C') to be white space.

[…]

The key contains all of the characters in the line starting with the first non-white space character and up to, but not including, the first unescaped '=', ':', or white space character other than a line terminator. All of these key termination characters may be included in the key by escaping them with a preceding backslash character; for example,

\:\=

would be the two-character key ":=". Line terminator characters can be included using \r and \n escape sequences. Any white space after the key is skipped; if the first non-white space character after the key is '=' or ':', then it is ignored and any white space characters after it are also skipped. All remaining characters on the line become part of the associated element string; if there are no remaining characters, the element is the empty string "". Once the raw character sequences constituting the key and element are identified, escape processing is performed as described above.
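To make the quoted rule concrete, here is a minimal Python sketch of the key/element split it describes. The function name `split_property_line` and the constants are hypothetical, not part of Pygments or of Java. The sketch assumes a single logical line with no line terminator, and it does not handle comments, line continuations, or the escape processing that happens after the split.

```python
from typing import Tuple

# White space characters named in the quoted Java documentation.
WHITESPACE = " \t\f"
# Characters that may separate a key from its element.
SEPARATORS = "=:"


def split_property_line(line: str) -> Tuple[str, str]:
    """Split one logical line into (raw_key, raw_element).

    Hypothetical helper: escapes are left in place, as in the raw
    character sequences the documentation refers to.
    """
    n = len(line)
    i = 0

    # Skip white space before the key.
    while i < n and line[i] in WHITESPACE:
        i += 1
    key_start = i

    # The key runs up to the first unescaped '=', ':', or white space.
    while i < n:
        ch = line[i]
        if ch == "\\":
            i += 2  # a backslash escapes the character that follows it
            continue
        if ch in WHITESPACE or ch in SEPARATORS:
            break
        i += 1
    key_end = min(i, n)
    i = key_end

    # Skip white space after the key.
    while i < n and line[i] in WHITESPACE:
        i += 1

    # One optional '=' or ':' is ignored, along with white space after it.
    if i < n and line[i] in SEPARATORS:
        i += 1
        while i < n and line[i] in WHITESPACE:
            i += 1

    # Everything that remains is the element, possibly the empty string.
    return line[key_start:key_end], line[i:]
```

Under these assumptions, `split_property_line("key the=value")` yields the key `key` and the element `the=value`, because only the first unescaped terminator ends the key.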

Pygments correctly highlights lines with a single space or single tab between the key and value, so long as the value doesn't contain a space or tab:

key value
key	value

On the other hand, Pygments does not correctly parse lines with multiple spaces or tabs between the key and value, or lines with values containing spaces:

key  value
key		value
key the value
key the:value
key the=value

For all these lines, "key" should be green indicating a key, and the text following the whitespace ("value", "the value", "the:value", or "the=value") should be in red indicating a value.

Additionally, a line consisting of a key followed by just whitespace should be highlighted as a key with no value, the same as if it were a key on a line by itself (the lines "key" and "key " should both have the word "key" highlighted).
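The expected highlighting for the lines above can be written down as a single regular expression of the kind a regex-based lexer rule might use. This is only a sketch of the expected classification, not the pattern Pygments uses. The names `PROPERTY_LINE` and `classify` and the labels `key`, `separator`, and `value` are hypothetical stand-ins for real token types, and each input is assumed to be one line without a terminator.

```python
import re

# Hypothetical pattern: key, then white space with an optional '=' or ':',
# then the rest of the line as the value.
PROPERTY_LINE = re.compile(
    r"""
    [\ \t\f]*                                   # white space before the key
    (?P<key>(?:\\.|[^\\\ \t\f=:])*)             # escaped pairs or plain key characters
    (?P<separator>[\ \t\f]*[=:]?[\ \t\f]*)      # white space around an optional '=' or ':'
    (?P<value>.*)                               # everything else on the line
    """,
    re.VERBOSE,
)


def classify(line):
    """Return (label, text) pairs for the non-empty parts of one line."""
    match = PROPERTY_LINE.fullmatch(line)
    if match is None:
        # Only happens if the line contains a line terminator.
        raise ValueError("expected a single line without a terminator")
    return [
        (label, match.group(label))
        for label in ("key", "separator", "value")
        if match.group(label)
    ]
```

With this sketch, `classify("key")` and `classify("key ")` both report `key` as a key and no value, which matches the behavior requested for a key followed only by whitespace.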

Demo screenshot: (image not reproduced)

jmzambon added a commit to jmzambon/pygments that referenced this issue Sep 18, 2022
…limited key

Added:
- support for space delimiter in every case, including multiline values
- check for odd number of backslash escapes
- "!" as comment start
- support for escape of spaces and separators
Dropped:
- undocumented ";" and "//" comment start