Parser accepts control chars 0x00 and 0x1F #2909

Closed
torsten-schenk opened this issue Sep 25, 2023 · 4 comments
@torsten-schenk

In the file jv_parse.c, I came across the following line (currently line 497):

if (c >0 && c < 0x001f) // reject character

I also tested it with an actual .json file containing strings with an unescaped 0x00 or 0x1f character; in both cases the strings were accepted.

@wader
Member

wader commented Sep 25, 2023

Hi, I get an error for 0x00, but 0x1f is accepted. What version are you using, and can you elaborate a bit more on what behaviour you're expecting?

$ jq --version
jq-1.7

$ echo -ne '"\x00hello"' | jq .
jq: parse error: Unfinished string at EOF at line 1, column 1

$ echo -ne '"\x1fhello"' | jq .
"\u001fhello"

$ echo -ne '"\x10hello"' | jq .
jq: parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 1, column 8

https://datatracker.ietf.org/doc/html/rfc8259#section-7

All Unicode characters may be placed within the
quotation marks, except for the characters that MUST be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

So should the reject condition actually be if (c >= 0 && c <= 0x001f)?
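To make the boundary behaviour concrete, here is a minimal sketch that puts the quoted condition next to the proposed one. The function names `rejects_quoted` and `rejects_proposed` are hypothetical and exist only for this illustration, and treating `c` as a plain `int` is an assumption; this is not the actual jq code.

```c
#include <stdbool.h>

/* Condition as quoted from jv_parse.c in this issue.
 * Both comparisons are strict, so 0x00 and 0x1f fall outside the range. */
static bool rejects_quoted(int c) {
    return c > 0 && c < 0x001f;
}

/* Condition proposed in this thread.
 * Both comparisons are inclusive, matching "U+0000 through U+001F". */
static bool rejects_proposed(int c) {
    return c >= 0 && c <= 0x001f;
}
```

With these definitions, `rejects_quoted(0x10)` is true while `rejects_quoted(0x00)` and `rejects_quoted(0x1f)` are false; `rejects_proposed` is true for all three and false for `0x20`.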

@torsten-schenk
Author

torsten-schenk commented Sep 25, 2023

No problem. I attached three .json-files, each containing a control character in the string.

If character 00 or 1f (test_00.txt and test_1f.txt) is in the string, the parser accepts the string and escapes the character when printing it.

If character 01 (test_01.txt) or any other control character is in the string, the parser rejects the string with the message "jq: parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 1, column 4"

The expected behaviour is that the parser rejects all control characters, so this error message should be printed for all three attached files.
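As a rough illustration of that expectation, the sketch below scans the raw bytes between a string's quotation marks and reports the first unescaped control byte. The helper name `first_control_byte` is hypothetical and this is not how jq's parser is structured; it only assumes UTF-8 input, where every byte at or below 0x1f is an ASCII control character. The length is passed explicitly so that a 0x00 byte is examined like any other instead of ending the scan.

```c
#include <stddef.h>

/* Returns the index of the first byte in s[0..n) that lies in the range
 * 0x00..0x1f (inclusive), or -1 if there is none. */
static long first_control_byte(const unsigned char *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (s[i] <= 0x1f) {
            return (long)i;
        }
    }
    return -1;
}
```

Under this check, string contents starting with 0x00, 0x01, or 0x1f would all be flagged at index 0, which corresponds to the same error being reported for all three test files.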

You actually replicated this behaviour in your second test: echo -ne '"\x1fhello"' | jq . should have been rejected in the same way as echo -ne '"\x10hello"' | jq .

So should the reject condition actually be if (c >= 0 && c <= 0x001f)?

Yes, that would solve the issue.

test_00.txt
test_01.txt
test_1f.txt

@wader
Member

wader commented Sep 25, 2023

@nicowilliams was there some reason 0x00 and 0x1f were excluded from the check? Did they used to be handled somewhere else, etc.?

@nicowilliams
Contributor

@nicowilliams was there some reason 0x00 and 0x1f were excluded from the check? Did they used to be handled somewhere else, etc.?

Nope, just an off-by-one bug.
