Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std/json fails to escape most non-printables, breaking generation and parsing #10541

Closed
niv opened this issue Feb 3, 2019 · 4 comments

Comments

Projects
None yet
4 participants
@niv
Copy link
Contributor

commented Feb 3, 2019

  • The stdlib json library fails to escape all high bytes and apparently also vertical tab. This results in raw bytes slipping into the generated json, breaking the format and any attempt at parsing it.

  • The generated unicode escape sequences "\uXXXX" are generated from ordinal values, but the spec requires them to be hexadecimal, resulting in the wrong character being written out. See: https://www.ecma-international.org/ecma-262/6.0/#sec-source-text

Example

import json

var ary = newJArray()
for i in 0..255: ary.add(%($i & "=" & $i.chr))
let first = ary.pretty
let second = first.parseJson.pretty
echo first
echo second
doAssert(first == second)

Current Output

  "10=\n",
  "11=
      ",
  "12=\f",
..
  "31=\u0031",
  "32= ",
..
  "127=�",
  "128=�",
..

Expected Output

  "10=\n",
  "11=\v",
  "12=\f",
..
  "31=\u001F",
  "32= ",
..
  "127=\u007F,
  "128=\u0080",
..

Possible Solution

https://github.com/nim-lang/Nim/blob/master/lib/pure/json.nim#L607

This proc (and the accompanying parser) should probably be rewritten according to spec.

Additional Information

Discovered here: niv/neverwinter.nim#4

Tested on 19.2 and latest #head.

@LemonBoy LemonBoy added the Stdlib label Feb 3, 2019

@Araq

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

Wasn't this fixed and backported a couple of weeks ago?

@niv

This comment has been minimized.

Copy link
Contributor Author

commented Feb 4, 2019

We'll be retesting on .4 and I will report back here.

@narimiran

This comment has been minimized.

Copy link
Member

commented Feb 9, 2019

Wasn't this fixed and backported a couple of weeks ago?

Yes, here is the PR: #10437 — that solves one point of this issue ("The generated unicode escape sequences "\uXXXX" are generated from ordinal values, but the spec requires them to be hexadecimal")

The remaining points are:

  • no vertical tabulator (\v) at position 11
  • positions 128 .. 255

narimiran added a commit to narimiran/Nim that referenced this issue Apr 9, 2019

narimiran added a commit to narimiran/Nim that referenced this issue Apr 10, 2019

@narimiran

This comment has been minimized.

Copy link
Member

commented Apr 10, 2019

@niv #10987 should fix the problem with \v

I tried to also fix the other problem (positions 128 .. 255) based on your example, but my fix only caused other problems so I reverted it.
Note that parsing unicode characters works ok, e.g. change a line in your example to ary.add(%($i & "=" & toUTF8(Rune(i)))) and you'll see the correct characters are produces, and first == second.

@Araq Araq closed this in 2608bc3 Apr 10, 2019

narimiran added a commit that referenced this issue May 7, 2019

json: add '\v' support, fixes #10541 (#10987)
(cherry picked from commit 2608bc3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.