Explain the difference with JCS / draft-rundgren-json-canonicalization-scheme-06 #5

Open
matrey opened this issue Jun 24, 2019 · 2 comments


matrey commented Jun 24, 2019

I stumbled upon this:
https://cyberphone.github.io/ietf-json-canon/
https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-06

Your work is listed there, with the remark "In contrast to JCS which is a serialization scheme, the listed efforts build on text level JSON to JSON transformations".

Appendix H. Other JSON Canonicalization Efforts

There are (and have been) other efforts creating "Canonical JSON".
Below is a list of URLs to some of them:

o https://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00 [7]

o https://gibson042.github.io/canonicaljson-spec/ [8]

o http://wiki.laptop.org/go/Canonical_JSON [9]

In contrast to JCS which is a serialization scheme, the listed
efforts build on text level JSON to JSON transformations.

Could you explain in layman's terms what the actual differences are, if any?
And which spec or library would you recommend for which use case?

@simon-greatrix

A canonical JSON form should give everything that JSON can represent exactly one representation. Strings and numbers are complicated beasts when you dig down into their fine details, and the various proposals differ in how they handle them.

http://wiki.laptop.org/go/Canonical_JSON
This avoids the complexity of floating-point numbers by banning them completely. Because JSON allows floating-point numbers, it fails to be a canonical format for all of JSON.

https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-06
This explicitly forbids "lone surrogates" in strings, which the JSON standard clearly allows. Hence it also fails to be a canonical format for all of JSON. To explain: UTF-16 represents characters outside the Basic Multilingual Plane (such as emoji) with pairs of code units. The pair must consist of a high surrogate followed by a low surrogate. A surrogate outside of a pair is invalid Unicode, but it is valid JSON.

In my opinion, there is no good reason to generate invalid Unicode. If you want to send binary data, then Base64 produces a more compact representation than UTF-8. If you want to use surrogates as markers, then you should be using the private use characters instead. However, "no good reason" does not allow us to ignore the standard, which says invalid Unicode is allowed in JSON.
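To make the lone-surrogate point concrete, here is a minimal Python sketch. It assumes only the standard `json` module; the helper name `is_valid_unicode` is made up for illustration. It shows a JSON parser accepting an escaped lone surrogate, while the resulting string cannot be encoded as UTF-8.

```python
import json

# A JSON text whose string is an escaped high surrogate with no low surrogate after it.
lone_text = '"\\ud83d"'
# The same high surrogate followed by a matching low surrogate (a valid pair).
pair_text = '"\\ud83d\\ude00"'

# Python's json module accepts both, because the JSON grammar permits any \uXXXX escape.
lone_value = json.loads(lone_text)
pair_value = json.loads(pair_text)


def is_valid_unicode(s: str) -> bool:
    """Hypothetical helper: True if s has no lone surrogates, i.e. it encodes as UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


lone_ok = is_valid_unicode(lone_value)  # the lone surrogate cannot be encoded
pair_ok = is_valid_unicode(pair_value)  # the pair decodes to one code point and encodes fine
```

A canonicalization scheme therefore has to choose: reject such input (as the I-JSON profile does) or define a unique escaped form for it.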

Additionally, this proposal requires a complicated method for representing floating-point numbers. Complexity leads to inaccurate implementations, and to fragile applications built upon them.
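As a rough illustration of why numbers need a rule at all, the Python sketch below parses several JSON spellings of the same value and then prints a float in shortest round-trip form. Note the assumption: Python's `repr` is used here only as a stand-in. The draft specifies ECMAScript number serialization, whose output differs in places (for example, `1.0` serializes as `1` in ECMAScript but as `1.0` in Python).

```python
import json

# Several distinct JSON texts that all denote the same IEEE 754 double.
spellings = ["1.0", "1.00", "1e0", "10e-1", "0.1e1"]
values = {json.loads(s) for s in spellings}

# A canonical form must pick exactly one spelling per value. Shortest
# round-trip output is one such choice: the fewest digits that still
# parse back to the identical double.
x = 0.1 + 0.2
shortest = repr(x)
round_trips = float(shortest) == x
```

The hard part is producing the shortest round-trip digits identically on every platform, which is the complexity the comment above refers to.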

In contrast to the above, the canonical representation described in this project allows all valid JSON to have a valid and unique representation.

@cyberphone

@simon-greatrix @matrey None of the proposals are "perfect", for the simple reason that JSON was not designed to support canonicalization. That no such proposal has become a standard (real or de facto) seems to say the same thing.

Textual canonicalization like this scheme is simple and covers the entire JSON specification, but it has downsides when it comes to integration in JSON tools. My take on this topic (draft-rundgren-json-canonicalization-scheme) "cripples" JSON to the I-JSON level but is easier to integrate, since it can ultimately be implemented entirely inside a JSON serializer. Number serialization is indeed a tricky problem, but other people have done an awesome job in this area, so I'm not particularly worried about that anymore. Five different platforms currently produce identical output on a set of 100 million test values.
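The "part of a JSON serializer" idea can be sketched in a few lines of Python. This is an approximation under stated assumptions, not a conforming implementation of the draft: the function name `canonicalize_sketch` is made up, Python sorts keys by code point rather than by UTF-16 code unit, and Python's float formatting differs from the ECMAScript rules the draft requires.

```python
import json


def canonicalize_sketch(value) -> bytes:
    """Serialize parsed JSON data deterministically (illustrative only)."""
    text = json.dumps(
        value,
        sort_keys=True,          # fixed property order
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit characters directly instead of \u escapes
    )
    return text.encode("utf-8")


# Two texts that differ in whitespace and property order...
first = canonicalize_sketch(json.loads('{"b": 1, "a": [1, 2]}'))
second = canonicalize_sketch(json.loads('{ "a":[1,2],\n  "b":1 }'))
# ...serialize to the same bytes, so a hash or signature over them matches.
```

Because the work happens on parsed data at serialization time, the sender and the verifier each need only a parser and a deterministic serializer.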

The only real problem I have stumbled upon is described here: https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-13#appendix-E
That is, some canonicalization issues spill over to the application side as well, which you of course do not want. OTOH, the remedy isn't rocket science, and the current alternative (dressing messages in Base64Url) is at least as intrusive on applications.
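For comparison, the Base64Url alternative can be sketched as follows. The helper names are hypothetical, and dropping the `=` padding is an assumption borrowed from common practice in JOSE-style formats. The exact bytes survive transport, but the message is no longer readable JSON.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded Base64Url text (hypothetical helper)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode unpadded Base64Url text, restoring the padding first."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


message = b'{"amount": 100, "currency": "EUR"}'
wrapped = b64url_encode(message)

# The exact bytes come back, so a signature over them stays valid, but a
# recipient must unwrap the payload before it can read any of the JSON.
unwrapped = b64url_decode(wrapped)
```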

It is possible that Base64 is the final solution, but there are folks in the financial sector who are less keen on that. In fact, none of the Open Banking APIs use anything but clear-text JSON. They do not use canonicalization, however; they instead bind the signature to the HTTP body and put detached signature data in HTTP headers. Although this works, it greatly complicates serialization, embedding and countersigning.
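A small Python sketch of the body-binding approach shows where the fragility comes from. Everything here is an assumption for illustration: HMAC-SHA256 stands in for whatever signature algorithm a real API uses, and the key and function names are made up.

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # hypothetical shared key, for illustration only


def sign_body(body: bytes, key: bytes = SECRET) -> str:
    """Compute a detached signature over the exact body bytes."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_body(body: bytes, signature: str, key: bytes = SECRET) -> bool:
    """Check a detached signature against the body bytes as received."""
    return hmac.compare_digest(sign_body(body, key), signature)


body = b'{"amount": 100, "currency": "EUR"}'
signature = sign_body(body)  # would travel separately, e.g. in an HTTP header

# Verification succeeds only while the body stays byte-for-byte identical.
ok_untouched = verify_body(body, signature)

# Parsing and re-serializing keeps the data but changes the bytes.
reserialized = json.dumps(json.loads(body), separators=(",", ":")).encode("utf-8")
ok_reserialized = verify_body(reserialized, signature)
```

Since the signature covers raw bytes rather than data, the signed message cannot be parsed, stored, embedded in another JSON document, and re-emitted without keeping the original bytes alongside it.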
