Concise encoding for JSON Schema #259

mkovatsc · 2017-02-24T11:43:07Z

JSON Schema is highly relevant for machine-to-machine communication. The Internet of Things is expected to encompass a large share of resource-constrained devices. For semantic interoperability, they also require access to technologies such as JSON Schema.

Thus, the authors should also look into concise encodings for JSON Schema, for instance CBOR or EXI. CBOR is often used in conjunction with tags and numeric identifiers from IANA registries to overcome the issues with string identifiers. For details, the overall system architecture needs to be checked (how many and how often are Schemas shared, how are they stored, etc.)
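To make the size argument concrete, here is a minimal sketch of the idea. It assumes a hypothetical table that maps a few schema keywords to small integers (no such registry exists for JSON Schema today) and uses a hand-written encoder for a small subset of CBOR (integers, text strings, arrays, maps, booleans, null; no floats or tags). It only compares sizes and is not a proposed mapping.

```python
import json
import struct

# Hypothetical keyword-to-integer table, for illustration only.
KEYWORD_IDS = {"type": 1, "format": 2, "properties": 3}


def _head(major, n):
    """Encode a CBOR head: 3-bit major type plus unsigned argument n."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                              (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | extra]) + struct.pack(fmt, n)
    raise ValueError("argument too large for this sketch")


def cbor_encode(value):
    """Encode a small subset of Python values as CBOR."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):  # must precede the int check
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in value.items())
        return _head(5, len(value)) + body
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def compact(schema):
    """Replace known keywords with integers; property names are left alone."""
    out = {}
    for key, value in schema.items():
        if key == "properties":
            value = {name: compact(sub) for name, sub in value.items()}
        out[KEYWORD_IDS.get(key, key)] = value
    return out


schema = {
    "type": "object",
    "properties": {"currentTime": {"type": "string", "format": "date-time"}},
}
json_bytes = json.dumps(schema, separators=(",", ":")).encode("utf-8")
cbor_plain = cbor_encode(schema)
cbor_compact = cbor_encode(compact(schema))
print(len(json_bytes), len(cbor_plain), len(cbor_compact))
```

Most of the saving in the last variant comes from the keyword names, which is why a shared numeric identifier table would matter more than the choice of binary container.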

handrews · 2017-02-27T22:24:16Z

We also have a note to investigate EXI: #13

> For details, the overall system architecture needs to be checked (how many and how often are Schemas shared, how are they stored, etc.)

We definitely expect that schemas are often pre-packaged and loaded from a local store indexed by the URI rather than being fetched from the URI. Fully dynamic systems will want to have clients download schemas as needed, but the spec says that schemas SHOULD be cacheable for long periods of time (I would argue indefinitely; if you need to change the schema, it should get a new URI. But that's just me, and I'm sure someone will want to use a "latest" URI for schemas):

> HTTP servers SHOULD set long-lived caching headers on JSON Schemas. HTTP clients SHOULD observe caching headers and not re-request documents within their freshness period. Distributed systems SHOULD make use of a shared cache and/or caching proxy.
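A rough sketch of the "local store indexed by the URI" pattern, for illustration only. The class name, the `fetch` callback, and the example URIs are all hypothetical, and the sketch treats a schema behind a given URI as immutable, so it ignores HTTP freshness periods entirely.

```python
from typing import Callable, Dict, Optional


class SchemaStore:
    """Resolve schemas by URI from a bundled store, fetching only on a miss."""

    def __init__(self, bundled: Dict[str, dict],
                 fetch: Optional[Callable[[str], dict]] = None):
        self._schemas = dict(bundled)   # pre-packaged schemas, keyed by URI
        self._fetch = fetch             # optional network fallback
        self.fetch_count = 0

    def get(self, uri: str) -> dict:
        if uri not in self._schemas:
            if self._fetch is None:
                raise KeyError(f"schema not bundled and no fetcher set: {uri}")
            self._schemas[uri] = self._fetch(uri)  # cached indefinitely
            self.fetch_count += 1
        return self._schemas[uri]


# Stand-in for an HTTP GET; a real client would also honor caching headers.
def fake_fetch(uri: str) -> dict:
    return {"$id": uri, "type": "object"}


store = SchemaStore(
    bundled={"https://example.com/schemas/thing": {"type": "object"}},
    fetch=fake_fetch,
)
store.get("https://example.com/schemas/thing")   # served from the bundle
store.get("https://example.com/schemas/other")   # fetched once
store.get("https://example.com/schemas/other")   # served from the cache
print(store.fetch_count)
```

A constrained device would probably ship only the bundled part and leave `fetch` unset.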

danielpeintner · 2017-03-29T13:41:44Z

Note: I think that JSON Schema could be easily improved w.r.t. efficient representations by providing more information about the actual type of a value.

Let me give you an example:

        "currentTime": {
            "type": "string",
            "format": "date-time"
        }

This snippet is fine and allows, for example, EXI4JSON to pick the right codec (in this case dateTime).
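As a rough illustration of that selection step, here is a minimal sketch of how an encoder might choose a codec from "type" and "format". The function and the codec names other than dateTime are made up for this example and are not EXI4JSON's actual API.

```python
# Hypothetical lookup tables; codec names are illustrative only.
FORMAT_CODECS = {"date-time": "dateTime"}
TYPE_CODECS = {
    "string": "string",
    "integer": "integer",
    "number": "float",
    "boolean": "boolean",
}


def pick_codec(schema: dict) -> str:
    """Pick a codec name for a value described by a (sub)schema."""
    type_ = schema.get("type")
    if not isinstance(type_, str):
        # Missing "type" or a list of types: fall back to a generic codec.
        return "string"
    if type_ == "string" and schema.get("format") in FORMAT_CODECS:
        return FORMAT_CODECS[schema["format"]]
    return TYPE_CODECS.get(type_, "string")


print(pick_codec({"type": "string", "format": "date-time"}))
print(pick_codec({"type": "string"}))
```

Without a "format" hint, the encoder has nothing better than the generic string codec, which is the gap for binary data.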

A similar format identifier could also be defined in http://json-schema.org/latest/json-schema-validation.html#rfc.section.7.3, for example for binary data. Doing so would allow efficient representations to easily detect binary data and store it much more efficiently.

Besides "date-time" as a whole, "time" or "date" might also be of interest; these seem to have been removed...

Yet another aspect could be regular expressions.

I hope this input is helpful. Thanks!

handrews · 2017-04-30T19:07:02Z

This seems relevant:
https://github.com/quartzjer/JSCN

JSON Constrained Notation for encoding into CBOR. Doesn't look like he's published his first I-D yet but the repo is very active.

handrews · 2017-04-30T19:11:26Z

@danielpeintner see #199 for "date" and "time". As for "regex" as a format, @Julian and @awwright were discussing the possibility that it was removed by accident at some point so feel free to file an issue about that if you'd like to track it.

danielpeintner · 2017-05-02T11:52:36Z

@handrews I agree with what has been said in #199 ...

Once you get back to the discussion of which formats should be in and which not, I would argue for having one format to describe binary data in JSON.
That said, I also think the list of available formats should not be very long.

handrews · 2017-05-02T14:50:16Z

@danielpeintner JSON Hyper-Schema has a feature for encoded binary data:
https://tools.ietf.org/html/draft-wright-json-schema-hyperschema-01#section-5.3

Would that work for you? I know that WoT's Thing Description is not using Hyper-Schema, but would the feature handle the case you have in mind?

danielpeintner · 2017-05-03T07:41:04Z

Yes, the JSON Hyper-Schema feature would work!

It defines that a string should be interpreted as binary data and that's exactly what I was looking for.

handrews · 2017-08-18T18:31:25Z

@danielpeintner @mkovatsc Note that there is now a draft for JSON Constrained Notation.

I have not dug into it but does it cover some of what you need? While we can't officially reference a draft, if it is covering the topic I would rather allow it to be handled there. Should that stall, we can always pull it in if needed.

If JSCN is useful, what, if anything, is necessary for JSON Schema to say about it?

handrews · 2017-08-22T21:43:46Z

@danielpeintner I have opened a discussion on using "media" outside of Hyper-Schema at #363

handrews · 2017-08-22T21:48:34Z

@mkovatsc have you had a chance to look at JSCN (JSON Constrained Notation) to see how much it would help? If it is useful, then we can just focus on anything we need to do here (registering tags for schema keywords?). I don't have time to dig into this myself right now, but would be happy to see it move forward. If not now, then we can come back to it after publishing the next draft in late October.

mkovatsc · 2017-08-23T14:21:59Z

I had a look at JSCN and am not fully sure what to make of it. It seems to be more of a string compressor that exploits the fact that the string represents JSON, as it tries to preserve even things like whitespace.

In general, it should be clear that JSON's true, false, and null should be mapped to CBOR's true, false, and null respectively. Numbers should not lose precision, yes, but when consuming JSON-encoded data, nobody cares whether semantically equivalent notations are distinguished, such as 1 vs 1.0, or whether the exponent character is upper or lower case.

There are even things that will not work out such as "Ordering of key/value pairs in JSON objects and CBOR maps MUST be preserved." This does not even hold for JSON implementations by definition of objects/dictionaries.
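A small check of this point with Python's standard json module (just one implementation, picked for illustration): two documents that differ in whitespace, key order, number notation, and exponent case parse to equal values.

```python
import json

# Same data, different surface form: whitespace, key order, 1 vs 1.0, E vs e.
a = json.loads('{"x": 1.0, "ok": true, "y": 1E2}')
b = json.loads('{ "y": 1e2, "ok": true,\n  "x": 1 }')

# Object comparison ignores member order, and 1 == 1.0 numerically.
print(a == b)  # prints True
```

A consumer that only sees the parsed value cannot tell the two inputs apart, so a concise encoding has no obvious reason to preserve those differences.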

handrews · 2017-08-23T14:55:50Z

@mkovatsc thank you for taking the time to investigate that. I wasn't quite sure what to make of it either but have been too focused on the next hyper-schema draft to really dig into it and its relation to CBOR.

It sounds like we can decisively disregard JSCN for our purposes, meaning we should define the most efficient possible mapping into CBOR ourselves. I gather this would involve registering a set of tags?

I will not have time to work on this for the draft that is due in October. Do you or anyone else from WoT have time to make a proposal? If not, do you think it's worth poking whatever CBOR community forums exist to see if someone in the CoAP/CBOR world does?

handrews · 2017-10-25T19:58:20Z

@danielpeintner note that in the forthcoming draft we have moved the binary data media object over to the validation spec as contentMediaType and contentEncoding (as these names are in line with the rest of both validation and hyper-schema, while the media sub-object was a weird exception).
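For illustration, a minimal sketch of how a consumer might use these two keywords to recover the binary payload from a string instance. The helper name is hypothetical, only "base64" is handled, and treating a missing "contentEncoding" as plain UTF-8 text is an assumption of this sketch.

```python
import base64

schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "image/png",
}


def decode_content(schema: dict, value: str) -> bytes:
    """Return the raw bytes that a string instance carries."""
    encoding = schema.get("contentEncoding")
    if encoding is None:
        # Assumption: without an encoding, the string itself is the content.
        return value.encode("utf-8")
    if encoding == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(value, validate=True)
    raise ValueError(f"unsupported contentEncoding in this sketch: {encoding}")


instance = base64.b64encode(b"\x89PNG\r\n").decode("ascii")
raw = decode_content(schema, instance)
print(len(instance), len(raw))
```

With the raw bytes in hand, a binary representation could store them directly (for example as a CBOR byte string) instead of carrying the base64 text.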

danielpeintner · 2017-10-26T06:30:02Z

Thanks for the information!

epoberezkin · 2017-11-10T22:04:44Z

@handrews I was wondering why we needed "contentMediaType" and "contentEncoding" and not just some format(s)?

handrews · 2017-11-10T22:39:45Z

@epoberezkin I actually did think about that and came up with a good reason why not. Now if only I can remember what it was....

I know I considered just saying "any media type can be a format", but given that "format" is not currently media type-oriented that felt awkward. And you'd still need "contentEncoding" or something similar. The media type and the encoding are (mostly) orthogonal.

Also, media types are extensible (including both registered and unregistered media types), and so are formats, and having two different types of extensible things in the same keyword seemed like a bad idea.

I really feel like I had a stronger reason than either of those, but I'll have to think and see if I can remember it. As of right now, I'd say that these concepts seemed distinct and self-contained enough to be worth the other keywords.

Oh, one more thing was that I figured that it would simplify implementation requirements if the choice to support "format" validation and the choice to support "content*" validation were independent. With "format", we do give a starter, core set of formats, but with "content*" I feel like implementations would choose whichever media types and encodings seem most relevant to them, and trying to mandate a core set did not seem correct.

epoberezkin · 2017-11-11T14:26:06Z

> The media type and the encoding are (mostly) orthogonal.

Yes, but they could have been just different formats and you can apply two at the same time.

> I figured that it would simplify implementation requirements if the choice to support "format" validation and the choice to support "content*" validation were independent.

That makes sense.

By the way, I've used the formats "json" and "edn"; I guess they would fit one of these keywords.

mkovatsc mentioned this issue Feb 24, 2017

Investigate CBOR compatibility #6

Open

handrews added the core label Sep 28, 2017
