JSON Schema #28

Closed
0xced opened this issue May 27, 2011 · 12 comments

Comments

@0xced

0xced commented May 27, 2011

It would be nice to have JSON Schema validation.

@johnezang
Owner

I spent quite a bit of time digging through the RFCs and some of the sample implementations, and I didn't see anything that would require integrating a JSON Schema validator directly into a JSON serializing / deserializing library in order to do validation. And there just seems to be something... fundamentally wrong about adding schemas to JSON. :)

If you really need JSON Schema validation, I'm sure it wouldn't be hard to hack up something that uses -stringByEvaluatingJavaScriptFromString: and one of the JavaScript JSON Schema validators.

Ultimately, I don't think a JSON Schema validation system belongs inside JSONKit: JSONKit serializes and deserializes JSON at a very low level, whereas a JSON Schema validator checks the final, deserialized result. There's really nothing to integrate into JSONKit in order to do JSON Schema validation. It's something that's best done externally to JSONKit, which would also make it usable by all the Objective-C JSON libraries.
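A minimal sketch of that layering, written in Python for brevity rather than Objective-C. The validator `validate_properties` is hypothetical and models only two draft 03 keywords (per-property `type` and `required`); it is not a conforming JSON Schema implementation. Because it only looks at the already-decoded object, the same function works unchanged whichever decoder produced that object.

```python
import json
from decimal import Decimal

def _has_type(value, type_name):
    """Check a decoded value against a draft 03 simple type name (subset only)."""
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float, Decimal)) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "object":
        return isinstance(value, dict)
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "null":
        return value is None
    raise ValueError(f"unsupported type name: {type_name}")

def validate_properties(instance, schema):
    """Return a list of error messages for an already-decoded JSON object."""
    if not isinstance(instance, dict):
        return ["instance is not an object"]
    errors = []
    for name, subschema in schema.get("properties", {}).items():
        if name not in instance:
            if subschema.get("required", False):
                errors.append(f"missing required property {name!r}")
            continue
        if "type" in subschema and not _has_type(instance[name], subschema["type"]):
            errors.append(f"property {name!r} is not of type {subschema['type']}")
    return errors

SCHEMA = {
    "properties": {
        "name": {"type": "string", "required": True},
        "price": {"type": "number", "required": True},
    }
}

text = '{"name": "widget", "price": 9.99}'
# The validator runs after, and independently of, whichever decoder was used.
print(validate_properties(json.loads(text), SCHEMA))                       # []
print(validate_properties(json.loads(text, parse_float=Decimal), SCHEMA))  # []
print(validate_properties(json.loads('{"name": 42}'), SCHEMA))
```

Nothing in the validator depends on how the text was parsed, which is the sense in which it can sit outside any one JSON library.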

@0xced
Author

0xced commented May 28, 2011

Given the name JSONKit I thought it might do more than serialization / deserialization. ;)

@erichocean

"And there just seems to be something... fundamentally wrong about adding schemas to JSON. :)"

It's called "unit testing" and acts as an assertion that the JSON you expect is the JSON you got. There's nothing "fundamentally wrong" about unit testing.

@johnezang
Owner

Schema validation is not "unit testing", not even close. All schema validation tells you is whether a particular piece of (in this case) JSON conforms to the schema. And let's face it, the vast majority of schemas validate nothing more than whether a particular property exists and what its basic type is (string, number, etc.).

This is about as useful as knowing that the sun will rise tomorrow.

Then there's the problem that once the schema is defined, it effectively freezes the format. Want to add a property / field? Now everyone using the old schema rejects the updated format. What do you do then? This happens all the time, and any solution that gets in the way of this "corner case" is a failure.
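A minimal Python sketch of that freeze, assuming a version 1 schema that sets draft 03's `additionalProperties` to `false` (the default is a permissive empty schema, so this is the strict case). The helper `unknown_properties` and the sample documents are hypothetical, and only that one keyword is modeled.

```python
import json

# Hypothetical version 1 schema that forbids any property it does not list.
SCHEMA_V1 = {
    "properties": {
        "id": {"type": "number", "required": True},
        "name": {"type": "string", "required": True},
    },
    "additionalProperties": False,
}

def unknown_properties(instance, schema):
    """Return, sorted, the property names rejected by "additionalProperties": false.

    Only that one keyword is modeled; an absent additionalProperties is treated
    as permissive, matching the draft 03 default of an empty schema.
    """
    if schema.get("additionalProperties", {}) is not False:
        return []
    allowed = set(schema.get("properties", {}))
    return sorted(set(instance) - allowed)

v1_doc = json.loads('{"id": 1, "name": "widget"}')
# A producer upgraded to format v2 adds one optional field...
v2_doc = json.loads('{"id": 1, "name": "widget", "color": "red"}')

print(unknown_properties(v1_doc, SCHEMA_V1))  # []
# ...and every consumer still validating against v1 now rejects it.
print(unknown_properties(v2_doc, SCHEMA_V1))  # ['color']
```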

And "unit testing" implies that you are doing some kind of testing and validation in a sandbox, on code that isn't released and on data that isn't live or representative of what the released code is likely to encounter once out in the field. So the only time schema validation makes any sense whatsoever is if you run it on every single piece of JSON you get, every single time.

The latter is really "input validation", and it's something that schema validation doesn't do and can't do. Schema validation is a small subset of what's needed for proper input validation. Mistaking schema validation for proper input validation is probably a bigger problem than not doing any schema validation at all, because it gives programmers who don't know any better a false sense of security.

JSON Schemas cannot perform Consistency and Integrity checks and validations in the ACID sense of the words. Both of those are best performed by a proper ACID-compliant backing store, an SQL database for example. Trying to duplicate C/I-like verification at the JSON level is just plain stupid: it's fundamentally impossible, and in the real world it's likely to be completely redundant, since an ACID-compliant backing store will have to do the exact same work anyway.

If you feel that you need all these schema validators and fancy bells and whistles, use XML. If you need all the bloated extras that XML provides but chose JSON, then the problem isn't that JSON doesn't provide those things; it's that you chose the wrong data exchange format.

JSON, the fat free alternative to XML.

JSON is a bit like Objective-C's duck-typing. It does a lot more by saying and requiring a lot less.

@erichocean

I'm sorry, for a software project, you are seriously amped up over nothing. We're professionals here, not children.

@johnezang
Owner

I assumed everyone was a professional; reasonable people can have different opinions about something. Whether an opinion is compelling depends on the strength of its merits and reasoning. You stated:

"And there just seems to be something... fundamentally wrong about adding schemas to JSON. :)"

It's called "unit testing" and acts as an assertion that the JSON you expect is the JSON you got. There's nothing "fundamentally wrong" about unit testing.

I simply stated the reasons why I believe your statement is flawed, along with some of the rationale for why it is nonsensical to do schema validation at the JSON level. This isn't "amped up", it's called "thinking" and "objective reasoning".

Now, if you want to build a system / protocol / specification that incorporates some form of schema validation but uses JSON as its data interchange basis... that's something different. JSON is just a way to transport / serialize data. As an analogy with TCP/IP: JSON is to IP as JSON Schema validation is to TCP. One is layered on top of the other. This is why I believe that JSON Schema validation does not belong inside JSONKit proper.

The only way it would make sense to add JSON Schema validation to JSONKit is if JSON Schema validation were mandatory for parsing JSON. That's the only time it would make sense to me to deeply and tightly integrate schema validation directly into the JSONKit parser. IMHO, this would make JSONKit not only much slower, but much less useful. Although I cannot directly speak for the creators of JSON, it is my opinion that the type of complexity and bloat that JSON Schema adds is _exactly_ the type of thing the creators of JSON wanted to avoid.

Now, I'll happily listen to any compelling reasons why I should spend a non-trivial amount of my uncompensated time adding JSON Schema validation to JSONKit. If you can come up with a compelling argument, I will reconsider my position. But trying to reframe the argument as one where I'm treating you like a child isn't one of them.

Now, the OP asked that a feature be added to JSONKit, and it was a perfectly reasonable request. Although I am clearly biased in this particular case, I did spend several hours reviewing the JSON Schema RFC and some of the JSON Schema validator implementations out there. As far as I know, there is no pure Objective-C implementation for doing JSON Schema validation, either as an external add-on to a JSON library or directly integrated into an Objective-C JSON parser. Even though I was initially biased against it, I did genuinely consider it. People use JSONKit, and even though I might not need JSON Schema validation, others might. But based on what I saw, very few people were actively using JSON Schemas. These facts, together with my estimate of how much effort the feature would take weighed against the value that effort would add, led me to conclude that it simply wasn't worth my time. My time is limited, there are things I would like to add to JSONKit, and this feature request just doesn't make the cut.

Regardless of that opinion, based on what I had read while researching JSON Schema validation, I could at least suggest a possible workaround. The JSON Schema page the OP linked to listed a number of JSON Schema implementations, four or five of which were written in JavaScript. So I suggested that, using one of these JavaScript JSON Schema validators, they could probably hack something together fairly quickly with -stringByEvaluatingJavaScriptFromString:. I might not be willing to add the feature, but I could at least suggest a way to accomplish it with a few hours of work if it was a really important need for whatever they were doing.

@grgcombs

Well put. I personally have no need for schemas, as described, but if I did I would expect to use a distinct product for that. I like the fact that JSONKit is low level, extremely fast, and simple. Anything else might interfere with those three things.

@omarkilani

IMHO JSON Schema support doesn't belong in JSONKit, which aims to be fast, simple, etc., etc.

Plus, if John doesn't want to add it, then that's pretty much that. :)

@edog1203

Hear, hear.

@johnezang
Owner

I had some additional thoughts on this, specifically regarding the JSON Schema (draft 03) specification. Since I am the JSONKit project owner, and so to some degree in the position of "benevolent dictator", I thought it would be worthwhile to expand on my reasoning for why I don't think this is a good fit for JSONKit (or really JSON in general).

  • 5.2 properties

    The draft RFC does not specify any additional requirements for properties beyond those in RFC 4627. In fact, JSON Schemas are themselves specified in JSON. As the JSONKit README.md file explains, JSON is required to be encoded in "Unicode". However, Unicode is a complicated standard: it is possible to represent the same "string" (used here loosely) in multiple ways, and determining whether two strings compare equal is a complicated, non-trivial process. Therefore, there is some concern, at least on my part, as to whether one can even unambiguously represent which properties are permitted in JSON using JSON Schema (draft 03).

    Granted, this may be a nit-picking issue to some, as many of the problems deal with unusual corner cases, particularly corner cases involving the treatment of some of the more advanced features available in Unicode and Unicode normalization issues, but they are still problems which one may be able to exploit.

    One potentially big problem that comes to mind is a JSON Schema that allows "additional properties" that aren't explicitly spelled out in the schema. One could craft a piece of JSON that exploits the fact that different pieces of the JSON and JSON Schema stack may handle Unicode issues differently. The JSON Schema validator may be programmed to treat properties as equal only when they are exact byte sequences, without any additional Unicode processing or semantics taken into account. However, the JSON part of the stack, and possibly something even lower level (think of how NSString's encapsulation hides the way it deals with Unicode issues), may do things differently. For example, if NSString were to perform Unicode NFKC normalization, a malicious piece of JSON could pass JSON Schema validation, after which NFKC normalization transforms the malicious property into a property specified by the JSON Schema, but carrying a value that the JSON Schema forbids.

    In other words, as near as I can tell, the JSON Schema (draft 03) does nothing to prevent these types of problems. In fact, because JSON Schemas are themselves specified in JSON, it may not even be possible to prevent some of these types of problems.

  • 5.3 patternProperties

    The name of each property of this attribute's object is a regular expression pattern in the ECMA 262/Perl 5 format, while the value is a schema.

    ... well, which is it? ECMA 262 or Perl 5? And which version of Perl 5? If you're creating a specification that normatively defines what is permitted in the JSON it describes, it only makes sense that the standard should let you write the normative schema definition in a way that is completely unambiguous and not subject to interpretation.

  • 5.15 uniqueItems

    are booleans/numbers/strings and have the same value; or

    Again, this gets back to some of the issues raised in JSONKit's README.md- whether or not two strings in Unicode have "the same value" is non-trivial.

  • 5.16 pattern

    Regular expressions SHOULD follow the regular expression specification from ECMA 262/Perl 5

    The language here is different from clause 5.3. This clause uses the keyword SHOULD, which has a very specific meaning (defined in a separate RFC, RFC 2119). Not only does the problem from clause 5.3 remain (which is it? ECMA 262 or Perl 5?), but this clause makes it completely optional that the regex syntax used in the schema be either ECMA 262 or Perl 5 compatible. This allows you to write JSON Schema patterns in Magical h0h0 Regex Syntax, and a piece of JSON is valid if and only if it validates using this JSON Schema. What a fantastic idea.

    There is also the problem that a regex is great at matching certain kinds of things, but extremely poor at matching some classes of input (i.e., PDA vs. FSA issues). And this is arguably the wrong place to do what is typically considered proper input validation: does a particular JSON String contain text that will eventually cause a SQL injection attack once it is combined and massaged by some later, post-JSON processing?

  • 5.23 format

    Validators MAY (but are not required to) validate that the instance values conform to a format.

    ... Let me get this straight... It is completely optional that JSON Schema validators validate a value and ensure that the JSON they are validating contains valid values? Seriously? No, really, _seriously?_
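The Unicode concerns under 5.2 and 5.15 can be made concrete with a short Python sketch. `naive_validate` is a hypothetical validator that compares property names and values code point for code point; the explicit `unicodedata.normalize` calls stand in for some lower layer that normalizes strings, and are not a claim about what NSString or any particular JSON library actually does.

```python
import unicodedata

# Draft 03 style schema: "admin" may only be false. Other property names are
# allowed, since additionalProperties defaults to a permissive empty schema.
SCHEMA = {"properties": {"admin": {"enum": [False]}}}

def naive_validate(instance, schema):
    """Hypothetical validator: property names must match code point for code point."""
    for name, subschema in schema["properties"].items():
        if name in instance and instance[name] not in subschema["enum"]:
            return False
    return True

def nfkc_keys(instance):
    """Stand-in for a lower layer that applies NFKC normalization to keys."""
    return {unicodedata.normalize("NFKC", k): v for k, v in instance.items()}

# 5.2: the key is spelled with FULLWIDTH LATIN SMALL LETTERs (U+FF41 etc.).
payload = {"\uff41\uff44\uff4d\uff49\uff4e": True}
print(naive_validate(payload, SCHEMA))             # True: just an "additional property"
print(nfkc_keys(payload))                          # {'admin': True}
print(naive_validate(nfkc_keys(payload), SCHEMA))  # False: normalized data violates the schema it passed

# 5.15: are these two items "the same value" for uniqueItems?
items = ["caf\u00e9", "cafe\u0301"]  # precomposed vs. "e" + COMBINING ACUTE ACCENT
print(len(set(items)) == len(items))  # True: unique by code points
nfc = [unicodedata.normalize("NFC", s) for s in items]
print(len(set(nfc)) == len(nfc))      # False: duplicates after NFC
```

Which answer is "right" depends entirely on which layer applies normalization and when, and draft 03 does not say.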

These are just a few of the problems I see with JSON Schema validation. My other concern is that there is a huge impedance mismatch between what JSON Schema validation is perceived to accomplish and what it actually does. In fact, one of the top Google results for JSON Schema validation is "JSON Schema validation for RESTful Web services". This blog post gives the impression that using JSON Schema to validate JSON somehow improves security or prevents security problems caused by (non-JSON Schema conforming) JSON.

... there's just one problem: it doesn't. It can't. This is made painfully obvious by the example given: a JSON Schema that checks whether the JSON contains three properties, and whether those properties are a string, number, or boolean. Then some JSON that conforms to the schema is shown with a big green indicator, followed by some JSON where a boolean is given where a number is expected and vice versa, with a big red indicator. Woo hoo! That'll learn those hackers! I mean, come on, seriously? In all honesty, I can't even think of a contrived real-world security example that JSON Schemas would "prevent". Never mind the fact that validating values is completely optional according to the JSON Schema (draft 03) draft RFC.
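A minimal Python sketch of the kind of check described above. The property names, `type_check`, and the sample payloads are hypothetical (they are not taken from the blog post), and only draft 03 `type` and `required` are modeled. The last payload shows a type-only check accepting a string aimed at a later SQL layer along with an absurd number.

```python
import json

# Hypothetical schema in the spirit of the example: three properties, each
# constrained only by its basic type (draft 03 style "type" and "required").
SCHEMA = {
    "properties": {
        "username": {"type": "string", "required": True},
        "age": {"type": "number", "required": True},
        "active": {"type": "boolean", "required": True},
    }
}

PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}

def type_check(instance, schema):
    """Return True if every listed property is present and has the right basic type."""
    for name, subschema in schema["properties"].items():
        if name not in instance:
            if subschema.get("required", False):
                return False
            continue
        value = instance[name]
        # bool is a subclass of int in Python, so it must not count as a "number".
        if subschema["type"] == "number" and isinstance(value, bool):
            return False
        if not isinstance(value, PY_TYPES[subschema["type"]]):
            return False
    return True

good = json.loads('{"username": "alice", "age": 30, "active": true}')
swapped = json.loads('{"username": "alice", "age": true, "active": 30}')
hostile = json.loads('{"username": "x\'; DROP TABLE users; --", "age": -1e308, "active": true}')

print(type_check(good, SCHEMA))     # True  (big green indicator)
print(type_check(swapped, SCHEMA))  # False (big red indicator)
print(type_check(hostile, SCHEMA))  # True  (still green)
```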

From a practical implementation standpoint, there are only a few ways I know of to implement the various pattern schema attributes (which specify a regex pattern the property or value must match). One is RegexKitLite, which is preferred because it uses the system-provided ICU regex engine. There's just that little detail that these are "ICU Regular Expressions", which are not ECMA 262 or Perl 5 Regular Expressions. The next alternative is PCRE, but that means building / bundling the entire PCRE library, which is non-trivial and a fairly big chunk of code. This is just one of the complicated, non-trivial implementation issues I see when reading over the JSON Schema (draft 03) RFC, and it comes from just a handful of lines of the spec.
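To make the dialect problem concrete, here is a small Python sketch. Python's `re` is only a stand-in: its `$` follows the Perl 5 convention of also matching just before a final newline, whereas ECMA 262's `$` (without the `m` flag) matches only at the very end of the input. The helper `ecma_style_match` emulates the latter with Python's `\Z`, using a naive translation that only handles a pattern ending in `$`; the pattern itself is hypothetical.

```python
import re

SCHEMA_PATTERN = "^[a-z]+$"  # hypothetical JSON Schema "pattern" value

def perl_style_match(pattern, value):
    # Perl 5 semantics (shared by Python's re): "$" also matches before a final newline.
    return re.search(pattern, value) is not None

def ecma_style_match(pattern, value):
    # ECMA 262 semantics without the "m" flag: "$" matches only at the end of input.
    # Naive translation for this sketch: swap a trailing "$" for Python's "\Z".
    if not pattern.endswith("$"):
        raise ValueError("this sketch only translates patterns ending in '$'")
    return re.search(pattern[:-1] + r"\Z", value) is not None

for value in ["admin", "admin\n"]:
    print(repr(value), perl_style_match(SCHEMA_PATTERN, value),
          ecma_style_match(SCHEMA_PATTERN, value))
# 'admin' True True
# 'admin\n' True False
```

The same pattern string accepts a trailing newline under one reading and rejects it under the other, which is exactly the ambiguity the "ECMA 262/Perl 5" wording leaves open.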

On a purely gut level, I can't figure out why JSON Schema validation would even be useful, other than as an elaborate, complicated way of making sure that a bit of JSON contains X properties, and that those properties are a string, number, boolean, etc. In what way is this useful? What does it accomplish? That a piece of JSON contains a property that is a [string, number, boolean]? That's tautological reasoning at its finest.

Again, maybe I'm missing something. That happens a lot. But usually when you completely miss something that's worthwhile, someone can come up with a concrete example of why JSON Schema validation is "really useful", and when you hear it, it makes complete sense. If someone has such an example, feel free to speak up.

But as it stands, at least to me, JSON Schema looks like a complicated non-solution to a trivial non-problem. XML already has a set of mature tools and standards for doing these types of things. While I can't speak for others, I don't think I'm alone when I say that one of the reasons why I choose to use JSON is precisely because JSON doesn't have stuff like "Schema Validators".

@ghost assigned johnezang Jun 3, 2011
@0xced
Author

0xced commented Jul 26, 2011

Thank you for this detailed explanation.

@0xced
Author

0xced commented Mar 7, 2013

Hey, patternProperties, uniqueItems, pattern and format have all vanished from JSON Schema spec version 4 :-)

6 participants