
Should the JSON-LD API have a "mode": "strict" flag? #100

Closed
msporny opened this issue Apr 11, 2012 · 24 comments

Comments

@msporny
Member

msporny commented Apr 11, 2012

I just had a long conversation with @dlongley about what we should do when compaction results in something that the author may not have intended. The result of the conversation was the proposal for a "mode": "strict" option that can be passed to API calls like .compact() and .frame(). Here's the background:

When people request that something is compacted or framed, they will probably write code against that data structure. For example, to iterate over a list of items in the case of "@container": "@list", or to perform math on something they expect to be a native number (via "@type": "xsd:integer" coercion). The problem comes in when their data contains something erroneous like "foo" that is supposed to be coerced to an "xsd:integer", or when their data contains two {"@list": [....]} values.

"mode": "strict" allows them to tell the JSON-LD API if they are okay with the API taking artistic license when generating output. So, for example, if somebody states that "abc" should be an "xsd:integer" - the number 0 is produced in non-strict mode. In strict mode, an exception is thrown stating that the conversion could not be performed.

In other words, what people put in the context is what they should get out, and non-strict mode does its absolute best to make sure that when a number is requested, a number is generated... even if the source data is "abc". If we don't do this, people will have to write a large amount of branching logic in their code to handle cases where the coercion failed.

People using the framing API are going to code to the framing structure that they provide. They don't want their code to check every single detail (no micro-syntaxes). The people that care about catching errors in their data will use strict mode, while those that are okay with some data coercions not happening perfectly will either 1) not care or 2) write code to catch the cases where data wasn't coerced exactly how they wanted it to be coerced.

Basically, strict mode means "I am asking for something very specific, either give me that or throw an exception". Non-strict mode means "make a best effort of giving me something reasonable (including the expanded form of things if they cannot be fully compacted/framed)".
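
To make the contrast concrete, here is a minimal sketch of the coercion step being described, assuming a hypothetical coerceToInteger() helper (the name, signature, and boolean flag are illustrative only, not part of any JSON-LD API):

```typescript
// Hypothetical helper mirroring the "abc" -> 0 vs. exception contrast above;
// nothing here is taken from an actual JSON-LD processor.
function coerceToInteger(value: unknown, strict: boolean): number {
  const n = typeof value === "number" ? value : Number(value);
  if (Number.isInteger(n)) return n;   // "5" or 5 coerces cleanly in both modes
  if (strict) {
    // strict mode: "give me exactly what I asked for or throw"
    throw new Error(`Cannot coerce ${JSON.stringify(value)} to xsd:integer`);
  }
  return 0;                            // non-strict mode: best effort, "abc" -> 0
}

coerceToInteger("5", false);   // 5
coerceToInteger("abc", false); // 0
// coerceToInteger("abc", true) throws, so the caller learns the data is bad
```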

@gkellogg
Member

Most of my other processors also implement a "strict" or "validate" option that does something like this. It could actually be split into two options: "strict", to throw an exception for datatyped literals or IRIs that are not lexically correct, and "canonicalize", to transform data to a canonical form. Otherwise, most RDF libraries leave things such as "foo"^^xsd:integer untouched. In the case of JSON-LD, this could extend to transforming appropriate datatypes to JSON native representations.

@lanthaler
Member

This sounds very vague... a couple of examples (use cases) would definitely help. I'm not sure anyone would like to have an "abc" converted to a 0 (zero). If they are fine with that, it means they don't care about that value anyway and the string there wouldn't matter.

There's a simple solution to that, I think: just modify the context or frame. The user expresses his intentions explicitly by doing so. I don't see any advantages to introducing a strict flag, but maybe I'm also missing something. That's why I asked for a couple of examples/use cases :-)

@niklasl
Member

niklasl commented Apr 19, 2012

I think something like this would be good. Perhaps though, a permissive mode would throw away data which doesn't fit the mold? Maybe with three modes, such as strict (error on mismatch), match (drop mismatching data), and lax (try to coerce but keep everything).

@gkellogg
Member

An alternative might be a "lint" API method that could provide some more informative feedback about the document. It might be implemented through the expand algorithm, but could callback with error and warning information instead of the expanded document.

@msporny
Member Author

msporny commented Apr 19, 2012

@niklasl Three modes is what @dlongley and I were discussing last week; we should discuss it a bit more, but it seemed as if the three modes would cover every use case we could think of while not overly complicating the API (for developers, that is; implementers will have a bit of a harder time with it, but that's why they're implementers... they can take the pain. :) )

@gkellogg a lint API method is interesting... but it could just as easily be an external tool, no? That is, I can't see many people using it as a part of their application. It would be useful for building linting applications, but that's it, no?

@lanthaler
Member

I agree, lint should be an external tool. Every library implementing the JSON-LD API is free to log as much information as it wants... that probably already addresses most of what I think you have in mind.

Regarding those 3 processing modes, I still can't see any compelling use case or requirement for them and still believe they just add unnecessary complexity.

Could you please add the use cases you were discussing to this ticket so that we all can discuss them? Thanks

@msporny
Member Author

msporny commented Apr 29, 2012

Use case for 'strict', 'match' and 'lax' modes for JSON-LD processors:

Ideally, a developer working with JSON-LD data is going to want to write their application to process that data such that conditional branching is minimized. That is, the developer doesn't want to check to see if a value is an integer with a @type of xsd:double, or an integer with no type, or a string that is "five" but is of type "xsd:integer". There are three modes that the developer's mind might be in when discussing this (a sketch follows the list):

  1. lax mode: "I know my incoming data is dirty, try to coerce things that coerce cleanly, but leave everything else as-is". This ensures that the data is "cleaned" as much as possible; everything that can't be cleaned is left as-is. This is called "lax" mode because the JSON-LD processor will be very forgiving with the input and will allow the developer to make the final decision on how to interpret the value.
  2. match mode: "My incoming data should be clean, and I want any data that is not clean to be dropped." This ensures that the developer is only working with data that matches their application's very specific assumptions. Anything that is an "xsd:integer" had better already be an integer or be cleanly coercible to an integer, or the value is dropped. For example, a value of 5 with xsd:integer is not dropped, but a value of "five" with xsd:integer is dropped.
  3. strict mode: "My incoming data should be clean, and I want you to throw an exception if it isn't." This ensures that the developer knows when there is an error in the input data and that it also matches their application's very specific assumptions. Anything that is an "xsd:integer" had better already be an integer or be cleanly coercible to an integer, or an exception is thrown. For example, a value of 5 with xsd:integer passes, but a value of "five" with xsd:integer throws an exception.
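
A rough sketch of how the three modes might differ for a single xsd:integer-coerced value; the helper name, the DROP sentinel, and the overall shape are assumptions made for illustration, not anything defined by the spec:

```typescript
// Illustrative only: one value, three proposed behaviors.
type Mode = "lax" | "match" | "strict";
const DROP = Symbol("drop"); // "match" mode: the caller removes the property

function applyIntegerCoercion(value: unknown, mode: Mode): unknown {
  const n = typeof value === "number" ? value : Number(value);
  if (Number.isInteger(n)) return n;          // 5 or "5" passes in every mode
  if (mode === "strict") throw new Error(`"${String(value)}" is not a valid xsd:integer`);
  if (mode === "match") return DROP;          // drop the value that doesn't match
  return value;                               // lax: keep "five" as-is
}

applyIntegerCoercion("5", "match");    // 5
applyIntegerCoercion("five", "lax");   // "five"
applyIntegerCoercion("five", "match"); // DROP -> property is removed from the output
// applyIntegerCoercion("five", "strict") throws
```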

@msporny
Member Author

msporny commented Apr 29, 2012

PROPOSAL: Allow three JSON-LD processor modes when processing JSON-LD documents: lax, match, and strict. The lax mode attempts to type coerce what it can while preserving what it cannot. The match mode attempts to type coerce what it can while removing what it cannot. The strict mode attempts to type coerce what it can and throw an exception for anything that it cannot.

@lanthaler
Member

How does this eliminate the need for conditional branching? Lax mode: have to deal with things that don't coerce cleanly; match mode: have to deal with things that are dropped; strict mode: have to deal with exceptions.

I think what you describe as "lax mode" is what I would expect from a JSON-LD processor (in compaction).

@dlongley
Member

"Eliminating conditional branching" means "eliminating it in all but one spot". The objective is to prevent a spattering of conditional branching throughout all of your code; wherever you have to access a value. Instead, if you used strict mode, you would just have a single place where you handled an exception and indicated that the data was unacceptable. You wouldn't bother going any further with trying to process it.

If you're ok with dealing with a bunch of conditional checks, you'd use lax mode. Though, I'm not sure anyone would really ever want to write code for such output. Instead you'd either work with the output in expanded form, match-mode, or strict-mode. In fact, I'm not that sure about the utility of lax-mode.
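
To illustrate where the branching ends up, compare consumer code written against lax-mode output with code written against strict-mode output (the document shape and function names are made up for the example):

```typescript
// Made-up document shape; the point is where the conditional checks live.
interface Person { age?: unknown; }

// Against lax-mode output, every value access needs its own guard:
function birthYearLax(p: Person): number | undefined {
  if (typeof p.age !== "number" || !Number.isInteger(p.age)) return undefined;
  return new Date().getFullYear() - p.age;
}

// Against strict-mode output, the processor already threw if the data was bad,
// so the only branching is one try/catch around the processing call itself:
function birthYearStrict(p: { age: number }): number {
  return new Date().getFullYear() - p.age;
}
```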

@lanthaler
Member

OK, let me try to address this issue in a more organized fashion. First of all, while the proposal is generic in allowing "three JSON-LD processor modes when processing JSON-LD documents" it really just applies to compaction (also as part of framing) and fromRDF.

So let's start by looking at compaction a bit closer. The problem we have there is basically about where we have "matches" and where we don't. So let's look at four expanded inputs:

  1) native:           "te:rm1": 5.1
  2) string:           "te:rm1": "5.1"
  3) expanded native:   "te:rm1": { "@value": 5.1, "@type": "xsd:double" }
  4) expanded string:   "te:rm1": { "@value": "5.1", "@type": "xsd:double" }

We can now try to compact these four input documents with the following three contexts:

  a) just term:              "t1": "te:rm1"
  b) term & type:            "t1": { "@id": "te:rm1", "@type": "xsd:double" }
  c) term & different type:  "t1": { "@id": "te:rm1", "@type": "xsd:integer" }

The question is now which term mapping applies on which input documents. I think we all agree that a) would apply to all, the situation for b) and c) is not as clear. I would argue that b) just applies to 3) and 4) and c) doesn't apply to any input document.
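
For concreteness, here are input 3) and context b) from the tables above written out in full (a sketch that mirrors the fragments verbatim; the te: and xsd: prefixes are assumed to be defined elsewhere):

```typescript
// Input 3) -- expanded native value with a type annotation.
const expandedInput3 = {
  "te:rm1": { "@value": 5.1, "@type": "xsd:double" }
};

// Context b) -- term plus matching type coercion.
const contextB = {
  "@context": {
    "t1": { "@id": "te:rm1", "@type": "xsd:double" }
  }
};

// Compacting expandedInput3 with contextB should pick the "t1" term, since both
// the property IRI and the coercion type match; context c) (xsd:integer) would not.
```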

The next question is the result of 3b and 4b and whether we will preserve the native data type in expansion or not. This is related to the issues #81 and #98.

IMO, the right result of compacting 3) and 4) with b) would be

  3b)  "t1": 5.1
  4b)  "t1": "5.1"

I think 3b) is obvious, but some of you might argue that "5.1" should be converted to a native data type. I don't think that should be the case. If someone specifies a type coercion he has indirectly given up native data types. The situation is different in RDF round tripping. Therefore I would say that if we introduce a strict flag, it should be just for fromRDF.

fromRDF (and of course toRDF) should be the only place where native data types are converted to typed literals and back. If such a conversion from a type coerced string to a native data type fails we could have a flag to either throw an exception or keep it in expanded form. Compaction, expansion and framing do not need such flags IMHO.

Summarizing I would propose the following:

PROPOSAL: Introduce a strict flag for the fromRDF algorithm which, when set, triggers an exception to be thrown when a literal typed to xsd:integer or xsd:double can't be converted to a native JSON number or a literal typed to xsd:boolean can't be converted to a native JSON boolean. If the flag is not set, such values will be converted to the expanded object notation form (@value).

PROPOSAL: During expansion, all type coerced native data types (booleans and numbers) are converted to type coerced strings in the expanded object form. During compaction, such strings in expanded form MUST NOT be converted to native types but to a simple string instead of the expanded object notation form (the object is simply replaced with the value of @value).

@lanthaler
Member

RESOLVED: Introduce a 'useNativeTypes' flag for the fromRDF algorithm which, when set, attempts to convert xsd:boolean, xsd:integer, and xsd:double to native JSON values. If the conversion fails the value will be converted to the expanded object notation form.
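
A minimal sketch of what the conversion step behind 'useNativeTypes' could look like, assuming a simple literal representation and helper name (both are assumptions, not taken from the spec):

```typescript
// Illustrative conversion step for fromRDF with 'useNativeTypes'.
const XSD = "http://www.w3.org/2001/XMLSchema#";

interface Literal { value: string; datatype: string; }

function literalToJsonValue(lit: Literal, useNativeTypes: boolean): unknown {
  if (useNativeTypes) {
    if (lit.datatype === XSD + "boolean" && (lit.value === "true" || lit.value === "false")) {
      return lit.value === "true";
    }
    if (lit.datatype === XSD + "integer" && /^[+-]?\d+$/.test(lit.value)) {
      return parseInt(lit.value, 10);
    }
    if (lit.datatype === XSD + "double" && !Number.isNaN(Number(lit.value))) {
      return Number(lit.value);
    }
  }
  // Flag off, or the conversion failed: keep the expanded object notation.
  return { "@value": lit.value, "@type": lit.datatype };
}
```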

@lanthaler
Member

Note: We might want to change the name of the flag, as it probably should be set to true by default. Something along the lines of doNotUseNativeTypes might be more appropriate.

@dlongley
Member

dlongley commented May 1, 2012

We might consider something slightly different for the flags to fromRDF:

For fromRDF we could have four flags: @type, @integer, @double, @boolean.

Each of these flags would have the following default values:

@type => rdf:type
@integer => xsd:integer
@double => xsd:double
@boolean => xsd:boolean

These flags could be set to any property desired -- or set to null to prevent conversion. This would also replace the notType flag; you would just set @type to null.
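
A sketch of what such an option bag could look like with the defaults listed above (the interface shape is an assumption for illustration):

```typescript
// Hypothetical fromRDF flags with the proposed defaults.
interface FromRdfFlags {
  "@type":    string | null;  // null disables @type handling (replaces notType)
  "@integer": string | null;
  "@double":  string | null;
  "@boolean": string | null;
}

const defaultFlags: FromRdfFlags = {
  "@type":    "rdf:type",
  "@integer": "xsd:integer",
  "@double":  "xsd:double",
  "@boolean": "xsd:boolean",
};

// Example: keep @type handling but never convert doubles to native numbers.
const customFlags: FromRdfFlags = { ...defaultFlags, "@double": null };
```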

@dlongley
Member

dlongley commented May 1, 2012

We could simplify the @integer and @double flags to:

@number: ["xsd:double", "xsd:integer"]

But the analogy isn't there for toRDF; we'll probably want the same flags there to indicate how to convert existing native types into RDF.

@lanthaler
Member

Good idea. But I would change the names of the flags @integer, @double, and @boolean to just "integer", "double", and "boolean", or perhaps even better to "integerNumber" and "floatNumber" to highlight that there isn't a 1:1 type match.

I would be fine with having integerNumber/floatNumber in toRDF() and just number in fromRDF().

Furthermore, I think we should have a flag which controls whether an exception is thrown if the conversion to native types fails or whether in that case the result should just remain in expanded form.

@dlongley
Member

dlongley commented May 2, 2012

I agree we should have the strict flag for toggling conversion exceptions on/off.

@niklasl
Member

niklasl commented May 3, 2012

Has the "match" option been dropped from consideration, or am I missing something? (The purpose of which, as stated above, is to ensure clean compacted data even if there is ragged input data.)

@lanthaler
Member

Niklas, we decided to move all those issues to fromRDF()/toRDF() as that's where they seem to belong. We haven't decided yet what exact flags fromRDF()/toRDF() have.

@niklasl
Member

niklasl commented May 3, 2012

Alright, that sounds good.

@lanthaler
Member

RESOLVED: JSON-LD will support a JSON-LD Processor Event mechanism that will report certain events (to be decided later) via a callback given through JSON-LD API calls.

RESOLVED: The JSON-LD Processor Event callback would be registered for every JSON-LD API call, and would provide the type of event and the data associated with the event for the callback. This mechanism would be used to report potential errors, warnings and when the processing of the document was complete.

@lanthaler
Member

RESOLVED: When a JSON-LD processor processes input that would result in an exception, it should instead call the JSON-LD Processor Event callback with data concerning the issue that was detected.
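
A sketch of what the resolved event mechanism might look like; the event names, the callback signature, and the compactWithEvents() stub are assumptions, since the exact details were left to be decided later:

```typescript
// Illustrative shape for the processor event callback; not the actual API.
type EventType = "warning" | "error" | "complete";

interface ProcessorEvent {
  type: EventType;  // what kind of event is being reported
  data: unknown;    // details associated with the event
}

type EventCallback = (event: ProcessorEvent) => void;

// A stub showing how a processor would report problems through the callback
// instead of throwing (real compaction logic is omitted):
function compactWithEvents(doc: object, context: object, callback: EventCallback): object {
  callback({ type: "warning", data: 'value "abc" could not be coerced to xsd:integer' });
  callback({ type: "complete", data: null });
  return doc;
}
```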

@lanthaler
Member

I created issue #150 to keep track of the flags Dave proposed above.

lanthaler added a commit that referenced this issue Aug 1, 2012
In this update I still use exceptions as they are still used in the rest of the spec. In a future update all exceptions should be replaced with calls to the error callback handler, as decided in issue #100.

This closes #92.
@lanthaler
Member

I'm closing this issue as it is no longer relevant.

We decided not to introduce a mode: strict flag. Instead:

RESOLVED: JSON-LD will support a JSON-LD Processor Event mechanism that will report certain events (to be decided later) via a callback given through JSON-LD API calls.

RESOLVED: The JSON-LD Processor Event callback would be registered for every JSON-LD API call, and would provide the type of event and the data associated with the event for the callback. This mechanism would be used to report potential errors, warnings and when the processing of the document was complete.

RESOLVED: When a JSON-LD processor processes input that would result in an exception, it should instead call the JSON-LD Processor Event callback with data concerning the issue that was detected.

The specs haven't been updated yet but this is being tracked now in issue #153 as it wasn't defined in detail yet.

The flag to switch on/off the conversion of typed literals to JSON native data types

RESOLVED: Introduce a 'useNativeTypes' flag for the fromRDF algorithm which, when set, attempts to convert xsd:boolean, xsd:integer, and xsd:double to native JSON values. If the conversion fails the value will be converted to the expanded object notation form.

and the flags Dave proposed as an alternative above are being tracked in issue #150.
