JSON-LD in HTML #231

Closed
msporny opened this issue Mar 17, 2013 · 13 comments

Comments

@msporny

msporny commented Mar 17, 2013

@gkellogg proposed a new, non-normative feature that would allow JSON-LD to be embedded in HTML documents:

http://lists.w3.org/Archives/Public/public-linked-json/2013Mar/0019.html

The feature would allow developers to embed JSON-LD by doing something like this:

<script type="application/ld+json" data-context="http://example.org/context.jsonld">
{
   "name": "Gregg Kellogg"
}
</script>
@lanthaler

An alternative would be to define an alias for @graph in the remote context and always require its use:

<script type="application/ld+json">
{
  "@context": "http://example.org/context.jsonld",
  "data": {
     "name": "Gregg Kellogg"
  }
}
</script>

@lanthaler

RESOLUTION: Add the JSON-LD in HTML feature to the JSON-LD Syntax specification without support for data-context. We are still discussing data-context and the danger of it forcing a JSON-LD initial context.
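For readers who want to try the resolved feature, here is a minimal sketch of pulling JSON-LD blocks out of an HTML page with Python's standard library. The names `JsonLdScriptExtractor`, `extract_json_ld`, and `PAGE` are illustrative assumptions rather than anything defined by the specification, and the sketch assumes each block carries its own @context, since data-context is not supported.

```python
import json
from html.parser import HTMLParser


class JsonLdScriptExtractor(HTMLParser):
    """Collects the parsed body of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self.documents = []   # parsed JSON values, in document order
        self._chunks = None   # text pieces of the script being read, else None

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        # Compare only the type/subtype part and ignore any parameters.
        media_type = (dict(attrs).get("type") or "").split(";")[0].strip().lower()
        if media_type == "application/ld+json":
            self._chunks = []

    def handle_data(self, data):
        # Script text may arrive in several pieces, so accumulate it.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._chunks is not None:
            self.documents.append(json.loads("".join(self._chunks)))
            self._chunks = None


def extract_json_ld(html_text):
    """Return the list of JSON-LD documents embedded in html_text."""
    parser = JsonLdScriptExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.documents


# Hypothetical page used only for illustration.
PAGE = """
<html><body>
<script type="application/ld+json">
{
  "@context": "http://example.org/context.jsonld",
  "name": "Gregg Kellogg"
}
</script>
<script type="text/javascript">var ignored = true;</script>
</body></html>
"""

documents = extract_json_ld(PAGE)
print(documents)
```

The sketch only parses the JSON; expanding it against the context or converting it to RDF would be the job of a JSON-LD processor.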

@gkellogg

One alternative to @data-context, which has no real support, would be to combine the context with the MIME type in the @type attribute. According to the definition of a valid MIME type in HTML5, a MIME type can contain type parameters. If we were to add back a media type parameter to specify the context, we could potentially do something like the following, which does not rely on any additional attribute:

<script type="application/ld+json; context=http://schema.org">
{
  "@type": "Book",
  "image": "catcher-in-the-rye-book-cover.jpg",
  "name": "The Catcher in the Rye",
  "bookFormat": "Paperback",
  "author": "/author/jd_salinger.html",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": 4,
    "reviewCount": 3077
  },
  "offers": {
    "@type": "Offer",
    "price": 6.99,
    "priceCurrency": "USD",
    "availability": "InStock"
  },
  "numberOfPages": 244,
  "publisher": "Little, Brown, and Company",
  "datePublished": "1991-05-01",
  "inLanguage": "English",
  "isbn": "0316769487"
}
</script>

One concern about specifying the context outside of the fragment itself is that, if it were missing, some agents might assume that a default context was intended (e.g., http://schema.org). This could be quite dangerous. The sense of the group is not to support this, but to require that the context be included in the document itself. This could be done generically as follows:

<script type="application/ld+json; context=http://schema.org">
{
  "@context": "http://schema.org",
  "data": {
    "@type": "Book",
    "image": "catcher-in-the-rye-book-cover.jpg",
    "name": "The Catcher in the Rye",
    "bookFormat": "Paperback",
    "author": "/author/jd_salinger.html",
    "aggregateRating": {
      "@type": "AggregateRating",
      "ratingValue": 4,
      "reviewCount": 3077
    },
    "offers": {
      "@type": "Offer",
      "price": 6.99,
      "priceCurrency": "USD",
      "availability": "InStock"
    },
    "numberOfPages": 244,
    "publisher": "Little, Brown, and Company",
    "datePublished": "1991-05-01",
    "inLanguage": "English",
    "isbn": "0316769487"
  }
}
</script>

This works provided that "data", in this case, is aliased to "@graph". It ensures that the document content will always be interpreted as JSON-LD, and will survive copy-paste into some other tool.
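To make the aliasing idea concrete, here is a rough sketch of how a consumer could locate the payload once "data" is declared as an alias for "@graph". It is not a JSON-LD processor: the inline `CONTEXT` dictionary is a hypothetical stand-in for whatever the remote context would contain, and the helper names are assumptions made for illustration.

```python
def graph_aliases(context):
    """Return the terms that an inline context maps to the @graph keyword."""
    return {term for term, target in context.items() if target == "@graph"}


def unwrap_graph(document, context):
    """Return the payload stored under @graph or one of its aliases, else None."""
    for key in {"@graph"} | graph_aliases(context):
        if key in document:
            return document[key]
    return None


# Hypothetical stand-in for the remote context; the "data" alias is an assumption.
CONTEXT = {"data": "@graph", "name": "http://schema.org/name"}

DOCUMENT = {
    "@context": "http://schema.org",
    "data": {"@type": "Book", "name": "The Catcher in the Rye"},
}

payload = unwrap_graph(DOCUMENT, CONTEXT)
print(payload["name"])
```

Because the @context key stays inside the document, the block keeps its meaning when it is copied out of the HTML page.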

@lanthaler

Gregg, have you perhaps already checked how quoting of the value of the context media type parameter works in that case? Usually you would have to wrap the value in quotes, but in this case you can’t because the whole media type (including the parameter) is already in quotes.

@gkellogg

RFC 2616 makes no recommendations on quoting parameters; it has very little to say at all. I believe it is up to the format definition to describe what the quoting and/or escaping requirements are. Of course, you could use quotes within quotes, either escaping them or using "'" instead of '"', but I would say that we would require the value to be URI-escaped and explicitly unescaped prior to dereference; for most cases, this won't be necessary.

@niklasl

niklasl commented Mar 19, 2013

Yes. Specifically, according to RFC 2616, section 3.7:

Parameters MAY follow the type/subtype in the form of attribute/value
pairs (as defined in section 3.6).

In section 3.6, (non-token) attribute values are defined as quoted-string (which uses double quotes ("...")). As Gregg says, in HTML attributes that requires either the use of &quot;...&quot; for values or the use of single quotes:

<script type='application/ld+json;context="http://schema.org/"'>
</script>

However, back to Gregg's point: the "MAY" in 3.7 means that we can get away with another form. Granted, that in turn might cause trouble in some mime-type parser implementations. That remains to be tested. Otherwise, this possibility (using a mime-type parameter for the context) seems feasible.
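The two quoting forms can be compared with a short sketch using only Python's standard library. It reads the type attribute with `html.parser` and then pulls out the context parameter with `email.message.Message.get_param`, which is used here as a convenient media type parameter parser. The class and function names are illustrative assumptions, and the context parameter itself is only the proposal under discussion.

```python
from email.message import Message
from html.parser import HTMLParser


class ScriptTypeCollector(HTMLParser):
    """Records the decoded type attribute of each script start tag."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.types.append(dict(attrs).get("type"))


def context_parameter(media_type):
    """Return the unquoted context parameter of a media type, or None."""
    header = Message()
    header["Content-Type"] = media_type
    return header.get_param("context")


# The same media type written with single quotes and with &quot; entities.
SINGLE_QUOTED = """<script type='application/ld+json;context="http://schema.org/"'></script>"""
ENTITY_QUOTED = """<script type="application/ld+json;context=&quot;http://schema.org/&quot;"></script>"""

collector = ScriptTypeCollector()
collector.feed(SINGLE_QUOTED + ENTITY_QUOTED)
collector.close()

contexts = [context_parameter(value) for value in collector.types]
print(contexts)
```

Both spellings should decode to the same attribute value, so the choice between them is a matter of readability in the HTML source.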

lanthaler added a commit that referenced this issue Mar 20, 2013
@gkellogg, was there a reason why you chose such a complex example? If not, I think this is much simpler and illustrates the mechanisms just as well.

I've also removed the part that says "For text/html, text inside of the script tags does not need to be escaped.", which is wrong; see http://www.w3.org/TR/html5/scripting-1.html#restrictions-for-contents-of-script-elements

This addresses #231.
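As an illustration of why escaping matters, here is a small sketch of one common way to serialize JSON-LD for a script element so that a string value cannot close the element or start a comment. Escaping every "<" as \u003c is an assumption about a reasonable approach, not something the commit prescribes, and `to_script_safe_json` is a hypothetical helper name.

```python
import json


def to_script_safe_json(value):
    """Serialize value as JSON that cannot close or comment out a script element."""
    text = json.dumps(value, indent=2)
    # In JSON output "<" can only occur inside strings, where \u003c is an
    # equivalent escape that JSON parsers turn back into "<".
    return text.replace("<", "\\u003c")


# Hypothetical document with troublesome characters in a string value.
DOCUMENT = {
    "@context": "http://schema.org",
    "name": "Notes on </script> and <!-- comments",
}

embedded = to_script_safe_json(DOCUMENT)
html_fragment = '<script type="application/ld+json">\n' + embedded + "\n</script>"
print(html_fragment)
```

A JSON parser reading the script body recovers the original strings, so the escaping is invisible to JSON-LD consumers.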
lanthaler added a commit that referenced this issue Mar 20, 2013
This is something completely orthogonal to the rest of the spec. I think moving it to the end makes that text flow better.

This addresses #231.
@lanthaler

Looks like in HTML5 using double-quotes inside single quotes is the preferred way to deal with this, see e.g. HTML5's source element:

<source src='video.ogv' type='video/ogg; codecs="theora, vorbis"'>

The MAY in RFC 2616 just means that there may be parameters or there may be none. It is very clear that non-token values have to be enclosed in double quotes.

parameter     = attribute "=" value
attribute     = token
value         = token | quoted-string
token         = 1*<any CHAR except CTLs or separators>
separators    = "(" | ")" | "<" | ">" | "@"
                | "," | ";" | ":" | "\" | <">
                | "/" | "[" | "]" | "?" | "="
                | "{" | "}" | SP | HT
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext        = <any TEXT except <">>

That being said, I would still prefer to require that the @context keyword has to be used within the JSON-LD block. This is, IMO, much simpler.
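The grammar above can be checked mechanically. The following sketch, with the hypothetical helper names `is_token` and `format_parameter`, tests whether a value is a token and quotes it when it is not. It shows why a URL such as http://schema.org/ needs quoting (it contains the separators ":" and "/"), while a bare domain name does not.

```python
# Separators from the RFC 2616 grammar quoted above, including SP and HT.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(value):
    """True if value matches token = 1*<any CHAR except CTLs or separators>."""
    if not value:
        return False
    for ch in value:
        code = ord(ch)
        # CHAR is US-ASCII (0-127); CTLs are 0-31 and 127.
        if code > 127 or code < 32 or code == 127 or ch in SEPARATORS:
            return False
    return True


def format_parameter(attribute, value):
    """Render attribute=value, using a quoted-string when value is not a token."""
    if is_token(value):
        return f"{attribute}={value}"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute}="{escaped}"'


print(format_parameter("context", "http://schema.org/"))
print(format_parameter("context", "schema.org"))
```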

@niklasl

niklasl commented Mar 20, 2013

True, it's either double quoted or no parameters (I read that "MAY" way too late at night).

It would be nice to have a way of specifying the context out-of-band here in a simple way. And the mime-type with quoted parameters seems most correct, though arguably cumbersome in an HTML attribute. It's interesting to see the same pattern used in the <source> example though. Good spot Markus.

(I think it's a nice feature for content negotiation as well. With support for specifying a context in JSON-LD-based mime-types, a client can request a specific form which it either has (or is prepared to fetch) the context definition for, or whose "surface syntax" is sufficient for its needs.)

Perhaps an option is to support this form, and also define the token form to have special meaning; specifically a domain name? E.g.:

<script type="application/ld+json; context=schema.org">

could imply a content-negotiable context at http://schema.org/ and/or at http://schema.org/.well-known/context.jsonld. Just a thought.
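As a thought experiment only, the expansion could be sketched as below. Everything here is hypothetical: the function name, the choice of schemes, and the /.well-known/context.jsonld path are assumptions taken from the idea above, not from any specification. The token does not state a scheme, so the sketch simply lists both http and https candidates.

```python
def candidate_context_urls(token, schemes=("http", "https")):
    """List URLs that a bare domain token could plausibly expand to."""
    urls = []
    for scheme in schemes:
        base = f"{scheme}://{token}/"
        urls.append(base)                                 # content-negotiable root
        urls.append(base + ".well-known/context.jsonld")  # hypothetical well-known path
    return urls


for url in candidate_context_urls("schema.org"):
    print(url)
```

That a single token yields several candidates is part of what any such convention would have to pin down.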

@gkellogg

In my (cursory) review of the RFC, I looked for, but didn't find the detailed syntax.

Saying "context=schema.org" has a clear meaning: expanding it to an IRI would clearly be the intention, and it is in line with how browsers turn domain names into URLs.

I agree that it's better to include the context inline, and we'll see how this resolves. Ideally, we can simply drop the context parameter altogether.

@lanthaler

Sorry, I think I have to disagree. Having such an implicit form to specify a context opens a can of worms. Is it HTTP or HTTPS? Will it use HTML5 URL parsing or something else? We should not go down that route but instead advocate the use of @context inside the document.

@lanthaler

I did ask this already in c728c6b but would like to record it also here as commit discussions are somewhat difficult to find and I don't wanna forget this.

The section currently says:

If a processor extracts the JSON-LD content into RDF, it should expand the JSON-LD fragment into an RDF dataset using the algorithm defined in JSON-LD-API Convert to RDF Algorithm [JSON-LD-API]. If the HTML file contains multiple JSON-LD script tags, or other RDF statements are extracted, the result is the RDF merge of the datasets.

Other processors implementing this mechanism may choose to return the expanded JSON-LD output.

Markus: Can we drop this part? I think it doesn't add much value but sounds overly complicated for people just wanting to use it as JSON-LD.

Gregg: It's important to give guidance when there are multiple script tags, so that the result is clear. We could say that the result is the merge of all such documents.

When extracting RDF, JSON-LD could be combined with other formats (microdata, RDFa, Turtle in HTML). In this case, it's also necessary to say what the expected result is. Without this statement, it won't be clear what to do.

Markus: I understand where you are coming from, but that’s an orthogonal aspect that is application dependent. In our spec we should define how JSON-LD can be embedded in HTML. How such data is used is beyond the scope of this spec. I would strongly prefer to remove this part.

@lanthaler

The section has been added to the syntax specification. Unless I hear objections, I will close this issue in 24 hours.
