JSONLD context doesn't have https variant #2853
Comments
Firstly, although subtle, there is a difference between the purpose of the vocabulary definition download files and that of the JSON-LD context. The download files contain the RDF triples that define the vocabulary, in various serialisations. The context provides shortcut terms for use in JSON-LD code. Because of this, and because there is no standardised, accepted way to indicate the need for different versions of a context, the only version returned is one that reflects the underlying coding of the data that defines Schema.org in the repository. Currently (as of version 11.0) that is http based. It is worth noting, however, that as of version 12.0 (due for release soon and visible in draft form on webschemas.org), that underlying coding is moving to be https based. |
@RichardWallis - can you clarify the meaning of "underlying coding is moving to be https based"? Is the intent for the schema.org vocabulary to use
or will those definitions continue to use Discussion here and issues #2814 and #2852 imply that the switch to (the following is copied from issue #2814 since that issue is not open) It appears v12.0 was to use
What is the intended vocabulary namespace for schema.org moving forward? The difference does impact downstream processing and recommendations for our community of implementors. |
An update on this. Firstly, I appreciate and share the desire for migration towards https everywhere. During the v12 launch we did initially switch the entire context to use a vocab declaration of 'https://schema.org/' for all cases. This was part of @RichardWallis's efforts in #2814. I regret that I had not realized this change was part of #2814 - it was problematic immediately (e.g. JSON-LD tests started failing in Apache Jena), because it changed the output triples of all parsers that use contexts in realtime. The change was immediately reverted because of this.
The idea of the URL for a http:-based context giving 'http:' triples, and https:-based giving a context designed to generate triples using 'https:', is one approach. While it would be difficult on our current (100% static served appengine) infrastructure, we could explore that further. However, that change would not address the larger problem - that of having a mix of 'http' and 'https' schema.org triples out there.
Background
Trends in web technology have made it clear that sites are going to rapidly move towards https:, and it was increasingly untenable for us to avoid redirecting e.g. http://schema.org/Event to https://schema.org/Event. At that point, we had a usability problem: the main URL for the documentation of Event was https, but most markup used http (whether rdfa, microdata, json-ld). That was how we ended up agreeing to say in the FAQ that both variations were fine, and that consumers would have to figure out the equivalences (https://schema.org/docs/faq.html#19).
The most recent changes to this codebase move us into an environment in which Schema.org's internal definitions use 'https' on-disk. Anyone working with schema.org in an RDF setting will need to decide whether to canonicalize to the http: or the https: form, and since both forms are very much "out there" in the wild, this is unavoidable.
Consequently we publish a version of the definitions in both flavours, and expect this is likely to be needed for a while. Switching the content of the JSON-LD context to generate https: triples is a very special situation. Unlike RDFa and Microdata, changes to that definition can alter the behaviour of software processes at a distance. If we do go there, I think it's the kind of change we ought to publicize at least a year in advance, with significant supporting documentation. |
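To illustrate why a change to the context acts "at a distance", here is a minimal sketch. It is not a real JSON-LD processor: the `expand_keys` helper is hypothetical and handles only keys resolved against `@vocab`. It shows an unchanged document yielding different property IRIs once the vocab declaration in the context changes.

```python
def expand_keys(doc: dict, context: dict) -> dict:
    """Toy expansion (illustrative only): prefix each non-keyword key
    with the context's @vocab. A real processor also expands @type
    values, prefixes, nested nodes, and much more."""
    vocab = context["@vocab"]
    return {
        (key if key.startswith("@") else vocab + key): value
        for key, value in doc.items()
    }


# The publisher's markup is identical in both cases.
doc = {"@type": "Event", "name": "Example"}

# Only the remotely served context differs.
http_result = expand_keys(doc, {"@vocab": "http://schema.org/"})
https_result = expand_keys(doc, {"@vocab": "https://schema.org/"})
```

The markup never changed, yet every consumer that fetches the context at processing time would start emitting different IRIs, which is the behaviour described above.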
However, the https transition does fall into the category: it's inevitable. Separately we are working with folks to upgrade SPARQLer (Apache Jena) to support https IRIs in javascript programs. While a different domain, the https issue is a gating issue there too. But endpoints in SPARQL programs increasingly are problematic. Whether consuming or generating content, it seems to us that a fast transition to https is in our collective best interest. |
@jaygray0919 et al., can I suggest a different framing of the situation? Schema.org has long expressed a few things that made sense in 2011, when optimizing for ease of adoption from webmasters/publishers who knew little about these technologies, were working solely in Microdata, and had relatively modest incentives to adopt. There was less expertise, tooling, documentation and advice to draw upon. Hence, in the datamodel doc:
At Google for example we use some heuristics to normalize string-based shortcuts into thing-based structure. This isn't always easy, and involves determining a plausible type where possible. For example, "alumniOf": "Westergate Comprehensive" might get expanded into "alumniOf": { "@type": "Organization", "name": "..." }.
It might be useful for consuming applications to work towards more shared canonicalization / normalization steps. Of these, mapping http: triples into https: would be amongst the easiest, since it is lossless, simple to implement, etc. If mappings exist between e.g. Schema.org and Dublin Core, Wikidata, FOAF, SKOS etc., we know that we can relatively easily create an https: version of such mappings.
My view is that this kind of pre-processing will become increasingly important, and that we'll find more useful things to collaborate on in that space - e.g. shacl, shex etc. Anyone who has looked at any kind of structured data from the wider Web knows that you can't just load it up into an application environment and use it without various kinds of cleanup, quality check, canonicalization, heuristics etc. This was true of Dublin Core, FOAF, Open Graph markup, and it remains true of Schema.org too. Data is inherently messy. It is unfortunate that we have this http vs https issue in the Schema.org ecosystem, but in terms of making data from the Web usable for applications it is a relatively simple problem. |
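The http: to https: mapping mentioned above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not anything Schema.org or Google ships: triples are plain tuples of strings, and the helper names `canonicalize_term` and `canonicalize_triples` are invented for this example. A real pipeline would rewrite only IRIs and leave literal values alone, which this sketch does not distinguish.

```python
HTTP_NS = "http://schema.org/"
HTTPS_NS = "https://schema.org/"


def canonicalize_term(term: str) -> str:
    """Rewrite a schema.org http: IRI to its https: form; leave other terms as-is."""
    if term.startswith(HTTP_NS):
        return HTTPS_NS + term[len(HTTP_NS):]
    return term


def canonicalize_triples(triples):
    """Apply the namespace rewrite to subject, predicate and object of each triple.

    Assumption: every position is a plain string; literals are not
    distinguished from IRIs here, as a real RDF library would do.
    """
    return [tuple(canonicalize_term(t) for t in triple) for triple in triples]


mixed = [
    ("http://example.org/e1",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://schema.org/Event"),
    ("http://example.org/e1", "https://schema.org/name", "Example"),
]
canonical = canonicalize_triples(mixed)
```

Because the rewrite only swaps a fixed prefix, it is idempotent, and the reverse mapping (https: to http:) is equally simple for consumers who canonicalize the other way.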
Liking the idea ... On our side, we want to be fast followers, and defer to a consensus design where folks (who are smarter than we are on this issue) do the design or formulate the pre-processing 'linter'. As a user, we need a solution that doesn't get flagged/rejected by other subsystems (like a browser) or another processor that enforces system-wide rules. We face that problem today with some |
One approach that may be helpful for consumer canonicalization for the
A content consumer can adjust their mechanism for retrieving the remote context document by intercepting requests for the Depending on the library being used for processing, it can be very straightforward. Here's a worked example using the https://gist.github.com/datadavev/3ba3b12390c859b2f780ad7b78ebd739 It's not a perfect solution since there may be references explicitly to expanded |
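A rough sketch of the interception idea follows. It is not the code from the gist: it assumes a document loader is a callable taking `(url, options)` and returning a dict whose `"document"` entry holds the parsed context, which is my understanding of the convention PyLD uses. The names `make_https_loader` and `_rewrite` are hypothetical.

```python
from urllib.parse import urlparse

SCHEMA_HTTP = "http://schema.org/"
SCHEMA_HTTPS = "https://schema.org/"


def _rewrite(value):
    """Recursively replace the http: schema.org namespace in string values."""
    if isinstance(value, str):
        if value.startswith(SCHEMA_HTTP):
            return SCHEMA_HTTPS + value[len(SCHEMA_HTTP):]
        return value
    if isinstance(value, dict):
        return {k: _rewrite(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_rewrite(v) for v in value]
    return value


def make_https_loader(base_loader):
    """Wrap a document loader so schema.org contexts yield https: IRIs.

    Assumption: base_loader(url, options) returns a dict with a parsed
    "document" entry. Documents from other hosts pass through untouched.
    """
    def loader(url, options=None):
        remote = base_loader(url, options or {})
        if urlparse(url).hostname == "schema.org":
            remote = dict(remote)  # shallow copy; do not mutate the original
            remote["document"] = _rewrite(remote["document"])
        return remote
    return loader
```

With PyLD, a wrapper of this shape could presumably be registered through `jsonld.set_document_loader`, but verify that against the library's documentation. As the comment above notes, this only affects terms resolved through the context; IRIs written out in expanded form in the data still need separate handling.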
The SDL uses separate vocabularies for the http and https varieties, and as sub-classes, domains and ranges are separate, it will flag most attempts to intermix the two. Note that the RDFa initial context relates the “schema” prefix to the http version automatically. My tools favor the “schemas” prefix for https://schema.org. Any change to defaults, as with the JSON-LD context, must be well advertised and coordinated. |
This issue is being tagged as Stale due to inactivity. |
Is there a timeline/strategy for the transition to Besides the practical aspects of how large-scale consumers of data from the wider Web could address the issue as discussed in this thread, teaching semantic web technologies with practical, working examples would require fewer footnotes if students who have just learnt about the standards for RDF term equality and the different RDF serialisations could write triples with URIs copied and pasted from the browser address bar at schema.org (or from some place in the rendered HTML), combine them with N-Quads from the json-ld.org playground (or some other JSON-LD/RDFa deployment), and have things just work together. In the meantime, maybe it would be useful to have an extension to what is shown when clicking [more...] at e.g. https://schema.org/Person, which says:
And the extension could be:
Thus making the transition more transparent. Maybe with a pointer to FAQ item 19 that could be updated with a bit more of the technical background from this thread. Edit: I just found #2886, which is not linked in this thread yet. |
https://schema.org/docs/developers.html offers http and https variants of the ontology (though in #2852 I question whether that's a great idea):
However, requesting just the JSONLD context (see #2851) doesn't make that distinction.
Both of these
return the same link:
No matter whether you access it by http or https
It returns the same file, which defines ontology terms as http: