
is having ontology URLs in http and https variants a good idea? #2852

Closed
VladimirAlexiev opened this issue Feb 26, 2021 · 8 comments


@VladimirAlexiev

I'd like to raise an objection to this practice, although I'm sure it has been discussed at length.

It makes it harder for consumers to process the data.
If I crawl (or get from WebDataCommons) 50B schema triples, and half use http ontology URLs while the other half use https, I have to either:

  • declare owl:equivalentProperty and owl:equivalentClass axioms, and enable expensive reasoning to handle them
  • employ an expensive URL rewrite process before ingesting the data
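
As a rough illustration of the second option, here is a minimal sketch of a rewrite pass. It assumes the data is serialised as N-Triples (IRIs in angle brackets), that only the schema.org namespace needs normalising, and that https is the chosen canonical form. The names `CANONICAL_NS` and `normalise_ntriples_line` are made up for this example.

```python
import re

# Assumption: https is the canonical form and only schema.org is rewritten.
CANONICAL_NS = "https://schema.org/"

# Match the opening of an IRI in the http variant of the namespace.
_HTTP_NS = re.compile(r"<http://schema\.org/")


def normalise_ntriples_line(line: str) -> str:
    """Rewrite http://schema.org/ IRIs in one N-Triples line to https.

    Limitation: a string literal that happens to contain the text
    "<http://schema.org/" would be rewritten too.
    """
    return _HTTP_NS.sub("<" + CANONICAL_NS, line)


if __name__ == "__main__":
    sample = '<http://example.org/a> <http://schema.org/name> "Alice" .'
    print(normalise_ntriples_line(sample))
```

Even a pass this simple has to touch every line of the dump, which is where the cost comes from at the 50B-triple scale.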

The schema website redirects, e.g., http://schema.org/Person to https://schema.org/Person.
Does this extra step cause significant insecurity, inconvenience or inefficiency?

@akuckartz

HTTPS should be preferred. Maybe the use of the HTTP URLs can be deprecated?

Another alternative would be to change HTTP itself - which might be worth the (huge) effort.

@RichardWallis
Contributor

With the upcoming release of version 12.0 we are at the end of a long process (see PR #2814 and nearby) of transitioning all aspects of the vocabulary site, definitions, and examples from http to https.

When Schema.org launched a decade ago, everything was http based, and since then significant amounts of http based data have been implemented and shared. In an ideal world, if Schema.org was being launched today, https only based vocabulary definitions would probably be the answer. However, providing the option of http or https definitions should be the pragmatic way forward for some time, to support those who have an established investment in http based data.

I am going to close this issue now. Once the results of this final step towards https have been released, we can see if it raises any new questions.

@VladimirAlexiev
Author

> those who have an established investment in http based data.

...cannot easily use https data being produced by "modern" schema users

@WeaverStever

I found an anomaly the other day on the SDTT. I wanted to self-document a script and provide the full path to my @type declarations.

The SDTT will not accept https versions for @type.

Valid declarations for SDTT
"@type" : "Organization"
or
"@type" : "http://schema.org/Organization"

Invalid declaration
"@type" : "https://schema.org/Organization"

@RichardWallis
Contributor

@WeaverStever what you are showing is actually correct behaviour. I agree it seems odd but let me explain.

I presume your example reads something like this:

Success:

{
  "@context": "https://schema.org",
  "@type": "http://schema.org/Organization",
  "name": "Example Org Inc"
}

Failure:

{
  "@context": "https://schema.org",
  "@type": "https://schema.org/Organization",
  "name": "Example Org Inc"
}

The "@context": "https://schema.org", line instructs the application to retrieve a context file identified from links returned from the https://schema.org address.

Inspecting the file returned to the JSON-LD processor in most applications that use standard tools, including the SDTT (text version for easy human reading), shows this:
"@vocab": "http://schema.org/",

See my previous comment for why the file is the same whether http or https is used in the @context statement.

Using "@vocab": "http://schema.org/", as a starting point, it is clear why the first example succeeds and the second fails.

The final stages of moving the vocabulary to https are part of the upcoming release of V12.0. This gives a unique opportunity to see how this process will change.
(Note: This comparison will not work once the main site moves to version V12.0)

If you change the example to pull its context file from the webschemas.org site (currently at V12.0), the behaviour is as follows:

Failure:

{
  "@context": "https://webschemas.org",
  "@type": "http://schema.org/Organization",
  "name": "Example Org Inc"
}

Success:

{
  "@context": "https://webschemas.org",
  "@type": "https://schema.org/Organization",
  "name": "Example Org Inc"
}

The reason for the reversed behaviour is the "@vocab": "https://schema.org/", line in the V12.0 context file.
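
To make the effect of the two context files concrete, here is a small sketch. It is a deliberately simplified stand-in for JSON-LD IRI expansion, not what any real processor or the SDTT runs: it assumes a value containing "://" is an absolute IRI that is left alone, that anything else is a term prefixed with @vocab, and that a type is recognised only when its expanded IRI falls inside the @vocab namespace. The function names are invented for this example.

```python
def expand_type(value: str, vocab: str) -> str:
    """Simplified stand-in for JSON-LD expansion of an @type value.

    Assumption: absolute IRIs (containing "://") pass through unchanged,
    everything else is treated as a term and prefixed with @vocab.
    """
    if "://" in value:
        return value
    return vocab + value


def in_vocabulary(value: str, vocab: str) -> bool:
    """True if the expanded @type falls inside the @vocab namespace."""
    return expand_type(value, vocab).startswith(vocab)


HTTP_VOCAB = "http://schema.org/"    # @vocab in the current context file
HTTPS_VOCAB = "https://schema.org/"  # @vocab in the V12.0 context file

if __name__ == "__main__":
    candidates = (
        "Organization",
        "http://schema.org/Organization",
        "https://schema.org/Organization",
    )
    for vocab in (HTTP_VOCAB, HTTPS_VOCAB):
        for value in candidates:
            print(vocab, value, in_vocabulary(value, vocab))
```

Under these assumptions the short form "Organization" is inside the vocabulary with either context file, while each fully qualified form only matches the context whose @vocab uses the same scheme.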

Fortunately, it is not normal practice to fully define @type values in JSON-LD in the way you describe. So for most, this change will not be apparent.

~Richard

@WeaverStever

@RichardWallis

Thanks for the clarification. The change to the context could easily be overlooked.

The reason I want a full URL to the @type is because we are collecting internal JSON objects (flat files) for music works and recordings rather than locking ourselves into a formal database (this industry is clear as mud and currently stuck in 1970 technology). Since the ontology is already pretty well defined in schema, if we want to later add a property, I'd like to stick to those defined in the schema. Thus, having the full URL within the file will be handy for whoever edits it, to quickly look up defined properties.

Apparently, the SDTT is not prepared for the new context, and I also note that a protocol-relative URL (leaving the scheme for the server to decide) is not valid either.

Fails SDTT

{
  "@context": "https://schema.org",
  "@type": "//schema.org/Organization",
  "name": "Example Org Inc"
}
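
One possible way to see why the protocol-relative form cannot match is sketched below, under the assumption (not confirmed for the SDTT) that a tool compares the @type string against a namespace prefix. A reference starting with "//" carries a host but no scheme, so it equals neither the http nor the https namespace. The `describe` helper and `NAMESPACES` tuple are invented for this example.

```python
from urllib.parse import urlsplit

# Assumption: a type is matched by plain string prefix against these.
NAMESPACES = ("http://schema.org/", "https://schema.org/")


def describe(iri: str) -> dict:
    """Report the scheme and host of an IRI and whether it matches a namespace."""
    parts = urlsplit(iri)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "matches_namespace": iri.startswith(NAMESPACES),
    }


if __name__ == "__main__":
    print(describe("//schema.org/Organization"))
    print(describe("https://schema.org/Organization"))
```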

Fails SDTT

{
  "@context": "https://webschemas.org",
  "@type": "https://schema.org/Organization",
  "name": "Example Org Inc"
}

Thanks!

@RichardWallis
Contributor

Weird, I'm sure the webschemas.org version worked when I tried it a couple of days ago.

Whatever, the principle of what I was explaining was important.

As to your localised needs, I could suggest the use of triplestores, standardised for well over a decade, and possibly considering serialisations other than JSON-LD. However, we are now drifting well beyond the scope of this issue.

@jaygray0919

While not strictly specific to this issue, legacy http: identifiers are a problem when used in a service. Browsers routinely detect http: references and flag the item. You see this, for example, whenever a reference image is served from http:. The issue also arises when endpoints and IRIs are served from http:, even when the reference is buried in JS. IMHO, we'll be better off when target content is served from https:, and we should not delay in moving existing content to an https: server.
