"Canonical URL" should provide the (HTTPS) URL that is the value of rel="canonical" #2018
Comments
@Aaranged yeah, noticed that also just this past weekend. Thanks for writing this issue up. |
There are two meanings/uses of canonical that are 'apparently' in conflict here, which is a consequence of moving the site to https, whilst the vocabulary (that the site describes) remains http based.
Several years of usage will have resulted in millions of http based term URLs being harvested into knowledge graphs and data stores. To such systems, without special interventions, the two term URLs There is therefore a major backwards compatibility issue, if/when we move the scheme for term identifying URLs from http to https. For some, it would be as big a deal if we were moving from I am not saying that we shouldn't move to https, for canonical term URLs, just that it is not as simple as you might at first assume. An initial step is already in place to ease that transfer. If you take a look at the RDF definitions of the terms (on-page RDFa, JSON-LD & RDF/XML, dumps etc.), you will find that Following the above description, what you have spotted however is a defect with the |
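To make the identity problem concrete: a consumer that compares term URLs as plain strings treats the http and https forms as two different identifiers. The following is a minimal Python sketch, assuming a consumer chooses to normalize the scheme before comparing. The helper name `normalize_term`, the `SCHEMA_HOSTS` set, and the choice of http as the stored form are illustrative assumptions, not something schema.org prescribes.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only the core host is normalized; extension hosts are left alone.
SCHEMA_HOSTS = {"schema.org"}


def normalize_term(url: str, scheme: str = "http") -> str:
    """Force the scheme of a schema.org term URL; return other URLs unchanged."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if parts.scheme in ("http", "https") and host in SCHEMA_HOSTS:
        return urlunsplit((scheme, host, parts.path, parts.query, parts.fragment))
    return url


http_book = "http://schema.org/Book"
https_book = "https://schema.org/Book"

# Plain string comparison sees two unrelated identifiers.
print(http_book == https_book)
# After normalization the two forms compare equal.
print(normalize_term(http_book) == normalize_term(https_book))
```

Which scheme is used as the stored form matters less than applying the same choice consistently on both the harvesting and the querying side.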
An admirable summary of the current situation @RichardWallis - a couple of comments on what you've documented.
While comprehensible to some developers, this will be lost on most web publishers, as the direction to "use a URL you can't actually resolve in your browser" is counter-intuitive, and likely to be ignored. And the presence or absence of this on-page "Canonical URL" value won't do anything to stem the flow of markup that employs https://schema.org that originates from site users copying and pasting a schema.org (now always-HTTPS) URL from their browser's address bar. In other words, insofar as this "Canonical URL" statement is designed to instruct publishers which protocol to use it has no value, as absent specific requirements about protocol use from specific data consumers there's no mechanism to enforce "correct" protocol encoding (e.g. the Google Structured Data Testing Tool considers the
Indeed. And while this makes sense, it sets up an even more confusing mismatch for those familiar with As I said initially, if this requires changing the statement from "Canonical URL" to something else, like "URL to use in markup", then that's IMO much better than requiring publishers to know and appreciate the difference between two definitions of "canonical URL". FWIW what's far and away the most commonly understood meaning of "canonical URL" is its manifestation in
Understood, but from a practical perspective (that is, "practical" in terms of what protocol web publishers are now using in their markup) it's a moot point. Publishers are using |
@Aaranged I agree with your views about not being able to hold back the tide of https term identifier usage much longer. A tide that will inevitably increase now that the web interface has moved to be exclusively https. My opinion, overriding my natural conservatism about changing fundamental things in widely shared vocabularies, is that we should soon pragmatically move forward to make the vocabulary https based. Whenever we do this there will be pain for some, mostly data consumers. I believe doing it sooner rather than later will reduce confusion for data producers. This I believe would mean:
Those are the somewhat easy to implement steps, meaning they will probably be down to me for coding. However, we should also consider the following:
|
Google's schema.org recommendations have switched to https. While I recognize this is not a tool specific community or repository, the change provides a further incentive to make the vocabulary switch to HTTPS. Could this be included in scope for v3.5? Happy to assist if there's an appetite for this. |
Voting for a full vocabulary switch to HTTPS.
Jeannie Hill
|
Let's not rush it. There are a lot of subtle things that can get busted e.g. mappings, sparql queries etc. I will look into getting some measures of http vs https data from the Web. |
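As a rough illustration of what such a measurement could look like, here is a small Python sketch that counts how often each scheme appears in schema.org references within a text sample. The regular expression, the function name `count_schemes`, and the sample markup are all assumptions made for illustration; a real survey would run over crawled pages rather than an inline string.

```python
import re
from collections import Counter

# Matches http://schema.org or https://schema.org, but not longer hosts
# such as schema.org.example.com (the lookahead rejects a following "." or word char).
TERM_RE = re.compile(r"\b(https?)://schema\.org(?![\w.-])", re.IGNORECASE)


def count_schemes(text: str) -> Counter:
    """Count schema.org references in text, grouped by URL scheme."""
    return Counter(m.group(1).lower() for m in TERM_RE.finditer(text))


# Hypothetical sample mixing JSON-LD contexts and a microdata itemtype.
sample = """
{"@context": "http://schema.org", "@type": "Book"}
{"@context": "https://schema.org", "@type": "Person"}
<div itemscope itemtype="https://schema.org/Book"></div>
"""

print(count_schemes(sample))
```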
When creating a new website, should we use https://schema.org or http://schema.org? |
Implemented |
A current type or property page on schema.org lists at top a "Canonical URL", such as this statement for Book.
In the code of the page we also find a formal canonical URL declaration. This is for the `https://` version of a page, as the `http://` version of a page now 301 redirects to the `https://` version.
`<link rel="canonical" href="https://schema.org/Book" />`
Accordingly the top-of-page statement should provide the same value as the `href` attribute of `<link rel="canonical">`. Without this normalization there is a persistent mismatch between the canonical URL the page describes (i.e. to humans) and the canonical URL provided to machine data consumers.
For those working in the search engine space `canonical` has a very precise and well-documented meaning. If what's being described by the on-page "Canonical URL" is something other than this URL, the wording should be changed to reflect what this value means: "canonical" shouldn't have one meaning for humans and one for data consumers in the context of the very same page.
The original intent here was probably to inform users what URL they should use for a term defined in an extension, such as abridged (which now points to the correct `href` value for `<link rel="canonical">`, here `https://schema.org/abridged` rather than `https://bib.schema.org/abridged`). However, as documented above, the "canonical URL" provided on page now conflicts with the `href` value because of the HTTP protocol in the former.