Proposal to add new property /mobileUrl as sub-property of /url #3134

Open
alex-jansen opened this issue Jul 8, 2022 · 44 comments
Labels
  • no-issue-activity: Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!).
  • Queued for Staging (webschemas.org): Editorial work provisionally complete; ready for final review/checks.

Comments

@alex-jansen
Contributor

Context - This is a proposal from Google based on our experience consuming schema.org markup and working with similar data from online merchants. If it were accepted, it would make it easier for us and others to differentiate mobile optimized URLs from standard (desktop) URLs.

Proposal - Schema.org supports a /url property for use on /Thing, used to specify the URL of an item. Online publishers often create separate mobile-optimized landing pages, and we therefore propose to add a new /mobileUrl property as a sub-property of /url for use on /Thing to allow consuming systems to better understand mobile vs non-mobile versions of web pages representing the same entity.

@danbri
Contributor

danbri commented Jul 8, 2022 via email

@alex-jansen
Contributor Author

The idea is to use /url for all cases where there is only a single url, supporting both mobile and non-mobile devices, and only use the new /mobileUrl if there is an explicit second different page optimized for mobile devices.
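A minimal sketch of how a consumer might apply this rule, assuming the proposed property ends up being spelled `mobileUrl` in JSON-LD (it is only a proposal in this thread, and the helper names are hypothetical):

```python
from typing import Any, Dict, List, Optional


def as_list(value: Any) -> List[Any]:
    """Return a JSON-LD property value as a list (scalar, list, or missing)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def pick_url(node: Dict[str, Any], mobile: bool) -> Optional[str]:
    """Prefer the proposed mobileUrl for mobile clients, else fall back to url."""
    if mobile:
        mobile_urls = as_list(node.get("mobileUrl"))  # hypothetical property
        if mobile_urls:
            return mobile_urls[0]
    urls = as_list(node.get("url"))
    return urls[0] if urls else None


product = {
    "@type": "Product",
    "url": "https://www.example.com/product2213",
    "mobileUrl": "https://m.example.com/product2213",
}
print(pick_url(product, mobile=True))
print(pick_url(product, mobile=False))
```

A page that only declares `url` is treated as serving all devices, so the fallback branch covers the common single-URL case.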

Agreed that automatic detection of super-properties might impact existing systems so I would look forward to community feedback. Especially since there will be more cases where the ability to identify a "main" vs "secondary" value of a property could be very useful, for example to identify a primary image for the /image property on /Product. This could be done through a sub-property of /image, or we could add a new standalone property that is not a sub-property (similar to primaryImageOfPage). The latter option would still have a major effect on consuming systems which will have to be updated to process the new property. It would therefore make sense to at least explore a more generic solution.

@Tiggerito

Joost recently tweeted about a similar thing, namely posterImage and featuredImage being new properties for Article:

https://twitter.com/jdevalk/status/1544614542432428032

What makes a sub-property different to a regular property?

How about using a sub-class instead? For example Url->MobileUrl, and then providing an array for url with one of the entries using MobileUrl to indicate it is specifically for mobiles. Or would consumers struggle with url becoming an array and dealing with sub-classes? I guess adding a new property is less likely to break existing implementations.

@HughP

HughP commented Jul 9, 2022

If discovery and association are the goals, then why not declare the non-mobile site in the Sitemap file?

@danbri
Contributor

danbri commented Aug 9, 2022

Thanks, everyone. Ok, lots of things going on here!

  • @HughP - this is about more than finding the URL, ... the desire is to help in cases where there's a second site with different URL(s), and to make clear that's a "mobile specific" site.
  • As time goes by, mobile is becoming the default. Or rather, responsive design is encouraging a world in which "web pages render well on a variety of devices and window or screen sizes from minimum to maximum display size to ensure usability and satisfaction".

So there's some careful wording to find there. We know that it is common for organizations to spin up a whole new site for mobile, regardless of trends and best practice. So having /mobileUrl makes some sense to identify those.

On @Tiggerito's suggestion(s) - using subclasses here is tricky. Firstly, the Schema.org datamodel is a delicate balancing act between the HTML5 Microdata syntax and the conventions from W3C RDF. Our representation of URLs as strings is somewhat unusual in RDF usage, where all nodes in the graph can carry URIs/URLs. I am wary of complicating the situation further by having types whose members are the URLs (rather than the people, places, things, sites, pages etc.) identified by those URLs. Secondly the terminology of "array" (presumably from JSON) doesn't quite work here, since our datamodel is common across JSON-LD, RDFa and Microdata, as well as other data environments like RDF-Star, Property Graphs, SPARQL/Turtle etc. When we use arrays in JSON-LD it is typically an unordered list representing repeated use of a property on some item. This stems from early confusion in JSON about key redeclaration (e.g. see here).

@jdevalk also made very reasonable enquiries about sub-properties.

To answer @Tiggerito's question: a sub-property is just a property, so it is no different to a regular property.

The point about sub-properties is just that they indicate reliable patterns of truth amongst descriptions that use them. For example, whenever something 'X' has a brother 'Y', then 'X' has a sibling 'Y'. So we can capture that by saying that the more specific property, let's call it /hasBrother has a more general "super property", "/hasSibling". The subPropertyOf relationship is just an awkward semi-backwards way of saying "super property". Both relate a specific property to a more general one that it implies (given some pair of entities etc etc).

In this way sub-properties are very similar to the subtype hierarchies which are much more prominent at Schema.org. You can infer that something which is an OpinionNewsArticle will also be a NewsArticle (or Article, or CreativeWork, or Thing). As you work up the hierarchy things get less informative, and more general.

So one problem for us with sub-properties (why we don't go crazy using them everywhere) is that in most of the concrete markup notations available to us (except RDFa), it is pretty awkward to list multiple properties connecting two specific entities. This puts publishers in the situation of deciding whether to use the more informative, specific property (which may also be newer and less widely understood) or the more boring general super-property. It also puts data consumers in the position of doing more processing to infer super-properties when they instead encounter a sub-property.
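To illustrate the extra consumer-side processing, here is a rough sketch of super-property inference. The `SUB_PROPERTY_OF` table is a hypothetical, hand-written stand-in for the subPropertyOf declarations a consumer would load from the vocabulary; `mobileUrl` and `hasBrother` are just the examples used in this thread.

```python
from typing import Any, Dict, List

# Hypothetical subPropertyOf table (sub-property -> super-property).
# Assumed to contain no cycles.
SUB_PROPERTY_OF: Dict[str, str] = {
    "mobileUrl": "url",
    "hasBrother": "hasSibling",
}


def as_list(value: Any) -> List[Any]:
    """Copy a property value into a list (scalar, list, or missing)."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def infer_super_properties(node: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Return a copy of node in which every sub-property value is also
    asserted on each of its super-properties. All values become lists."""
    result = {key: as_list(value) for key, value in node.items()}
    for prop in list(result):
        current = prop
        while current in SUB_PROPERTY_OF:
            parent = SUB_PROPERTY_OF[current]
            bucket = result.setdefault(parent, [])
            for value in result[prop]:
                if value not in bucket:
                    bucket.append(value)
            current = parent
    return result


node = {
    "url": "https://www.example.com/product2213",
    "mobileUrl": "https://m.example.com/product2213",
}
print(infer_super_properties(node)["url"])
```

Code that only knows about `url` would then still see the mobile address, but without any hint that it is mobile-specific.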

To conclude: I think we should add this to Pending, but with some design constraints:

  • we don't want to create a chaos where the billions of pages using /url get needlessly revised to use /mobileUrl
  • we say /mobileUrl is for specific situations in which a data consumer wants to know that additional URLs are for separate mobile sites.
  • We expect /url to generally point to sites that mobile users will use and that will gradually get better for mobile.

@danbri danbri self-assigned this Aug 9, 2022
@danbri danbri added the Queued for Staging (webschemas.org) Editorial work provisionally complete; ready for final review/checks. label Aug 9, 2022
@danbri
Contributor

danbri commented Aug 9, 2022

@alex-jansen does this meet your usecase? Could you draft a line for the release notes?

Queued for staging

@alex-jansen
Contributor Author

Thanks @danbri this will work. Agreed with the complexities surrounding sub-properties.

@jonoalderson

This feels like a slippery slope to defining infinite numbers of URL types based on context, which seems like an oddly arbitrary direction for the standard to travel in.

Should we support DesktopUrl? What about printUrl? Or bookmarkURL? Or RssFeedUrl? Or OnMySmartToasterUrl? How many URLs can/should/might a thing have?

Not keen on adding arbitrary appendages to the standard just because folks at Google want a simplistic attribute to sniff for, when a little exploration might lead us to a more graceful solution.

If we're trying to describe that a certain type of thing is optimally consumed by a certain media, that's nothing to do with the URL - it's a descriptor of a consumption method, which may have a URL... surely?

And if the objective here is to map the relationship between 'desktop' and 'mobile' URLs (for publishers who can be bothered to mark that up correctly), we already have a standard; rel=alternate. I've definitely seen discussions elsewhere here for providing sets of 'alternate' URLs (e.g., in the context of AMP pages); maybe expanding that with descriptors of their purpose is a cleaner approach? We could even define media queries, etc.

@jdevalk
Contributor

jdevalk commented Aug 9, 2022

FWIW, I fully agree with @jonoalderson that this is not a good idea. We already have standards for this...

I feel we should just tell people that need this, that their website is broken. Not clutter schema.org by adding properties for things that shouldn’t exist.

@danbri
Contributor

danbri commented Aug 9, 2022 via email

@jonoalderson

jonoalderson commented Aug 9, 2022

Yeah, "mobile sites" as a subclass of WebSite would definitely be worse 😆
A website has WebPages. Some of those are designed for mobile devices. Some of those have parity with consumer-agnostic/desktop WebPages. I still don't really see why this couldn't be achieved by providing an array of URLs for a WebPage, and/or describing the role of those URLs.

I also think that constraining this only to product/offer - purely because that's the bit that the Google Merchant Center folks care about - feels like an even slipperier slope than conflating URLs. Products and offers aren't special types of things which have or need different types of URLs from the rest of the web.

If we want to solve this, I think we should take the time to do so generally, rather than in a way which creates a Google-centric frankenspec. Appreciate you trying to defuse it, but I'm dubious as to how much impact that'll have on a Pavlovian web.

@danbri
Contributor

danbri commented Aug 9, 2022

Twitter link to surface this to non github folk - https://twitter.com/danbri/status/1557101340094078976?s=21&t=2YfSoX-qmWW7oOE-4pnu7w

Arrays don’t really exist in schema.org graph, any more than

or ‘itemprop’. How would an application trying to be useful to mobile users select from the different things in the array (or from the repeated property the array encodes).

@jonoalderson

http://pending.webschemas.org/LinkRole feels close to what we'd need?

@pduchesne

I second the remark of @jonoalderson: this opens the door to many other sub-properties that will clutter the schema; where's the limit then? Besides, there are other mechanisms in the web stack to cope with agent-dependent representations, e.g. HTTP redirects based on the User-Agent, responsive CSS, or "rel=alternate" links.

@danbri
Contributor

danbri commented Aug 9, 2022 via email

@jonoalderson

Yeah, Roles are tricky, and adoption is low. 👍

That said... If we're in a position where we're responding to demand from Google to describe URL roles, I can't help but think that there's some responsibility for 'them' to seek to do that responsibly (i.e., using scalable syntax). I'm sure that if Google's documentation described and provided examples of how to mark up roles, then there'd be reasonable (or no less than otherwise) adoption?

Did roles fail, simply because there was no demand? And now, what is this, if not demand?

@danbri
Contributor

danbri commented Aug 9, 2022 via email

@jdevalk
Contributor

jdevalk commented Aug 9, 2022

What if we created a type alternateURL, which in itself would have the properties URL, a property for MIME type and something to classify it, either rel or role or URLType or whatever, then we could allow that on Thing, and it could point to mobile URLs, RSS feeds, JSON representations of pages, etc.

This could also be useful for images, to point to other MIME types, and potentially more applications. That’d be a lot cleaner and more useful in the long run.

@gkellogg
Contributor

@danbri said:

:something :url “https//…/x1234” {| :mobileyNess :VeryMobileFriendly |} .

(In Turtle-star proposed notation the inner annotation is also asserted)

With this kind of thing potentially available soon, whatever we do now feels like it will inevitably be a pragmatic stepping stone.

For some value of soon. Presuming that the proposed RDF-star WG is chartered (voting should be done), then it's ~2 years before a REC is out. Not a specific work item, but there is a JSON-LD-star draft that would likely also come out, but so far no work on RDFa-star much less Microdata-star.

Of course, this does enable some useful use-cases such as this, and what was attempted with the Role class, but some more complicated use cases may involve the introduction of more blank nodes, as discussed in Triples and Occurrences, EXAMPLE 8:

_:a :occurrenceOf << :s :p :o >> ;
    :in <file1.ttl> ;
    dct:creator :alice.
_:b :occurrenceOf << :s :p :o >> ;
    :in <file2.ttl> ;
    dct:creator :bob.

@jvandriel

jvandriel commented Aug 10, 2022

I have to say that for a big part I agree with what @jonoalderson and @jdevalk have said. Though to be honest, my initial thoughts when I read this were: "Oh no, not another crucial thing site owners/devs/consultants can and most likely will screw up. I'm going to advise my clients not to use this just to prevent any future issues".

<link rel="alternate" media> by itself often already creates a ton of implementation issues, especially when combined with <link rel="alternate" hreflang> - which often goes hand in hand with either/both <link rel="canonical"> and <meta name="robots"> issues. So I for one am not a big fan of extending this into any structured data markup - which is really hard to spot and compare against what's being stated in the html itself (unless you make use of commercial crawlers, and that's only for encountering issues, not for resolving them).

I fear adding this could lead to many more issues than it's worth.

@HughP

HughP commented Aug 11, 2022

@danbri

the desire is to help in cases where there's a second site with different URL(s), and to make clear that's a "mobile specific" site.

Please!! In the documentation, clearly define "separate site". E.g., not just a responsive page: must its address be a separate domain or subdomain, or can one have two sites on the same domain? Multi-site WordPress allows different sites in a folder-like hierarchy. My reading of Google's indexing documentation for SEO is that they treat all subdomains as new domains, but "all sites" may not be on different domains. Some clarity here needs to exist in the documentation if this contentious proposal is moved forward.

@danbri
Contributor

danbri commented Aug 16, 2022

I have started a design document with several options for addressing this use case, based on the discussions so far.

It is long and a rough draft, and I haven't checked it over for mistakes and repetition but I figure best to share in initial form here:

Indicating mobile version of a site's URL

I think the points made here about the relationship with HTML level metadata, especially from typed links, are important. However I do believe it is worthwhile being able to map such information into Schema.org, just as we can do with iCal files and other dedicated domain-specific formats.

At this point (since the draft is so rough) I would discourage detailed critiques, in case I've mis-stated something or just otherwise goofed up. That said, do please take a look! Schema.org has a lot of hidden baggage that constrains our options, and one lesson from this is that they ought to be represented explicitly somewhere.

@danbri
Contributor

danbri commented Aug 16, 2022

FWIW "design 2" in the draft is I think pretty close to what @jdevalk is asking for here:

What if we created a type alternateURL, which in itself would have the properties URL, a property for MIME type and something to classify it, either rel or role or URLType or whatever, then we could allow that on Thing, and it could point to mobile URLs, RSS feeds, JSON representations of pages, etc.

However I couldn't make out if alternateURL was a type or a property here. With LinkRole the "alternate" aspect is covered by the linkRelationship, rather than baked-in vocabulary, hopefully leaving room for other cases to be handled without additional schema changes. The doc has an example of this, sketching a rel="canonical" from the mobile site back to the main one:

"relatedLinkSpec": [
    {
      "@type": "LinkRole",
      "fromLink": "https://www.example.com/product2213",
      "toLink": "https://m.example.com/product2213",
      "linkRelationship": "alternate",
      "media": "handheld"
    },
    {
      "@type": "LinkRole",
      "fromLink": "https://m.example.com/product2213",
      "toLink": "https://www.example.com/product2213",
      "linkRelationship": "canonical"
    }
  ]

See also IANA Link Relation Types registry and this Google blog post it references.
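As a rough illustration of how a consumer could query records shaped like the relatedLinkSpec sketch above, the following assumes the draft `LinkRole` fields (`fromLink`, `toLink`, `linkRelationship`, `media`) exactly as written in that example; none of these are released Schema.org terms, and `find_link` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Records copied from the draft relatedLinkSpec example (draft vocabulary).
LINKS: List[Dict[str, str]] = [
    {
        "@type": "LinkRole",
        "fromLink": "https://www.example.com/product2213",
        "toLink": "https://m.example.com/product2213",
        "linkRelationship": "alternate",
        "media": "handheld",
    },
    {
        "@type": "LinkRole",
        "fromLink": "https://m.example.com/product2213",
        "toLink": "https://www.example.com/product2213",
        "linkRelationship": "canonical",
    },
]


def find_link(
    links: List[Dict[str, str]],
    from_url: str,
    relationship: str,
    media: Optional[str] = None,
) -> Optional[str]:
    """Return the first toLink matching the source URL, relationship and,
    if given, the media value; None when nothing matches."""
    for link in links:
        if link.get("fromLink") != from_url:
            continue
        if link.get("linkRelationship") != relationship:
            continue
        if media is not None and link.get("media") != media:
            continue
        return link.get("toLink")
    return None


print(find_link(LINKS, "https://www.example.com/product2213", "alternate", media="handheld"))
print(find_link(LINKS, "https://m.example.com/product2213", "canonical"))
```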

@jdevalk
Contributor

jdevalk commented Aug 16, 2022

However I couldn't make out if alternateURL was a type or a property here.

I was thinking of a property at the time I think.

Reading your doc - and mind you I do appreciate all the explanation in it, it's super helpful, so thanks for that!! - I'm afraid that this is all overly complex. To be honest, whichever option we choose will only cause more people to "do it wrong". Given how hard people are finding things like hreflang, I think we really should be asking ourselves whether adding this level of complexity is worth it in any way. Let's face it: currently, an Offer can have one url. That url could just use the HTML rel="alternate" methodology to point to its handheld variant and then we'd have the same "result" for most parsers with far far less difficulty.
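For comparison, a small sketch of what reading the HTML-level signal might look like on the consumer side, using only Python's standard `html.parser`. The sample page and the `find_alternates` helper are made up for illustration; a real crawler would need to handle far messier markup.

```python
from html.parser import HTMLParser
from typing import Dict, List


class AlternateLinkParser(HTMLParser):
    """Collect the attributes of every <link rel="alternate" href=...>."""

    def __init__(self) -> None:
        super().__init__()
        self.alternates: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rel_tokens = attributes.get("rel", "").lower().split()
        if "alternate" in rel_tokens and "href" in attributes:
            self.alternates.append(attributes)


def find_alternates(html: str) -> List[Dict[str, str]]:
    """Parse an HTML string and return its rel=alternate link attributes."""
    parser = AlternateLinkParser()
    parser.feed(html)
    parser.close()
    return parser.alternates


# Illustrative page for the Offer's single url.
PAGE = """<html><head>
<link rel="canonical" href="https://www.example.com/product2213">
<link rel="alternate" media="handheld" href="https://m.example.com/product2213">
</head><body></body></html>"""

for link in find_alternates(PAGE):
    print(link.get("media"), link["href"])
```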

@jdevalk
Contributor

jdevalk commented Aug 16, 2022

@alex-jansen I keep coming back to the fact that having a version of mobileUrl means we'll need a metaverseUrl at some point too, or another device type / context / thing. Maybe... we should just change what people should do here. If url has multiple versions for different device types, maybe we should just ask of implementors that the URL given in url will do a redirect to the best page for the current device type?
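A deliberately crude sketch of that redirect idea, assuming the site knows the mobile counterpart of a page. The substring list is illustrative only (real User-Agent detection is considerably more involved), and both function names are hypothetical.

```python
from typing import Optional

# Illustrative hints only; not a reliable way to detect mobile devices.
MOBILE_HINTS = ("mobi", "android", "iphone")


def looks_mobile(user_agent: str) -> bool:
    """Guess whether a User-Agent string belongs to a mobile device."""
    ua = user_agent.lower()
    return any(hint in ua for hint in MOBILE_HINTS)


def redirect_target(user_agent: str, mobile_url: Optional[str]) -> Optional[str]:
    """Return the URL to redirect to, or None to serve the requested page."""
    if mobile_url and looks_mobile(user_agent):
        return mobile_url
    return None


phone = "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) Mobile/15E148"
desktop = "Mozilla/5.0 (X11; Linux x86_64)"
print(redirect_target(phone, "https://m.example.com/product2213"))
print(redirect_target(desktop, "https://m.example.com/product2213"))
```

With this approach the markup only ever needs the single `url`, at the cost of consumers not being able to see the mobile address without making a request as a mobile client.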

@jvandriel

jvandriel commented Aug 18, 2022

I still feel adding web page alternate/media/hreflang meta data to schema.org is a route we shouldn't take (although I do understand the appeal of it).

Not only do tons of sites make horrific mistakes with the alternate/media/hreflang/robots/canonical info they publish but on top of that for many it will be close to impossible to get that information into their markup. Reason being that this type of information often isn't part of the info a CMS stores. More often than not information published in the <head> of a website is handled via different processes than the rest of the template system (or worse, it's in (semi) static template files) and due to this parts of/all alternate/media/hreflang info can be near impossible to get to, meaning that copying such info into any markup will be a technical no go, or end up being so costly to realize that it isn't financially viable.

Having said that, if we leave out the alternate/media/hreflang part of things I do feel that with a slight tweak we might be able to get what @alex-jansen is looking for while also accommodating @jdevalk's wish to be able to point to a json-ld representation of a page.

What if instead of adding mobileUrl we add a slightly more generic alternateUrl property* (like @jdevalk suggested) though without any of the additional fluff and with a description that mentions its value can be used to refer to an alternative representation of the resource as (for example) a mobile version on a separate (sub)domain or a json-ld representation of a web page (or in the future even a possible metaverse url)?

Given all the possible scenarios @danbri described in his document (thanks for that by the way, it really helped) I feel that this solution has the least downsides (mainly at the consumer side) while also keeping it simple enough that it actually might get adopted at a reasonable scale, without tons of sites making a mess of things.

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "@id": "#some_offer_1234",
  "url":"https://www.example.com/product2213",
  "alternateUrl": "https://m.example.com/product2213"  
}
</script>
* Preferably we add this to WebPage (or, alternatively, to Thing), as adding it to Product and/or Offer feels very off to me. The reason is that so far we've been speaking about alternative representations of web pages and not so much the entities they contain, so it doesn't make much sense to me to specify this information at the Product or Offer level, unless the domain of the property is Thing.

@jvandriel

jvandriel commented Aug 18, 2022

Oh, and maybe alternateUrl shouldn't be a sub-property of url because if I understood @danbri's document correctly this could cause an issue in RDF?

@danbri
Contributor

danbri commented Aug 18, 2022

Thanks everyone! Just jumping in on this last suggestion for now:

If all we write is

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "@id": "#some_offer_1234",
  "url":"https://www.example.com/product2213",
  "alternateUrl": "https://m.example.com/product2213"  
}

This does not tell us anything different about the second URL, and code written in the past wouldn't know to look for it.

JSON-LD represents repetition of a property with JSON arrays, so we can just repeat several "url" values:

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "@id": "#some_offer_1234",
  "url":  ["https://www.example.com/product2213", "https://m.example.com/product2213"]
}

This would make both URLs visible to code based on Schema.org 2011-2021. It would not tell us anything at all about what the difference between them amounted to, though.
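A small sketch of what "repetition of a property" means for consuming code: a single value and an array are read the same way, and array order carries no meaning. It assumes plain string values (nested objects would not be hashable), and `property_values` is a hypothetical helper.

```python
from typing import Any, Dict, FrozenSet


def property_values(node: Dict[str, Any], prop: str) -> FrozenSet[str]:
    """Return the values of a property as an unordered set of strings."""
    value = node.get(prop)
    if value is None:
        return frozenset()
    if isinstance(value, list):
        return frozenset(value)
    return frozenset([value])


single = {"url": "https://www.example.com/product2213"}
repeated = {
    "url": [
        "https://www.example.com/product2213",
        "https://m.example.com/product2213",
    ]
}
reordered = {
    "url": [
        "https://m.example.com/product2213",
        "https://www.example.com/product2213",
    ]
}

# Same set of values regardless of the order in the JSON array.
print(property_values(repeated, "url") == property_values(reordered, "url"))
```

Nothing in the resulting set distinguishes the mobile address from the main one, which is the limitation described above.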

I do very much hear you (all), and the concern that whatever we do doesn't add an additional layer of complexity and duplication to what we have in HTML. However I do continue to believe that there is value in having a way to write down in Schema.org (RDF, JSON-LD etc.) the information communicated by a collection of HTML pages. Schema.org does not itself say much about "what to say where", e.g. should an e-commerce site with 5 million pages declare their phone number or logo on each of those pages?

Any thoughts on RFC 9264?

It seems to pull together a kind of data model for the information scattered across pages in rel=xyz syntax. We are also pretty close here to the notion of Schema.org feeds/sitemaps/dumps, in which a single larger Schema.org file might summarize the contents of many different web pages.

@jvandriel

jvandriel commented Aug 18, 2022

Before diving into the RFC documentation you shared @danbri, I have some questions I think are of relevance in regards to the feeds/sitemaps/dumps...

Why the desire for something new to accumulate repetitive info in as opposed to 'simply' digesting cross-page referenced entities?

To give a specific example (we all can easily come up with other ones as well):
When a product page contains multiple repetitive elements I tend to publish the minimum information required to satisfy the targeted consumer/feature and for the rest I simply refer to the entity I want to elaborate on via @id - wherever that entity lives (within the same site). Meaning elaborate information about:

  1. An organization will either be on its home page or on a dedicated about page.
  2. A brand can be found on a dedicated page about that brand
  3. Return policies - yet another dedicated page
  4. etc.

Now the only reason I tend to have websites publish the same info over and over again is because search engines have very specific requirements (as well as a lot of recommendations) to be eligible for their search result features and without including such repetitive info on each Product page there will be tons of errors/warnings and possibly even a loss of eligibility for the desired search result features.

As a publisher I'd be an extremely happy camper if I'd never have to write any repetitive info ever again if instead I could simply refer to any entity, no matter where it 'lives' on a site, e.g.

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "@id": "#mainEntityOfPage",
  "name": "Some Product 1234",
  "url": "https://www.example.com/products/product2213",
  "brand": {"@id": "https://example.com/brands/some_brand_1234#mainEntityOfPage"},
  "offers":
  {
    "@type":"Offer",
    "seller": {"@id": "https://example.com/#mainEntityOfPage"},
    "hasMerchantReturnPolicy": {"@id": "https://example.com/return-policy#mainEntityOfPage"}
  }
}
</script>

(Each reference could alternatively be given as a plain URL string, e.g. "brand": "https://example.com/brands/some_brand_1234", "seller": "https://example.com/" or "hasMerchantReturnPolicy": "https://example.com/return-policy".)

What I've been wondering ever since the first mention of a sort of 'feed' is why the above method wouldn't be enough (aside from search engines having to be willing to adopt cross-page referencing). Why the need for an even more aggregated form?
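To make the cross-page referencing concrete, here is a minimal sketch of how a consumer that has already crawled and parsed several pages might resolve such @id references. The `PAGES` data, the brand name and both helper functions are hypothetical; relative identifiers such as "#mainEntityOfPage" are resolved against the URL of the page they appear on.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urljoin

# Hypothetical crawl result: page URL -> JSON-LD nodes found on that page.
PAGES: Dict[str, List[Dict[str, Any]]] = {
    "https://example.com/brands/some_brand_1234": [
        {"@type": "Brand", "@id": "#mainEntityOfPage", "name": "Some Brand"},
    ],
    "https://example.com/products/product2213": [
        {
            "@type": "Product",
            "@id": "#mainEntityOfPage",
            "name": "Some Product 1234",
            "brand": {"@id": "https://example.com/brands/some_brand_1234#mainEntityOfPage"},
        },
    ],
}


def build_index(pages: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Index every node by its absolute @id."""
    index: Dict[str, Dict[str, Any]] = {}
    for page_url, nodes in pages.items():
        for node in nodes:
            node_id = node.get("@id")
            if node_id:
                index[urljoin(page_url, node_id)] = node
    return index


def resolve(ref: Dict[str, Any], page_url: str,
            index: Dict[str, Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Look up the node a {"@id": ...} reference points to, if it was crawled."""
    ref_id = ref.get("@id")
    if not ref_id:
        return None
    return index.get(urljoin(page_url, ref_id))


index = build_index(PAGES)
product_page = "https://example.com/products/product2213"
product = index[product_page + "#mainEntityOfPage"]
brand = resolve(product["brand"], product_page, index)
print(brand["name"] if brand else "brand page not crawled")
```

The catch is visible in the last line: the reference only resolves if the consumer has fetched the page where the referenced entity lives.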

Personal note:
As a long time publisher I'm pretty fond of being able to decide where my content lives - be it in html or any form of structured data, especially because in practice it's quite nice to have the structured data about an entity live at the same location where that entity is being described in html. Separating some of these due to being part of possible repetitive info would be uncomfortable to say the least (for now I'll skip on diving into some of the technical obstacles I see coming up the road until there's some sort of a draft).

@jvandriel

jvandriel commented Aug 19, 2022

"This does not tell us anything different about the second URL"

I don't fully agree with that statement as IMHO url tells us what can be considered the primary url whereas alternateUrl mentions any alternatives. It just doesn't tell us what the purpose of those alternates is, and it's this part I was hoping consumers could figure out themselves (content negotiation?) without burdening publishers even more by forcing them to be more specific.

Now if that's too much to ask of consumers (which I can imagine) I'm not sure how to move on from here as I feel it's too much to ask of publishers as well due to the financial implications this will have (of course I'm not talking about large corporations here but more SME sized organizations), as well as the number of technical issues this will most likely generate.

"and code written in the past wouldn't know to look for it."

Is this really that much of an issue? (sincere question, no sarcasm intended) The web evolves, technologies change, organizations adapt and move on to the next new thing/issue. IMO a new property isn't something so crucial consumers wouldn't be able to adapt without too much trouble - or at least not compared to the effort the rest of the web has to put in to successfully integrate alternate/media/hreflang meta data into their markup.

Again, I do understand the appeal of having such info in markup - heck, in the future I might even find a purpose for it in my own projects as well - though I remain sceptical about creating vocabulary for it if the intent is to have this be adopted at scale, given the number of websites that to this day still have issues implementing alternate/media/hreflang/canonical/robots meta data in the <head> of their web pages.

@jonoalderson

jonoalderson commented Aug 19, 2022

To add some context to the technical/logistical/resourcing difficulties around that, even in Yoast SEO (in WooCommerce and Shopify) we'd struggle to help users to describe the differences between (and varying roles of) their alternative URLs.

There are huge UI / definition challenges around this, and absolutely zero standardization in how such variations are handled at present. WordPress (rightly?) doesn't even have a concept of different URLs for different intended device consumption.

That'll realistically mean no correct/meaningful adoption across the whole of WordPress and Shopify.

Sure, other platforms exist. But if this approach is hoping for meaningful adoption, a significant portion of the web isn't designed to be able to handle it. As Jarno points out, hreflang is 'hard' enough, and this is an even more complex version of the same type of problem.

I think signposting the existence of an alternate URL is a sensible middle-ground; and let the respective WebPage describe what it is/contains/represents (and if/how it's intended to be device-specific).

@Tiggerito

Just playing with combining some ideas.

How about an alternateUrl property that accepts URL and LinkRole.

And some changes to LinkRole:

  • 'toLink' re-uses the 'url' parameter.
  • 'fromLink' becomes 'source' which can be a URL or a Thing (which should contain a url property)
  • if LinkRole is contained in a property its 'source' is assumed to be the entity it is contained in.

An example including the different ways links can be added to alternateUrl:

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Thing",
  "@id": "#this",
  "url": "https://www.example.com/product2213",
  "alternateUrl": [
    "https://other.example.com/product2213",
    {
      "@type": "LinkRole",
      "url": "https://m.example.com/product2213",
      "linkRelationship": "alternate",
      "media": "handheld"
    }
  ]
}
</script>

And where LinkRole could be used on its own:

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@graph": [
    {
      "@type": "LinkRole",
      "source": "https://m.example.com/product2213",
      "url": "https://www.example.com/product2213",
      "linkRelationship": "canonical"
    },
    {
      "@type": "LinkRole",
      "source": {
        "@id": "#this"
      },
      "url": "https://www.example.com/product2214",
      "linkRelationship": "next"
    }
  ]
}
</script>
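A sketch of how a consumer might normalize the mixed `alternateUrl` values proposed above into uniform records. Everything here is an assumption layered on the proposal: `alternateUrl`, `source` and the reshaped `LinkRole` are not released Schema.org terms, the containing entity is approximated by its `url` (falling back to `@id`), and treating a bare string as an "alternate" link is my own reading.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Return a JSON-LD property value as a list (scalar, list, or missing)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def normalize_alternates(node: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn every alternateUrl entry into a LinkRole-style dict with a source."""
    implied_source = node.get("url") or node.get("@id")
    links: List[Dict[str, Any]] = []
    for entry in as_list(node.get("alternateUrl")):
        if isinstance(entry, str):
            # Bare URL: assumed here to mean an unspecified alternate.
            links.append({
                "@type": "LinkRole",
                "source": implied_source,
                "url": entry,
                "linkRelationship": "alternate",
            })
        else:
            link = dict(entry)
            # Nested LinkRole: its source defaults to the containing entity.
            link.setdefault("source", implied_source)
            links.append(link)
    return links


thing = {
    "@type": "Thing",
    "@id": "#this",
    "url": "https://www.example.com/product2213",
    "alternateUrl": [
        "https://other.example.com/product2213",
        {
            "@type": "LinkRole",
            "url": "https://m.example.com/product2213",
            "linkRelationship": "alternate",
            "media": "handheld",
        },
    ],
}

for link in normalize_alternates(thing):
    print(link["source"], "->", link["url"], link.get("media", ""))
```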

@alex-jansen
Contributor Author

alex-jansen commented Aug 24, 2022

This is a great discussion on an obviously complex topic, also illustrated by previous more or less failed attempts.

It might be worthwhile to note that at Google we do see a substantial percentage of eCommerce sites with separate mobile-optimized versions, which is why we provide the ability for merchants to supply separate mobile URLs in our Product data specification. Other players such as Meta and Microsoft support the same attribute as well. I hope this at least illustrates that supporting alternate URLs would be a valuable addition to Schema.org. Another option would be using AI or other mechanisms to infer which sites are mobile versions of other sites, but allowing the explicit identification of related URLs in the structured data markup itself seems more predictable and gives more control to site owners.

Taking @jonoalderson's comment, this discussion indeed goes beyond just mobile-optimized pages. At Google we for example also support store-specific offer URLs for merchants with physical stores.

Taking @jdevalk's proposal, then, adding a new additionalUrl property with the ability to optionally specify its role, for those sites that can support it, seems eminently reasonable to me and worth iterating on.

@jdevalk
Contributor

jdevalk commented Aug 24, 2022

Thanks @alex-jansen!

I think the challenge we have with an additionalUrl property is to make it meaningfully useful. I see @Tiggerito's idea above and I'm honestly intrigued, though his examples are not the things I'm thinking of most. prev and next have relatively few actual use cases in my eyes. hreflang on the other hand would be a biggy.

{
  "@context": "https://schema.org/",
  "@type": "Thing",
  "@id": "#this",
  "url": "https://www.example.com/product2213",
  "alternateUrl": [
    "https://other.example.com/product2213",
    {
      "@type": "LinkRole",
      "url": "https://m.example.com/product2213",
      "linkRelationship": "alternate",
      "media": "handheld"
    },
    {
      "@type": "LinkRole",
      "url": "https://www.example.com/es/product2213",
      "linkRelationship": "alternate",
      "inLanguage": "es"
    },
    {
      "@type": "LinkRole",
      "url": "https://www.example.com/fr/product2213",
      "linkRelationship": "alternate",
      "inLanguage": "fr"
    }
  ]
}

Even better, readability-wise, would be to add two types of link relations, alternate-language and alternate-device:

{
  "@context": "https://schema.org/",
  "@type": "Thing",
  "@id": "#this",
  "url": "https://www.example.com/product2213",
  "alternateUrl": [
    "https://other.example.com/product2213",
    {
      "@type": "LinkRole",
      "url": "https://m.example.com/product2213",
      "linkRelationship": "alternate-device",
      "media": "handheld"
    },
    {
      "@type": "LinkRole",
      "url": "https://www.example.com/es/product2213",
      "linkRelationship": "alternate-language",
      "inLanguage": "es"
    },
    {
      "@type": "LinkRole",
      "url": "https://www.example.com/fr/product2213",
      "linkRelationship": "alternate-language",
      "inLanguage": "fr"
    }
  ]
}
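
A minimal sketch of how a consuming system might read this shape, assuming the markup above were adopted as written. `alternateUrl`, the `alternate-language` and `alternate-device` values, and this use of `LinkRole` are proposals from this thread, not published Schema.org vocabulary, and `pick_url` is a hypothetical helper name.

```python
from typing import Optional

# Hypothetical markup following the proposal in this thread. "alternateUrl"
# and the "alternate-*" relationship values are NOT published Schema.org terms.
THING = {
    "@context": "https://schema.org/",
    "@type": "Thing",
    "@id": "#this",
    "url": "https://www.example.com/product2213",
    "alternateUrl": [
        "https://other.example.com/product2213",
        {
            "@type": "LinkRole",
            "url": "https://m.example.com/product2213",
            "linkRelationship": "alternate-device",
            "media": "handheld",
        },
        {
            "@type": "LinkRole",
            "url": "https://www.example.com/es/product2213",
            "linkRelationship": "alternate-language",
            "inLanguage": "es",
        },
    ],
}


def pick_url(thing: dict, language: Optional[str] = None,
             media: Optional[str] = None) -> str:
    """Return the first alternate URL matching the request, else the main url."""
    alternates = thing.get("alternateUrl", [])
    if not isinstance(alternates, list):
        alternates = [alternates]
    for entry in alternates:
        if not isinstance(entry, dict):
            continue  # a bare string carries no role, so it cannot be matched
        rel = entry.get("linkRelationship")
        if (language and rel == "alternate-language"
                and entry.get("inLanguage") == language):
            return entry["url"]
        if (media and rel == "alternate-device"
                and entry.get("media") == media):
            return entry["url"]
    # No matching alternate: fall back to the plain /url value.
    return thing["url"]
```

Having distinct relationship values lets the consumer branch on `linkRelationship` alone; with a single `alternate` value it would have to infer the role from whichever of `media` or `inLanguage` happens to be present.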

@jvandriel

jvandriel commented Aug 24, 2022

"At Google we for example also support store-specific offer URLs for merchants with physical stores."

I've seen/dealt with this on ecommerce websites in the past (same goes for the mobile urls as well as urls created for different merchant platforms), though on many occasions this involves a custom solution created either by the organization itself or by a third party (sometimes even including external management tools and hosting). Many of these solutions are 'hacked' into an already existing platform and therefore more or less operate outside the regular processes of how a website is generated (marketing tends to disrupt internal development processes), which admittedly doesn't have to be an issue for single language/country sites at all. The real trouble starts when there's internationalization involved and one has to start expressing these 'exceptions' via alternate/media/hreflang in the <head>, as the rest of a site's processes often can't get all the data they need to do so successfully (especially fun when it involves 3rd party solutions).

And it's exactly with these types of exceptions that issues with expressing link roles happen a lot. Not so much in regard to the urls provided via any product XML feed (as these are custom), but in getting them to 'fit' into what's being expressed in the <head> of the main site. Because of this I'm hesitant about the whole idea of expressing link roles in markup as well.

Again, I rarely see this cause any issues for any product feeds nor for the 'exception' pages themselves but it mostly happens everywhere around it (not necessarily an issue for the merchant/advertisement platform though it can be quite detrimental for your organic search results).

@jvandriel

jvandriel commented Aug 24, 2022

Maybe it's a naive question @alex-jansen but is there any chance you have any insights into what % screws up their alternate implementations?

Reason I ask is because I'm aware my opinion is likely biased by my profession as I've mainly worked on websites that make mistakes (I'm not called upon if all goes well) and so it could well be that % isn't as big as my experiences make me believe.

@danbri
Contributor

danbri commented Sep 20, 2022

Thanks everyone for all the discussion here, and for your insights and perspective.

Here's what I suggest we do:

  1. For the next upcoming release of Schema.org we do not address this issue - it clearly needs more work.
  2. Anything we do towards adding "this is the mobile-specific version" expressivity should be within a larger context centred on giving a round-trippable Schema.org representation of all IANA-typed and "rel=foo" relationship types.
  3. Note that LinkRole was substantially in this direction, but is tangled up in its own larger project (creating nodes that represent graph edges) which also needs some attention, modernisation (and possibly retiring/deprecating).
  4. Note the concern expressed in this issue that sites already publish typed link information in structures outside of their JSON-LD, RDFa and Microdata Schema.org structured data, so there is risk of confusion, forking etc from having equivalent expressivity in Schema.org.
    • This is a fair concern, but if we allow it to dominate over other considerations, Schema.org would not exist. At launch, Schema.org had schemas that covered areas already modeled elsewhere (calendars, people descriptions, image formats, e-commerce, bibliographic metadata etc etc.). At its best, Schema.org provides some unification and integration across disparate domains, and shields web publishers from having to navigate dozens of independent small schemas. To do this well requires collaboration, transformations and documentation of equivalencies (mappings). We can aspire to do the same with link types, and should expect to be judged on whether our work empowers convertors that can generate Link: headers or HTML from Schema.org, and vice-versa.
  5. Specifically, adding expressivity to Schema.org does NOT mean that all schema.org publishers and consumers have to understand that expressivity everywhere. While we DO expect that vocabulary should be used, it is for consuming applications and the wider ecosystem to decide where it is sensible to use the markup.
    • A good example here is the vocabulary around https://schema.org/Reservation which was added primarily for use in email. It gave us some vocabulary under https://schema.org/Trip which can be used for email extraction, but is also applicable to applications working with public travel/tourism data. We should be able to improve Schema.org's expressivity without it being understood as necessarily creating work for all publishers, or being used in specific publishing situations.
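
The kind of convertor mentioned under point 4 could be sketched roughly as follows, assuming link roles are expressed with the `LinkRole` shape used earlier in this thread. The property-to-attribute mapping (`linkRelationship` to `rel`, `inLanguage` to `hreflang`) and both function names are assumptions for illustration, not an agreed design.

```python
from html import escape


def link_role_to_html(role: dict) -> str:
    """Render one LinkRole-style node as an HTML <link> element."""
    attrs = [("rel", role["linkRelationship"]), ("href", role["url"])]
    if "media" in role:
        attrs.append(("media", role["media"]))
    if "inLanguage" in role:
        # Assumed mapping: inLanguage corresponds to the hreflang attribute.
        attrs.append(("hreflang", role["inLanguage"]))
    rendered = " ".join(
        '{}="{}"'.format(name, escape(value, quote=True))
        for name, value in attrs
    )
    return "<link {}>".format(rendered)


def link_role_to_header(role: dict) -> str:
    """Render the same node as the value of an HTTP Link: header."""
    params = ['rel="{}"'.format(role["linkRelationship"])]
    if "media" in role:
        params.append('media="{}"'.format(role["media"]))
    if "inLanguage" in role:
        params.append('hreflang="{}"'.format(role["inLanguage"]))
    return "<{}>; {}".format(role["url"], "; ".join(params))


# Example node taken from the markup discussed in this thread.
mobile = {
    "@type": "LinkRole",
    "url": "https://m.example.com/product2213",
    "linkRelationship": "alternate",
    "media": "handheld",
}
print(link_role_to_html(mobile))
print(link_role_to_header(mobile))
```

A round trip would also need the reverse direction (parsing `<link>` elements or `Link:` headers back into nodes), which is where an agreed mapping for every relationship type would matter most.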

If this seems a viable way forward, I will set things moving. I think it is at least clear that this issue is unresolved and should not delay our upcoming release.

@danbri
Contributor

danbri commented Sep 20, 2022

Responding to @jvandriel's question separately:

is there any chance you have any insights into what % screws up their alternate implementations?

Speaking from a Google perspective, we do not have numbers to share at this point. We can say it fits the general longstanding pattern with Schema.org, which is that without supporting tooling, documentation and consuming applications, data quality can be pretty rough. When there are incentives and support, and ideally a clear link to business priorities, data is almost always much better.

My understanding is that in the "mobile URL" case, what we see fits this pattern. Google's Shopping Feeds infrastructure is associated (historically at least) with a commercial relationship vendors have with Google for which their data is central. And the feed data is generally pretty good, because the data is used and useful and tied to costs and expectations that are carefully monitored.

@jdevalk
Contributor

jdevalk commented Sep 20, 2022

@danbri Assuming that you want to hear from those involved in this thread, per your:

If this seems a viable way forward,

So: I think you're right in your conclusions.

@danbri
Contributor

danbri commented Sep 20, 2022 via email

@WeaverStever

Skimming the discussion, I am reminded that Google uses position within the image array for the expected aspect ratios.

"image": [
        "https://example.com/photos/1x1/photo.jpg",
        "https://example.com/photos/4x3/photo.jpg",
        "https://example.com/photos/16x9/photo.jpg"
       ],

So a simple fix might be...

"url": [
        "https://example.com/home",
        "https://m.example.com/home"
       ],
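
A minimal sketch of how a consumer could read such a positional array, assuming a convention of "first entry is the standard URL, second is the mobile URL". That convention and the helper name `split_urls` are hypothetical; nothing in Schema.org defines them.

```python
from typing import Optional, Sequence, Tuple, Union


def split_urls(
    url_value: Union[str, Sequence[str]]
) -> Tuple[str, Optional[str]]:
    """Return (standard_url, mobile_url) under the assumed positional rule."""
    urls = [url_value] if isinstance(url_value, str) else list(url_value)
    standard = urls[0]
    # A single value means there is no separate mobile page.
    mobile = urls[1] if len(urls) > 1 else None
    return standard, mobile


print(split_urls(["https://example.com/home", "https://m.example.com/home"]))
print(split_urls("https://example.com/home"))
```

One caveat with this approach: JSON-LD arrays are unordered by default unless the term is declared as an `@list`, so a consumer that processes the markup as RDF may not see the original positions.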

@jvandriel
Copy link

jvandriel commented Oct 12, 2022

"If this seems a viable way forward,"

My hesitations are solely based on the fact I foresee all kinds of technical obstacles for website owners due to limitations originating from how different content management systems work. As long as we take this into account during further discussions I'm all for continuing the discussion as I do see value in getting this resolved.

In the meantime I'll try to reach out to several companies that have specialized crawling software in hopes of having them join future discussions so that we can make sure that whatever it is we come up with (outside what is mentioned in the html) can be checked and verified.

@github-actions

This issue is being nudged due to inactivity.

@github-actions github-actions bot added the no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label Jan 11, 2023
@danbri
Contributor

danbri commented Jan 11, 2023 via email
