/WebContent ? Introduce a common supertype of WebSite, WebPage, WebPageElement #2358

Open
danbri opened this issue Oct 3, 2019 · 10 comments

@danbri danbri commented Oct 3, 2019

Sometimes it does not matter whether you're pointing at a web site or a web page or a part of a web page. Although we already have the url property, it is awkward to use in cases where you also want to talk about the properties of the thing being linked to.

There's also a case to be made that the distinctions between site, page, and part of page are getting blurred, with single page apps loading content dynamically in the background, etc. While it is certainly sometimes useful to distinguish sites from pages and parts of pages, there are occasions when you want to say "there's a chunk of Web content [at url]" that has such-and-so properties.

I think we can address this fairly simply by adding a supertype into the hierarchy between CreativeWork and the trio of WebSite, WebPage, WebPageElement. The assumption would be that, without further information, the content would typically be expressed in the languages of the browser-oriented Web (HTML/CSS/JS etc.).
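As a rough sketch of the kind of markup this would allow, assuming the WebContent type is adopted as proposed: the node below describes the thing being pointed at without committing to site, page, or page element. The URL, names, and the choice of isBasedOn as the linking property are placeholders for illustration only.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article",
  "isBasedOn": {
    "@type": "WebContent",
    "url": "https://www.example.com/guide",
    "name": "Example guide",
    "inLanguage": "en"
  }
}
```

Compared with a bare url value, the nested node gives properties such as name and inLanguage somewhere to live.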

We could use encodingFormat or other properties to be more specific about simpler formats, for cases where rendering without a Web platform implementation is relevant.
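For instance, a plain-text resource might be marked up roughly as follows. This is only a sketch that assumes WebContent exists; the URL and name are placeholders, and the media type is just one plausible value for encodingFormat.

```json
{
  "@context": "https://schema.org",
  "@type": "WebContent",
  "url": "https://www.example.com/notes.txt",
  "name": "Example release notes",
  "encodingFormat": "text/plain"
}
```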

@danbri danbri self-assigned this Oct 3, 2019

@WeaverStever WeaverStever commented Oct 3, 2019

I'm not seeing a WebContent type. Is this what you are trying to accomplish?

<script type="application/ld+json">    
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://www.example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://query.example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  },
  "mainEntity":{
    "@type": "Person",
    "@id": "https://www.example.com/#",
    "name": "Leroy Brown",
    "description": "baddest man in the whole damn town"
  },
  "about":[
    {
      "@type": "CreativeWork",
      "inLanguage": "es",
      "@id": "https://www.example.com/biography/#",
      "about": {
        "@type": "WebPage",
        "url": "https://example.com/biography",
        "mainEntityOfPage": {
          "@type": "CreativeWork",
          "url": "https://example.com/biography/headshot.jpg"
        }
      }
    },{
      "@type": "CreativeWork",
      "inLanguage": "es",
      "@id": "https://www.example.com/tou/#",
      "about": {
        "@type": "WebPage",
        "url": "https://example.com/tou",
        "mainEntityOfPage": {
          "@type": "CreativeWork",
          "url": "https://example.com/tou/#toudiv"
        }
      }
    }
  ]
}
 </script>

@danbri danbri commented Oct 3, 2019

This is a draft plan for adding WebContent as a supertype of WebSite, WebPage, WebPageElement. It is separate from the Actions mechanism.

@WeaverStever WeaverStever commented Oct 3, 2019

I'm not clear about the "Actions mechanism"; maybe the proposed name is throwing me off. Isn't everything already presumed to be a WebSite and WebPage?

So this would be a "more specific type" of WebPageElement? Would it include something like a jQuery slider that is not the mainEntityOfPage?

Or are we discussing output formats like RSS, XML and JSON?

@danbri danbri commented Oct 16, 2019

By "actions mechanism" I am referring to the approach we documented in https://schema.org/docs/actions.html and the associated pieces of vocabulary, including the Action type and its subtypes.

We do not presume everything is a WebSite or WebPage. This project has schema definitions for hundreds of kinds of things that are described in such pages, from Person and Event to Volcano and Organization.

The proposal here is to add a new term into the type hierarchy in between the very general "CreativeWork" and its rather specific "WebSite", "WebPage" and "WebPageElement" subtypes. You can look around at the subtypes of http://schema.org/Article (e.g. NewsArticle and its subtypes) to get a feeling for how that might look. The point is that sometimes it doesn't particularly matter which of WebSite, WebPage or WebPageElement we are concerned with. By analogy, sometimes we want to be able to just write "Article" without distinguishing between "NewsArticle" and "ScholarlyArticle". The suggestion is that by adding "WebContent" as a new type, we can fix this gap in the hierarchy, which currently pushes us towards overly general or overly specific terminology.
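To make the intended shape concrete, here is a sketch of the proposed hierarchy written as RDFS class definitions in JSON-LD. It describes the proposal rather than the current published vocabulary: the subClassOf links from WebSite, WebPage and WebPageElement to WebContent are what would change.

```json
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "https://schema.org/"
  },
  "@graph": [
    {
      "@id": "schema:WebContent",
      "@type": "rdfs:Class",
      "rdfs:label": "WebContent",
      "rdfs:subClassOf": { "@id": "schema:CreativeWork" }
    },
    {
      "@id": "schema:WebSite",
      "@type": "rdfs:Class",
      "rdfs:subClassOf": { "@id": "schema:WebContent" }
    },
    {
      "@id": "schema:WebPage",
      "@type": "rdfs:Class",
      "rdfs:subClassOf": { "@id": "schema:WebContent" }
    },
    {
      "@id": "schema:WebPageElement",
      "@type": "rdfs:Class",
      "rdfs:subClassOf": { "@id": "schema:WebContent" }
    }
  ]
}
```

Since the three existing types would still be CreativeWork subtypes through WebContent, properties they inherit from CreativeWork would be unaffected.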

@WeaverStever WeaverStever commented Oct 16, 2019

@danbri

Thanks for the pointer to Action; I had forgotten it is in the schema. I don't use those features very often (e.g. viewcount etc).

Two things that are confusing me.

  1. The name WebContent seems too generic. To me, the phrase web-content usually means content that you purchase from a vendor, stock photos, or contracted body-content paid by the word.

  2. The https://schema.org/WebPageElement has a constraint,

  • "Instances of WebPageElement may appear as values for the following properties" > mainContentOfPage | WebPage

I'm not seeing an equivalence to WebSite and WebPage which do not have such constraints. At the top level, doesn't WebPage already cover what WebContent would cover? Or are you proposing "sub-typing" the output, such as PDF, RSS etc?

When I use WebPageElement (for rich content), I set an @id and url to correspond to my entry-point (index="myindex") on the page. In this manner, I can create a granular list of WebPageElement(s) within the mainContentOfPage section. For instance, with the emergence of "lazy loading," dynamically added content could inadvertently push my rich content to the bottom of the page where a spider might not read the entire page to discover my important element.
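A sketch of that pattern, with placeholder URLs and fragment identifiers standing in for my real entry points (the element names are illustrative only):

```json
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "@id": "https://www.example.com/article",
  "url": "https://www.example.com/article",
  "mainContentOfPage": [
    {
      "@type": "WebPageElement",
      "@id": "https://www.example.com/article#summary",
      "url": "https://www.example.com/article#summary",
      "name": "Summary"
    },
    {
      "@type": "WebPageElement",
      "@id": "https://www.example.com/article#gallery",
      "url": "https://www.example.com/article#gallery",
      "name": "Photo gallery"
    }
  ]
}
```

Each element keeps a stable fragment URL, so it can be referenced regardless of where it ends up in the rendered page.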

Perhaps discussing an Action, like tagging the best comment on a lazy-loading page, might be on topic here?

@Aaranged Aaranged commented Oct 23, 2019

I like where this is going, as to date schema.org really hasn't had a (relatively abstract) type that can be used to represent generic web-delivered content without the (somewhat structural, somewhat semantic) baggage that's associated with web page-delivered (i.e. HTML) content. This looks to go some way toward filling this need.

I characterize this as a "need" because the contemporary world of digital content operations is less and less concerned about content delivery endpoints, which continue to multiply (web sites, mobile applications, smart displays, smart speakers, etc.), and more about the structure and semantics of that content payload. This is epitomized by the growing popularity of headless content management systems, which are themselves a technical incarnation of the principles of intelligent content.

Intelligent content is content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable, and adaptable. (After Rockley & Cooper, 2012).

This separation of presentation and data layers required by intelligent content has always been a strength of schema.org, as the schemas are by their nature both structured and semantic. However (understandably reflecting both the functional roots of the vocabulary and the time it was launched), there has always been a certain HTML-centricity reflected in many type and property names (e.g. CollectionPage, mainEntityOfPage), resulting in some inconsistencies in type framing and inheritance. For example, Article is a sub-type of CreativeWork and of no "page", but FAQPage is a sub-type of WebPage. I would argue that abstracting an "article" is liberating for the former, while binding "frequently asked questions" to HTML is unnecessarily constricting for the latter.

And I think this part of this proposal from @danbri speaks well to this principle (emphasis mine):

There's also a case to be made that the distinctions between site, page, and part of page are getting blurred, with single page apps loading content dynamically in the background, etc. While it is certainly sometimes useful to distinguish sites from pages and parts of pages, there are occasions when you want to say "there's a chunk of Web content [at url]" that has such-and-so properties.

Indeed - and with the understanding that "Web content" here refers to "content delivered over the World Wide Web" rather than "web page content".

And for those of us now toiling in the trenches of intelligent (a.k.a. structured, a.k.a. connected) content the prospect, as outlined by the current description of the proposed type in Pending, is precisely the sort of pivot to endpoint-agnostic content schemas that has been long desired (again, emphasis mine).

The WebContent type makes it easier to describe Web-addressable content without requiring such distinctions to always be stated. (The intent is that the existing types WebPage, WebSite and WebPageElement will eventually be declared as subtypes of WebContent. )

To the suggestion that "We could use encodingFormat or other properties to be more specific about simpler formats" ... sure? The question mark is just because I'm not sure how important this is either from a content model perspective (where encoding is a moot point) or from a content artifact perspective (where the encoding is part and parcel of instancing a content artifact for a particular endpoint) - but I can't see that supporting this level of expressiveness is in any way problematic.

So all in all a big thumbs-up for this proposal, and for building on this approach, conceptually, as new types are considered. I'd even go so far as to say that this type of approach is necessary to remain useful as the digital world moves further away from single-serve pieces of HTML content and closer to a flexible universe of API- and query-driven multi-format content.

@MichaelAndrews-RM MichaelAndrews-RM commented Oct 23, 2019

I support creation of a WebContent supertype because some content has a generic purpose and is not created to be delivered in a specific container. Feeds of content can be consumed by different applications and presented in different formats. Many APIs provide just a title, a description and a body for text.

That said, I also feel the ultimate value of having a supertype is that it can enable more specificity in schema.org for content-related subtypes. I believe we ultimately need more subtypes relating to web content, which can indicate the different purposes of content. Having a broad supertype could open up possibilities for more subtypes.

Much web content does have specific roles, which can be identified by subtypes.

When content has a specific role, it can support user interaction and enable user agents to perform specific actions. This is becoming more important as content becomes more multi-modal (able to be interacted with using different means). VR is web delivered, but has unique characteristics that need to be identified for agents to interact with such content.

There are already web-delivered content subtypes identified in accessibility standards such as ARIA, for example widgets. EPUB standards refer to scriptable components.

EPUB identifies specific roles for content such as "flow" that indicate structure and relationships between different content parts.

Another possibility is to indicate user features that are broader than individual data. For example, a collection of definitions would be a glossary.

The value of greater specificity in subtyping web content would allow bots to assemble content from different sources in a way that is coherent to human readers.

@mrcruce mrcruce commented Oct 23, 2019

Nicely stated on the directionality of agnostic, decoupled content. I would be very hesitant to endorse including "Web" as the overbearing qualifier for "Content", especially in the lineage of WebPage. I would rather see us head towards ContentModule, or ContentContainer, or something in a similar, more primitive vein.

Per @danbri -

The proposal here is to add a new term into the type hierarchy in between the very general "CreativeWork" and its rather specific "WebSite", "WebPage" and "WebPageElement" subtypes.

It's very true. The existing hierarchy doesn't allow for general-use 'endpoint-agnostic content'.

There is a schema.org precedent for native content forms...and that is Object.

For example, AudioObject, VideoObject, MediaObject.

The existing 'Object' structure is notable for consideration for native decoupled content.

Perhaps... ContentObject or, if limited to text, TextObject? It naturally follows, that it can be 'embedded in a web page or a downloadable dataset'. It can accrue properties, such as 'Speakable'.

Topics could be semantic properties of ContentObjects, e.g. HealthTopic. And the more semantic extensions enabled on ContentObject, the greater the flexibility for machine recombination of modular forms into new composites.

WebPage, WebSite, and WebPageElement could all contain ContentObjects or TextObjects.

Because the Web is but one channel among many for content, and all need native content forms available for object-oriented reuse, devising the right container form for modular content becomes a pivotal design decision.

If the plan is to solidify a container for modular content at the elemental level, we should go ALL IN on actual decoupled forms. I suggest intentionally, and unabashedly, dropping all ties to WebPage for these more fundamental units of content that can be used widely off-Web.

Yes, a lot of content has a 'generic purpose and is not created to be delivered in a specific container'. And content objects need to transit multiple modes, channels, and rendering formats across a lifecycle. Therefore, decoupled content objects need metaphors and heuristics free of labels or ties to any particular mode, channel, or format.

One more suggestion. Perhaps make accommodation for ContentObjectVariants of a root ContentObject? Variants will allow machines to explicitly understand the relationship of a ContentObject with its relatives that convey related meaning in different ways (for different audiences or renderings, for example).

Building towards an abstracted ContentObject structure gives maximum future utility with a minimum of duplication.

Long live decoupled, object-oriented, metadata-rich content. It's out of these building blocks, the future of all customer experience and education takes form.

@eaton eaton commented Oct 23, 2019

If the plan is to solidify a container for modular content at the elemental level, we should go ALL IN on actual decoupled forms. I suggest intentionally, and unabashedly, dropping all ties to WebPage for these more fundamental units of content that can be used widely off-Web.

I tend to agree. I think the concept of a web site is important and meaningful — i.e., "This information describes CNN.com, not a particular section of it or a particular article or even the home page per se, but the place on the web called CNN.com, whose URI happens to be http://cnn.com." Once we go below that, though, it seems that collisions quickly occur between the media-forms @mrcruce is talking about (AudioObject, MediaObject, etc), their underlying semantic purpose (Event, Person, Review, etc), and the particular representation of them (WebPage, WebPageElement).

I confess that my usage of schema.org and my familiarity with the type hierarchy has always focused heavily on describing the semantic purpose of specific elements inside overall site structures, rather than the construction hierarchy of a site itself. Is the intent here to provide a richer set of types for things that exist on the web but lack more specific semantic purpose? Or is it to clearly capture that a given thing (say, a Review in the form of a Video) was delivered as a WebPageElement, or a WebPage, or more broadly a WebContent?

This kind of question often comes up when developing deep/rich type hierarchies — inevitably there are discrete things that are many types at once, and then you get C++…

@eaton eaton commented Oct 23, 2019

I apologize if I'm talking myself through what is obvious to everyone else, but I've got this bone and I'm gnawing on it now!

The kind of metadata we're talking about is poorly-suited for describing things that are simultaneously several distinct things. (For example, a Review that is a Video that is the sole WebPageElement on a WebPage that is the only content on a single-page WebSite). Because of that, the hierarchical types function as a description of how the thing is being thought of at the moment rather than a description of its inherent thingness.

From that perspective, there's no inherent conflict between delivery-wrapper-oriented types like WebContent and WebPage and WebPageElement; media-oriented types like VideoObject and AudioObject; and purpose-oriented types like Review or Event. They describe different distinct aspects of a given thing, and they do that well.
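One way to picture those aspects sitting side by side is a single JSON-LD node that carries more than one type, which the syntax allows by giving @type an array. The sketch below uses placeholder names and URLs, and the property choices are only illustrative: purpose (Review) and media (VideoObject) share the node, while the delivery wrapper is expressed as a related WebPage.

```json
{
  "@context": "https://schema.org",
  "@type": ["Review", "VideoObject"],
  "name": "Example video review",
  "itemReviewed": {
    "@type": "Movie",
    "name": "Example film"
  },
  "contentUrl": "https://www.example.com/reviews/example.mp4",
  "isPartOf": {
    "@type": "WebPage",
    "url": "https://www.example.com/reviews/example"
  }
}
```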

As we pit WebContent against more delivery-agnostic type concepts like TextObject, are we asking the type hierarchy to do too much? Are we trying to stuff multi-inheritance into a single-inheritance model? I think that we are, but I also suspect I might be missing something important.
