Add existing CreativeWork as an expected type for existing description property #2942

Closed
gmackenz opened this issue Aug 20, 2021 · 21 comments
Assignees
Labels
no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). Queued for Editorial Work Editor needs to turn issues/PRs into final code and release notes.

Comments

@gmackenz
Contributor

Currently the description property expects only Text. This is insufficient if additional information needs to be conveyed with the text string (such as authorship, licensing information, expiration date, etc.). I propose that we also allow descriptions to be CreativeWorks, which have such properties and so allow various annotations to be attached to a text string.

This proposal comes from Google, to allow us to better handle/process description text for media and other content.

@RichardWallis
Contributor

I believe I can see where this is coming from - presuming that this is referring to the use case of types of Things (eg. Products) that do not have such properties, and not CreativeWorks / CreativeWork subtypes (as they already have those properties).

However, it is not that simple. For example, the expires property in a CreativeWork linked from the description property of a Product would describe when the CreativeWork will expire, not the Product it is linked from. The same is true for the other properties suggested as useful: they describe the CreativeWork, not the Thing that has that CreativeWork as its description.

I can see the benefit of being able to have a CreativeWork as a description - eg. a BlogPosting acting as a quality description of a Place. But it would be logically incorrect to assume that properties of a linked CreativeWork description [other than its textual content] expand the information about the entity that links to it.
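
A minimal sketch of the scoping concern above, in Turtle. It assumes the proposed (not currently accepted) use of a CreativeWork as a description value; the `ex:` namespace, both resource names, and all literal values are hypothetical.

```turtle
@prefix : <https://schema.org/> .
@prefix ex: <https://example.org/> .

# Hypothetical product whose description is a CreativeWork node, not plain Text.
ex:widget a :Product ;
    :name "Example widget" ;
    :description ex:widget-description .

# These properties describe the descriptive text itself, not ex:widget.
ex:widget-description a :CreativeWork ;
    :text "A compact widget for everyday use." ;
    :expires "2022-02-20"^^:Date ;
    :license <https://creativecommons.org/licenses/by/4.0/> .
```

A consumer reading this should not conclude that the product expires or is CC-licensed; only the description node carries those statements.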

@WeaverStever

Isn't this what the https://schema.org/about does for us?

@RichardWallis
Contributor

about and description could be considered inverses of each other in this context, yes: the content of the [description] CreativeWork will be 'about' the Thing it describes. For markup simplicity you might want to explicitly assert the relationship from the Thing end.

However, this bit of semantics doesn't address what appears to be the main suggestion in this issue - that properties (expires, author, license, etc.) of the describing CreativeWork, would be informational as to the status of the Thing it is describing.

@gmackenz
Contributor Author

Thanks for the feedback, Richard. That was the intent: the string of text is a CreativeWork in and of itself, with attributes that are not about the Product/TVEpisode it describes. The properties on CreativeWork would add context/information about the string of text only.

Is there another way we can attach dates, licensing, sources to a string value? Would we need a new type to capture the text value and additional information about the text? What would that look like? PriceSpecification?

@RichardWallis
Contributor

@gmackenz Now you explain it that way I am more comfortable.

However, are you not proposing what is already possible via the subjectOf property?

@gmackenz
Contributor Author

@RichardWallis I don't believe that would work, as subjectOf applies to the object and not directly to the description text of that object. For example, consider a TV programming data feed using schema.org markup for a TVEpisode, which includes a description string supplied by an NBC source that is contractually available for only 6 months. We can't currently capture that data in schema.org as far as I can tell. I don't believe subjectOf would work for that usage.

@RichardWallis
Contributor

@gmackenz now you flesh out that scenario it appears a bit too specific for linking to via the description property.

A simple Text description, as you indicate, has no information about its authorship, licensing, expiry date etc. Hence to capture that sort of thing you are looking to a CreativeWork or subtype thereof.

With current Schema capability that would be a CreativeWork (eg. Article) that would be about the program. Via the inverse the program's markup could indicate that it is subjectOf the article.

The only difference I am picking up from your description of the need is that you somehow want to indicate that the relationship between the entity (TV program) and its description will expire. Or is it that you want the link to remain but indicate that the descriptive text has passed its expires date?
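
For reference, the about/subjectOf arrangement mentioned in this comment could be sketched in Turtle roughly as follows. This uses only existing schema.org terms; the `ex:` namespace, the resource names, the organization name, and the text are hypothetical.

```turtle
@prefix : <https://schema.org/> .
@prefix ex: <https://example.org/> .

# Hypothetical article that carries the descriptive text and its own metadata.
ex:episode-article a :Article ;
    :about ex:episode-1 ;
    :articleBody "The crew investigates a distress signal." ;
    :author [ a :Organization ; :name "Example Network" ] ;
    :expires "2022-02-20"^^:Date .

# The inverse direction, asserted from the episode's own markup.
ex:episode-1 a :TVEpisode ;
    :name "Pilot" ;
    :subjectOf ex:episode-article .
```

Note that nothing here says the article is the episode's description; it is only a work about the episode, which is the gap under discussion.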

@github-actions

This issue is being nudged due to inactivity.

@github-actions github-actions bot added the no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label Nov 22, 2021
@gmackenz
Contributor Author

gmackenz commented Apr 7, 2022

Revisiting this as there is growing need for Google to have a means of annotating text with additional information (localizing restrictions, time of use, etc.) for the 'description' text that may be provided by content providers for MediaObjects (mostly VideoObjects for now such as Films, TV Shows, TV Episodes, etc.).

Either we can expand the rangeIncludes of description to include both Text and CreativeWork, and use the latter's properties for additional information about the description text content:

:description a rdf:Property ;
    rdfs:label "description" ;
    :domainIncludes :Thing ;
    :rangeIncludes :CreativeWork,
        :Text ;
    rdfs:comment "A description of the item." ;
    owl:equivalentProperty dc:description .

Or we could instead expand the rangeIncludes of description to include both Text and a new TextObject subtype of MediaObject. A TextObject subtype of MediaObject would allow us to add properties specific to text later, as opposed to audio, video, etc.:

:TextObject a rdfs:Class ;
    rdfs:label "TextObject" ;
    :isPartOf <https://pending.schema.org> ;
    rdfs:subClassOf :MediaObject ;
    rdfs:comment "A text file." .

:description a rdf:Property ;
    rdfs:label "description" ;
    :domainIncludes :Thing ;
    :rangeIncludes :Text,
        :TextObject ;
    rdfs:comment "A description of the item." ;
    owl:equivalentProperty dc:description .
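
As an illustration of how instance data might look under the second option, here is a hedged Turtle sketch. It assumes TextObject is accepted as a value of description; the `ex:` namespace, resource names, organization name, dates, and text are all hypothetical.

```turtle
@prefix : <https://schema.org/> .
@prefix ex: <https://example.org/> .

# Hypothetical episode whose description is a TextObject with its own metadata.
ex:episode-1 a :TVEpisode ;
    :name "Pilot" ;
    :description [
        a :TextObject ;
        :text "The crew investigates a distress signal." ;
        :provider [ a :Organization ; :name "Example Network" ] ;
        :expires "2022-10-07"^^:Date
    ] .

# Plain Text descriptions would remain valid alongside the new form.
ex:episode-2 a :TVEpisode ;
    :name "Second episode" ;
    :description "The crew returns home." .
```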

@philbarker
Contributor

See also #1379 and #276 and suggestion of treating the (structured) abstract of an academic work as a CreativeWork.

@RichardWallis
Contributor

In principle I support your second (TextObject) option - using some of the general CreativeWork subtypes such as VideoObject or Sculpture would not be particularly useful as a description.

In practice if we implemented TextObject, we should handle at least in the description of the type, some of the inevitable questions that would arise. Such as, is this type for plain text only or is html, markdown, or other text formats acceptable.

gmackenz added a commit to gmackenz/schemaorg that referenced this issue Apr 14, 2022
Add a new MediaObject subtype of TextObject. Add TextObject as a new expected type of the property description. schemaorg#2942
@gmackenz
Contributor Author

gmackenz commented Apr 14, 2022

> In practice if we implemented TextObject, we should handle at least in the description of the type, some of the inevitable questions that would arise. Such as, is this type for plain text only or is html, markdown, or other text formats acceptable.

So what is allowed for Text currently?

I believe TextObject's text should be pretty much format-agnostic:

"A text file. The text can be unformatted or contain markup, html, etc."

Edited my contribution: 2c3a595

@danbri
Contributor

danbri commented Jun 27, 2022

Let me throw something into this mix, in the spirit of @WeaverStever's earlier comment:

> Isn't this what the https://schema.org/about does for us?

(we also have an inverse of that, called https://schema.org/subjectOf )

My sense is that there is value in pursuing two related directions:

1.) Firstly, per @gmackenz's proposal, note that alongside MediaObject, VideoObject, ImageObject, ... there is quite naturally a TextObject, for the subset of MediaObject instances whose content is human-readable text.

This is tricky. Markdown perhaps qualifies, due to it being mildly formalized, email-style plain text with a bit of structure. But its link syntax is more for machines, and it is entirely possible to write very obfuscated Markdown. SGML/XML-based formats are textual but with a lot of machine structure. So we might consider StructuredTextObject as a subset.

2.) As @MatthiasWiesmann points out (/cc @alex-jansen), the /Text type also has potential to be cleaned up, both at its lowest levels (escape characters, best-practice documentation, normalization, etc.; consider the escape-character issue for JSON-LD inside HTML) and in terms of having subtypes of Text for things like MarkdownText or MustacheTemplateText. I would like https://developers.google.com/search/docs/advanced/structured-data/dataset to have better options than speculatively treating all descriptions that look like Markdown as Markdown. An important part of this is also HTML and XML markup within property values, and the differences between RDFa, JSON-LD and Microdata in handling these.

It would not make sense to have schema types for every text-based format ever (many cases will be better served by https://schema.org/encodingFormat), but common cases like Markdown and simple HTML might deserve special casing (as Text subtypes and as TextObject subtypes?).

Thoughts?

@smrgeoinfo

Seems like having an 'encodingFormat' property on text object would be a good, extensible way to indicate how to interpret the bitstream for the TextObject. Might be even better if it were a subProperty - 'textEncodingFormat'.
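
A small Turtle sketch of that idea, using the existing encodingFormat property with a media type. The suggested 'textEncodingFormat' sub-property is not used because it does not exist in schema.org; the `ex:` namespace, the dataset, and its text are hypothetical.

```turtle
@prefix : <https://schema.org/> .
@prefix ex: <https://example.org/> .

# Hypothetical dataset whose description declares that its text is Markdown.
ex:dataset-1 a :Dataset ;
    :name "Example dataset" ;
    :description [
        a :TextObject ;
        :encodingFormat "text/markdown" ;
        :text "# Overview\n\nMonthly rainfall totals, *unvalidated*."
    ] .
```

With an explicit media type, a consumer would no longer need to guess whether a description that looks like Markdown is meant to be rendered as Markdown.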

@mfhepp
Contributor

mfhepp commented Jul 22, 2022

I think this is a tricky issue, and a quick, pragmatic fix can lead us into a lot of trouble. Text content is used everywhere in markup, and we can easily end up in cyclic dependencies and inconsistencies. Just a few examples:

  • in RDF syntaxes, text content can have a language tag, but any richer type for "text plus meta-data" could easily introduce multiple ways of expressing the same thing.
  • Text is such a basic type of content that adding a level of indirection here will add a lot of burden to consumers of data and break many legacy applications.
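
To make the first point concrete, here is a hedged Turtle sketch of the same fact expressed two ways. The second form assumes a TextObject is allowed as a description value; the `ex:` namespace, the products, and the text are hypothetical.

```turtle
@prefix : <https://schema.org/> .
@prefix ex: <https://example.org/> .

# Existing practice: the language travels with the literal itself.
ex:product-a a :Product ;
    :description "Ein kompaktes Gerät."@de .

# With a wrapper node, the same fact could also be stated via inLanguage.
ex:product-b a :Product ;
    :description [
        a :TextObject ;
        :text "Ein kompaktes Gerät." ;
        :inLanguage "de"
    ] .
```

A consumer that only expects a string for description would read the first form correctly and find no usable string in the second, which is the legacy-breakage risk raised in the second point.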

IMO, there are multiple separate issues to be addressed, at least the following:

  1. Being able to attach 'stable' meta-data like licensing, authorship, etc. to a textual representation.
  2. Versioning / temporal aspects: When has this text been published, crawled, etc. (for temporal aspects in Knowledge Graphs, see e.g. the nice paper by Franz Krause et al.). This is orthogonal to the first problem, and it may be difficult to be clear about the scope of any such meta-data (is the product description updated? or the product? or the offer?).
  3. Formal links between derivative works, translations, different syntaxes of the same text, like English vs. German product descriptions, long vs. short descriptions; plain text vs. Markdown vs. HTML.
  4. Being able to use richer syntaxes in text content, like Markdown and HTML, while being able to indicate the syntax (e.g. by adding a Media Type).

I am sure there are more.

For all of these, there are potential dependencies on

Before we open a can of worms, we should, IMO, either

  • develop a minimal, generic solution that is conceptually clean, or
  • limit the extension to a narrow application area, e.g. by just allowing a new text value for product descriptions etc.

Ping @danbri

@HughP

HughP commented Jul 22, 2022

When I hear "text Encoding" I think "ASCII", "UTF-8", etc. not "text markup" such as HTML, Markdown, etc.

@danbri danbri self-assigned this May 17, 2023
@danbri danbri added Queued for Editorial Work Editor needs to turn issues/PRs into final code and release notes. and removed no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). labels May 17, 2023
danbri added a commit that referenced this issue May 17, 2023
* Add a new expected type of TextObject

Add a new MediaObject subtype of TextObject. Add TextObject as a new expected type of the property description. #2942

* Updating description for TextObject

As per comment by Richard Wallis issue#2942#issuecomment-1091419590

---------

Co-authored-by: Dan Brickley <danbri@google.com>

github-actions bot commented Oct 4, 2024

This issue is being nudged due to inactivity.

@github-actions github-actions bot added the no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label Oct 4, 2024
@MatthiasWiesmann
Contributor

This has become relevant again with the advent of AI-generated text; we might need it to propagate the IPTC tags.

@alex-jansen
Contributor

description already allows values of type TextObject, which has the digitalSourceType property for IPTC tags through inheritance from CreativeWork?
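
A hedged Turtle sketch of what that could look like. The enumeration member used as the digitalSourceType value is assumed from schema.org's IPTC digital source vocabulary and should be checked against the current release; the `ex:` namespace, the product, and the text are hypothetical.

```turtle
@prefix : <https://schema.org/> .
@prefix ex: <https://example.org/> .

# Hypothetical product whose description text is flagged as AI-generated.
ex:product-1 a :Product ;
    :name "Example widget" ;
    :description [
        a :TextObject ;
        :text "A compact widget for everyday use." ;
        # Assumed enumeration member; verify before relying on it.
        :digitalSourceType :TrainedAlgorithmicMediaDigitalSource
    ] .
```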

@danbri
Contributor

danbri commented Oct 23, 2024 via email

@alex-jansen
Contributor

Hi @danbri We indeed should not add a new type to the range of /description and I also don't think that is needed since /digitalSourceType was already added to /CreativeWork (and thus inherited by /TextObject) in Release 14.0 back in January. So I think we can actually close this issue, which is what I am doing now ;)
