Add a generic Property-Values mechanism for long-tail use (e.g. e-commerce, EXIF) #263

Closed
danbri opened this Issue Jan 22, 2015 · 30 comments

danbri commented Jan 22, 2015

danbri Jan 29, 2015

Contributor

Used in #262 for Automobiles.

danbri Feb 10, 2015

Contributor

I've asked Martin to break out the property-value piece from his larger proposal (currently all in one branch).

mfhepp Mar 11, 2015

Contributor

I just created an individual pull request for the property-values contribution for better tracking, see here:

#377

As for the comments raised by Tom Marsh in https://lists.w3.org/Archives/Public/public-vocabs/2015Jan/0004.html:

"I am supportive of adding the proposed property-value and EXIF changes into schema.org, but I would like to see them separated out from the other changes so we can approve and incorporate them independently and so that there is a clearer change history for people to follow in GitHub."

This is implemented with this pull request.

"Assuming we make this change, however, I think it is essential that we provide guidance on when it is acceptable to use these constructs. In particular, if publishers start to use property-value pairs where there are equivalent schematized properties, it will significantly dilute the value of the vocabulary. Therefore, I think we need to document a requirement that the name in the property-value pairs cannot match a schema.org property (it would therefore be considered invalid markup if the name did match)."

I added a note to the additionalProperty property, stating

"Note: Do not use additionalProperty if there is a specific property for this characteristic readily defined in schema.org."

As for prohibiting a property-value pair with a name that exists in schema.org, I am not recommending that, because

a) publishers may have properties in their local databases that accidentally clash with schema.org names (e.g. from a table of 200 product features for a technical product). Catching those and implementing specific handling can be quite a challenge for implementers.

b) the local database schemas might use names for properties with a different meaning (e.g. weight for the package weight).

It is clear that we should encourage publishers to use specific properties when possible and that the mechanism must not be used to dilute the core vocabulary. I think the current proposal strikes a balance.
We might want to clarify this in a blogpost or implementation notes for this feature.

"I think we should also have an informal agreement as a community that we will make additions to the vocabulary for any properties that turn out to be widely used in property-value pairs so that we can encourage more normalized and consistent representations."

I agree, but this is something that should be mentioned in a blogpost or implementation notes for this feature.
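
For readers following along, here is a minimal sketch of what the note on additionalProperty implies for markup. It builds JSON-LD as a Python dict; the product, its feature names, and the helper `property_value` are hypothetical and only illustrate the idea of using a specific property (here `color`) where one exists and PropertyValue for the long tail.

```python
import json


def property_value(name, value, unit_code=None, unit_text=None):
    """Build a schema.org PropertyValue node as a JSON-LD dict (hypothetical helper)."""
    node = {"@type": "PropertyValue", "name": name, "value": value}
    if unit_code is not None:
        node["unitCode"] = unit_code
    if unit_text is not None:
        node["unitText"] = unit_text
    return node


product = {
    "@context": "http://schema.org",
    "@type": "Product",
    "name": "Example camera",  # hypothetical product
    "color": "black",  # a specific schema.org property exists, so use it directly
    "additionalProperty": [
        # Long-tail features with no specific schema.org property
        property_value("Sensor type", "CMOS"),
        property_value("Battery life", 320, unit_text="shots"),
    ],
}

print(json.dumps(product, indent=2))
```

The feature names are free text chosen by the publisher, which is exactly why the guidance on not shadowing existing properties matters.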

tmarshbing Mar 12, 2015

@mfhepp, the changes mostly look good to me. I added a few comments in the change itself. Beyond that:

  1. Do we really need unitText? An alternative would be to loosen the requirements for unitCode to allow either a code or the text that unitText allows. This seems more consistent with other areas where we allow either free-text or URIs.
  2. The example of sensor size (for multi-dimensional properties) seems like a stretch. The markup doesn't differentiate between multi-valued (e.g., the Ethernet and USB example) and multi-dimensional. How would a consumer know which it is meant to be? Assuming there is no way to distinguish, I would recommend that the multidimensional value be represented as a single value, such as "23.2 x 15.4".
  3. I still think we need to prohibit property-value pairs with names that exist in schema.org. For the extension mechanism being discussed (http://lists.w3.org/Archives/Public/public-vocabs/2015Mar/0034.html), we wouldn't allow extensions to reuse existing names. I don't see why we would make a different decision in this case, which is also, in some sense, an extension mechanism. It does add additional burden to the publisher (assuming they want to be compliant, of course), but I think we want them to spend the time to understand what parts of their data can be expressed natively in schema.org. PropertyValue should only be for the parts that aren't already in the vocabulary, not as a way to get around or ignore the vocabulary.
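
To make point 3 concrete, the kind of check a validator would need could be sketched as follows. This is only an illustration under assumptions: `SCHEMA_ORG_PROPERTY_NAMES` is a tiny hand-picked sample rather than the real term list, and the case-insensitive exact match is a guess at what "the name matches" would mean.

```python
# Tiny illustrative sample; a real checker would load the full schema.org term list.
SCHEMA_ORG_PROPERTY_NAMES = {"color", "gtin13", "height", "weight", "width"}


def clashing_names(product):
    """Return names of additionalProperty entries that equal a known
    schema.org property name (case-insensitive, surrounding spaces ignored)."""
    values = product.get("additionalProperty", [])
    if isinstance(values, dict):  # JSON-LD allows a single node instead of a list
        values = [values]
    known = {name.lower() for name in SCHEMA_ORG_PROPERTY_NAMES}
    return [
        pv["name"]
        for pv in values
        if str(pv.get("name", "")).strip().lower() in known
    ]


example = {
    "@type": "Product",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "Weight", "value": "1.2"},
        {"@type": "PropertyValue", "name": "Sensor type", "value": "CMOS"},
    ],
}
print(clashing_names(example))
```

Note that a name match says nothing about meaning: a local "Weight" could still be the package weight, which is the ambiguity raised earlier in the thread.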

mfhepp Mar 13, 2015

Contributor

@tmarshbing: Thanks!

  1. As for unitText: I would like to keep unitText and unitCode distinct. On one hand, we do want people to use proper UN/CEFACT Common Codes when they can, because they are much more reliable and allow unit conversion etc. On the other hand, we want people to publish unit information as text when they cannot do better. Having two separate properties for this maintains backward compatibility with the original GoodRelations model, tools, and data, and reduces the task for a consuming client.

  2. I can update the example or we remove it for the moment.

  3. I am fine with both directions - it is more important that we have additionalProperty as soon as possible. For publishers who may have 200 properties for products to handle, I would however not introduce a barrier to filter them for matching properties in schema.org, because the core motivation for the approach is that most e-commerce sites have plenty of such data but cannot map it to standardized properties (there is a long argument on the W3C mailing list why this is so). Now, forcing them to filter this data for existing property names in schema.org takes away that simplicity. I would rather recommend that consuming clients ignore such properties, or give priority to schema.org properties, than to make such markup formally invalid. Note that the source systems store the properties as instance data, so it is not something that can be fixed at the schema level: We want to extend our shop software extension modules so that they can also expose product features without requiring the shop operators to manually map their product features to any given standard, but the extensions can only access the database schemas for storing property-values, while the actual properties are defined at the shop level.

It all boils down to whether you want a lot of such data or rather less and more conforming.

Since additionalProperty is essentially limited to Product and Place, I recommend avoiding such a strong and formal requirement.

Martin
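
As a rough illustration of the consumer-side alternative in point 3 (give priority to schema.org properties instead of declaring the markup invalid), a client lookup might look like the sketch below. The function name `lookup` and the case-insensitive name comparison are assumptions made for this example, not part of the proposal.

```python
def lookup(product, prop_name):
    """Return the value for prop_name, preferring the specific schema.org
    property and falling back to a same-named additionalProperty entry."""
    if prop_name in product:
        return product[prop_name]  # the specific property always wins
    for pv in product.get("additionalProperty", []):
        if pv.get("name", "").lower() == prop_name.lower():
            return pv.get("value")
    return None


product = {
    "@type": "Product",
    "color": "black",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "Color", "value": "dark grey"},
        {"@type": "PropertyValue", "name": "Sensor type", "value": "CMOS"},
    ],
}

print(lookup(product, "color"))        # specific property takes priority
print(lookup(product, "Sensor type"))  # only available as a property-value pair
```

A stricter client could drop the fallback for names that exist in schema.org, which corresponds to the "ignore such properties" option.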

tmarshbing Mar 17, 2015

For 2, I think removing it would be fine. There are already lots of examples (which is great!!).

For 1 and 3, I'd like to get some additional perspectives from others before we decide. @danbri, for example, what are your thoughts?

My take for 1 is that it shouldn't break backward compatibility to add language saying that free-text is allowed on unitCode. Presumably, clients and tools already have to handle the case that the unit code is not recognized. This would more clearly define what the behavior should be in such cases. That said, if others also support adding unitText, I don't have a big problem with it.

For 3, I would prefer the "rather less and more conforming" version. To some extent, I see this analogous to the question of whether we would rather have a non-marked-up product details page or one that conforms to schema.org. Given a sufficiently sophisticated client, we can read the not-marked-up page, but markup makes it so that many more clients can successfully read the data. If we make it too easy for publishers to "just use name-value-pairs", I think we will end up in a situation closer to the not-marked-up page case for consumers since the names in the name-value pairs will have no agreed-upon meaning. To put it another way, if we end up with more total data (including name-value pairs) but less data mapped to the vocabulary, I think we've done the community a disservice.
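
On point 1, the client behaviour under discussion can be sketched roughly like this. The table `KNOWN_UNIT_CODES` holds three UN/CEFACT Common Codes as a sample, and treating an unrecognized unitCode as free text is an assumption that mirrors the "loosen unitCode" option; it is not agreed behaviour.

```python
# Sample of UN/CEFACT Common Codes; a real client would carry the full list.
KNOWN_UNIT_CODES = {"MMT": "millimetre", "CMT": "centimetre", "KGM": "kilogram"}


def describe_unit(pv):
    """Classify the unit of a PropertyValue-like dict.

    Returns ("code", label) for a recognized unitCode, ("text", value) when
    only free text is available, and ("none", None) when no unit is given.
    """
    code = pv.get("unitCode")
    if code in KNOWN_UNIT_CODES:
        return ("code", KNOWN_UNIT_CODES[code])
    # Fall back to unitText, or to the unrecognized code treated as free text.
    text = pv.get("unitText") or code
    if text:
        return ("text", text)
    return ("none", None)


print(describe_unit({"name": "Width", "value": 23.2, "unitCode": "MMT"}))
print(describe_unit({"name": "Battery life", "value": 320, "unitText": "shots"}))
print(describe_unit({"name": "Length", "value": 2, "unitCode": "metres"}))
```

With a separate unitText property, the third case would not arise for conforming markup, since free text would never appear in unitCode.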

mfhepp Mar 20, 2015

Contributor

I just removed the example for multidimensional values, see mfhepp@bd79ff8.

mfhepp Mar 20, 2015

Contributor

As for unitText vs. unitCode, I had a chat with Dan yesterday and explained why I have a pretty strong preference to keep the two properties:

  1. This allows us to keep markup for both the unitCode and a human-readable version of the unit, which can be useful in many cases.
  2. Historically, the main motivation for the whole mechanism and the unitText property was that there can be "refinement and lifting services" that take schema.org/GoodRelations data and enrich it. It is much easier to e.g. add a new triple with the proper unit code than to replace the unit text value with a unit code.
  3. We want to keep up the motivation for publishers to use UN/CEFACT code on QuantitativeValue and PropertyValue whenever they can, for that makes unit conversion etc. much easier.

So if you are fine with it, I would stick to unitText.
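
To illustrate reason 2, a "lifting" step that enriches data by adding a unit code next to the existing unit text could be sketched as below. The mapping table `TEXT_TO_CODE` and the function `lift_unit` are hypothetical; a real service would need a much richer mapping and disambiguation.

```python
# Hypothetical mapping from common unit abbreviations to UN/CEFACT Common Codes.
TEXT_TO_CODE = {"mm": "MMT", "cm": "CMT", "kg": "KGM"}


def lift_unit(pv):
    """Return a copy of a PropertyValue-like dict with unitCode added when the
    unitText is recognized. The original unitText is kept, not replaced."""
    enriched = dict(pv)
    text = pv.get("unitText")
    if "unitCode" not in pv and text is not None:
        code = TEXT_TO_CODE.get(text.strip().lower())
        if code:
            enriched["unitCode"] = code
    return enriched


raw = {"@type": "PropertyValue", "name": "Width", "value": 23.2, "unitText": "mm"}
print(lift_unit(raw))
```

Because enrichment only adds a key, the publisher's human-readable unit survives, which is the point of keeping the two properties separate.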

mfhepp Mar 20, 2015

Contributor

As for avoiding the misuse of the new mechanism for existing schema.org properties: I also discussed this with Dan and we reached agreement that this should be handled in the documentation.

The current text says so pretty clearly; we should complement that by a blog post at the time of the release or afterwards.

I am against a strict handling of this, because of the following: One of the main use cases for this is shop and other e-commerce applications. In the past, we built or helped others develop many extension packages for shop software, which are now running on 50 - 100 k shop sites with likely billions of products and offers. This was only possible because most of the extensions allow for "one-click" installations with a clever mapping from the internal db schemas to schema.org / GoodRelations, with no need for the shop owner to manually define complex mappings etc.

Now, in such software, the product features are typically defined by the shop owner or imported from a vast number of data sources, and products can have 30 - 200 of them.

Asking a developer to

  1. filter out property names that are "reserved" (not just for product but in schema.org as a whole) and
  2. heuristically map those to special markup (e.g. schema:weight with schema:QuantitativeValue)

will be a very significant burden for a developer. Yet it will not necessarily improve the amount or quality of data you have. Developers will either choose to exclude such properties from the markup or use simple heuristics, which may not work reliably.

So I would tell developers:

  1. Always use specific schema.org properties when
    a) they exist and
    b) you can populate them.
  2. Using PropertyValue as a substitute will typically not trigger the same effect as using the original, specific property.

So if you are fine with it, I keep the current description. The wording for a blogpost at release time should be discussed.
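
A shop extension following the two rules above could be sketched like this. Everything here is an assumption for illustration: the row shape `(name, value, unit)`, the function `export_features`, and the optional `explicit_mapping` that a shop owner might configure for the few features that can be populated reliably. No heuristic matching against schema.org names is attempted.

```python
def export_features(rows, explicit_mapping=None):
    """Turn shop feature rows (name, value, unit) into Product markup.

    Features listed in explicit_mapping go to the named schema.org property;
    all other features are exposed as additionalProperty / PropertyValue.
    """
    explicit_mapping = explicit_mapping or {}
    product = {"@type": "Product"}
    extras = []
    for name, value, unit in rows:
        target = explicit_mapping.get(name)
        if target:
            product[target] = value  # rule 1: specific property exists and can be populated
            continue
        pv = {"@type": "PropertyValue", "name": name, "value": value}
        if unit:
            pv["unitText"] = unit  # the shop only stores the unit as a string
        extras.append(pv)
    if extras:
        product["additionalProperty"] = extras
    return product


rows = [("EAN", "4012345678901", None), ("Cable length", "1.5", "m")]
print(export_features(rows, {"EAN": "gtin13"}))
```

With an empty mapping the extension still works out of the box, at the cost of exposing every feature through the generic mechanism.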

vholland Mar 20, 2015

Contributor

Thanks for the explanation, Martin. Would it be possible to add a couple of sentences to the documentation outlining the benefits of using the existing properties? In particular, consumers of the data can make better sense of well-defined properties.

danbri Mar 20, 2015

Contributor

Something along these lines? "Note: publishers should be aware that applications designed to use specific schema.org properties (e.g. http://schema.org/width, http://schema.org/color, http://schema.org/gtin13, ...) will typically expect such data to be provided using those properties, rather than using the generic property/value mechanism."

vholland Mar 20, 2015

Contributor

That works for me.

mfhepp Mar 20, 2015

Contributor

ok!



mfhepp Mar 24, 2015

Contributor

Hi Dan:
I will add this to my pull request asap.

Martin


mfhepp Mar 24, 2015

Contributor

This is now fixed and included in the pull request. See mfhepp@97df3ee

danbri Apr 8, 2015

Contributor

@tmarshbing and I had a chat about this yesterday and would like to work up some 'health warning' text so that publishers understand the extra value that comes from using 'real' schema.org properties. My phrasing above was a bit vague, so we may take @mfhepp's as a starting point:

"So I would tell developers:
Always use specific schema.org properties when a) they exist and b) you can populate them.
Using PropertyValue as a substitute will typically not trigger the same effect as using the original, specific property."

Tom may make a more specific suggestion here...

thadguidry Apr 8, 2015

I would really like for you to mention succinctly that:

Taking Advantage of Schema.org Properties and Promoting Reuse for effective Market Reach

You as a Developer or Business Owner might feel as though your terms and phrasing are better suited than a competitor's or the general market's or even Schema.org's choices, and may think that use of Property/Value will lead to market differentiation and effective reach for your targeted audiences.

However, what you might be doing in actuality is fragmenting your own industry. By not correctly aligning with peers (or even competitors) you might be confusing Search and App filters and your potential customers, ultimately hurting your penetration into those targeted audiences you were striving for. "You shoot yourself in the foot".

But by correctly taking advantage of existing Schema.org aligned terms, concepts, and our existing industry properties, you can help to deliver helpful hints to Search, App, and Market tools & filters. This allows unlimited and unexpected possibilities for your market reach, such as a consumer making their own targeted choice and being highly satisfied finding your product meets their exact needs, as well as allowing marketing tools to leverage in your competitive favor through effective ads, campaigns, & materials...all reusing and sharing the same language and semantics that is Schema.org.

mfhepp Apr 8, 2015

Contributor

I hesitate to use too strong language. We want to motivate sites to publish lots of product details data with that mechanism. While I agree that this should not be regarded as a shortcut that frees lazy developers from using proper schema.org properties, it should also not be described too negatively.

Motivating Web sites to mark-up product data sheets with 50 - 200 properties across hundreds of industries is a huge opportunity, and schema:PropertyValue is, from a few years of trying to lift such data, the most feasible approach so far.

Note that the proposal comes from our attempts to develop extensions that automatically add schema.org markup to Web shop software and PIM applications. They typically manage product features only at the level of named properties - string + value, sometimes an extra string for a unit or interval.

The only way to write such extensions with hassle-free installation is to take the data from the shops and PIM applications as they stand, without asking the shop owners to manually map their properties to standard properties.
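
As a minimal sketch of that "take the data as it stands" approach: the snippet below turns name/value/unit triples, as a shop or PIM system might store them, into JSON-LD using `additionalProperty` and `PropertyValue`. The input format, the function name `features_to_jsonld`, and the sample product are assumptions made for illustration; they are not taken from any of the extensions mentioned above.

```python
import json


def features_to_jsonld(product_name, features):
    """Build a schema.org Product dict with one PropertyValue per feature.

    `features` is a list of (name, value, unit) tuples; unit may be None.
    """
    props = []
    for name, value, unit in features:
        pv = {"@type": "PropertyValue", "name": name, "value": value}
        if unit:
            # Only emit a unit when the shop data actually carries one.
            pv["unitText"] = unit
        props.append(pv)
    return {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": product_name,
        "additionalProperty": props,
    }


# Hypothetical sample data, exactly as a shop might hold it.
doc = features_to_jsonld(
    "Example drill",
    [("Chuck size", 13, "mm"), ("Keyless chuck", "yes", None)],
)
print(json.dumps(doc, indent=2))
```

No per-property mapping is needed from the shop owner here, which is the point of the mechanism; the trade-off is that consumers only see free-text property names.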

Actually, we tried both ways: Some of the extensions we developed allow the granular configuration at the level of individual products or the mapping of popular properties like GTIN13 to standard GoodRelations properties.

Such features have almost never been used, and if used, the result was often unreliable.

So please - let's clarify that additionalProperty is not recommended for existing predefined properties, but that

  1. it is perfectly valid and recommended for non-standard product properties and
  2. it is better to use additionalProperty than not exposing a product feature.

Also note that the semantic heterogeneity of product features is very significant, so it is sometimes really hard to judge whether available data matches an existing property.

Martin

martin hepp http://www.heppnetz.de
mhepp@computer.org @mfhepp



thadguidry Apr 8, 2015

I agree with all your points @mfhepp certainly, who wouldn't.

However, we should try to place a very light and slight onus of due diligence for sites and give them the necessary background information and best practices, despite past historical hassles.
We cannot do everything for everyone but we can provide good guidance and we should....

So I guess what the good guidance encompasses is what is up for discussion...as @danbri says "a health warning text". And I think I make some important points in my version of the "health warning text"

One more point: @mfhepp, don't you agree that we have had that proliferation of data already? The problem was that it was not structured enough, if at all. Sure, Property/Value helps a bit, but taking the time to provide highly structured data benefits everyone. We should always encourage the latter before the former, and that is what I think this "health warning text" should promote, along with your bail-out mechanisms for non-standard properties, which I agree with.


mfhepp Apr 9, 2015

Contributor

I think the current wording as on

http://sdo-property-value-and-cars.appspot.com/additionalProperty

is sufficient.

As for the proliferation of product data semantics: This has plenty of causes and has been an open problem for decades, so while I agree it will be good to strive for more uniform data structures, we should not mix this with the immediate aim of providing a mechanism for exposing such data as it is available now.


jvandriel Apr 9, 2015

If I might speak as an SEO specialist at Sanoma for a moment: for the site I'm currently working on, Property/Value is the only realistic option for providing additional markup for close to a million items. For these items it isn't known upfront whether each is a Product or a Service, nor which specifications it has.

Because these items are added to the site programmatically, there's no way to adjust markup or values manually. More importantly, even if there were a way to do so manually, the site adds, removes, or modifies close to 100,000 items PER DAY, meaning my employer would have to employ ±1000 Jarnos to be able to provide 'highly structured data'. That's definitely not going to happen, so we either deploy the Property/Value solution or don't publish any specifications at all. It's as simple as that.

I agree with @thadguidry that proper guidance should be given, but I also agree with @mfhepp that strong language should be avoided, or else we run the risk that publishers (like the one I work for) feel there's little or no value in publishing Property/Value markup and therefore decide not to publish it at all.

Something I feel would be a big loss as, like @mfhepp, I think 'some structured data' is always better than none at all.


thadguidry Apr 9, 2015

@jvandriel I am sympathetic that there is some effort involved in providing highly structured data. But let us try not to encourage laziness, is all I am saying. In your particular case, there are probably programmatic solutions that solve your issue and would not require more than 1 person to manage. A good algorithm that can give you over 95% accuracy in determining whether something is a Product or a Service is all that you're probably missing. =) And if it does not exist already, it could be built through machine learning and human cognition...even using http://crowdcrafting.org/ or some such.

I just want everyone to do their part, and I understand it's asking others to provide something for free. But we still need to encourage and enlighten them that the time and resources they spend help to expand the knowledge of their products and services.

That includes content providers not taking unnecessary shortcuts by saying "it's too hard". Let's encourage a mentality of "if you think it's hard to provide highly structured data, you might consider that you're not taking the right approach, and there are folks who can certainly help you take the right approach to provide highly structured data via best practices, programmatic solutions, machine learning, and human cognition, just to name a few".

@mfhepp I am not trying to distract from the aim of Property/Value. We need it. Everyone does. I just want to make sure we give proper guidance, advise them that providing structured data is not as hard as it seems, and point out that in many cases programmatic or other solutions exist to help even further.

I just won't accept laziness.


mfhepp Apr 9, 2015

Contributor

@thadguidry As you know, we are in agreement, so let's not start a virtual conflict ;-) but...

"I just won't accept laziness."
If we hadn't accepted laziness, e.g. with respect to broken links and invalid markup, the Web would not have become what it is.

As said: Let's not mix the general aim of providing more machine-friendly information on the Web with the very tangible property-value mechanism for product features.

People have tried for decades to e.g. consolidate taxonomic information about products (UNSPSC, eClass, ....), without major success.

I will be convinced in a minute if you point me to an algorithmic solution that establishes proper alignment between all the standards from

http://www.ebusiness-unibw.org/ontologies/pcs2owl/ (*)

They are all available in OWL and follow a common GoodRelations meta-model. Still I know of no automated solution to align them.

So it should be much easier than the general challenge which you consider "easy" ;-)

Martin

(*) A few of them must be generated locally using http://wiki.goodrelations-vocabulary.org/Tools/PCS2OWL due to copyright restrictions.


mfhepp Apr 9, 2015

Contributor

Also: We should not force publishers of data on the Web to first complete a major data-cleansing and enrichment process before they can use schema.org. That would put a major delay on the whole process.

That being said, we are in agreement that it is perfectly valid to create incentives for them: the better your data, the better the search engines will understand and present your information.


thadguidry Apr 9, 2015

@mfhepp Yup. :) No conflict. I just take a harder stance on the topic than others.

+1 for "health warning text" in some form. Not necessarily mine. But something.


tmarshbing Apr 17, 2015

I am also still very keen on health warning text. I would be fine with the wording @danbri proposed based on @mfhepp's original text: "Always use specific schema.org properties when a) they exist and b) you can populate them. Using PropertyValue as a substitute will typically not trigger the same effect as using the original, specific property."

@thadguidry, I wonder if we could put a longer set of best practices - some version of what you started above - in a doc page/blog post and refer to the best practices also from the health text. In that way, I would hope we could address the concerns about not sounding too negative while still providing enough guidance to prevent publishers from shooting themselves in the foot. Thoughts?
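
The rule in the proposed health warning (use a specific schema.org property when one exists, fall back to PropertyValue otherwise) could be sketched roughly as follows. The lookup table `KNOWN_PROPERTIES`, the helper `add_feature`, and the sample product are hypothetical; `color`, `sku`, and `gtin13` are existing schema.org Product properties, but which shop labels map to them is an assumption a real deployment would have to maintain itself.

```python
# Hypothetical mapping from shop feature labels to specific schema.org
# Product properties. A real deployment would curate its own table.
KNOWN_PROPERTIES = {
    "color": "color",
    "colour": "color",
    "sku": "sku",
    "gtin13": "gtin13",
}


def add_feature(product, name, value):
    """Attach a feature, preferring a specific property over PropertyValue."""
    specific = KNOWN_PROPERTIES.get(name.strip().lower())
    if specific is not None:
        # A matching schema.org property exists, so use it directly.
        product[specific] = value
    else:
        # No known match: expose the feature via additionalProperty.
        product.setdefault("additionalProperty", []).append(
            {"@type": "PropertyValue", "name": name, "value": value}
        )
    return product


product = {"@context": "http://schema.org", "@type": "Product", "name": "Example shirt"}
add_feature(product, "Colour", "blue")
add_feature(product, "Fabric weight", "180 g/m2")
```

An exact-label lookup like this only catches the easy cases; as noted earlier in the thread, judging whether a free-text feature really matches an existing property is often hard.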


thadguidry Apr 17, 2015

@tmarshbing yes, I had the same thoughts. I think a Blog post would be fine, looks like we could collect comments from it also, if need be. Take whatever you want from my example, it's CC0. And blogging it makes it easier for folks to share the info, socially.


danbri May 12, 2015

Contributor

I've added the disclaimer, and also added it to releases.html


vholland Sep 15, 2015

Contributor

@mfhepp @danbri

I am scanning for easy issues to implement or close. Any reason to leave this open?


danbri Sep 15, 2015

Contributor

Good find @mfhepp - it's done. One less open issue :) And thanks everyone for the discussion!

http://schema.org/PropertyValue


@danbri danbri closed this Sep 15, 2015
