christianlupus

The Recipe currently only supports unstructured text for the list of ingredients. This is unfavorable for multiple reasons:

  1. Calculations based on the number of servings are hard to accomplish (they require text parsing).
  2. Grouping of ingredients is not possible. Many recipes have groups of ingredients and refer to these in the instructions (like "put all dry ingredients for the dough into a bowl").
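
To illustrate the first point, here is a minimal Python sketch (not part of the PR) of how serving-based scaling becomes a one-liner once quantities are structured. The layout is an assumption for illustration: it borrows `amountOfThisGood` and `unitText` from schema.org's `TypeAndQuantityNode`, while the sample data and the `scale_ingredients` helper are hypothetical.

```python
def scale_ingredients(ingredients, from_servings, to_servings):
    """Return scaled copies of the ingredients; the input list is left untouched."""
    factor = to_servings / from_servings
    scaled = []
    for item in ingredients:
        copy = dict(item)
        # With a numeric amount there is nothing to parse: just multiply.
        copy["amountOfThisGood"] = item["amountOfThisGood"] * factor
        scaled.append(copy)
    return scaled


# Hypothetical sample data, assuming one structured node per ingredient.
ingredients = [
    {"@type": "TypeAndQuantityNode", "name": "flour", "amountOfThisGood": 500, "unitText": "g"},
    {"@type": "TypeAndQuantityNode", "name": "milk", "amountOfThisGood": 0.25, "unitText": "l"},
]

doubled = scale_ingredients(ingredients, from_servings=4, to_servings=8)
```

With the current text-only form ("500 g flour"), the same operation would first need a parser for amounts, units, and names.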

This PR tries to accomplish this in a backward-compatible manner. It is related to #882 and #2628 and possibly other issues.

In fact, the motivation for filing this PR is this feature request (and similar/duplicate ones not linked here). The nextcloud cookbook app is an extension to the nextcloud server that lets users store recipes on the server to create personal, digital collections of recipes. One of the main development principles is to stick with the schema.org standard when saving recipes on the server. That way, export/import to and from other applications stays as compatible as possible. Accepting this PR would allow us to use the new features while other applications take their time to implement this extension properly.

As this is my first contribution to schemaorg, I hope this suits your process. I did not know how to override the existing recipeIngredient property from within a data/ext/pending/*.ttl file, so I changed schema.ttl directly. If there is a better way to get things done, please tell me. I will try to conform to your requirements.

@RichardWallis
Contributor

The basic proposal makes sense to me.
Although there are exceptions, it is not encouraged to share the name of a Type and a Property (with the only difference being the case of the leading character), as it often causes confusion.

Therefore I suggest that the property recipeIngredientGroup should be renamed, possibly to ingredientGroup.

Having said that, there may be a simpler solution: RecipeIngredientGroup could be added to the rangeIncludes of the existing recipeIngredient property, dispensing with the need for a new property. This would allow ingredients, groups of ingredients, and potentially groups of groups to be listed together.
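
As an illustration only, here is a Python sketch of how a consumer might walk such a mixed list. It assumes that a RecipeIngredientGroup carries a `name` and nests its members under `recipeIngredient` again; neither detail is settled in this discussion, and `flatten_ingredients` is a hypothetical helper.

```python
def flatten_ingredients(items, path=()):
    """Yield (group_path, ingredient) pairs from a mixed list of texts and groups."""
    for item in items:
        if isinstance(item, dict) and item.get("@type") == "RecipeIngredientGroup":
            # Assumption: a group lists its members under the same property name.
            members = item.get("recipeIngredient", [])
            yield from flatten_ingredients(members, path + (item.get("name", ""),))
        else:
            yield path, item


# Hypothetical recipe mixing plain text, a group, and a group inside a group.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Layer cake",
    "recipeIngredient": [
        "1 pinch of salt",
        {
            "@type": "RecipeIngredientGroup",
            "name": "Dough",
            "recipeIngredient": [
                "500 g flour",
                {
                    "@type": "RecipeIngredientGroup",
                    "name": "Wet",
                    "recipeIngredient": ["2 eggs"],
                },
            ],
        },
    ],
}

flat = list(flatten_ingredients(recipe["recipeIngredient"]))
```

A consumer that only understands plain text can still take the second element of each pair and ignore the group path.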

As to submitting the proposal, the definition should ideally be placed in a suitably named .ttl file in the pending section with each new term being identified as :isPartOf <http://pending.schema.org> ;.

Making adjustments to terms defined elsewhere can be a bit of a challenge in this structure. However, I believe here you are only adding to existing term definitions (eg. expanding a rangeIncludes). In which case, as all definitions are eventually merged into a single graph, just defining those extra elements within the pending .ttl file should suffice.

Finally, to be an easily understood PR, and hopefully useful addition to the vocabulary, it should be accompanied by some examples.

@christianlupus, ping me if you need any explanation of the format of the PR submission.

@christianlupus christianlupus marked this pull request as draft October 14, 2020 10:41
Signed-off-by: Christian Wolf <github@christianwolf.email>
Signed-off-by: Christian Wolf <github@christianwolf.email>
@christianlupus christianlupus force-pushed the dev/recipeinstructiongroup branch from abc0d3e to 7b23dab Compare October 14, 2020 10:51
Signed-off-by: Christian Wolf <github@christianwolf.email>
Signed-off-by: Christian Wolf <github@christianwolf.email>
@christianlupus
Author

@RichardWallis I tried to incorporate your comments into my changes here. I moved the changes mainly to an ext/pending file. The only other change removes a comment to make Travis happy.

Otherwise, I went with your suggestion to make the groups a valid type for recipeIngredient. To be honest, this was my first idea; I had made it more complex to avoid groups of groups of groups. On the other hand, the schema should not keep a developer from a good design but allow all the flexibility described in the "about us" texts.

I also added an additional example to cover the two additional cases. I just used the next free example number (eg-0458). Was this correct or is there a dedicated process to generate these numbers?

I will wait for your comments; then I can either correct the PR or mark it as ready for review.

@RichardWallis
Contributor

@christianlupus Definition looks good and thanks for the example

Example numbers will be auto allocated at release time.

Let's see what others have to say on what you propose.

@christianlupus christianlupus marked this pull request as ready for review October 14, 2020 14:59
@christianlupus
Author

May I just ask how this PR is going to proceed? No discussion has yet taken place here. So I am just curious about the further steps.

@lecoqlibre

I also miss the grouping and quantity features. I like the PR you proposed. Any news on when it might be merged?

@ffes

ffes commented Nov 19, 2020

Any chance this is going to make it into v11? It is not mentioned in #2768.

@danbri
Contributor

danbri commented Nov 22, 2020

Thanks for making a detailed design proposal. There are certainly many ways that recipe markup could be improved. As we've tried to emphasise in various places the project tries to stay close to large scale usage of structured data, so this should probably be put aside until we can find commitments or at least interest from new or existing applications that use schema.org Recipe data (as well as a few publishers, ideally). Do we have any progress on that side of things?

@danbri
Contributor

danbri commented Nov 22, 2020

(in general we don't jump straight to PRs until there's some consensus from issue discussions to "go for it", otherwise we accumulate code that can go out of date relatively easily.)

@jaygray0919

IOHO this could become a "Down the Rabbit Hole" experience. For example, here is the BBC food/recipe ontology: https://www.bbc.co.uk/ontologies/fo. Relatively simple, but more complex than used in schema.org. Internally we use a recipe ontology that is ~20x the complexity of the BBC design.

The approach we take:

  • use schema for components common to what we do (i.e. schema components in all ontologies);

  • then, customize for specific use cases.

Adding complexity to recipe may not be a good use of resources and may conflict with other specialized ontologies that are increasingly designed to complement schema.org

@christianlupus
Author

So, am I getting your suggestion right @jaygray0919 to define our own ontology and schema (as a private extension)?

@jaygray0919

That's how we handle it. In simple terms, if a schema @Type and @Property exists, we use it as the default. @Thing is the obvious default, but Recipe/nutrition/NutritionInformation is comprehensive. Then, when more information is required, we use @Property and @Class together with properties such as domainIncludes and rangeIncludes. This technique is not "private". The Google SDTT will properly parse and interpret that entailment. Unfortunately, Google indicates they plan to deprecate SDTT in favor of RRTT (which is like using the Wayback Machine for content search). But that does not invalidate the approach. The JSON-LD Playground (as a representative parser) will properly consume the above entailment.

@christianlupus
Author

  • then, customize for specific use cases.

To me, that sounds like a "fork" of schema.org with your required changes to the definitions. Therefore I said you defined your own schema.

Adding complexity to recipe may not be a good use of resources and may conflict with other specialized ontologies that are increasingly designed to complement schema.org

It might be adding a bit of complexity here. However, I have seen many sites use such a notion of grouped ingredients anyway. They just do not put these groups into the JSON+LD as it is not supported.

That's how we handle it. In simple terms, if a schema @Type and @Property exists, we use it as the default. @Thing is the obvious default, but Recipe/nutrition/NutritionInformation is comprehensive. Then, when more information is required, we use @Property and @Class together with properties such as domainIncludes and rangeIncludes. This technique is not "private". The Google SDTT will properly parse and interpret that entailment. Unfortunately, Google indicates they plan to deprecate SDTT in favor of RRTT (which is like using the Wayback Machine for content search). But that does not invalidate the approach. The JSON-LD Playground (as a representative parser) will properly consume the above entailment.

I am sorry, but I cannot get my head around your point. Are you talking about JSON+LD or something else? What are @Property and @Thing (sorry if it is obvious, but I have not yet stumbled over these terms in the schema.org documentation)?
I get the feeling you are referring to a very general way of creating an ontology by defining your own types just as you see fit. No offense, but please make clearer what you are thinking of (or give an example if possible).

My understanding is that a standard should be generated such that all can stick with it. The idea behind a standard is to avoid a distributed and inconsistent usage of different flavors of it.

@jaygray0919

@christianlupus

For your investigation:
https://schema.org/Thing
https://schema.org/Class
https://schema.org/Property

No "forking" - pure JSON-LD + schema.org

After you've studied how to use those @Types I'll share practical examples.
But you need to understand those concepts first; otherwise examples will not make sense.

@christianlupus
Author

@jaygray0919 I am pretty sure I know what properties and classes are. I was not aware that you abbreviate the corresponding schemata as @Foo (for http://schema.org/Foo).

Apart from that, you seem to describe the general modeling used by schema.org. Above you write about Recipe/nutrition/NutritionInformation, which is a very specific type. Sorry, but I do not see the point of that statement. Maybe the examples will help me grasp it. This PR is related to ingredients, aka Recipe.recipeIngredient.

Just to make my point clear: If you have a Recipe (according to schema.org) as JSON+LD with a property Recipe.recipeIngredient of any type that is non-Text, you by definition extend the standard defined by schema.org. As long as you are using the version only internally to e.g. store your recipes without publishing them, I was calling this a private (or forked) schema. This is then completely unrelated to google/SDTT/RRTT.
If it should be published in some way, it should be standardized in order for all others to understand your work.

What is this "Down the Rabbit Hole" experience you fear? The changes are 100% backward compatible, so existing code and pages do not need to be changed. But they add some flexibility, especially since many pages already present their ingredients in groups, and would simply allow a clean structure for the ingredients.

@jaygray0919

@christianlupus Sorry, I failed to make my point.

Try this approach. Use @Property and @Class to define your requested extension. There will be two benefits to that exercise.

  1. You can explain an extension using schema.org (schema.org is a self-defining language). You also could do this in RDF, but it's easier with schema.org (after all, schema.org is an RDF "application").

  2. You can validate your extension using either JSON-LD Playground or SDTT. When I say 'validate' I mean: confirm that your entailments are understood by harvesters. JSON-LD Playground will basically "play back" your JSON-LD structure. But SDTT will process the entailments and play them back in a different format (allowing you to compare one format with another to confirm semantic equivalence).

If you follow this approach, you will use schema.org to define new entailments (new logic). Further, you'll find that you don't need to request modifications to the schema data model - you create your own extensions.
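
One possible reading of this approach, sketched in Python so the structure can be built and inspected programmatically. Everything under the `ex:` prefix is hypothetical (the namespace, `ex:IngredientGroup`, `ex:ingredientGroup`), as is the reuse of `recipeIngredient` inside the group; only `Class`, `Property`, `domainIncludes`, and `rangeIncludes` come from schema.org itself. The sketch makes no claim about how any particular harvester treats such terms.

```python
import json

# Hypothetical namespace for locally defined terms; not a real vocabulary.
EX = "https://example.org/recipe-ext#"

document = {
    "@context": {"schema": "https://schema.org/", "ex": EX},
    "@graph": [
        # A locally defined class, described with schema.org's own meta-types.
        {"@id": "ex:IngredientGroup", "@type": "schema:Class", "schema:name": "IngredientGroup"},
        # A locally defined property linking Recipe to that class.
        {
            "@id": "ex:ingredientGroup",
            "@type": "schema:Property",
            "schema:domainIncludes": {"@id": "schema:Recipe"},
            "schema:rangeIncludes": {"@id": "ex:IngredientGroup"},
        },
        # A recipe that uses the local terms alongside standard ones.
        {
            "@type": "schema:Recipe",
            "schema:name": "Pancakes",
            "ex:ingredientGroup": {
                "@type": "ex:IngredientGroup",
                "schema:name": "Batter",
                # Assumption: the group reuses recipeIngredient for its members.
                "schema:recipeIngredient": ["200 g flour", "2 eggs"],
            },
        },
    ],
}


def local_terms(doc, prefix="ex:"):
    """List the ids of the terms this document defines in its own namespace."""
    return [node["@id"] for node in doc["@graph"] if node.get("@id", "").startswith(prefix)]


serialized = json.dumps(document, indent=2)
```

The `serialized` string can be pasted into a JSON-LD processor to look at the expanded form; `local_terms(document)` shows which definitions a consumer would have to learn from this file alone.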

Now you might say: but that's a non-standard approach and I am proposing something that should be standardized.

My response is: if a harvester like SDTT (or Structured Data Linter) "knows what you are saying" and plays back a confirmatory representation, you've achieved your objective: full semantic communication among semantic processors.

@christianlupus
Author

OK, now we are getting closer, as I see.

First, let me clarify the main source of confusion in this discussion: you are assuming I am a producer/supplier of JSON data. I see myself as both a supplier and a harvester. The app I mentioned in the very first comment allows importing recipes from sites that provide JSON+LD data. Thus we need to be able to read these and are interested in a standardized way to save them. Otherwise, we will have to do fancy stuff, in direct contrast to the idea of a semantic web where a machine can understand the content of a page. So it is not sufficient that google/SDTT/RRTT can understand our JSON; we must also parse other people's JSON files.
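
To make the harvester side concrete, here is a minimal Python sketch of such an import step, using only the standard library. It is not the cookbook app's actual code: `JsonLdExtractor` and `find_recipes` are hypothetical names, and the sketch assumes the page embeds its data in `<script type="application/ld+json">` blocks.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole import


def find_recipes(html):
    """Return every JSON-LD node in the page whose @type includes Recipe."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    recipes = []
    for doc in parser.documents:
        if isinstance(doc, dict):
            nodes = doc.get("@graph", [doc])
        elif isinstance(doc, list):
            nodes = doc
        else:
            continue
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Recipe" in types:
                recipes.append(node)
    return recipes


# Hypothetical page content for demonstration.
page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Recipe", '
    '"name": "Soup", "recipeIngredient": ["1 l water"]}'
    "</script></head><body></body></html>"
)
found = find_recipes(page)
```

Everything after this step depends on what the publisher put into `recipeIngredient`, which is why a shared structure matters to an importer.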

Nevertheless, I tried to do as you described. We might use this approach for our internal files as well. So far I have ended up with this file, which can be fed directly to SDTT using the raw GitHub link. As working this out is completely independent of this PR, I will "outsource" it to another channel and contact @jaygray0919 via mail.

@YoYBaBy

YoYBaBy commented Nov 28, 2020

ok

@ffes

ffes commented Dec 4, 2020

@christianlupus Any reason why the values of amountOfThisGood are strings and not numbers in your JSON example?

If I understand https://schema.org/TypeAndQuantityNode correctly, they should be numbers.

@christianlupus
Author

@ffes I am sorry, you are right. They should be numbers. I changed the example accordingly. Thanks for the enhancement.
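
For importers that still encounter string amounts in the wild, a small Python sketch of a tolerant normalization step follows. `coerce_amount` is a hypothetical helper and the sample node is made up; the sketch converts only values that parse cleanly as numbers and leaves everything else alone.

```python
def coerce_amount(value):
    """Return a number for numeric-looking strings; leave other values untouched."""
    if isinstance(value, str):
        text = value.strip()
        try:
            return int(text)
        except ValueError:
            try:
                return float(text)
            except ValueError:
                return value  # e.g. "a pinch": keep as-is rather than guess
    return value


# Hypothetical node with the amount given as a string.
before = {"@type": "TypeAndQuantityNode", "amountOfThisGood": "250", "unitText": "g"}
after = {**before, "amountOfThisGood": coerce_amount(before["amountOfThisGood"])}
```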

@github-actions

github-actions bot commented Feb 4, 2021

This pull request is being tagged as Stale due to inactivity.

@github-actions github-actions bot added the no-pr-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label Feb 4, 2021
@ffes

ffes commented Feb 4, 2021

Would be a shame if this PR would be closed by a bot without a proper review.

@jaygray0919

Would be a shame if this PR would be closed by a bot without a proper review.

I looked at it, discussed it with the OP, feel that the PR has limitations, and presented an alternative approach.

@christianlupus
Author

As I just wrote via email, I had a few pressing things at hand, and the discussion @jaygray0919 and I had via mail was not as fruitful as I had hoped. It took quite a lot of work to understand his point, and I was not able to get it in a short time. Now, I need to revisit things once more.

Nevertheless, I still have doubts about it. As already stated, my intention is to have a standardized way to parse such recipes. For example, in nextcloud/cookbook#563 a user wants to import a recipe from a third-party page into our app. Normally, we would just parse the JSON+LD that the page provides and get a logical structure of the recipe (which is the idea behind schema.org). As structured ingredients are not supported by the standard, all data providers either fall back to one big list of ingredients (as in the named example) or do their own thing, which is clearly not consistent across sites.

I am pretty sure that the discussion I have had so far with @jaygray0919 results in a local (per file/resource) extension of the schema.org standard. As I am arguing from the data consumer's perspective, any such extensions are worthless unless I have the computational power and knowledge to run heuristics that allow my parser to understand other people's thoughts (the same way the JSON-LD Playground or Google's structured data parser do). In fact, this contradicts the basic idea behind schema.org and a semantic web.

Apart from that, I do not really know what I can do in order to avoid the bot closing this PR without proper review as formulated by @ffes. Of course, I can rebase to make it mergeable again and use a free example id to make travis happy but this seems more like esthetics and less fundamental if this PR should be merged at all. If that's all it takes, I am happily going to make it conflict-free.

@jaygray0919

jaygray0919 commented Feb 6, 2021

The OP revisits a common thread with schema.org: for any given @Type and @Property there are values in a specialized domain that are more expressive.

A publisher can handle this by using schema.org @Class and @Property (i.e. create extensions to express meaning). Those extensions are properly evaluated by harvesters such as Google Structured Data Testing Tools and Gregg Kellogg's Structured Data Linter (@gkellogg). We do this routinely.

But that process may not help a subscriber/consumer (e.g. @christianlupus) who transforms less-structured data into more-structured data.

The OP is advocating a variation of https://bioschemas.org/, GoodRelations, and similar groups who negotiate to enhance schema.org. But BioSchema/GoodRelations leverage carefully negotiated, arbitrated specifications from other ontologies - i.e. @Type and @Property specifications that have been "worked out" over time.

One possible solution would be to allow pending specifications to be tested (e.g. FDA "field trials"). Pending specifications must be constructed in the same form as schema.org is constructed (an RDF model). Pending specifications must be valid (i.e. properly parsed by SDL or GSDTT if/when it is reincarnated as a new service).

Similar to schema.org 'pending extensions', "pending specifications" are published on GitHub or another sandbox. Sample documents, composed (authored) or recomposed (variations of the old ETL approach: extract, transform, load), would be added there. Over time, contributions would reveal patterns and variations. As new @Type and @Property specifications stabilize (stabilise for @danbri), they could be moved to 'pending extensions'.

For consideration.

@christianlupus
Author

I just wrote a mail to @jaygray0919 in order to sort this out as much as possible.

I'd rather have this PR finished, as it causes me quite some headaches. If anyone has a good idea how to proceed from here and get things going, please give me a hint. I am a bit lost on this.

@RichardWallis
Contributor

See issue #2992.

The code in the branch behind this PR needs rebasing against the main branch in the schemaorg repository. This will enable it to pass the CI tests which it currently fails.

cc @christianlupus

@makuser

makuser commented Dec 3, 2021

The code in the branch behind this PR needs rebasing against the main branch in the schemaorg repository. This will enable it to pass the CI tests which it currently fails.

As @christianlupus already wrote in February, the question is whether making CI happy is all that is required to have this PR merged. If not, whatever else prevents it needs to be discussed anyway, and further changes to the PR may be needed.
Quote from February:

Of course, I can rebase to make it mergeable again and use a free example id to make travis happy but this seems more like esthetics and less fundamental if this PR should be merged at all. If that's all it takes, I am happily going to make it conflict-free.

@danbri danbri self-assigned this Dec 3, 2021
@danbri
Contributor

danbri commented Jan 21, 2022

As I mentioned earlier it is premature to jump straight to a PR of this complexity without any evidence that parties intend to actually consume this new data structure for significant user-facing services.

Having said that, these are sensible issues being raised. If #882 and #2628 don't cover everything it would be good to have those raised, but as an ISSUE not a PULL REQUEST.

@christianlupus - thank you for pursuing this. The issue is not being closed by a bot, but by me, on the basis that it is premature to line up these changes to merge into the site before the consensus has been built to use the new structures. Leaving the PR open creates a situation in which version drift etc. will generate busywork for the PR proposer to maintain the proposal, and that seems a poor use of your time.
