schema.org repo has dropped most RDFa files #133

Closed
Gummibeer opened this issue Jun 3, 2020 · 19 comments · Fixed by #140

@Gummibeer
Collaborator

Gummibeer commented Jun 3, 2020

This is blocking #80 and future type updates.

A benefit of this change is that the files are now part of release folders, so it should be easier for us to decide whether a new update is breaking or not.
As of the latest version, everything is released as:

  • JSON-LD
  • NQ
  • NT
  • RDF
  • TTL
  • CSV

Even pending types seem to be versioned, so we could get rid of all the "missing XYZ type" issues.
In my opinion we should decide which new format to use. For RDF there is a library, easyrdf, and it seems similar to RDFa: both are XML, so in the worst case we could parse it ourselves.
While doing this, I would introduce sub-namespaces for Pending and Extensions, so that we always parse all available/official types.

Because of the new namespaces, the included extensions, the new major schema version and so on, I would say we should release this change as a new major version.
It's also a critical one, because until this is fixed we can't proceed with #80.


Update 2020-08-24: @mallardduck will prepare a JSON-LD parser for the current v7 to check whether it works, and afterwards upgrade it to v8 and v9 in consecutive PRs.
#133 (comment)

@Gummibeer Gummibeer added the bug label Jun 3, 2020
Gummibeer added a commit that referenced this issue Jun 3, 2020
@Gummibeer
Collaborator Author

@sebastiandedeyne do you have any opinions/favorites/ideas? Or should I simply recreate the whole generator with RDF (my favorite), plus other breaking changes, and tag a new major version?

@boospot

boospot commented Jul 22, 2020

Just an update:
the 9.0 release skips the .rdfa file completely:
https://github.com/schemaorg/schemaorg/tree/master/data/releases/9.0

I could only see the schema.ttl file.

@Gummibeer
Collaborator Author

@boospot thanks for letting us know! 🙂
I believe the all-http.rdf and current-http.rdf files are the successors of the previous schema.rdf file.

@boospot

boospot commented Aug 24, 2020

@Gummibeer

Is it final that this package will not receive any updates, or is there a chance the team will update it once they find time?

@Gummibeer
Collaborator Author

@boospot

It's final that right now no maintainer is available to upgrade the generator to a newer version. If @spatie itself or I need v8 or v9, we will upgrade it.
We also accept PRs that upgrade the generator, and if you are willing to maintain this package, @freekmurze is the right person to ask.

So if you need v8 or v9 types, we would welcome a PR. 🙂

@mallardduck
Contributor

I wonder how complex/hard it would be to switch from RDFa to JSON-LD, so I've started poking around since I have some free time today. I'll let y'all know how it goes, in case this seems like a viable patch for me to create.

My initial goal would be simply to switch to the 7.04 version of the schema.jsonld file. That way I can first make sure it generates identical library code, and only then worry about bumping the schema version in use.
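For reference, a minimal sketch of that first parsing step, written in Python purely for illustration (the package itself is PHP). It assumes the release file has a top-level `@graph` array whose nodes carry `@type` values of `rdfs:Class` or `rdf:Property`, as in the 7.04 snippets quoted later in this thread; `index_graph` and the inline sample are hypothetical.

```python
import json


def as_list(value):
    """JSON-LD allows either a single value or an array; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_graph(document):
    """Split the top-level "@graph" into classes and properties keyed by "@id"."""
    classes, properties = {}, {}
    for node in document.get("@graph", []):
        types = as_list(node.get("@type"))
        if "rdfs:Class" in types:
            classes[node["@id"]] = node
        elif "rdf:Property" in types:
            properties[node["@id"]] = node
    return classes, properties


# Tiny hypothetical stand-in for schema.jsonld; the real file is much larger.
SAMPLE = """
{
  "@graph": [
    {"@id": "http://schema.org/Offer", "@type": "rdfs:Class", "rdfs:label": "Offer"},
    {"@id": "http://schema.org/offeredBy", "@type": "rdf:Property", "rdfs:label": "offeredBy"}
  ]
}
"""

classes, properties = index_graph(json.loads(SAMPLE))
print(sorted(classes))     # ['http://schema.org/Offer']
print(sorted(properties))  # ['http://schema.org/offeredBy']
```

With the real release, `json.load` on the downloaded file would replace the sample string. `as_list` is there because JSON-LD permits a node to carry either one `@type` or an array of them.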

@Gummibeer
Collaborator Author

@mallardduck Good one! After a quick check, it shouldn't be too hard. JSON parsing is built into PHP, and the structure and keys are very similar or the same.
So you should only need to update the parser, not the data transfer objects or the generator. 🎉

Thanks for taking a look! 🙂 If you need any help/feedback just ask. I'm still here. 😉

@mallardduck
Contributor

@Gummibeer I'm wondering about your thoughts on the 'constants': do they need to be preserved, or are they simply informational? If they are just informational, do they provide enough value that they should be required?

I ask because they don't seem to be included in the JSON-LD files in any significant or consistent way. From what I can see, they haven't been included in schema.jsonld since the 3.x releases.

Other than this aspect of the parser, things were moving pretty smoothly! So I'm curious whether the simple option of dropping the constants makes sense?

@Gummibeer
Collaborator Author

Hey,

do you mean the enums like RestrictedDiet?
https://schema.org/RestrictedDiet
https://github.com/spatie/schema-org/blob/d44026cdf6fa874b64290d8267cb7acd6a613652/src/RestrictedDiet.php
In this case they are part of the JSON-LD:
https://github.com/schemaorg/schemaorg/blob/c4dafe190bd60b6ef63ee0c0b2ad640ff8b5022f/data/releases/8.0/schema.jsonld#L2815-L2820

They shouldn't be dropped, because they are required, for example, in Recipe.

One "improvement" for them could be to switch to https://github.com/spatie/enum, but that would be a future improvement/change.
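Enumeration members such as those of RestrictedDiet appear in the JSON-LD as nodes whose `@type` is the enumeration itself, so a parser could collect the constants roughly as below. This is a Python sketch under that assumption; `enumeration_members` is a hypothetical helper and the graph excerpt is hand-written, not copied from a release.

```python
def as_list(value):
    """JSON-LD allows either a single value or an array; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def enumeration_members(graph, enum_id):
    """Labels of nodes typed as enum_id, i.e. the members of that enumeration."""
    return sorted(
        node.get("rdfs:label", node["@id"])
        for node in graph
        if enum_id in as_list(node.get("@type"))
    )


RESTRICTED_DIET = "http://schema.org/RestrictedDiet"

# Hand-written excerpt shaped like the JSON-LD release.
graph = [
    {"@id": RESTRICTED_DIET, "@type": "rdfs:Class", "rdfs:label": "RestrictedDiet"},
    {"@id": "http://schema.org/DiabeticDiet", "@type": RESTRICTED_DIET, "rdfs:label": "DiabeticDiet"},
    {"@id": "http://schema.org/GlutenFreeDiet", "@type": RESTRICTED_DIET, "rdfs:label": "GlutenFreeDiet"},
    {"@id": "http://schema.org/Recipe", "@type": "rdfs:Class", "rdfs:label": "Recipe"},
]

print(enumeration_members(graph, RESTRICTED_DIET))  # ['DiabeticDiet', 'GlutenFreeDiet']
```

The enumeration class itself is skipped because its own `@type` is `rdfs:Class`, not `RestrictedDiet`.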

@mallardduck
Contributor

That wasn't the specific example I was considering, so it seems to be a mixed bag. Thankfully you're right: those examples are included in the JSON-LD version too.

The specific example I was thinking of would be from Organization here:

const ActionCollabClass = 'http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_ActionCollabClass';

const WikiDoc = 'http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_WikiDoc';

(The second one was the one I was focusing on debugging with for context.)

@mallardduck
Contributor

Thankfully since they're in the JSON-LD definitions the ones you're referring to should be easy enough to get working. 😄

@Gummibeer
Collaborator Author

It seems these constants are the result of wrongly imported/parsed/generated code. 🤔
I can't find anything useful about them on the schema.org website.

They seem to be parsed from these lines:
https://github.com/schemaorg/schemaorg/blob/c4dafe190bd60b6ef63ee0c0b2ad640ff8b5022f/data/releases/7.04/schema.rdfa#L7681-L7734

I believe that it would be okay to drop them.
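If the generator ever needs a guard against this kind of leakage, one option is to accept only constants whose IRI lives in the schema.org namespace. A Python sketch, assuming candidate values are collected as full IRIs; `schema_constants` and `SCHEMA_PREFIXES` are hypothetical, and the two W3C wiki URLs are the constants quoted above.

```python
# Accepting both the http and https forms of the namespace is an assumption.
SCHEMA_PREFIXES = ("http://schema.org/", "https://schema.org/")


def schema_constants(candidate_iris):
    """Keep only IRIs inside the schema.org namespace, dropping links such as
    the W3C wiki SchemaDotOrgSources anchors."""
    return [iri for iri in candidate_iris if iri.startswith(SCHEMA_PREFIXES)]


candidates = [
    "http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_ActionCollabClass",
    "http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_WikiDoc",
    "http://schema.org/DiabeticDiet",
]

print(schema_constants(candidates))  # ['http://schema.org/DiabeticDiet']
```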

@mallardduck
Contributor

So I got things to a working point, but was quite confused about why so many methods were changing. Not just a few: a lot of schemas were losing properties and methods.

After looking into it further, I've concluded that it was an "upstream" issue with how schemaorg/schemaorg generated the RDFa file. At least that is the simplest explanation I have for so many things changing with my JSON-LD based generator.

I examined the 7.04 release version of the files and found a few good examples of conflicts between RDFa and JSON-LD. So this will produce a lot more changes than I expected. I understand that this makes reviewing the generated changes less than ideal, but it seems to be beyond our control.


Examples

offeredBy

<div typeof="rdf:Property" resource="http://schema.org/offeredBy">
      <span class="h" property="rdfs:label">offeredBy</span>
      <span property="rdfs:comment">A pointer to the organization or person making the offer.</span>
      <link property="http://schema.org/inverseOf" href="http://schema.org/makesOffer"/>
      <span>Domain: <a property="http://schema.org/domainIncludes" href="http://schema.org/Offer">Organization</a></span>
      <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/Person">Person</a></span>
      <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/Organization">Offer</a></span>
</div>

Note: the "Domain" shows Organization as the inner text but Offer as the href. Likewise, the second "Range" shows Offer as the inner text but Organization as the href.

In contrast, the JSON-LD file is consistent with the href values of the RDFa:

{
    "@id": "http://schema.org/offeredBy",
    "@type": "rdf:Property",
    "http://schema.org/domainIncludes": {
        "@id": "http://schema.org/Offer"
    },
    "http://schema.org/inverseOf": {
        "@id": "http://schema.org/makesOffer"
    },
    "http://schema.org/rangeIncludes": [
        {
            "@id": "http://schema.org/Organization"
        },
        {
            "@id": "http://schema.org/Person"
        }
    ],
    "rdfs:comment": "A pointer to the organization or person making the offer.",
    "rdfs:label": "offeredBy"
}
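The offeredBy node also shows a JSON-LD detail the parser has to handle: `domainIncludes` is a single object here, while `rangeIncludes` is an array. A Python sketch that normalizes both shapes, using a trimmed copy of the JSON above; `referenced_ids` and `short_name` are hypothetical helpers.

```python
import json

DOMAIN = "http://schema.org/domainIncludes"
RANGE = "http://schema.org/rangeIncludes"


def referenced_ids(node, key):
    """Return the "@id"s under key, whether JSON-LD stored one object or an array."""
    value = node.get(key)
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["@id"] for item in items]


def short_name(iri):
    """Turn 'http://schema.org/Offer' into 'Offer'."""
    return iri.rsplit("/", 1)[-1]


offered_by = json.loads("""
{
  "@id": "http://schema.org/offeredBy",
  "@type": "rdf:Property",
  "http://schema.org/domainIncludes": {"@id": "http://schema.org/Offer"},
  "http://schema.org/rangeIncludes": [
    {"@id": "http://schema.org/Organization"},
    {"@id": "http://schema.org/Person"}
  ]
}
""")

print([short_name(i) for i in referenced_ids(offered_by, DOMAIN)])  # ['Offer']
print([short_name(i) for i in referenced_ids(offered_by, RANGE)])   # ['Organization', 'Person']
```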

interactionStatistic

In this case the RDFa has only a single domain (CreativeWork), but the JSON-LD lists three: Person, Organization and CreativeWork.

<div typeof="rdf:Property" resource="http://schema.org/interactionStatistic">
    <span class="h" property="rdfs:label">interactionStatistic</span>
    <span property="rdfs:comment">The number of interactions for the CreativeWork using the WebSite or SoftwareApplication. The most specific child type of InteractionCounter should be used.</span>
    <span>Domain: <a property="http://schema.org/domainIncludes" href="http://schema.org/CreativeWork">CreativeWork</a></span>
    <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/InteractionCounter">InteractionCounter</a></span>
</div>

https://github.com/schemaorg/schemaorg/blob/c4dafe190bd60b6ef63ee0c0b2ad640ff8b5022f/data/releases/7.04/schema.rdfa#L10219

{
    "@id": "http://schema.org/interactionStatistic",
    "@type": "rdf:Property",
    "http://purl.org/dc/terms/source": {
        "@id": "https://github.com/schemaorg/schemaorg/issues/2421"
    },
    "http://schema.org/category": "issue-2421",
    "http://schema.org/domainIncludes": [
        {
            "@id": "http://schema.org/Person"
        },
        {
            "@id": "http://schema.org/Organization"
        },
        {
            "@id": "http://schema.org/CreativeWork"
        }
    ],
    "http://schema.org/rangeIncludes": {
        "@id": "http://schema.org/InteractionCounter"
    },
    "rdfs:comment": "The number of interactions for the CreativeWork using the WebSite or SoftwareApplication. The most specific child type of InteractionCounter should be used.",
    "rdfs:label": "interactionStatistic"
}

https://github.com/schemaorg/schemaorg/blob/c4dafe190bd60b6ef63ee0c0b2ad640ff8b5022f/data/releases/7.04/schema.jsonld#L12431
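When reviewing the regenerated classes, a per-property diff makes such conflicts easy to spot. For interactionStatistic it would explain why generated classes such as Person and Organization presumably gain the property. A hypothetical Python helper, with the domain lists typed in by hand from the snippets above rather than parsed.

```python
def diff_terms(rdfa_terms, jsonld_terms):
    """Report which terms appear in only one of the two sources."""
    rdfa, jsonld = set(rdfa_terms), set(jsonld_terms)
    return {
        "only_in_rdfa": sorted(rdfa - jsonld),
        "only_in_jsonld": sorted(jsonld - rdfa),
    }


# Domains of interactionStatistic, copied by hand from the 7.04 snippets above.
rdfa_domains = ["CreativeWork"]
jsonld_domains = ["Person", "Organization", "CreativeWork"]

print(diff_terms(rdfa_domains, jsonld_domains))
# {'only_in_rdfa': [], 'only_in_jsonld': ['Organization', 'Person']}
```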

@mallardduck
Contributor

So keep all that in mind, since it makes comparing a little harder. I'll send a PR over soon, once I've reviewed things further.

🚀

@Gummibeer
Collaborator Author

Hey,
Could you please split the generator changes from the src ones?
Ideally there would be:
one branch containing the generator changes, targeting the upstream master branch,
and one containing the generator and src changes, targeting the branch with the generator changes.
This way I can easily review the generator itself and also the changes in src.

I would say the JSON-LD is always right. I've already had "endless" issues with the RDFa mixing different releases, listing pending types as the released type of properties without including the pending type definition, and so on. Not to mention typos.
No idea if they're generating it by hand!?^^

PS: I might quickly merge them into new branches in this repo, because that's a lot easier for me than handling forked branches.^^
Thanks again for your help! 🎉🚀

@mallardduck
Contributor

@Gummibeer Unfortunately, it looks like the JSON-LD is just as prone to including pending type definitions. 🤦

Forgive the single-commit branch; I had things in a more logical flow of commits as I worked. However, I accidentally nuked the local repo I was in and had to restore the work from PhpStorm buffers, so it's all one squashed commit now. But #140 has just the generator changes, as requested.

I wasn't exactly sure how you wanted the second one done, and my brain was kinda mush after the panic of nuking my local repo and restoring it all. But #141 is that one, and there the tests fail due to the inconsistent publishing.

A specific example of the inconsistent publishing is MedicalOrganization (published, core) compared to MedicalBusiness (unpublished, extension definition). For some reason the Dentist definition in the JSON-LD includes both as parent types, so the generated class does as well. As a result, it fails to find/generate the MedicalBusinessContract.

I'm looking into that specific case now to see how it can be accounted for. Maybe I goofed up while refactoring the section you have that helps mark those as pending.

@Gummibeer
Collaborator Author

As far as I know, I flagged types as pending if they are only used but never defined.

And from what I've seen, v9 has a much shorter file list: they only publish files containing everything. So v9 may answer the question of how we should handle extensions. 🙈

I'll check out your PRs in a few hours. 🎉
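The "used but never defined means pending" rule can be sketched in Python as a set difference over the graph. The key names and the `undefined_types` helper are assumptions for illustration, not the generator's actual code; the excerpt mimics the Dentist/MedicalBusiness case from #141.

```python
def as_list(value):
    """JSON-LD allows either a single value or an array; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Keys that point at other types. The schema.org keys follow the 7.04 JSON-LD
# snippets in this thread; "rdfs:subClassOf" is assumed to be spelled this way.
REFERENCE_KEYS = (
    "rdfs:subClassOf",
    "http://schema.org/domainIncludes",
    "http://schema.org/rangeIncludes",
)


def undefined_types(graph):
    """Types that are referenced somewhere but never defined as rdfs:Class."""
    defined = {
        node["@id"] for node in graph
        if "rdfs:Class" in as_list(node.get("@type"))
    }
    referenced = {
        ref["@id"]
        for node in graph
        for key in REFERENCE_KEYS
        for ref in as_list(node.get(key))
    }
    return sorted(referenced - defined)


# Hypothetical excerpt: MedicalBusiness is used as a parent type of Dentist
# but has no definition of its own in this file.
graph = [
    {"@id": "http://schema.org/MedicalOrganization", "@type": "rdfs:Class"},
    {
        "@id": "http://schema.org/Dentist",
        "@type": "rdfs:Class",
        "rdfs:subClassOf": [
            {"@id": "http://schema.org/MedicalOrganization"},
            {"@id": "http://schema.org/MedicalBusiness"},
        ],
    },
]

print(undefined_types(graph))  # ['http://schema.org/MedicalBusiness']
```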

@yellow1912

Hello,

It's been over a year. I understand you guys are very busy and all. I just want to check whether it's safe to use this in an active project to generate schema (mainly for search engines).

@Gummibeer Gummibeer linked a pull request Oct 3, 2021 that will close this issue
@Gummibeer
Collaborator Author

Hey, I'm sorry for the missing link and the wrongly open issue.
The related PR was merged ~1 year ago. 🙈
So yes, everything is fine. The schema.org spec also doesn't change so much that a one-year-old spec would be a massive problem.
Most of the time only new attributes/types are added. In case you miss anything, you can open an issue and we can trigger a re-generate.

4 participants