Rationalisation of dump files required #2304
Comments
#2537 shows that the current reduced |
I agree. A few years ago we introduced a fair amount of engineering to support layered extension. In practice it turned out too difficult to disentangle various overlapping areas. Should we have an "extension" for Cultural Heritage? for Galleries / Archives / Libraries / Museums (GLAM)? Or different named efforts for bibliography and archival vocabulary? What about tourism, travel, real estate, and e-commerce? Where should ebooks be described? etc etc. In practice we have gradually migrated back towards a simple structure.
We have already downplayed the use of named subdomain extensions ("pending", "bib", etc.) in the site navigation. We have found little use for the defensive layering structure which attributed each triple in the schema definitions to one of those sections. In practice, "all layers" is the only sensible subset to use. There should be a simple triple representation, a consolidated Turtle representation (which looks as similar as possible to the MCF-flavoured Turtle source files), and I guess there is a desire for JSON-LD. Do we believe the CSV files have proved useful? For naming I'd suggest something like
|
Reviewing the files we currently produce, I see we do not reference attic, either individually or in other dump files. As both schema.* and all-layers.* are often referenced in various issues and other comms, we ideally should not change those names without good reason. To that end I suggest we retain the schema.* name, but update the contents to include all terms. As all-layers.* is often used/recommended it should be retained for now, even though the contents will be identical to the updated schema.* files. The CSV versions have proved useful. Many of the issues, and much of the confusion about file contents, that I have been involved with stemmed from someone reading a CSV version. |
The good reason is that we no longer have a layered architecture for the sections of Schema.org. Please use "schemaorg-current" and "schemaorg-all". This will make it easier to avoid confusing schema.ttl for (data/)schema.ttl, amongst other things. It is also developer friendly, in that it is a stronger reminder that this data comes from schema(.)org. |
Will that influence download locations of URLs like |
@ktk we'll have to update the developer documents, yes. /cc @RichardWallis |
see https://twitter.com/danbri/status/1283767294267731971 for announcement of this proposed change |
* Simplified and expanded dump files and updated associated documentation. Re: issue (#2304)
* Adjust all-lays copies to account for limited file builds (as in travis)
* Updated dump file names to schemaorg-current & schemaorg-all
* Updated to create http & https versions of dumpfiles.
* Modified name of dumpfile used to reflect changes
Co-authored-by: Dataliberate <rjw@dataliberate.com>
Implemented in PR #2654 |
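A rough sketch of how the renamed dump files could be addressed, assuming they all follow the pattern of the one URL quoted in this thread (https://schema.org/version/latest/schemaorg-current-http.ttl). The helper name `dump_url`, the extrapolation to the "all" subset and the "https" variant, and any extension other than `ttl` are assumptions for illustration, not confirmed file names.

```python
# Base location taken from the URL quoted in this thread.
BASE = "https://schema.org/version/latest/"

# Names from the commit message: schemaorg-current and schemaorg-all,
# each built in an http and an https version.
SUBSETS = ("current", "all")
SCHEMES = ("http", "https")


def dump_url(subset, scheme, ext="ttl"):
    """Build a dump file URL (hypothetical helper; pattern is assumed).

    Only schemaorg-current-http.ttl is confirmed by the thread; other
    combinations are extrapolated from that single example.
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return f"{BASE}schemaorg-{subset}-{scheme}.{ext}"


if __name__ == "__main__":
    for subset in SUBSETS:
        for scheme in SCHEMES:
            print(dump_url(subset, scheme))
```

Validating the subset and scheme up front keeps a typo such as "all-layers" from silently producing a URL that does not exist under the new naming.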
Checked https://schema.org/version/latest/schemaorg-current-http.ttl, all used classes are defined:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?class {
  ?p schema:domainIncludes|schema:rangeIncludes ?class
  FILTER NOT EXISTS {?class a rdfs:Class}
} ORDER BY ?class |
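The same check can be sketched in plain Python over an in-memory list of triples, which may help when no SPARQL engine is at hand. This is a minimal illustration, not the project's tooling: the function name `undeclared_classes` and the hand-written toy triples are assumptions, and a real run would first parse the Turtle dump with an RDF library.

```python
RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"
# Properties whose objects must be declared classes.
USES = {"schema:domainIncludes", "schema:rangeIncludes"}


def undeclared_classes(triples):
    """Return, sorted, every class used as a domain or range but not declared.

    `triples` is an iterable of (subject, predicate, object) strings.
    Mirrors the FILTER NOT EXISTS {?class a rdfs:Class} condition.
    """
    triples = list(triples)
    declared = {s for s, p, o in triples if p == RDF_TYPE and o == RDFS_CLASS}
    used = {o for s, p, o in triples if p in USES}
    return sorted(used - declared)


# Toy data (hand-written for illustration): a core-only dump in which the
# range of cssSelector is referenced but its definition is left out.
core_only = [
    ("schema:cssSelector", "schema:rangeIncludes", "schema:CssSelectorType"),
    ("schema:cssSelector", "schema:domainIncludes", "schema:WebPageElement"),
    ("schema:WebPageElement", RDF_TYPE, RDFS_CLASS),
]
# The same dump with the missing definition added back.
all_terms = core_only + [("schema:CssSelectorType", RDF_TYPE, RDFS_CLASS)]

if __name__ == "__main__":
    print(undeclared_classes(core_only))
    print(undeclared_classes(all_terms))
```

On the toy data, the core-only list reports the dangling range class, while the list that includes every definition reports nothing, which is the outcome the query above found for the new dump file.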
As identified in issues #2301 & #2302, the definition download files do not fully segregate the terms defined in the core from the terms defined in sections (extensions).
E.g. CssSelectorType, which is defined in pending, is the range of the cssSelector property, which is defined in the core vocab. As the schema dump file only contains terms defined in the core, the CssSelectorType definition doesn't appear in that file.
Anecdotally, it appears that the main downloads of dump files are 'schema' (core only), the default offered on the downloads page, and 'all-layers', which contains all definitions (including those in the attic section).
I recommend that we rationalise the types of dump files offered (currently 8) down to 2 with the following contents: