use XSD datatypes not schema.org datatypes #654
Comments
This use of a schema datatype is perhaps the only valid one:
The reason is that there's no XSD datatype to cover date intervals.
Wow, the discussion in schemaorg/schemaorg#1781 is quite amazing and instructive. You make very valid points on the merits of xsd types vs. schema.org basic data types. Personally, I would lean towards supporting both in Croissant, and specifying a clear mapping as you do in that discussion. In general, data typing in Croissant aims to be extensible, and not limited to a single namespace. For instance, users can "semantically" type their data by associating classes from schema.org, wikidata, or other vocabularies. That said, for basic data types we certainly want to favor consistency to reduce the burden on tools and users of the datasets. As you noted, we do inherit some of the fuzziness of schema.org, but try to make things a bit more precise where necessary. Regarding format, "holds" means "contains". cr:Format is just a marker type, but its values are still strings (err... I mean sc:Text. :-)
We are definitely going to need to differentiate between int8, int16, uint8... and xsd has short, long, unsignedLong, etc. Looking at numpy types, is xsd enough though? What mechanism do we want to support to describe a field as being an int128, or a complex number for example?
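To make the question concrete, here is a rough sketch of how far the XSD built-in datatypes get with fixed-width NumPy-style dtype names. The table `NUMPY_TO_XSD` and the helper `xsd_for_dtype` are hypothetical names made up for illustration, not part of Croissant or NumPy; the pairings follow the value ranges defined in XSD (e.g. xsd:short is a 16-bit signed integer, xsd:float is IEEE single precision).

```python
from typing import Optional

# Illustrative table (an assumption, not a Croissant definition):
# fixed-width dtype names paired with the XSD datatype of the same value range.
NUMPY_TO_XSD = {
    "bool": "xsd:boolean",
    "int8": "xsd:byte",
    "int16": "xsd:short",
    "int32": "xsd:int",
    "int64": "xsd:long",
    "uint8": "xsd:unsignedByte",
    "uint16": "xsd:unsignedShort",
    "uint32": "xsd:unsignedInt",
    "uint64": "xsd:unsignedLong",
    "float32": "xsd:float",
    "float64": "xsd:double",
}


def xsd_for_dtype(dtype_name: str) -> Optional[str]:
    """Return the XSD datatype for a dtype name, or None if XSD has no equivalent."""
    return NUMPY_TO_XSD.get(dtype_name)


for name in ("int16", "uint64", "int128", "complex128"):
    print(name, "->", xsd_for_dtype(name) or "no XSD equivalent, needs a custom datatype")
```

Under these assumptions the common integer and float widths are covered, while int128 and complex numbers fall through to `None`, which is where some custom-datatype mechanism would have to come in.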
@pierrot0 For that you'd need custom datatypes. How about large multidimensional arrays (tensors)? NetCDF and HDF5 for example have mechanisms for capturing such in binary and for describing them.
@pierrot0 I've reread the discussion above.
What precisely do you mean by this, can you give an example?
Schema.org datatypes are not good:
xsd:date, xsd:decimal, etc. but not for schema:Date, schema:Number, etc.
schemaorg/schemaorg#1781 explains in more detail what's wrong with them.
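As a small illustration of the two XSD datatypes named above, the sketch below builds JSON-LD value objects with an explicit datatype IRI and parses them using only the Python standard library. The helpers `typed_literal` and `parse_xsd` are hypothetical names for this example; the parsing is deliberately simplified (for instance, it does not handle the optional timezone suffix that xsd:date permits).

```python
import json
from datetime import date
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(lexical: str, datatype: str) -> dict:
    """Build a JSON-LD value object carrying an explicit datatype IRI."""
    return {"@value": lexical, "@type": datatype}


def parse_xsd(value: dict):
    """Parse xsd:date and xsd:decimal; any other datatype is returned as a plain string."""
    if value["@type"] == XSD + "date":
        # Simplification: plain YYYY-MM-DD only, no timezone suffix.
        return date.fromisoformat(value["@value"])
    if value["@type"] == XSD + "decimal":
        return Decimal(value["@value"])
    return value["@value"]


released = typed_literal("2024-01-31", XSD + "date")
print(json.dumps(released))
print(parse_xsd(released))
```

Because the datatype IRI is explicit, a consumer can dispatch on it directly instead of guessing from the shape of the string.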
This also leads to confusion, eg in https://github.com/mlcommons/croissant/blob/main/docs/croissant.ttl:
cr:format, or does it point to a node with type cr:Format that holds the string?
This issue involves the ontologies and JSON-LD context.
Here's a count of occurrences in the two ontologies:
Also, I think it's better to distinguish properties between owl:DatatypeProperty and owl:ObjectProperty.
Many Schema.org props are permissive and allow either literal or object ("string or thing"), but I think Croissant props are more precise,