Improvements to Dataset #688
Comments
I'm not quite sure what problem this solves. Wouldn't a document containing these assertions be the dataset? Is every entity mentioned in the dataset considered a value of itemListElement? For example, if Bob here had an affiliation with some Organization, is that Organization an itemListElement of the dataset too?

I don't think moving it under ItemList works well, since Dataset covers a great many kinds of dataset, not all of which have a single obvious conceptualization as a list of items. For example: audio recordings (see http://grh.mur.at/sites/default/files/mir_datasets_0.html), pre-trained artificial neural nets (https://github.com/BVLC/caffe/wiki/Model-Zoo), geo data (http://opendata.arcgis.com/ and http://wiki.osgeo.org/wiki/Public_Geodata_for_the_UK ), space data including imagery and sensor readings (https://data.nasa.gov/data), and so on. It's important that we keep this type open and inclusive for all these kinds of data sharing and more.

But it is worth taking a closer look at an important subset: datasets whose content can be seen as a set of assertions about the properties of entities. That seems to be where you're heading here. There is some related work at W3C in the CSV group; see http://www.w3.org/blog/news/archives/4830 and especially http://www.w3.org/TR/2015/CR-csv2rdf-20150716/ which includes a framework for mapping table rows (from CSV and similar tabular data) into triples. This is a different approach to "the actual data", but it shares with your proposal a concern for treating that data as triples/assertions. Maybe there's some common ground here?
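As a rough illustration of mapping table rows into triples, here is a minimal Python sketch in the spirit of the CSV2RDF work. It is not the W3C algorithm: the row-numbered subject IRIs, the `base_url + "#" + column` predicate scheme, and the sample data are simplifying assumptions, and every cell value is emitted as a plain string.

```python
import csv
import io
from urllib.parse import quote


def rows_to_triples(csv_text, base_url):
    """Map each CSV row to triples: one subject per row, one predicate per column.

    Simplified sketch (assumption): the subject is base_url + "#row=N" and each
    predicate is base_url + "#" + column name. The real CSV2RDF mapping is driven
    by table metadata and is considerably richer.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"{base_url}#row={row_number}"
        for column, value in row.items():
            if value == "":
                continue  # skip empty cells instead of emitting empty literals
            predicate = f"{base_url}#{quote(column)}"
            triples.append((subject, predicate, value))
    return triples


# Hypothetical table: Bob has no jobTitle, so only his name becomes a triple.
triples = rows_to_triples("name,jobTitle\nAlice,Engineer\nBob,\n",
                          "http://example.org/people.csv")
for triple in triples:
    print(triple)
```

Each row becomes a small set of assertions about one entity, which is the "dataset as assertions about entities" subset described above.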
I had imagined multi-dimensional sets as a list of lists, but maybe that is too complicated. Perhaps folks could instead use both Dataset and ItemList as necessary, while we still have DataItem to allow for metadata about individual items. The above example becomes:
I realize there is an error in my JSON-LD. It should be:
I'm still missing something. Aren't all documents carrying schema.org data already datasets? What is the value in explicitly saying "hey, I'm a dataset" and "hey, this is a thing mentioned in the dataset" all the way through? It feels like the overuse of WebPage we've seen: an awkward form of reification where you're not entirely sure what is being described or how deep into the sub-graph the properties apply. If you just want to wrap provenance metadata around chunks of schema.org-flavoured RDF, perhaps JSON-LD named graphs are worth a look? http://www.w3.org/TR/json-ld/#named-graphs For multidimensional numeric data, http://www.w3.org/TR/vocab-data-cube/ could be a good fit.
This is more for data feeds that are not necessarily web pages or email messages. In some cases, the full dataset is not sent at once, so it is useful to know the creation time of individual items.
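To show why per-item dates matter when a feed arrives in pieces, here is a minimal Python sketch of a consumer that merges several partial deliveries. It uses the DataFeedItem name the thread eventually settled on; the merge policy (the copy with the latest dateModified, or dateCreated if never modified, wins), keying on the item's `@id`, and the sample data are all assumptions for illustration.

```python
from datetime import datetime


def merge_deliveries(deliveries):
    """Merge partial feed deliveries into one view keyed by the item's @id.

    Assumptions (not from the thread): every DataFeedItem has a dateCreated,
    items are identified by item["@id"], and the copy with the latest
    dateModified (falling back to dateCreated) wins.
    """
    latest = {}
    for delivery in deliveries:
        for feed_item in delivery:
            key = feed_item["item"]["@id"]
            stamp = datetime.fromisoformat(
                feed_item.get("dateModified", feed_item["dateCreated"]))
            current = latest.get(key)
            if current is None or stamp > current[0]:
                latest[key] = (stamp, feed_item)
    return {key: feed_item for key, (stamp, feed_item) in latest.items()}


# Hypothetical deliveries: Alice's record is updated on day 2 and Bob is new.
day1 = [{"@type": "DataFeedItem", "dateCreated": "2015-07-01",
         "item": {"@id": "#alice", "@type": "Person", "name": "Alice"}}]
day2 = [{"@type": "DataFeedItem", "dateCreated": "2015-07-01", "dateModified": "2015-07-15",
         "item": {"@id": "#alice", "@type": "Person", "name": "Alice Smith"}},
        {"@type": "DataFeedItem", "dateCreated": "2015-07-15",
         "item": {"@id": "#bob", "@type": "Person", "name": "Bob"}}]
merged = merge_deliveries([day1, day2])
print(sorted(merged))
```

Without the per-item dates, the consumer would have no principled way to decide which of two copies of Alice's record is current.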
Here's a quick attempt at using JSON-LD named graphs. Try it in http://json-ld.org/playground/
The quads that come back are:
... where the final value in each row is a graph id (data item, in your terminology).
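To make the quad shape concrete, here is a simplified Python sketch that flattens a document of named graphs into (subject, predicate, object, graph) tuples. It is not a conforming JSON-LD processor: it ignores `@context` and IRI expansion, maps `@type` to a bare `rdf:type`, and uses `None` for the default graph. The blank-node ids, the data, and the use of `dateCreated` as a stand-in for the W3C example's `generatedAt` are assumptions.

```python
def graphs_to_quads(doc):
    """Flatten a simplified JSON-LD-like document into (s, p, o, g) quads.

    Each top-level node with an "@graph" is treated as a named graph whose
    name is its "@id". Statements about the graph itself (e.g. its creation
    date) go into the default graph, represented here as None.
    """
    quads = []
    for graph_node in doc["@graph"]:
        graph_id = graph_node["@id"]
        # Statements inside the named graph carry the graph id as 4th element.
        for node in graph_node.get("@graph", []):
            subject = node["@id"]
            for prop, value in node.items():
                if prop == "@id":
                    continue
                predicate = "rdf:type" if prop == "@type" else prop
                obj = value["@id"] if isinstance(value, dict) else value
                quads.append((subject, predicate, obj, graph_id))
        # Metadata about the named graph itself lands in the default graph.
        for prop, value in graph_node.items():
            if prop in ("@id", "@graph"):
                continue
            quads.append((graph_id, prop, value, None))
    return quads


doc = {"@graph": [
    {"@id": "_:alice_item", "dateCreated": "2015-07-01",
     "@graph": [{"@id": "_:alice", "@type": "Person", "name": "Alice"}]},
    {"@id": "_:bob_item", "dateCreated": "2015-07-02",
     "@graph": [{"@id": "_:bob", "@type": "Person", "name": "Bob"}]},
]}
for quad in graphs_to_quads(doc):
    print(quad)
```

Note that nothing in the output links `_:alice_item` to `_:bob_item`: each named graph stands alone, which is the gap raised later in the thread.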
I'm not sure I understand. "generatedAt" is not valid schema.org, so we would need to change the context to include another vocabulary.
Yeah, that was the example property name used in the W3C spec; I didn't tweak it. We probably have something appropriate in schema.org, or could add it, or could use a different context. But does the quads/named-graph approach look worth considering?
I am not sure I understand. As written, I have two disconnected graphs: Alice's graph and Bob's. I still need something to say they are actually parts of a larger graph. Do you disagree with adding something to Dataset to join the graphs?
Can't we use the collections stuff for that?
I spoke with @danbri offline to better understand his concerns. It is probably too much to take on modeling all datasets in one go, so I would like to refocus the discussion on supporting data feeds, which may come as JSON-LD instead of web pages. Specifically, I propose:
The properties http://schema.org/dateCreated and http://schema.org/dateModified exist on http://schema.org/CreativeWork. The proposal is to expand their domains to include DataFeedItem. The sample JSON-LD becomes:
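A minimal Python sketch of the shape being proposed: each DataFeedItem wraps an entity in `item` and carries its own dateCreated/dateModified. The `dataFeedElement` property used here to attach items to the DataFeed is the name in the released schema.org vocabulary, not something stated in this thread; the people and dates are made up.

```python
import json


def data_feed_item(thing, date_created, date_modified=None):
    """Wrap a schema.org entity in a DataFeedItem that carries feed-level dates."""
    feed_item = {"@type": "DataFeedItem", "dateCreated": date_created, "item": thing}
    if date_modified is not None:
        feed_item["dateModified"] = date_modified
    return feed_item


# Hypothetical feed with two people; only Bob's record has been modified.
feed = {
    "@context": "http://schema.org",
    "@type": "DataFeed",
    "dataFeedElement": [
        data_feed_item({"@type": "Person", "name": "Alice"}, "2015-07-01"),
        data_feed_item({"@type": "Person", "name": "Bob"}, "2015-07-01", "2015-07-15"),
    ],
}
print(json.dumps(feed, indent=2))
```

The dates sit on the DataFeedItem wrapper, so they describe the feed record rather than the Person it points to.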
Thanks @vholland, this is a lot clearer. Can we just run through the date-related properties? At first glance they seem more alike than they do on second reading, at least to me:
The problem here is probably just wording: "item" in the definitions could mean either the DataFeedItem or the actual real-world item (e.g. the Person "Bob") that is the value of the item property. Let's try to rephrase so it is clearer on first reading.
…es and release notes.
Good point regarding wording. In all cases, the dates apply to the proxy item in the feed. I created pull request #765 with the listed changes. I took the liberty of extending the range of dateCreated and dateModified to also accept DateTime, as feeds (and, increasingly, online content) have creation dates that include times.
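Since the range of dateCreated and dateModified now admits both Date and DateTime, a consumer has to accept either form. A small Python sketch, assuming values arrive as ISO 8601 strings. Python's `fromisoformat` only accepts a trailing `Z` from 3.11 onward, so the example uses an explicit `+00:00` offset.

```python
from datetime import date, datetime


def parse_schema_date(value):
    """Parse a schema.org Date ("2015-08-06") or DateTime ("2015-08-06T14:30:00+00:00").

    Returns a date for a Date value and a datetime for a DateTime value, so the
    consumer can tell whether the feed supplied a time of day.
    """
    if "T" in value:
        return datetime.fromisoformat(value)
    return date.fromisoformat(value)


print(repr(parse_schema_date("2015-08-06")))
print(repr(parse_schema_date("2015-08-06T14:30:00+00:00")))
```

Keeping the two types distinct avoids silently inventing a midnight time for items that only carry a date.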
This looks good. I'm merging it in so people have a concrete target to review.
Issue #688: Added DataFeed and DataFeedItem including examples and
/cc @chaals @ajax-als @tilid @pmika @mfhepp @shankarnat @rvguha OK, please take a look here: http://sdo-phobos.appspot.com/DataFeed There's a JSON-LD example (thanks, Vicki). The idea is, within the constraints of a normal schema.org description (no fancy multi-graph stuff), to provide more feed-like metadata on the items described, to aid consumption, aggregation, etc. I looked into some other options and have ended up more convinced than when I started that this is useful :)
I forgot to add that one use for this is the supporting data for a software application (for example, configuration data). I'll create a new pull request shortly.
Implemented in pull request #822.
Issue #688: Added supportingData to SoftwareApplication.
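A minimal sketch of what that might look like in JSON-LD, built in Python: a SoftwareApplication whose supportingData is a DataFeed. The application name, the configuration record, and the use of `dataFeedElement` to attach the item are illustrative assumptions, not taken from the pull request.

```python
import json

app = {
    "@context": "http://schema.org",
    "@type": "SoftwareApplication",
    "name": "ExampleApp",  # hypothetical application
    "supportingData": {
        "@type": "DataFeed",
        "dataFeedElement": [
            {
                "@type": "DataFeedItem",
                "dateModified": "2015-08-06T14:30:00",
                # hypothetical configuration record shipped with the application
                "item": {"@type": "Thing", "name": "default-config"},
            }
        ],
    },
}
print(json.dumps(app, indent=2))
```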
Will those 'DataFeeds' need paging? I see 4 independent (uncoordinated) developments here
I don't know if "the more the merrier" applies here 😉
Fixed in http://schema.org/docs/releases.html#v2.2 - thanks all. Closing as the main issue is addressed; feel free to continue discussions!
As it stands, http://schema.org/Dataset allows one to describe the metadata for a dataset, but not the actual data. I propose we:
This would allow people to create data catalogs like:
Note in the above example, the dateCreated is the date the record was created not the date when the person joined the company.
One could describe simple datasets by using Number or Text instead of a richer type.
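A rough Python sketch of such a simple dataset, using the terms that appear in this thread's proposal (`itemListElement` on Dataset, a `DataItem` wrapper with an `item` property); the thread later settled on DataFeed and DataFeedItem instead. It assumes, as the proposal suggests, that `item` may hold a plain Number. The readings are made up, and each `dateCreated` records when the record was created, not anything about the measured value.

```python
import json

# Hypothetical readings: (record creation date, measured value).
readings = [("2015-07-01", 21.5), ("2015-07-02", 22.0)]

dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Daily temperature readings",  # hypothetical
    # Assumption: the proposed DataItem.item accepts a Number instead of a Thing.
    "itemListElement": [
        {"@type": "DataItem", "dateCreated": day, "item": value}
        for day, value in readings
    ],
}


def item_values(ds):
    """Return the raw values wrapped by each DataItem, in order."""
    return [entry["item"] for entry in ds["itemListElement"]]


print(json.dumps(dataset, indent=2))
print(item_values(dataset))
```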