Consider an extensible metadata model? #31

cboettig · 2020-03-10T16:49:38Z

I like the minimal metadata required by bb_source() that we can search from the bb_data_sources() table. For large collections, though, I wonder if it would make sense to support some additional optional fields that users could specify to make it easier to search their collections later, e.g. a keyword field, file type, etc.?

Going further -- much ink has been spilt over metadata descriptions for scientific data, but I am curious whether it would be worth cribbing from some of those. e.g. bowerbird could adopt https://schema.org/Dataset or DCAT2 as the basis for its metadata representation. I imagine most fields would still be optional, but this would allow for a bit more expressiveness. Perhaps more relevantly, these fields could be auto-populated when importing data from sources that already expose metadata in these formats (e.g. Zenodo, data.gov, and many others serve schema.org/Dataset metadata descriptions).


raymondben · 2020-03-15T22:33:52Z

The original motivation for BB was mirroring external data sets that generally already have a full metadata record elsewhere. BB's minimal metadata was intended to be just enough to prompt users to see where the data actually came from, cite it if appropriate, and follow a link to the documentation (that full MD record).
So while I'm not at all averse to a richer metadata framework within BB, I'm not sure that manually duplicating already-existing metadata information is ideal. I wonder how practicable it would be to auto-fill some bowerbird metadata from an existing record? e.g.

my_source_skeleton <- bb_src_from_eml(url_to_eml_record)
## or
my_source_skeleton <- bb_src_from_iso(url_to_iso19115_record)

to give a user a skeleton record from an EML/ISO/DIF/whatever record, which they then edit as appropriate? (And accommodate some extra optional fields within that, too.)
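To make the skeleton idea a bit more concrete, here is a rough R sketch for the schema.org/Dataset case, assuming the record has already been parsed from JSON-LD into a named list (e.g. with jsonlite). The function name `skeleton_from_schema_org` and the output field names are hypothetical, loosely modelled on the kind of fields bb_source() takes; this is not bowerbird's API.

```r
# Hypothetical helper (not part of bowerbird): map a schema.org/Dataset record,
# already parsed into a named R list, onto a skeleton of source-like fields.
# Assumes scalar fields (name, url, license, ...) are plain strings.
skeleton_from_schema_org <- function(rec) {
  # Return the first non-NULL value among the candidate keys, else NA
  pick <- function(...) {
    for (key in c(...)) {
      if (!is.null(rec[[key]])) return(rec[[key]])
    }
    NA_character_
  }
  kw <- rec[["keywords"]]
  list(
    name        = pick("name"),
    description = pick("description"),
    doc_url     = pick("url", "@id"),  # fall back to the JSON-LD identifier
    license     = pick("license"),
    citation    = pick("citation"),
    keywords    = if (is.null(kw)) character(0) else unlist(kw)
  )
}

# Usage with a made-up record
rec <- list(
  "@type"     = "Dataset",
  name        = "Example sea ice concentration",
  description = "Daily gridded data.",
  url         = "https://example.org/dataset/1",
  license     = "https://creativecommons.org/licenses/by/4.0/",
  keywords    = list("sea ice", "Antarctica")
)
sk <- skeleton_from_schema_org(rec)
str(sk)
```

Anything the record doesn't provide (here, `citation`) comes back as NA, which is roughly the "skeleton the user then edits" workflow; the optional keyword field falls out of it for free.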

cboettig · 2020-03-16T03:42:28Z

Yeah, that makes sense. I could still imagine that it might be useful for a user to add their own keyword or tags, e.g. to associate data with a project etc, but of course there's other places they could record that.

Yeah, an automated import of the minimal data might make sense (e.g. the DataCite API will generate a citation for anything with a DOI), but it probably involves supporting too many different formats, so I'm happy to close this as out of scope. Your call, though.

raymondben · 2020-04-02T00:52:39Z

Chipping away at this - I have implemented a 'source generator' function for Zenodo data sets (bb_zenodo_source). Given the Zenodo identifier of a data set, it will pull what it needs from the data descriptor and generate the (pretty much complete) bb_source object. There is also a similar function to handle Australian Antarctic Data Centre data sets. I had hoped (per comments above) that something like that might even be possible for general EML or DIF metadata records, but have decided that this is probably impractical, or at least more effort than I can manage for the time being.

cboettig · 2020-04-02T20:18:10Z

👏 very cool! I think having even one of these is a nice proof of concept. A user who commonly accesses data through a specific platform could then more easily template off your example at least. And given the ease of depositing data in Zenodo it seems like a good one to start with. nice work!
