Consider an extensible metadata model? #31

Open
cboettig opened this issue Mar 10, 2020 · 3 comments

Comments

@cboettig
Member

I like the minimal metadata required by bb_source() that we can search from the bb_data_sources() table. For large collections though, I wonder if it would make sense to support some additional optional fields that users could specify to make it easier to search their collections later, e.g. a keyword field, or file type, etc?
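To make the idea concrete, here is a rough sketch of what searching on optional fields could look like. It uses a plain data frame as a stand-in for the table returned by bb_data_sources(); the `keywords` and `file_type` columns, the `has_keyword()` helper and all of the example rows are hypothetical and not part of bowerbird.

```r
# Hypothetical stand-in for the bb_data_sources() table.
# `file_type` and `keywords` are assumed optional fields, not bowerbird columns.
sources <- data.frame(
  name = c("Sea ice concentration", "Bathymetry grid", "Penguin tracks"),
  id = c("src-001", "src-002", "src-003"),
  file_type = c("netcdf", "geotiff", "csv"),
  stringsAsFactors = FALSE
)

# Keywords are stored as a list-column so each source can carry several tags
sources$keywords <- list(
  c("sea ice", "satellite"),
  c("bathymetry", "seafloor"),
  c("tracking", "penguins", "satellite")
)

# Hypothetical helper: TRUE for each row whose keywords include `keyword`
has_keyword <- function(src, keyword) {
  vapply(src$keywords, function(k) keyword %in% k, logical(1))
}

# Find every source tagged "satellite"
satellite_sources <- sources[has_keyword(sources, "satellite"), ]
satellite_sources$name
```

A list-column keeps the tags as separate values, so a search does not depend on splitting a delimited string.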

Going further: much ink has been spilt over metadata descriptions for scientific data, but I am curious whether it would be worth cribbing from some of those. For example, bowerbird could adopt https://schema.org/Dataset or DCAT2 as the basis for its metadata representation. I imagine most fields would still be optional, but this would allow somewhat greater expressiveness. Perhaps more relevantly, these fields could be auto-populated when importing data from sources that already expose metadata in these formats (e.g. Zenodo, data.gov, and many others serve schema.org/Dataset metadata descriptions).
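As a minimal sketch of the auto-population idea, assume a schema.org/Dataset record has already been parsed into an R list (for instance with jsonlite::fromJSON(), not shown). The record below is invented, `skeleton_from_schema_org()` is a hypothetical helper, and the output field names are only chosen to resemble bb_source() arguments; none of this is existing bowerbird functionality.

```r
# An invented schema.org/Dataset record, already parsed into an R list
record <- list(
  `@type` = "Dataset",
  name = "Example dataset",
  description = "An invented record used only for illustration.",
  url = "https://example.org/dataset/123",
  license = "https://creativecommons.org/licenses/by/4.0/",
  keywords = c("example", "illustration")
)

# Hypothetical helper: copy the fields that are present, leave the rest as NA
skeleton_from_schema_org <- function(rec) {
  pick <- function(field) {
    value <- rec[[field]]
    if (is.null(value)) NA_character_ else as.character(value)
  }
  list(
    name = pick("name"),
    description = pick("description"),
    doc_url = pick("url"),          # assumed mapping: schema.org `url` to a documentation link
    license = pick("license"),
    citation = pick("citation"),    # absent from this record, so NA
    keywords = pick("keywords")     # optional extra field, not part of the minimal metadata
  )
}

skeleton <- skeleton_from_schema_org(record)
str(skeleton)
```

Fields missing from the record come back as NA, so a user can see at a glance what still needs filling in by hand.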

@raymondben
Member

The original motivation for BB was mirroring external data sets that generally already have a full metadata record elsewhere. BB's minimal metadata was intended to be enough to tell users where the data actually came from, prompt them to cite it if appropriate, and link them to the documentation (that full metadata record).
So while I'm not at all averse to a richer metadata framework within BB, I'm not sure that manually duplicating already-existing metadata is ideal. I wonder how practicable it would be to auto-fill some bowerbird metadata from an existing record? e.g.

my_source_skeleton <- bb_src_from_eml(url_to_eml_record)
## or
my_source_skeleton <- bb_src_from_iso(url_to_iso19115_record)

to give a user a skeleton record from an EML/ISO/DIF/whatever record, which they then edit as appropriate? (And accommodate some extra optional fields within that, too.)

@cboettig
Member Author

Yeah, that makes sense. I could still imagine that it might be useful for a user to add their own keywords or tags, e.g. to associate data with a project, but of course there are other places they could record that.

An automated import of the minimal metadata might make sense (e.g. the DataCite API will generate a citation for anything with a DOI), but it probably involves supporting too many different formats. I'm happy to close this as out of scope, but it's your call.

@raymondben
Member

Chipping away at this: I have implemented a 'source generator' function for Zenodo data sets (bb_zenodo_source). Given the Zenodo identifier of a data set, it pulls what it needs from the data descriptor and generates the (pretty much complete) bb_source object. There is also a similar function to handle Australian Antarctic Data Centre data sets. I had hoped (per the comments above) that something like that might even be possible for general EML or DIF metadata records, but have decided that this is probably impractical, or at least more effort than I can manage for the time being.

@cboettig
Member Author

cboettig commented Apr 2, 2020

👏 Very cool! I think having even one of these is a nice proof of concept. A user who commonly accesses data through a specific platform could then at least template off your example more easily. And given the ease of depositing data in Zenodo, it seems like a good one to start with. Nice work!
