Metadata #3

Closed
noamross opened this Issue Nov 13, 2015 · 9 comments

@noamross

Not explicitly in the list so far is metadata: explaining the source of data, meaning of fields, who/where/when collected, etc. There are lots of metadata standards, but 'good enough' practice in this area probably includes some accompanying data-specific README with prose explaining this information.

@PBarmby
PBarmby commented Nov 14, 2015

+1. Have seen too many text data files with no labels, references or anything else.

@gvwilson
Member

Added explicit requirement for metadata in 5c95d5f - please let me know what you think.

@jduckles

The Dublin Core elements might be a good place to start as "good enough".

@gvwilson
Member
@jduckles

@gvwilson Which of these projects you speak of had any metadata standard associated with them?

@gvwilson
Member
@gvwilson
Member

Comment about Dublin Core and other standards included at line 178 - thoughts?

@elliewix

tl;dr: metadata != readme != codebook. Be clear about what type of metadata you are talking about. Formal metadata usually covers just the dataset as a singleton item and is great, but it is not the complete picture that reusers need. Formal metadata will not replace a good human-readable readme file or a good codebook (if applicable). Follow your research community's standards for how to write a good readme. Look into depositing your data in a repository to get help with providing this information. There are domain (ICPSR), non-domain (figshare), and institutional repositories (usually university hosted). Many domain and institutional repositories offer curation services to help you create the more detailed readme files. Even if they don't offer personal service, self-deposit forms will have you fill out the metadata for ingest.

Now, for the wordy version...

There are two types of metadata that often get conflated in dataset discussions: metadata about the dataset as a whole and metadata about the content within the dataset. Most metadata schemas you'll encounter are for the former use. They describe the dataset as a unit, e.g. author, funder, relevant papers, etc. Pretty much every schema except DDI lacks elements to explicitly hold codebook-like information.
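
To make the distinction concrete, here is a minimal Python sketch. The field names, the example dataset, and the `undocumented_columns` helper are all made up for illustration and do not follow any particular schema.

```python
import csv
import io

# Dataset-level metadata: describes the dataset as a unit.
# All values here are hypothetical placeholders.
dataset_metadata = {
    "title": "Example stream temperature survey",
    "author": "A. Researcher",
    "funder": "Example Funding Agency",
    "related_papers": [],
}

# Codebook: describes the content within the dataset, column by column.
codebook = {
    "site_id": {"meaning": "Sampling site identifier", "units": None},
    "date": {"meaning": "Collection date (YYYY-MM-DD)", "units": None},
    "temp_c": {"meaning": "Water temperature", "units": "degrees Celsius"},
}


def undocumented_columns(csv_text, codebook):
    """Return the CSV header fields that have no codebook entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in codebook]


# A tiny in-memory CSV with one column the codebook does not cover.
sample = "site_id,date,temp_c,depth_m\nS1,2015-11-13,8.5,0.4\n"
print(undocumented_columns(sample, codebook))
```

A dataset can have a perfectly complete `dataset_metadata` record and still leave a reuser guessing what `depth_m` means, which is the gap the codebook fills.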

Dublin Core is one of the most generic schemas around, so generic when it comes to data and code that even a hastily written readme file will cover more ground. Qualified Dublin Core might be better, but its elements are so out of line with data and scientific computing that you won't find a good place to put everything you know you should be describing.

Formally structured metadata is often a valueless effort if the dataset will be stored independently rather than in a formal repository. There are some great domain-specific metadata schemas out there, and you can certainly use one as a guideline for writing your readme. Beautifully filled-out metadata XML files are for ingestion into a repository and/or directory. If the audience is humans, write it for humans. If the audience includes metadata harvesters, fill out the formal metadata and do a readme for the humans.
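
As a rough sketch of serving both audiences from one record, the Python below renders the same information as Dublin Core-style XML and as a plain-text readme. The record contents, the `metadata` wrapper element, and both function names are assumptions for illustration; a real repository will dictate its own wrapper and required elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record; keys are Dublin Core element names.
record = {
    "title": "Example stream temperature survey",
    "creator": "A. Researcher",
    "date": "2015-11-13",
    "description": "Hypothetical dataset used only to illustrate the idea.",
}


def to_dublin_core_xml(record):
    """Serialize the record for a metadata harvester."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # placeholder wrapper, not a standard one
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


def to_readme(record):
    """Render the same record as plain text for human readers."""
    lines = [
        record["title"],
        "=" * len(record["title"]),
        "",
        record["description"],
        "",
        f"Collected by: {record['creator']}",
        f"Date: {record['date']}",
    ]
    return "\n".join(lines) + "\n"


print(to_dublin_core_xml(record))
print(to_readme(record))
```

The readme produced this way is only a starting point: the prose a reuser actually needs (how the data were collected, known quirks, what each field means) still has to be written by hand.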

A formal data repository will already be using these metadata schemes in some capacity, so adding your dataset to one will usually generate that metadata automatically. Better yet, they usually forward it on to a harvester so your data will show up in data search engines (examples: DataCite DOIs, Google Scholar, etc.). Many repositories have curators who can help prepare more detailed and formal metadata (sometimes at a price), and other repositories will have self-deposit where you fill out a form based on that metadata.

@gvwilson
Member

See #29 - @elliewix @jduckles @PBarmby comments welcome.

@gvwilson gvwilson closed this Nov 21, 2015