Vocabulary File Structure

John Wieczorek edited this page Nov 8, 2016 · 3 revisions

A vocabulary file provides recommended standard values for a field, given the original values found in source data. Each vocabulary file contains three columns, with one row for every distinct value found so far for that field in the data sets that have been assessed. The first column, headed by the name of the field in which the values were found, holds the original value. The second column is the recommended standard value to use in place of the original value. The third column signifies whether the recommended standard value has been vetted (1) or not (0). Vetting means that someone has looked at the original value and provided either a) a standard value to replace it, or b) an empty value signifying that the original value cannot be unambiguously assigned to a standard value.

Following is an example subset of a vocabulary file for the term sex (http://rs.tdwg.org/dwc/terms/index.htm#sex), shown in the format of a comma-separated values file. A complete vocabulary file for sex (formatted as a tab-separated values file) can be found at https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/data/vocabularies/sex.txt:

sex,standard,vetted
M,male,1
F,female,1
undetermined,unknown,1
male?,in question,1
hembra,female,0
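
As a rough illustration of how such a file might be consumed, the following Python sketch loads the example rows above into a lookup table and returns the recommended standard value for an original value. The function names `load_vocabulary` and `recommend` and the `vetted_only` parameter are hypothetical, introduced only for this example; they are not part of the Kurator packages. The sketch assumes the three-column layout described above, with a single header row.

```python
import csv
import io

# Sample rows copied from the example above; a real file would be opened from disk.
SAMPLE = """sex,standard,vetted
M,male,1
F,female,1
undetermined,unknown,1
male?,in question,1
hembra,female,0
"""


def load_vocabulary(handle, delimiter=","):
    """Return {original value: (standard value, vetted flag)} (hypothetical helper)."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row: field name, "standard", "vetted"
    vocab = {}
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        original, standard, vetted = row
        vocab[original] = (standard, vetted == "1")
    return vocab


def recommend(vocab, original, vetted_only=True):
    """Return the recommended standard value, or None if no usable entry exists."""
    entry = vocab.get(original)
    if entry is None:
        return None  # value has not been seen in any assessed data set
    standard, vetted = entry
    if vetted_only and not vetted:
        return None  # a recommendation exists but nobody has vetted it yet
    return standard


vocab = load_vocabulary(io.StringIO(SAMPLE))
print(recommend(vocab, "M"))                           # male
print(recommend(vocab, "hembra"))                      # None (not vetted)
print(recommend(vocab, "hembra", vetted_only=False))   # female
```

For a tab-separated file such as the complete sex vocabulary, the same sketch should work with `delimiter="\t"`. Note that a vetted row with an empty standard value yields an empty string here, which a caller would need to treat as "cannot be unambiguously assigned" rather than as a missing entry.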