Vocabulary File Structure
A vocabulary file provides recommended standard values for a field, given the original values found in source data. Each vocabulary file contains three columns, with one row for every distinct value found so far for that field in the data sets that have been assessed. The first column, headed by the name of the field, holds the original value as it was found. The second column is the recommended standard value to use in place of the original value. The third column signifies whether the recommended standard value has been vetted (1) or not (0). Vetting means that someone has looked at the original value and provided either a) a standard value to replace it, or b) an empty value signifying that the original value cannot be unambiguously assigned to a standard value.
A complete vocabulary file for the term sex (http://rs.tdwg.org/dwc/terms/index.htm#sex), formatted as a tab-separated value file, can be found at https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/data/vocabularies/sex.txt. The following is an example subset of that vocabulary, shown in the format of a comma-separated value file:
sex,standard,vetted
M,male,1
F,female,1
undetermined,unknown,1
male?,in question,1
hembra,female,0
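As an illustration of how such a file might be consumed, here is a minimal Python sketch that loads the example rows into a lookup table and returns a recommendation for an original value. The function names `load_vocabulary` and `recommend` and the `vetted_only` option are hypothetical and are not part of the Kurator packages. Treating an empty standard value as "no recommendation" and skipping unvetted rows by default are assumptions made for this example.

```python
import csv
import io

# Example rows from this page, in comma-separated form.
SAMPLE = """sex,standard,vetted
M,male,1
F,female,1
undetermined,unknown,1
male?,in question,1
hembra,female,0
"""


def load_vocabulary(text, delimiter=","):
    """Return {original value: (standard value, vetted flag)}.

    Hypothetical helper. Pass delimiter="\t" for tab-separated files.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader)  # skip the header row; its first cell is the field name
    vocab = {}
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        original, standard, vetted = row
        vocab[original] = (standard, vetted.strip() == "1")
    return vocab


def recommend(vocab, original, vetted_only=True):
    """Return the recommended standard value, or None if there is none."""
    entry = vocab.get(original)
    if entry is None:
        return None  # value has not been seen in any assessed data set
    standard, vetted = entry
    if vetted_only and not vetted:
        return None  # assumption: unvetted recommendations are not trusted
    return standard or None  # empty standard means no unambiguous mapping


vocab = load_vocabulary(SAMPLE)
print(recommend(vocab, "M"))                          # male
print(recommend(vocab, "hembra"))                     # None (not vetted)
print(recommend(vocab, "hembra", vetted_only=False))  # female
```

Keying the table on the original value keeps each lookup to a single dictionary access, and returning `None` for unknown, unvetted, or unassignable values lets the caller decide whether to keep the original value or flag it for review.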