Taxonomy Matching

Jorrit Poelen edited this page Dec 21, 2016 · 42 revisions

Dealing with a heterogeneous collection of interaction datasets requires flexible taxonomy matching. When importing interaction datasets, eol-globi-importer corrects known mistakes and outdated taxon names. After scrubbing, the corrected taxon name is matched (both strictly and fuzzily) against external taxonomies, including, but not limited to, EOL, ITIS, and WoRMS. If a taxon name cannot be matched, it is truncated in an attempt to match a higher-order taxon. For instance, if Homo sapiens does not match, Homo is tried instead. Finally, an import report is generated along with the normalized and matched taxon names.
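The steps described above (correct, match strictly, match fuzzily, then truncate and retry) can be sketched roughly as follows. This is only an illustration, not the eol-globi-importer code: the function name match_taxon, the example dictionaries, the made-up EXAMPLE: ids, and the use of Python's difflib for fuzzy matching are all assumptions chosen for the sketch.

```python
from difflib import get_close_matches

# Hypothetical stand-ins for the corrections list and an external taxonomy.
# The ids are invented for illustration and do not refer to real records.
EXAMPLE_CORRECTIONS = {"Arius felis": "Ariopsis felis"}
EXAMPLE_TAXONOMY = {"Homo": "EXAMPLE:1", "Ariopsis felis": "EXAMPLE:2"}


def match_taxon(name, corrections, taxonomy, fuzzy_cutoff=0.9):
    """Return (matched_name, taxon_id), or None if nothing matches.

    Assumed order of operations: apply a known correction, try a strict
    match, try a fuzzy match, then drop the last word and try again.
    """
    cleaned = name.strip()
    corrected = corrections.get(cleaned, cleaned)
    parts = corrected.split()
    while parts:
        candidate = " ".join(parts)
        # Strict match: the candidate is a known name in the taxonomy.
        if candidate in taxonomy:
            return candidate, taxonomy[candidate]
        # Fuzzy match: accept the closest known name above the cutoff.
        close = get_close_matches(candidate, list(taxonomy), n=1, cutoff=fuzzy_cutoff)
        if close:
            return close[0], taxonomy[close[0]]
        # Truncate to try a higher-order taxon, e.g. "Homo sapiens" -> "Homo".
        parts = parts[:-1]
    return None
```

With these example dictionaries, match_taxon("Homo sapiens", EXAMPLE_CORRECTIONS, EXAMPLE_TAXONOMY) falls back to the genus Homo, while a name that matches nothing at any level returns None and would end up in a list of unmatched taxa.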

Submitting Name Corrections

If you find an unmatched or incorrect name for a target (e.g. prey) or source (e.g. predator), please submit a new issue in the format of the corrections list provided below. Alternatively, if you are a contributor to this project, you can edit the corrections list directly: log in to GitHub, open the corrections list, and use GitHub's web editor.

List of Resources

| Name | Description | Link |
| --- | --- | --- |
| scrubber | examples of scrubbed conversions | TaxonNameCorrectorTest.java |
| corrections | list of most recent corrections | taxon-name-mapping.csv |
| all taxa | list of taxon names that occurred in (or were derived from) interaction datasets | taxonCache.tsv.gz |
| taxon links | links between taxon ids that are believed to represent the same taxon | taxonMap.tsv.gz |
| unmatched taxa (e.g. predator, prey, parasite names) | list of taxon names that did not match any of the external taxonomies | taxonUnmatched.tsv |

Taxonomy Matching Process

[Figure: TaxonomyMatching]