Taxonomy Matching
Dealing with a heterogeneous collection of interaction datasets requires flexible taxonomy matching. When importing interaction datasets, eol-globi-importer first corrects known mistakes and outdated taxon names. After this scrubbing step, the corrected taxon name is matched (both strictly and fuzzily) against external taxonomies, including, but not limited to, EOL, ITIS, and WoRMS. If a taxon name cannot be matched, it is truncated in an attempt to match a higher-order taxon. For instance, if Homo sapiens does not match, Homo is tried instead. Finally, an import report is generated along with the normalized and matched taxon names.
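The scrub, match, and truncate sequence described above can be illustrated with a minimal Python sketch. This is not the importer's actual code (which lives in the Java project). The names `CORRECTIONS`, `TOY_TAXONOMY`, `scrub`, and `match_with_truncation` are hypothetical, the identifier `EXAMPLE:1` is made up, and only strict matching is shown (fuzzy matching is omitted).

```python
from typing import Callable, Optional, Tuple

# Hypothetical corrections table: known-bad name -> corrected name.
CORRECTIONS = {"Homo sapien": "Homo sapiens"}

# Hypothetical stand-in for an external taxonomy: name -> made-up taxon id.
TOY_TAXONOMY = {"Homo": "EXAMPLE:1"}


def scrub(name: str) -> str:
    """Collapse whitespace, then apply a known correction if one exists."""
    cleaned = " ".join(name.split())
    return CORRECTIONS.get(cleaned, cleaned)


def match_with_truncation(
    name: str, lookup: Callable[[str], Optional[str]]
) -> Optional[Tuple[str, str]]:
    """Try the scrubbed name, then drop trailing words until a match is found.

    Returns (matched_name, taxon_id), or None if nothing matched.
    """
    parts = scrub(name).split()
    while parts:
        candidate = " ".join(parts)
        taxon_id = lookup(candidate)
        if taxon_id is not None:
            return candidate, taxon_id
        parts.pop()  # truncate: "Homo sapiens" -> "Homo"
    return None


if __name__ == "__main__":
    print(match_with_truncation("Homo sapien", TOY_TAXONOMY.get))
    print(match_with_truncation("Unknown name", TOY_TAXONOMY.get))
```

In this sketch a name that never matches returns `None`, which corresponds to the name ending up in the list of unmatched taxa.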
If you find an unmatched or incorrect name for a target (e.g. prey) or source (e.g. predator), please submit a new issue in the format of the corrections list provided below. Alternatively, if you are a contributor to this project, you can edit the corrections list directly: log in to GitHub, open the corrections list, and use GitHub's web editor.
Name | Description | Link |
---|---|---|
scrubber | examples of scrubbed conversions | TaxonNameCorrectorTest.java |
corrections | list of most recent corrections | taxon-name-mapping.csv |
all taxa | list of taxon names that occur in (or were derived from) interaction datasets | taxonCache.tsv.gz |
taxon links | links between taxon ids that are believed to represent the same taxon | taxonMap.tsv.gz |
unmatched taxa (e.g. predator, prey, parasite names) | list of taxon names that did not match against any of the external taxonomies | taxonUnmatched.tsv |