Darwin Core Archive Field Value Counter

John Wieczorek edited this page Nov 8, 2016 · 8 revisions

This workflow:

  • creates a given directory as a workspace
  • downloads a Darwin Core Archive from a given URL
  • extracts the core file of the downloaded archive to a tab-separated text file
  • for each field in a given list of fields, creates a report of counts of distinct values
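The first three steps can be sketched in Python as follows. This is an illustrative sketch only, not the code the Kurator workflow runs: the function names `download_archive` and `extract_core_as_tsv` are hypothetical, and the sketch assumes the archive contains a `meta.xml` descriptor that names the core file and that fields are quoted with double quotes when quoted at all.

```python
import csv
import io
import os
import urllib.request
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by Darwin Core Archive descriptor files (meta.xml).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def download_archive(url, workspace):
    """Create the workspace if needed and save the archive there as dwca.zip."""
    os.makedirs(workspace, exist_ok=True)
    archive_path = os.path.join(workspace, "dwca.zip")
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_core_as_tsv(archive_path, output_path):
    """Copy the archive's core file to a tab-separated text file.

    Assumption: meta.xml is present and its <core> element lists one file.
    """
    with zipfile.ZipFile(archive_path) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
        core = meta.find(DWC_TEXT_NS + "core")
        files = core.find(DWC_TEXT_NS + "files")
        location = files.find(DWC_TEXT_NS + "location").text.strip()

        # meta.xml usually writes a tab delimiter as the two characters "\t".
        delimiter = core.get("fieldsTerminatedBy", ",")
        if delimiter == "\\t":
            delimiter = "\t"
        encoding = core.get("encoding", "utf-8")

        with archive.open(location) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            with open(output_path, "w", encoding="utf-8", newline="") as out:
                writer = csv.writer(out, delimiter="\t", lineterminator="\n")
                for row in csv.reader(text, delimiter=delimiter):
                    writer.writerow(row)
    return output_path
```

A call such as `extract_core_as_tsv(download_archive(url, "workspace"), "workspace/dwca_extracted_occurrences.txt")` would chain the two steps, where `url` is whatever archive URL is supplied to the workflow.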

The files produced by this workflow are:

  • dwca.zip - the Darwin Core Archive downloaded from the given URL
  • dwca_extracted_occurrences.txt - the core file of the downloaded Darwin Core Archive, as a tab-separated text file
  • count_[field].csv - one report for each field in the given list, where [field] is the name of the field being reported. Each report contains the distinct values of that field and the number of times each value appears in the extracted core file. See https://github.com/kurator-org/kurator-validation/wiki/Field-Value-Count-Report
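As a rough illustration of how the count reports relate to the extracted core file, the sketch below tallies distinct values per field and writes one `count_[field].csv` per field. The function name `write_field_count_reports`, the `value,count` header, the most-frequent-first ordering, and the choice to count empty values are all assumptions made for this example; the linked Field Value Count Report page describes the actual report.

```python
import csv
import os
from collections import Counter


def write_field_count_reports(tsv_path, fields, workspace):
    """Write count_[field].csv for each requested field found in the header.

    Returns the list of report paths that were written.
    """
    with open(tsv_path, encoding="utf-8", newline="") as source:
        reader = csv.DictReader(source, delimiter="\t")
        header = reader.fieldnames or []
        # Fields missing from the header are skipped (an assumption).
        present = [field for field in fields if field in header]
        counters = {field: Counter() for field in present}
        for row in reader:
            for field in present:
                # Short rows yield None; treat that as an empty value.
                counters[field][row.get(field) or ""] += 1

    report_paths = []
    for field in present:
        report_path = os.path.join(workspace, "count_%s.csv" % field)
        with open(report_path, "w", encoding="utf-8", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["value", "count"])  # hypothetical column names
            # most_common() orders values from most to least frequent.
            writer.writerows(counters[field].most_common())
        report_paths.append(report_path)
    return report_paths
```

For example, `write_field_count_reports("workspace/dwca_extracted_occurrences.txt", ["country", "year"], "workspace")` would write `count_country.csv` and `count_year.csv` into the workspace, provided both fields appear in the header row.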

References

Workflow configuration file: https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/workflows/dwca_term_values.yaml
