Description
Rationale
In the March 31 VLIZ-INBO meeting I suggested having write_dwc() run from the CSV files generated by download_acoustic_dataset() rather than from the database. This is in line with the idea that a Marine Data Archive for an animal acoustic project will contain both the source data and the Darwin Core Archive data:
# source data, generated with download_acoustic_dataset()
datapackage.json
animals.csv
tags.csv
acoustic_detections.csv
archival_data.csv
deployments.csv
receivers.csv
projects.csv
# DwC-A data, generated with write_dwc()
dwc_occurrence.csv
dwc_emof.csv
meta.xml
Running from the CSV files has several advantages:
- Data only need to be queried from the DB once (in download_acoustic_dataset()).
- Darwin Core data will always be consistent with the CSV files. Currently the two can drift apart, e.g. when write_dwc() is run weeks later (and DB data have been updated) or when the scientific_name argument was used in download_acoustic_dataset() (which is not available in write_dwc()).
- datapackage.json can be updated to reference the Darwin Core files.
- Once you have the CSV files, it's faster to run write_dwc().
The process would thus be:
1. Run download_acoustic_dataset()
2. Quality assurance
3. Fix errors in database
4. Repeat steps 1-3 until all is correct
5. Run write_dwc() on local CSV files
Implementation
Implementation would be similar to https://inbo.github.io/movepub/reference/write_dwc.html, where a Frictionless Data Package is provided.
- Discuss with @PietrH what branch to use
Parameters
- package (no default): a Data Package as returned by frictionless::read_package(). Alternatively, we ask the user for an input directory.
- connection: remove
- animal_project_code: remove, context is provided by package
- directory (no default): output directory
- contact (cf. movepub), not sure this is needed.
- rights_holder (default NULL)
- license (default "CC-BY")
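The parameter list above could translate into a signature along these lines. This is only a sketch: the name write_dwc_sketch(), the validation rules, and the set of accepted license values are assumptions for illustration, and the function returns its validated settings instead of writing any files.

```r
# Hypothetical sketch of the proposed signature, not the actual etn code.
write_dwc_sketch <- function(package,
                             directory,
                             contact = NULL,
                             rights_holder = NULL,
                             license = "CC-BY") {
  # A Data Package read into R is a list with a `resources` element.
  if (!is.list(package) || is.null(package$resources)) {
    stop("`package` must be a Data Package (a list with `resources`).")
  }
  if (!is.character(directory) || length(directory) != 1) {
    stop("`directory` must be a single path.")
  }
  # Assumed set of supported licenses, for illustration only.
  license <- match.arg(license, c("CC-BY", "CC0"))

  # Return the validated settings; a real implementation would write files.
  list(
    directory = directory,
    contact = contact,
    rights_holder = rights_holder,
    license = license
  )
}

settings <- write_dwc_sketch(
  package = list(resources = list(list(name = "animals"))),
  directory = tempdir()
)
```

Note that connection and animal_project_code no longer appear: all context comes from package.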
Error checking
- Check that all required resources are available. I assume those will be at least animals, detections.
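A minimal sketch of such a check, using a plain list in place of a package read with frictionless::read_package(). The helper name check_required_resources() is hypothetical, and the resource names animals and acoustic_detections are assumed from the CSV file names listed in the rationale.

```r
# Hypothetical helper: stop early when required resources are absent.
check_required_resources <- function(package,
                                     required = c("animals", "acoustic_detections")) {
  # Collect the names of all resources described in the package.
  available <- vapply(package$resources, function(x) x$name, character(1))
  absent <- setdiff(required, available)
  if (length(absent) > 0) {
    stop(
      "`package` is missing required resource(s): ",
      paste(absent, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(package)
}

# Stand-ins for Data Packages, for illustration only.
complete <- list(resources = list(
  list(name = "animals"),
  list(name = "acoustic_detections")
))
incomplete <- list(resources = list(list(name = "animals")))

check_required_resources(complete)
result <- try(check_required_resources(incomplete), silent = TRUE)
```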
Transformation
- Convert [dwc_occurrence.sql](https://github.com/inbo/etn/blob/main/inst/sql/dwc_occurrence.sql) to dplyr
- Test that all necessary information is available in the source CSVs. If not, then download_acoustic_dataset() should be updated
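To give an idea of what the dplyr version could look like, here is a rough sketch on toy data. The column names (detection_id, animal_id, date_time, deploy_latitude, deploy_longitude, scientific_name) and the selected Darwin Core terms are assumptions; the actual mapping should follow dwc_occurrence.sql.

```r
library(dplyr)

# Toy stand-ins for animals.csv and acoustic_detections.csv (assumed columns).
animals <- data.frame(
  animal_id = c(1L, 2L),
  scientific_name = c("Anguilla anguilla", "Gadus morhua")
)
detections <- data.frame(
  detection_id = c(10L, 11L, 12L),
  animal_id = c(1L, 1L, 2L),
  date_time = as.POSIXct(
    c("2020-01-01 10:00:00", "2020-01-01 11:00:00", "2020-01-02 09:30:00"),
    tz = "UTC"
  ),
  deploy_latitude = c(51.2, 51.3, 51.4),
  deploy_longitude = c(3.1, 3.2, 3.3)
)

# Join detections to animals and map columns to Darwin Core terms.
dwc_occurrence <- detections %>%
  inner_join(animals, by = "animal_id") %>%
  arrange(date_time) %>%
  transmute(
    occurrenceID = as.character(detection_id),
    basisOfRecord = "MachineObservation",
    eventDate = format(date_time, "%Y-%m-%dT%H:%M:%SZ"),
    decimalLatitude = deploy_latitude,
    decimalLongitude = deploy_longitude,
    scientificName = scientific_name
  )
```

If a column needed for a Darwin Core term is not present in the CSVs, that is the signal to extend download_acoustic_dataset().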
Testing
- Create snapshot files for a small animal project with the current implementation of write_dwc()
- Make sure the same snapshot files are created with the new version of write_dwc()
- Test if this resolves "write_dwc() hourly subsampling can return different detections between exports" (#347)