Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests for write dwc #49

Merged
merged 71 commits into from
Nov 9, 2023
Merged

Tests for write dwc #49

merged 71 commits into from
Nov 9, 2023

Conversation

PietrH
Copy link
Member

@PietrH PietrH commented Jul 4, 2023

Todo

  • Test expected Darwin Core mapping
  • Test expected EML file (note that there is a UUID that is expected to change)

data package for testing

o_assen was suggested:

o_assen <-
  frictionless::read_package("https://zenodo.org/record/5653311/files/datapackage.json") %>%
  # Remove the large acceleration resource we won't use (and thus won't download)
  frictionless::remove_resource("acceleration") %>%
  frictionless::write_package()

It can potentially be reduced to fewer animals.

However, this example set only includes one taxon as far as I can see. We could consider a different, or second, test dataset.

library(dplyr)

o_assen <- 
  frictionless::read_package("inst/extdata/o_assen/datapackage.json")

frictionless::read_resource(
  o_assen,
  "reference-data"
) %>%
  rename_with(~ vctrs::vec_as_names(.x, repair = "universal", quiet = TRUE)) |>
  count(animal.taxon)
#> # A tibble: 1 × 2
#>   animal.taxon              n
#>   <chr>                 <int>
#> 1 Haematopus ostralegus     7


frictionless::read_resource(
  o_assen,
  "gps"
) %>%
  rename_with(~ vctrs::vec_as_names(.x, repair = "universal", quiet = TRUE)) |>
  count(individual.taxon.canonical.name)
#> # A tibble: 1 × 2
#>   individual.taxon.canonical.name     n
#>   <chr>                           <int>
#> 1 Haematopus ostralegus           20156

Todo

  • Fix issue where temporary directory paths are not properly deleted from snapshots on windows
  • EOL problem
  • Reduce size of testing data
  • Include testing data as a dataset
  • Improve test order for write_dwc()
  • Change nesting of test order in write_dwc() so they can run out of order
  • Use variant snapshots in write_dwc() so they can run out of order + are easier to review
  • Remove stringr test dependency (test if I can get it to work with gsub())
  • wrap testthat::expect_snapshot_file() in helper, include testthat::announce_snapshot_file() to stop auto-delete of expectation when write_dwc_snapshot() fails (eg. when there is no internet connection)
  • Request review

@PietrH PietrH added the enhancement New feature or request label Jul 4, 2023
@PietrH PietrH linked an issue Jul 4, 2023 that may be closed by this pull request
6 tasks
@PietrH
Copy link
Member Author

PietrH commented Jul 4, 2023

Having some trouble building the vignette because:

reference_data <- "https://www.datarepository.movebank.org/bitstream/handle/10255/move.379/Migration%20timing%20in%20barnacle%20geese%20%28Svalbard%29%20%28data%20from%20Ko%cc%88lzsch%20et%20al.%20and%20Shariatinajafabadi%20et%20al.%202014%29-reference-data.csv?sequence=3"
gps_data <- "https://www.datarepository.movebank.org/bitstream/handle/10255/move.378/Migration%20timing%20in%20barnacle%20geese%20%28Svalbard%29%20%28data%20from%20Ko%cc%88lzsch%20et%20al.%20and%20Shariatinajafabadi%20et%20al.%202014%29.csv?sequence=3"

don't seem to run consistently, sometimes getting http 502 errors from datarepository.movebank.org

@peterdesmet peterdesmet reopened this Jul 10, 2023
@PietrH
Copy link
Member Author

PietrH commented Jul 12, 2023

Having some trouble building the vignette because:

reference_data <- "https://www.datarepository.movebank.org/bitstream/handle/10255/move.379/Migration%20timing%20in%20barnacle%20geese%20%28Svalbard%29%20%28data%20from%20Ko%cc%88lzsch%20et%20al.%20and%20Shariatinajafabadi%20et%20al.%202014%29-reference-data.csv?sequence=3"
gps_data <- "https://www.datarepository.movebank.org/bitstream/handle/10255/move.378/Migration%20timing%20in%20barnacle%20geese%20%28Svalbard%29%20%28data%20from%20Ko%cc%88lzsch%20et%20al.%20and%20Shariatinajafabadi%20et%20al.%202014%29.csv?sequence=3"

don't seem to run consistently, sometimes getting http 502 errors from datarepository.movebank.org

Hi, this is due to recent software/site upgrades. Persistent links to the package are https://doi.org/10.5441/001/1.5k6b1364 https://datarepository.movebank.org/handle/10255/move.377

and links to the files are https://datarepository.movebank.org/bitstreams/a6e123b0-7588-40da-8f06-73559bb3ff6b/download https://datarepository.movebank.org/bitstreams/df28a80e-e0c4-49fb-aa87-76ceb2d2b76f/download

Hope this helps and thanks for the feedback.

I've already changed this to urls that I got from inspecting my browser: see here

reference_data <- "https://datarepository.movebank.org/server/api/core/bitstreams/a6e123b0-7588-40da-8f06-73559bb3ff6b/content"
gps_data <- "https://www.datarepository.movebank.org/server/api/core/bitstreams/df28a80e-e0c4-49fb-aa87-76ceb2d2b76f/content"

Are these links more or less stable than:

https://datarepository.movebank.org/bitstreams/a6e123b0-7588-40da-8f06-73559bb3ff6b/download
https://datarepository.movebank.org/bitstreams/df28a80e-e0c4-49fb-aa87-76ceb2d2b76f/download

I suspect the only difference is directly or indirectly calling the api

@sarahcd
Copy link
Collaborator

sarahcd commented Jul 17, 2023

@PietrH
Copy link
Member Author

PietrH commented Jul 17, 2023

The 3rd option: https://datarepository.movebank.org/server/api/core/bitstreams/<GUID>/content is implemented in this PR.

At Peter's request, I've split off the changes to the vignettes to a separate PR #51, currently under review.

@peterdesmet
Copy link
Member

The vignette at https://inbo.github.io/movepub/articles/movepub.html is now updated and makes use of the /content links

@PietrH
Copy link
Member Author

PietrH commented Jul 17, 2023

I've merged main into this PR, so changes from #51 are reflected here.

e75991c

@PietrH PietrH marked this pull request as ready for review July 19, 2023 11:28
@PietrH PietrH requested a review from peterdesmet July 19, 2023 11:29
Copy link
Member

@peterdesmet peterdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, will be very helpful when changing the function.

@PietrH
Copy link
Member Author

PietrH commented Oct 31, 2023

Snapshots seem to have changed, let me know if you need me to look into this.

@PietrH
Copy link
Member Author

PietrH commented Nov 9, 2023

  • Observation: You are reducing coverage/defensiveness by not testing for the complete message
  • Question: Why remove the variant subdirectories?

@peterdesmet
Copy link
Member

Observation: You are reducing coverage/defensiveness by not testing for the complete message

Fair, but the exact message is not too important here (doesn't affect functionality and file paths are tested elsewhere). I mainly wanted to remove snaps I don't consider necessary

Question: Why remove the variant subdirectories?

To make it more welcome to other developers (by using the default):

_snaps/
  write_dwc/
    dwc_occurrence.csv
    eml.xml

Is easier to understand and less overwhelming than:

custom_contact_but_no_orcid/
  write_dwc/
    eml.xml
eml/
  write_dwc/
    eml.xml
occurrence/
  write_dwc/
    occurrence.csv
write_dwc_message/
  write_dwc.md

@peterdesmet peterdesmet merged commit 848bf84 into main Nov 9, 2023
6 checks passed
@PietrH
Copy link
Member Author

PietrH commented Nov 9, 2023

Alright, the two eml.xml files were exactly the same? I think I did it this way to make sure the snapshots were distinct for every test. But it's been a while.

@peterdesmet peterdesmet deleted the 46-write-tests-for-write_dwc branch November 9, 2023 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write tests for write_dwc()
3 participants