Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: ruODK, Client for the ODK Central API #335

Closed
10 of 25 tasks
florianm opened this issue Aug 23, 2019 · 87 comments
Closed
10 of 25 tasks

Submission: ruODK, Client for the ODK Central API #335

florianm opened this issue Aug 23, 2019 · 87 comments

Comments

@florianm
Copy link

florianm commented Aug 23, 2019

Submitting Author: Florian Mayer (@florianm)
Repository: https://github.com/dbca-wa/ruODK
Version submitted: 0.6.6.9025 (Jun 2020) (originally 0.6.0)
Editor: @maelle
Reviewer 1: @karissawhiting
Reviewer 2: @jmt2080ad
Archive: 2020-02-11
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Type: Package
Package: ruODK
Title: An R Client for the ODK Central API
Version: 0.6.6.9025
Authors@R: 
    c(person(given = c("Florian", "W."),
             family = "Mayer",
             role = c("aut", "cre"),
             email = "Florian.Mayer@dbca.wa.gov.au",
             comment = c(ORCID = "0000-0003-4269-4242")),
      person(given = "Maëlle",
             family = "Salmon",
             role = "rev",
             email = "maelle.salmon@yahoo.se",
             comment = c(ORCID = "0000-0002-2815-0399")),
      person(given = "Karissa",
             family = "Whiting",
             role = "rev",
             comment = c(ORCID = "0000-0002-4683-1868")),
      person(given = "Jason",
             family = "Taylor",
             role = "rev"),
      person(given = "DBCA",
             role = c("cph", "fnd")),
      person(given = "NWSFTCP",
             role = "fnd"))
Description: Utilities to access and tidy up data from ODK
    Central's API.  ODK Central is OpenDataKit's clearinghouse for
    digitally captured data <https://docs.opendatakit.org/central-intro/>.
    ODK Central's API is documented at
    <https://odkcentral.docs.apiary.io/>.
License: GPL-3
URL: https://dbca-wa.github.io/ruODK/,
    https://github.com/dbca-wa/ruODK
BugReports: https://github.com/dbca-wa/ruODK/issues
Depends: 
    R (>= 3.4)
Imports: 
    clisymbols (>= 1.2.0),
    crayon (>= 1.3.4),
    dplyr (>= 0.8.5),
    fs (>= 1.4.1),
    glue (>= 1.4.0),
    httr (>= 1.4.1),
    janitor (>= 2.0.1),
    lifecycle (>= 0.1.0),
    lubridate (>= 1.7.8),
    magrittr (>= 1.5),
    purrr (>= 0.3.4),
    readr (>= 1.3.1),
    rlang (>= 0.4.5),
    rlist (>= 0.4.6.1),
    stringr (>= 1.4.0),
    tibble (>= 2.1.3),
    tidyr (>= 1.0.3),
    tidyselect (>= 1.0.0),
    xml2 (>= 1.2.2)
Suggests: 
    covr (>= 3.4.0),
    DT (>= 0.9),
    ggplot2 (>= 3.2.1),
    knitr (>= 1.26),
    leaflet (>= 2.0.3),
    listviewer (>= 3.0.0),
    rmarkdown (>= 1.17),
    roxygen2 (>= 7.1.0),
    testthat (>= 2.3.2),
    usethis (>= 1.6.0),
    vcr (>= 0.5.4)
VignetteBuilder: 
    knitr
RdMacros: 
    lifecycle
Encoding: UTF-8
Language: en_AU
LazyData: true
RoxygenNote: 7.1.0
X-schema.org-applicationCategory: Data Access
X-schema.org-keywords: database, open-data, opendatakit, odk, api, data, dataset

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • reproducibility
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Data retrieval: ruODK retrieves data from ODK Central, a data clearinghouse containing data which have been digitally captured by the data collection app ODK Collect using Xforms.

As mentioned by @annakrystalli, the below categories are touched by ruODK, but don't apply:

Data extraction or munging: ruODK transforms and sanitises the data from ODK Central from the original format (which parses to nested lists in R) into tibbles.

Reproducibility: ruODK allows to script and repeat the data extraction step - the main use case it is being written for.

Geospatial data: while ODK allows to capture location data (points, lines, polygons), and ruODK extracts these values, ruODK is not primarily a spatial package.

  • Who is the target audience and what are scientific applications of this package?

(Scope and use cases are also mentioned in the README.)

Any organisation collecting data with the OpenDataKit suite will need to extract the data from the data clearinghouses, ODK Aggregate (outgoing) or ODK Central (new). Some may want to analyse the data straight out of ODK Central, some may need to transfer the data into another data warehouse for further post-processing, QA, and integration with other data sources.

For both use cases, ruODK bridges and simplifies the gap between the data sitting in ODK Central, and the data being a tibble in R, ready for further processing.

In a nutshell, ruODK aims to be to ODK Central what ckanr is to CKAN.

As per release notes for ODK Central 0.6, next to the other options, ruODK is now the recommended package to access ODK Central data in R.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#328
Thanks to @annakrystalli for feedback on the pre-submission.
Changes since then: added remaining functions.

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

Note: I would like to submit a paper about the package in a few weeks, but haven't got the manuscript ready and approved for publication just yet.

  • Do you intend for this package to go on CRAN? > if you see fit, yes please (0.6.6)
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

Update 31 Aug: pasted new DESCRIPTION with version 0.6.1 (addressing all comments below).
Update 14 Sept: 0.6.3 after tidyr 1.0.0 moves from dev to stable dependency
Update 18 Sept: 0.6.4 uses new example data for tests and vignettes
Update 23 Sept: 0.6.5 uses lifecycle badges
Update 25 Sept: 0.6.6 odata_submission_get() parses data, dates, and downloads attachments
Update 10-18 Oct: met ODK developers and users; met @jmt2080ad, met sckott (Scott Chamberlain), held four ruODK workshops (Perth, Seattle, Portland), got constructive feedback
Update 27 Nov: 0.6.6.900* addressing reviewer comments, preparing for ODK Central 0.7 release
Update 12 Feb 2020: reviewer comments addressed, some improvements added, ready for reviewer response
Update 15 Jun 2020: reviewer response addressed, parsers added for geotrace and geoshape

@maelle
Copy link
Member

maelle commented Aug 27, 2019

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

👋 @florianm! Thanks a lot for your submission! 😁 I have a few questions/asks I'd like to see tackled before I start looking for reviewers. I am also waiting to run the local checks because I don't have credentials yet, see my forum post.

  • I see a few open issues in your repo. Could you clarify here and in their description whether you're seeking advice on the functionalities implementation/usefulness? Why not solve some of them before the reviews of the package? It'd be better for the functionality to be implemented so the reviewers can review them too. It is fine if we put this issue on hold for a bit in order for you to have time to work on the enhancements if needed, and you could ask for help in rOpenSci channels.

  • Reg "Note: I would like to submit a paper about the package in a few weeks, but haven't got the manuscript ready and approved for publication just yet." This is also a good reason to put the issue on hold since the reviewers would read the short JOSS paper. What do you mean by approved?

  • I do not understand "Especially in these trying times, it is important to ask: “ruODK?”" and " u r ODK!" in the README, could you add footnotes or so there? Unless it is very obvious 🙈 (play on words with OK? to my defence and the defence of my suggestion, my not being a native speaker might have made this less clear 😂 )

  • Likewise I do not understand the use of %<% when describing use cases.

  • There is a link missing " [ODK forum]". How should the future reviewers best ask for credentials if they're new to the service? I wasn't too sure about my own post.

  • The current error messages are not informative in the absence of credentials, i.e. I get Error in curl::curl_fetch_memory(url, handle = handle) : <url> malformed if I run project_list() now without credentials. If there's no system variables set and no credentials passed either, the functions should fail with an informative error message.

  • Do not use the Author and Maintainer fields in DESCRIPTION, they are automatically generated from Authors@R.

  • In the example in the README you source .Rprofile. This should not be done, in your code in at least one function I see you use Sys.getenv("NAMEOFTHEVARIABLE") to access the credentials which is the best practice.

  • The reference section of the pkgdwon website could use some grouping. (ignore the CI advice, outdated, cf https://ropensci.org/technotes/2019/06/07/ropensci-docs/). I see you used "usethis goodies", 1) CONTRIBUTING.md still mentions the tidyverse team at least once, 2) For info here is our guidance for accepted packages.

  • There's info about contributing both in CONTRIBUTING.md and in the README (release steps), could you consolidate those?

The following items is rather a suggestion.

  • Is there any diagram of ODK services/a workflow with ODK that you could use/adapt to explain visually where your R package fits, as a complement of the written part of the README?

Reviewers:
Due date:

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@maelle

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm

This comment has been minimized.

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm

This comment has been minimized.

@maelle

This comment has been minimized.

@florianm
Copy link
Author

florianm commented Feb 21, 2020

For ru_datetime I think we have a workable solution now - could I solicit your opinions?
See ropensci/ruODK#54 for details - after a complete refactor of submission parsing, ru_datetime is only called internally, and superceded by new helpers handle_ru_{datetimes, attachments, geopoints} as shown in vignettes. These are driven by the form schema definition, so they will never mistake cenTIMEmeter for a time field.

If you suggest to go even stricter, we could now drop the default field name component col_contains="date" from ru_datetime without complicating everyday usage.

@jmt2080ad let me know whether that addresses your suggestion suitably, and @karissawhiting I'm ready for your feedback.

@jmt2080ad
Copy link

jmt2080ad commented Mar 7, 2020

Ok, this is looking really good. The handlers are using the form schema, which is great.

Concerning the handler functions and the functions they handle

I think your idea of dropping the default argument for col_contains makes sense. Also, do you see any reason for ru_datetime or isodt_to_local to be exported now that they are only used internally?

I see that there are a few places in the documentation where ru_datetime is mentioned. In the details for form_schema_parse:

form_schema_parse recursively unpacks the form and extracts the name and type of each field. This information then can be used to inform the user which columns require ru_datetime, attachment_get, or attachment_link, respectively

And I see that ru_datatime is used in the example for attachment_link.

If you decide not to export the functions that you are using only internally (totally your choice), then I think you should edit the documentation so that it does not point users to them. If you do export them, and use cases are not demonstrated in the vignettes, then they need to be well documented. Just take an extra moment to make sure that you are saying all you want to in their help.

These same points, exporting internal functions and mentioning internal functions in the documentation, should be considered for the other functions you wrote handlers for: split_geopoint and attachment_get.

I also noticed that attachment_link appears to be a place holder for functionality you intend to bring up in the future. It isn't demonstrated in the vignettes, and it has a TODO note in it that indicates that you will be improving this function to use the form schema to find attachment fields, like you did for ru_datatime. I am not sure the best approach here, but I think it's worth considering putting this in a development branch for now if it is not fully implemented. I might be wrong about your intentions here, so pardon me if that is the case. Would it make sense to do what this function does when a user calls handle_ru_attachments?

Other notes:

Here are some outdated function references I found in the vignettes:

  • Line 171 of odata-api.Rmd references the superseded odata_submission_parse.
  • Lines 193 and 197 of odata-api.Rmd references the superseded parse_datetime.
  • The section "Spatial formats - how granular..." that starts on line 319 of odata-api.Rmd has a few references to parse_submissions, which appears to have been superseded. I think this section needs to be updated to reflect the modifications to the workflow you have made.
  • Line 337 of restful-api.Rmd references the superseded odata_submission_parse.

@noamross
Copy link
Contributor

⚠️⚠️⚠️⚠️⚠️

In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates.

In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent.

Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one other.

The rOpenSci Editorial Board

⚠️⚠️⚠️⚠️⚠️

florianm pushed a commit to ropensci/ruODK that referenced this issue Apr 16, 2020
* Source ropensci/software-review#335 (comment)
* odata_submission_parse > odata_submission_rectangle
* link vignette odata-api from restful-api
* explain how to disable form group prefixing of field names in odata_submission_rectangle
* Fixes #58
@annakrystalli
Copy link
Contributor

⚠️⚠️⚠️⚠️⚠️
In the interest of reducing load on reviewers and editors as we manage the
COVID-19 crisis, rOpenSci new submissions for software peer review are paused.

In this period new submissions will not be handled, nor new reviewers assigned.
Reviews and responses to reviews will be handled on a 'best effort' basis, but
no follow-up reminders will be sent.
Other rOpenSci community activities continue.

Please check back here again after 25 May when we will be announcing plans to slowly start back up.

We express our continued great
appreciation for the work of our authors and reviewers. Stay healthy and take
care of one other.

The rOpenSci Editorial Board
⚠️⚠️⚠️⚠️⚠️

@maelle
Copy link
Member

maelle commented May 25, 2020

We're back!

@florianm can you please provide an update here on work needed on your side / on what the reviewers should have a look at for being able to check "The author has responded to my review and made changes to my satisfaction. I recommend approving this package."? Thank you!

@florianm
Copy link
Author

Great to hear!
I've split the reviewers' suggestions into issues and have addressed them all. So at a top level, reviewers could revisit the ruODK issues linked here below their comments and should find there good summaries about my rationale.

A few new tests use a new form (spatial types) but use packaged data, so the additional env var is only needed to rebuild the packaged data.

Should I write a comment here to summarise how I've addressed all reviewer suggestions, or are the cross-linked issues sufficient?

@maelle
Copy link
Member

maelle commented May 25, 2020

Thanks! 🐢 ✨

A summary here would be great once you're done with changes! 🙏

@florianm
Copy link
Author

florianm commented Jun 15, 2020

Hi all,

here's a summary of the second round (Jason's comments) of reviewer comments. I've split Jason's comments into the three issues linked under his comments, which I'll also summarise here.

Internal helpers

Details

TLDR Jason asked to re-think which functions to export, and which to keep purely internal. I have changed ruODK from "export all the functions" to export only the top level (odata_submission_get) and the next level of helpers (handle_ru_*). This allows users to choose the level of automation:

  • First level: parse all the things in one go - the default: odata_submission_get()
  • Second level: parse step by step - explained in vignettes under "DIY rectangling"
    • Several functions which are useful for step by step data rectangling have been moved to the reference family "Utilities".
    • handle_ru_{datetimes, attachments, geopoints, geotraces, geoshapes}:
    • split_geo{point, trace, shape} have examples
      to show order of lon / lat / alt (coordinate order is notorious for swap errors vs. geojson)
    • attachment_get is mentioned in vignette odata-api > data rectangling
  • The third level helpers used by the exported handle_ru_* handlers then are all internal, and not mentioned in any of man / vignette / readme:
    • isodt_to_local
    • strip_uuid
    • prepend_uuid
    • attachment_url
    • unnest_all

Update all documentation

Details

TLDR Jason found some outdated references.

All documentation should now be aligned and tell the same story.
Some new functionality brought another round of sweeping changes:

  • split_geo{point, trace, shape}: ruODK can now parse all spatial types,
    GeoJSON and WKT alike.
  • The tests use canned data of a new spatial test form
  • Rebuilding the canned data requires a new env var for the spatial test form. CONTRIBUTING.md has been updated to include the new test form.

Re-think attachment_link

Details

TLDR Jason asked whether attachment_link should be internal or exported.

  • Listed under "RESTful API" in man
  • Demonstrated in vignette restful-api - we're looking at the "export everything including attachments as one giant ZIP file" use case here. The difference to handle_ru_attachments is that the attachments are already downloaded and only need to be linked to the relative path of the downloaded and freshly extracted attachments.

I've just refreshed the GitHub Actions build matrix, adding more environments (11 in total). Plus TravisCI (Ubuntu 16.04 with R devel/release), plus Appveyor (Windows).

@maelle does the above address the reviewer comments suitably?

@maelle
Copy link
Member

maelle commented Jun 24, 2020

Thanks @florianm!

@karissawhiting @jmt2080ad are you both now happy with @florianm's response and if so could you please check the box "The author has responded to my review and made changes to my satisfaction. I recommend approving this package." in your review? If not please comment here. It's also fine to write you don't have time to have another look. Thanks!

@florianm
Copy link
Author

Meanwhile, GHA has a heisen-segfault in ubuntu-latest / R devel and R release. I'm monitoring this over the next few days hoping it will resolve, so the red GHA badge is deliberate.

florianm pushed a commit to florianm/central-frontend that referenced this issue Jul 8, 2020
Closes ropensci/ruODK#80
HT @dmenne for actually reading the fine manual.

* The vignettes of ruODK have been renamed `{odata, restful}-api` as suggested by my reviewers over at ropensci/software-review#335.
* The contributing guide is hosted on gh-pages.
@maelle
Copy link
Member

maelle commented Jul 13, 2020

👋 here! I see @jmt2080ad checked "The author has responded to my review and made changes to my satisfaction. I recommend approving this package."

@karissawhiting do you have further comments for @florianm? If not could you please update this thread or check the box in your review? Thank you!

@maelle
Copy link
Member

maelle commented Jul 16, 2020

Thanks @karissawhiting! I'll start the last checks now!

@maelle
Copy link
Member

maelle commented Jul 16, 2020

Approved! Thanks @florianm for submitting and @karissawhiting @jmt2080ad for your reviews! 🐢 ✨

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this.

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@maelle
Copy link
Member

maelle commented Jul 16, 2020

@jmt2080ad Could you please add a rough estimate of the time you spent reviewing in your review near Estimated hours spent reviewing:? Thank you.

florianm pushed a commit to ropensci/ruODK that referenced this issue Jul 17, 2020
* Rename links from dbca-wa to ropensci
* CI and docs likely broken for now and will be fixed next
* TODO: docs swapperoo to docs.ropensci.org
* Following instructions at ropensci/software-review#335 (comment)
@florianm
Copy link
Author

florianm commented Jul 21, 2020

@maelle I've completed the TODO list and followed the "life after onboarding" chapter. GH actions are passing, codemeta and all docs are updated, and I've minted a DOI. Did I do this roughly in the correct order?

All, would you mind getting mentioned in the Zenodo record? Edit: this is a question to Jason and Karissa, not editor Maelle ;-)

@maelle
Copy link
Member

maelle commented Jul 21, 2020

I think so. What will you use the DOI for in the near future btw? codemetar doesn't use those yet.

Don't list me as an author, as written in the dev guide, "Please do not list editors as contributors. Your participation in and contribution to rOpenSci is thanks enough!" 😸

@maelle maelle closed this as completed Jul 21, 2020
@florianm
Copy link
Author

I'll use the DOI to cite ruODK as a publication in our annual report. Having peer reviewed software is a first at DBCA, so a DOI helps here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants