Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: popler R package #254

Closed
4 tasks
AldoCompagnoni opened this issue Oct 1, 2018 · 26 comments
Closed
4 tasks

Submission: popler R package #254

AldoCompagnoni opened this issue Oct 1, 2018 · 26 comments

Comments

@AldoCompagnoni
Copy link

AldoCompagnoni commented Oct 1, 2018

Submitting Author: Aldo Compagnoni (@gAldoCompagnoni)
Repository: https://github.com/AldoCompagnoni/popler/issues
Version submitted: 0.0.0.9001
Editor: @noamross
Reviewer 1: @cgries
Reviewer 2: @bpbond
Archive: TBD
Version accepted: TBD


Summary

  • What does this package do? (explain in 50 words or less):

popler provides a way to browse and download population and community time-series data associated with the popler online database, which is currently hosted on a public AWS server. This database formats about 300 time-series datasets from the USA Long-Term Ecological Research Netowork (LTER) using a common structure.

  • Paste the full DESCRIPTION file inside a code block below:
Package: popler
Title: Popler R Package
Version: 0.0.0.9001
Authors@R: c(
  person("Compagnoni", "Aldo", email = "aldo.compagnoni@aggiemail.usu.edu",
  role = c("cre","aut")),
  person("Bibian", "Andrew", email = "ajbibian@rice.edu", role= c("aut")),
  person("Ochocki", "Brad", email = "brad.ochocki@rice.edu", role= c("aut")),
  person("Levin", "Sam", email = "sam.levin@idiv.de", role = c("aut")),
  person("Miller", "Tom", email = "tom.miller@rice.edu", role= c("aut")))
Description: Browse and query the popler database.
URL: https://github.com/AldoCompagnoni/popler
BugReports: https://github.com/AldoCompagnoni/popler/issues
Depends: R (>= 3.0.0)
Imports:
    dbplyr (>= 1.1.0), 
    dplyr (>= 0.5.0), 
    ggplot2 (>= 3.0.0),
    grid (>= 3.3.0),
    lazyeval (>= 0.2.0),
    lubridate (>= 1.7.4),
    magrittr (>= 1.5.0),
    purrr (>= 0.2.5),
    rlang (>= 0.2.1),
    rmarkdown (>= 1.2),
    RPostgreSQL (>= 0.4.1),
    stats (>= 3.3.0),
    stringr (>= 1.2.0),
    tidyr (>= 0.6.1),  
    utils (>= 3.3.0)
License: MIT + file LICENSE
Encoding: UTF-8
Suggests: 
    fs (>= 1.2.2),
    knitr (>= 1.15.1),
    maps (>= 3.3.0),
    mapproj (>= 1.2.6),
    testthat(>= 2.0.0),
    covr (>= 3.1.0)
LazyData: true
RoxygenNote: 6.1.0
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/AldoCompagnoni/popler

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

popler is a data retrieval package. It allows to identify and download population or community time-series data with specified characteristics (e.g. temporal and spatial replication, taxonomic coverage, ect.).

  •   Who is the target audience and what are scientific applications of this package?  

Population and community ecologists.

Yes, the rdataretriever, which similarly to popler facilitates discovering and downloading ecological data. popler differs from the rdataretriever in its focus on population and community level time-series. First, popler provides a database structure designed to accommodate all population and community time-series data (see database description in the manuscript, https://github.com/texmiller/popler-ms/blob/master/popler_ms.pdf). Its future growth could make it the to-go database for synthesis research on ecological time-series data. Second, popler focuses specifically on data from the USA Long-Term Ecological Research Netowork (LTER), particularly the data provided in the LTER data portal (https://portal.lternet.edu). The more general target of the rdataretriever does not make it a comprehensive source of LTER time-series ecological data (a handful of LTER time series datasets were provided in fall 2016).

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

N/A

Requirements

Confirm each of the following by checking the box. This package:

  • [x ] does not violate the Terms of Service of any service it interacts with.
  • [x ] has a CRAN and OSI accepted license.
  • [x ] contains a README with instructions for installing the development version.
  • [x ] includes documentation with examples for all functions.
  • [x ] contains a vignette with examples of its essential functions and uses.
  • [x ] has a test suite.
  • [x ] has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • [x ] I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • [x ] Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • [x ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • [x ] The package is novel and will be of interest to the broad readership of the journal.
    • [x ] The manuscript describing the package is no longer than 3000 words.
    • [x ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • [x ] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • [x ] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

N/A

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

N/A

@noamross noamross self-assigned this Oct 1, 2018
@cboettig cboettig changed the title Submission: popler R pacakge Submission: popler R package Oct 11, 2018
@noamross noamross removed the holding label Jan 23, 2019
@noamross
Copy link
Contributor

noamross commented Jan 23, 2019

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your (restarted) submission!

Here is goodpractice::gp() output. I see a few NOTES that should be fixed.

── GP popler ───────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 19% of code lines are covered by test cases.

    R/browse.R:58:NA
    R/browse.R:59:NA
    R/browse.R:60:NA
    R/browse.R:61:NA
    R/browse.R:62:NA
    ... and 1062 more lines

  ✖ not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the
    specific functions you need.
  ✖ fix this R CMD check NOTE: Namespace in Imports field not
    imported from: ‘RPostgreSQL’ All declared Imports should be used.
  ✖ fix this R CMD check NOTE: pplr_get_data: no visible
    binding for global variable ‘.’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:87-90)
    pplr_get_data: no visible binding for global variable ‘year’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:104-107)
    pplr_get_data: no visible binding for global variable ‘month’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:104-107)
    pplr_get_data: no visible binding for global variable ‘day’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:104-107)
    pplr_query: no visible binding for global variable ‘.’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:257-261)
    pplr_query: no visible global function definition for
    ‘txtProgressBar’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:269)
    pplr_query : downld_dataset: no visible global function definition
    for ‘setTxtProgressBar’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:273)
    pplr_query: no visible binding for global variable ‘.’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/get_data.R:285-287)
    pplr_summary_table_update: no visible global function definition
    for ‘txtProgressBar’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/util.R:322)
    pplr_summary_table_update : downld_summary: no visible global
    function definition for ‘setTxtProgressBar’
    (/private/var/folders/82/07j7dctj31lg1tw6w1c_fdw40000gn/T/RtmplJK6qV/file104d3edc4f8e/popler.Rcheck/00_pkg_src/popler/R/util.R:326)
    Undefined global functions or variables: . day month
    setTxtProgressBar txtProgressBar year Consider adding
    importFrom("utils", "setTxtProgressBar", "txtProgressBar") to your
    NAMESPACE file.
  ✖ avoid 'T' and 'F', as they are just variables which are
    set to the logicals 'TRUE' and 'FALSE' by default, but are not
    reserved words and hence can be overwritten by the user.  Hence,
    one should always use 'TRUE' and 'FALSE' for the logicals.

    R/util.R:NA:NA

These notes are not major and can be fixed while I seek reviewers. A see a few are due to using non-standard evaluation inside functions. These should be able to be dealt with by quoting arguments and avoiding the . construction, instead using dplyr::pull

Test coverage is actually 76%, as reported by CodeCov (the reported number above is from run --as-cran). This is OK. I will instruct reviewers to examine coverage reports and check that uncovered code sufficiently straightforward to not need testing.

Please also run spelling::spell_check_package() and look for true typos.

@noamross
Copy link
Contributor

Thank you Corinna Gries (@cgries) and Ben Bond-Lamberty (@bpbond) for agreeing to review!
Review due: 2018-02-18

@cgries
Copy link

cgries commented Feb 13, 2019

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
  • The package has an obvious research application according to JOSS's definition
    The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software

  • Authors: A list of authors with their affiliations

  • A statement of need clearly stating problems the software is designed to solve and its target audience.

  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
    not sure there are any claims and downloads are very slow
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5


Review Comments

This R package provides access to a great data resource for population ecologists. The authors compiled a large number of datasets into a standardized data format with standardized metadata on which the user can query. This effort will open the door to important meta-analyses of data from the NSF Long-term Ecological Research Network.

README:
Although there is one sentence about what this package is doing, there are no further information about the actual database or the research applications this is built for. If this is the first point of entry for someone not familiar with the Popler project it would be very hard to judge what this package does.
There are no links to the vignettes. However, important information on the actual data is only available in the vignettes.
The vignettes are well written and help understand the intent and functionality of the functions and structure of the databsae, the latter I was missing in the online information available before download.
Somewhere I came across a publication on Popler, but I can't find the link anymore. Would be great to have that in the README.

To really test out the capabilities, I would need to know much more about the database itself. What do all those metadata mean that I could use to subset the data? E.g., what are structure_types 'types of indidivual structure' 1 - 880?

the pplr_report_metadata() and pplr_browse(report=TRUE) functions are pretty slick and produce a very nice display of metadata

Testing examples:

SEV_studies <- pplr_get_data(lterid == 'SEV')
Error: Bad Request (HTTP 400)

BNZ <- pplr_get_data(lterid == 'BNZ')
Error: Bad Request (HTTP 400)

when it works data download is really slow:

poa_data <- pplr_get_data(poa_metadata)
took almost 10 minutes to complete on a pretty fast university connection

penguin_raw_data <- pplr_get_data(penguins, cov_unpack = TRUE)
took over 15 minutes

Antarctica <- pplr_get_data(lterid == 'PAL')
took about 15 minutes

just a curious observation:
test_1 <- pplr_browse(proj_metadata_key == 133)
test_1_raw_data <- pplr_get_data(test_1)
produces a dataset where the author column is completley filled with 'Sherry Johnson'

while pplr_citation(test_1_raw_data) produces this citation
$bibliography
Farrand M (2014). “Aquatic insect sampling in Lookout Creek at the H.J. Andrews
Experimental Forest, 2001. Environmental Data Initiative.” Andrew Forest LTER, <URL:
http://dx.doi.org/10.6073/pasta/babf56fed5882756432f28982312a551>.

devtools::test()

Testing popler
ok | OK F W S | Context
ok | 4 | browse(): Informational messages
ok | 3 | browse(): Select by criteria
ok | 13 | browse(): taxa_nest [2.4 s]
ok | 7 | pplr_browse(): error functions
ok | 13 | browse() function [8.4 s]
x | 0 1 | cov_unpack [141.9 s]
ok | 12 | dictionary() function [4.5 s]
x | 0 1 | dplyr verbs [472.2 s]
ok | 12 | lter_map functions [29.3 s]
ok | 5 | popler_citation [6.8 s]
x | 0 1 | test report_dictionary [131.7 s]
x | 0 1 | pplr_site_rep [13.3 s]

== Results
OK: 69
Failed: 4
Warnings: 0
Skipped: 0
Warning messages:
1: object ‘jsonlite’ is not exported by 'namespace:crul'
2: object ‘tibble’ is not exported by 'namespace:crul'

devtools::check()

failed for me, but that may be my system

goodpractice::gp()

GP popler

It is good practice to

avoid 'T' and 'F', as they are just variables which are set to the logicals
'TRUE' and 'FALSE' by default, but are not reserved words and hence can be overwritten
by the user. Hence, one should always use 'TRUE' and 'FALSE' for the logicals.

R/util.R:NA:NA

@bpbond
Copy link

bpbond commented Feb 15, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:
3

Review Comments

Thanks for the effort you put into this package @AldoCompagnoni – it has a lot of potential and many parts are very well done.

Preliminaries: I agree with (and so won't repeat in detail) @cgries 's comments about the lack of easy documentation about the popler database itself, research applications, and other background information; the extremely slow downloads; and generally good vignettes. Here are some other
issues that jumped out at me while evaluating:

Dependencies

popler has a lot of dependencies. In particular, I see it imports both plyr and dplyr
...is this necessary, in particular the pretty old and (I think) no-longer-updated plyr
that is only used (as far as I see) in one place?

Duplication of functionality

I personally like the philosophy that packages should be as simple and focused as possible.
From that point of view, I don't see that

pplr_browse(duration_years > 20) 

gives any advantage over

pplr_browse() %>% filter(duration_years > 20) 

In other words, why duplicate the dplyr::filter functionality in pplr_browse()? (A similar argument applies to its vars parameter.) This is particularly true since you're already providing dplyr method overrides for some of the package output classes.

Documentation

...is OK, but many minor grammatical mistakes and incomplete things. For example, the
documentation for pplr_browse() says: "keyword A string that selects". This is not very informative. In this case, at least note that keyword is a regular expression that is passed to grepl and used to search across all fields.

Code style

The popler code is generally clear, consistent, and easy to follow. In-code comments are minimal but usually useful when present.

Testing

The tests take a long time to run, and when NOT_CRAN is TRUE, a super long time. (So long that I never could look closely at the coverage reports as @noamross requested above.) I assume this is because of the slow downloading issues. It would be great if most tests used network stubs so that tests run quickly–this will make them much more useful!

Report capability

The report generation of pplr_browse() is neat and potentially very useful.
But the console output from the example is kind of scary:

> pplr_browse(report=TRUE)
NOTE! Studies 297 are linked to a total of58 URLsNOTE! Studies 298 are linked to a total of58 URLsNOTE! Studies 299 are linked to a total of58 URLsNOTE! Studies 300 are linked to a total of58 URLsNOTE! Studies 313 are linked to a total of58 URLs
NOTE! Studies 297 are linked to a total of58 URLsNOTE! Studies 298 are linked to a total of58 URLsNOTE! Studies 299 are linked to a total of58 URLsNOTE! Studies 300 are linked to a total of58 URLsNOTE! Studies 313 are linked to a total of58 URLs
...(etc)...
Warning message:
In withCallingHandlers(.External2(C_parseRd, tcon, srcfile, "UTF-8",  :
  <connection>:4: unexpected END_OF_INPUT '.
'

Can this be improved?

In summary, I greatly appreciate the goals of popler. I do think it needs better documentation about what the underlying database is; to be simplified in terms of dependencies and functionality; network stubs for the tests; and improved documentation in many places.

@AldoCompagnoni
Copy link
Author

Dear all,
Thank you for the very useful feedback!

I have identified several points on which to focus our work in the next few weeks:

  1. There needs to be the ABC on the database structure, probably in a separate Vignette, or in an existing Vignette.
    We're already working on this, and we estimate uploading the schema and description by March 10th.
    To address a comment by Corinna, if you are curious to know how the database works, a description of the package is already available online as part of the to-be-submitted article connected to popler:
    https://github.com/texmiller/popler-ms/blob/master/popler_ms.pdf

  2. Downloads are too long.
    This is a problem caused by the combination of API working on an Amazon instance with very little Ram (1GB). There are two ways to address this issue:
    First, upgrading to an instance with more RAM will speed things up. Unfortunately, however, I do not have an estimate of how much faster the queries would become.
    Second, we could change the main query we use to fetch data on popler. During development of the API, I developed a query using the SQL clause "WITH" which sped thing up by a factor 10. However, Scott Chamberlain - and anybody else I know - do not know how to implement this SQL query in Ruby, which is the language in which the API is programmed.
    Of course I lean towards solution two, and I am actively asking everybody I know to solve this issue. Little success so far, but a couple of weeks of time will tell!

  3. Documentation files have grammatical mistakes/imprecise language:
    This will be a simple fix.

  4. We need to work on how to handle URLs/citations:
    I realize that we need a better way to handle the multiple-URL datasets provided by the Georgia Coastal Ecosystems LTER. This came out prominently in the comment by Bob on "Report capability". So far, I might remove the warnings when using popler_browse(report=T), and I might prompt the user for a "yes" or "no" when using pplr_metadata_url()
    Also, there seem to be a few apparent mistakes in citations - thank you for pointing those out. This is another simple fix.

  5. Finally, Bob suggested to substitute

pplr_browse(duration_years > 20)
with
pplr_browse() %>% filter(duration_years > 20)

The reason that we use pplr_browse(duration_years > 20), is that if we store pplr_browse(duration_years > 20) in an object, this object can directly be passed on to pplr_get_data() for download. This is not possible with an object produced via the synthax pplr_browse() %>% filter(duration_years > 20). Admittedly, at the top of my head I cannot answer whether it is possible to retain the same functionality using the pplr_browse() %>% filter(duration_years > 20) synthax, but I am currently working on finding a reliable answer!

@noamross
Copy link
Contributor

noamross commented May 4, 2019

Hello @AldoCompagnoni, I'm just checking in on the current status of the changes you describe above. Please let us know when you have an update to review, and if there's anything we can help with.

@AldoCompagnoni
Copy link
Author

Dear @noamross ,
Thanks for checking in! We are done with points 1-4, but we are currently de-bugging issues with the database itself. A bit after the review, the database broke due to hacking/lack of space. We just set a new instance of the database with increased security settings, but we are still working to set up the API; at the same time, we are planning to upgrade to a AWS instance with twice as much RAM and disk space in the upcoming 2 months. This upgrade is needed to prevent new crashes due to disk space, and increase the speed of downloads.

We decided not to address point 5 for a reason that we consider very important: we realized that addressing it would interphere with the way we designed the package to work. Writing an expression such as pplr_browse( duration_years > 20 ) does not just subset a data frame, but produces an S3 class with attributes that can be used by functions pplr_get_data(), pplr_metadata_url(), and pplr_citation(). We present this workflow in Figure 2 of popler's manuscript draft (https://github.com/texmiller/popler-ms/blob/master/popler_ms.pdf).

I have a writing retreat next week, so I expect to get back by May 19th at the latest.

@AldoCompagnoni
Copy link
Author

I am checking in as announced, but regretfully we have found a substantial issue with the API. We think it is more practical to provide a rundown of the changes we've made to the package once the package itself is running perfectly.

We'll be back to this as soon as possible, hopefully within mid week.

@noamross
Copy link
Contributor

Thanks for letting us know @AldoCompagnoni

@AldoCompagnoni
Copy link
Author

AldoCompagnoni commented Jun 10, 2019

Dear @cgries, @bpbond, and @noamross,
Below I illustrate our revision of the code for the popler R package. I will split this report point by point, as opposted to the 4 points in which I had originally summarized the reviews. In order:

goodpractice::gp()
These checks show only issues with the tests - but our codecov shows 72% of our code is covered by tests, which @noamross suggested is fine.

devtools::test()
All 99 tests pass without warnings/notes on my machine.
Following the wonderful suggestion by @bpbond, we have implemented all tests that rely on a download using STUBS. Tests now run in about 140 seconds.

devtools::check()
Checks pass, but we get this NOTE:

Non-standard file/directory found at top level:
'README_files'

because we use a directory, README_files, to store one of the two figures that we added to the README. We are unaware if this is a substantial issue, so please let us know!

Documentation: README and VIGNETTES

All reviewers, @cgries, @bpbond, and @noamross, expressed concerns about our documentation. Therefore:

  1. We included a new vignette dedicated to the structure of the popler database, we included a succinct representation of the schema in the README, and we pointed to the vignettes and manuscript draft whenever logical. We also added an additional figure to the README to show the output of function pplr_site_rep_plot().
  2. We reviewed all the documentation to improve clarity, catch grammatical mistakes, and typos.

Issues with the downloads

The issues originated from:

  1. First, and foremost, our AWS instance initially did not have enough disk space: only 8GB. We increased disk space to 20GB, and we are now using only 44% of the disk space available, which increased the speed of downloads. From the examples mentioned above, Antarctica <- pplr_get_data(lterid == 'PAL') takes less than 2 minutes, poa_data <- pplr_get_data(poa_metadata) takes 2 minutes. Note that BNZ <- pplr_get_data(lterid == 'BNZ') and SEV_studies <- pplr_get_data(lterid == 'SEV') take 6 and 10. We consider these download times to be tolerable, because the size of these datasets is 70MB and 120MB, respectively.
  2. The downloader contained in popler had two mundane mistakes associated with downloads of large datasets, and one dataset whose number of rows was an exact multiple of 1000. This is my bad, as I had not carefully tested the new downloader associated with the API on each single dataset. Rather, I had relied solely on the unit tests.

Report capability

We removed the warnings found by @bpbond when running pplr_browse(report=TRUE). This was caused by an issue with the encoding of a single study title, and on the standard warnings on multiple URLS produced by function pplr_metadata_url().
Regading the warnings on multiple URLs, we have retained these when using pplr_metadata_url() with objects produced by both pplr_browse() and pplr_get_data(). We think these warnings are informative (e.g., execute pplr_browse(proj_metadata_key == 297 ) %>% pplr_metadata_url() ).

Dependencies

Thank you @bpbond for commenting on dependencies: we did find that plyr and lubridate were not needed.

Duplication of functionality

Regarding substituting
pplr_browse(duration_years > 20)
with
pplr_browse() %>% filter(duration_years > 20)

We decided not to implement this because it would remove one of the characteristics that we purpusefully built into popler: pplr_browse(duration_years > 20) provides an object of class popler that contains, as one of its attributes, the logical statement of the query (in this specific case duration_years > 20). This attribute is then used inside of pplr_get_data().

Curious observation about dataset number 133

This is a good catch: we've noticed that in 14 datasets the authors/authors_contact in our project metadata, and the author reported in the citation do not correspond.
This occurs whenever the original data does not report the email of the first author of the dataset. In that case, we reported as "author", the person for which we had an email contact (usually one of the co-authors).
We think this is Ok, but @cgries please let us know if there are reasons for this to change: the alternative it to list the author, but have an NA in the authors_contact column.

@noamross
Copy link
Contributor

Thanks for the update, @AldoCompagnoni! @cgries and @bpbond, can you look at the changes and tell us if they address your previous comments?

@bpbond
Copy link

bpbond commented Jun 27, 2019

The changes laid out by @AldoCompagnoni above sound like they'll fully address my comments, yes.

@cgries
Copy link

cgries commented Jun 27, 2019

I agree, the changes look like they are addressing my concerns. To answer the last question, I would resort to calling the person you have an e-mail address for the contact rather than author. But that is so minor, that I don't feel it needs any further addressing.

@noamross
Copy link
Contributor

Thank you @bpbond and @cgries.

@AldoCompagnoni, I am trying to do final editor checks, but in building the vignettes, and also any pplr_get_data() call, I'm getting Error: Bad Request (HTTP 400). Is the service down?

@AldoCompagnoni
Copy link
Author

@noamross, @bpbond and @cgries,
Thanks for getting back!

@noamross: yes, the service is down, but this time it is the API and not the database (I bet it is something with the web server). I am traveling right now, but I'll fix this issue tomorrow, and I'll try to find a way to get notified when the API stops working.

Apologies for the inconvenience, but this is part of the process (I'm pretty sure I downloaded data from popler just two mondays ago, so this came out of the blue).

@noamross
Copy link
Contributor

Thanks for the update @AldoCompagnoni. Note that I am on vacation next week, so Editor @annakrystalli will be watching this issue and can wrap things up.

@AldoCompagnoni
Copy link
Author

Thank you @noamross,
And enjoy your vacation!

@annakrystalli, we have resolved the issue reported above by Noam: as expected, the web server was down. We are still evaluating options to automatically check when the API stops working (suggestions are welcome).

Note that I am on vacation until July 13th, so I might be slow in replying up to that date.

@annakrystalli
Copy link
Contributor

Approved! 🙌 Thanks @AldoCompagnoni for submitting and @bpbond & @cgries for your reviews!

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Add a rOpenSci review badge to your README:
    [![](https://badges.ropensci.org/254_status.svg)](https://github.com/ropensci/onboarding/issues/254)
    
  • Add the rOpenSci footer to the bottom of your README
    [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
    
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be:
    [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/popler?branch=master&svg=true)](https://ci.appveyor.com/project/AldoCompagnoni/popler)
    
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

Any questions, just let me know. And enjoy your holidays! 🏖

@annakrystalli
Copy link
Contributor

Actually, one last thing @AldoCompagnoni. I would expect, given the development during the review process that the package version would have changed since submission to reflect that. For more details on how to track changes in your package through versioning and NEWS see the relevant section in the dev guide.

@stefaniebutland
Copy link
Member

stefaniebutland commented Jul 4, 2019

first, enjoy your vacation 🏖

Hi @AldoCompagnoni. I'm rOpenSci's Community Manager, following up on blog post invitation. Noam Ross suggested a post "talking about managing both the service and the client package, but also the LTER network and such" would be well-received.

If you're interested, right now I have Tues July 23 available for publication. Ideally with a draft for me to review submitted by July 16. Later publication dates are ok too.

Let me know what you think. Happy to answer any questions.

@AldoCompagnoni
Copy link
Author

Dear @annakrystalli,
Thank you for your helpful last comments, which I believe we have addressed in full!

@stefaniebutland, we would be happy to submit a blog post for rOpenSci, but as you see, I am well over the July 16 deadline! What other dates do you have available? I'll be at Ecological Society of America 2019 (August 11st-19th), so a deadline before or after the ESA meeting would be optimal. Thank you in advance!

@annakrystalli
Copy link
Contributor

Thanks @AldoCompagnoni! I've returned admin rights to you on the repository.

Could you just tick the boxes in the Approval comment to confirm what has been done?

@AldoCompagnoni
Copy link
Author

AldoCompagnoni commented Jul 26, 2019

@annakrystalli,
I hope that the ticks below, quoting your approval comment, address your request as hoped!

Moreover, I have credited both reviewers, and I have also updated the NEWS.md file. However, in this last case I have not yet produced an release as suggested by the relevant section of the development guide. I would like to associate that with a CRAN release (the package has not been pushed to CRAN yet).

Let me know if I have left out relevant actions!

Approved! raised_hands Thanks @AldoCompagnoni for submitting and @bpbond & @cgries for your reviews!

To-dos:

* [ x ]  Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo.  I have invited you to a team that should allow you to do so.  You'll be made admin once you do.

* [ x ]  Add a rOpenSci review badge to your README:
  ```
  [![](https://badges.ropensci.org/254_status.svg)](https://github.com/ropensci/onboarding/issues/254)
  ```

* [ x ]  Add the rOpenSci footer to the bottom of your README
  ```
  [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
  ```

* [ x ]  Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be:
  ```
  [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/popler?branch=master&svg=true)](https://ci.appveyor.com/project/AldoCompagnoni/popler)
  ```

* [ x ]  We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

Any questions, just let me know. And enjoy your holidays! beach_umbrella

@stefaniebutland
Copy link
Member

Hi @AldoCompagnoni. Would you be able to submit a draft blog post via pull request before ESA? I may have a publication slot open for Aug 6 or 13th and prefer to receive a draft one week prior to publication so I can review.

Please suggest a date for draft submission and we can proceed from there. Thank you!

@AldoCompagnoni
Copy link
Author

@stefaniebutland,
A draft on August 6th to be published on August 13th would work for me! Expect my draft, and please let me know if there is additional information I should be aware of!

@stefaniebutland
Copy link
Member

stefaniebutland commented Aug 2, 2019

Sounds good @AldoCompagnoni. I've marked my calendar to look for draft on Aug 6.

Content suggestion from Noam Ross: a post "talking about managing both the service and the client package, but also the LTER network and such" would be well-received. It's important not to repeat information that's in the vignette; refer to it where needed.

Editorial and technical information about preparing your submission, including a template: https://github.com/ropensci/roweb2#contributing-a-blog-post

Don't hesitate to ask me questions here or in Slack. I'll be away Mon Aug 5.

Thank you for doing this extra work. I hope the post will get more eyes on your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants