Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: ramlegacy #264

Closed
11 of 19 tasks
kshtzgupta1 opened this issue Nov 21, 2018 · 42 comments
Closed
11 of 19 tasks

Submission: ramlegacy #264

kshtzgupta1 opened this issue Nov 21, 2018 · 42 comments

Comments

@kshtzgupta1
Copy link

kshtzgupta1 commented Nov 21, 2018

Summary

  • What does this package do? (explain in 50 words or less):

ramlegacy helps users access the excel version of the RAM Legacy Stock Assessment Database from www.ramlegacy.org. Data is freely available from the databse website, but downloading and reading in the data by hand can be time-consuming. ramlegacy is capable of downloading, reading and caching all the available versions of the database.

  • Paste the full DESCRIPTION file inside a code block below:
Type: Package
Package: ramlegacy
Title: Download and Read RAM Legacy Stock Assessment DataBase
Version: 0.1.0
Authors@R: 
    c(person(given = "Kshitiz",
             family = "Gupta",
             role = c("aut", "cre", "cph"),
             email = "kshtzgupta1@berkeley.edu"),
      person(given = "Carl",
             family = "Boettiger",
             role = c("aut", "cph"),
             email = "cboettig@gmail.com", comment="http://orcid.org/0000-0002-1642-628X"))
Description: Contains functions to download, cache and read in Excel version of
    the RAM Legacy Stock Assessment Data Base, an online compilation of
    stock assessment results for commercially exploited marine populations
    from around the world. More information about the database can be
    found at <http://ramlegacy.org/>.
License: MIT + file LICENSE
URL: https://github.com/kshtzgupta1/ramlegacy
BugReports: https://github.com/kshtzgupta1/ramlegacy/issues
Depends: 
    R (>= 2.10)
Imports: 
    cli (>= 1.0.0),
    crayon (>= 1.3.4),
    httr (>= 1.3.1),
    rappdirs (>= 0.3.1),
    readxl (>= 1.1.0)
Suggests:
    covr,
    testthat,
    httptest,
    knitr,
    rmarkdown
VignetteBuilder: 
    knitr
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.0
X-schema.org-keywords: ramlegacy, marine, database, assessment, RAM, Legacy, download, read, cache, ropensci

  • URL for the package (the development repository, not a stylized html page):

www.github.com/kshtzgupta1/ramlegacy

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

data retrieval: ramlegacy downloads and reads in multiple versions of the excel version of the RAM Legacy Stock Assessment Database from the database's website.

  •   Who is the target audience and what are scientific applications of this package?  

Any one who needs access to RAM Legacy Stock Assessment Database: Fisheries Biologists, Conservationists, Students, Teachers, etc.

Sean Anderson has a namesake package not published to CRAN and it appears to be a stalled project on GitHub (last updated 9 months ago). However, unlike this package which supports downloading and reading in the Excel version of the database, Sean Anderson's project downloads the Microsoft Access copy of the database and converts it to a local sqlite3 database.

There is also RAMlegacyr, an older package last updated in 2015. Similar to Sean Anderson's project, the package seems to be an R interface only for the Microsoft Access version of the RAM Legacy Stock Assessment Database and provides a set of functions using RPostgreSQL to connect to the database.

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

Please note that running R CMD check on ramlegacy may result in the following warning if the user has not yet downloaded a version of the RAM Legacy Stock Assessment Database.

No version of the database has yet been downloaded. Use function download_ramlegacy() to download a version now.

Also note that R CMD check on this package will result in the following error in the testing suite if the user is not online when checking the package.

Error: Could not connect to the internet. Please check your connection settings and try again.

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@sckott sckott added the package label Nov 26, 2018
@sckott
Copy link
Contributor

sckott commented Nov 26, 2018

thanks for your submission @kshtzgupta1 We are discussing now and will get back to you soon

@sckott
Copy link
Contributor

sckott commented Dec 3, 2018

sorry for delay on this @kshtzgupta1 - we have some conflicts of interest we're thinking through ...

@geanders
Copy link

geanders commented Dec 17, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license (MIT + LICENSE)
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

@kshtzgupta1: Thanks for submitting a very interesting package. This is the first time I'm serving as an editor for ROpenSci, so I will be learning some parts of the editorial process as I edit your submission. Please let me know if you have any questions throughout.

I have a bit of feedback based on an initial look at the submission and some initial editorial checks.

Submission summary:

In your submission summary, could you please:

  • Add a hyperlink to the RAM Legacy Stock Assessment Database where the current summary has "[website.]"

Checks from devtools::spell_check():

There seem to be two legitimate spelling errors in the package code:

  WORD            FOUND IN
Assesment       load_ramlegacy.Rd:18
availabl        ram_dir.Rd:11
  • “Assesment” to “Assessment” in “load_ramlegacy.Rd”
  • “availabl” to “available” in “ram_dir.Rd”

Checks from covr::package_coverage()

ramlegacy Coverage: 79.33%
R/zzz.R: 23.33%
R/download_ramlegacy.R: 83.02%
R/utils.R: 88.41%
R/load_ramlegacy.R: 88.46%
R/read_ramlegacy.R: 100.00%
R/style.R: 100.00%

Checks from goodpractice::gp()

── GP ramlegacy ──────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 80% of code lines are covered by test cases.

    R/download_ramlegacy.R:58:NA
    R/download_ramlegacy.R:59:NA
    R/download_ramlegacy.R:69:NA
    R/download_ramlegacy.R:70:NA
    R/download_ramlegacy.R:71:NA
    ... and 35 more lines

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/download_ramlegacy.R:106:1
  • Check code line length on the line highlighted by this check

Checks from devtools::check()

  • This check brings up an initial warning about “mismatched braces or quotes” in
    R/utils.R#40, I believe in the sentence “This is also the location from where
    \code{\link{load_ramlegacy} loads the database from.” Please add the closing brace
    within that sentence.

Other

  • The first of the two “build passing” badges on the project’s GitHub page does not seem to lead
    to an active Travis CI project. I suggest removing this first badge to prevent confusion.
  • In the "Details" section, paste under the first item a note that running R CMD check on this package may result in a warning if the user has not yet downloaded a version of the RAM Legacy Stock Assessment Database, as well as copy in the warning that results in this case: "No version of the database has yet been downloaded. Use function download_ramlegacy() to download a version now."
  • Also in the "Details" section, add a note that running R CMD check on this package will result in errors in the testing suite if the user is not online when checking the package.

In the meantime, I will begin the process of assigning reviewers.

Reviewer: @boshek
Reviewer: @jafflerbach
Due date: January 25, 2018

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Dec 19, 2018

@geanders Thanks for your response. I have updated the submission as per your suggestions.

@geanders
Copy link

geanders commented Dec 20, 2018

Thank you, @kshtzgupta1!

Could I also ask you to add an rOpenSci review badge to the README file for your package? The full link for this should be:

[![](https://badges.ropensci.org/264_status.svg)](https://github.com/ropensci/software-review/issues/264)

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Dec 21, 2018

@geanders I have added the badge.

@boshek
Copy link

boshek commented Jan 8, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

  • Installation instructions: for the development version of package and any non-standard dependencies in README

    • Though this is likely to end up on CRAN, the current installation instructions don't automatically build the vignette so accessing the vignette locally fails. I'd just suggest remove the line:

    The vignette can also be viewed by calling vignette(package = "ramlegacy")

    Until the installation instruction are for the CRAN version. Alternatively, one could install by removing --no-build-vignettes
    from the build_opts in install_github

  • Vignette(s) demonstrating major functionality that runs successfully locally

  • Function Documentation: for all exported functions in R help

    • Note that this function does not support vectorization so please don't pass in a vector of version numbers

    • Consider revising the above since you are unable to do so anyways.
  • Examples for all exported functions in R Help that run successfully locally

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

    • Consider adding either suggested reference or the maintainer of ramlegacy to the DESCRIPTION file in Authors@R
    • Consider adding a CODE OF CONDUCT and CONTRIBUTING file directly to the repo.

Functionality

  • Installation: Installation succeeds as documented.
    • It does with the exception of the vignettes not being build (discussed above)
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5


Review Comments

Summary

I think this is a great package that provides a clear set of instruction on how to download the ramlegacy database. The biggest issue in the package I see is the reliance of the library call to load data. The rest of the items identified below are relatively easy to address. Please let me know if anything is unclear or there is any disagreement.

  • I think that you are adding too much onto the library call. Consider any loading of data should be left solely to the load_ramlegacy function. I think this constrains future development of the package as future may entail functionality outside the database. This is also outside what a user would expect from a typical library call. We are into stylistic territory here but I personally think objects should not be loaded with the library call.

  • I wonder if load_ramlegacy should gain an argument where a user can specify a vector that controls which dataset are loaded. This might be a useful feature for production code, particularly a shiny app.

  • Consider not even exposing a raw_url nor ram_path argument here as described here:
    https://github.com/kshtzgupta1/ramlegacy/blob/8c26a196cc573a03574ed084a7e6c9d0b951e7a2/R/download_ramlegacy.R#L23-L25
    If a user is never meant to change it, then perhaps the argument could be omitted abstracting that possible confusion away from the user. What is the utility of those arguments if they are never meant to change?

  • I would strongly consider a NEWS.md file to track changes between versions. rOpenSci provides a great template

  • I think the package lacks some connection with the actual data. Right now a bunch of dataframes are loaded but a new user is expected to have prior knowledge of the ram database. I think this package could provide some path for users to learn about the data itself. Options include:

    • Use some of the documentation that is downloaded in the .zip file to clarify (perhaps via internal data) what each table means. Many of the column names aren't terribly descriptive. Somehow ramlegacy should provide a path for users to figure that out.

    • Perhaps the vignette/README could link to documentation. As it stands, the current documentation is very clear to load some tables in the global environment but this only takes the user partially up the learning curve. ramlegacy should provide users a path to take the next steps.

    • the vignette to include a very brief sample analysis where two tables are joined. A trivial example would be something like this:

library(ramlegacy)
#> * Multiple versions found including the latest one: 4.3 . Loading the latest version.
#> * Loading version 4.3 ...
#> v Version 4.3 has been successfully loaded.

TC_with_names <- merge(
  timeseries_values_views_v4.3[!is.na(timeseries_values_views_v4.3$TC), c("stockid", "TC")],
  stock_v4.3[stock_v4.3$region == "West Africa",],
  by = "stockid"
)

hist(TC_with_names$TC)

Created on 2019-01-08 by the reprex package (v0.2.1)

  • It isn't clear to me why you need different versions of the database. This functionality is highlighted quite prominently but it's utility is not clear.

  • RE: the alternate download location on github. I would consider if you have acknowledged and licensed this data appropriately in your assets repo. I would contact the RAM maintainers directly and seek direction on this. Fully reproducing the data might cause some concern with those maintainers.

  • The docs directory, _pkgdown.yml and _config.yml all needed to be added to .Rbuildignore for R CMD check to pass on my machine.

  • I am wondering if this is intentional behaviour. I think this is a carryover from readxl::read_excel but
    given the class of the data object it prints either as a dataframe or as a tibble depending on whether tibble is loaded or not. One thing to consider is even just adding tibble to Suggests for nicer printing in a vignette:

library(ramlegacy)
#> * Multiple versions found including the latest one: 4.3 . Loading the latest version.
#> * Loading version 4.3 ...
#> v Version 4.3 has been successfully loaded.

class(stock_v4.3)
#> [1] "tbl_df"     "tbl"        "data.frame"

head(stock_v4.3, 10)
#>             stockid    tsn     scientificname              commonname
#> 1       ACADRED2J3K 166774 Sebastes fasciatus         Acadian redfish
#> 2  ACADRED3LNO-UT12 166774 Sebastes fasciatus         Acadian redfish
#> 3      ACADREDGOMGB 166774 Sebastes fasciatus         Acadian redfish
#> 4        ACADREDUT3 166774 Sebastes fasciatus         Acadian redfish
#> 5        ACMACKSARG 172413     Scomber colias Argentine chub mackerel
#> 6           AFLONCH 166156    Beryx splendens               Alfonsino
#> 7            ALBAIO 172419   Thunnus alalunga           albacore tuna
#> 8           ALBAMED 172419   Thunnus alalunga           albacore tuna
#> 9          ALBANATL 172419   Thunnus alalunga           Albacore tuna
#> 10         ALBANPAC 172419   Thunnus alalunga           Albacore tuna
#>                      areaid                                    stocklong
#> 1           Canada-DFO-2J3K                    Acadian redfish NAFO-2J3K
#> 2      Canada-DFO-3LNO-UT12      Acadian redfish Units 1-2 and NAFO-3LNO
#> 3              USA-NMFS-5YZ Acadian redfish Gulf of Maine / Georges Bank
#> 4            Canada-DFO-UT3                       Acadian redfish Unit 3
#> 5       Argentina-CFP-ARG-S   Argentine chub mackerel Southern Argentina
#> 6   multinational-SPRFMO-CH                              Alfonsino Chile
#> 7     multinational-IOTC-IO                   Albacore tuna Indian Ocean
#> 8   multinational-ICCAT-MED                  Albacore tuna Mediterranean
#> 9  multinational-ICCAT-NATL                 Albacore tuna North Atlantic
#> 10   Multinational-ISC-NPAC                  Albacore tuna North Pacific
#>                     region inmyersdb myersstockid
#> 1        Canada East Coast         0         <NA>
#> 2        Canada East Coast         0         <NA>
#> 3            US East Coast         0         <NA>
#> 4        Canada East Coast         0         <NA>
#> 5            South America         0         <NA>
#> 6            South America         0         <NA>
#> 7             Indian Ocean         0         <NA>
#> 8  Mediterranean-Black Sea         0         <NA>
#> 9           Atlantic Ocean         0         <NA>
#> 10           US West Coast         0         <NA>

library(tibble)

stock_v4.3
#> # A tibble: 1,294 x 9
#>    stockid    tsn scientificname commonname areaid stocklong region
#>    <chr>    <dbl> <chr>          <chr>      <chr>  <chr>     <chr> 
#>  1 ACADRE~ 166774 Sebastes fasc~ Acadian r~ Canad~ Acadian ~ Canad~
#>  2 ACADRE~ 166774 Sebastes fasc~ Acadian r~ Canad~ Acadian ~ Canad~
#>  3 ACADRE~ 166774 Sebastes fasc~ Acadian r~ USA-N~ Acadian ~ US Ea~
#>  4 ACADRE~ 166774 Sebastes fasc~ Acadian r~ Canad~ Acadian ~ Canad~
#>  5 ACMACK~ 172413 Scomber colias Argentine~ Argen~ Argentin~ South~
#>  6 AFLONCH 166156 Beryx splende~ Alfonsino  multi~ Alfonsin~ South~
#>  7 ALBAIO  172419 Thunnus alalu~ albacore ~ multi~ Albacore~ India~
#>  8 ALBAMED 172419 Thunnus alalu~ albacore ~ multi~ Albacore~ Medit~
#>  9 ALBANA~ 172419 Thunnus alalu~ Albacore ~ multi~ Albacore~ Atlan~
#> 10 ALBANP~ 172419 Thunnus alalu~ Albacore ~ Multi~ Albacore~ US We~
#> # ... with 1,284 more rows, and 2 more variables: inmyersdb <dbl>,
#> #   myersstockid <chr>

Created on 2019-01-07 by the reprex package (v0.2.1)

  • I get these errors when I run goodpractice::gp
  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF
    version. This typically indicates Rd problems.
  ✖ fix this R CMD check ERROR: Re-running with no redirection of
    stdout/stderr. Hmm ... looks like a package You may want to clean
    up by 'rm -Rf
    C:/Users/salbers/AppData/Local/Temp/Rtmp4yG4PY/Rd2pdf43d827bd526'
  • I think some slight improvement could be made by running styler on the package though it is marginal.

Possible future work

I wonder if it would be useful to engage Sean Anderson to see if he would be willing to submit a pull request with his code that converts the Access database into an sqlite database. I think there would be an appetite for the sqlite option which could exist alongside the Excel version. His process requires an additional utility outside of R but maybe a R-only solution is available.

@geanders
Copy link

geanders commented Jan 8, 2019

Thank you for the very thorough review, @boshek! @kshtzgupta1, there will be a second review coming, as well, so you may want to wait for that before you begin addressing reviewer comments.

@jamiecmontgomery
Copy link

jamiecmontgomery commented Jan 18, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • The Vignette is good except it does not explain how to install the package. You can only find installation instructions in the README
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Although this package functions as expected, there is a new version of RAM that it is not able to retrieve yet. See comments below for more information.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • When running goodpractice::gp() I got the following output:
  ✖ write unit tests for all functions, and all package code in general. 69% of code lines are covered by test
    cases.

    R/download_ramlegacy.R:53:NA
    R/download_ramlegacy.R:58:NA
    R/download_ramlegacy.R:59:NA
    R/download_ramlegacy.R:68:NA
    R/download_ramlegacy.R:69:NA
    ... and 58 more lines

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
    problems.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4


General Comments

There is only one version of the RAM database now available on ramlegacy.org, version 4.4. I believe the timing of this package going out for review and the recent website update were around the same time (December 20). That being said, there must be a reason the maintainers of the website no longer host the older versions. I encourage you to reach out to them to understand why. I know them and can make the connection if needed. As it is now, everyone using this package to download older versions will be retrieving the backups on GitHub. I believe this package should only allow retrieval of data that is also available on the site.

Sean Anderson has a package of the same name and same basic functionality at https://github.com/seananderson/ramlegacy. I have used this package for my own work in the past although in recent years I’ve been receiving updated RAM data directly via a DropBox link from the maintainers. I wholeheartedly support development of an R package to access the data such as this one and the one from Sean because the current data distribution methods are unreliable and opaque. That being said, I strongly encourage the developers of this package to work with Sean and the maintainers of the RAM legacy database to get this package functional. Again, I can help facilitate the connection as I’ve worked with Sean closely over the past few years.

As a frequent user of the RAM data, I know where to go to find information about what each of the loaded tables are. I understand the comment from the other reviewer about providing more information, but if this package is purely intended to just retrieve the RAM data from the website, I think these functions are good as is. I can come up with a myriad of other functions I would like to see from this R package based on the common wrangling tasks I do with this database, but I believe those could be added through feature requests and aren’t a reason to hold this package up.

More specific comments

I first tried download_ramlegacy() when the site was down (1/17/2019). I downloaded from the backup location on github. This worked well. A couple of hours later, the website was backup and when I tried to download the only version now available on the site (4.4) I got an error message

> download_ramlegacy(version = "4.4")
Error: Invalid version number. Available versions are 1.0, 2.0, 2.5, 3.0, 4.3,

The maintainers of this package will have to be monitoring ramlegacy.org carefully to know when a new version is released to then update this package.

I can forsee some user issues with the ram_url() argument in download_ramlegacy():

download_ramlegacy(version = NULL, ram_path = NULL,
  ram_url = "https://depts.washington.edu/ramlegac/wordpress/databaseVersions")

First, it looks like there is a typo (ramlegac instead of ramlegacy) so if the download_ramlegacy() function isn't acting as the user would expect (whether it's their fault or not) they might see the URL in the Help document and manually try to fix the typo. This may be an actual typo, I don't know. Either way, if you don't want the user to ever touch it I would hardcode that ram_url into the function itself rather than having it as an argument.

Also the path argument in load_ramlegacy() is again one that should not be touched by the user. I tried to mess with this path argument and when I entered an incorrect path I received this:

> load_ramlegacy(version = 4.3, path = "~/github")
★ Loading version 4.3 ...
Error in readRDS(path) : error reading from connection
In addition: Warning message:
In readRDS(path) : error reading the file

I suggest changing the function so that if something is entered in as a path argument, it returns a message reminding the user to not change the path argument. Or, remove this as an argument entirely.

@jamiecmontgomery
Copy link

jamiecmontgomery commented Jan 24, 2019

I just wanted to follow up and it seems within the past week, the maintainers of ramlegacy.org have changed where the database is kept. It is now hosted on Zenodo, which will require some changes to this package for accessing the data.

@geanders
Copy link

geanders commented Jan 24, 2019

Thank you to @boshek and @jafflerbach for your very thoughtful reviews!

@kshtzgupta1 : We now have both reviews in, so if you haven't already, you can begin in responding to the reviewers' comments and suggestions. In particular, it sounds like some changes to ram legacy.org, brought up by @jafflerbach, will require some package changes.

Please let me know if you have any questions as you prepare your response.

@boshek
Copy link

boshek commented Jan 24, 2019

Also worth noting that the authors are now including (or maybe always had) an .RData file will and an R script to load the database into a users session much like what this package does albeit in a less refined way.

@geanders
Copy link

geanders commented Jan 24, 2019

@boshek : Just to clarify, you're referring to the "DBdata.RData" and "loadDBdata (assessment data only).r" files listed here, right?

@cboettig
Copy link
Member

cboettig commented Jan 24, 2019

Thanks everyone for great reviews and discussion, this is very helpful. @kshtzgupta1 and I will need a bit more time to really work through and address all of these properly, but meanwhile will just make a few quick observations relevant to the immediate discussion.

@jafflerbach Thanks! Yes, we've been in discussion with Mike Melnychuk since September about moving the data to Zenodo to avoid the problems created by down-time and re-organization of the ramlegacy.org website, so we were very happy to hear from them about this development and look forward to using those endpoints for the package. While they have plans to upload older versions to Zenodo as well, these aren't currently linked up as related versions (a la https://blog.zenodo.org/2017/05/30/doi-versioning-launched/), hopefully they will be able to do that to provide a source for the original versions. So hopefully soon the Zenodo hosting will provide a much more stable source for any versions of the data but may be a bit of a moving target for a little longer. (This switch to the versioning approach would also allow automated resolution of the most recent version).

@boshek
Copy link

boshek commented Jan 24, 2019

@geanders 👍

@boshek
Copy link

boshek commented Feb 22, 2019

Nice work @kshtzgupta1. @geanders I won't be able to look at this until late next week but can prioritize it then.

@geanders
Copy link

geanders commented Feb 25, 2019

@kshtzgupta1 : Thanks so much for these revisions! Could you insert a link in your comment to the Issues where you give point-by-point reviews? That would help @boshek and @jafflerbach find them more easily.

@boshek : Thanks for letting me know your timeline. That works fine.

@geanders
Copy link

geanders commented Feb 25, 2019

Actually, @kshtzgupta1, could I ask you to copy your point-by-point responses over to this thread? It would be great to keep the full conversation on a single thread.

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Feb 26, 2019

@boshek Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.

  • Though this is likely to end up on CRAN, the current installation instructions don't automatically build the vignette so accessing the vignette locally fails. I'd just suggest remove the line:

The vignette can also be viewed by calling vignette(package = "ramlegacy")

Until the installation instruction are for the CRAN version. Alternatively, one could install by removing --no-build-vignettes
from the build_opts in install_github

Fixed that line!

Vignette(s)** demonstrating major functionality that runs successfully locally

Fixed!

Function Documentation: for all exported functions in R help

  • Note that this function does not support vectorization so please don't pass in a vector of version numbers

  • Consider revising the above since you are unable to do so anyways

Revised!

Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

  • Consider adding either suggested reference or the maintainer of ramlegacy to the DESCRIPTION file in Authors@R
  • Consider adding a CODE OF CONDUCT and CONTRIBUTING file directly to the repo.

Added!

  • I think that you are adding too much onto the library call. Consider any loading of data should be left solely to the load_ramlegacy function. I think this constrains future development of the package as future may entail functionality outside the database. This is also outside what a user would expect from a typical library call. We are into stylistic territory here but I personally think objects should not be loaded with the library call.

Agreed! We have removed the loading behavior of library(ramlegacy) and all loading is now solely done
by load_ramlegacy

  • I wonder if load_ramlegacy should gain an argument where a user can specify a vector that controls which dataset are loaded. This might be a useful feature for production code, particularly a shiny app.

Excellent suggestion! We have added a dfs argument in load_ramlegacy to which the user can pass a vector of dataframes to load.

Having ram_url as an argument helps us write tests in which download_ramlegacy deal with internet connection issues. Although the package downloads and caches the database in the user's rappdirs directory by default we have now decided that having ram_path as an argument is useful in download_ramlegacy and load_ramlegacy because it allows the user to download the database to a location of their choice and read from that location if they wish to do so. I think we want to give the user that flexibility.

  • I would strongly consider a NEWS.md file to track changes between versions. rOpenSci provides a great template

Added!

  • I think the package lacks some connection with the actual data. Right now a bunch of dataframes are loaded but a new user is expected to have prior knowledge of the ram database. I think this package could provide some path for users to learn about the data itself.

We will definitely be open to including more educating vignettes if the maintainers of the database wanted to do that and certainly be willing to link more informative documentation in the vignette but at the same time we want to avoid writing things that are out-of-date or not coming from the maintainers. Also, in our opinion educating a new user in the database might be a bit outside the scope of the package which was primarily to improve access to the data.

  • It isn't clear to me why you need different versions of the database. This functionality is highlighted quite prominently but it's utility is not clear.

We believe providing access to older versions can be really useful for users trying to reproduce older research papers and studies.

  • RE: the alternate download location on github. I would consider if you have acknowledged and licensed this data appropriately in your assets repo. I would contact the RAM maintainers directly and seek direction on this. Fully reproducing the data might cause some concern with those maintainers.

Prof. Boettiger was in touch with the RAM maintainers regarding that. While the maintainers have moved the latest versions (4.40, 4.41, 4.44) to Zenodo they still have to do the same for the older versions. So till that happens the package will have to use the github repo to make the older versions available to the users.

  • The docs directory, _pkgdown.yml and _config.yml all needed to be added to .Rbuildignore for R CMD check to pass on my machine.

Added to .Rbuildignore!

  • I am wondering if this is intentional behaviour. I think this is a carryover from readxl::read_excel but
    given the class of the data object it prints either as a dataframe or as a tibble depending on whether tibble is loaded or not. One thing to consider is even just adding tibble to Suggests for nicer printing in a vignette

It was intentional. We wanted to give the user the option to choose whether they wanted the tables as tibbles or dataframe. If for some reason a user prefers data.frame to tibble then we certainly don't want to impose tibbles on them.

  • I get these errors when I run goodpractice::gp
  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF
    version. This typically indicates Rd problems.
  ✖ fix this R CMD check ERROR: Re-running with no redirection of
    stdout/stderr. Hmm ... looks like a package You may want to clean
    up by 'rm -Rf
    C:/Users/salbers/AppData/Local/Temp/Rtmp4yG4PY/Rd2pdf43d827bd526'

We couldn't reproduce these on our machines. I think they might be specific to your machine and may be occurring because you don't have Tex installed.

I think some slight improvement could be made by running styler on the package though it is marginal.

Agreed! I will run it after all the changes have been approved.

I wonder if it would be useful to engage Sean Anderson to see if he would be willing to submit a pull request with his code that converts the Access database into an sqlite database. I think there would be an appetite for the sqlite option which could exist alongside the Excel version. His process requires an additional utility outside of R but maybe a R-only solution is available.

We are definitely open to working with Sean Anderson. I think Prof. Boettiger has pinged him.

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Feb 26, 2019

@jafflerbach Thank you for a detailed review! We have made many changes following your suggestions. I will update the README and vignette to reflect those changes after they are approved.

Vignette(s) demonstrating major functionality that runs successfully locally
The Vignette is good except it does not explain how to install the package. You can only find installation instructions in the README

Fixed!

  • When running goodpractice::gp() I got the following output:
  ✖ write unit tests for all functions, and all package code in general. 69% of code lines are covered by test
    cases.

    R/download_ramlegacy.R:53:NA
    R/download_ramlegacy.R:58:NA
    R/download_ramlegacy.R:59:NA
    R/download_ramlegacy.R:68:NA
    R/download_ramlegacy.R:69:NA
    ... and 58 more lines

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
    problems.

We couldn't reproduce that warning on our machines. I think it might be specific to your machine and may be occurring because you don't have Tex installed.

That being said, I strongly encourage the developers of this package to work with Sean and the maintainers of the RAM legacy database to get this package functional.

I believe Prof. Boettiger has reached out to Sean regarding that.

I first tried download_ramlegacy() when the site was down (1/17/2019). I downloaded from the backup location on github. This worked well. A couple of hours later, the website was backup and when I tried to download the only version now available on the site (4.4) I got an error message

> download_ramlegacy(version = "4.4")
Error: Invalid version number. Available versions are 1.0, 2.0, 2.5, 3.0, 4.3,

This should be resolved now!

I can forsee some user issues with the ram_url() argument in download_ramlegacy():

download_ramlegacy(version = NULL, ram_path = NULL,
  ram_url = "https://depts.washington.edu/ramlegac/wordpress/databaseVersions")

First, it looks like there is a typo (ramlegac instead of ramlegacy) so if the download_ramlegacy() function isn't acting as the user would expect (whether it's their fault or not) they might see the URL in the Help document and manually try to fix the typo. This may be an actual typo, I don't know. Either way, if you don't want the user to ever touch it I would hardcode that ram_url into the function itself rather than having it as an argument.

It was actually ramlegac in the url. But that is no longer relevant since we are now passing in the zenodo doi url to ram_url. Having ram_url as an argument helps us test for intended behavior of download_ramlegacy in face of internet connection issues and network problems.

Also the path argument in load_ramlegacy() is again one that should not be touched by the user. I tried to mess with this path argument and when I entered an incorrect path I received this:

> load_ramlegacy(version = 4.3, path = "~/github")
★ Loading version 4.3 ...
Error in readRDS(path) : error reading from connection
In addition: Warning message:
In readRDS(path) : error reading the file

I suggest changing the function so that if something is entered in as a path argument, it returns a message reminding the user to not change the path argument. Or, remove this as an argument entirely.

Although the package downloads and caches the database in the user's rappdirs directory by default we have now decided that having ram_path as an argument is useful in download_ramlegacy and load_ramlegacy because it allows the user to download the database to a location of their choice and read from that location if they wish to do so. I think we want to give the user that flexibility.

@boshek
Copy link

boshek commented Feb 28, 2019

Great job @kshtzgupta1! Thanks for such a great package and great job addressing issues raised in this review. You have sufficiently addressed any issues I raised. @geanders I recommend this package for acceptance. I do want to raise some final considerations:

It was intentional. We wanted to give the user the option to choose whether they wanted the tables as tibbles or dataframe. If for some reason a user prefers data.frame to tibble then we certainly don't want to impose tibbles on them.

I am really of two minds on this. I employ this same trick on a couple packages of mine but I also wonder if this introduces some weird ambiguity that relies on the side effect of having a package loaded. Though I don't think the package acceptance is contingent on this, I would in this case recommend imposing a tibble on the user.

Prof. Boettiger was in touch with the RAM maintainers regarding that. While the maintainers have moved the latest versions (4.40, 4.41, 4.44) to Zenodo they still have to do the same for the older versions. So till that happens the package will have to use the github repo to make the older versions available to the users.

I still don't love this solution. To me, this places too much of the package functionality at the mercy of that repo being available rather having a direct line to the data. I think that it is worth the sacrifice of having fewer versions available to drop this solution. That said, I similarly don't think this is sufficient to hold up acceptance.

@cboettig
Copy link
Member

cboettig commented Mar 2, 2019

Hi Sam @boshek , thanks for the excellent review and the kinds words! I don't mean to jump in here but just wanted to add a little context to the decision about accessing older versions. I agree that it would be much more desirable to have all versions on Zenodo, and based on our discussions with them, I think the RAM Legacy team will eventually post those, but it is hard to know exactly when. Meanwhile, I do believe access to the old versions is critical, all the more so for them not being available anywhere else now. Among other reasons, there are dozens of high profile papers based on these older versions; just today there is yet another appearing in Science that uses the version 3 data.

@boshek
Copy link

boshek commented Mar 6, 2019

Great context @cboettig . This rationale makes perfect sense.

@geanders
Copy link

geanders commented Mar 8, 2019

@jafflerbach : I wanted to check in with you to see if you had any thoughts on the response to your initial review for this package?

@cboettig
Copy link
Member

cboettig commented Mar 8, 2019

Also just a note that we did just hear back from Sean Anderson who was very gracious and positive about the package. He gave us a few suggestions (data tables no longer append the version name on the table, which was no longer necesary anyway with new approach of explicitly calling load_ramlegacy() with a version number instead of trying to load automatically on library(). (Also note that as Sean's package no longer works in it's current state, as it still points to the Wordpress URLs which are now defunct and would need to be replaced with the Zenodo DOIs)

Thanks Sam, Jamie & Brooke for your feedback so far! @kshtzgupta1 summarizes the edits in reply to Jamie above, these are waiting on an PR from an ropensci-review branch in the package meanwhile.

@jamiecmontgomery
Copy link

jamiecmontgomery commented Mar 8, 2019

Thanks @geanders, @kshtzgupta1 and @cboettig. Good hear you've been in touch with Sean Anderson as well as the RAM maintainers. No further requests from me on this package now.

@geanders
Copy link

geanders commented Mar 15, 2019

Approved! @kshtzgupta1 : Thanks so much for these thoughtful revisions. Both reviewers agree that the package should be accepted. Further, based on there reviews, there are a few potential changes you might want to consider in future revisions (e.g., removing the reliance on the GitHub repos of older database versions once this is possible, thinking some more about whether you want different behavior based on whether the user has tibble loaded) or through other avenues (e.g., posting some examples of using the downloaded data from the package, perhaps as a blog post, to help new users understand how this package can fit into a pipeline of analysis). Thanks to both @boshek and @jafflerbach for very helpful input and suggestions.

There are some things we'll need to do for the final processing of this package. I'm including the standard to-dos as a check list here. I'd like to add the caveat that this will be my first time doing the editor-side part of this process, so please bear with me as I figure out my end of things for the process!

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I will be inviting you to a team that should allow you to do so. You'll be made admin once you do.
  • Add the rOpenSci footer to the bottom of your README
    " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@geanders
Copy link

geanders commented Mar 15, 2019

Also, since you are planning to submit to CRAN, you may find this list of CRAN gotchas helpful, and I'd be happy to provide support through that process.

@stefaniebutland
Copy link
Member

stefaniebutland commented Mar 18, 2019

Hello @kshtzgupta1 I'm rOpenSci's Community Manager, here to say we would love to feature a post about ramlegacy on our blog. This might bring your work to a larger audience.

This link will give you many examples of blog posts by authors of peer-reviewed packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/software-peer-review/. Shorter tech notes are here: https://ropensci.org/technotes/

Here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post.

Please let me know if you're interested and we can discuss a deadline. Happy to answer any questions.

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Apr 3, 2019

My apologies for such a late reply @geanders. I was ill and needed to have a surgery. I have finished all the to-dos. Thank you for being such a helpful editor!

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Apr 3, 2019

@boshek Thank you for being such a great reviewer! Is it fine if I add you as a "rev"-type contributor in the Authors@R field? If so, can I put in your email as sam.albers@gmail.com and your orcid id as https://orcid.org/0000-0002-9270-7884 ?

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Apr 3, 2019

@jafflerbach Thank you very much for reviewing the package! I would like to acknowledge you as a "rev"-type contributor in the Authors@R field if that's okay with you. Can I put in your email as afflerbach@nceas.ucsb.edu and orcid id as https://orcid.org/0000-0002-5215-9342 ?

@kshtzgupta1
Copy link
Author

kshtzgupta1 commented Apr 3, 2019

@stefaniebutland Thank you so much for reaching out. My sincere apologies for the late reply. I think Prof. Boettiger and I would definitely be interested in authoring a blog post about ramlegacy. I'll discuss the content of the post with Prof. Boettiger and let you know about a timeline very soon.

@jamiecmontgomery
Copy link

jamiecmontgomery commented Apr 3, 2019

@boshek
Copy link

boshek commented Apr 3, 2019

@kshtzgupta1 👍

@geanders
Copy link

geanders commented Apr 3, 2019

@kshtzgupta1 : No worries at all! Alright, I will work through the next steps on my end. Again, since this is the first package I've edited to reach this point, it might take me a bit longer than usual to make sure I've gotten through any required steps. Congrats on a very nice package!

@stefaniebutland
Copy link
Member

stefaniebutland commented Apr 11, 2019

@kshtzgupta1 Please look for an email from me to follow up about a blog post

@geanders
Copy link

geanders commented May 8, 2019

@kshtzgupta1 : I think we are all set with our post-approval steps, so I'm going to close this issue. Congrats on a really nice package!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants