Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: DataSpaceR #261

Closed
10 of 19 tasks
juyeongkim opened this issue Oct 16, 2018 · 23 comments
Closed
10 of 19 tasks

Submission: DataSpaceR #261

juyeongkim opened this issue Oct 16, 2018 · 23 comments
Assignees

Comments

@juyeongkim
Copy link

juyeongkim commented Oct 16, 2018

Summary

  • What does this package do? (explain in 50 words or less):
    DataSpaceR is an R interface to the CAVD DataSpace, a data sharing and discovery tool that facilitates exploration of HIV immunological data from pre-clinical and clinical HIV vaccine studies.

  • Paste the full DESCRIPTION file inside a code block below:

Package: DataSpaceR
Type: Package
Title: An R Interface to the CAVD DataSpace
Version: 0.5.2
Authors@R: c(person("Ju Yeong", "Kim",
                    email = "jkim2345@fredhutch.org",
                    role = c("aut", "cre")),
             person("The Collaboration for AIDS Vaccine Discovery",
                    role = "cph"))
Description: Provides a convenient API for accessing datasets within
    the CAVD DataSpace (dataspace.cavd.org).
URL: https://github.com/CAVDDataSpace/DataSpaceR
BugReports: https://github.com/CAVDDataSpace/DataSpaceR/issues
License: GPL-3 + file LICENSE
Encoding: UTF-8
LazyData: true
Imports:
    utils,
    R6,
    Rlabkey (>= 2.2.0),
    curl,
    httr,
    assertthat,
    digest,
    rjson,
    data.table
Suggests:
    testthat,
    covr,
    knitr,
    pryr
VignetteBuilder: knitr
RoxygenNote: 6.1.0
  • URL for the package (the development repository, not a stylized html page):
    https://github.com/CAVDDataSpace/DataSpaceR

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

[e.g., "data extraction, because the package parses a scientific data file format"]

The package fits into the data retrieval category as it provides API access to the database.

  •   Who is the target audience and what are scientific applications of this package?  

The target audience are immunologists, bioinformaticians, or statisticians in HIV vaccine research, and the scientific applications are meta analyses of exploring relationships across assays, studies, and time.

None that I am aware of.

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#260

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

    • I've used camelCase in this package.
    • The test coverage is currently 92%, and I am getting LaTex errors with goodpractice:
── GP DataSpaceR ────────────────────────────────────────────────────────────────────────────────

It is good practice towrite unit tests for all functions, and all package code in general. 92%
    of code lines are covered by test cases.

    R/DataSpaceConnection.R:161:NA
    R/helpers.R:29:NA
    R/helpers.R:49:NA
    R/helpers.R:52:NA
    R/helpers.R:152:NA
    ... and 34 more linesfix this R CMD check WARNING: LaTeX errors when creating PDF version.
    This typically indicates Rd problems. LaTeX errors found: ! LaTeX Error: File
    `inconsolata.sty' not found. Type X to quit or <RETURN> to proceed, or enter new
    name. (Default extension: sty)
─────────────────────────────────────────────────────────────────────────────────────────────────
  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
    @seaaan

@sckott sckott added the package label Oct 17, 2018
@sckott
Copy link
Contributor

sckott commented Oct 17, 2018

Thanks for your submission @juyeongkim We'll get back to you very soon

@noamross
Copy link
Contributor

noamross commented Oct 24, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @juyeongkim and sorry for being a little slow on the pick-up. What a neat package design! Unfortunately I'm getting a few errors while testing. In R CMD Check, I get:

* creating vignettes ... ERROR
Quitting from lines 146-148 (Intro_to_DataSpaceR.Rmd) 
Error: processing vignette 'Intro_to_DataSpaceR.Rmd' failed with diagnostics:
216 is not a valid group ID
Execution halted
Error: Command failed (1)
Execution halted

And running test_package, I get:

Loading DataSpaceR
Loading required package: testthat
By exporting data from the CAVD DataSpace, you agree to be bound by the Terms of Use available on the CAVD DataSpace sign-in page at https://dataspace.cavd.org/cds/CAVD/app.view?
Testing DataSpaceR
✔ | OK F W S | Context
✔ | 40       | DataSpaceConnection [7.6 s]
✖ | 19 1     | helpers [1.1 s]
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
test-helpers.R:21: failure: `getUserEmail`
DataSpaceR:::getUserEmail(url, NULL) not equal to `email`.
1/1 mismatches
x[1]: "noam.ross@gmail.com"
y[1]: "jkim2345@scharp.org"
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ |  9       | netrc
✔ | 73       | DataSpaceStudy (CAVD) [191.7 s]
✔ | 64       | DataSpaceStudy (cvd408) [9.2 s]
✖ |  0 2     | DataSpaceStudy (mice)
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
test-study.R:26: failure: can connect to studies
`cavd` inherits from `try-error` not `DataSpaceStudy`.

test-study.R:27: failure: can connect to studies
`cavd` inherits from `try-error` not `R6`.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✖ |  0 2     | DataSpaceStudy (CAVD 242)
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
test-study.R:26: failure: can connect to studies
`cavd` inherits from `try-error` not `DataSpaceStudy`.

test-study.R:27: failure: can connect to studies
`cavd` inherits from `try-error` not `R6`.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ | 73       | DataSpaceStudy (NYVAC durability comparison) [67.8 s]

══ Results ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 280.0 s

OK:       278
Failed:   5
Warnings: 0
Skipped:  0

I note that in the neither in web GUI nor in con$availableGroups do I see group 216. I suspect that I don't have the same permissions you do and that your tests and vignettes attempting to access studies not available to all users. (I see the first test failure is because you've hard-coded your email into the tests, so it doesn't work once I've made my own netrc file). If this is the case, you should probably either select a different subset of studies, or skip tests conditionally based on user permissions.

Everything else checks out fairly well, though, so I'm happy to start seeking reviewers while you patch this.


Reviewers: @seaaan @czibman
Due date: 2018-11-26

@juyeongkim
Copy link
Author

juyeongkim commented Oct 26, 2018

Thanks @noamross! I will fix these tests today.

juyeongkim added a commit to ropensci/DataSpaceR that referenced this issue Oct 26, 2018
@noamross
Copy link
Contributor

noamross commented Nov 3, 2018

Great, with the update all tests and diagnostics passing. Reviewers assigned: @seaaan and @czibman. Due date 2018-11-26.

@seaaan
Copy link
Contributor

seaaan commented Nov 26, 2018

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

4


Review Comments

Package summary: DataSpaceR makes data from the CAVD DataSpace available in R. It connects to the database, provides a stuctured data framework (studies, assay types), and manages caching and data transfer. It also allows the user to supply filters to the database, so only the needed data is transferred.

Review summary: Overall, the package is easy to use and the vignette is comprehensive. I was especially pleased and impressed about how easy it is to connect to the database and deal with authentication. Getting data out just works. My most major suggestion for change is to consider abstracting the R6 objects away from the user and providing a tidyverse-like interface. I also have a number of minor documentation-related comments.

Major suggestion: replace R6 interface with tidyverse-like interface

The R6 objects used here are not difficult to use after reading the vignette, though to me as a base R and tidyverse user, they are unfamiliar. I suggest changing the interface for the following reasons:

  • R6 is less familiar to most R users
  • You can't get help on functions the normal way (e.g. to get help on getStudy, I can't type ?getStudy, I have to know to go to ?DataSpaceConnection and look in the methods section)
  • The current structure is not pipeable.

I suggest replacing it with a set of functions that take the object as the first argument.

For example:

con <- connectDS() 

con %>% 
    getStudy("cvd256") %>% 
    getDataSet("NAb")

so getStudy() takes a connection and a study descriptor and getDataSet takes a study and an assay descriptor. Same would go for availableGroups, availableStudies, and all the other functions. You could still use R6 under the hood, I'm only suggesting changing the interface exposed to users.

This is just a suggestion, as it would make it easier for me to use. But I am not all R users! And it's up to you how you want the interface to be.

Minor suggestions

  • Rename getVariableInfo to something more descriptive, like getDataSetDescription

  • Is it necessary for the user to run connectDS? Could this be run when the package is loaded? Then the user doesn't need a connection object at all.

  • Currently, connection and study objects expose a bunch of methods that I don't think the user is likely to use. For example, the connection methods clone, config, initialize, print, refresh, and .__enclos_env__ are all probably not needed by the user. If you keep the R6 interface, I suggest hiding these methods so that the user only sees the ones they are likely to use (e.g. the ones described in the vignette).

  • Could you reexport makeFilter() so the user doesn't have to call library(Rlabkey)?

  • Data column names are not always written with consistent case (e.g. cvd256$getVariableInfo("NAb") has camel case (ParticipantId) and snake case (antigen_type)). I'm not sure if that's inherent to DataSpace or can be changed in DataSpaceR but it would be nice to have consistent case.

  • When you run library(DataSpaceR), this message appears: By exporting data from the CAVD DataSpace, you agree to be bound by the Terms of Use available on the CAVD DataSpace sign-in page at https://dataspace.cavd.org/cds/CAVD/app.view? It's a little weird that it ends with a question mark, I would remove that. The link still works without it.

  • Various wording suggestions in a pull request: Minor changes to wording in the vignette. DataSpaceR#17

  • If you don't remove study$refresh(), document it better and make the example in ?DataSpaceStudy better. Right now it just says "Refresh DataSpaceStudyClass" and it's not clear to me what that means. Same in ?DataSpaceConnection

  • The "Format" description for DataSpaceConnection in the help page isn't very informative to me. It says "An object of class R6ClassGenerator of length 24." but I don't really know what that means. Why is the length important?

Configuration and connecting

I was very impressed and pleased at how easy it was to connect to the database. The package is well designed for making this easy and the vignette is well-written.

A couple of suggestions:

  • With no netrc file set up, this message A netrc file is required to connect to the DataSpace. is printed when you run library(DataSpaceR). It would be helpful for the message to give more guidance about how to set up a netrc file (eg link to vignette). If you don't know what a netrc file is and haven't found the vignette, you'll be completely stuck at this point otherwise.

  • Similarly, with no netrc file set up, if you load the package and run connectDS(), you get Error: Invalid credential or deactivated account. Check your account in the portal. That message seems appropriate for the case where the netrc file contains an invalid user name or password, but not for a missing netrc file. I suggest that you check for that and if you find a missing netrc file, give a helpful message (like above) about how to set up a netrc file.

Vignette comments

  • The vignette and README are both generally very good and comprehensive. Nice job!

  • Before initiating a connection to cvd256, show con$availableStudies

  • "Cross-study connection" is kind of a confusing name to me. It seems more like you're getting data from all studies?

  • In the "Connect to a saved group" section, provide more explanation of what a saved group is and why you would want to connect to it. I think I get it, but that's based on playing around and guessing. For that matter, how do you save a group?

  • In the "Caching" section, add more explanatory text. E.g. explain that datasets are automatically cached when first loaded, saving time, and that you can clear the cache if you want to (although why would you want to?). As this section is currently written, you have to read between the lines to understand that.

  • It would be helpful to have a table of all functions/methods that work on DataSpaceConnection (availableGroups, availableStudies, getGroup, getStudy) and DataSpaceStudy (studyInfo, treatmentArm, availableDatasets, getVariableInfo,
    clearCache, getDataset) objects. This table could be used as a quick reference.

Code suggestions

  • In DataSpaceConnection.R: (study == "" && is.null(groupId)) || (study == "" && is.number(groupId)) || (study != "" && is.null(groupId)) simplifies to
    is.null(groupId) || (study == "" && is.number(groupId))

R CMD check

R CMD check fails for me because the vignette doesn't build because group 216 isn't available to me. Changing it to another group (220) allows the check to succeed.

I had a problem where I couldn't run the check without have a netrc file set up. Does it succeed on winbuilder? I wonder how to provide that access for CRAN to build the vignette?

Once I resolved those issues, I got the following note (on Ubuntu 16.04, with R 3.4.4):

Unexported object imported by a ':::' call: ‘Rlabkey:::labkey.getRequestOptions’

My understanding is that CRAN won't accept packages using the triple-colon operator, but I may be mistaken.

Packaging guide

The packaging guide suggests all lowercase for the package name. It also suggests snake_case. Further, it advises againist package startup messages. Is yours absolutely necessary?

@noamross
Copy link
Contributor

noamross commented Nov 26, 2018

Thanks for your thorough review, @seaaan! @juyeongkim, I suggest waiting for @czibman's review so you can address comments together.

@czibman
Copy link

czibman commented Nov 27, 2018

DataSpaceR review

Package Review

Please check off boxes as applicable, and elaborate in comments below.
Your review is not limited to these topics, as described in the reviewer
guide

  • As the reviewer I confirm that there are no conflicts of
    interest for me to review this work (If you are unsure whether you
    are in conflict, please speak to your editor before starting your
    review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software
    is designed to solve and its target audience in README
  • Installation instructions: for the development version of
    package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs
    successfully locally
  • Function Documentation: for all exported functions in R
    help
  • Examples for all exported functions in R Help that run
    successfully locally
  • Community guidelines including contribution guidelines in
    the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports
    and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

N/A

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been
    confirmed.
  • Performance: Any performance claims of the software been
    confirmed.
  • Automated tests: Unit tests cover essential functions of
    the package and a reasonable range of inputs and conditions. All
    tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci
    packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my
    satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8 —

Review Comments

Summary

Thank you so much for making these data more accessible to researchers
This appears to be a well-designed package. I note below where some
improvements can be make the use of this package more dummy-proof.

README

Per ROpenSci’s requirement that the Statement of Need specify the user
population, consider adding a line or two about who a typical user of
the CAVD data might be.

Please also consider more explicitly describing contribution guidelines
in the README (in keeping with the “Community Guidelines” Requirement).

Installation

I had to use “force = TRUE” to get the package to
load.

devtools::install_github("CAVDDataSpace/DataSpaceR")
## Skipping install of 'DataSpaceR' from a github remote, the SHA1 (a139df3d) has not changed since last install.
##   Use `force = TRUE` to force installation

Creation of CAVD account

The package creator instructs the user for a CAVD account at
https://dataspace.cavd.org. I recognize that the registration process
is beyond the influence of the package creator, but registering for a
CAVD account was tricky since, when I clicked “submit” on the “member
details” page, nothing happened. Submitting member details appears
optional, however.

Vignette

Note that, when loading the package, a statement appears reminding the
user of the Terms of Use. That’s fine, but it ends in a question mark
(that is part of the url name). Makes me think I’m being prompted for
something. This can be reworded as “see [url] for the
terms”.

library(DataSpaceR)
## By exporting data from the CAVD DataSpace, you agree to be bound by the Terms of Use available on the CAVD DataSpace sign-in page at https://dataspace.cavd.org/cds/CAVD/app.view?

The Vignette has the reader look at the list of available groups and
choose group 216. There is no group 216 (see below). Is this because the
group was deleted from the CAVD dataset? Is there a way to ensure that
the documentation for this package stays current when the dataset
changes?

knitr::kable(con$availableGroups)
id label originalLabel description createdBy shared n studies
220 NYVAC durability comparison NYVAC_durability Compare durability in 4 NHP studies using NYVAC-C (vP2010) and NYVAC-KC-gp140 (ZM96) products. ehenrich TRUE 78 c(“cvd281”, “cvd434”, “cvd259”, “cvd277”)
228 HVTN 505 case control subjects HVTN 505 case control subjects Participants from HVTN 505 included in the case-control analysis drienna TRUE 189 vtn505
230 HVTN 505 polyfunctionality vs BAMA HVTN 505 polyfunctionality vs BAMA Compares ICS polyfunctionality (CD8+, Any Env) to BAMA mfi-delta (single Env antigen) in the HVTN 505 case control cohort drienna TRUE 170 vtn505

I was able to follow the rest of the vignette without trouble. As a very
minor quibble, the vignette reference at least one other package
(pryr). To allow for a faster run through the vignette that can be
done entirely by cutting and pasting, it would be helpful to add code in
the vignette to install any referenced packages if the user has not
already (as in below).

if (!("pryr" %in% installed.packages())) install.packages("pryr")

Documentation and testing

The package includes 3 functions: writeNetrc, checkNetrc, and connectDS.
Help files are provided for:

  • DataSpaceConnection
  • DataSpaceR-package
  • DataSpaceStudy
  • checkNetrc
  • connectDS
  • writeNetrc

The “DataSpaceR-package” help file does not appear to be loading
correctly when I type ?DataSpaceR-package(I get the help file for
Arithmetic Operators instead).

All three function help files include examples. The information in the
README and vignette on creating a .netrc file is helpful. The help file
for the writeNetrc function, in contrast, could benefit from
containing more of the information that is in the vignette (for example
on how to create and save the file). In addition, it would be helpful if
the writeNetrc command provided the option to overwrite an existing
netrc file (finding and changing a file with a non-standard name was
harder than it should have been on my mac).

There are four files containing tests:

  • test-connection.R

  • test-helpers.R

  • test-netrc.R

  • test-study.R

Additional comments:

It’s not clear to me what the purpose of a group is. I would have
thought it was a group of studies, but it is a group of datasets.

When you connect, you are provided with information regarding the number
of studies, subjects, assays, datapoints, and groups in the CAVD
database. It would be helpful if the documentation explained the role of
the five assays and why the number of assays is a meaningful number to
report (unless this information is obvious to people who’d use this
data). Perhaps also report the number of datasets.

@czibman
Copy link

czibman commented Nov 27, 2018

@juyeongkim
Copy link
Author

juyeongkim commented Nov 27, 2018

Thanks for your feedback, @seaaan and @czibman! I will address your reviews later this week.

@noamross
Copy link
Contributor

noamross commented Nov 28, 2018

Thanks for your review, @czibman!

@juyeongkim
Copy link
Author

juyeongkim commented Dec 19, 2018

Response to @seaaan's review

Major suggestion: replace R6 interface with tidyverse-like interface

This is a good suggestion, but I'd like to engage with the users and gather feedback from them before implementing a big change to the package.

Minor suggestions

Rename getVariableInfo to something more descriptive, like getDataSetDescription

I renamed it to getDatasetDescription.
ropensci/DataSpaceR@2b466c8
ropensci/DataSpaceR@cd35008

Is it necessary for the user to run connectDS? Could this be run when the package is loaded? Then the user doesn't need a connection object at all.

It's necessary for the users to run connectDS if they want to enter in their credential instead of using netrc file, specify the server (production or staging), and set verbosity.

Currently, connection and study objects expose a bunch of methods that I don't think the user is likely to use. For example, the connection methods clone, config, initialize, print, refresh, and .enclos_env are all probably not needed by the user. If you keep the R6 interface, I suggest hiding these methods so that the user only sees the ones they are likely to use (e.g. the ones described in the vignette).

It's a good suggestion, but this is inherit to R6, and I think there is no way to hide them (clone, initialize, print, and .__enclos_env__) yet (r-lib/R6#122). config is actually a field in DataSpaceConnection class, and it stores configuration of the connection object such as URL, path and username. refresh is a method that refreshes the connection object to update available studies and groups.

Could you reexport makeFilter() so the user doesn't have to call library(Rlabkey)?

Yes, that's a good idea!
ropensci/DataSpaceR@3c46588

Data column names are not always written with consistent case (e.g. cvd256$getVariableInfo("NAb") has camel case (ParticipantId) and snake case (antigen_type)). I'm not sure if that's inherent to DataSpace or can be changed in DataSpaceR but it would be nice to have consistent case.

It's inherent to DataSpace.

When you run library(DataSpaceR), this message appears: By exporting data from the CAVD DataSpace, you agree to be bound by the Terms of Use available on the CAVD DataSpace sign-in page at https://dataspace.cavd.org/cds/CAVD/app.view? It's a little weird that it ends with a question mark, I would remove that. The link still works without it.

Good catch! I removed the question mark.
ropensci/DataSpaceR@af8a2ba

Various wording suggestions in a pull request: ropensci/DataSpaceR#17

Thanks for the PR!

If you don't remove study$refresh(), document it better and make the example in ?DataSpaceStudy better. Right now it just says "Refresh DataSpaceStudyClass" and it's not clear to me what that means. Same in ?DataSpaceConnection

refresh() refreshes the connection object by re-retrieving available studies and groups. For example, in the UI, the user filtered the species by guinea pig and created a group. This new group will not be included in availableGroups until the user refreshes the connection object.

The "Format" description for DataSpaceConnection in the help page isn't very informative to me. It says "An object of class R6ClassGenerator of length 24." but I don't really know what that means. Why is the length important?

The "Format" description is inherit to how roxygen handles R6 class documentation. I don't know what it means either, but I removed this section.
ropensci/DataSpaceR@996c414

Configuration and connecting

With no netrc file set up, this message A netrc file is required to connect to the DataSpace. is printed when you run library(DataSpaceR). It would be helpful for the message to give more guidance about how to set up a netrc file (eg link to vignette). If you don't know what a netrc file is and haven't found the vignette, you'll be completely stuck at this point otherwise.

That's a good point. I changed the message to be more helpful.
ropensci/DataSpaceR@de9a65d

Similarly, with no netrc file set up, if you load the package and run connectDS(), you get Error: Invalid credential or deactivated account. Check your account in the portal. That message seems appropriate for the case where the netrc file contains an invalid user name or password, but not for a missing netrc file. I suggest that you check for that and if you find a missing netrc file, give a helpful message (like above) about how to set up a netrc file.

I added a check with a helpful message if mising a netrc file.
ropensci/DataSpaceR@507e21c

Vignette comments

The vignette and README are both generally very good and comprehensive. Nice job!

Thanks!

Before initiating a connection to cvd256, show con$availableStudies

Good idea. I added a code block with con$availableStudies.

"Cross-study connection" is kind of a confusing name to me. It seems more like you're getting data from all studies?

I changed the title from "Cross-study connection" to "Creating a connection to all studies".

In the "Connect to a saved group" section, provide more explanation of what a saved group is and why you would want to connect to it. I think I get it, but that's based on playing around and guessing. For that matter, how do you save a group?

A group is a curated collection of participants from filtering of treatments, products, studies, or species, and it is created in the DataSpace App. Let's say you are using the App to filter and visualize data and want to save them for later or explore in R with DataSpaceR. You can save a group by clicking the Save button on the Active Filter Panel. We can browse available the saved groups or the curated groups by DataSpace Team via availableGroups.

In the "Caching" section, add more explanatory text. E.g. explain that datasets are automatically cached when first loaded, saving time, and that you can clear the cache if you want to (although why would you want to?). As this section is currently written, you have to read between the lines to understand that.

I wrote this section earlier in the development process, and I was really excited to include this section because I thought it was cool, but now that I think about it, the users don't really need to know about this especially in an introductory vignette. I removed this section.

It would be helpful to have a table of all functions/methods that work on DataSpaceConnection (availableGroups, availableStudies, getGroup, getStudy) and DataSpaceStudy (studyInfo, treatmentArm, availableDatasets, getVariableInfo,
clearCache, getDataset) objects. This table could be used as a quick reference.

That's a great idea. I added quick reference tables for connection and study classes.
ropensci/DataSpaceR@02e1032

Code suggestions

In DataSpaceConnection.R: (study == "" && is.null(groupId)) || (study == "" && is.number(groupId)) || (study != "" && is.null(groupId)) simplifies to
is.null(groupId) || (study == "" && is.number(groupId))

Wow, that was some busy code. I actually deprecated groupId argument and remove this from getStudy. Phew.
ropensci/DataSpaceR@2fcfded

R CMD check

R CMD check fails for me because the vignette doesn't build because group 216 isn't available to me. Changing it to another group (220) allows the check to succeed.

Good catch! The group 216 was just for me and wasn't shared with others. I changed it to the group 220.
ropensci/DataSpaceR@02e1032

I had a problem where I couldn't run the check without have a netrc file set up. Does it succeed on winbuilder? I wonder how to provide that access for CRAN to build the vignette?

That's a good point. I disabled running tests and building vignettes in CRAN by following this suggestion.
ropensci/DataSpaceR@68f3e06

Once I resolved those issues, I got the following note (on Ubuntu 16.04, with R 3.4.4):

Unexported object imported by a ':::' call:Rlabkey:::labkey.getRequestOptions

My understanding is that CRAN won't accept packages using the triple-colon operator, but I may be mistaken.

I wasn't aware of that. It seems like I have three options according to this stackoverflow post: ask author to export the functions, copy the function source code, or use getFromNamespace to "import" the functions. I picked the option #3 for now, but I will ask the author of Rlabkey to export these functions in future.
ropensci/DataSpaceR@4dae8b9

Packaging guide

The packaging guide suggests all lowercase for the package name. It also suggests snake_case. Further, it advises againist package startup messages. Is yours absolutely necessary?

I wasn't aware of the package name suggestion when I started working on the package. Camel case is my lab's convention. We already have people using DataSpaceR, and I think it will be confusing to the users if I change the package name now. Plus, we have stickers now. It's official. :) As for package startup messages, we need to let the users know about the terms of use and check if netrc is present.

@juyeongkim
Copy link
Author

juyeongkim commented Dec 19, 2018

Response to @czibman's review

README

Per ROpenSci’s requirement that the Statement of Need specify the user population, consider adding a line or two about who a typical user of the CAVD data might be.

I added more info to README file.
ropensci/DataSpaceR@a8a3713

Please also consider more explicitly describing contribution guidelines in the README (in keeping with the “Community Guidelines” Requirement).

I added rOpenSci .github files.
ropensci/DataSpaceR@e537edf

Installation

I had to use “force = TRUE” to get the package to load.

That's because you have the latest version of DataSpaceR installed, and devtools is just telling you that there is no need to install the packpage that you have the latest version for.

Creation of CAVD account

The package creator instructs the user for a CAVD account at https://dataspace.cavd.org. I recognize that the registration process is beyond the influence of the package creator, but registering for a CAVD account was tricky since, when I clicked “submit” on the “member details” page, nothing happened. Submitting member details appears optional, however.

That's a good catch. I notified the developer to fix that. Thanks!

Vignette

Note that, when loading the package, a statement appears reminding the user of the Terms of Use. That’s fine, but it ends in a question mark (that is part of the url name). Makes me think I’m being prompted for something. This can be reworded as “see [url] for the terms”.

I removed the question mark.
ropensci/DataSpaceR@af8a2ba

The Vignette has the reader look at the list of available groups and choose group 216. There is no group 216 (see below). Is this because the group was deleted from the CAVD dataset? Is there a way to ensure that the documentation for this package stays current when the dataset changes?

The group 216 was just for me and wasn't shared with others. I changed it to the group 220.
ropensci/DataSpaceR@02e1032

I was able to follow the rest of the vignette without trouble. As a very minor quibble, the vignette reference at least one other package (pryr). To allow for a faster run through the vignette that can be done entirely by cutting and pasting, it would be helpful to add code in the vignette to install any referenced packages if the user has not already (as in below).

Documentation and testing

The “DataSpaceR-package” help file does not appear to be loading correctly when I type ?DataSpaceR-package(I get the help file for Arithmetic Operators instead).

?ggplot2-package or ?base-package results in the same page. Maybe you are thinking of ?DataSpaceR-package`` or ??DataSpaceR?

All three function help files include examples. The information in the README and vignette on creating a .netrc file is helpful. The help file for the writeNetrc function, in contrast, could benefit from containing more of the information that is in the vignette (for example on how to create and save the file). In addition, it would be helpful if the writeNetrc command provided the option to overwrite an existing netrc file (finding and changing a file with a non-standard name was harder than it should have been on my mac).

I added more info and overwrite argument to writeNetrc.
ropensci/DataSpaceR@63426a2
ropensci/DataSpaceR@507e21c

Additional comments:

It’s not clear to me what the purpose of a group is. I would have thought it was a group of studies, but it is a group of datasets.

A group is a curated collection of participants from filtering of treatments, products, studies, or species, and it is created in the DataSpace App. Let's say you are using the App to filter and visualize data and want to save them for later or explore in R with DataSpaceR. You can save a group by clicking the Save button on the Active Filter Panel. We can browse available the saved groups or the curated groups by DataSpace Team via availableGroups.

When you connect, you are provided with information regarding the number of studies, subjects, assays, datapoints, and groups in the CAVD database. It would be helpful if the documentation explained the role of the five assays and why the number of assays is a meaningful number to report (unless this information is obvious to people who’d use this data). Perhaps also report the number of datasets.

You are right this information is obvisus to the users, and it doesn't need to be included in the print statement.
ropensci/DataSpaceR@73b6f92

@juyeongkim
Copy link
Author

juyeongkim commented Dec 19, 2018

Thank you @seaaan and @czibman for reviewing the package!

@seaaan
Copy link
Contributor

seaaan commented Dec 28, 2018

Nice changes! Sorry it took me a bit to get a chance to review them. The updated package builds fine and passes R CMD CHECK. I have a few minor comments--and I do encourage you to consult with users about changing the interface as discussed above--but other than those, I don't have anything else and I approve the package. Nice job!

Vignette: how to get help

I suggest adding a short section in the vignette (and possibly README) explaining how to access documentation, since the help for R6 classes works a bit differently than for most of R.

Reference tables

For the reference tables of the vignette, add a little explanatory text explaining what the reference tables are meant to be used for.

Also, I suggest dropping a number of little-used methods from the tables. This will make the tables more useful for finding the most commonly used functions. Specifically, for DataSpaceConnection: drop config, initialize, print, and possibly refresh; for DataSpaceStudy: drop study, group, config, cache, clearCache, initialize, print, and possibly refresh.

I don't think you need the "type" column in these tables.

@juyeongkim
Copy link
Author

juyeongkim commented Dec 31, 2018

@seaaan Thanks! I made the changes here.

Happy new year! 🎆 🍾

@czibman
Copy link

czibman commented Jan 3, 2019

Apologies for taking so long to look at this. I don't have any further comments. Thanks @juyeongkim again for making this package, and happy new year to you all as well!

@juyeongkim
Copy link
Author

juyeongkim commented Jan 3, 2019

Thank you @czibman!

@juyeongkim
Copy link
Author

juyeongkim commented Jan 30, 2019

Ping @noamross

@noamross
Copy link
Contributor

noamross commented Jan 31, 2019

@juyeongkim Sorry I let this slip off the radar!

Approved! Thanks for submitting and thanks @czibman and @seaaan for your reviews!

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Add the rOpenSci footer to the bottom of your README
    " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've put together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@juyeongkim
Copy link
Author

juyeongkim commented Jan 31, 2019

@noamross Awesome! 🍾

I've checked off everything in the to-do list:
[x] Transfer the repo to rOpenSci
[x] Add the rOpenSci header to README
[x] Fix links
[x] Add codemeta.json

I'd like to acknowledge my reviewers in my package DESCRIPTION and/or README. Would you like to be acknowledged, @seaaan and @czibman?

I would love to write a blog post about my package. (I still need to write blog posts on RSelenium...)

Thank you @noamross, @seaaan, and @czibman for the review and feedback. It's been undoubtedly productive. 💯

@seaaan
Copy link
Contributor

seaaan commented Jan 31, 2019

@stefaniebutland
Copy link
Member

stefaniebutland commented Jan 31, 2019

I would love to write a blog post about my package

woohoo!

Give me an idea of when you would like to submit a draft via pull request. Don't worry about the RSelenium posts - would love to have one but there's no obligation and you could do one any time it works for you.

Happy to answer any questions @juyeongkim.

@stefaniebutland
Copy link
Member

stefaniebutland commented Feb 8, 2019

@juyeongkim following up. I know it's 💯hard to do all the work AND write a post. We would really love to help get more eyes on your work but I don't want to bug you. Let me know if you want me to pester you about scheduling a post of your choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants