Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: cRegulome #149

Closed
MahShaaban opened this issue Sep 6, 2017 · 54 comments
Closed

Submission: cRegulome #149

MahShaaban opened this issue Sep 6, 2017 · 54 comments

Comments

@MahShaaban
Copy link

@MahShaaban MahShaaban commented Sep 6, 2017

Summary

  • What does this package do? (explain in 50 words or less):
    Obtains a database of pre-calculated microRNA and transcription facor-gene correlations in cancer. In addition, the package defines methods for handling and visualizing the data.

  • Paste the full DESCRIPTION file inside a code block below:

Package: cRegulome
Type: Package
Title: Obtain and Visualize regulome-gene Expression Correlations in Cancer
Version: 0.99.0
Author: Mahmoud Ahmed
Maintainer: Mahmoud Ahmed <mahmoud.s.fahmy@students.kasralainy.edu.eg>
Description: Obtain a database of pre-calculated microRNA and
  transcription facor-gene correlations in cancer. In addition, the package
  define basic methods for handling and visualizing the data.
License: GPL-2
URL: https://github.com/MahShaaban/cRegulome
BugReports: https://github.com/MahShaaban/cRegulome/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Imports: DBI,
   RCurl,
   RSQLite,
   R.utils,
   UpSetR,
   VennDiagram,
   dplyr,
   dbplyr,
   ggplot2,
   ggjoy,
   tidyr,
   reshape2,
   purrr,
   grid
VignetteBuilder: knitr
Suggests: BiocStyle, 
    knitr, 
    rmarkdown, 
    testthat,
    covr
biocViews: GeneExpression,
    Transcription
  • URL for the package (the development repository, not a stylized html page):
    https://github.com/MahShaaban/cRegulome

  • Please indicate which category or categories from our [package fit policies]
    Data retrieval, because the package obtains a local database of Cistrome Cancer and miRCancerdb databases

  • Who is the target audience?
    Biologists and computational biologists with little knowledge of are and interested in investigating gene expression regulation in cancer.

  • Are there other R packages that accomplish the same thing? If so, how does
    yours differ or meet our criteria for best-in-category?

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coeveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Sep 9, 2017

👋 @MahShaaban
I will be your editor on this submission. I'm just letting you know that I'm on travel at the moment and will update the thread by Tuesday the 12th when I'm back.

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Sep 9, 2017

That's alright @karthik. Safe travels.

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Sep 13, 2017

Thanks for the submission! Looking for reviewers now. I also have two comments about one vignette and one test below.

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Here are some suggestions from goodpractice. I would suggest increasing test coverage as the most important issue.

── GP cRegulome ─────────────

It is good practice to

✖ write unit tests for all functions, and all package code in
general. 63% of code lines are covered by test cases.

R/get_data.R:43:NA
R/get_data.R:48:NA
R/get_data.R:53:NA
R/get_data.R:59:NA
R/get_data.R:60:NA
... and 140 more lines

✖ not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific functions you need.
✖ fix this R CMD check ERROR: Packages suggested but not available: ‘BiocStyle’ ‘rmarkdown’ The suggested packages are required for a complete check. Checking can be attempted without them by setting the environment variable R_CHECK_FORCE_SUGGESTS to a false value.
See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’ manual.
────────────────────────────────────

Minor comments:

  • I would recommend ignoring .DS_Store in your global gitignore (or just for this project at the least).

  • Consider replacing Rcurl with curl.

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Sep 13, 2017

Thanks @karthik. I am looking forward.
Regarding your comments:

  • I put the .DS_Store to the .gitignore file
  • I am using Rcurl only because of the function url.exists. I am not aware of an alternative in curl.
  • When I run either devtools::check() or goodpractice::gp() I don't get any errors. I am not sure what to do since I can't reproduce the same goodpractice report.
  • Regarding the package coverage, I am working on it. Are you looking for a 100% coverage?!
@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Sep 13, 2017

For a replacement to url.exists, there are also options in httr
e.g. http_error("http://www.google.com")

Are you looking for a 100% coverage?!

No! As reasonable an amount of coverage as possible. We realize that 100% coverage is hard to achieve for most packages.

I'm currently seeking reviewers and have sent out requests. I will update the thread when I'm ready to assign reviewers.

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Sep 14, 2017

Assigning @PeteHaitch as reviewer 1. (Note: Pete is away till the 21st and his review will be due 3 weeks from then which is Oct 12th)

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Sep 14, 2017

Assigning @mmulvahill as reviewer 2. (Matt is also away next week and will return and begin 9/25). Review due by 10/16

@MahShaaban Please stay tuned till the reviews show up.

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Sep 14, 2017

@MahShaaban If you wish to add a badge to your README, you can do so with the following code

[![](https://badges.ropensci.org/149_status.svg)](https://github.com/ropensci/onboarding/issues/149)

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Sep 15, 2017

Great. I am looking forward to hearing from them. Thanks @karthik for your efforts.

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Sep 20, 2017

Hi @MahShaaban,

I've just started to look at cRegulome. I first tried running devtools::check() but I get an error during the creation of the vignette (see below). I note that cRegulome is building fine on Travis. The error occurs on my machine in the call upset(ob_tf, study = 'ACC') in the vignette:

upset(ob_tf, study = 'ACC')
#> Error in mutate_impl(.data, dots) : 
#> Evaluation error: object 'x' not found. 

Stepping through this function, the error arises in the upset.CTF() method, specifically https://github.com/MahShaaban/cRegulome/blob/d5aae84a7a6fc9d8daac3ee2653fa932e30804fb/R/methods.R#L411-L413

Are you able to reproduce this error?

Cheers,
Pete

Output of devtools::check() on my machine
> devtools::check()
Updating cRegulome documentation
Loading cRegulome
Setting env vars --------------------------------------------------------------------------------------
CFLAGS  : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building cRegulome ------------------------------------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore  \
  --quiet CMD build '/Users/Peter/GitHub/cRegulome' --no-resave-data --no-manual 

* checking for file/Users/Peter/GitHub/cRegulome/DESCRIPTION... OK
* preparingcRegulome:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Loading required package: R.oo
Loading required package: R.methodsS3
R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
R.oo v1.21.0 (2016-10-30) successfully loaded. See ?R.oo for help.

Attaching package: 'R.oo'

The following objects are masked from 'package:methods':

    getClasses, getMethods

The following objects are masked from 'package:base':

    attach, detach, gc, load, save

R.utils v2.5.0 (2016-11-07) successfully loaded. See ?R.utils for help.

Attaching package: 'R.utils'

The following object is masked from 'package:utils':

    timestamp

The following objects are masked from 'package:base':

    cat, commandArgs, getOption, inherits, isOpen, parse, warnings


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


Attaching package: 'dbplyr'

The following objects are masked from 'package:dplyr':

    ident, sql


Attaching package: 'tidyr'

The following object is masked from 'package:R.utils':

    extract


Attaching package: 'cRegulome'

The following objects are masked from 'package:graphics':

    hist, plot

The following object is masked from 'package:base':

    print

trying URL 'https://www.dropbox.com/s/t8ga5j8o81jkcuv/test.db.gz?raw=1'
Content type 'application/octet-stream' length 875072 bytes (854 KB)
==================================================
downloaded 854 KB

Joining, by = c("mirna_base", "feature")
Joining, by = c("mirna_base", "feature")
Picking joint bandwidth of 0.0665
Quitting from lines 260-262 (using_cRegulome.Rmd) 
Error: processing vignette 'using_cRegulome.Rmd' failed with diagnostics:
Evaluation error: object 'x' not found.
Execution halted
Error: Command failed (1)
Session info
devtools::session_info()
#> Session info ------------------------------------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.1 (2017-06-30)
#>  system   x86_64, darwin15.6.0        
#>  ui       RStudio (1.1.331)           
#>  language (EN)                        
#>  collate  en_AU.UTF-8                 
#>  tz       America/New_York            
#>  date     2017-09-20                  

#> Packages ----------------------------------------------------------------------------------------------
#>  package          * version    date       source                             
#>  assertthat         0.2.0      2017-04-11 CRAN (R 3.4.0)                     
#>  base             * 3.4.1      2017-07-07 local                              
#>  bindr              0.1        2016-11-13 CRAN (R 3.4.0)                     
#>  bindrcpp           0.2        2017-06-17 CRAN (R 3.4.0)                     
#>  BiocGenerics       0.23.1     2017-09-05 Bioconductor                       
#>  bit                1.1-12     2014-04-09 CRAN (R 3.4.0)                     
#>  bit64              0.9-7      2017-05-08 CRAN (R 3.4.0)                     
#>  bitops             1.0-6      2013-08-17 CRAN (R 3.4.0)                     
#>  blob               1.1.0      2017-06-17 CRAN (R 3.4.0)                     
#>  codetools          0.2-15     2016-10-05 CRAN (R 3.4.1)                     
#>  colorout         * 1.1-2      2017-07-18 Github (jalvesaq/colorout@020a14d) 
#>  colorspace         1.3-2      2016-12-14 CRAN (R 3.4.0)                     
#>  commonmark         1.4        2017-09-01 CRAN (R 3.4.1)                     
#>  compiler           3.4.1      2017-07-07 local                              
#>  crayon             1.3.4      2017-09-16 CRAN (R 3.4.1)                     
#>  cRegulome        * 0.99.0     <NA>       Bioconductor                       
#>  datasets         * 3.4.1      2017-07-07 local                              
#>  DBI                0.7        2017-06-18 CRAN (R 3.4.0)                     
#>  dbplyr             1.1.0      2017-06-27 CRAN (R 3.4.0)                     
#>  devtools         * 1.13.3     2017-08-02 CRAN (R 3.4.1)                     
#>  digest             0.6.12     2017-01-27 CRAN (R 3.4.0)                     
#>  dplyr              0.7.3      2017-09-09 CRAN (R 3.4.1)                     
#>  fortunes           1.5-4      2016-12-29 CRAN (R 3.4.0)                     
#>  futile.logger      1.4.3      2016-07-10 CRAN (R 3.4.0)                     
#>  futile.options     1.0.0      2010-04-06 CRAN (R 3.4.0)                     
#>  GenomeInfoDb       1.13.4     2017-06-06 Bioconductor                       
#>  GenomeInfoDbData   0.99.1     2017-06-07 Bioconductor                       
#>  GenomicRanges      1.29.14    2017-09-15 Bioconductor                       
#>  ggjoy              0.4.0      2017-09-15 CRAN (R 3.4.1)                     
#>  ggplot2            2.2.1      2016-12-30 CRAN (R 3.4.0)                     
#>  ggridges           0.4.1      2017-09-15 CRAN (R 3.4.1)                     
#>  glue               1.1.1      2017-06-21 CRAN (R 3.4.0)                     
#>  graphics         * 3.4.1      2017-07-07 local                              
#>  grDevices        * 3.4.1      2017-07-07 local                              
#>  grid               3.4.1      2017-07-07 local                              
#>  gridExtra          2.3        2017-09-09 CRAN (R 3.4.1)                     
#>  gtable             0.2.0      2016-02-26 CRAN (R 3.4.0)                     
#>  httr               1.3.1      2017-08-20 CRAN (R 3.4.1)                     
#>  IRanges            2.11.16    2017-09-15 Bioconductor                       
#>  lambda.r           1.2        2017-09-16 CRAN (R 3.4.1)                     
#>  lazyeval           0.2.0      2016-06-12 CRAN (R 3.4.0)                     
#>  magrittr           1.5        2014-11-22 CRAN (R 3.4.0)                     
#>  memoise            1.1.0      2017-05-26 Github (hadley/memoise@e372cde)    
#>  methods          * 3.4.1      2017-07-07 local                              
#>  munsell            0.4.3      2016-02-13 CRAN (R 3.4.0)                     
#>  parallel           3.4.1      2017-07-07 local                              
#>  pkgconfig          2.0.1      2017-03-21 CRAN (R 3.4.0)                     
#>  plyr               1.8.4      2016-06-08 CRAN (R 3.4.0)                     
#>  pryr               0.1.2      2015-06-20 CRAN (R 3.4.0)                     
#>  purrr              0.2.3      2017-08-02 CRAN (R 3.4.1)                     
#>  R.methodsS3        1.7.1      2016-02-16 CRAN (R 3.4.0)                     
#>  R.oo               1.21.0     2016-11-01 CRAN (R 3.4.0)                     
#>  R.utils            2.5.0      2016-11-07 CRAN (R 3.4.0)                     
#>  R6                 2.2.2      2017-06-17 CRAN (R 3.4.0)                     
#>  Rcpp               0.12.12    2017-07-15 CRAN (R 3.4.1)                     
#>  RCurl              1.95-4.8   2016-03-01 CRAN (R 3.4.0)                     
#>  repete           * 0.0.0.9009 2017-08-18 Github (PeteHaitch/repete@f82233c) 
#>  reshape2           1.4.2      2016-10-22 CRAN (R 3.4.0)                     
#>  rlang              0.1.2.9000 2017-09-13 Github (tidyverse/rlang@ff02f2a)   
#>  roxygen2           6.0.1      2017-02-06 CRAN (R 3.4.0)                     
#>  RSQLite            2.0        2017-06-19 CRAN (R 3.4.0)                     
#>  rstudioapi         0.7.0-9000 2017-09-13 Github (rstudio/rstudioapi@8e8bfb0)
#>  S4Vectors          0.15.8     2017-09-14 Bioconductor                       
#>  scales             0.5.0      2017-08-24 CRAN (R 3.4.1)                     
#>  stats            * 3.4.1      2017-07-07 local                              
#>  stats4             3.4.1      2017-07-07 local                              
#>  stringi            1.1.5      2017-04-07 CRAN (R 3.4.0)                     
#>  stringr            1.2.0      2017-02-18 CRAN (R 3.4.0)                     
#>  testthat           1.0.2      2016-04-23 CRAN (R 3.4.0)                     
#>  tibble             1.3.4      2017-08-22 CRAN (R 3.4.1)                     
#>  tidyr              0.7.1      2017-09-01 CRAN (R 3.4.1)                     
#>  tools              3.4.1      2017-07-07 local                              
#>  UpSetR             1.3.3      2017-03-21 CRAN (R 3.4.0)                     
#>  utils            * 3.4.1      2017-07-07 local                              
#>  VennDiagram        1.6.17     2016-04-18 CRAN (R 3.4.0)                     
#>  withr              2.0.0      2017-07-28 CRAN (R 3.4.1)                     
#>  xml2               1.1.1      2017-01-24 CRAN (R 3.4.0)                     
#>  XVector            0.17.1     2017-08-19 Bioconductor                       
#>  yaml               2.1.14     2016-11-12 CRAN (R 3.4.0)                     
#>  zlibbioc           1.23.0     2017-04-27 Bioconductor
@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Sep 20, 2017

Hi @PeteHaitch, thanks a lot for taking the time to check the package.
I am unable to reproduce the error.
I can think of one thing, devtools::check() overwrites the NAMESPACE file which has three manual entries import(RSQLite), import(R.utils) and import(dbplyr). I still don't get an error when removing these entries but you may try re-clone the package and run devtools::check(document=F) to see if the error goes away.
Have you tried knitting the vignette directly to see if you get the same error?

Thanks again

Mahmoud

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Sep 21, 2017

Starting fresh and running devtools::check(document = FALSE) still resulted in the same error. The error occurs when I run the vignette code manually, too.

I'll take a closer look tomorrow. In the meantime, can you please post the output of devtools::session_info() on the machine you are able to build the vignette so I can compare R version, package versions etc. with what I'm running (I tend to run bleeding edge, which may be causing issues).

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Sep 21, 2017

I ran update.packages() followed by devtools::check(document=F), the test passes with no errors.
Then, I ran the vignette code in the same session followed by devtools::session_info()

Session info
Session info -------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.1 (2017-06-30)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       Asia/Seoul                  
 date     2017-09-21                  

Packages -----------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
base * 3.4.1 2017-07-07 local
bindr 0.1 2016-11-13 CRAN (R 3.4.0)
bindrcpp * 0.2 2017-06-17 CRAN (R 3.4.0)
bit 1.1-12 2014-04-09 CRAN (R 3.4.0)
bit64 0.9-7 2017-05-08 CRAN (R 3.4.0)
blob 1.1.0 2017-06-17 CRAN (R 3.4.0)
colorspace 1.3-2 2016-12-14 CRAN (R 3.4.0)
compiler 3.4.1 2017-07-07 local
cRegulome * 0.99.0 2017-09-15 Bioconductor
curl 2.8.1 2017-07-21 CRAN (R 3.4.1)
datasets * 3.4.1 2017-07-07 local
DBI * 0.7 2017-06-18 CRAN (R 3.4.0)
dbplyr * 1.1.0 2017-06-27 CRAN (R 3.4.1)
devtools 1.13.3 2017-08-02 CRAN (R 3.4.1)
digest 0.6.12 2017-01-27 CRAN (R 3.4.0)
dplyr * 0.7.3 2017-09-09 CRAN (R 3.4.1)
futile.logger 1.4.3 2016-07-10 CRAN (R 3.4.0)
futile.options 1.0.0 2010-04-06 CRAN (R 3.4.0)
ggjoy 0.4.0 2017-09-15 CRAN (R 3.4.1)
ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.4.0)
ggridges 0.4.1 2017-09-15 CRAN (R 3.4.1)
glue 1.1.1 2017-06-21 CRAN (R 3.4.0)
graphics * 3.4.1 2017-07-07 local
grDevices * 3.4.1 2017-07-07 local
grid 3.4.1 2017-07-07 local
gridExtra 2.3 2017-09-09 CRAN (R 3.4.1)
gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
httr 1.3.1 2017-08-20 CRAN (R 3.4.1)
knitr 1.17 2017-08-10 CRAN (R 3.4.1)
labeling 0.3 2014-08-23 CRAN (R 3.4.0)
lambda.r 1.2 2017-09-16 CRAN (R 3.4.1)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0)
magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
methods * 3.4.1 2017-07-07 local
munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
purrr 0.2.3 2017-08-02 CRAN (R 3.4.1)
R.methodsS3 * 1.7.1 2016-02-16 CRAN (R 3.4.0)
R.oo * 1.21.0 2016-11-01 CRAN (R 3.4.0)
R.utils * 2.5.0 2016-11-07 CRAN (R 3.4.0)
R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
Rcpp 0.12.12 2017-07-15 CRAN (R 3.4.1)
reshape2 1.4.2 2016-10-22 CRAN (R 3.4.0)
rlang 0.1.2 2017-08-09 CRAN (R 3.4.1)
RSQLite * 2.0 2017-06-19 CRAN (R 3.4.0)
rstudioapi 0.7 2017-09-07 CRAN (R 3.4.1)
scales 0.5.0 2017-08-24 CRAN (R 3.4.1)
stats * 3.4.1 2017-07-07 local
stringi 1.1.5 2017-04-07 CRAN (R 3.4.0)
stringr 1.2.0 2017-02-18 CRAN (R 3.4.0)
tibble 1.3.4 2017-08-22 CRAN (R 3.4.1)
tidyr * 0.7.1 2017-09-01 CRAN (R 3.4.1)
tidyselect 0.2.0 2017-08-30 CRAN (R 3.4.1)
tools 3.4.1 2017-07-07 local
UpSetR 1.3.3 2017-03-21 CRAN (R 3.4.0)
utils * 3.4.1 2017-07-07 local
VennDiagram 1.6.17 2016-04-18 cran (@1.6.17)
withr 2.0.0 2017-07-28 CRAN (R 3.4.1)

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Sep 21, 2017

After downgrading some github-version packages to CRAN/BioC release-version the error has gone away (I suspect it was changes in rlang not being compatible with release-versions of other tidyverse pkgs. In any case, this was my problem not that of cRegulome). I can now proceed with my review. Thanks for your patience!

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Oct 12, 2017

@MahShaaban, @karthik here is my review

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

Package not being co-submitted to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5 hours


Review Comments

cRegulome provides access to a database of pre-computed correlations of transcription factor-gene pairs and microRNA-gene pairs in cancer. It also provides several options for plotting these data.

Unfortunately, what is lacking right now is a clear statement of need and example(s) illustrating the potential use cases of cRegulome. The documentation is insufficient for a new user to know when they might make use of cRegulome and how it would be useful. The README appears to be incomplete (the 'More' and 'Citation' sections, in particular).

How are you planning to distribute the package, @MahShaaban? You didn't tick the box saying you plan to upload to CRAN; is that correct? Are you instead planning a Bioconductor submission? If submitting to Bioconductor then I anticipate that you will require much more integration with existing infrastructure and packages. I'm happy to discuss this further, as I have quite a bit of experience with BioC. If not CRAN or BioC, then how do you plan to distribute the package?

The package claims to support all cancer types studied by the TCGA (a consortium whose data are widely used in bioinformatics/computational biology). However, in testing the package I was only able to successfully query 2/34 cancer types (the 2 types used in the examples were the only ones I could get to work). Here's a reproducible example (I'm not 100% sure this isn't user error on my part):

library(cRegulome)
#> 
#> Attaching package: 'cRegulome'
#> The following objects are masked from 'package:graphics':
#> 
#>     hist, plot
#> The following object is masked from 'package:base':
#> 
#>     print

get_db(test = FALSE)
R.utils::gunzip('cRegulome.db.gz', overwrite = TRUE)
conn <- DBI::dbConnect(RSQLite::SQLite(), "cRegulome.db")

# All study IDs listed at 
# https://tcga-data.nci.nih.gov/docs/publications/tcga/
dat <- get_mir(conn,
               mir = c('hsa-let-7b', 'hsa-mir-134'),
               study = c("LAML", "ACC", "BLCA", "LGG", "BRCA", "CESC", "CHOL",
                         "COAD", "ESCA", "FPPP", "GBM", "HNSC", "KICH", "KIRC", 
                         "KIRP", "LIHC", "LUAD", "LUSC", "DLBC", "MESO", "OV", 
                         "PAAD", "PCPG", "PRAD", "READ", "SARC", "SKCM", "STAD",
                         "TGCT", "THYM", "THCA", "UCS", "UCEC", "UVM"),
               min_cor = .5,
               max_num = 100,
               targets_only = TRUE)
#> Error: Strings must match column names. Unknown columns: LAML, LGG, BRCA, CESC, CHOL, COAD, ESCA, FPPP, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, DLBC, MESO, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THYM, THCA, UCS, UCEC, UVM

The SQL databases downloaded by cRegulome:: get_db() are hosted on a Dropbox account. This feels somewhat fragile. Also, the URL of the resource is not displayed to the user unless they examine the source code for cRegulome:: get_db(). I'm unsure of rOpenSci best practices in this area, but I feel like this could be improved; perhaps @karthik can advise.

Some further questions on the contents of the database:

  • Did you create the database @MahShaaban?
  • What are the details of how these pre-computed correlations computed? Are the documented somewhere?
  • Is cRegulome.db simply re-packaging and re-distributing data from Cistrome Cancer and mirCancerdb data? Is that permitted under their respective licenses?

The get_db() function could avoid re-downloading the database for each project each time by using a cache. There are several R packages available for doing this, e.g. BiocFileCache.

Can users add additional databases? If so, then some guidance should be provided (e.g., format of database).

You should import the pipe operator into the package namespace and not define it inline (e.g. as in https://github.com/MahShaaban/cRegulome/blob/d5aae84a7a6fc9d8daac3ee2653fa932e30804fb/R/methods.R#L119)

The author of ggjoy, Claus Wilke, has deprecated the package and requested that users switch to ggridges; he wrote a blog post explaining his decision.

Additional questions and comments on specific issues from the rOpenSci review template are below. Hopefully these comments are helpful. Let me know if you have any questions.

A statement of need

  • It's not clear from the README what problems cRegulome is designed to solve. I think an example of where cRegulome has been useful in your research (in simplified form) might help. This would clarify the target audience. A hypothetical example, "Alice is studying pancreatic cancer and she has a list of genes that seem to be important in this cancer. Now she wants to investigate which miRNA and TFs might be regulating these genes - cRegulome can help her do this by X, Y and Z!"
  • The README mentions that there are 'many bioinformatics tools ... to predict and identify transcription factors and microRNA targets and their role in development of diseases including cancers'; what is that these tools lack and which cRegulome adds or improves upon?

Installation instructions

  • The instructions are clear and work. However, when running R CMD check via devtools::check() I get a note Package unavailable to check Rd xrefs: ‘targetscan.Hs.eg.db’. targetscan.Hs.eg.db is available from Bioconductor. I'm unsure of rOpenSci's policy regarding packages linked to in docs but not installed as part of the package; @karthik can you please advise.

Vignette(s)

  • The vignette runs successfully but is focuses too much on the technical/R side and not enough on the practical/scientific side. Currently, it mostly repeats what's already in the man pages, documenting the various classes & methods and giving some examples but without any real context or motivation.
  • You can go into greater detail here than in the README, writing less about the technical details and more about the scientific context in the than in the man pages. Some of the sort of questions I'd try to address in the vignette are:
    • What sort of experiment/analysis might you be running that would benefit from analysis with cRegulome?
    • What would you do hope to learn from looking at correlations of TFs and microRNAs with your gene list?
    • How do the plots help determine if there's something interesting for a given TF, miRNA or gene?
    • What does a 'good' plot look like? What does a 'bad' plot look like? Basically, how do I know if cRegulome is giving me useful information?

Function Documentation

  • The man pages are function/method-focused. It may be easier to follow and understand if they were re-written to be class-focused, with most (all?) methods for that class described there together. For example, a man page for the cTF class might include a description of the cTF class, the cTF() constructor function, the print() method for the cTF class, and perhaps the tidy() and various plotting-methods. Similarly, a separate man page for the cmicroRNA class and its associated methods.
  • 'Description' of get_mir() and get_tf() are identical - probably worth mentioning in each case what is being accessed from the database
  • Not enough detail for hist, joy(), plot(). What is the histogram/joy plot of? Correlations? Something else? I think more descriptive names than hist(), joy(), plot(), upset(), would also help, e.g., cor_hist(). Also, plot() and hist() are S3 generics from the graphics package, but in cRegulome these S3 methods have different signatures to the generic. That's generally a bad idea and a warning sign that you should probably use a different name for this functions, one that is more specific and more descriptive.
  • min_cor arg of get_mir() and get_tf() might better be named min_abs_cor since it's the "an absoute correlation minimmum"
  • tf in get_tf() is documented as "the official gene symbols of the genes contains the transcription factor". Unfortunately, there are many 'official' gene names/symbols/IDs, so this needs to be clarified. Based on the example for get_tf(), which uses tf = c("AFF4", "ESR1"), these appear to be 'HUGO Gene Symbols' (specified by the HUGO Gene Nomenclature Committee).
  • Please run devtools::spell_check() as there are several spelling mistakes in the man pages and elsewhere in the documentation

Community guidelines

  • Contribution guidelines need to be added to the README or a CONTRIBUTING file added

Functionality

  • As noted above, I couldn't get the package to work for 32/34 cancer types with a TCGA study

Automated tests

  • All tests pass on local machine
  • I tested get_tf() and get_mir() with some nonsense input to check how robust these were to errors
    • max_num is documented to be an integer. One of tests is that max_num = -1 leads to error max_num should be an integer between 1 and Inf. However, max_num = 0.3 and max_num = 0 do not result in an error
@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Oct 13, 2017

@PeteHaitch, I'd like to thank you first for your time and effort.
Here is a point-to-point reply to your comments

  • I added a paragraph to the README and rephrased some of the functions documentation to clarify the use of the package and the functions. In additions, I added a second vignette with a case study using cRegulome. Due to the long run time of the vignette, I temporarily have it in the .Rbuildignore after checking that it runs locally fine. I am not sure if there is a workaround that.

  • As for the instructions for installing the development version of the package, I assume this is the development version since there are no stable releases yet. Please, let me know if you recommend another way.

  • I added a field Author@R to the DESCRIPTION file and removed Author and Maintainer

  • I am planning to submit to Bioconductor. I looked into using a variation of the eSet as a container but at last, I thought the sqlite is more convenient. I'd appreciate your guidance on the integration part.

  • I am using Dropbox for now, and probably can use figshare or do you suggest other services?

  • I built the database and here is a repo explaining the steps and link to the resources used cRegulomedb. It would be great to get you input on it as well.

  • Cistrome is licensed under CC and miRCancer is GPL-2 (by our lab)

  • I will look further into that, for now get_db() olny download the database if it doesn't exist into the working directory.

  • Currently, users can't import additional databases

  • The pipe operator is now imported through the namespace

  • The package is using ggridges now instead of ggjoy

A statement of need
I think the new paragraph in README and the second vignette should address your concerns regarding this point

Installation instructions
I am also not sure about packages mentioned in docs!

Vignette(s)
Please check the second vignette and let me know if it needs further rewriting.

Function Documentation

  • I was tempted to do exactly that, document the classes cTF and cmicroRNA but I read others advise that for S3 there is no need to document the class and instead the document the constructor functions! Do you think I should use something like @method in the man page of the constructor functions to show the methods used with these classes?
  • I changed the description of get_tf() and get_mir() as you suggested.
  • I changed the methods names to cor_hist(), cor_plot, cor_joy and cor_upset to avoid these conflicts.
  • min_cor is now min_abs_cor
  • I specified the HUGO Nomenclature as the gene symbols
  • I ran devtools::spell_check() and corrected the errors

Community guidelines
I added a CONTRIBUTING file.

Functionality
There was a problem with the urls, so in either case of get_db(test = TRUE) and get_db(test = FALSE) the function downloaded the same test subset of the database which contains only two studies. I corrected the urls and it should download the full database with test = FALSE. You can check which studies are available using dbListFields() in either tables cor_tf and cor_mir

Automated tests
I edited the code in get_tf() and get_mir() to handle max_num = 0.3 and max_num = 0 as will as test cases for that.

Please, let me know if I missed anything or if you have further concerns

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Oct 16, 2017

Hi @MahShaaban

Thank you for your response to the review and the changes to the package. There are still a few to sort through, and I'll address those point-by-point, below.

But first, my main concern, and where I need guidance from @karthik or someone else from rOpenSci, is that rOpenSci and Bioconductor (where you plan to submit your package) have different review criteria, expectations, and requirements (https://www.bioconductor.org/developers/package-submission/, https://www.bioconductor.org/developers/package-guidelines/). I don't speak on behalf of Bioconductor, however, I know that they aim to have packages that interoperate well with existing Bioconductor packages and data structures. Currently, I don't think cRegulome does this, which may be a difficulty when submitting to Bioconductor for review. From rOpenSci's perspective this (quite reasonably) may not be a problem, but your package will be reviewed upon submission to Bioconductor and, ultimately, it will have to satisfy Bioconductor's review criteria and not rOpenSci's in order to be accepted as a Bioconductor package.

So right now, I see 3 options:

  1. Figuring out a way to satisfy both rOpenSci and Bioconductor review processes (potentially rather tricky and requiring a bit of back-and-forth communication)
  2. Aiming to address rOpenSci's review criteria and submitting to CRAN instead of Bioconductor
  3. Submitting to Bioconductor and not submitting to rOpenSci

To emphasise, I think cRegulome may be a useful bioinformatics package, but I'm unsure how to reconcile the expectations and requirements of an rOpenSci and a Bioconductor package, especially when the package is yet to be reviewed by Bioconductor.

Point-by-point comments

Update to README and addition of 'case study' vignette

The updates to the README are good and address the 'statement of need'. The 'case study' vignette is a good start on exactly the type of thing I was looking for as a potential new user for the package. I have a few suggestions for improvements to this vignette.

Due to the long run time of the vignette, I temporarily have it in the .Rbuildignore after checking that it runs locally fine.

  • I think this vignette should be built as part of the package. The long running time of the vignette is a bit of a problem (although I didn't find it unreasonable). However, my bigger concern is that it requires the full cRegulome.db to be downloaded (750 Mb). I wonder whether this could be mocked, e.g., by having an unevaluated code chunk showing how to download and set up the database and then a hidden code chunk that loads the relevant subset of the database for use in the vignette. From then on, the the code chunks in the vignette would be evaluated, behaving as if the entire database was available. This would allow the vignette to be built more quickly and requiring fewer resources while still walking the user through the steps required to do replicate the analysis with the full database.
  • I removed the vignette from .Rbuildignore and ran R CMD build + R CMD check. Doing so revealed a few issues with the vignette - these NOTES from R CMD check will need to be addressed (see below).+
Peter@peters-mbp-2 GitHub $ R CMD check cRegulome_0.0.0.9000.tar.gz

I plan to fix this - the report is just in case I forget or get hit by a bus.
   -- Ross Ihaka (reporting a bug in persp)
      R-help (October 2003)

* using log directory/Users/Peter/GitHub/cRegulome.Rcheck* using R version 3.4.1 (2017-06-30)
* using platform: x86_64-apple-darwin15.6.0 (64-bit)
* using session charset: UTF-8
* checking for filecRegulome/DESCRIPTION... OK
* checking extension type ... Package
* this is packagecRegulomeversion0.0.0.9000* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether packagecRegulomecan be installed ... OK
* checking installed package size ... NOTE
  installed size is  6.9Mb
  sub-directories of 1Mb or more:
    doc       5.1Mb
    extdata   1.7Mb
* checking package directory ... OK
* checkingbuilddirectory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking dependencies in R code ... NOTE
Namespaces in Imports field not imported from:R.utils’ ‘RSQLite’ ‘dbplyrAll declared Imports should be used.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files frominst/doc... OK
* checking files invignettes... OK
* checking examples ... OK
* checking for unstated dependencies intests... OK
* checking tests ...
  Runningtestthat.ROK
* checking for unstated dependencies in vignettes ... NOTE
'::' or ':::' import not declared from:AnnotationDbi'library' or 'require' calls not declared from:clusterProfiler’ ‘igraph’ ‘org.Hs.eg.db’ ‘readxl’ ‘stringr’
  ‘tidyverse* checking package vignettes ininst/doc... OK
* checking running R code from vignettes ...case_study.RmdusingUTF-8... OKusing_cRegulome.RmdusingUTF-8... OK
 NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE

Status: 3 NOTEs
See/Users/Peter/GitHub/cRegulome.Rcheck/00check.logfor details.
  • The 'case study' vignette needs to be spell-checked and proof-read
  • It looks like you are trying to use BiocStyle to format the 'case study' vignette. However, this is not being picked up because you have another YAML chunk ahead of the one that sets up BiocStyle. There should only be one YAML chunk and it needs to be the one referencing BiocStyle, I believe.
  • Why are different 'study' terms used in the get_tf() and get_mir() examples in the 'case study' vignette (see below)?
creg_tf <- get_tf(conn,
                  tf = unique(tf$SOURCE),
                  study = 'STES*')

creg_mir <- get_mir(conn,
                    mir = tolower(unique(mir$AccID)),
                    study = 'STES')
  • The 'case study' vignette would benefit from including more explanation/interpretation of the results (e.g., when you plot correlations for specific TFs and microRNAs from the STES data). This is especially true for the more complicated figures, like the network diagram created with igraph and the clustering dendogram.
  • It would be good to have a function that converts the cRegulome output to the igraph input. Although documented in the vignette, the user currently has to do a bit of data manipulation, a process which could be turned into a function and included as part of cRegulome.

Bioconductor integration

I am planning to submit to Bioconductor. I looked into using a variation of the eSet as a container but at last, I thought the sqlite is more convenient. I'd appreciate your guidance on the integration part.

  • This obviously depends on whether you submit to Bioconductor and would need to be addressed to the reviewer of your Bioconductor submission. For what it's worth, a tidy data frame (tibble or otherwise) seems a reasonable data structure, to me.
  • Traditionally, Bioconductor has made heavy use of the AnnotationDBI package to interface with annotation data packages using SQLite data storage.
  • A more modern, 'tidyverse'-styled approach has recently been developed with the Organism.dplyr package. This style might better suit cRegulome.

Database

I am using Dropbox for now, and probably can use figshare or do you suggest other services?

I built the database and here is a repo explaining the steps and link to the resources used cRegulomedb.

Licensing

  • Could you please link to the license for Cistrome Cancer and miRCancer

Function Documentation

I was tempted to do exactly that, document the classes cTF and cmicroRNA but I read others advise that for S3 there is no need to document the class and instead the document the constructor functions! Do you think I should use something like @method in the man page of the constructor functions to show the methods used with these classes?

Again, this comes down to style differences, here between Bioconductor/S4 and tidyverse/S3. This choice will depend on whether you submit to Bioconductor or CRAN, I think

Community guidelines

Please include links to the issues and pull requests pages. You might also like to use a CONTRIBUTING.md file from other rOpenSci projects as a template (e.g., https://github.com/ropensci/spocc/blob/master/.github/CONTRIBUTING.md, https://github.com/ropensci/taxize/blob/master/.github/CONTRIBUTING.md, https://github.com/ropensci/bold/blob/master/.github/CONTRIBUTING.md)

Functionality

  • With the updated database URL there are still 4 cancer types without any data. Why is that? Should they be available in the database? At least one of these, 'GBM'/'GBMLGG', seems to be due to a potential switch in IDs used for cancer types.
library(cRegulome)
#> 
#> Attaching package: 'cRegulome'
#> The following object is masked from 'package:base':
#> 
#>     print
get_db(test = FALSE)
R.utils::gunzip('cRegulome.db.gz', overwrite = TRUE)
conn <- DBI::dbConnect(RSQLite::SQLite(), "cRegulome.db")

# All study IDs listed at 
# https://tcga-data.nci.nih.gov/docs/publications/tcga/
dat <- get_mir(conn,
               mir = c('hsa-let-7b', 'hsa-mir-134'),
               study = c("LAML", "ACC", "BLCA", "LGG", "BRCA", "CESC", "CHOL",
                         "COAD", "ESCA", "FPPP", "GBM", "HNSC", "KICH", "KIRC", 
                         "KIRP", "LIHC", "LUAD", "LUSC", "DLBC", "MESO", "OV", 
                         "PAAD", "PCPG", "PRAD", "READ", "SARC", "SKCM", "STAD",
                         "TGCT", "THYM", "THCA", "UCS", "UCEC", "UVM"),
               min_abs_cor = .5,
               max_num = 100,
               targets_only = TRUE)
#> Error: Strings must match column names. Unknown columns: LAML, FPPP, GBM, MESO

# Manually list cancer types in database
DBI::dbListFields(conn, "cor_mir")
#>  [1] "feature"    "mirna_base" "ACC"        "BLCA"       "BRCA"       "CESC"       "CHOL"      
#>  [8] "COAD"       "COADREAD"   "DLBC"       "ESCA"       "GBMLGG"     "HNSC"       "KICH"      
#> [15] "KIPAN"      "KIRC"       "KIRP"       "LGG"        "LIHC"       "LUAD"       "LUSC"      
#> [22] "OV"         "PAAD"       "PCPG"       "PRAD"       "READ"       "SARC"       "SKCM"      
#> [29] "STAD"       "STES"       "TGCT"       "THCA"       "THYM"       "UCEC"       "UCS"       
#> [36] "UVM"

Other

  • The default print method is masked when loading cRegulome. This is generally a bad idea. Your S3 print() methods, print.cmicroRNA() and print.cTF() should have the same signature as base::print(), i.e. print(x, ...) rather than print(ob, ...)
library(cRegulome)
#> 
#> Attaching package: 'cRegulome'
#> The following object is masked from 'package:base':
#> 
#>     print
@mmulvahill

This comment has been minimized.

Copy link

@mmulvahill mmulvahill commented Oct 16, 2017

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • [x ] Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

Forgive any inaccurate use of terminology. I've developed databse API's for microRNA data, but my work as a statistician and analyst is in a different area of biomedical research.

A statement of need

Package is dual purpose -- 1. serving data for microRNA annotations and transcription factor correlations and 2. provide a set of descriptive methods and visualizations for these data types. This is clear enough from the README and vignette.

My preference is to keep the readme simple -- installation instructions and brief statement of purpose -- and provide detailed instructions and scientific context in the vignettes. Along these lines, I would shorten the readme and add to the vignette.

Bioconductor has specific classes and package templates for packages that serve to annotation data. (I just went through this for our multiMiR package, which provides functions for querying a collection of 14 miRNA databases. While I wasn't required to implement the full AnnotationDbi structure, I did have to implement the data access and filter functions -- AnnotationDbi::select(), AnnotationDbi::keys(), AnnotationDbi::columns(), and AnnotationDbi::keytypes()).

Also, I would consider separating it into two packages, especially if the miRNA/tf correlation statistics will have other applications to other databases and if the cRegulome.db (or Cistrome and miRCancer) database is not yet on Bioconductor (I didn't see the component DBs in an initial, cursory look).

Installation instructions: for the development version of package and any non-standard dependencies in README

Installed fine for me. A few notes from devtools --

From devtools::document():

  • Looks like the package dbplyr is only used in the vignette and manuscript
    -- consider moving to Suggests from Imports.
  • The R.utils::gunzip function is used in many places (vignette, manuscript,
    unrun example), but is not actually used in the package code. Consider moving
    to suggests or adding a roxygen2 line to import the function (in the
    get_tf() documentation: #' @importFrom R.utils gunzip.
  • RSQLite::SQLite() has the same situation as gunzip above, except that it
    is used in examples executed with devtools::check(). Consider adding an
    #' @importFrom RSQLite SQLite line to documentation of functions that use
    this function in examples.

Either of the @importFrom suggestions could be @import <pkgname> also,
though the former is preferable to avoid function name conflicts and reduce namespace clutter.

From devtools::check():

  • Add .git/ to the .Rbuildignore
  • "Package unavailable to check Rd xrefs: 'targetscan.Hs.eg.db' -- resolves if
    targetscan.hs.eg.db is installed from bioconductor (doesn't have to be
    loaded). Doesn't affect installation. Not sure the best solution to this one.

From devtools::spell_check():

  • Looks like some spelling errors still exist in the readme and at least one instance of 'to' instead of 'two' in the 'Database query' section of the vignette, but otherwise most pure spelling errors were resolved with recent commits.

Vignette(s) demonstrating major functionality that runs successfully locally

Runs successfully and demonstrates functionality. Suggestions:

  • Consider wrapping the database connection functionality into the package. get_db() could just as easily test for the existence of cRegulome.db and return a dbConnect connection to it. Requiring the user to understand DBI, RSQLite, R.utils::gunzip on top of figuring out your package seems like a tall order. Before making any changes though, look into Bioconductor's DB utilities and S4 classes.
  • At >2GB, cRegulome.db is too large to include in the package itself. Look into database hosting options. We host our multiMiR database on our university servers, but there are likely other cheap or free options.
  • venn.diagram overlays on top of cor_joy plot (line 250). If unintentional, add dev.off() somewhere to reset the plotting device.

Function Documentation: for all exported functions in R help

  • Function documentation overall looks good.
  • The naming scheme as.<s3 class name> is typical of s3 constructors -- consider using this, but again Bioconductor integration may require changing to S4 classes.
  • More details on the methods (citations and help page descriptions) would be helpful.

Function naming

  • Consider renaming cRegulome::tidy or importing and overloading broom::tidy

Summary

Overall the biggest thing to consider is Bioconductor integration and whether the database and its summary functions are independently novel enough to be better suited to two packages. On the latter point, I can see an argument for both approaches.

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Oct 19, 2017

Hi @PeteHaitch,

I think at this point I would go with the first option, submit to rOpenSci and CRAN. Integrating the package in the Bioconductor seems like it would need substantial changes that wouldn't be natural based on the choices I already made.

  • I followed your advice in adding the data needed for running the second vignette as a part of the package so to avoid downloading the whole database.

  • I fixed the notes that appeared in the R CMD build + check by adding the vignette packages to the Suggests section in the DESCRIPTION file.

  • The note about the size of the package persists, even after changing the compression of the included data and the vignette graphs. I am not sure what goes in the doc folder! I hope you have some advice for me here.

  • I edited the vignette and added some explanations to the graphs in the context they are used.

  • I added a method cor_igraph to make an igraph objects direclty from cmicroRNA and cTF.

  • I moved the database to figshare and updated the urls.

  • I linked the repo that contains the details and the scripts for building the database.

  • I edited the CONTRIBUTING file using the templates you suggested.

  • I used the TCGA names as provided by the Cistrome Cancer and the RTCGA package when building the database. The different naming might be the reason for the missing studies but I also know that not all studies are included in the Cistrome Cancer.

  • I changed the name of theprint method to cor_print to avoid the conflict.

Looking forward to hearing from you.

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Nov 17, 2017

Thanks @PeteHaitch for spotting these issues. I removed the include=FALSE and changed clusterProfiler to be installed using biocLite(). However, I still get the same note library' or 'require' call not declared from: ‘BiocInstaller!

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Nov 17, 2017

Adding BiocInstaller to Suggests in the DESCRIPTION fixes this (for reference, https://stackoverflow.com/questions/15648772/how-do-i-prevent-r-library-or-require-calls-not-declared-warnings-when-dev)

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Nov 17, 2017

Thanks @PeteHaitch. That solves the note.
I would like to ask you, @mmulvahill and @karthik if there are any other required or recommended changes I need to do.

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Nov 20, 2017

I'll take a look over it again later this week

@mmulvahill

This comment has been minimized.

Copy link

@mmulvahill mmulvahill commented Nov 20, 2017

Same here -- though I'm traveling most of this week. If I don't get to it this week, I will early next.

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Nov 22, 2017

Thanks for all the thoughtful and detailed review comments @mmulvahill and @PeteHaitch!
@MahShaaban Thanks for following through promptly. I will do my editor check this week and wait till both reviewers also sign off before we move to the next step.

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Dec 8, 2017

A gentle ping for @PeteHaitch and @mmulvahill

@mmulvahill

This comment has been minimized.

Copy link

@mmulvahill mmulvahill commented Dec 20, 2017

I haven't been able to get the package to install and build vignettes on a clean install (hence my delay responding). This appears to be due to a quirk in automating the bioconductor installs that I can't quite figure out.

In the 2nd update I link to below, I added some code for loading and updating with biocLite. Despite this, I still get an error with devtools::install("./cRegulome", build_vignettes = TRUE):

Error processing vignette 'case_study.Rmd' failed with diagnostics: object 'org.Hs.eg.db' not found

Separately, I would add if (!require()) install code for R.utils to using_cRegulome.Rmd.

update #1 (to using_cRegulome.Rmd)
update #2 (to case_study.Rmd)

Anyone have suggestions? If not, at this point I'm okay with moving forward.

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Dec 24, 2017

Hi @MahShaaban

The vignette still needs more careful proofreading. In my opinion, the vignettes should be the most useful parts of the package but they can be hard to follow in their current state.

For example in the 'Case Study' vignette, "In the previous section, we showed the code for obtaining the TF/microRNA-gene expression correlation in stomach and esophogeal cancer using Cistrome and miRCancerdb for comparison purpose." But it's not clear to me that this is indeed what was done in the previous section. There are also many typos; try running spelling::spell_check_package(vignettes = TRUE). Please take the time to carefully proofread both vignettes.

It should be made clear in the vignettes that some of what the user is viewing is based on the 'test' database. Otherwise they may be surprised that the output doesn't match their own when they run the code themselves using the full database. Even better may be to use the mocking strategy I suggested in #149 (comment) This shouldn't be too hard to set up.

As I noted in my initial review (#149 (comment)):

The get_db() function could avoid re-downloading the database for each project each time by using a cache. There are several R packages available for doing this, e.g. BiocFileCache.

I still think this is a really good idea, but I'm willing to move on at this point.

Some minor points:

  • Could get_db() also take care of unzipping the database? This would simplify the setup process.
  • In the 'Case Study' vignette, conn <- dbConnect(SQLite(), '~/microRNA/cRegulomedb/cRegulome.db') is not portable since it references a path on your own computer

Cheers,
Pete

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Dec 28, 2017

Thanks @mmulvahill & @PeteHaitch for these comments.
I will work on them and follow with a commit soon.
Happy holidays
Mahmoud

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Jan 4, 2018

Here are a few changes I made to address the issues with the installation and the vignette.

Code changes

  • get_db now handles the unzipping process. So I moved R.utils to the Imports section in the DESCRIPTION file. This solved the problem of the installation. I successfully installed the package on rocker/verse by either devtools::install_github('MahShaaban/cRegulome', build_vignette=TRUE) and locally devtools::install(build_vignette=TRUE)
  • I looked the BiocFileCache but couldn't figure out a solution for using it in get_db, at least for now

Vignettes

  • I fixed the line conn <- dbConnect(SQLite(), '~/microRNA/cRegulomedb/cRegulome.db') to conn <- dbConnect(SQLite(), './cRegulome.db')
  • I reviewed it and ran the spelling::spell_check_package. Finally, I mentioned the fact that the vignette is using a test set early on

I hope these changes solve the issues raised by @mmulvahill and @PeteHaitch

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Jan 30, 2018

I submitted a pull request fixing the vignette metadata to ensure it's built and installed as part of R CMD build/check https://github.com/MahShaaban/cRegulome/pull/1 (also fixes some typos)

I tried running through the "Using cRegulome" vignette as a new user might, but it unfortunately does not work. Specifically, if you build the vignette and try running through the steps at https://github.com/MahShaaban/cRegulome/blob/master/vignettes/using_cRegulome.Rmd#getting-started, you'll get an error:

> # load required libraries
> library(cRegulome)
> library(RSQLite)
> library(ggplot2)
> # download the db file when using it for the first time
> if(!file.exists('cRegulome.db')) {
+     get_db(test = TRUE)
+ }
> 
> # connect to the db file
> conn <- dbConnect(SQLite(), 'cRegulome.db')
> # enter a custom query with different arguments
> dat <- get_mir(conn,
+                mir = 'hsa-let-7g',
+                study = 'STES',
+                min_abs_cor = .3,
+                max_num = 5)
Error: Unknown column `STES` 

This is because the test database you connect to as a vignette user (in the prepare database chunk with eval = FALSE) is not the same database as that used by the vignette itself (in the connect_db chunk with include = FALSE). I can't link to the exact lines because of how GitHub renders .Rmd files, but that should be enough to identify the problem. You need to ensure these are the same database for this type of 'mock' database download/connection to work properly.

You can try building and installing the package with vignettes (devtools::install_github('MahShaaban/cRegulome', build_vignettes = TRUE) and then following the build vignette as a new user might.

Incidentally, you may wish to update the installation instructions in the README.md to use devtools::install_github('MahShaaban/cRegulome', build_vignettes = TRUE) to ensure the user gets the vignettes (these will be build and installed automatically when the package is on CRAN).

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Jan 30, 2018

Thanks @PeteHaitch for the fixes, very helpful as always.
I added the line devtools::install_github('MahShaaban/cRegulome', build_vignettes = TRUE) to the read me as you suggested.
I updated the file and the url so that get_db(test = TRUE) would download the exact file that is used in the package and test running the vignettes and both chunks prepare dataset and connect_db connect to identical files.

@PeteHaitch

This comment has been minimized.

Copy link

@PeteHaitch PeteHaitch commented Feb 3, 2018

Hi @MahShaaban,

Thanks for the changes. I was able to work through the vignette like a new user.

I have gone through the rOpenSci package review checklist and I am happy to recommend approving cRegulome 🎉

Below are a few cosmetic suggestions/changes (some with PRs) that you might consider:

  • cor_print() could just be print() (https://github.com/MahShaaban/cRegulome/pull/3)
  • venn.diagram() -> cor_venn_diagram() for consistency with other generics (https://github.com/MahShaaban/cRegulome/pull/4)
  • cor_plot() might scale point size in (-1, 1) rather than relying on the range of the input data. For the example in the vignette, visual difference exaggerates actual difference in correlation values
  • Vignette text and code might be out of sync?
    • In the 'Pathway enrichment analysis' section:
      • "Clusters 1 and 3" seems to be clusters 3 and 4?
      • Why no KEGG enrichment results for other clusters?
      • "enrichment of 13 and 7 KEGG pathways respectively" doesn't seem to match the numbers in the figure
      • You might derive the numbers in the text from data to ensure they match the output using inline R code chunks
  • "We didn’t attempt to replicate the same findings but nevertheless we arrived at similar conclusions."
    • It's not clear to me what the conclusions your example analysis are or how they compare to Shi et al.'s. You might summarise each in the final section of the vignette

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

Package not being co-submitted to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Feb 4, 2018

Thanks @PeteHaitch for these pull requests and the comments.

  • I merged the three requests you made, they certainly improve the consistency
  • I made changes in the vignette to sync the text with the graphs
  • The function clusterProfiler::compareCluster() returns data for the clusters that has significant enrichment hits only. In this case the 3 other clusters don't have any
  • In the motivation section there was some mistakes probably due to a previous editing. The author's conclusion was that the paper presented a proof of concept for using the TF-microRNA co-regulation networks to identify critical regulators in cancer
  • Finally, I experimented with changing the scale of the points in cor_plot. I tried using ggplot2::scale_size(range = c(-1,1)), the resulting points were very small and not any easier to compare. changing the range resulted in almost similar sizes as without using scale_size at all!

I really appreciate the time and effort you put in reviewing the package. The comments and the changes you made helped a lot.

Thanks again

Mahmoud

@mmulvahill

This comment has been minimized.

Copy link

@mmulvahill mmulvahill commented Feb 5, 2018

I second @PeteHaitch's recommendation of approving cRegulome! The package, including both vignettes, now builds, installs, and runs successfully for me on a fresh system. Thanks for your responsiveness @MahShaaban.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • [x ] Installation instructions: for the development version of package and any non-standard dependencies in README
  • [x ] Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

Not applicable

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Feb 5, 2018

Thanks a ton @PeteHaitch and @mmulvahill! We are very grateful for your time, expertise, and attention to detail. 🙏👏

@karthik

This comment has been minimized.

Copy link
Member

@karthik karthik commented Feb 7, 2018

@MahShaaban Your badge should update soon to peer-reviewed. Congrats on your package being accepted! 🎉

Here are your next steps:

Please also add a footer to the bottom of your README

[![](http://www.ropensci.org/public_images/github_footer.png)](http://ropensci.org)
  • Please accept my invitation to join a team on the ropensci github org. This will allow you to transfer the repo. Once it's transferred, I'll give you write access there so you can update the CI badges.
  • Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed)

Once moved, please re-run all checks in preparation for submission to CRAN. I can help with this if you run into any issues.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Feb 7, 2018

Thanks again @PeteHaitch, @mmulvahill and @karthik.
I added the footer to the README and waiting for your invitation to transfer the repo.

@stefaniebutland

This comment has been minimized.

Copy link

@stefaniebutland stefaniebutland commented Feb 7, 2018

Hello @MahShaaban and congratulations on your package being accepted! I'm rOpenSci's Community Manager. As Karthik said, we'd love to have a blog post about cRegulome. Here is some technical and editorial info: https://github.com/ropensci/roweb2#contributing-a-blog-post. Let me know if you're interested. Happy to help as needed.

@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Feb 8, 2018

Thanks @stefaniebutland. I am certainly considering the blog post. I will look into it as soon as I can.

@maelle

This comment has been minimized.

Copy link
Member

@maelle maelle commented Feb 11, 2018

@MahShaaban I see that you've now transferred the repo, fantastic! Welcome! 🎉

  • I've activated Appveyor again, the badge is [![Build status](https://ci.appveyor.com/api/projects/status/gcmojtcsyt7rcwtk?svg=true)](https://ci.appveyor.com/project/ropensci/cregulome)

  • I'm closing this since the package was approved, but feel free to comment here (or ping Stefanie elsewhere) when you're ready to write a blog post. 😺

@maelle maelle closed this Feb 11, 2018
@MahShaaban

This comment has been minimized.

Copy link
Author

@MahShaaban MahShaaban commented Feb 11, 2018

Thanks @maelle. After I transferred the repo I updated the the ci links including that for Appveyor. It seems to be working; builds for the commits and the badge links to the correct page. Do I need to update the badge again with the link you provided in the previous comment?

@maelle

This comment has been minimized.

Copy link
Member

@maelle maelle commented Feb 11, 2018

Oh if it points to the correct builds then no. ☺

@maelle

This comment has been minimized.

Copy link
Member

@maelle maelle commented Feb 11, 2018

But check e.g after your next commit just to be sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.