Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

citesdb: A high-performance database of shipment-level CITES trade data #292

Closed
15 of 25 tasks
noamross opened this issue Apr 5, 2019 · 25 comments
Closed
15 of 25 tasks

Comments

@noamross
Copy link
Contributor

noamross commented Apr 5, 2019

Submitting Author: Noam Ross (@noamross)
Repository: https://github.com/ecohealthalliance/citesdb
Version submitted: 0.1.0
Editor: @melvidoni
Reviewer 1: @xavi-rp
Reviewer 2: @pachamaltese
Archive: https://zenodo.org/record/3234871
Version accepted: 0.2.0


  • Paste the full DESCRIPTION file inside a code block below:
Package: citesdb
Version: 0.1.0
Title: A high-performance database of shipment-level CITES trade data
Description: Provides convenient access to over 40 years and 20 million records of
    endangered wildlife trade data from the Convention on International Trade
    in Endangered Species of Wild Fauna and Flora, stored on a local on-disk,
    out-of memory 'MonetDB' database for bulk analysis.
Authors@R: c(
    person("Noam", "Ross", , "ross@ecohealthalliance.org", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-2136-0000")),
    person("Evan A.", "Eskew", , "eskew@ecohealthalliance.org", role = c("aut"), comment = c(ORCID = "0000-0002-1153-5356")),
    person("EcoHealth Alliance", role = "cph"))
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
ByteCompile: true
URL: https://github.com/ecohealthalliance/citesdb, https://www.cites.org/
BugReports: https://github.com/ecohealthalliance/citesdb/issues
Imports:
  MonetDBLite,
  rappdirs,
  DBI,
  httr,
  R.utils,
  purrr,
  tools,
  dplyr,
  dbplyr
Suggests:
  spelling,
  testthat,
  roxygen2,
  knitr,
  rmarkdown,
  rstudioapi,
  lintr,
  callr,
  here,
  ggplot2,
  rcites
RoxygenNote: 6.1.1
Roxygen: list(markdown = TRUE)
Language: en-US
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • reproducibility
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

citesdb gives researchers access to the large CITES wildlife trade dataset by syncing it to a local out-of-memory trade database and provides an interface to that local database.

  • Who is the target audience and what are scientific applications of this package?

Researchers and conservationists studying the wildlife trade.

Our deprecated cites package used a different framework to access a smaller subset of the data. The rcites package provides complementary data.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box. This package:

I note the relatively low test coverage (~65%) is due to a verbose section of code configuring the RStudio interactive interface. Other than this, the remaining code has >90% coverage.

Publication options

JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.2630836
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

Recommended reviewers: karinorman will be familiar with the database used here via her work on taxadb, JonasGeschke and KevCaz will know about the data and their application through their work on rcites. Feedback from someone involved in metadata projects - e.g.: Bryce Mecum, Auriel Fournier, Kelly Hondula - would also be valuable.

@noamross noamross changed the title A high-performance database of shipment-level CITES trade data citesdb: A high-performance database of shipment-level CITES trade data Apr 5, 2019
@melvidoni melvidoni self-assigned this Apr 7, 2019
@melvidoni
Copy link
Contributor

melvidoni commented Apr 7, 2019

Thank you for submitting, @noamross! I will be your editor, so please hold on while I perform editor checks.

@melvidoni
Copy link
Contributor

melvidoni commented Apr 8, 2019

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @noamross!

As the output from goodpractice is perfect I'll start finding reviewers for this.

♥ Ho-ho! Slick package! Keep up the stunning work!

Reviewers: @pachamaltese @xavi-rp
Due date: TBD

@melvidoni
Copy link
Contributor

melvidoni commented Apr 15, 2019

Hello @noamross! @pachamaltese and @xavi-rp have been assigned as reviewers. They will have until May 6th to review the package.

@pachadotdev
Copy link

pachadotdev commented Apr 15, 2019

Hi @noamross, @melvidoni and @xavi-rp
I started to install the package. One immediate thing to fix is to add the ROpenSci badge so please paste this code next to CI badge and regenerate the README:

[![](https://badges.ropensci.org/292_status.svg)](https://github.com/ropensci/onboarding/issues/292)

@eveskew
Copy link

eveskew commented Apr 16, 2019

Hi @pachamaltese, that should be taken care of now.

@pachadotdev
Copy link

pachadotdev commented Apr 16, 2019

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

  • Installation instructions: for the development version of package and any non-standard dependencies in README

  • Vignette(s) demonstrating major functionality that runs successfully locally

  • Function Documentation: for all exported functions in R help

  • Examples for all exported functions in R Help that run successfully locally

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.

  • Functionality: Any functional claims of the software been confirmed. Checked yes but please check the vignettes examples!

  • Performance: Any performance claims of the software been confirmed.

  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines _I

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4 hours

Review Comments

I enjoyed reviewing the citesdb package. I found it easy to use.

Below I've highlighted some observations that might make this package even better.

Documentation

Package Metadata

  • I appreciate that the authors included the ORCID ids!
  • What I like from the metadata is that print(citation("citesdb"), bibtex = TRUE) works ok and returns a proper BibTeX entry with no warnings.

Package Documentation

  • The GIF is a nice touch!

Vignettes

  • The vignettes couldn't be generated. The problem is described here: Connection is garbage-collected, use dbDisconnect() to avoid this citesdb#1
  • While I get "no vignettes found" due to the errors above, I read the Rmd file instead of the rendered vignette. The bibliography is well formatted and I like that you used the brackets in a correct way. This shall ease proper citation and identifying sources.
  • With respect to the vignettes contents I'd change the installation part a bit and use devtools::install_github(), and also install.packages() if you plan to submit to CRAN later.
  • I consider really useful that you warn on disk space usage, and also the vignette covers the case when the local database is missing or removed, and how to fix that problem.
  • The examples can be improved a bit. For example, the part that says "Note that running collect() on all of cites_shipments() will load a >3 GB data frame into memory!" is something that I would prefer to re-state the example than to include that kind of warning. For example, and given that the package can be used with dplyr, I would obtain an additional benefit from the SQL connections and use dplyr functions to return just what I want instead of loading 3GB in RAM and then filtering (i.e. see https://db.rstudio.com/dplyr/)
  • In the same line with performant examples, I'd run something like this
    library(citesdb)
    library(dplyr)
    
    con <- cites_db()
    DBI::dbListTables(con)
    
    cites_shipments_db <- tbl(con, "cites_shipments")
    
    cites_shipments_db
    
    example_query <- cites_shipments_db %>% 
    select(Year, Appendix, Taxon, Exporter) %>% 
    filter(Appendix == "II" & Exporter == "DE")
    
    example_query
    
    example_query_tibble <- as_tibble(example_query)
    
    example_query_tibble
  • The integration with MonetDBLite is well covered and explains the existing limitations.
  • If there's something I really like from this vignette is that even when it is full of latin names and technical words, I was able to understand it by being just concentrated. Well done!

Function Documentation

  • The functions are well covered and I had no problem reading the documentation.

Functionality / Implementation

External Dependencies

Section 1.11 of rOpenSci's packaging guides reinforces the good practice of trying to limit
external dependencies where possible.
This is a good practice not only for the health and longevity of your package but also for marketing purposes; sometimes people working on servers or behind proxies
will have limited ability to install packages so fewer dependencies may enable more users.
I think this aspect is ok and I wouldn't touch the dependencies.

Readability / Modularity

  • With respect to a personal taste, I'd love to see the glue package in use here. It can make the code more readable. It could be better than using paste0 and sprintf in some cases.
  • I sent a PR where I applied styler to the code to make it more neat applied styler to code citesdb#3 but two functions need additional review since styler returns an error. The rest of the functions got some minimal re-arrangements mostly related to spacing and line width.

Technical Implementation Details

  • I tested on Ubuntu 18.04. I would like to get a Windows laptop and check the package from there to see if I would need R Tools or something special to run the package.
  • check_status should be renamed since it's too generic and cites_check_status or similar should be used.

Testing

Error Messages

  • The error messages are helpful and have no idea to improve then since the texts are complete and informative

General comments

Like I stated above, there are some fixes that are needed and may improve the package a lot.

The code is easy to read, without strange lines, and my overall impression of it is positive.

65% of test coverage seems to be very low. By adding some elementary checks, this number can be increased a lot.

Unlike my 1st submission to rOpenSci, most of the functions in citedb have adecuate names and can be easily distinguished as those names are not generic (i.e. "filter" or "download").

@noamross
Copy link
Contributor Author

noamross commented Apr 23, 2019

Note to reviewers: The MonetDBLite package has been (hopefully temporarily) archived on CRAN, so I have switched the citesdb to using a stable GitHub version of MonetDBLite via the Remotes: field in the DESCRIPTION file.

@xavi-rp
Copy link

xavi-rp commented May 5, 2019

Hi @noamross and @melvidoni,
Thanks for giving me the opportunity to review citesdb. I think it's a very nice and useful package. I've enjoyed (and learned!) a lot reviewing it.
Please take my comments only as suggestions as I think it's already very well written.

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

  • Installation instructions: for the development version of package and any non-standard dependencies in README

  • Vignette(s) demonstrating major functionality that runs successfully locally

    • It gives an error while running devtools::check() (see comments below). However, reading the Rmd file and running the examples in it manually, it can be seen that they work perfectly.
  • Function Documentation: for all exported functions in R help

  • Examples for all exported functions in R Help that run successfully locally

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8 (a lot of time spent to try to understand the problem with the vignettes)


Review Comments

I find this package easy to install/use and very useful to explore and use the data from the CITES Trade DB, so congrats to the authors!

It's very useful the inclusion of the metadata functions. And also the gif!!!

The main problem that I found is in the vignette building while running devtools::check(). See ropensci/citesdb#1 (comment)

In devtools::test(), the lints test fails. Already reported by @pachamaltese here: ropensci/citesdb#2
These are the results of checking the lints with lintr::lint_package() (most of them are probably irrelevant; except maybe #13)

> library(lintr)
> as.data.frame(lint_package())

              filename line_number column_number    type
1          R/connect.R         114            13   style
2          R/connect.R         121            13   style
3          R/connect.R         128            13   style
4          R/connect.R         149            14   style
5  R/connection-pane.R          34            14   style
6  R/connection-pane.R          65            64   style
7  R/connection-pane.R          85            14   style
8         R/download.R          41             7   style
9         R/download.R          42             5   style
10        R/download.R          46             3   style
11        R/download.R          49             5   style
12        R/download.R          59             3   style
13        R/download.R         117            52 warning
14        R/onattach.R          24            13   style
15        R/onattach.R          25             5   style
16        R/onattach.R          42             7   style
17        R/onattach.R          43             5   style
18        R/onattach.R          70            22   style
                                                message
1  Variable and function names should be all lowercase.
2  Variable and function names should be all lowercase.
3  Variable and function names should be all lowercase.
4  Variable and function names should be all lowercase.
5  Variable and function names should be all lowercase.
6  Variable and function names should be all lowercase.
7  Variable and function names should be all lowercase.
8  Variable and function names should be all lowercase.
9  Variable and function names should be all lowercase.
10 Variable and function names should be all lowercase.
11 Variable and function names should be all lowercase.
12 Variable and function names should be all lowercase.
13                           Do not use absolute paths.
14 Variable and function names should be all lowercase.
15 Variable and function names should be all lowercase.
16 Variable and function names should be all lowercase.
17 Variable and function names should be all lowercase.
18 Variable and function names should be all lowercase.
                                                                           line
1                          as_tibble(dbReadTable(cites_db(), "cites_metadata"))
2                             as_tibble(dbReadTable(cites_db(), "cites_codes"))
3                           as_tibble(dbReadTable(cites_db(), "cites_parties"))
4                               observer$connectionClosed("CITESDB", "citesdb")
5                                                    observer$connectionOpened(
6                             paste("SELECT * FROM", table, "LIMIT", rowLimit))
7                          observer$connectionUpdated("CITESDB", "citesdb", "")
8                                     if (dbExistsTable(cites_db(), tblname)) {
9                                            dbRemoveTable(cites_db(), tblname)
10               dbCreateTable(cites_db(), tblname, fields = cites_field_types)
11                                                                   dbExecute(
12   dbWriteTable(cites_db(), "cites_status", make_status_table(version = ver),
13                   paste0("https://api.github.com/repos/", repo, "/releases")
14                                        for (t in dbListTables(cites_db())) {
15                                                 dbRemoveTable(cites_db(), t)
16                          if (dbExistsTable(cites_db(), "cites_shipments") &&
17                                 dbExistsTable(cites_db(), "cites_status")) {
18                       suppressMessages(dbWriteTable(cites_db(), tblnames[i],
                  linter
1      camel_case_linter
2      camel_case_linter
3      camel_case_linter
4      camel_case_linter
5      camel_case_linter
6      camel_case_linter
7      camel_case_linter
8      camel_case_linter
9      camel_case_linter
10     camel_case_linter
11     camel_case_linter
12     camel_case_linter
13 absolute_paths_linter
14     camel_case_linter
15     camel_case_linter
16     camel_case_linter
17     camel_case_linter
18     camel_case_linter

The spelling errors found after running spelling::spell_check_package() and spelling::spell_check_files("README.Rmd") are irrelevant.

Other minor comments:

  • I would like to know more easily where the DB is stored; for instance when calling cites_status()

  • If you do cites_db_delete() and then again cites_db_download(), the Size on Disk becomes 2.3 Gb instead of 1.1Gb. I don't know why this happens, but it's not nice...

  • cites_pane() @examples L29-30. It should be in the same line (or using {}).

  • I would prefer single examples for cites_metadata(), cites_codes() and cites_parties()

  • Use Title Case in the package title

Congrats again to the authors!!

@noamross
Copy link
Contributor Author

noamross commented May 6, 2019

Thank you for your careful reviews, @pachamaltese and @xavi-rp! We'll get to work on these.

@noamross
Copy link
Contributor Author

noamross commented May 22, 2019

Response to reviewers

Thanks to both @pachamaltese and @xavi-rp for their comments and issues that have
helped us to make a more user-friendly package.

Changes implemented

  • We have changed the installation method in the README to use devtools.

  • We have added information to the cites_status table indicating the location
    of the database files on disk.

  • We now clear out and disconnect the database tables after updating, which
    should trigger disk cleanup and avoid doubling database size.

  • Thanks for the reviewers for sticking with us through a long and merry chase
    of a bug that was preventing package-building in
    Connection is garbage-collected, use dbDisconnect() to avoid this citesdb#1, which had its origins
    in a missing token for use with the rcites package as well as a low-level
    database lock issue. We now cache this the rcites information to avoid
    having to make remote calls in vignette-building, and also have resolved the
    DB locking conflict.

  • We have modified our linter tests to avoid the false positives shown in
    Fix testhat path citesdb#2.

  • We have renamed the internal function check_status() to cites_check_status()
    to be less generic.

  • We have made more elaborate help and examples for the cites_db(),
    cites_shipments(), and metadata functions to illustrate their use and
    distinguish between dplyr- and DBI-based workflows.

  • We've made some small changes to the vignette text and references.

Changes not implemented / justifications

  • We have opted not to use the glue package and instead stick with the base
    R paste() functions in the interest of limiting dependencies. We believe
    this is a minor trade-off.

  • The low test coverage shown in goodpractice output shows test coverage should be extended citesdb#4
    is due to tests skipped on CRAN. Setting the environment variable to
    NOT_CRAN=true shows that our test coverage is ~61%. This is lower than
    typical, but as we note in the issue above, this is largely due to the
    verbose code for interacting with the RStudio connection pane, which cannot be
    tested except in an interactive session. Other code coverage is
    85%.
    We believe that all core functionality is tested, including important
    edge-cases and conditions not reflected in the coverage statistic, such as
    error handling in multiple sessions and changing up upstream data sources.

A note: we have learned from the maintainers of MonetDBLite that it will
not be returning to CRAN (its current iteration fails on R-devel), but they are
working on a successor embedded database package that will replace it and go
to CRAN later this year. So, for now, we will host this package on GitHub and
replace the database back-end and send to CRAN when the successor package is ready.
We've added a note to the README and vignette showing that users need package build tools for
the current version.

@xavi-rp
Copy link

xavi-rp commented May 25, 2019

Hi,
I think that the authors have adressed all my review comments and am happy with their changes. Therefore, I recomend citesdb for approving.
Again, thank you so much to the editor for giving me the opportunity to review such a nice package and congratulations to the authors!!

@melvidoni
Copy link
Contributor

melvidoni commented May 26, 2019

Thank you @xavi-rp, noted. @pachamaltese what are your thoughts?

@pachadotdev
Copy link

pachadotdev commented May 26, 2019

@melvidoni I totally agree with this:

Hi,
I think that the authors have adressed all my review comments and am happy with their changes. Therefore, I recomend citesdb for approving.
Again, thank you so much to the editor for giving me the opportunity to review such a nice package and congratulations to the authors!!

@melvidoni
Copy link
Contributor

melvidoni commented May 26, 2019

Approved! Thanks @noamross for submitting and @pachamaltese and @xavi-rp for your reviews!

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Add the rOpenSci footer to the bottom of your README
    " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@pachadotdev
Copy link

pachadotdev commented May 26, 2019

@melvidoni IMO there's an additional pending: to apply rOpenSci themes to their vignettes

@melvidoni
Copy link
Contributor

melvidoni commented May 26, 2019

@pachamaltese You just told me before that it was ready to be approved...

@noamross
Copy link
Contributor Author

noamross commented May 29, 2019

Thanks all very much, I believe the RO theme will be applied automatically upon repository transfer and building on docs.ropensci.org.

@pachamaltese and @xavi-rp, we'd like to acknowledge you as rev authors in our DESCRIPTION if that's all right. I believe your ORCiDs are https://orcid.org/0000-0003-2046-0621 (Xavier) and 0000-0003-1017-7574 (Pachá), correct?

@xavi-rp
Copy link

xavi-rp commented May 29, 2019

Yes, that's my orcid! Thanks!!

@pachadotdev
Copy link

pachadotdev commented May 29, 2019

@noamross
Copy link
Contributor Author

noamross commented May 29, 2019

I've checked off everything except the footer, which I'm leaving off in anticipation of a change in workflow: ropensci/software-review-meta#79, and will be in touch with Stefanie about a blog post. Thanks all!

@pachadotdev
Copy link

pachadotdev commented May 29, 2019

@maelle
Copy link
Member

maelle commented May 29, 2019

@pachamaltese we are about to change the guidance actually because @jeroen has been working on centrally building all docs websites so authors do not need to worry about this. The pkgdown config is used but not for styling since rotemplate is used for all. See e.g. docs.ropensci.org/magick.

@maelle
Copy link
Member

maelle commented May 29, 2019

@pachadotdev
Copy link

pachadotdev commented May 29, 2019

@maelle
Copy link
Member

maelle commented May 29, 2019

Indeed, very recent change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants