New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebi - an R-Client for Europe PMC #29

Closed
njahn82 opened this Issue Mar 11, 2016 · 17 comments

Comments

Projects
None yet
4 participants
@njahn82
Copy link
Member

njahn82 commented Mar 11, 2016

    1. What does this package do? (explain in 50 words or less)

rebi gives access to Europe PubMed Central, an indexing service for life-
science publications that is provided by the European Bioinformatics Institute
(EBI). This client can be used to search metadata and full-texts, retrieve
reference sections, citations, text-mined terms, and links to other EBI
databases or external sources like the Wikipedia.

    1. Paste the full DESCRIPTION file inside a code block below.
Package: rebi
Version: 0.0.999
Maintainer: 'Najko Jahn' <najko.jahn@uni-bielefeld.de>
Author: Najko Jahn
License: GPL-3
Title: rebi -- R Interface for Europe PMC RESTful Web Service
URL: http://github.com/njahn82/rebi/
BugReports: http://github.com/njahn82/rebi/issues
Description: R Interface to Europe PMC RESTful Web Service. Europe PMC covers
    life science literature and it gives access to open access full texts.
    Coverage is not only restricted to Europe, but articles and abstracts are
    indexed from all over the world. As a partner in the PMC International
    (PMCi), Europe PMC ingests all PubMed content and extends its index with
    other sources, including Agricola, a bibliographic database of citations to
    the agricultural literature, or Biological Patents. In addition to
    bibliographic data, rebi gives automated access to citations and references
    that were indexed by Europe PMC. Links between life-science literature and
    other EBI databases, including ENA, PDB or ChEMBL are also accessible.
    External links provide a mechanism to link out to external providers such as
    Wikipedia or research data repositories.
LazyLoad: yes
LazyData: yes
VignetteBuilder: knitr
Depends:
  R (>= 3.00)
Imports:
    httr,
    jsonlite,
    plyr,
    xml2
RoxygenNote: 5.0.1
Suggests:
    testthat,
    knitr,
    rmarkdown

    1. URL for the package (the development repository, not a stylized html page)

https://github.com/njahn82/rebi

    1. What data source(s) does it work with (if applicable)?

It only works with Europe PMC's Articles RESTful API.
https://europepmc.org/RestfulWebService

    1. Who is the target audience?

Life-science researchers and students, scholars that study the life sciences,
librarians.

    1. Are there other R packages that accomplish the same thing? If so, what is different about yours?

To my knowledge, this is the only package that implements Europe PMC's
Articles RESTful API. Since there is a considerable overlap with PubMed/PubMed
Central, the rentrez package could be used as well to fetch bibliographic
information. fulltext package gives access to supplementary material
deposited in Europe PMC. oai package could be used to retrieve metadata from
Europe PMC via its OAI-PMH interface.

    1. Check the box next to each policy below, confirming that you agree. These are mandatory.
    • This package does not violate the Terms of Service of any service it interacts with.
    • The repository has continuous integration with Travis CI and/or another service
    • The package contains a vignette
    • The package contains a reasonably complete README with devtools install instructions
    • The package contains unit tests
    • The package only exports functions to the NAMESPACE that are intended for end users
    1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
    • Are there any package dependencies not on CRAN?
    • Do you intend for this package to go on CRAN?
    • Does the package have a CRAN accepted license?
    • Did devtools::check() produce any errors or warnings? If so paste them below.
    1. Please add explanations below for any exceptions to the above:
    1. If this is a resubmission following rejection, please explain the change in circumstances.

This package has been available through rOpenSci since summer 2013. Because I
made major upgrades in the last days that reflect the rOpenSci guidance on how
to write packages, I thought it might be helpful to re-submit this package.

See also the discussion in the forum:
https://discuss.ropensci.org/t/major-package-update-of-rebi-and-question-regarding-re-submission/333/

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Mar 11, 2016

Reviewers: @toph-allen

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Mar 27, 2016

@toph-allen - hey there, it's been 16 days, please get your review in by Apr 01, thanks 😺 (ropensci-bot)

@toph-allen

This comment has been minimized.

Copy link

toph-allen commented Mar 28, 2016

Here's my review. N.B. According to the submission above, the package has been available through rOpenSci since 2013, but has recently made major upgrades that reflect rOpenSci guidance on package authorship, so it probably is already sufficient. Let me know if you have any questions, or any areas where you'd like me to clarify more.

Installation

  • No issues installing on OS X.

General comments on code

  • All sample code ran correctly.
  • The code generally seems concise and sensibly divided into different functions. As far as I can tell, the package addresses the problem space pretty comprehensively, and the code is clear to understand and pretty self-explanatory. Nice work. 👍

Comments on specific functions

  • epmc_ftxt() includes a data_src = "pmc" argument, but the function is hard-coded to always use PMC as a source--this argument is only used to throw an error if anything other than PMC is used. Perhaps better to remove this argument and clarify in documentation, as using args(epmc_ftxt) might sugget to a user that they can use another source.

Improvements to code style?

  • There are a few local package variables in use: supported_dbs and supported_semantic_types are used in one function each, and are defined in that function's file; supported_data_src is used in many functions, but defined in epmc_details.r. I'm not sure if there's an accepted best practice for this, but it seems it'd make more sense to either put these inside functions or put them in their own file, say supported.r.
  • While code itself is self-explanatory enough that I don't think this is a major issue, sometimes comments are a little sparse or terse (i.e. just a word or two).

Comments on documentation?

  • The top-level package help file is rebi-package.r. Seems like it would be better to have this as rebi.r and change @name rebi-package to @name rebi, so that ?rebi brings up that document.
  • There are one or two places where documentation could use a touch of clarifying.
    • E.g. epmc_citations.r could be clearer as to whether it fetches works cited by a publication vs. works citing a publication.
    • epmc_refs.r has the same problem, especially because it refers to "citations" in its documentation. This could be simply a reflection of the language in the underlying API's documentation, but I think it could use a little clarification.

Do tests pass locally?

  • Yes

Documentation and examples run without issue

  • Yes

ROpenSci Packaging Guidelines

  • Package naming
    • It's a fine name, though I don't know its provenance.
  • Function/variable naming
    • Snake case, self-explanatory. All functions are prefixed with "epmc_", which seems unnecessary? It seems, however, that this at least in part to avoid naming conflicts with defunct functions. The new naming scheme, while a bit redundant, seems more consistent than the old one.
  • README
    • README.md is clear and very comprehensive, explaining what the package can do and how to do it.
  • Code of conduct
    • Yep, included
  • Documentation
    • Uses roxygen2
  • Package dependencies
  • Testing
    • Uses testthat
  • Continuous integration
    • Yep, TravisCI
  • Console messages
    • No instances of cat() or print() in the package
  • Recommended software scaffolding
    • Uses all recommended packages where relevant

(I spent about 1.5–2 hours on the review).

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Mar 28, 2016

Thanks for the review @toph-allen!

@njahn82 Let us know when you've incorporated / responded to @toph-allen's comments and we can merge the updated version of rebi into the rOpenSci repo.

@njahn82

This comment has been minimized.

Copy link
Member Author

njahn82 commented Mar 29, 2016

Thank you for the review. I will incorporate the suggested improvements in the next days.

@njahn82

This comment has been minimized.

Copy link
Member Author

njahn82 commented May 3, 2016

Hi @toph-allen, sorry for coming back to you after such a long time. Thank you again for your helpful comments and suggestions to improve my package. I have a new version ready that include your remarks. I'll go over them in the following.

Please let me know if you have any further comments.

epmc_ftxt() includes a data_src = "pmc" argument, but the function is hard-coded to always use PMC as a source--this argument is only used to throw an error if anything other than PMC is used. Perhaps better to remove this argument and clarify in documentation, as using args(epmc_ftxt) might sugget to a user that they can use another source.

As suggested, I removed the data_src = "pmc" argument from the function epmc_ftxt()

https://github.com/njahn82/rebi/issues/2

There are a few local package variables in use: supported_dbs and supported_semantic_types are used in one function each, and are defined in that function's file; supported_data_src is used in many functions, but defined in epmc_details.r. I'm not sure if there's an accepted best practice for this, but it seems it'd make more sense to either put these inside functions or put them in their own file, say supported.r.

Agreed. I moved functions that are commonly used to utils.r. Thanks to your suggestion, I also started to dry out my code by

  • defining the uri and the REST API path in utils.r only,
  • use of roxygen's @inheritParams
  • Thanks to @stubben , the default limit is now 25. Batch_size is by default 1000 and is dropped as function parameter. The function make_queries in utils.r implements paging and is used by epmc_search(), make_path provides a more generic way to generate paths for epmc_ref(), epmc_citations(), epmc_db(), epmc_tm()

While code itself is self-explanatory enough that I don't think this is a major issue, sometimes comments are a little sparse or terse (i.e. just a word or two).

Agreed. Added more comprehensive explanations in some places.

The top-level package help file is rebi-package.r. Seems like it would be better to have this as rebi.r and change @name rebi-package to @name rebi, so that ?rebi brings up that document.

I moved rebi-package.r to rebi.r

There are one or two places where documentation could use a touch of clarifying. E.g. epmc_citations.r could be clearer as to whether it fetches works cited by a publication vs. works citing a publication.

As suggested, I clarified both functions. In the documentation of epmc_refs() it says now:

#'  Get references for a given publication
#'
#'  This function retrieves all the works listed in the bibliography of a given
#'  article.

epmc_citations()

#' Get citations for a given publication
#'
#' The function finds works that cite a given publication.

Package naming: It's a fine name, though I don't know its provenance.

It comes from the acronym European Bioinformatics Institute (EBI), which hosts Europe PMC. Do you think europepmc as package name would be more appropriate since the EBI offers many other services that are not covered by this package?

Function/variable naming All functions are prefixed with "epmc_", which seems unnecessary?

I followed the example of other rOpenSci packages like rcrossref or aRxiv. Is there any best practise available if and when to prefix functions?

Again, many thanks for your helpful comments and suggestions.

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented May 10, 2016

Quick editor's comment about function naming: We recommend prefixing (e.g, epmc_) to prevent conflicts and ease autocompletion when function names are likely to conflict with other packages. In this case, this is certainly the case, as search(), ref(), etc., will conflict with other packages and even the base R namespace.

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented May 17, 2016

@toph-allen - Please let us know if you have responses to the changes, thanks!

@toph-allen

This comment has been minimized.

Copy link

toph-allen commented May 25, 2016

I'm so sorry I took even longer to respond to your changes—I had been budgeting more time to go through it than it actually took.

Reading through your responses to my suggestions, and looking through the accompanying changes to the code, I don't think I have any further issues.

Regarding the naming of your packages:

Do you think europepmc as package name would be more appropriate since the EBI offers many other services that are not covered by this package?

I think that renaming the package europepmc or epmc would make its name more obviously connected with its main functions, and with the function prefixes. Those would be concrete benefits over the current name. It might be a bit of a hassle (or it might not be more than a "find and replace" and some touch-up)—and, you might just be fond of the current name. I'll leave that decision up to you.

All seems in order to me!

@njahn82

This comment has been minimized.

Copy link
Member Author

njahn82 commented May 25, 2016

Great, I would prefer to rename the package to europepmc as I have not initially submitted it to CRAN yet, but plan to do this within the next days.

Many thanks for reviewing this package!

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented May 25, 2016

Thanks, @toph-allen for reviewing and @njahn82 for keeping up this package!

@njahn82 I am getting one local error for a test that is skipped on Travis:

1. Failure (at test_epmc_citations.r#22): epmc_citations returns ---------------
epmc_citations("13814508") does not match 'No citing documents found'. Actual value: "Error in rebi_GET(path = path) : Not Found (HTTP 404).\n"

Once that's fixed, and you've made all the internal changes for changing the name, I think we're good to go. You should be able to apply your changes to the package in the ropensci repo via a pull request. I'm not sure about whether you have the ability to change that repo's name. @sckott: heads up, we may need to see that a name change populates through our systems properly.

@njahn82

This comment has been minimized.

Copy link
Member Author

njahn82 commented May 25, 2016

Thanks for letting me know. I realized that the Europe PMC API has started to give a 404 when no citations were found, which is not a nice behavior.

Compare
http://www.ebi.ac.uk/europepmc/webservices/rest/MED/13814508/citations
with
http://www.ebi.ac.uk/europepmc/webservices/rest/MED/9843981/citations

I will fix the error messages and corresponding tests.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented May 26, 2016

we may need to see that a name change populates through our systems properly.

okay, not sure how to handle this when there's a fork and original version - I'll look into this

@njahn82

This comment has been minimized.

Copy link
Member Author

njahn82 commented May 30, 2016

I prepared the package for name change.

@sckott To get rid of the fork, I would prefer to transfer repository ownership to ropensci first. Then we could do the repository name change.

What do you think?

@sckott

This comment has been minimized.

Copy link
Member

sckott commented May 30, 2016

Okay, sounds good!

@sckott

This comment has been minimized.

Copy link
Member

sckott commented May 31, 2016

@njahn82

This comment has been minimized.

Copy link
Member Author

njahn82 commented May 31, 2016

Great, all references to the former package name should now be changed to europepmc.

Many thanks for all your comments and suggested improvements, I really appreciate your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment