
refimpact submission #78

Closed
perrystephenson opened this Issue Oct 3, 2016 · 42 comments

@perrystephenson

perrystephenson commented Oct 3, 2016

Summary

  • What does this package do? (explain in 50 words or less):

Provides wrapper functions around the UK Research Excellence Framework 2014 Impact Case Studies Database API http://impact.ref.ac.uk/. The database contains relevant publication and research metadata about each case study as well as several paragraphs of text from the case study submissions. Case studies in the database are licenced under a CC-BY 4.0 licence http://creativecommons.org/licenses/by/4.0/legalcode.

  • Paste the full DESCRIPTION file inside a code block below:
Package: refimpact
Title: API Wrapper for the UK REF 2014 Impact Case Studies Database
Version: 0.1.0
Authors@R: person("Perry", "Stephenson",
    email = "perry.stephenson+cran@gmail.com",
    role = c("aut", "cre"))
Description: Provides wrapper functions around the UK Research
    Excellence Framework 2014 Impact Case Studies Database API
    <http://impact.ref.ac.uk/>. The database contains relevant publication and
    research metadata about each case study as well as several paragraphs of
    text from the case study submissions. Case studies in the database are
    licenced under a CC-BY 4.0 licence
    <http://creativecommons.org/licenses/by/4.0/legalcode>.
License: MIT + file LICENSE
URL: https://github.com/perrystephenson/refimpact
BugReports: https://github.com/perrystephenson/refimpact/issues
Encoding: UTF-8
LazyData: true
Depends:
    R (>= 3.2.5)
Imports:
    curl (>= 2.1),
    jsonlite (>= 1.1),
    tibble (>= 1.2)
RoxygenNote: 5.0.1
Suggests: testthat
  • URL for the package (the development repository, not a stylized html page):

https://github.com/perrystephenson/refimpact

  • Who is the target audience?

People interested in the following:

  1. Research in the UK and its global impact
  2. Relationships between academic institutions
  3. Relationships between impactful publications
  4. A high-quality and homogeneous text resource for text mining
  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

There are no other packages for accessing this data, as far as I am aware.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses. (this is currently in the README.md file, but I can add a vignette if required)
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

  • Do you intend for this package to go on CRAN? (Already accepted - link)
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings: No errors, warnings, or notes.
  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
  • If this is a resubmission following rejection, please explain the change in circumstances:
  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
@sckott


sckott commented Oct 3, 2016

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @perrystephenson

In assessing fit, I have a few questions:

  • How widely used is this data in research/academia/non-profits? From a google scholar search it seems to be mentioned/cited a lot.
  • Do you know if this data is meant to be temporarily available, or meant to be around for a while?
  • Are there similar things in other countries? (just curiosity)
  • You mentioned text mining. I didn't see examples in your docs though. What kind of text is available through the service?

Update 2016-10-07:

Will be seeking reviewers, but in the meantime I ran goodpractice::gp() on your package. Here are the results, which may be helpful to get started with before review (you can run it yourself and dig into the results more):

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. At least some lines are not covered in this package.

    R/get_tag_values.R:25:NA

  ✖ use '<-' for assignment instead of '='. '<-' is the
    standard, R users and developers are used to it, and it is
    easier for them to read your code if you use '<-'.

    R/search_case_studies.R:46:11
    R/search_case_studies.R:46:23
    R/search_case_studies.R:46:36
    R/search_case_studies.R:46:51

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/get_institutions.R:18:1
    R/get_tag_values.R:21:1
    R/get_tag_values.R:25:1
    R/get_tag_values.R:29:1
    R/get_units_of_assessment.R:17:1
    ... and 15 more lines

  ✖ fix this R CMD check error/warning/note: Namespace in
    Imports field not imported from: ‘curl’ All declared Imports should
    be used.

Reviewers: @njahn82 @kbenoit
Due date: 2016-11-01

@perrystephenson


perrystephenson commented Oct 5, 2016

Hi @sckott

Thanks for getting back to me so quickly! I'll reply in order:

  1. It has been used predominantly in text mining research projects, an example is here. We're also working on it at the University of Technology, Sydney (UTS) with a view to building tools which can help researchers improve the quality of their academic writing by making it "more like" an existing body of known high quality text. This is related to our work on the Academic Writing Analysis project.
  2. I have looked around previously for an answer to this question, and everything I have read seems to imply that the data will be around for a while. I suspect that they will build up the dataset over time, which will mean adding the 2019 submission data when it becomes available.
  3. I am not aware of similar datasets in other countries (unfortunately). If my project proves successful then perhaps UTS will lobby the Australian Research Council to release a similar dataset.
  4. The text available through the service is the full text of over 6,000 research impact case study submissions, as well as a few structured and semi-structured pieces of metadata. This is a large collection of similar documents, with consistent grammar and minimal spelling errors. I am planning to write a vignette for the next CRAN release, which will incorporate some of my work on the project mentioned in point 1.

Thanks!

@sckott


sckott commented Oct 7, 2016

Thanks, I'll consult with my colleagues and get back ASAP

@sckott


sckott commented Oct 7, 2016

@perrystephenson checked the fit box ✔️

Seeking reviewers now, see updated comment above about goodpractice

@sckott


sckott commented Oct 11, 2016

reviewers now assigned

@sckott


sckott commented Oct 19, 2016

@njahn82 @kbenoit sorry about the ping from the bot (message deleted, but you probably got an email). still working out the rough spots

@kbenoit


kbenoit commented Oct 19, 2016

No problem!

@njahn82


njahn82 commented Oct 31, 2016

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3


Review Comments

This package gives access to UK's 2014 Research Excellence Framework (REF) Impact Case Studies. To my knowledge, this is one of the first comprehensive efforts to share reports under an open content license, making it an interesting source for studying science, science communication and funding activities.

The package interfaces the REF2014 API. Although the underlying database appears to be a completed body of work, obtaining the data from the API rather than shipping it as package data is very reasonable, because CRAN limits the size of data files to reduce installation time. Besides, the API provides various query methods, which are accessible through the package. The main function of refimpact is get_case_studies().

The author did a nice job interfacing the API with jsonlite and documenting the functions. Using tibble is a great choice to encourage re-use of the fetched data within R. The package is also available via CRAN, so it has already passed the essential checks for sharing an R package. However, I feel that the package could benefit from some revisions in order to increase code quality and rOpenSci compliance.

Use httr to work with and manipulate http requests

I would strongly recommend using httr for building and accessing data from web APIs. httr provides useful functions to create API queries, and increases the reliability of packages when something breaks. This httr vignette is a great introduction about how to write an API package.

More specifically:

  • Input validation and building the queries can be done with httr::modify_url(). This would reduce the amount of code and thus make the package more reliable.
  • APIs can break. httr allows converting HTTP errors into R errors, and testing whether the content of a response is valid JSON. Both methods should be implemented in refimpact to make it transparent to users when a request fails.
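Both points could be sketched roughly as follows. This is only an illustration, not refimpact's actual implementation; the helper name ref_api_get and the way the path and query are assembled are assumptions:

```r
library(httr)
library(jsonlite)

# Illustrative request helper: builds the URL, converts HTTP errors
# into R errors, and checks that the response is JSON before parsing.
ref_api_get <- function(path, query = list()) {
  url <- modify_url("http://impact.ref.ac.uk", path = path, query = query)
  resp <- GET(url)
  stop_for_status(resp)                # HTTP error -> R error
  if (http_type(resp) != "application/json") {
    stop("REF API did not return JSON", call. = FALSE)
  }
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}
```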

Provide a vignette

The package would benefit from a vignette: long-form documentation of how to use the package. The README nicely summarises who might be interested in using refimpact, how to install it, and how to access the data. This could be a great starting point, continued with a demonstration of how to solve particular problems researchers face when they study science and its funding. In particular, it would be nice to see how to work with refimpact output using other R packages written for text mining or network analysis. I also see a lot of potential in interfacing refimpact data with literature packages to get more detailed bibliographic records.

README

  • Please follow rOpenSci guidelines and create the README.md directly from a README.Rmd. One advantage is that you can present both code and results to your audience.

Functions

  • Instead of integers, I would suggest using character vectors as input for get_case_studies(). Although REF's variables are syntactically numeric, they should be handled as categorical variables.
  • I wonder if the parameter phrase in get_case_studies() could be improved to allow more than a single word as input?
  • There is some potential to DRY out the code base, especially when calling the API. Functionality needed in many functions, i.e. API calls and validation, could be separated out and stored in a utils.R file.

Contributing

  • A Code of Conduct as a community guideline should be added according to the rOpenSci guidelines. devtools::use_code_of_conduct() makes it easy to add the Contributor Covenant template.

R CMD check

There was one NOTE when running R CMD check locally:

* checking dependencies in R code ... NOTE
Namespace in Imports field not imported from: ‘curl’
  All declared Imports should be used.

In summary, the package is a very good start, but could be improved by better aligning the code base and documentation with the rOpenSci guidelines.

Please let me know if something is unclear or doesn't make sense. I am happy to help!

@sckott


sckott commented Nov 1, 2016

@kbenoit - hey there, it's been 29 days, please get your review in soon, thanks 😺 (ropensci-bot)

@perrystephenson


perrystephenson commented Nov 2, 2016

Thanks @njahn82, I appreciate the detailed feedback! I probably won't get a chance to make changes right away, but I'll make sure that all of your feedback is addressed in the next version of the package.

@kbenoit


kbenoit commented Nov 3, 2016

Sorry folks, I have been in transit and under the gun for a deadline that passed Oct 31. I will do my best to complete my review by 4 November (Friday).

@sckott


sckott commented Nov 3, 2016

thanks for the update @kbenoit

@kbenoit


kbenoit commented Nov 4, 2016

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

    Estimated hours spent reviewing: 3


Review Comments

This is a very specific package aimed at one thing: making it easy for researchers to pull data from the 2014 UK Research Excellence Framework API, for subsequent analysis. The package wraps API calls detailed on the REF website and returns conveniently classed R objects, for direct analysis in R.

Installation

Installed easily, passed full check with --as-cran toggled on my system. (I did not experience the NOTE about curl described by the other reviewer.)

Tests

The test coverage looks quite impressive (98%), although many or all of these tests could be made much more robust. For instance, the single test in test_tag_types.R is:

test_that("get_tag_types() returns a tibble", {
  # skip_on_cran()
  expect_equal(dim(get_tag_types()),c(13,2))
})

which not only does not test whether the return value is a tibble, but also only matches dimensions known in advance. A more robust test might compare the values returned against the known tags from the REF website, or check that the return value is in fact a tibble-classed object.
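A sketch of a more robust version (illustrative only; expect_is() was the class check available in testthat at the time, and the assertion avoids hard-coding dimensions):

```r
test_that("get_tag_types() returns a tibble of tag types", {
  skip_on_cran()
  tags <- get_tag_types()
  expect_is(tags, "tbl_df")   # actually check the tibble class
  expect_gt(nrow(tags), 0)    # non-empty, without fixing the dimensions
})
```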

This is more than just being picky, since all of the checks that the package is actually working live in the testthat tests: none of the functions' Examples sections are run. I realise it's difficult to get API calls to run in tests, but it might be possible to run some of the examples and tag them as \donttest{} instead of \dontrun{}.
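For instance, a roxygen example could be tagged like this (sketch; the function call is illustrative):

```r
#' @examples
#' \donttest{
#' # hits the live API, so not run on CRAN by default
#' tag_types <- get_tag_types()
#' }
```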

Adherence to community guidelines

Here I will suggest a few easy-to-make changes:

  • In DESCRIPTION, URL and BugReports are present, but the Maintainer field is missing.
  • No CONTRIBUTING or CONDUCT.md files are present.

Adherence to ROpenSci packaging guidelines

  • NEWS.md: This file has no details besides declaring the "initial release", although there are two tagged releases in the GitHub repository. You could link the release announcement in NEWS.md to the v1.0 release tag.
  • A README.Rmd would better meet the guidelines, including a pre-commit hook to make sure that it gets made into README.md. (See ?devtools::use_readme_rmd.)
  • This is minor, but the naming could follow the object_verb naming guidelines suggested in the rOpenSci Packaging Guide, possibly: casestudies_get, institutions_get, etc., or even ref_casestudies_get, ref_institutions_get, ... to make it clear that these all come from this package. This makes it more like the stringi example in the packaging guidelines. But for such a short package this is more a matter of preference than any sort of requirement. (But since rOpenSci is designed to keep the R jungle tidy, why not start with even the smallest vines.)
  • To be really picky: According to the guidelines, knitr and roxygen2 should be added to Suggests: in the DESCRIPTION file.

Documentation

This is where the package is weakest: the documentation is limited to the barest descriptions of the functions. Even those do not explain much about the context of the options or inputs: for instance, get_case_studies() does not really explain what the ID is or why I might use it instead of a UKPRN. I suggest that this could be vastly expanded in a Details section and added as an overview to refimpact-package.Rd. I had to do considerable testing and reading on the UK REF website before I understood the inputs to the functions.

Of course, a vignette is what this package is really crying out for, to show not only how to use it but why, including some analysis of the data to show the practical motivation for querying this data through an R package. This will also get around the problem of not being able to run all of the examples live, because they involve remote API calls.

Motivation

As someone who was actually an individual submitted in this exercise (although not for a case study) from one of the listed institutions, and who was personally involved in managing the staff submitted from my institution, my first impression was that this would be a really cool package to have for accessing this data. However, the more I experimented with the package, the more I wondered why the API approach was needed. The data is static, so the only reasons not to package the data are copyright and size. Most of the information (except some of the case studies; see http://impact.ref.ac.uk/CaseStudies/Terms.aspx) is governed by a CC licence, and so could easily be packaged as data. The size objection applies only to the case studies themselves, but again, if the documentation or README.md said more about the motivation, I would have a better idea of just how large this is (and whether the data would not be better provided simply as one large flattened data.frame or "tibble").

The following static tables from the API are CC licensed and could easily be packaged as built-in objects:

  • institutions: a 155 x 5 data.frame, about 20.8 kB in size
  • units_of_assessment: 36 x 3
  • tag_types: 13 x 2
  • values: This is much larger but the entire table could be flattened in a way that links to tag_types, if we are willing to strongly suspend the principles of relational data normalization (something most users may not know or care about).

This seems to gut the package's functions, since it leaves only get_case_studies(), which might appropriately be handled through an API call. But here I suggest the package could really add value through data-handling functions that link the static data objects to the structure of what get_case_studies() returns, such as ways to flatten the lists that are elements of its return objects. For instance, the return object from get_case_studies(ID = c(27,29)) is a 2 x 19 tibble, but several of those columns (e.g. Continent) are variable-length lists. Many users who are not experts in dissecting R objects are going to have trouble with the nesting of lists within data.frames.

In addition, by having the smaller objects as built-in data, the inputs to get_case_studies() can be checked for valid values, rather than relying on the API to reject a non-existent ID, for instance.
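A rough sketch of what that validation could look like, assuming a hypothetical bundled units_of_assessment data object with an ID column (both names are illustrative):

```r
# Hypothetical: validate a unit-of-assessment input against bundled data
# before making any API call, failing fast with an informative error.
check_uoa <- function(uoa) {
  uoa <- as.character(uoa)
  unknown <- setdiff(uoa, units_of_assessment$ID)
  if (length(unknown) > 0) {
    stop("Unknown unit(s) of assessment: ",
         paste(unknown, collapse = ", "), call. = FALSE)
  }
  invisible(uoa)
}
```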

@sckott


sckott commented Nov 4, 2016

thanks very much for the review @kbenoit !

@perrystephenson both reviews in now. let us know if you have any questions. continue discussion here with me and reviewers.

@perrystephenson


perrystephenson commented Nov 6, 2016

Thanks everyone for the considerable effort you have put in to reviewing my work - I can see why feedback is the first benefit listed in the onboarding README! I've never had such a thorough review of my code before, so I really appreciate all the advice.

I've created a whole bunch of issues in the refimpact repository, and I'll start working through them when I get a chance. I'll probably have a few questions (mainly around the benefits of a split data/API package) but I'll wait until I've had a chance to think about the issues more fully before I do.

Thanks again, and I'll let you know how I go!

@kbenoit


kbenoit commented Nov 6, 2016

@perrystephenson Happy to help further with the process.

@sckott


sckott commented Jan 5, 2017

@perrystephenson let me know if I can help the process move along

@sckott


sckott commented Jan 16, 2017

@perrystephenson let me know if I can help the process move along

@perrystephenson


perrystephenson commented Jan 17, 2017

@sckott


sckott commented Jan 17, 2017

ok

@perrystephenson


perrystephenson commented Feb 4, 2017

I've missed the self-imposed end-of-January deadline, but I've made fairly significant progress working through the issues raised above. You can see how I'm tracking against the issues here:
https://github.com/perrystephenson/refimpact/issues.

@sckott


sckott commented Feb 6, 2017

thanks @perrystephenson for the update

@perrystephenson


perrystephenson commented Mar 2, 2017

I've finished re-writing the package based on your excellent feedback - I don't think there is a single line of code remaining from the original package you reviewed! This has necessarily involved a breaking change to the function syntax (there is now only one user-facing function, named ref_get), so the version number has been incremented to v1.0.0.

I have a couple of notes - excuses if you will!

  • There are a few lines without unit test coverage. These lines deal with API failures, which are quite difficult to test. I've tested them locally by removing my network connection, so I am confident that these lines are performing as expected. Test coverage is otherwise more thorough than in the previous version, and I have made use of expect_equal_to_reference() to perform efficient and complete testing of the API methods.
  • There is a NOTE in R CMD check about UTF-8 characters in the bundled dataset. I decided that it was better to leave the NOTE there than to replace those UTF-8 characters with ASCII, as they form part of names (such as institutions) in the ref_tags dataset, and changing these characters would overall be detrimental for users. I hope CRAN will agree!

I am happy to go through the changes in more detail, but given that one of the critiques was that the package was hard to use for new users, I think it might be best if I leave you to your own devices with the documentation in the package. Hopefully you will find it far more helpful than before!

Happy to answer any questions, and make any more changes that you suggest. When you're happy with the package, I will tag the v1.0.0 release to align with NEWS.md.

Thanks again for all of your time!

@maelle


maelle commented Mar 2, 2017

@perrystephenson Regarding the encoding NOTE, see Julia Silge's post and her CRAN comments (her package was accepted with the NOTE)

@sckott


sckott commented Mar 2, 2017

@njahn82 @kbenoit if you have time soon, please let us know if you are happy with @perrystephenson's changes. If you don't have time soon, also let me know.

@kbenoit


kbenoit commented Mar 2, 2017

I would not worry about the NOTE for the non-ASCII data. quanteda has had these since the beginning. Brian Ripley gave the beginnings of a grumble about it, once, but as long as you have

Encoding: UTF-8

in DESCRIPTION then don't worry about it.

@kbenoit


kbenoit commented Mar 13, 2017

Sorry it's taken a while to comment on the revision... I think the update is greatly improved. The vignette is a welcome addition and the tests significantly improved. As I mention above, don't worry about the UTF-8 note.

I still think you could include a lot of the smaller bits of data that are currently accessed through API calls, namely institutions, units_of_assessment, and tag_types. But it works well enough to query these too.

I also still think you could (easily!) add more description to the ?refimpact page.

Finally, there are some cool new features in roxygen2 6.0 for documentation that you could use, esp for documenting data and packages. Not at all important for the approval process but definitely worth checking out!

@sckott


sckott commented Mar 15, 2017

thanks @kbenoit !

@perrystephenson any thoughts on his comments above?

@njahn82


njahn82 commented Mar 20, 2017

Hi @perrystephenson

Apologies for my late review of your thorough package revision. I am very happy to see that httr is now implemented for interacting with the web API and that both vignette and README guide users through the package.

Here are some follow-up points after having had time to look into your revision:

Ensure backward compatibility

As introduced, there is now only one function that can be used for calling the API. However, I am afraid that the removed functions are not marked as .Deprecated. This is important to ensure backward compatibility with the existing CRAN release. Hadley Wickham's R Packages book contains a chapter describing how to deprecate functions:

http://r-pkgs.had.co.nz/release.html
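A minimal sketch of that pattern, keeping an old exported name as a thin shim (the mapping of the old arguments onto ref_get() here is an assumption, not the package's actual code):

```r
# Old exported function retained as a deprecated shim around the new API
get_case_studies <- function(...) {
  .Deprecated("ref_get")  # warns users, points them at the replacement
  ref_get("SearchCaseStudies", query = list(...))
}
```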

Function input

I wonder if it would be possible to split up the query parameters, so that I can provide something like

ref_get("SearchCaseStudies", id = "2", tags = "dkkdk")

instead of a list object.

I would still not allow numeric input for identifying the studies. In my opinion, it is best practice for IDs to be of class character.
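One way such an interface could be sketched, building the query list from named arguments (the wrapper name and parameter mapping are illustrative):

```r
# Illustrative wrapper: named arguments instead of a raw query list
ref_case_studies <- function(id = NULL, uoa = NULL, tags = NULL) {
  query <- Filter(Negate(is.null),
                  list(ID = id, UoA = uoa, Tag = tags))
  ref_get("SearchCaseStudies", query = query)
}
```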

Simplify nested lists?

I second @kbenoit's comment that learners and non-frequent users of R are often unfamiliar with nested lists. The vignette gives good guidance on how to select list elements. However, it would be great to have at least one parser that additionally returns a subset of the data as a tidy data.frame. Such a parser could be a great time-saver for users who simply wish to get an overview of the data and export that selection.

Vignette and README

Vignette and README contribute greatly to a better understanding of the aim and usage of this package. In particular, the benefits and limits of the API implementation are very well described. One API limitation (there is no call for retrieving all datasets at once) is nicely worked around by showing how to extract the entire dataset. There is some potential to shorten the example by using lapply instead of a for loop:

# load dev version
devtools::install_github("perrystephenson/refimpact")
#> Skipping install of 'refimpact' from a github remote, the SHA1 (a1ccaafb) has not changed since last install.
#>   Use `force = TRUE` to force installation
# get IDs
uoa_table <- refimpact::ref_get("ListUnitsOfAssessment")
# apply 'ref_get' over all IDs
tt <- lapply(uoa_table$ID, function(x) refimpact::ref_get("SearchCaseStudies", 
  query = list(UoA = x)))

Please provide at least one usage example in the README.

Finally, I wonder if you want to include your vignette as part of the README. If so, simply add

```{r child="vignettes/refimpact.Rmd"}
```

to your README.Rmd.

sckott commented Mar 20, 2017

thanks @njahn82 for your comments!

perrystephenson commented Mar 26, 2017

Thanks everyone,

I'll try and respond to these in order.

@kbenoit

I've made comments about the bundling of the smaller tables here. If the case studies were small enough to bundle with the package then I would have just gone with a 100% data package, but given that users must use the API to make use of the package, there isn't any harm in getting the users to pull down the small tables. The possible benefit is that if there are new records added to the database, then users will get these records.

I have raised an issue for the ?refimpact help documentation and will sort it out soon.

And I'll take a look at Roxygen2 6.0!

@njahn82

I thought about backwards compatibility, and my thinking was along the lines of:

  1. It has barely any downloads (672 from the RStudio CRAN mirror as of today), and a Google search for library(refimpact) didn't return any meaningful hits. I doubt anyone is going to notice the lack of backward compatibility.
  2. It has no reverse dependencies on CRAN
  3. The old function names were not compliant with rOpenSci guidelines
  4. I incremented the major version number, which indicates breaking changes

I'm happy to take guidance on this if you think backwards compatibility is worth maintaining!

I can definitely split up the list - I left it as a list partly due to convenience (httr takes a list) and partly due to having seen that pattern in many other packages (and in fact in lots of base R functions). Is there a reason you think it's better as individual arguments?

Regarding the simplification of nested lists into useful data structures, I had a bit of a look at what this would take, and there is a fair bit of work in it. I've left the issue open and tagged it as something to look at for v1.1, unless you think it's important enough that it needs to be part of v1.0. I suspect it's going to take a while to think through all of the use cases though.

I'll have another look at the vignette and README based on your comments - I might use purrr to shorten the code example above, as I don't think that the apply functions are particularly good at conveying meaning to users looking at vignettes, and purrr tends to be a little easier to read.
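For what it's worth, a purrr version of the reviewer's lapply example above might read like this (an untested sketch using the same objects as that example):

```r
library(purrr)

# get IDs
uoa_table <- refimpact::ref_get("ListUnitsOfAssessment")

# map() iterates over the IDs just like lapply(), but the formula
# shorthand keeps the per-ID call easy to scan.
tt <- map(uoa_table$ID,
          ~ refimpact::ref_get("SearchCaseStudies",
                               query = list(UoA = .x)))
```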

njahn82 commented Mar 27, 2017

Thanks for your reply, @perrystephenson

I am unsure what policies and good practices concerning deprecated or defunct functions are in place at rOpenSci - perhaps @sckott can help? There may be users of your package who would appreciate finding guidance about the changes.

Regarding the ref_get() input: I have found it good practice among web API packages for functions to take simple atomic vectors, because that makes it easier to grasp what input is required. In the case of ref_get() I had to check the API documentation to get an idea of what search parameters exist and what they mean. So I suggest at least expanding the ref_get() documentation with examples that cover the five existing query parameters.

It is great that you are planning to provide some parsers, and if you would like to add them in another release I am OK with that.

Looking forward to finding the purrr examples in the vignette!

sckott commented Mar 30, 2017

In terms of deprecated and defunct functions: since the package is already on CRAN, I advise making a function deprecated in the next package version, then making it defunct in the version after that. That way users get one version with a warning that the function is going away, and in the following version it is gone. Also, do maintain man files for deprecated and defunct functions.

sckott commented Jun 7, 2017

any thoughts @perrystephenson ?

perrystephenson commented Jun 29, 2017

Hi @sckott and sorry for the delay. Following from the discussion above:

  • I've restored all of the functions from v0.1.0 and added a deprecation message to those functions (and made an appropriate note in the NEWS.md file)
  • Made a minor addition to ?refimpact (pointing the reader to the vignette - no point in duplicating the vignette in ?refimpact)
  • Added lots of detail and examples to ?ref_get

I've decided to leave the "get all tags" example in the vignette as a loop. I think that loops are likely to be more readily understood by users who are new to programming (and new to R), which makes it a bit more friendly than using an apply function or purrr.

These changes have all been pushed to GitHub, so feel free to take a look.

Thanks again for the feedback!

sckott commented Jun 30, 2017

thanks @perrystephenson - I'm on vacation for a week now, so I'll have a look when I get back. Sorry for the delay!

sckott commented Jul 11, 2017

@perrystephenson looking good - I checked it over and nearly everything is fine. Just one thing:

Here https://github.com/perrystephenson/refimpact/blob/master/R/ref_get.R#L120 you are extracting the content and then parsing the JSON, whereas the HTTP status code check happens after that. Why not do the status code check directly after https://github.com/perrystephenson/refimpact/blob/master/R/ref_get.R#L109?

perrystephenson commented Jul 12, 2017

Hi @sckott, excellent question. The reason for this ordering is that the API gives useful error messages for some error conditions, and those messages are worth passing on to the user. So following the API call I first check whether the response is JSON (and throw an error if not), then I parse the JSON (which I now know I can do safely) to get either the content or the error message, and then if there is an error I can add the parsed error message to the error string which gets presented to the user.

The alternative is to make the error messages less useful, which I think would be an inferior outcome.
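The ordering described above can be sketched roughly as follows. This is a simplification, not the actual ref_get() internals, and the `Message` field on the parsed error body is an assumption:

```r
resp <- httr::GET(url)

# 1. Confirm the body is JSON before attempting to parse it.
if (httr::http_type(resp) != "application/json") {
  stop("API did not return JSON", call. = FALSE)
}

# 2. Parse the JSON - safe now that the content type is confirmed.
parsed <- jsonlite::fromJSON(httr::content(resp, as = "text"))

# 3. Only now check the status code, so the API-supplied error
#    message (hypothetical 'Message' field) can be passed on.
if (httr::http_error(resp)) {
  stop(sprintf("API request failed [%s]: %s",
               httr::status_code(resp), parsed$Message),
       call. = FALSE)
}
```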

sckott commented Jul 14, 2017

True, you do check for JSON first and only parse after that is confirmed, so that seems okay. I definitely agree with giving useful error messages whenever possible.

sckott commented Jul 14, 2017

Approved!

  • Please transfer the package to ropensci - I've invited you to a team within our ropensci organization. After you accept you should be able to move it.
  • Make sure to
    • add rOpenSci footer to README
      [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
    • Add the (brand new!) "Peer-reviewed" badge to your README (in the future people will add this on submittal; it updates through the stages of review):
      [![](https://ropensci.org/badges/78_status.svg)](https://github.com/ropensci/onboarding/issues/78)
    • Change any needed links, such as those for CI badges
    • Travis CI should update to the new location automatically - you may have to update other CI systems manually

perrystephenson commented Jul 15, 2017

Thanks @sckott!

I've done all of those tasks now - is there anything I need to do before submitting this new version to CRAN? Do I need to add any rOpenSci references to DESCRIPTION?

Once again, thanks to everyone for the incredibly detailed feedback - I have learned a lot through the process! I really appreciate the time and effort you've committed to this process.

sckott commented Jul 15, 2017

is there anything I need to do before submitting this new version to CRAN? Do I need to add any rOpenSci references to DESCRIPTION?

Just make sure to update the URLs in your DESCRIPTION file , right now they have the old github username

thanks for your submission!

sckott closed this Jul 17, 2017
