medrxivr: Accessing and searching medRxiv preprint data in R #380

Closed
17 of 31 tasks
mcguinlu opened this issue May 27, 2020 · 44 comments
@mcguinlu
Member

mcguinlu commented May 27, 2020

Submitting Author: Luke McGuinness (@mcguinlu)
Repository: https://github.com/mcguinlu/medrxivr
Version submitted: 0.0.2
Editor: @maurolepore
Reviewer 1: @tts
Reviewer 2: @njahn82
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: medrxivr
Title: Access MedRxiv Preprint Data
Version: 0.0.2
Authors@R: c(
    person("Luke", "McGuinness",
           role = c("aut", "cre"),
           email = "luke.mcguinness@bristol.ac.uk",
           comment = c(ORCID = "0000-0001-8730-9761")),
    person("Lena", "Schmidt",
           role = "aut",
           comment = c(ORCID = "0000-0003-0709-8226")))
Description: The medRxiv <https://www.medrxiv.org/> repository is a free online
    archive and distribution server for complete but unpublished manuscripts 
    (preprints) in the medical, clinical, and related health sciences. medrxivr
    provides programmatic access to both medRxiv API <https://api.biorxiv.org/>
    and a static snapshot of database, which is updated daily. Users can then 
    search for relevant records using regular expressions and Boolean logic, and
    can easily download the full-text PDFs of preprints matching their search 
    criteria. 
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Language: en-US
URL: https://github.com/mcguinlu/medrxivr
BugReports: https://github.com/mcguinlu/medrxivr/issues
Imports: 
    rvest,
    methods,
    dplyr,
    xml2,
    curl,
    jsonlite,
    httr,
    stringr,
    rlang
Suggests: 
    testthat (>= 2.1.0),
    knitr,
    rmarkdown,
    covr,
    kableExtra
VignetteBuilder: 
    knitr, 
    rmarkdown    
RoxygenNote: 7.1.0

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    medrxivr allows users to programmatically access data from medRxiv, a preprint repository for papers in medical, clinical, and related health sciences. The package also allows users to readily perform and document reproducible literature searches of the medRxiv database.

  • Who is the target audience and what are scientific applications of this package?
    The primary target of this package is systematic reviewers (i.e. me!), who frequently wish both to use more complicated queries (e.g. regular expressions/Boolean combinations) when searching medRxiv than the official site currently allows for, and who also wish to be easily able to download the full text PDFs of records matching their search. medrxivr helps with both of these challenges. However, anyone who wishes to find and retrieve relevant medRxiv records in R, for example to explore the distribution of preprints by subject area, will find the package useful.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
    As far as I am aware, no other package allows users to access medRxiv data in R.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
    Issue: Presubmission inquiry: medrxivr #369
    Editor: @annakrystalli

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you intend for this package to go on Bioconductor?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.3860024
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

Tagging my co-author @L-ENA for reference.

@maurolepore
Member

maurolepore commented Jun 3, 2020

@mcguinlu and @L-ENA, thanks for your submission. I'll be the editor. As we move through the process I'll keep you posted. I welcome your questions any time.

@maurolepore
Member

maurolepore commented Jun 4, 2020

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

@mcguinlu, thanks again for your submission. The editor checks flagged a few issues that need your attention; see them below.

Let's discuss the first two items (ml1 and ml2) before I search for reviewers; these two items refer to a potential overlap with existing packages.

The remaining items are important but not as urgent as the first two.

  • (ml3) Run spelling::spell_check_package(); then fix or update the list of valid words with spelling::update_wordlist().
> spelling::spell_check_package()
WORD              FOUND IN
api               mx_api_content.Rd:39,42
                  mx_api_doi.Rd:28,31
                  description:4
AppVeyor          README.md:14
                  README.Rmd:24
ation             building-complex-search-strategies.Rmd:110
biorxiv           description:4
capitalisation    building-complex-search-strategies.Rmd:125
... more lines
  • (ml4) Run goodpractice::gp() to identify lines of code that tests don't touch.
> goodpractice::gp()
... more lines
── GP medrxivr ─────────────────────────────────────────────────────────────────

It is good practice to write unit tests for all functions, and all package code in
    general. 77% of code lines are covered by test cases.

    R/mx_crosscheck.R:50:NA
    R/mx_download.R:25:NA
    R/mx_download.R:27:NA
    R/mx_download.R:28:NA
    R/mx_download.R:30:NA
    ... and 51 more lines
  • (ml5) Run covr::package_coverage() and try to test code in files with low % coverage.
> covr::package_coverage()
medrxivr Coverage: 77.60%
R/mx_download.R: 1.92%
R/mx_crosscheck.R: 96.15%
R/mx_search.R: 96.33%
R/mx_api.R: 100.00%
R/mx_info.R: 100.00%
> styler::style_pkg()
Styling  12  files:
 R/medrxivr.R
 R/mx_api.R
 R/mx_crosscheck.R
 R/mx_download.R
 R/mx_info.R
 R/mx_search.R
 tests/testthat.R
 tests/testthat/test-api.R
 tests/testthat/test-crosscheck.R
 tests/testthat/test-download.R
 tests/testthat/test-info.R
 tests/testthat/test-search.R     ℹ
────────────────────────────────────────
Status	Count	Legend
✔ 	4	File unchanged.
ℹ 	8	File changed.
x 	0	Styling threw an error.
────────────────────────────────────────
Please review the changes carefully!

Reviewers: @tts and @njahn82
Due date: 2020-07-01

@mcguinlu
Member Author

mcguinlu commented Jun 4, 2020

Hi @maurolepore

Thanks for your initial review of our package. I've gone through it and tried to address each point below:

ml1: Overlap with fulltext
Personally, I think that fulltext and medrxivr should continue to be two separate packages, but that there is the potential for integration between the two (as occurred with the aRxiv package). In all honesty, I completely forgot to reply to the issue on fulltext 🤦‍♂️- sorry @sckott! The reason I think they should be separate is twofold:

  • In the first instance, the two packages take fundamentally different approaches to searching. At present, the medrxivr workflow is to create a local copy of the whole medRxiv repository via the API (or maintained static snapshot), and then search it locally using the mx_search() function to find relevant articles [i.e. all data -> local search -> results]. In comparison, fulltext takes the same approach to searching bioRxiv as the biorxivr package does (note: medRxiv and bioRxiv are very similar, so I am using fulltext's approach to bioRxiv for comparison here). They both paste the search (see this code line) on after the base URL (https://www.biorxiv.org/search) and scrape the resulting page(s), essentially mimicking what would happen if you performed a search on the site itself [i.e. remote search -> results]. The search functionality offered by this approach depends on what the site itself offers, and so is not as comprehensive as that offered by medrxivr (e.g. you can't use regexes to define capitalisation/alternative spellings, or use the NEAR operator). In addition, using the search functionality of the site itself (i.e. by pasting the query onto the search/ path) has been shown to not be very reproducible/transparent, which is what originally motivated the development of medrxivr.

  • Secondly, as far as I can tell (correct me if I am wrong @sckott), search strings for fulltext do not vary based on the database searched - for example, if you use ft_search() without specifying the source, it uses the same string to search every database. This would cause issues for anything beyond a simple search, as medrxivr allows for advanced search strings that would not be compatible with other data sources. Based on this, my argument is that medrxivr should be a standalone full package, and a simple restricted version of the medrxivr search could be implemented in fulltext, provided @sckott is happy to implement the medrxivr workflow [i.e. download data then search] within fulltext.
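
As a rough illustration of the [all data -> local search -> results] idea described above, here is a minimal sketch in base R. The data frame, its `title` column, and the records are made up for illustration; this is not the medrxivr implementation, only the general technique of filtering a local table with regular expressions and Boolean logic.

```r
# Toy stand-in for a local copy of the metadata
# (hypothetical column name and records, for illustration only).
records <- data.frame(
  title = c(
    "Dementia risk in older adults",
    "Mild cognitive impairment and dementia",
    "Airborne transmission of coronavirus"
  ),
  stringsAsFactors = FALSE
)

# Local search: each regular expression yields a logical vector...
has_dementia <- grepl("[dD]ementia", records$title)
has_mci <- grepl("[mM]ild cognitive impairment", records$title)

# ...and the vectors combine with Boolean operators:
# "dementia" AND NOT "mild cognitive impairment".
results <- records[has_dementia & !has_mci, , drop = FALSE]
print(results$title)
```

Because the whole table is held locally, the same expression gives the same result for as long as the local copy is unchanged.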

ml2: Overlap with biorxivr
Thanks for highlighting this. I did come across this package while developing medrxivr - however, the last work on this package took place 5 years ago, before the introduction of the API which you refer to in your query. In addition, while the base URL of the API contains "bioRxiv" (e.g. https://api.biorxiv.org/), this is only because the same organisation (Cold Spring Harbor) is responsible for both repositories. The actual endpoint for the medRxiv API is https://api.biorxiv.org/details/medrxiv/[interval]/[cursor]/[format]. Finally, as mentioned above, biorxivr relies on the search functionality offered by the site (e.g. by pasting the query on after the search/ path) rather than performing the searches itself.
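
To make the endpoint pattern above concrete, here is a small sketch that assembles such a URL in base R. The function name `build_details_url()`, its argument names and defaults, and the date-range form of the interval are assumptions for illustration; nothing here is taken from the package source, and no request is sent.

```r
# Assemble a URL following the pattern
# https://api.biorxiv.org/details/medrxiv/[interval]/[cursor]/[format]
# (hypothetical helper; the argument names and defaults are assumptions).
build_details_url <- function(interval, cursor = 0, fmt = "json") {
  paste(
    "https://api.biorxiv.org/details/medrxiv",
    interval,
    # Avoid scientific notation (e.g. "1e+05") for large cursor values.
    format(cursor, scientific = FALSE),
    fmt,
    sep = "/"
  )
}

# Assumed interval form: a start and end date separated by a slash.
url <- build_details_url("2020-06-01/2020-06-07")
print(url)
```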

ml3: Spelling
I've added this as an issue (ropensci/medrxivr#4) and plan to address it soon.

ml4/ml5: Test coverage
In hindsight, I should have highlighted this as a potential sticking point in my initial submission. The single file which is dragging down the average coverage contains mx_download(), which takes the dataframe of records identified by the user as relevant and downloads a PDF for each one. In fact, this function has a test suite (see https://github.com/mcguinlu/medrxivr/blob/master/tests/testthat/test-download.R), but because it manipulates files and folders on a user's machine, I have these tests set to skip_on_cran(), which in turn means that covr doesn't pick them up. This is something I was hoping to get feedback on during the review process, as I am not sure whether this is best practice or what a viable alternative would be.

ml6: styler
I've added this as an issue (ropensci/medrxivr#5) and plan to address it soon.

Hopefully this addresses your initial concerns, but please do let me know if anything is unclear, if my responses are insufficient, or if you need further details!

@maurolepore
Member

maurolepore commented Jun 6, 2020

Thanks @mcguinlu! I think {medrxivr} merits moving to the next stage. I'll now start searching for reviewers.

ml1 and ml2

Here is my conclusion. I base it on your answers above, and on this quote from rOpenSci's
guidelines on overlap:

"An R package that replicates the functionality of an existing R package
may be considered for inclusion in the rOpenSci suite if it significantly
improves on alternatives in any repository (RO, CRAN, BioC) by being ...
better in usability and performance".

I considered the packages {medrxivr}, {fulltext} and {biorxivr}. I see an
overlap in what the packages aim to do but not in how they do
it. This would not justify the overlap in general; but in this case I think it does.

Compared to the other packages, {medrxivr} searches locally. This ensures the
results can be reproduced; and enables searching with regular expressions. The
different approach to searching also means that to integrate {medrxivr} into
the other packages seems challenging. This might be eventually possible,
but first {medrxivr} may need to mature independently.

ml3 to ml6

@mcguinlu, please let me know or check the boxes as you address these issues.

ml7

@mcguinlu, I see the positive aspects of the "local" approach to searching that {medrxivr} implements; but I understand that {medrxivr} downloads the entire database. I worry this may not scale up. Here are some questions I have; you may discuss them directly with the reviewers:

  • How big is the database? How fast does it grow? And how long does it take to download it in a range of reasonable conditions? What happens in a range of extreme conditions?
  • Is the process transparent and "polite" to the user?

Maybe you can avoid downloading the database and still provide flexible queries. For example, see how pkgsearch::advanced_search() does it (apparently it uses elastic, which supports regular expressions, wildcards, fuzziness, and more). If you implement something similar you might get some guidance from @maelle; she is one of the two developers of pkgsearch::advanced_search().

@maurolepore
Member

maurolepore commented Jun 6, 2020

@mcguinlu, please do this (from these guidelines):

add a rOpenSci review badge to their README, via rodev::use_review_badge(), rodev::use_review_badge(<issue_number>). Badge URL is https://badges.ropensci.org/<issue_id>_status.svg. Full link should be:

[![](https://badges.ropensci.org/<issue_id>_status.svg)](https://github.com/ropensci/software-review/issues/<issue_id>)

@maurolepore
Member

maurolepore commented Jun 6, 2020

(ml8) @mcguinlu have you suggested any reviewer?

(The editor guidelines suggest you might but I fail to find them here or at #369.)

@maurolepore
Member

maurolepore commented Jun 10, 2020

Thanks @tts and @njahn82 for agreeing to review this package. Your reviews are due on 2020-07-01 -- that is three weeks from today.

Let me know any questions you have.

@tts

tts commented Jun 28, 2020

Package Review

Hi @maurolepore and @mcguinlu - here is my review. Thanks for this opportunity, and all the best for the package!

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

medRxiv has been accepting preprints for a year now. Their API does not offer any search capabilities, so clearly medrxivr has a function to fill. Regular expressions and Boolean logic are state-of-the-art ways to fine-tune search queries, so users of this package should be happy. In addition, you can download all searched and found preprints as PDF files, which is handy and helpful.

Although the target group and the goal of the package are clearly defined, it took me some time to understand the core functionality. I suppose the main reason for this is the varying terminology of data sources used in vignettes and help pages. The way I understand the logic looks like this:

[Figure: medrxiv, a sketch of the package's data sources and search logic]

In short, for a search target there are two options, the dataset I download myself from medRxiv, or the dataset provided by the GitHub repo. The former can be either all items or just a subset limited by date. The latter is all items. Technically speaking, my download uses the medRxiv API, but the dataset in the repo is built by scraping the medRxiv web site on a daily basis. My understanding is that the main reasons for the scraped dataset are to provide a reliable data source for those occasions when the API does not serve well or not at all, and lighten the burden of the API usage.

How long does it take to download all metadata from the API? I tested it from two physical locations with a differing bandwidth:

start_time <- Sys.time()
medrxiv_data <- mx_api_content()
end_time <- Sys.time()
end_time - start_time
Time difference of 1.434701 mins (1Gb/s line, work)
Time difference of 3.045589 mins (28Mb/s line, home)

So far this is not bad, especially if you run the function once a day. One minor thing: is there any way to gracefully stop the process if started by accident? When RStudio's red Stop button is hit, the following error is thrown:

Error in curl::curl_fetch_memory(url, handle = handle): Operation was aborted by an application callback
Request failed [ERROR]. Retrying in 1 seconds...

httr::RETRY is a new function to me. Thanks for this, I will definitely try to use it myself at some point. I wonder though if it allows a clean, user-friendly, forced exit and if yes, how should it be defined?

How rapidly can we expect medRxiv to grow? Looking back, the number of submissions accelerated when the still very much prevailing COVID-19 pandemic began.

library(tidyverse)
library(medrxivr)

mx_data <- mx_api_content()

stats <- mx_data %>% 
  mutate(date = as.Date(date)) %>% 
  group_by(date) %>% 
  summarise(count = n())

png(filename="medrxivstats.png", 
    units="cm", 
    width=20, 
    height=20,
    pointsize=12, 
    res=72)

qplot(x=date, y=count,
      data=stats, na.rm=TRUE,
      main="medRxiv item growth",
      xlab="Date of submission", ylab="Number of submissions")

dev.off()

[Figure: medrxivstats, the plot produced by the code above]

Search is a key component of this package, and vignettes help in building search queries. The medrxivr one shows how to use the mx_search function: either within a two-step process, or with a one-step or piped process. The examples are a little confusing though because the functions shown are not the same; the first example uses mx_api_content, the second one mx_api, which does not exist. I suppose mx_api is a typo, maybe the name from a former version? The vignette building-complex-search-strategies shows several strategies to filter data, and also how to use regular expressions. Very helpful. One minor thing about this example:

mx_results <- mx_search(query = "dementia",
                        NOT = "mild cognitive impairment")

The NOT argument does not match Mild cognitive impairment, which is found in one abstract, so it is perhaps better to use the form [mM]ild cognitive impairment instead.
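
The capitalisation point can be seen with plain base R regular expressions. This is only a sketch using grepl() on two made-up strings, not a call to mx_search():

```r
# Two made-up abstract snippets that differ only in capitalisation.
abstracts <- c(
  "Mild cognitive impairment was common in this cohort.",
  "Patients with mild cognitive impairment were excluded."
)

# A lower-case pattern misses the capitalised variant...
lower_only <- grepl("mild cognitive impairment", abstracts)

# ...while a character class for the first letter matches both.
either_case <- grepl("[mM]ild cognitive impairment", abstracts)

print(lower_only)
print(either_case)
```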

In mx_search, the data argument is important because it defines the target. Again, the example in the help file is slightly misleading because there is no mx_raw function. A holdover from a former version too, I presume?

When I ran mx_search with zero arguments, my first thought was that there are some issues with error handling. The query starts but clearly you need to include the search string too! However, after some time the error handling kicks in and correctly reminds me of the missing query argument. If I am not mistaken, the delay was caused by the latency of the default data source in the GitHub repository.
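
One common way to avoid this kind of delay is to check cheap preconditions before any slow work starts. Below is a minimal sketch of that pattern; `search_records()`, `load_snapshot`, and the toy data are hypothetical names for illustration and are not how mx_search() is written.

```r
# Hypothetical search function: `load_snapshot` stands in for the slow
# step of fetching the data source.
search_records <- function(query = NULL, load_snapshot) {
  # Fail fast: validate the cheap precondition before any slow work.
  if (is.null(query)) {
    stop("Please specify a search query.", call. = FALSE)
  }
  data <- load_snapshot()
  data[grepl(query, data$title), , drop = FALSE]
}

# A fake loader that records whether it was ever called.
loader_called <- FALSE
slow_loader <- function() {
  loader_called <<- TRUE
  data.frame(
    title = c("molecular epidemiology", "dementia"),
    stringsAsFactors = FALSE
  )
}

# With no query, the error is raised before the loader runs.
msg <- tryCatch(
  search_records(load_snapshot = slow_loader),
  error = function(e) conditionMessage(e)
)
print(msg)
print(loader_called)
```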

As of writing this, how long does it take to query the repo?

start_time <- Sys.time()
mx_results <- mx_search(query = "molecular")
end_time <- Sys.time()
(end_time - start_time)
Using medRxiv snapshot - 2020-06-27 06:01
Found 226 record(s) matching your search.
Time difference of 20.75107 secs

To me this is acceptable, but people of today tend to be impatient. Still, when the same search against my local copy of the medRxiv database takes only 0.5 secs, you begin to wonder which one to use. I noticed that the question of how to efficiently host and serve a dataset is something you and the editor have already discussed. Unfortunately, I cannot give any advice, but am very much interested to learn about this topic too. I hope you will find a good solution.

Downloading PDFs works smoothly and as promised. Note: the mx_download help file example of mx_search uses a limit argument which is not defined.

The Shiny application that comes with the package is a beautiful piece of work, and the idea of delivering reproducible code is a nice one indeed. However, there are some issues with the code. Both the basic and advanced search codes throw an error when run in R.

Basic:

query <- "coronavirus"
mx_results <- mx_search(query) 

Error: $ operator is invalid for atomic vectors

Advanced:

topic1 <- c("coronavirus")
topic2 <- c("airborne")
query <- list(topic1, topic2)
mx_results <- mx_search(query, from.date =20190101, to.date =20200628, NOT = c(""), deduplicate = TRUE) 

Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "list"

I was informed by @maurolepore that the package also includes a short manuscript to be submitted to the Journal of Open Source Software. I found the manuscript in the inst directory, read it, and found it to be both clear and concise. Good luck!

@mcguinlu
Member Author

mcguinlu commented Jun 29, 2020

Hi @tts,

Just a short note to say thanks so much for your review. I've given it a quick skim, and it seems that everything you propose will be straightforward to implement. I'll go through your comments systematically soon, and post a response/list of changes. (@maurolepore, a process question - is it better for me to wait until the second reviewer has filed their review before beginning to make changes?)

Thanks in particular for spotting the discrepancies across the package (old function names in the examples, missing definitions for arguments, problems with the code from the app). You are correct that there is some hangover from an earlier version of the package/early versions of the package functions - I thought I had caught them all, but obviously not! When I started developing medrxivr, the medRxiv API didn't exist, meaning the data argument of mx_search() was not required. To note, this is also what's causing the reproducible code from the Shiny app to fail, as under the new version of the function, mx_search(query) is read as mx_search(data = query).
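
The positional-matching issue described above can be reproduced with a toy function. `toy_search()` and `toy_data` are hypothetical; the sketch only shows how R assigns an unnamed first argument to the first formal parameter.

```r
# Hypothetical signature mirroring the change described above:
# `data` now comes first, `query` second.
toy_search <- function(data, query) {
  data[grepl(query, data$title), , drop = FALSE]
}

toy_data <- data.frame(
  title = c("coronavirus outbreak", "dementia"),
  stringsAsFactors = FALSE
)

# Unnamed call: "coronavirus" is matched to `data`, so the call fails.
positional <- tryCatch(
  toy_search("coronavirus"),
  error = function(e) "error"
)

# Naming the arguments removes the ambiguity.
named <- toy_search(data = toy_data, query = "coronavirus")

print(positional)
print(named$title)
```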

One specific thing I wanted to follow up on was that the "Automated testing" item in the reviewer checklist is not marked as complete - did you have any specific issues with/recommendations for this area of the package?

@maurolepore
Member

maurolepore commented Jun 29, 2020

@tts, thanks for your wonderful review!

@mcguinlu, RE

"Is it better for me to wait until the second reviewer has filed their review before beginning to make changes?"

Both reviewers should work on the exact same package. You may change the package in a separate branch, but please only merge it after both reviewers submitted their review.

@tts

tts commented Jun 29, 2020

One specific thing I wanted to follow up on was that the "Automated testing" item in the reviewer checklist is not marked as complete - did you have any specific issues with/recommendations for this area of the package?

@mcguinlu Sorry, my bad. Both devtools::check() and devtools::test() ran without errors. Checked that item in the list.

@maurolepore
Member

maurolepore commented Jul 2, 2020

@njahn82, I hope you are well. Could you please update us about your review?

@njahn82
Member

njahn82 commented Jul 6, 2020

@njahn82
Member

njahn82 commented Jul 8, 2020

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5 hours

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

This is a very timely package that reflects not just the increasing popularity of open access preprints in Health Sciences, but also the issues around finding and searching them. Although a growing suite of scholarly search engines make medRxiv preprints available, there seems to be no standard way to retrieve data from medRxiv thoroughly and systematically. Finding full-texts is also challenging, because medRxiv preprints are not made available via PubMed Central. Similarly, metadata from Crossref, medRxiv's DOI registration agency, lacks links to pdf full-texts.

Before I share my code review, I want to disclose that I neither have an academic background in Health Sciences nor have I been involved in systematic reviews as a librarian. I will therefore focus on more formal aspects of the package and its design.

Overall Design

The package contains functions to retrieve metadata from medRxiv, apply complex search strategies on a metadata snapshot, and download pdf full-texts. However, the source code repository contains a considerable amount of other functionality as well, which is outside of the R directory and excluded from the package build in .Rbuildignore:

  • app comprises a nice-looking and useful Shiny app helping users to build queries and visually explore the results.
  • data-extraction has scripts and functions for fetching and validating data.

There's also a link to (daily updated) data in an external GitHub repo, https://github.com/mcguinlu/medrxivr-data/, which is used in an exported R function.

My main concern with this approach is that dependencies, which are not part of the package, are loaded, and in one case installed. The code outside of the R folder also lacks documentation using roxygen tags and tests, and there's some redundancy. I feel that R code that is not part of the {medrxivr} package build needs to be either factored out or moved into the R/ directory.

In the following, I will focus on the functionality, which is part of the package build.

README

  • The README is very helpful to get started with the package. A brief description of what medRxiv is and a link to the preprint server would make the README more informative.
  • Maybe the distinction between downloading a snapshot and searching the remote snapshot could be made a bit clearer. I first started to download the whole corpus, and then realised that there's already a snapshot that I can use instead.
  • I love @tts's sketch of the overall design. Maybe it can be adapted and re-used?

Documentation

  • In the beginning, I followed the docs on https://mcguinlu.github.io/medrxivr/index.html and faced several issues. It took me a while to realise that the pkgdown website is outdated, because it was not rebuilt after code and documentation updates.
  • Documentation of functions could be expanded, particularly roxygen tags @import and @importFrom do not cover all external functions used.
  • Documentation of mx_search() refers to a function called mx_raw(), which is not part of the package.
  • Preprints are quite a new scholarly communication phenomenon in Health Science and not all health scientists publish preprints regularly. Moreover, some other preprint servers target health science publications. Therefore, I think it would be good to warn users that medRxiv only contains part of the Health Science (preprint) literature.

Vignette

  • There are three vignettes, which is great. Again, the general overview misses a sentence about what the preprint server medRxiv is about.
  • Not all code chunks are rendered. Some are introduced with a blank between the ticks and {r}. Is this intentional?

Functionality

  • There is a considerable duplication of code regarding the API call, which can make it hard to update the package in case of API changes. It would be good to have a single function for the API call.
  • URL paths are constructed using paste. httr::modify_url() and the query argument of httr::GET() allow passing arguments to an API. Furthermore, {httr} provides helpful functionality to capture API errors more systematically than in the current implementation.
  • mx_crosscheck() does web scraping, which is fine according to the robots.txt. However, the requested crawl delay of 7 sec has not been implemented yet.

Here's the check using {polite}:

polite::bow("https://www.medrxiv.org/archive", force = TRUE)
#> <polite session> https://www.medrxiv.org/archive
#>     User-agent: polite R package - https://github.com/dmi3kno/polite
#>     robots.txt: 68 rules are defined for 1 bots
#>    Crawl delay: 7 sec
#>   The path is scrapable for this user-agent

Created on 2020-07-08 by the reprex package (v0.3.0)
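
As a rough illustration of the points above, here is a minimal sketch of what a single request helper could look like. The function names mx_api_url() and mx_api_get(), the delay argument, and the path layout (details/server/from/to/cursor) are assumptions made for this example and are not taken from the package.

```r
# Sketch only: one place to build the URL and one place to send the request.
# mx_api_url(), mx_api_get() and the path layout are assumptions.

mx_api_url <- function(from_date, to_date, cursor = 0, server = "medrxiv") {
  # modify_url() assembles the URL from its parts
  httr::modify_url(
    "https://api.biorxiv.org",
    path = paste("details", server, from_date, to_date, cursor, sep = "/")
  )
}

mx_api_get <- function(url, delay = 0) {
  # Wait before each request, e.g. delay = 7 to respect a 7 second crawl delay
  Sys.sleep(delay)
  response <- httr::GET(url)
  # Turn HTTP errors (4xx/5xx) into R errors with a readable message
  httr::stop_for_status(response, task = "query the medRxiv API")
  httr::content(response, as = "text", encoding = "UTF-8")
}

# Building a URL does not touch the network
if (requireNamespace("httr", quietly = TRUE)) {
  print(mx_api_url("2020-01-01", "2020-01-31"))
}
```

With this shape, a change to the API would only need to be handled in one function, and every caller would get the same error handling.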

  • mx_search() returns a grouped tibble. Personally, I prefer to have an ungrouped tibble. The column date is of type double, not date.
  • Because of the downloading time, it is good to have feedback about the progress. Maybe re-using progress bar functionality, such as that from {progress}, could lead to less code while expanding the current feedback mechanism.
  • mx_search(): the rOpenSci style guide recommends snake case for params (from.date and to.date).
  • Finally, I wonder if Europe PMC could be of use for searching medRxiv. The Europe PMC search syntax is quite extensive and supports Boolean operators, wildcards and controlled vocabularies. What are the reasons for not using it to search medRxiv? Is it an indexing lag, or a lack of metadata?

Here's a reprex using the vignette example, which took less than 2 seconds.

library(tidyverse)
library(europepmc)
ep_q <-
  c('PUBLISHER:"medRxiv" AND (mendelian* AND (randomisation OR randomization))')
epmc_l <- europepmc::epmc_search(ep_q, "raw", limit = 10000)
#> 91 records found, returning 91

my_df <-
  purrr::map_dfr(epmc_l, `[`, c("doi", "title", "abstractText"))
my_df %>%
  filter_at(vars(abstractText, title), any_vars(
    grepl(
      "[Mm]endelian(\\s)([[:graph:]]+\\s){0,4}randomi([[:alpha:]])ation",
      .
    )))
#> # A tibble: 81 x 3
#>    doi           title                         abstractText                     
#>    <chr>         <chr>                         <chr>                            
#>  1 10.1101/2020… Cardiometabolic traits, seps… Objectives: To investigate wheth…
#>  2 10.1101/2020… The relationship between gly… Aims: To investigate the relatio…
#>  3 10.1101/2020… Modifiable lifestyle factors… Aims: Assessing whether modifiab…
#>  4 10.1101/2020… Influence of blood pressure … Objectives: To determine whether…
#>  5 10.1101/2020… Increased adiposity is prote… Background Breast and prostate c…
#>  6 10.1101/2020… Examining the association be… Background: We examined associat…
#>  7 10.1101/2020… Investigating the potential … Aim: Use Mendelian randomisation…
#>  8 10.1101/2020… Unhealthy Behaviours and Par… Objective: Tobacco smoking, alco…
#>  9 10.1101/2020… Exploring the causal effect … BACKGROUND: Hearing loss has bee…
#> 10 10.1101/2020… Genetically informed precisi… Impaired lung function is associ…
#> # … with 71 more rows

Created on 2020-07-08 by the reprex package (v0.3.0)

(Disclaimer: I maintain the {europepmc} package and I am curious to learn more about potential shortcomings of using Europe PMC instead of a primary literature source. Because I also find it sometimes not very helpful when reviewers point to their own work, I do not expect you to consider this :-))
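
To make the earlier points about the date column and snake case parameters concrete, here is a small sketch on toy data. It assumes the double is a yyyymmdd number; the data, the column names and the filter_by_date() helper are made up for illustration and are not part of the package.

```r
# Toy stand-in for search results. The yyyymmdd layout of `date` is an
# assumption, and the DOIs are placeholders.
results <- data.frame(
  doi = c("10.1101/example.1", "10.1101/example.2", "10.1101/example.3"),
  date = c(20200115, 20200625, 20200708),
  stringsAsFactors = FALSE
)

# Convert the numeric column to a proper Date
results$date <- as.Date(sprintf("%.0f", results$date), format = "%Y%m%d")

# Hypothetical filter with snake_case arguments (from_date, to_date)
filter_by_date <- function(data, from_date = NULL, to_date = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(from_date)) keep <- keep & data$date >= as.Date(from_date)
  if (!is.null(to_date)) keep <- keep & data$date <= as.Date(to_date)
  data[keep, , drop = FALSE]
}

recent <- filter_by_date(results, from_date = "2020-06-01")
print(recent)
```

Once the column is a Date, comparisons and date arithmetic work without further conversion.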

Testing

  • All tests passed, but it took a while. My duration was 1221.5 sec. However, I was connected to the internet via a cell phone connection during the review of the package.
  • I realised that a lot of skipping for CI platforms happens and I wonder why? Is it the run-time?

I think that's it from me! Thank you for making Health Science preprints more accessible and better discoverable! Happy to help further with the process!

maurolepore commented Jul 8, 2020

Thanks @njahn82 for your review!

@mcguinlu, please aim to address the comments of both @tts and @njahn82 within the next 2 weeks.

njahn82 commented Jul 9, 2020

Hi @mcguinlu. Sorry, while still playing with your app, I just realised that I was wrong and nothing is installed from the Shiny app. Please ignore this bit from the review.

mcguinlu commented Jul 9, 2020

@njahn82 Thanks a million for your detailed review! At a quick skim, everything you flag/recommend is fixable/implementable, and will definitely help to improve the functionality. I'm also looking forward to examining europepmc further - to be honest, I was not aware that medRxiv preprints were indexed in Europe PMC.

@maurolepore Just confirming that I have seen this, and so am aiming to address the comments by 23rd July (at the latest).

mcguinlu commented Jul 27, 2020

Hi all (esp @maurolepore)

A brief message to let you know that I have made most of the requested changes, but due to external circumstances, I haven't yet finished off the small number of outstanding items. I'm now aiming to have it ready for re-review by Thursday week (6th August) at the very latest.

Very sorry for the delay, and hope this is okay!

maurolepore commented Jul 27, 2020

That's okay. Thanks for letting me know.

tts commented Aug 10, 2020

Hi @mcguinlu and thanks for your efforts! Below, I'll use your headings, and give my remarks to each of them.

General comments.
In addition to adding your suggested diagram, I have tried to make the language used across the documentation more consistent, but please do point out anything that could be clearer!

Great, much better now. I cannot find anything more to complain about :)

API.
having to hit "Esc"/click "Stop" multiple times in order to actually get mx_api_content() to stop. Is this right?

Yes, that's what I mean, and I understand your explanation. Several clicks do terminate the download, so I find this sufficient.

Snapshot.
Trying mx_search(query="molecular")

Time difference of 6.319079 secs for me, which is not bad.

Vignette/README

Ok now.

Download

Ok.

Shiny app
mx_search()

Reproducible code works now fine, and a missing data | query argument is caught right away. Good!

One new comment

mx_info(commit = "master")
Error in mx_info(commit = "master") : could not find function "mx_info"

Except this comment, I give my approval.

mcguinlu commented Aug 14, 2020

Thanks for the further feedback both (and Happy Friday)! Please find my responses to your comments below:

Editor (@maurolepore)

  • ml3: spelling::spell_check_package() still shows some unknown words. Please update your words list and consider automating the process with usethis::use_spell_check().
    I have automated the spellcheck now as recommended.

  • ml4: goodpractice::gp() still suggests some improvements.
    I have addressed all issues raised, and goodpractice::gp() now does not recommend any further improvements.

  • ml5: covr::package_coverage() shows greater coverage than before; thanks. The only file that's still a little low is R/mx_crosscheck.R. Please consider adding more tests or excluding code as necessary (https://github.com/r-lib/covr#exclusions).
    I have added more tests to increase the coverage, and where it is not possible to test the error handling behaviour (e.g. because it's not possible to simulate the user not having an internet connection or the API returning a specific message), I have excluded lines as needed. The skipped lines are all marked with a #nocov comment, so can be readily found for inspection. I've included the output of my local run of covr::package_coverage() below:

medrxivr Coverage: 100.00%
R/helpers.R: 100.00%
R/mx_api.R: 100.00%
R/mx_crosscheck.R: 100.00%
R/mx_download.R: 100.00%
R/mx_export.R: 100.00%
R/mx_info.R: 100.00%
R/mx_search.R: 100.00%
R/mx_snapshot.R: 100.00%
  • ml6: usethis::use_tidy_style() suggests some files could improve. Please run usethis::use_tidy_style() and consider committing the changes.
    I have run this and committed the changes.

  • ml7: On the website, the Reference tab shows "All functions". Maybe you can help users navigate this reference by grouping functions in some meaningful way? (see https://pkgdown.r-lib.org/reference/build_reference.html).
    I had added keywords to the functions already, but hadn't realised that you needed to alter the _pkgdown.yml file in order to group the functions. This has now been implemented, and functions are grouped into three categories: "Accessing medRxiv/bioRxiv data", "Performing the search", and "Helper functions".

  • ml8: You may want to consider setting up CI services for a wider range of environments. Here are two workflows you may use -- standard, and full.
    Thanks for the recommendation - I have gone with the standard workflow, and R CMD check passes in all environments.

  • ml9: I see three .Rmd files inside vignettes/ but only two in the Articles section of the website. Is this intentional? Also, vignettes are great, but they can make the installation heavier. Consider the difference between use_vignette() and use_article().
    Yes, this is intentional. When you include a .Rmd file with the same name as the package in the vignette/ folder, pkgdown treats this as a special type of vignette ("Get Started"). From the pkgdown website:

    A vignette with the same name as the package (e.g., vignettes/pkgdown.Rmd or vignettes/articles/pkgdown.Rmd) automatically becomes a top-level "Get started" link, and will not appear in the articles drop-down. (If your package name includes a ., e.g. pack.down, use a - in the vignette name, e.g. pack-down.Rmd.)

    I have also taken your advice and converted the two vignettes covering advanced topics to articles, and signposted to them in the final introductory vignette.

  • ml10: I recommend walking through the steps listed by use_release_issue() or devtools::release(). Even if you don't submit to CRAN, walking through the process can help you find details to improve.
    As a result of this process, the following changes were made:

    • xml2 was removed from the DESCRIPTION as it is no longer needed now that the package does not perform any web scraping.
    • Titles of some of the functions were edited to be more comprehensive, so that the pkgdown function list is more useful.
    • README.html was removed from the top level directory.
  • ml11: The vignettes show code but not output. Reproducible examples are most useful when they include the output because readers can understand what the code does even if they choose not to run the code themselves. This is why reprex::reprex() prints output (https://reprex.tidyverse.org/).
    Thanks for this feedback. I have decided not to produce output for the one remaining vignette, as the example code in this vignette calls the API via mx_api_content(). I am worried that enabling evaluation of the code in this vignette would mean that it would take a long time to render and make installing the package slow. However, for the two new articles (converted from vignettes as per ml9, and included only in the pkgdown website), the output is now shown.
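
For readers unfamiliar with the coverage exclusions mentioned under ml5, the following is a minimal sketch of how covr's exclusion comments are typically placed. The functions fetch_snapshot() and warn_if_slow() and the has_connection argument are hypothetical and only serve to show where the markers go.

```r
# Hypothetical helper; `has_connection` stands in for a real connectivity check.
fetch_snapshot <- function(has_connection = TRUE) {
  if (!has_connection) {
    # Everything between the start and end markers is ignored by covr
    # nocov start
    stop("No internet connection detected.", call. = FALSE)
    # nocov end
  }
  "snapshot downloaded"
}

# A single line can be excluded with a trailing marker instead
warn_if_slow <- function(seconds) {
  if (seconds > 60) message("This is taking a while.") # nocov
  invisible(seconds)
}

fetch_snapshot()
```

Because the markers are ordinary comments, searching the source for "nocov" lists every excluded line.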

Reviewer 1 (@tts)

Glad to hear things are a bit clearer now!

The reason mx_info() is not found is that it is an internal function (medrxivr:::mx_info()) and should not have been available in the function list on the pkgdown website. I had marked several internal functions with the "Internal" keyword, which should have hidden them, but it seems that pkgdown is case sensitive and the correct keyword is "internal". This has been corrected now and the internal functions no longer appear in the website's function list.
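
As a short sketch of the tag in question, this is how a lowercase internal keyword usually looks in a roxygen block. The function snapshot_label() is a made-up stand-in, not the package's mx_info().

```r
#' Describe which snapshot is in use (hypothetical internal helper)
#'
#' Documented for developers, but hidden from the pkgdown reference index
#' because of the lowercase keyword below.
#'
#' @param commit Commit or branch the snapshot was taken from.
#' @return A character string.
#' @keywords internal
snapshot_label <- function(commit = "master") {
  paste("Using snapshot from commit:", commit)
}

snapshot_label()
```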

Finally, just wanted to confirm that your details in the DESCRIPTION are correct?

tts commented Aug 17, 2020

@mcguinlu Yes, my details in DESCRIPTION are correct.

maurolepore commented Aug 19, 2020

@mcguinlu, just to let you know that I believe @njahn82 will respond to your changes next week.

mcguinlu commented Aug 19, 2020

Great - thanks for letting me know!

njahn82 commented Aug 23, 2020

Great job @mcguinlu, and thank you for the careful and thorough consideration of my review. I feel it is clearer now what the package does and how it relates to the Shiny app and the backup/dump mechanism.

Thank you also for cross-checking with Europe PMC and demonstrating the added value of the medrxivr package.

Although all my suggestions have been addressed, I have some final suggestions:

  • I wonder if the returned data frames from the mx_api_* family could be also represented as tibbles?

  • The package does a good job in parsing and cleaning preprint metadata. Unfortunately, I cannot find documentation or an example showcasing what is actually returned. Can you provide one reproducible example in the README and/or extend the documentation in the function docs?

  • In the function docs of mx_export(), it says Dataframe returned by mx_search(), but I realised that also data obtained from the mx_api_ family can be exported as bib file using mx_export().
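
To illustrate the last point, here is a rough sketch showing that any data frame with suitable columns can be written out as BibTeX, whichever function produced it. The to_bibtex() helper and the column names (key, title, author, year, doi) are assumptions for this example and do not reflect how mx_export() is implemented.

```r
# Hypothetical helper, not the mx_export() implementation.
# Column names are assumptions made for illustration.
to_bibtex <- function(records) {
  entries <- sprintf(
    "@article{%s,\n  title = {%s},\n  author = {%s},\n  year = {%s},\n  doi = {%s}\n}",
    records$key, records$title, records$author, records$year, records$doi
  )
  # One string with a blank line between entries
  paste(entries, collapse = "\n\n")
}

# Made-up records standing in for search or API output
records <- data.frame(
  key = c("rec1", "rec2"),
  title = c("A first made-up preprint", "A second made-up preprint"),
  author = c("Doe, Jane", "Roe, Richard"),
  year = c(2020, 2020),
  doi = c("10.1101/example.1", "10.1101/example.2"),
  stringsAsFactors = FALSE
)

bib <- to_bibtex(records)
cat(bib, "\n")
# writeLines(bib, "records.bib") would save the entries to a file
```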

mcguinlu commented Aug 24, 2020

Thanks @njahn82. Just to note as well that I recently moved the snapshot functionality from relying on my local Task Scheduler to working from GitHub Actions, so it should now be a lot more robust (in the past, if my local PC experienced network issues, the snapshot would not be taken).

In response to your comments:

I wonder if the returned data frames from the mx_api_*() family could be also represented as tibbles?

  • The package now returns tibbles across the board. I had never really understood the difference, but after a bit of research, I do prefer the printing defaults for tibble objects.

The package does a good job in parsing and cleaning preprint metadata. Unfortunately, I cannot find documentation or an example showcasing what is actually returned. Can you provide one reproducible example in the README and/or extend the documentation in the function docs?

  • Hoping I understood this ask correctly, there is now a section in the README that describes how to access the raw, uncleaned API data using the mx_api_*() functions, which also points to a section in the API article on the pkgdown website that provides more detail and an example of the uncleaned output. In addition, a clearer description of what the cleaning process entails has been included in the documentation of the mx_api_*() functions (e.g. here)

In the function docs of mx_export(), it says Dataframe returned by mx_search(), but I realised that also data obtained from the mx_api_ family can be exported as bib file using mx_export().

@maurolepore, I have checked that these changes don't throw any new errors and that goodpractice doesn't recommend any changes. I've also re-run styler/spelling functions and committed any modifications.

Hoping we are nearly there!

maurolepore commented Aug 28, 2020

Thanks @njahn82 and @mcguinlu,

@njahn82, once again, please consider @mcguinlu 's changes and respond with either your approval or further suggestions for improvement.

njahn82 commented Sep 2, 2020

Thank you again @mcguinlu for your careful consideration of my review! All my suggestions have been addressed.

maurolepore commented Sep 5, 2020

Approved! Thanks @mcguinlu for submitting and @tts and @njahn82 for your reviews! 😄

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. Soon you will be invited to a team that should allow you to do so. You'll be made admin once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

From #380 (comment) I see you wish to automatically submit to the Journal of Open Source Software? If so:

  • Activate Zenodo watching the repo
  • Tag and create a release so as to create a Zenodo version and DOI
  • Submit to JOSS at https://joss.theoj.org/papers/new, using the rOpenSci GitHub repo URL. When a JOSS "PRE REVIEW" issue is generated for your paper, add the comment: This package has been reviewed by rOpenSci: https://LINK.TO/THE/REVIEW/ISSUE, @ropensci/editors

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.
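
As a minimal sketch of what such entries could look like, the snippet below builds two "rev" contributors with utils::person(). The names are placeholders and the comment text is only an example; real names should be added only with the reviewers' consent.

```r
# Placeholder names; replace with the reviewers' details once they consent.
reviewers <- c(
  person("First", "Reviewer",
    role = "rev",
    comment = "Reviewed the package for rOpenSci"
  ),
  person("Second", "Reviewer",
    role = "rev",
    comment = "Reviewed the package for rOpenSci"
  )
)

# In DESCRIPTION these person() calls would sit inside Authors@R: c(...)
# alongside the existing "aut" and "cre" entries.
print(reviewers)
```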

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practices and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

annakrystalli commented Sep 8, 2020

Hello @mcguinlu! I've just invited you to the @ropensci/medrxivr team! You should now be allowed to transfer the repo. Once you do, just ping me here and I'll transfer full admin rights back to you 🙂👍

mcguinlu commented Sep 8, 2020

Hi @annakrystalli have transferred across now. @maurolepore thanks for the checklist - I will work through it over the coming day. And finally, just flagging to @stefaniebutland that I would be interested in producing a blog post for this package!

Thanks again to @tts and @njahn82 for reviewing, and @maurolepore for herding us all through the process!

annakrystalli commented Sep 8, 2020

Thanks @mcguinlu ! Full admin rights now returned 👍

danielskatz commented Sep 8, 2020

Has this review been completed? (I'm asking as the editor of the corresponding JOSS submission)

mcguinlu commented Sep 8, 2020

Okay, I've completed all the steps now @maurolepore! Re: the JOSS review, please see @danielskatz's comment above.

The one thing I wasn't clear on was how to replace the old pkgdown website with a redirecting page - seeing as the repo has been transferred across, the old pkgdown website on GitHub Pages (https://mcguinlu.github.io/medrxivr/index.html) no longer exists (I think?!), so I wasn't clear on how to set the redirect.

maurolepore commented Sep 8, 2020

@danielskatz, thanks for checking. Yes, as the guest editor of this submission, I confirm this review has been completed.

maurolepore commented Sep 8, 2020

@mcguinlu,

RE:

The one thing I wasn't clear on was how to replace the old pkgdown website with a redirecting page - seeing as the repo has been transferred across, the old pkgdown website on GitHub Pages (https://mcguinlu.github.io/medrxivr/index.html) no longer exists (I think?!), so I wasn't clear on how to set the redirect.

I'm sorry this isn't clear for you or me. But as you say, the working website seems correct. I see no reason to worry.

Here are a few more comments from section 8.1.4 of https://devguide.ropensci.org/:

  • If you intend to submit to CRAN, see CRAN gotchas. I'm happy to provide support through this process. Let me know.

Please check these boxes to confirm you've done the following last steps:

  • Add a CodeMeta file by running codemetar::write_codemeta() (codemetar GitHub repo)

  • Change any needed links, such as those for CI badges

  • Re-activate CI services

    • For Travis, activating the project in the ropensci account should be sufficient
    • For AppVeyor, tell the author to update the GitHub link in their badge, but do not transfer the project: AppVeyor projects should remain under the authors’ account. The badge is AppVeyor Build Status.
    • For Codecov, the webhook may need to be reset by the author.
  • If authors maintain a gitbook that is at least partly about their package, contact an rOpenSci staff member so they might contact the authors about transfer to the ropensci-books GitHub organisation.

  • Add a “peer-reviewed” topic to the repo (it seems I'm the one supposed to do this but I apparently lack the privileges to access the "topics" settings -- see if you can or let me know).

--

Ping me when this is done and I'll then close this issue.

Thanks!

maurolepore commented Sep 8, 2020

@mcguinlu, I see you already mentioned Stefanie Butland above. To comply with https://devguide.ropensci.org/editorguide.html#after-review, I also mention @ropensci/blog-editors for follow-up about your willingness to write a blog post or tech note.

Finally, please see https://devguide.ropensci.org/editorguide.html#package-promotion

mcguinlu commented Sep 9, 2020

So in response to the last few bits:

[x] Add a CodeMeta file by running codemetar::write_codemeta() (codemetar GitHub repo)

CodeMeta file added (see here)

[x] Change any needed links, such as those for CI badges

All CI badges updated to point to the ropensci endpoints (e.g. see here)

[x] Re-activate CI services

Done, and have triggered a build under the new set-up to ensure everything works, which was successful.

[ ] If authors maintain a gitbook that is at least partly about their package, contact an rOpenSci staff member so they might contact the authors about transfer to the ropensci-books GitHub organisation.

Not applicable to me.

[x] Add a “peer-reviewed” topic to the repo (it seems I'm the one supposed to do this but I apparently lack the privileges to access the "topics" settings -- see if you can or let me know).

Done!

Thanks also for the additional materials re: CRAN submission (I do intend to submit to CRAN in the near future) and promotion, and for looping in @ropensci/blog-editors.

And I think that's us!

maurolepore commented Sep 9, 2020

@mcguinlu , thanks and congratulations! To the best of my knowledge, this completes the review process so I'll close now.

--

You may already know this. To prepare packages for CRAN, usethis::use_release_issue() is useful. And here are some aspects of the workflow that might help (including a link to some tweaks). Feel free to reach out with questions.

Are you in rOpenSci's Slack workspace? If not, I recommend you find someone who can add you. I have found friendly advice there that I wouldn't find anywhere else.

stefaniebutland commented Sep 9, 2020

Hello @mcguinlu. We'd love to have a post about medrxivr.

Our Blog Guide has most of the information you should need, with both content and technical advice. For readers, it would be helpful to highlight how this package relates to similar ones and the specific niche that medrxivr fills. Once that's clear early in the post, your readers will give their attention.

Let me know when you'd like to submit a draft and I can suggest a publication date.

stefaniebutland commented Sep 9, 2020

@mcguinlu Also let me know if you'd like a new invitation to rOpenSci Slack. We could move this discussion there for example.

mcguinlu commented Sep 10, 2020

@stefaniebutland a new invite would be great! I thought I had activated the first one correctly but apparently not (I am still getting to grips with Slack) 🤦‍♂️ and happy to continue chatting about this there.
