Developer Guideline

Matthijs Berends edited this page Jun 23, 2024 · 37 revisions

AMR package Developer Guideline

Welcome to the Developer Guideline of the AMR R package. This guideline explains the repository workflows and how to update the elements of the package.

Introduction

Copyright

To start, it is important to know that this R package and all of its components are free, open-source software and licensed under the GNU General Public License (GPL) v2.0. Open-source software does not mean that there are no legal constraints. There are actually some profound ones, since GPL-2.0 in a nutshell means that this package:

  • May be used for commercial and private purposes, but may not be used for patent purposes

  • May be modified, although (1) modifications must also be released under the GPL-2.0 license when distributing the package, and (2) changes made to the code must be documented (using NEWS.md)

  • May be distributed, although (1) source code must be made available when the package is distributed, and (2) a copy of the license and copyright notice must be included with the package

  • Comes with a LIMITATION of liability, and with NO warranty

The full legal text is included in this repository.

General Git(Hub) Workflow

This repository uses Git hooks to support automated generation of R documentation, automated semantic software versioning, and automated export of our data sets for other software (MS Excel, SPSS, SAS, Stata, Apache Parquet, Apache Feather).

Pre-commit checks in git

All updates to the repository should be done locally using git commit (or RStudio) and not the GitHub website, since local commands allow the use of our git prehooks, allowing automated semantic versioning and R documentation updates.

When using git commit, a script will be run to increase the version number, update the date and R documentation. Note: This only works on Unix systems, such as macOS and Linux.

To set this up, run this command once when working locally in the repository:

git config --local core.hooksPath ".github/prehooks"

Now, when using git commit:

git commit -am "test commit"
# Running prehook...
# >>  Updating R documentation...
# >>  done.
# >>  
# >>  Updating semantic versioning and date...
# >>  - latest tag is 'v1.8.1', with 26 previous commits
# >>  - AMR pkg version set to 1.8.1.9027
# >>  - updated DESCRIPTION
# >>  - updated NEWS.md
# >>  
# [main 300b93e] (v1.8.1.9027) test commit
#  3 files changed, 3 insertions(+), 4 deletions(-)
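The version bump in the output above appears to follow a simple pattern: the latest tag, followed by a fourth component that grows with the number of commits since that tag. The following is a minimal sketch in R, not the actual hook code. It assumes the fourth component equals 9001 plus the commit count, which is inferred from the single example shown (26 commits gives 9027); the function name dev_version() is hypothetical.

```r
# Hypothetical helper, not part of the AMR package or its hooks.
# Assumption: dev component = 9001 + number of commits since the latest tag.
# The commit count could be obtained with: git rev-list --count v1.8.1..HEAD
dev_version <- function(latest_tag, commits_since_tag) {
  base <- sub("^v", "", latest_tag)  # "v1.8.1" -> "1.8.1"
  paste0(base, ".", 9001 + commits_since_tag)
}

dev_version("v1.8.1", 26)
#> [1] "1.8.1.9027"
```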

To skip the checks, use the argument --no-verify (or -n for short) with git commit, or add the text "no-check" or "no-verify" to the commit message. This is useful when releasing new versions, since otherwise the version number in DESCRIPTION and NEWS.md would be overwritten.

# run checks:
git commit -am "small website fix"
# skip checks:
git commit -am "small website fix (no-checks)"
git commit -am "small website fix (no-verify)"
# in combined flags, -m must come last since it takes the message as its argument
git commit -anm "small website fix"
git commit --no-verify -am "small website fix"

In RStudio, where the git commit command runs in the background, the most convenient way is to add "no-checks" to the commit message.
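The message-based skip can be pictured as a simple pattern match on the commit message. The sketch below is an assumption about how such a check might look, not the code of the actual hook; the function name skip_checks() is hypothetical.

```r
# Hypothetical check, not the actual hook code.
# Assumption: any commit message containing "no-check" or "no-verify" skips
# the checks, so "no-checks" matches as well.
skip_checks <- function(commit_message) {
  grepl("no-check|no-verify", commit_message)
}

skip_checks("small website fix (no-checks)")
#> [1] TRUE
skip_checks("small website fix")
#> [1] FALSE
```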

GitHub Actions: website generation

The website (https://msberends.github.io/AMR) will be generated automatically if changes are pushed to the main branch. This is done using GitHub Actions, and the workflow file can be found here: .github/workflows/website.yaml. The website generation will be done in the latest Ubuntu LTS version and the current release version of R.

The website will be stored in the gh-pages branch.

Since a GitHub Action uses git pull to retrieve the repo contents, timestamps of files will not be preserved. This is a problem, since the ‘Data Set for Download’ vignette (https://msberends.github.io/AMR/articles/datasets.html) relies on timestamps to let users know when a data set was last updated. For this reason, the following code was added to the GitHub Action workflow file:

https://github.com/msberends/AMR/blob/d2edcf51adcb1b2e5dbba811dfc76549d10ffbf6/.github/workflows/website.yaml#L47-L54
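The general idea of restoring timestamps can be sketched in R as follows. This is not the code from website.yaml, which is only linked above; it assumes that the last commit time of a file is available as Unix epoch seconds, for example from git log -1 --format=%ct -- <path>. The function name restore_mtime() is hypothetical.

```r
# Hypothetical helper, not the code used in website.yaml.
# Sets a file's modification time to a given commit time (Unix epoch seconds),
# e.g. as returned by: git log -1 --format=%ct -- <path>
restore_mtime <- function(path, commit_epoch) {
  commit_time <- as.POSIXct(as.numeric(commit_epoch),
                            origin = "1970-01-01", tz = "UTC")
  Sys.setFileTime(path, commit_time)
  invisible(file.mtime(path))
}

# usage with a temporary file and a made-up commit time (2022-01-01 00:00 UTC)
tmp <- tempfile(fileext = ".rds")
file.create(tmp)
restore_mtime(tmp, 1640995200)
format(file.mtime(tmp), "%Y-%m-%d", tz = "UTC")
```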

GitHub Actions: all workflow files

This repository contains five GitHub Actions workflow files, each for a different purpose:

| File | Runs when | Runs on | Purpose |
|---|---|---|---|
| check.yaml | Every day at 1 AM; after every push to any branch | Ubuntu 22.04 (R 3.0 to R-devel); latest Windows (R 3.6 to R-devel); latest macOS (R 3.6 to R-devel) | Run R CMD check, including all unit tests |
| check-pr.yaml | In every pull request, including updates (not if author is repo member/owner) | Ubuntu 22.04 (R-release and R-devel); latest Windows (R-release and R-devel); latest macOS (R-release and R-devel) | Run R CMD check, including all unit tests |
| codecovr.yaml | After every push to any branch; in every pull request, including updates | Latest Ubuntu (R-release) | Check code coverage and upload to http://codecov.io/gh/msberends/AMR |
| lintr.yaml | After every push to any branch; in every pull request, including updates | Latest Ubuntu (R-release) | Check coding style according to Tidyverse convention |
| website.yaml | After every push to the 'main' branch | Latest Ubuntu (R-release) | Create website from scratch, with all examples |

Updating the AMR Package

Add or update a language

Please read the separate Wiki page Add or Update a Language for Translation.

This process is also covered when committing a change, since data-raw/_pre_commit_hook.R contains the full workflow to update language files.

Update EUCAST/CLSI Guidelines

After updating these guidelines, be sure to add the new version numbers to R/aa_globals.R.

Clinical breakpoints

The clinical breakpoints from EUCAST and CLSI are stored in the data set clinical_breakpoints. To update this data set to include the latest guidelines, follow the instructions in data-raw/reproduction_of_clinical_breakpoints.R. There is no need to update the documentation manually, since all values in the documentation that refer to the clinical_breakpoints data set are parametrised (such as the names of included guidelines). Running devtools::document() is sufficient, though this is also part of the pre-commit hook.

This script will incorporate the last 10 years of the CLSI and the EUCAST guidelines.

Be sure to compare some results with the original files (e.g., from EUCAST) to check that everything works as expected! For example, run scripts like this:

test_mics <- as.mic(c(0.256, 0.5, 1, 2, 4, 8, 16, 32, 64))

as.sir(test_mics, mo = "Escherichia coli", ab = "ciprofloxacin", guideline = "EUCAST")
as.sir(test_mics, mo = "Escherichia coli", ab = "ciprofloxacin", guideline = "CLSI")

as.sir(test_mics, mo = "Pseudomonas aeruginosa", ab = "ciprofloxacin", guideline = "EUCAST")
as.sir(test_mics, mo = "Pseudomonas aeruginosa", ab = "ciprofloxacin", guideline = "CLSI")

as.sir(test_mics, mo = "Streptococcus pneumoniae", ab = "amoxicillin", guideline = "EUCAST")
as.sir(test_mics, mo = "Streptococcus pneumoniae", ab = "amoxicillin", guideline = "CLSI")

EUCAST Inferred resistance / susceptibility

These rules are part of the Clinical Breakpoints tables from EUCAST and are only available via their MS Excel and PDF files; we have found no other source in a machine-readable format such as TXT or CSV. The rules (in the Notes sections of each page/sheet) must be added manually to data-raw/eucast_rules.tsv, although most rules can be copied from an earlier version of that file.

EUCAST Expert rules

Expert rules from EUCAST are only available via their MS Excel and PDF files; we have found no other source in a machine-readable format such as TXT or CSV. The rules must be added manually to data-raw/eucast_rules.tsv, although most rules can be copied from an earlier version of that file.

Be sure to update the version numbers in R/data.R and R/aa_globals.R afterwards.

EUCAST Dosage guidelines

EUCAST dosage guidelines are stored in the data set dosage. Up to 2022, EUCAST has only distributed PDF files of its dosing guidelines, and Adobe Acrobat is required to convert them to an Excel file. Follow the instructions in data-raw/reproduction_of_dosage.R to automatically update the data set.

Be sure to update the version numbers in R/data.R and R/aa_globals.R afterwards.

Update the microbial taxonomy

The microbial taxonomy is stored in the data set microorganisms. Updating this data set is almost 100% automated and can be done following the instructions in data-raw/reproduction_of_microorganisms.R. Note that it is required to download the full GBIF data set, which requires at least 10 GB of RAM to read into R.

Downloading data from LPSN requires an account. Registering is free and easy: visit https://lpsn.dsmz.de/downloads and click on Register at the bottom of the form.

There are a lot of unit tests in place to check its integrity after updating, but running a few manual checks never hurts:

as.mo("E. coli")
as.mo("eco")
as.mo("KLEPNE")

Update the antimicrobial agents

The package contains two data sets for antimicrobial agents: antibiotics and antivirals.

Antibiotics

The antibiotic agents are stored in the data set antibiotics. To update this data set, follow the instructions in data-raw/reproduction_of_antibiotics.R. This script is not fully automated and requires some manual work. The parts that update DDDs and ATC codes are fully automated, though.

Antivirals

The antiviral agents are stored in the data set antivirals. To update this data set, follow the instructions in data-raw/reproduction_of_antivirals.R. This script is fully automated.

Other

Reproducibility scripts

The data-raw folder contains all scripts and git history required for any other maintenance task such as updating data or finding out about package development history.

S3 extensions

The AMR package provides extensive S3 support using self-defined data types, including support for other packages. Read about the S3 object system of R in the free Advanced R book by Hadley Wickham.

In short, S3 allows a package to add new data types (called classes) that extend existing ones such as character and Date. To add a new class labnumber as an extension of double, the basis works like this:

x <- c(20220001, 20220002, 20220003)
class(x) <- c("labnumber", "double")

# now print the object:
print(x)
#> [1] 20220001 20220002 20220003
#> attr(,"class")
#> [1] "labnumber" "double"

Now we add an S3 extension for print() to the package:

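#' @export
print.labnumber <- function(x, ...) {
  # format as LAB-<year>-<sequence>; as.character() drops the class attribute
  txt <- as.character(x)
  print(paste0("LAB-", substr(txt, 1, 4), "-", substr(txt, 5, 8)),
        quote = FALSE)
  # print methods conventionally return their input invisibly
  invisible(x)
}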

Which results in:

print(x)
#> [1] LAB-2022-0001 LAB-2022-0002 LAB-2022-0003
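Printing is only one of the generics a new class usually needs. Extracting elements with [ drops the class attribute by default, so a class typically also defines an extraction method. The following is a minimal sketch for the illustrative labnumber class from above; it is not code from the AMR package.

```r
x <- c(20220001, 20220002, 20220003)
class(x) <- c("labnumber", "double")

# Extraction method: subset the underlying values, then restore the class
`[.labnumber` <- function(x, ...) {
  out <- NextMethod()
  class(out) <- class(x)
  out
}

class(x[2:3])
#> [1] "labnumber" "double"
```

Without this method, class(x[2:3]) would return "numeric" and a print method for labnumber would no longer be used for the subset.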

User visible classes

The AMR package contains 6 new classes (data types) using S3 extensions that users can create themselves with an as.xxx() function:

| Class | Created with | Extension of | Full object class | Purpose | Defined in file |
|---|---|---|---|---|---|
| ab | as.ab() | character | c("ab", "character") | Printing of antibiotic and antimycotic codes, ensuring integrity of antimicrobial codes | R/ab.R |
| av | as.av() | character | c("av", "character") | Printing of antivirals, ensuring integrity of antiviral codes | R/av.R |
| disk | as.disk() | integer | c("disk", "integer") | Cleaning of disk diffusion values, and printing, assigning, extracting them | R/disk.R |
| mic | as.mic() | factor | c("mic", "ordered", "factor") | Cleaning of MIC values, using mathematical operators with them (over 80 extensions, such as >, mean, log2), and printing, assigning, extracting them | R/mic.R |
| mo | as.mo() | character | c("mo", "character") | Cleaning of microbial codes and names, and printing, assigning, extracting them | R/mo.R |
| sir | as.sir() | factor | c("sir", "ordered", "factor") | Interpreting and cleaning to SIR values, and printing, assigning, extracting them | R/sir.R |

Non-user visible classes

Additionally, the AMR package contains 5 classes that are used internally and do not have an as.xxx() function:

| Class | Created with | Extension of | Full object class | Purpose | Defined in file |
|---|---|---|---|---|---|
| ab_selector | antibiotic selectors, such as carbapenems() | character | c("ab_selector", "character") | Selecting/Filtering of antibiotic columns in data | R/ab_selectors.R |
| ab_selector_any_all | N/A | logical | c("ab_selector_any_all", "logical") | Using ==, !=, any() and all() on antibiotic selectors | R/ab_selectors.R |
| bug_drug_combinations | bug_drug_combinations() | data.frame | At least c("bug_drug_combinations", "data.frame") but might inherit other classes, such as tbl_df of tibbles | Printing and formatting the result of bug_drug_combinations() | R/bug_drug_combinations.R |
| custom_eucast_rules | custom_eucast_rules() | list | c("custom_eucast_rules", "list") | Concatenating and printing custom EUCAST rules | R/custom_eucast_rules.R |
| custom_mdro_guideline | custom_mdro_guideline() | list | c("custom_mdro_guideline", "list") | Concatenating and printing custom MDRO rules | R/mdro.R |

Support for other packages

The AMR package also extends foreign packages by providing S3 methods for generic functions of those packages. Usually, these functions would have to be imported, but since the AMR package is designed to be independent of any other package, the S3 extensions are registered after the AMR package is loaded, as defined in R/zzz.R. The most important benefit is that even if those foreign packages no longer exist, the AMR package will work in exactly the same way, without CRAN complaining about incompatible support. This greatly improves the durability of our package.

Currently extended packages are cleaner, ggplot2, pillar, skimr, and vctrs. These are for that reason also in the Enhances field of the DESCRIPTION file.
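Registering a method for a generic of a package that is not imported can be sketched with base R alone. The example below is an illustration under assumptions and not the code from R/zzz.R: the helper name register_s3_when_loaded() is hypothetical, and it uses registerS3method() together with a load hook so that nothing fails when the foreign package is not installed.

```r
# Hypothetical helper, not the code from R/zzz.R.
# Registers an S3 method for a generic owned by another package, either
# immediately or as soon as that package gets loaded.
register_s3_when_loaded <- function(pkg, generic, class, method) {
  do_register <- function(...) {
    registerS3method(generic, class, method, envir = asNamespace(pkg))
  }
  if (pkg %in% loadedNamespaces()) {
    do_register()                                      # already loaded
  } else {
    setHook(packageEvent(pkg, "onLoad"), do_register)  # wait until loaded
  }
  invisible(NULL)
}

# usage: 'utils' ships with R, so the method is registered immediately
register_s3_when_loaded("utils", "head", "labnumber",
                        function(x, ...) "head() method for labnumber")
head(structure(1, class = "labnumber"))
```

In a package, such a call would typically be placed in .onLoad(), so that the registration happens when the package itself is loaded.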

| Foreign package | Foreign package function | Additional (input) class | Defined for class | Defined in |
|---|---|---|---|---|
| pillar | pillar_shaft() | | ab | R/ab.R |
| pillar | pillar_shaft() | | av | R/av.R |
| pillar | pillar_shaft() | | mo | R/mo.R |
| pillar | pillar_shaft() | | sir | R/sir.R |
| pillar | pillar_shaft() | | mic | R/mic.R |
| pillar | pillar_shaft() | | disk | R/disk.R |
| pillar | type_sum() | | ab | R/ab.R |
| pillar | type_sum() | | av | R/av.R |
| pillar | type_sum() | | mo | R/mo.R |
| pillar | type_sum() | | sir | R/sir.R |
| pillar | type_sum() | | mic | R/mic.R |
| pillar | type_sum() | | disk | R/disk.R |
| cleaner | freq() | | mo | R/mo.R |
| cleaner | freq() | | sir | R/sir.R |
| skimr | get_skimmers() | | mo | R/mo.R |
| skimr | get_skimmers() | | sir | R/sir.R |
| skimr | get_skimmers() | | mic | R/mic.R |
| skimr | get_skimmers() | | disk | R/disk.R |
| ggplot2 | autoplot() | | sir | R/sir.R |
| ggplot2 | autoplot() | | mic | R/mic.R |
| ggplot2 | autoplot() | | disk | R/disk.R |
| ggplot2 | autoplot() | | resistance_predict | R/resistance_predict.R |
| ggplot2 | fortify() | | sir | R/sir.R |
| ggplot2 | fortify() | | mic | R/mic.R |
| ggplot2 | fortify() | | disk | R/disk.R |
| vctrs | vec_ptype2() | character | ab_selector | R/vctrs.R |
| vctrs | vec_ptype2() | ab_selector | character | R/vctrs.R |
| vctrs | vec_cast() | character | ab_selector | R/vctrs.R |
| vctrs | vec_ptype2() | logical | ab_selector_any_all | R/vctrs.R |
| vctrs | vec_ptype2() | ab_selector_any_all | logical | R/vctrs.R |
| vctrs | vec_cast() | logical | ab_selector_any_all | R/vctrs.R |
| vctrs | vec_ptype2() | character | ab | R/vctrs.R |
| vctrs | vec_ptype2() | ab | character | R/vctrs.R |
| vctrs | vec_cast() | character | ab | R/vctrs.R |
| vctrs | vec_cast() | ab | character | R/vctrs.R |
| vctrs | vec_ptype2() | character | av | R/vctrs.R |
| vctrs | vec_ptype2() | av | character | R/vctrs.R |
| vctrs | vec_cast() | character | av | R/vctrs.R |
| vctrs | vec_cast() | av | character | R/vctrs.R |
| vctrs | vec_ptype2() | character | mo | R/vctrs.R |
| vctrs | vec_ptype2() | mo | character | R/vctrs.R |
| vctrs | vec_cast() | character | mo | R/vctrs.R |
| vctrs | vec_cast() | mo | character | R/vctrs.R |
| vctrs | vec_ptype2() | integer | disk | R/vctrs.R |
| vctrs | vec_ptype2() | disk | integer | R/vctrs.R |
| vctrs | vec_cast() | integer | disk | R/vctrs.R |
| vctrs | vec_cast() | disk | integer | R/vctrs.R |
| vctrs | vec_cast() | double | disk | R/vctrs.R |
| vctrs | vec_cast() | disk | double | R/vctrs.R |
| vctrs | vec_cast() | character | disk | R/vctrs.R |
| vctrs | vec_cast() | disk | character | R/vctrs.R |
| vctrs | vec_cast() | character | mic | R/vctrs.R |
| vctrs | vec_cast() | double | mic | R/vctrs.R |
| vctrs | vec_cast() | mic | character | R/vctrs.R |
| vctrs | vec_cast() | mic | double | R/vctrs.R |
| vctrs | vec_math() | | mic | R/vctrs.R |
| vctrs | vec_ptype2() | character | sir | R/vctrs.R |
| vctrs | vec_ptype2() | sir | character | R/vctrs.R |
| vctrs | vec_cast() | character | sir | R/vctrs.R |
| vctrs | vec_cast() | sir | character | R/vctrs.R |
