Stronger together? Potential and limitations of combining industry datasets to fill in global AMR surveillance gaps
Quentin Leclerc
This repository contains the code for our project submitted to the Vivli AMR Data Challenge. The project is based on research using data from Pfizer, GSK, Johnson & Johnson, Paratek, Venatorx and Shionogi, obtained through https://amr.vivli.org. The GLASS dataset was manually extracted from the WHO GLASS dashboard (dataset available from https://github.com/qleclerc/GLASS2022).
Note: The raw industry surveillance datasets were obtained via Vivli and are not included in this repository. The prepare_data.R script requires these datasets to be present in a folder named "raw", placed inside the "data" folder.
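As a convenience, the folder layout described above can be checked before running prepare_data.R. The following is a minimal sketch, not part of the repository: the helper name check_raw_folder and its project_root argument are hypothetical, and it only verifies that "data/raw" exists and lists whatever files it contains, without assuming any particular file names.

```r
# Hypothetical helper (not part of this repository): checks that the
# "raw" folder exists inside "data" and returns the files found there.
check_raw_folder <- function(project_root = ".") {
  raw_dir <- file.path(project_root, "data", "raw")

  # Stop early with a clear message if the expected folder is absent
  if (!dir.exists(raw_dir)) {
    stop("Expected folder not found: ", raw_dir, call. = FALSE)
  }

  # An empty folder is allowed, but is probably a mistake
  files <- list.files(raw_dir)
  if (length(files) == 0) {
    warning("Folder exists but is empty: ", raw_dir, call. = FALSE)
  }

  invisible(files)
}

# Usage, from the repository root:
# raw_files <- check_raw_folder(".")
# print(raw_files)
```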
We have included in the "data" folder an example combined dataset, final_AMR_dataset.csv, for E. coli and K. pneumoniae resistance to ceftriaxone, ceftazidime, imipenem and meropenem, for the years 2018-19.
If you have the industry surveillance datasets, you can edit the prepare_data.R script to choose the combinations of years, bacteria and antibiotics to extract and combine from the datasets. This script will automatically warn you if your chosen combination is not present within one or more datasets. The resulting combined dataset will be saved in the "data" folder as final_AMR_dataset.csv.
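To illustrate the kind of check involved when a chosen combination is absent from a dataset, here is a minimal sketch in base R. It is not the code used in prepare_data.R, which may work differently. The function name find_missing_combinations and the column names year, organism and antibiotic are assumptions made for this example; the real datasets may use other names.

```r
# Hypothetical helper (not taken from prepare_data.R).
# Assumes `dataset` is a data frame with columns named
# "year", "organism" and "antibiotic"; these names are illustrative only.
find_missing_combinations <- function(dataset, years, bacteria, antibiotics) {
  # Every year/organism/antibiotic combination the user asked for
  wanted <- expand.grid(year = years,
                        organism = bacteria,
                        antibiotic = antibiotics,
                        stringsAsFactors = FALSE)

  # Combinations that actually occur in the dataset
  present <- unique(dataset[, c("year", "organism", "antibiotic")])

  # Compare the two sets using a simple composite text key
  key <- function(df) paste(df$year, df$organism, df$antibiotic, sep = "|")
  missing <- wanted[!(key(wanted) %in% key(present)), , drop = FALSE]

  if (nrow(missing) > 0) {
    warning(nrow(missing),
            " requested combination(s) not found in this dataset",
            call. = FALSE)
  }

  rownames(missing) <- NULL
  missing
}
```

Calling this once per dataset returns a data frame of the requested combinations that the dataset lacks, and raises a warning whenever that data frame is not empty.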