Added tighter function logic, added 2 more tests, and moved README text to an Overview vignette. These changes moved kidsides up to minor version 5 (0.5.0).
ngiangre committed May 20, 2023
1 parent 4ed216a commit f4a373a
Showing 7 changed files with 111 additions and 62 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
@@ -6,7 +6,7 @@
^pkgdown$
^\.github$
^.*\.sqlite$
^vignettes$
^vignettes/articles$
^vignettes/.ipynb_checkpoints
^vignettes/*.sqlite
^\.ipynb$
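R CMD build reads each `.Rbuildignore` line as a Perl-like regular expression, matched case-insensitively against file and directory paths relative to the package root. A quick base-R check of the patterns in this hunk against a few paths; the paths are hypothetical examples, not the repository's actual file list.

```r
# Which (hypothetical) package paths would each .Rbuildignore pattern exclude?
patterns <- c(
  old_vignettes = "^vignettes$",          # removed line: matches the vignettes/ directory itself
  new_articles  = "^vignettes/articles$", # added line: matches only vignettes/articles
  any_sqlite    = "^.*\\.sqlite$",        # any path ending in .sqlite
  vig_sqlite    = "^vignettes/*.sqlite"   # "/*" means "zero or more slashes", "." any character
)

paths <- c(
  "vignettes",
  "vignettes/Overview.Rmd",
  "vignettes/articles",
  "vignettes/kidsides.sqlite"
)

# Rows are paths, columns are patterns; TRUE means the path would be excluded
excluded <- sapply(patterns, function(p) grepl(p, paths, perl = TRUE, ignore.case = TRUE))
rownames(excluded) <- paths
excluded
```

Because a matched directory is dropped together with its contents, `^vignettes$` would exclude the whole `vignettes/` folder, including a vignette such as `Overview.Rmd`, whereas `^vignettes/articles$` excludes only that subfolder. The pattern `^vignettes/*.sqlite` does not match `vignettes/kidsides.sqlite`, but such files are already caught by `^.*\.sqlite$`.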
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: kidsides
Title: Download, Cache, and Connect to KidSIDES
Version: 0.4.2
Version: 0.5.0
Authors@R: c(
person("Nicholas", "Giangreco",
email = "nick.giangreco@gmail.com",
46 changes: 29 additions & 17 deletions R/functions.R
@@ -1,11 +1,11 @@
#' Download the Pediatric Drug Safety database
#'
#' Download the database published in Giangreco et al. 2022. Warning, the size of the uncompressed 'sqlite' file is close to 900 MB. Use wit caution
#' Download the database published in Giangreco et al. 2022. Warning: the uncompressed 'sqlite' file is close to 0.9 GB (900 MB), so use with caution.
#'
#' @param method The method to download the sqlite database. See \code{download.file}
#' @param quiet Whether to download quietly. See \code{download.file}
#' @param timeout Extended download session for downloading this file. Default is 1000 seconds.
#' @param force Whether to force the download of the database. Defaults to FALSE.
#' @param force Whether to force the download of the database. Defaults to FALSE. Must be TRUE for the database to download; the function then prompts for confirmation.
#'
#'
#' @return TRUE, invisibly
@@ -17,7 +17,7 @@
#'
#' @examples
#' if(FALSE){
#' download_sqlite_db()
#' download_sqlite_db() # set force = TRUE to download the 0.9 GB file to your machine
#' }
download_sqlite_db <- function(method="auto",quiet=FALSE,timeout=1e3,force=FALSE) {

@@ -30,31 +30,40 @@ download_sqlite_db <- function(method="auto",quiet=FALSE,timeout=1e3,force=FALSE
options(timeout=newTimeout)
}

if(!file.exists(get_db_path()[['destname']]) | force){
if(force){

ans <- utils::askYesNo(paste0("kidsides would like to download a 'sqlite' database to your cache directory at: ",dirname(get_db_path()[['dest_file']]), ". Is that okay?", sep = "\n"))
ans <- utils::askYesNo(
paste0("kidsides would like to download a 0.9GB 'sqlite' database to your cache. Is that okay?\nThe file will be located at: ",
get_db_path()[['kidsides_cache']], "\n")
)
if (!ans) stop("Exiting...", call. = FALSE)

if(!dir.exists(get_db_path()[['kidsides_cache']])){
dir.create(get_db_path()[['kidsides_cache']])
}

R.utils::downloadFile(
url = get_db_path()[['url']],
filename = get_db_path()[['dest_file']],
filename = get_db_path()[['dest_gzfile']],
method = method,
quiet = quiet
)
R.utils::gunzip(
get_db_path()[['dest_gzfile']],
get_db_path()[['dest_file']],
get_db_path()[['destname']],
overwrite=T
)
}else if(file.exists(get_db_path()[['dest_file']])){
message(paste0(
get_db_path()[['dest_file']]," already exists!"
"Already exists: ",get_db_path()[['dest_file']]
))
}else{
message(paste0(
"Attempt failed to check sqlite exists",
" or to download from the URL: ",
get_db_path()[['url']])
get_db_path()[['url']]),"\n",
" If you want to download for the first time,",
" set argument force=TRUE"
)
}
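The control flow of `download_sqlite_db()` after this commit (do nothing unless `force = TRUE`, ask for confirmation, download the `.gz` file, then decompress it) can be sketched offline in base R. `guarded_fetch()`, `fake_fetch()`, and the `confirm`/`fetch` hooks are hypothetical names introduced so the sketch runs without a network connection or an interactive prompt; the package itself uses `R.utils::downloadFile()` and `R.utils::gunzip()`.

```r
# Offline sketch of a force-guarded "download, then decompress" flow.
# guarded_fetch(), its `confirm`/`fetch` hooks and fake_fetch() are hypothetical.
guarded_fetch <- function(url, dest_file, force = FALSE,
                          confirm = utils::askYesNo,
                          fetch = function(url, destfile) {
                            utils::download.file(url, destfile, mode = "wb")
                          }) {
  if (!force) {
    if (file.exists(dest_file)) {
      message("Already exists: ", dest_file)
    } else {
      message("To download for the first time, set force = TRUE")
    }
    return(invisible(FALSE))
  }

  ans <- confirm(paste0("Download a large file to ", dirname(dest_file), "?"))
  # isTRUE() treats both FALSE and NA (askYesNo's "Cancel") as a refusal
  if (!isTRUE(ans)) stop("Exiting...", call. = FALSE)

  dir.create(dirname(dest_file), recursive = TRUE, showWarnings = FALSE)
  gz_file <- paste0(dest_file, ".gz")
  fetch(url, gz_file)

  # Stream-decompress the .gz file using base R connections
  inp <- gzfile(gz_file, "rb")
  out <- file(dest_file, "wb")
  on.exit({
    close(inp)
    close(out)
  }, add = TRUE)
  repeat {
    chunk <- readBin(inp, what = "raw", n = 1e6)
    if (length(chunk) == 0) break
    writeBin(chunk, out)
  }
  invisible(TRUE)
}

# Usage: a fake fetch writes a tiny gzip file instead of downloading
fake_fetch <- function(url, destfile) {
  con <- gzfile(destfile, "wb")
  writeBin(charToRaw("hello"), con)
  close(con)
}
dest <- file.path(tempfile("kidsides-demo"), "demo.sqlite")
guarded_fetch("https://example.org/demo.sqlite.gz", dest)  # message only, nothing written
guarded_fetch("https://example.org/demo.sqlite.gz", dest, force = TRUE,
              confirm = function(msg) TRUE, fetch = fake_fetch)
rawToChar(readBin(dest, "raw", 100))  # "hello"
```

Using `isTRUE(ans)` rather than `!ans` also covers the `NA` that `utils::askYesNo()` returns when the user cancels, which would otherwise make `if` fail with a missing-value error.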

@@ -79,7 +88,7 @@ download_sqlite_db <- function(method="auto",quiet=FALSE,timeout=1e3,force=FALSE
#' disconnect_sqlite_db(con)
#' }
connect_sqlite_db <- function(){
DBI::dbConnect(RSQLite::SQLite(),dbname=get_db_path()[['destname']])
DBI::dbConnect(RSQLite::SQLite(),dbname=get_db_path()[['dest_file']])
}

#' Disconnect from the Pediatric Drug Safety database
@@ -124,20 +133,23 @@ get_db_path <- function(){
url <- "https://tlab-kidsides.s3.amazonaws.com/data/effect_peds_19q2_v0.3_20211119.sqlite.gz"
sqlite_gz_file <- basename(url)

path <- tools::R_user_dir("kidsides",which = "cache")
kidsides_cache <- tools::R_user_dir("kidsides",which = "cache")

full_path <- paste0(dirname(path),"/",
basename(path))
cache <- dirname(kidsides_cache)

if(!dir.exists(full_path)){
dir.create(full_path)
if(!dir.exists(cache)){
stop(paste0("Cache directory doesn't exist.\n",
"Should be located at ",cache), call. = FALSE)
}

lst <- list()
lst[['url']] <- url
lst[['dest_file']] <- paste0(full_path,"/",sqlite_gz_file)
lst[['sqlite_gz_file']] <- sqlite_gz_file
lst[['sqlite_file']] <- strsplit(sqlite_gz_file,"\\.gz")[[1]][1]
lst[['destname']] <- paste0(full_path,"/",lst[['sqlite_file']])
lst[['cache']] <- cache
lst[['kidsides_cache']] <- kidsides_cache
lst[['dest_gzfile']] <- paste0(kidsides_cache,"/",sqlite_gz_file)
lst[['dest_file']] <- paste0(kidsides_cache,"/",lst[['sqlite_file']])
lst

}
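The list returned by `get_db_path()` is plain path bookkeeping derived from the download URL. A sketch of the same derivation with `file.path()` and `sub()`; `db_paths()` and its `cache_root` argument are hypothetical, added so the example can point at a temporary directory instead of the real user cache.

```r
# Sketch of the path bookkeeping in get_db_path(); db_paths() and `cache_root`
# are hypothetical. The list names mirror the committed function's list.
db_paths <- function(url,
                     cache_root = tools::R_user_dir("kidsides", which = "cache")) {
  gz_name <- basename(url)
  sqlite_name <- sub("\\.gz$", "", gz_name)  # strip only a trailing ".gz"
  list(
    url            = url,
    sqlite_gz_file = gz_name,
    sqlite_file    = sqlite_name,
    cache          = dirname(cache_root),
    kidsides_cache = cache_root,
    dest_gzfile    = file.path(cache_root, gz_name),
    dest_file      = file.path(cache_root, sqlite_name)
  )
}

p <- db_paths(
  "https://tlab-kidsides.s3.amazonaws.com/data/effect_peds_19q2_v0.3_20211119.sqlite.gz",
  cache_root = file.path(tempdir(), "kidsides")
)
p$sqlite_file  # "effect_peds_19q2_v0.3_20211119.sqlite"
basename(p$dest_gzfile)
```

For this URL, `sub("\\.gz$", "", ...)` gives the same result as the committed `strsplit()` call, but it would not truncate a file name that happens to contain `.gz` elsewhere.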
29 changes: 1 addition & 28 deletions README.Rmd
@@ -61,34 +61,7 @@ kidsides::disconnect_sqlite_db(con)
```


# Background

Adverse drug reactions are a leading cause of morbidity and mortality, costing the healthcare system billions of dollars. Children are at increased risk of adverse drug reactions, with potentially lasting effects into adulthood. The current pediatric drug safety landscape, including clinical trials, is limited: it rarely includes children and relies on extrapolation from adults. Children are not small adults; they go through an evolutionarily conserved and physiologically dynamic process of growth and maturation. We hypothesize that adverse drug reactions manifest from the interaction between drug exposure and dynamic biological processes during child growth and development.

We also hypothesize that developing statistical methods that use prior knowledge of the dynamic information shared across development can improve the detection of adverse drug events in children. This data package downloads the SQLite database created by systematically applying covariate-adjusted disproportionality generalized additive models (dGAMs), which produced a resource of nearly half a million adverse drug event (ADE) risk estimates across child development stages.

# Pediatric Drug Safety (PDS) data

## Observation-level data

The observation-level data, case reports of drug(s) potentially linked to adverse event(s), was collected through the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). This data is publicly available on the openFDA platform [here](https://open.fda.gov/data/downloads/) as downloadable [JSON files](https://api.fda.gov/download.json). However, using this data as-is is non-trivial: the drug event reports are published every quarter, going back to the 1990s, in chunks of nested JSON. Using an API key with extended permissions, I developed custom Python notebooks and scripts, available in the ‘openFDA_drug_event-parsing’ GitHub repository (DOI: https://doi.org/10.5281/zenodo.4464544), to extract and format all drug event reports prior to the third quarter of 2019. This observation-level data, called Pediatric FAERS, was used for downstream analyses and is stored in the table `ade_raw`.

## Summary-level data

The drugs and adverse events reported were coded into standard, hierarchical vocabularies. Adverse events were standardized with the Medical Dictionary for Regulatory Activities (MedDRA) vocabulary (details of the hierarchy are [here](https://www.meddra.org/how-to-use/basics/hierarchy)). Drugs were standardized with the Anatomical Therapeutic Chemical (ATC) vocabulary (details [here](https://www.who.int/tools/atc-ddd-toolkit/atc-classification)). The reporting of adverse events and drugs can depend on the disease context of a report's subject, so this context was represented by counting, for each report, the number of drugs in each therapeutic class.

## Model-level data

We developed the disproportionality generalized additive model (dGAM) method for detecting adverse drug events from these spontaneous reports. We applied a logistic generalized additive model (GAM) to every unique drug-event pair in Pediatric FAERS, using each drug-event GAM to quantify adverse event risk under drug exposure versus no exposure across child development stages. Please see the references for the full specification and details of the GAM.

# PDSportal: accessible data exploration

We provide the [PDSportal](https://pdsportal.shinyapps.io/pdsportal/) as an accessible web application and a platform to download our database, so the community can explore uses ranging from identifying safety endpoints in clinical trials to evaluating known and novel developmental pharmacology.

# KidSIDES

The `kidsides` R package downloads a SQLite database to your local machine and connects to it using the `DBI` R package. This is a novel data resource of half a million pediatric drug safety signals across growth and development stages. Please see the references for details on the data fields, and the [code repository](https://github.com/ngiangre/pediatric_ade_database_study) for the [paper](https://www.ssrn.com/abstract=3898786).
**See the `Overview` vignette for more details on the data and the online portal.**

# References

47 changes: 32 additions & 15 deletions tests/testthat/test-functions.R
@@ -7,29 +7,46 @@ test_that("URL is correct",{

})

test_that("Downloaded file is correct and sqlite correction is correct",{

con <- connect_sqlite_db()
test_that("get_db_path() returns list",{

expect_equal(
attr(con,"class")[1],
"SQLiteConnection"
class(get_db_path()),
"list"
)

expect_equal(
basename(attr(con,"dbname")),
basename(get_db_path()[['destname']])
})

test_that("get_db_path() returns more than 1 element",{

expect_true(
length(get_db_path())>1
)

disconnect_sqlite_db(con)
})

test_that("Gives message that it does not download sqlite file unless forced to",{
expect_message(
download_sqlite_db()
)
})

test_that("get_db_path() returns list",{
if(file.exists(get_db_path()[['dest_file']])){
test_that("Downloaded file is correct and sqlite connection is correct",{

expect_equal(
class(get_db_path()),
"list"
)
con <- connect_sqlite_db()

})
expect_equal(
attr(con,"class")[1],
"SQLiteConnection"
)

expect_equal(
basename(attr(con,"dbname")),
basename(get_db_path()[['dest_file']])
)

disconnect_sqlite_db(con)

})

}
2 changes: 2 additions & 0 deletions vignettes/.gitignore
@@ -0,0 +1,2 @@
*.html
*.R
45 changes: 45 additions & 0 deletions vignettes/Overview.Rmd
@@ -0,0 +1,45 @@
---
title: "Overview"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Overview}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

# Background

Adverse drug reactions are a leading cause of morbidity and mortality, costing the healthcare system billions of dollars. Children are at increased risk of adverse drug reactions, with potentially lasting effects into adulthood. The current pediatric drug safety landscape, including clinical trials, is limited: it rarely includes children and relies on extrapolation from adults. Children are not small adults; they go through an evolutionarily conserved and physiologically dynamic process of growth and maturation. We hypothesize that adverse drug reactions manifest from the interaction between drug exposure and dynamic biological processes during child growth and development.

We also hypothesize that developing statistical methods that use prior knowledge of the dynamic information shared across development can improve the detection of adverse drug events in children. This data package downloads the SQLite database created by systematically applying covariate-adjusted disproportionality generalized additive models (dGAMs), which produced a resource of nearly half a million adverse drug event (ADE) risk estimates across child development stages.

# Pediatric Drug Safety (PDS) data

## Observation-level data

The observation-level data, case reports of drug(s) potentially linked to adverse event(s), was collected through the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). This data is publicly available on the openFDA platform [here](https://open.fda.gov/data/downloads/) as downloadable [JSON files](https://api.fda.gov/download.json). However, using this data as-is is non-trivial: the drug event reports are published every quarter, going back to the 1990s, in chunks of nested JSON. Using an API key with extended permissions, I developed custom Python notebooks and scripts, available in the ‘openFDA_drug_event-parsing’ GitHub repository (DOI: https://doi.org/10.5281/zenodo.4464544), to extract and format all drug event reports prior to the third quarter of 2019. This observation-level data, called Pediatric FAERS, was used for downstream analyses and is stored in the table `ade_raw`.
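The reshaping step described above, turning nested per-report JSON into drug-event rows, can be illustrated in R. This is not the author's Python pipeline: the two reports below are hypothetical and keep only three openFDA drug event fields (`safetyreportid`, `patient.drug[].medicinalproduct`, `patient.reaction[].reactionmeddrapt`), represented as the nested lists a JSON parser would return.

```r
# Two simplified, hypothetical openFDA-style reports as nested lists
reports <- list(
  list(safetyreportid = "1001",
       patient = list(
         drug = list(list(medicinalproduct = "DRUG A"),
                     list(medicinalproduct = "DRUG B")),
         reaction = list(list(reactionmeddrapt = "Nausea")))),
  list(safetyreportid = "1002",
       patient = list(
         drug = list(list(medicinalproduct = "DRUG A")),
         reaction = list(list(reactionmeddrapt = "Nausea"),
                         list(reactionmeddrapt = "Rash"))))
)

# Flatten one report into one row per (drug, reaction) combination
flatten_report <- function(r) {
  drugs  <- vapply(r$patient$drug, function(d) d$medicinalproduct, character(1))
  events <- vapply(r$patient$reaction, function(e) e$reactionmeddrapt, character(1))
  combos <- expand.grid(drug = drugs, event = events, stringsAsFactors = FALSE)
  cbind(safetyreportid = r$safetyreportid, combos)
}

drug_event_rows <- do.call(rbind, lapply(reports, flatten_report))
drug_event_rows
unique(drug_event_rows[, c("drug", "event")])  # unique drug-event pairs
```

Four report-level rows collapse to three unique drug-event pairs here, because "DRUG A" with "Nausea" appears in both reports.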

## Summary-level data

The drugs and adverse events reported were coded into standard, hierarchical vocabularies. Adverse events were standardized with the Medical Dictionary for Regulatory Activities (MedDRA) vocabulary (details of the hierarchy are [here](https://www.meddra.org/how-to-use/basics/hierarchy)). Drugs were standardized with the Anatomical Therapeutic Chemical (ATC) vocabulary (details [here](https://www.who.int/tools/atc-ddd-toolkit/atc-classification)). The reporting of adverse events and drugs can depend on the disease context of a report's subject, so this context was represented by counting, for each report, the number of drugs in each therapeutic class.
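Counting drugs per therapeutic class for each report can be sketched with base R's `table()`. The column names, report IDs and drug names are invented for illustration; only the first-level ATC letters (`N` for nervous system, `J` for anti-infectives for systemic use) are real codes.

```r
# Toy report-drug table; column names and drug names are hypothetical
report_drugs <- data.frame(
  report_id = c("r1", "r1", "r1", "r1", "r2", "r2", "r3"),
  drug      = c("drugA", "drugB", "drugC", "drugC", "drugA", "drugD", "drugC"),
  atc_class = c("N", "N", "J", "J", "N", "J", "J")  # first-level ATC letters
)

# Drop repeated mentions of the same drug within a report, then count
# drugs per therapeutic class for each report (rows = reports, columns = classes)
per_report <- unique(report_drugs)
class_counts <- as.data.frame.matrix(
  table(per_report$report_id, per_report$atc_class)
)
class_counts
```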

## Model-level data

We developed the disproportionality generalized additive model (dGAM) method for detecting adverse drug events from these spontaneous reports. We applied a logistic generalized additive model (GAM) to every unique drug-event pair in Pediatric FAERS, using each drug-event GAM to quantify adverse event risk under drug exposure versus no exposure across child development stages. Please see the references for the full specification and details of the GAM.
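A minimal sketch of a logistic GAM with a stage-varying drug effect, fitted with the `mgcv` package on simulated data. It assumes development stages coded 1 to 7 and a single binary exposure; it is not the published dGAM specification, which is covariate-adjusted and described in the references.

```r
library(mgcv)

set.seed(1)
n <- 20000
stage <- sample(1:7, n, replace = TRUE)  # hypothetical ordinal development stage
drug  <- rbinom(n, 1, 0.3)               # 1 = report mentions the drug
# Simulated truth: the drug's effect on the log-odds rises with stage
eta   <- -2 + 0.1 * stage + drug * (0.2 * stage - 0.5)
event <- rbinom(n, 1, plogis(eta))

# Baseline smooth over stage plus a stage-varying drug effect
fit <- gam(event ~ s(stage, k = 5) + s(stage, by = drug, k = 5),
           family = binomial, method = "REML")

# Estimated log odds ratio (exposed vs unexposed) at each stage
newd <- data.frame(stage = 1:7)
lp1 <- predict(fit, transform(newd, drug = 1))  # link (log-odds) scale
lp0 <- predict(fit, transform(newd, drug = 0))
drug_effect <- lp1 - lp0
round(drug_effect, 2)
```

With a numeric 0/1 `by` variable, `mgcv` leaves `s(stage, by = drug)` uncentred, so the difference between the two predictions is the full drug log odds ratio at each stage rather than a deviation around zero.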

# PDSportal: accessible data exploration

We provide the [PDSportal](https://pdsportal.shinyapps.io/pdsportal/) as an accessible web application and a platform to download our database, so the community can explore uses ranging from identifying safety endpoints in clinical trials to evaluating known and novel developmental pharmacology.

# KidSIDES

The `kidsides` R package downloads a SQLite database to your local machine and connects to it using the `DBI` R package. This is a novel data resource of half a million pediatric drug safety signals across growth and development stages. Please see the references for details on the data fields, and the [code repository](https://github.com/ngiangre/pediatric_ade_database_study) for the [paper](https://doi.org/10.1016/j.medj.2022.06.001).
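Once the database is cached, querying it is ordinary `DBI` usage. The sketch below substitutes an in-memory SQLite database for the 0.9 GB file so it runs without downloading anything; the `ade_raw` columns shown are invented for illustration, not the real table schema.

```r
library(DBI)

# Stand-in for the cached file; with the real database you would instead
# call kidsides::connect_sqlite_db()
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Toy table with hypothetical columns
dbWriteTable(con, "ade_raw", data.frame(
  safetyreportid = c("1001", "1002", "1003"),
  drug = c("drugA", "drugA", "drugB")
))

dbListTables(con)
res <- dbGetQuery(con, "
  SELECT drug, COUNT(*) AS n_reports
  FROM ade_raw
  GROUP BY drug
  ORDER BY drug
")
res

# Always release the connection (kidsides::disconnect_sqlite_db() for the real file)
dbDisconnect(con)
```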

# References


Giangreco N. Mind the developmental gap: Identifying adverse drug effects across childhood to evaluate biological mechanisms from growth and development [PhD dissertation]. Columbia University; 2022. [doi: 10.7916/d8-5d9b-6738](https://doi.org/10.7916/d8-5d9b-6738).

Giangreco NP, Tatonetti NP. A database of pediatric drug effects to evaluate ontogenic mechanisms from child growth and development. Med (N Y). 2022 Aug 12;3(8):579-595.e7. [doi: 10.1016/j.medj.2022.06.001](https://doi.org/10.1016/j.medj.2022.06.001). Epub 2022 Jun 24. PMID: 35752163; PMCID: PMC9378670.

