### This demo reproduces the survival curve (Figure 2b) from the following paper

_Indicators of retention in remote digital health studies: A cross-study evaluation of 100,000 participants
Abhishek Pratap, Elias Chaibub Neto, Phil Snyder, Carl Stepnowsky, Noémie Elhadad, Daniel Grant, Matthew H. Mohebbi, Sean Mooney, Christine Suver, John Wilbanks, Lara Mangravite, Patrick J. Heagerty, Pat Arean, Larsson Omberg npj Digit. Med. 3, 21 (2020). https://doi.org/10.1038/s41746-020-0224-8_ 


The code below is adapted from the following GitHub repository:
https://github.com/apratap/digitalHealth_RetentionAnalysis_PublicRelease/



<img src="files/images/NPJ_Pratapetal.png" align="center"/>

### Setting up the working environment

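```r
# Start from a clean session
rm(list = ls())
options(stringsAsFactors = FALSE)

# install.load::install_load() installs any missing packages, then loads them.
# Note: synapser is distributed from http://ran.synapse.org rather than CRAN:
#   install.packages("synapser", repos = "http://ran.synapse.org")
library("install.load")
install_load("data.table", "gdata", "synapser", "jsonlite", "stringr")
install_load("plyr", "tidyverse", "doMC", "scales")
install_load("gridExtra", "pheatmap", "printr", "ggthemes", "anytime")
```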


### Log in to Synapse using locally stored credentials

```r
# With no arguments, synLogin() uses locally stored credentials (e.g. ~/.synapseConfig)
syn <- synapser::synLogin()
```

Welcome, Abhishek Pratap!

### Hard-Coded, Study-Specific Colors for the Survival Curve

```r
# One color per study, used as the survival-curve palette
STUDY_COLS = data.frame(study = c('SleepHealth', 'Brighten', 'Asthma', 'ElevateMS',
                                  'mPower', 'Phendo', 'MyHeartCounts', 'Start'),
                        color = c('#4363D8', '#0B9FC1', '#E6194B', '#38A847',
                                  '#F032E6', '#f58231', '#800000', '#808000'))
```
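`survfit()` orders its strata by the factor levels of `study` (alphabetical for a character column), not by the row order of `STUDY_COLS`. Unless `userStats$study` is already a factor with levels in the order above, passing `STUDY_COLS$color` and `STUDY_COLS$study` positionally to `ggsurvplot()` can attach the wrong label and color to a curve. A minimal sketch of a lookup by name, assuming strata are named `"study=<name>"` as `survfit()` produces for a `~ study` formula; `colors_for_strata()` and `example_cols` are hypothetical names.

```r
# Hypothetical helper: return colors from a study/color lookup table in the
# same order as the strata of a survfit object (e.g. names(fit$strata)).
colors_for_strata <- function(strata_names, study_cols) {
  # survfit names strata "<variable>=<level>"; keep only the level
  studies <- sub("^[^=]*=", "", strata_names)
  idx <- match(studies, study_cols$study)
  if (anyNA(idx)) {
    stop("No color defined for: ", paste(studies[is.na(idx)], collapse = ", "))
  }
  setNames(study_cols$color[idx], studies)
}

# Small example lookup (subset of the colors above)
example_cols <- data.frame(
  study = c("SleepHealth", "Brighten", "Asthma", "mPower"),
  color = c("#4363D8", "#0B9FC1", "#E6194B", "#F032E6"),
  stringsAsFactors = FALSE
)

# Strata in alphabetical order, as survfit would typically produce them
colors_for_strata(c("study=Asthma", "study=Brighten", "study=mPower"), example_cols)
```

The returned vector can be passed as `palette`, and its `names()` as `legend.labs`, so that colors and labels always follow the strata order.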

### 1. Loading Data

Only a few code blocks are shown here to demonstrate downloading data from Synapse.
The full script is available [here](https://github.com/apratap/digitalHealth_RetentionAnalysis_PublicRelease/blob/master/analysis/loadData.R).

### 1.A - mPower Study 

```r
##################
### mPower
##################
# Engagement (activity) data; the participant ID column `uid` is renamed to `healthCode`
get_mpower_engagement_data <- function() {
  fread(synGet("syn20929422")$path, data.table = FALSE) %>%
    dplyr::rename(healthCode = uid)
}

# Participant metadata (demographics, disease status, clinical referral)
get_mpower_mdata <- function() {
  fread(synGet("syn20929429")$path, data.table = FALSE) %>%
    dplyr::rename(healthCode = uid)
}

mpower_mdata <- get_mpower_mdata()
mpower <- get_mpower_engagement_data()
```
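Both loaders follow the same pattern: fetch a file by Synapse ID with `synGet()`, read it from the returned local `path`, and rename `uid` to `healthCode`. A base-R sketch of the read-and-rename step, using a temporary local CSV in place of a Synapse download so it runs without credentials; `read_participant_file()` is a hypothetical name, and only the `uid`/`healthCode` column names come from the code above.

```r
# Hypothetical helper: read a participant-level CSV and rename `uid` to
# `healthCode`, a base-R equivalent of dplyr::rename(healthCode = uid).
read_participant_file <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  if (!"uid" %in% names(df)) stop("expected a `uid` column in ", path)
  names(df)[names(df) == "uid"] <- "healthCode"
  df
}

# Usage: a temporary file stands in for synGet(<id>)$path
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(uid = c("a1", "b2"), study = "mPower"), tmp, row.names = FALSE)
participants <- read_participant_file(tmp)
names(participants)
```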

### Explore downloaded data

```r
head(mpower_mdata)
```

| | study | healthCode | age_group | gender | diseaseStatus | state | race_ethnicity | clinicalReferral |
|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<chr>` | `<lgl>` |
| 1 | mPower | 000240d1-1110-4dd2-a2d0-e344c37efd68 | (29,39] | Male | FALSE | Colorado | Non-Hispanic White | FALSE |
| 2 | mPower | 0005a31d-e52c-447c-9971-ccc7bef667fb | (29,39] | Male | FALSE | Washington | Non-Hispanic White | FALSE |
| 3 | mPower | 00081bd9-9abd-4003-b035-de6cc3e8c922 | (59,120] | Male | FALSE | | Asian | FALSE |
| 4 | mPower | 00086114-0bb3-460e-8841-94bc35d27d71 | (17,29] | Female | FALSE | New York | Non-Hispanic White | FALSE |
| 5 | mPower | 001702e9-908d-4419-9c08-8ef5615d6b67 | (59,120] | Male | TRUE | Mississippi | Non-Hispanic White | TRUE |
| 6 | mPower | 00182372-b75c-48f0-a74d-2c1f447bb0bd | (17,29] | Male | FALSE | New York | Non-Hispanic White | FALSE |


### Repeat the same steps for all 8 studies
_This step will not work unless you have access rights to the data for each study._

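```r
# Sources the full loading script for all 8 studies; the `userStats` table used
# in the survival analysis below is expected to come from this step
suppressWarnings(devtools::source_url("https://raw.githubusercontent.com/apratap/digitalHealth_RetentionAnalysis_PublicRelease/master/analysis/loadData.R"))
```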

### Survival Analysis 
How long participants remained in a fully remote digital health study.
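The survival analysis below uses `userStats$duration_in_study`, which is built by the loading script. As a hedged illustration only, assuming duration is the number of days from a participant's first to last active day (counted inclusively), and assuming a long activity table with hypothetical columns `study`, `healthCode`, and `dayInStudy`, a per-participant duration could be derived as follows (`compute_duration()` is also hypothetical):

```r
# Hypothetical input: one row per participant per active day.
# Assumption: duration = last active day - first active day + 1 (inclusive).
compute_duration <- function(activity) {
  first_day <- aggregate(dayInStudy ~ study + healthCode, data = activity, FUN = min)
  last_day  <- aggregate(dayInStudy ~ study + healthCode, data = activity, FUN = max)
  out <- merge(first_day, last_day, by = c("study", "healthCode"),
               suffixes = c("_first", "_last"))
  out$duration_in_study <- out$dayInStudy_last - out$dayInStudy_first + 1
  out[, c("study", "healthCode", "duration_in_study")]
}

# Usage on a toy activity log
activity <- data.frame(
  study      = c("mPower", "mPower", "mPower", "Brighten"),
  healthCode = c("p1", "p1", "p2", "p3"),
  dayInStudy = c(1, 12, 3, 5)
)
compute_duration(activity)
```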

```r
# Backup option in case the live data download fails:
# load("~/Downloads/tmp_digitalHealth_retentiondata.RData")
```

```r
install_load("survival", "survminer", "synapser")
```
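`survfit()` computes the Kaplan-Meier estimator $\hat{S}(t) = \prod_{t_i \le t} \left(1 - d_i / n_i\right)$, where $d_i$ is the number of drop-outs at time $t_i$ and $n_i$ the number of participants still in the study just before $t_i$. A minimal base-R sketch of that product, for intuition only (no confidence intervals; `km_estimate()` is a hypothetical name):

```r
# Hypothetical helper: Kaplan-Meier survival estimate at each distinct event time.
# time:  follow-up time per participant (e.g. days in study)
# event: 1 if the drop-out was observed, 0 if right-censored
km_estimate <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))
  # number still in the study just before each event time
  n_risk  <- vapply(event_times, function(t) sum(time >= t), integer(1))
  # number of drop-outs observed at each event time
  n_event <- vapply(event_times, function(t) sum(time == t & event == 1), integer(1))
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event,
             surv = cumprod(1 - n_event / n_risk))
}

# Usage: four participants, all with an observed drop-out
km_estimate(time = c(1, 2, 2, 3), event = c(1, 1, 1, 1))
```

When every `event` is 1, as with the `censor` vector used below, `surv` reduces to the fraction of participants whose duration exceeds `t`.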

#### Log-Rank Test Comparing Retention Across Studies

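```r
nrow(userStats)  # number of participants across all studies

# Event indicator: 1 for every participant, i.e. each participant's last day
# in the study is treated as an observed drop-out (no right-censoring)
censor <- rep(1, nrow(userStats))

# Log-rank test of whether retention curves differ across studies
fit.test <- survdiff(survival::Surv(time = duration_in_study, event = censor, type = "right") ~ study,
                     data = userStats)
fit.test

# Kaplan-Meier estimates per study, used for the plot below
fit.plot <- survfit(survival::Surv(time = duration_in_study, event = censor, type = "right") ~ study,
                    data = userStats)
fit.plot
```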
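`survdiff()` with `~ study` runs a k-sample log-rank test: at each event time it compares the observed drop-outs in each group with the number expected if all groups shared one survival curve. A two-group version written out by hand makes the computation explicit. This is a sketch for intuition (`logrank_two_group()` is hypothetical); for the eight-study comparison, `survdiff()` handles all groups at once.

```r
# Hypothetical helper: two-group log-rank test, written out for illustration.
logrank_two_group <- function(time, event, group) {
  group <- factor(group)
  if (nlevels(group) != 2) stop("expected exactly two groups")
  in_g1 <- group == levels(group)[1]
  o_minus_e <- 0  # running sum of observed - expected events in group 1
  variance  <- 0  # running sum of hypergeometric variances
  for (t in sort(unique(time[event == 1]))) {
    at_risk <- time >= t
    n  <- sum(at_risk)                         # total at risk just before t
    n1 <- sum(at_risk & in_g1)                 # at risk in group 1
    d  <- sum(time == t & event == 1)          # total events at t
    d1 <- sum(time == t & event == 1 & in_g1)  # events at t in group 1
    o_minus_e <- o_minus_e + d1 - d * n1 / n
    if (n > 1) {
      variance <- variance + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    }
  }
  chisq <- o_minus_e^2 / variance
  list(chisq = chisq, p.value = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Usage on toy data (0 = censored)
days  <- c(3, 5, 5, 8, 10, 2, 4, 4, 6, 9)
event <- c(1, 1, 0, 1, 1, 1, 1, 1, 0, 1)
group <- rep(c("A", "B"), each = 5)
logrank_two_group(days, event, group)
```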

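```r
# Strata follow the factor levels of `study`, which need not match the row
# order of STUDY_COLS, so look up colors and labels by strata name
strata_studies <- sub("^study=", "", names(fit.plot$strata))
strata_colors  <- STUDY_COLS$color[match(strata_studies, STUDY_COLS$study)]

p1 <- ggsurvplot(fit.plot, pval = FALSE, conf.int = TRUE,
                 xlab = "Duration in study",
                 palette = strata_colors,
                 risk.table = FALSE,
                 risk.table.height = 0.3,
                 risk.table.y.text = FALSE,
                 legend = "right",
                 legend.labs = strata_studies,
                 surv.median.line = "hv", ggtheme = theme_bw(base_size = 15))
p1
```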
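`surv.median.line = "hv"` draws each study's median retention on the plot. To report those medians as numbers, `summary()` of a stratified `survfit` object returns a `table` matrix with a `median` column per stratum. A small wrapper (hypothetical name `median_by_stratum()`, assuming the fit has strata) that strips the `variable=` prefix from the row names:

```r
library(survival)

# Hypothetical helper: median survival time per stratum of a stratified survfit object.
median_by_stratum <- function(fit) {
  tab <- summary(fit)$table
  # survfit names strata "<variable>=<level>"; keep only the level
  setNames(tab[, "median"], sub("^[^=]*=", "", rownames(tab)))
}

# Usage on toy data with no censoring
days <- c(1, 2, 3, 4, 5, 6)
arm  <- rep(c("a", "b"), each = 3)
fit  <- survfit(Surv(days, rep(1, 6)) ~ arm)
median_by_stratum(fit)
```

On the study data, `median_by_stratum(fit.plot)` would give one median duration per study.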