In [1]:
import numpy as np
import pandas as pd
from tableone import TableOne

import sys
sys.path.append('src/')  # make the local tir module under src/ importable
import tir

## Load dataSample

An input dataset for *tir* is a dataframe containing the time of each CGM reading along with any time-dependent or time-independent covariates of interest. The input dataset should be in long format, where
each row contains the information for one time interval in which a CGM reading is observed. As such, multiple rows in the dataset may correspond to one subject.

*dataSample* is a simulated dataset obtained by perturbing a real dataset. *dataSample* contains 50 subjects. Each row of *dataSample* contains values for the following eight variables:

- *patient_id*: the unique identifier for each subject.
- *glucose*: the CGM reading.
- *value_in_range_70_180*: indicates whether the CGM reading is in the 70-180 mg/dL range (=1) or not (=0).
- *time*: the starting time, in minutes, of the CGM reading interval.
- *time2*: the ending time, in minutes, of the CGM reading interval.
- *x1*: a binary covariate.
- *x2*: a continuous covariate.
- *event*: indicates whether the patient is discharged (event = 1) or not (event = 0) at the end of the CGM reading interval.
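To make the long format concrete, here is a minimal sketch that builds a small toy dataframe with the same column names and runs two basic sanity checks on it. The toy values, the helper name `check_long_format`, and the choice of inclusive range bounds are assumptions made for illustration; they are not part of the *tir* package.

```python
import pandas as pd

# Hypothetical toy data: two subjects with readings every 5 minutes.
toy = pd.DataFrame({
    "patient_id": ["A", "A", "A", "B", "B"],
    "glucose": [95, 185, 120, 60, 150],
    "time": [0, 5, 10, 0, 5],        # start of each reading interval (minutes)
    "time2": [5, 10, 15, 5, 10],     # end of each reading interval (minutes)
    "x1": [1, 1, 1, 0, 0],
    "x2": [0.4, 0.4, 0.4, -1.2, -1.2],
    "event": [0, 0, 1, 0, 0],        # 1 = discharged at the end of the interval
})

# In-range indicator; treating both bounds as inclusive is an assumption.
toy["value_in_range_70_180"] = toy["glucose"].between(70, 180).astype(int)


def check_long_format(df):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Every interval should end after it starts.
    if not (df["time2"] > df["time"]).all():
        problems.append("some intervals have time2 <= time")
    # A subject can be discharged at most once.
    events_per_subject = df.groupby("patient_id")["event"].sum()
    if (events_per_subject > 1).any():
        problems.append("some subjects have more than one discharge event")
    return problems


problems = check_long_format(toy)
print(toy)
print("problems:", problems)
```

Subject A contributes three rows and subject B two, which is what "multiple rows per subject" means in practice.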



In [2]:
data = pd.read_csv('data/dataSample.csv')
data.head()

Unnamed: 0,patient_id,glucose,value_in_range_70_180,time,time2,x1,x2,event
0,EM016,87,1,0,5,1,0.606429,False
1,EM016,90,1,5,10,1,0.606429,False
2,EM016,93,1,10,15,1,0.606429,False
3,EM016,97,1,15,20,1,0.606429,False
4,EM016,103,1,20,25,1,0.606429,False


## Baseline characteristics

We can check the baseline characteristics of *dataSample* using the *TableOne* class from the Python package *tableone*.

In [3]:
data_baseline = data.groupby('patient_id').first().reset_index()
columns = ['x1', 'x2']
categorical = ['x1']
table = TableOne(data_baseline, columns=columns, categorical=categorical)
table

Unnamed: 0,Unnamed: 1,Missing,Overall
n,,,41
"x1, n (%)",0.0,0.0,19 (46.3)
"x1, n (%)",1.0,,22 (53.7)
"x2, mean (SD)",,0.0,-0.0 (1.0)

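One detail worth checking before building a baseline table this way: pandas `GroupBy.first()` returns the first non-null value of each column within a group, which matches the first row only when no values are missing. The sketch below, on hypothetical toy data, checks that the covariates are constant within each subject and shows an alternative that takes the literal first row per subject.

```python
import pandas as pd

# Hypothetical toy data with time-independent covariates x1 and x2.
toy = pd.DataFrame({
    "patient_id": ["A", "A", "B", "B"],
    "time": [0, 5, 0, 5],
    "x1": [1, 1, 0, 0],
    "x2": [0.4, 0.4, -1.2, -1.2],
})
covariates = ["x1", "x2"]

# Number of distinct values of each covariate within each subject.
n_distinct = toy.groupby("patient_id")[covariates].nunique()
time_independent = bool((n_distinct <= 1).all().all())

# First non-null value per column and subject (what GroupBy.first() does).
baseline = toy.groupby("patient_id").first().reset_index()

# Literal first row per subject, ordered by time.
first_rows = toy.sort_values(["patient_id", "time"]).drop_duplicates("patient_id")

print(time_independent)
print(baseline)
```

When the covariates are time-independent and complete, as in this toy example, both approaches give the same baseline rows.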

## Calculate mean TIR with naive method
Function *naive_est()* in our package calculates the naive estimator for mean TIR. It takes seven arguments:

- data: a dataframe with the format similar to dataSample.
- min_time: lower bound of time window of interest. The default is zero.
- max_time: upper bound of time window of interest.
- boot: number of bootstrap replicates used to estimate the standard error. The default is 'NULL', which indicates that the bootstrap is not conducted.
- id_col: parameter that indicates the column name of the subject identifier in data. The default is 'patient_id'.
- time: parameter that indicates the column name of the time at which each CGM reading is taken in data. The default is 'time'.
- value_in_range: parameter that indicates the column name of the indicator that the CGM reading is within the target range.

The following code calculates mean TIR by the naive method with glucose range 70-180 mg/dL, a time window of 0-4200 minutes (roughly the first 3 days), and 20 bootstrap replicates.

In [8]:
est_naive = tir.naive_est(data, max_time = 4200, boot = 20, value_in_range = 'value_in_range_70_180')

*naive_est()* returns a list structured as [estimate, standard error].

In [5]:
est_naive

[0.570091724124689, 0.043516099101601285]
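To give a sense of what a naive estimate with a bootstrap standard error can look like, here is a minimal sketch. It is an assumption about one plausible definition (the average of per-subject proportions of in-range readings, with subjects resampled with replacement), not the implementation used inside *tir*. The function name `naive_tir_sketch`, the `seed` argument, the default column name `"value_in_range"`, and the half-open time window are all hypothetical.

```python
import numpy as np
import pandas as pd


def naive_tir_sketch(df, max_time, min_time=0, boot=None, id_col="patient_id",
                     time="time", value_in_range="value_in_range", seed=0):
    """Illustrative only: mean of per-subject in-range proportions."""
    # Keep readings that start inside the window [min_time, max_time).
    window = df[(df[time] >= min_time) & (df[time] < max_time)]
    # Proportion of in-range readings for each subject.
    per_subject = window.groupby(id_col)[value_in_range].mean()
    estimate = float(per_subject.mean())
    if boot is None:
        return [estimate, None]
    # Resample subjects with replacement; because the estimate is a mean of
    # per-subject values, resampling those values is equivalent.
    rng = np.random.default_rng(seed)
    values = per_subject.to_numpy()
    reps = [rng.choice(values, size=len(values), replace=True).mean()
            for _ in range(boot)]
    return [estimate, float(np.std(reps, ddof=1))]


# Hypothetical toy data: three subjects with different numbers of readings.
toy = pd.DataFrame({
    "patient_id": ["A", "A", "A", "A", "B", "B", "C", "C"],
    "time": [0, 5, 10, 15, 0, 5, 0, 5],
    "value_in_range": [1, 0, 1, 1, 0, 0, 1, 1],
})
est, se = naive_tir_sketch(toy, max_time=20, boot=200)
print(est, se)
```

Subjects who are discharged early contribute fewer readings to such an average, which is the situation the proposed estimators in the following sections are designed to handle.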

## Calculate mean TIR with the proposed method under non-informative patient's early discharge assumption

Function *proposed_est_noninfo()* in our package calculates the proposed estimator for mean TIR under the non-informative patient's early discharge assumption. The arguments of *proposed_est_noninfo()* are the same as those of *naive_est()*.

The following code calculates mean TIR by the proposed method under the non-informative patient's early discharge assumption with glucose range 70-180 mg/dL, a time window of 0-4200 minutes (roughly the first 3 days), and 20 bootstrap replicates.

In [6]:
est_proposed_noninformative = tir.proposed_est_noninfo(data, max_time = 4200, boot = 20, value_in_range = 'value_in_range_70_180')

*proposed_est_noninfo()* returns a list structured as [estimate, standard error].

In [7]:
est_proposed_noninformative

[0.5802970829721782, 0.04086405187765693]

## Calculate mean TIR with the proposed method under Cox-model-based patient's early discharge assumption

Function *proposed_est_cox()* in our package calculates the proposed estimator for mean TIR under the Cox-model-based patient's early discharge assumption. In addition to the arguments "data", "min_time", "max_time", "boot", "id_col", and "value_in_range" used in *naive_est()* and *proposed_est_noninfo()*, *proposed_est_cox()* takes the following additional arguments:

- start_col: parameter that indicates the column name for the lower bound of each CGM reading interval. The default value is "time".
- stop_col: parameter that indicates the column name for the upper bound of each CGM reading interval. The default value is "time2".
- event_col: parameter that indicates the column name for the indicator of the patient's discharge. The default value is "event".
- formula: parameter that indicates the covariates used to fit the Cox model. The formula has a format like "x1+x2".


The following code calculates mean TIR by the proposed method under the Cox-model-based patient's early discharge assumption with glucose range 70-180 mg/dL, a time window of 0-4200 minutes (roughly the first 3 days), 20 bootstrap replicates, and covariates "x1" and "x2".

In [10]:
# lifelines may raise warnings while fitting the Cox regression.
# They do NOT affect the results.
# Here we ignore the warnings.
# Comment out the following two lines if you want to see the warnings.
import warnings
warnings.filterwarnings('ignore')

est_proposed_cox = tir.proposed_est_cox(data, max_time = 4200, boot = 20, formula = 'x1+x2', value_in_range = 'value_in_range_70_180')
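If silencing every warning for the rest of the session feels too broad, the suppression can be limited to a single call with the standard-library `warnings.catch_warnings()` context manager. The sketch below uses `noisy_fit`, a hypothetical stand-in for a call such as `tir.proposed_est_cox(...)`, so that it runs on its own.

```python
import warnings


def noisy_fit():
    """Hypothetical stand-in for a fitting call that emits a warning."""
    warnings.warn("convergence details", UserWarning)
    return [0.5, 0.05]


# Warnings are ignored only inside this block; the previous filter
# settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_fit()

# To inspect the warnings instead of hiding them, record them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_fit()
n_caught = len(caught)

print(result, n_caught)
```

Scoping the filter this way keeps unrelated warnings elsewhere in the notebook visible.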

*proposed_est_cox()* returns a list structured as [estimate, standard error].

In [11]:
est_proposed_cox

[0.5812507573495849, 0.04282632878067873]
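Since each function returns [estimate, standard error], the three results can be put side by side with approximate 95% confidence intervals. The sketch below copies the numbers from the outputs shown above and assumes a normal approximation (estimate ± 1.96 × SE), clipped to the [0, 1] range of a proportion. The helper `wald_ci` is hypothetical and not part of *tir*; with only 20 bootstrap replicates the standard errors themselves are rough, so the intervals should be read as indicative.

```python
# [estimate, standard error] pairs copied from the outputs shown above.
results = {
    "naive": [0.570091724124689, 0.043516099101601285],
    "proposed_noninfo": [0.5802970829721782, 0.04086405187765693],
    "proposed_cox": [0.5812507573495849, 0.04282632878067873],
}


def wald_ci(estimate, se, z=1.96):
    """Normal-approximation interval, clipped to the [0, 1] range of a proportion."""
    lower = max(0.0, estimate - z * se)
    upper = min(1.0, estimate + z * se)
    return lower, upper


for name, (est, se) in results.items():
    lo, hi = wald_ci(est, se)
    print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```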