## Understanding IOBP2 Dataset
This notebook describes the structure of the IOBP2 dataset and suggests how to process the data.

## The IOBP2 study

**Title**: The Insulin-Only Bionic Pancreas Pivotal Trial: Testing the iLet in Adults and Children with Type 1 Diabetes


**Description**: This multi-center randomized control trial (RCT) will compare efficacy and safety endpoints using the insulin-only configuration of the iLet Bionic Pancreas (BP) System versus a control group using CGM during a 13-week study period.
    
**Devices**: iLet and Dexcom G6 system

**Study Population**: People with T1D ages 6+

# Data
The study data folder is named **IOBP2 RCT Public Dataset**

From the DataGlossary.rtf file, the following relevant files were identified which are stored in the **Data Tables** subfolder.

* **IOBP2DeviceiLet.txt**: All events logged on the pump, including CGM and insulin delivery
* **PtRoster.txt**: Patient Roster

These are pipe-delimited text files (`|` separator) with columns related to the iLet pump events and the Dexcom CGM. The glossary describes each column. Each file contains a limited number of columns compared to the FLAIR data. Below are the relevant columns of the device file.

## IOBP2DeviceiLet
* **PtID**: Patient ID
* **DeviceDtTm**: Local date and time on the device
* **CGMVal**: CGM glucose value
* **BGTarget**: Current target glucose level in mg/dl
* **InsDelivPrev**: Delivered insulin dose (U) of the prior executed step

## Questions


In [2]:
import os, sys, time, random
import pandas as pd
from datetime import datetime, timedelta
import numpy as np
from matplotlib import pyplot as plt

In [3]:
# Build the file paths
current_dir = os.getcwd()
original_data_path = os.path.join(current_dir, '..', 'data/raw')
cleaned_data_path = os.path.join(current_dir,  '..', 'data/cleaned')
path = os.path.join(original_data_path, 'IOBP2 RCT Public Dataset', 'Data Tables', 'IOBP2DeviceiLet.txt')

In [11]:
df_all_events = pd.read_csv(path, sep="|", low_memory=False,
                           usecols=['PtID', 'DeviceDtTm', 'CGMVal', 'BGTarget', 'InsDelivPrev', 'BasalDelivPrev',
                                    'BolusDelivPrev'])

## Check for DateTimes without Time part

In [12]:
print('Datetimes without time: ', len(df_all_events[df_all_events['DeviceDtTm'].str.len() <= 10]))

Datetimes without time:  147
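The 147 date-only values need a decision before `DeviceDtTm` can be parsed as a whole. The sketch below is one possible approach on toy strings in the same format. It assumes that a missing time part means midnight, which is only a plausible reading and is not confirmed by the glossary. The helper name `parse_device_datetime` is hypothetical.

```python
import pandas as pd

# Toy sample in the DeviceDtTm format; the last entry has no time part.
sample = pd.Series(['8/14/2020 12:01:23 AM', '8/14/2020 11:56:23 PM', '8/15/2020'])

def parse_device_datetime(values: pd.Series) -> pd.Series:
    """Parse 'M/D/YYYY h:mm:ss AM/PM' strings.

    ASSUMPTION: date-only strings are interpreted as midnight of that day.
    """
    # Rows that carry a time part; date-only rows become NaT here.
    with_time = pd.to_datetime(values, format='%m/%d/%Y %I:%M:%S %p', errors='coerce')
    # Rows that carry only a date; rows with a time part become NaT here.
    date_only = pd.to_datetime(values, format='%m/%d/%Y', errors='coerce')
    # Combine both: take the full timestamp where available, else the date.
    return with_time.fillna(date_only)

parsed = parse_device_datetime(sample)
print(parsed)
```

Any value that is still `NaT` after this step matches neither format and would need a separate look.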


## Inspecting the events

In [15]:
df_all_events.head()

Unnamed: 0,PtID,DeviceDtTm,CGMVal,BGTarget,InsDelivPrev,BasalDelivPrev,BolusDelivPrev
0,183,8/14/2020 12:01:23 AM,91.0,120,0.0,0.0,0.0
1,183,8/14/2020 12:06:23 AM,102.0,120,0.0,0.0,0.0
2,183,8/14/2020 12:11:23 AM,105.0,120,0.0,0.0,0.0
3,183,8/14/2020 12:16:23 AM,103.0,120,0.0,0.0,0.0
4,183,8/14/2020 12:21:23 AM,98.0,120,0.0,0.0,0.0
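The rows above suggest one event every five minutes. A minimal sketch for checking event counts and time steps per patient follows, using only the five rows shown above. On the full table the same calls would apply to `df_all_events`, assuming `DeviceDtTm` has been parsed first.

```python
import pandas as pd

# Rows copied from the head() output above.
events = pd.DataFrame({
    'PtID': [183, 183, 183, 183, 183],
    'DeviceDtTm': ['8/14/2020 12:01:23 AM', '8/14/2020 12:06:23 AM',
                   '8/14/2020 12:11:23 AM', '8/14/2020 12:16:23 AM',
                   '8/14/2020 12:21:23 AM'],
    'CGMVal': [91.0, 102.0, 105.0, 103.0, 98.0],
})
events['DeviceDtTm'] = pd.to_datetime(events['DeviceDtTm'], format='%m/%d/%Y %I:%M:%S %p')
events = events.sort_values(['PtID', 'DeviceDtTm'])

# Number of logged events per patient.
events_per_patient = events.groupby('PtID').size()

# Time between consecutive events of the same patient (first event has no predecessor).
step = events.groupby('PtID')['DeviceDtTm'].diff()

print(events_per_patient)
print(step.value_counts())
```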


## Find long-acting insulin users during the study period

In [16]:
path = os.path.join(original_data_path, 'IOBP2 RCT Public Dataset', 'Data Tables', 'IOBP2Insulin.txt')
df_insulin_types = pd.read_csv(path, sep="|", low_memory=False)
                           

In [17]:
df_insulin_types.head()

Unnamed: 0,RecID,PtID,InsulinName,InsRoute,InsInjectionFreq,InsTypeStart,InsTypeStartDt,InsTypeStartUnknown,InsTypeStopDt,InsTypeStopUnknown,InsTypeStartEstimate,InsTypeStopEstimate
0,2,347,Humalog (Lispro),Pump,,In use at time of enrollment,,,4/9/2020,,,
1,3,183,Degludec (Tresiba),Injection,Unknown,In use at time of enrollment,,,8/7/2020,,,
2,4,183,Novolog (Aspart),Injection,4,In use at time of enrollment,,,8/7/2020,,,
3,8,413,Humalog (Lispro),Pump,,In use at time of enrollment,,,6/17/2020,,,
4,9,183,Novolog (Aspart),Pump,,Started after enrollment,8/7/2020,,8/14/2020,,,


In [18]:
df_insulin_types.InsulinName.unique()

array(['Humalog (Lispro)', 'Degludec (Tresiba)', 'Novolog (Aspart)',
       'Novolog Fiasp', 'Lantus (Glargine) 2 times per day',
       'Lantus (Glargine) 1 time per day', 'Basaglar (Glargine, U100)',
       'Levemir (Detemir) 1 time per day', 'Admelog',
       'Levemir (Detemir) 2 times per day', 'Humulin N (NPH)',
       'Afrezza (insulin human)', 'Toujeo (Glargine, U300)',
       'Humulin 70/30', 'Regular (R) (Humulin R or Novolin R)',
       'Humalog 50/50'], dtype=object)
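Several of these names are long-acting analogs, not only the Glargine products examined next. The sketch below flags them by keyword. The keyword list (`Glargine`, `Degludec`, `Detemir`) is an assumption made for illustration; whether intermediate-acting NPH or the premixed products should also count is a judgment call that this sketch leaves open.

```python
import pandas as pd

# The unique InsulinName values shown above.
insulin_names = pd.Series([
    'Humalog (Lispro)', 'Degludec (Tresiba)', 'Novolog (Aspart)',
    'Novolog Fiasp', 'Lantus (Glargine) 2 times per day',
    'Lantus (Glargine) 1 time per day', 'Basaglar (Glargine, U100)',
    'Levemir (Detemir) 1 time per day', 'Admelog',
    'Levemir (Detemir) 2 times per day', 'Humulin N (NPH)',
    'Afrezza (insulin human)', 'Toujeo (Glargine, U300)',
    'Humulin 70/30', 'Regular (R) (Humulin R or Novolin R)',
    'Humalog 50/50',
])

# ASSUMPTION: these three keywords identify the long-acting analogs of interest.
LONG_ACTING_KEYWORDS = ['Glargine', 'Degludec', 'Detemir']
pattern = '|'.join(LONG_ACTING_KEYWORDS)

is_long_acting = insulin_names.str.contains(pattern, case=False, regex=True)
long_acting_names = insulin_names[is_long_acting].tolist()
print(long_acting_names)
```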

## Glargine users

In [20]:
glargine_users = df_insulin_types[df_insulin_types['InsulinName'].str.contains("Glargine")]
glargine_users

Unnamed: 0,RecID,PtID,InsulinName,InsRoute,InsInjectionFreq,InsTypeStart,InsTypeStartDt,InsTypeStartUnknown,InsTypeStopDt,InsTypeStopUnknown,InsTypeStartEstimate,InsTypeStopEstimate
16,24,235,Lantus (Glargine) 2 times per day,Injection,2,In use at time of enrollment,,,7/19/2020,,,
19,30,577,Lantus (Glargine) 1 time per day,Injection,1,In use at time of enrollment,,,9/16/2020,,,
23,36,233,"Basaglar (Glargine, U100)",Injection,2,In use at time of enrollment,,,9/8/2020,,,
27,43,440,"Basaglar (Glargine, U100)",Injection,1,In use at time of enrollment,,,11/2/2019,,,
28,45,554,Lantus (Glargine) 2 times per day,Injection,2,In use at time of enrollment,,,1/2/2020,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
1064,1210,593,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,10/27/2020,,,,,
1067,1213,302,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,9/23/2020,,,,,
1072,1218,366,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,12/10/2020,,,,,
1074,1220,429,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,3/17/2021,,,,,


In [24]:
# Glargine records in use at enrollment that have no stop date
no_stop_at_enrollment = glargine_users[glargine_users.InsTypeStopDt.isnull()
                                       & (glargine_users.InsTypeStart == 'In use at time of enrollment')]
display(no_stop_at_enrollment)
print('Number of participants on Glargine at Start with no Stop: ', len(no_stop_at_enrollment))

Unnamed: 0,RecID,PtID,InsulinName,InsRoute,InsInjectionFreq,InsTypeStart,InsTypeStartDt,InsTypeStartUnknown,InsTypeStopDt,InsTypeStopUnknown,InsTypeStartEstimate,InsTypeStopEstimate
73,105,215,Lantus (Glargine) 1 time per day,Injection,1,In use at time of enrollment,,,,,,
123,166,216,Lantus (Glargine) 1 time per day,Injection,Unknown,In use at time of enrollment,,,,,,
127,172,31,Lantus (Glargine) 1 time per day,Injection,Unknown,In use at time of enrollment,,,,,,
140,189,287,"Basaglar (Glargine, U100)",Injection,1,In use at time of enrollment,,,,,,
144,193,422,Lantus (Glargine) 1 time per day,Injection,Unknown,In use at time of enrollment,,,,,,
154,204,52,Lantus (Glargine) 1 time per day,Injection,1,In use at time of enrollment,,,,,,
169,219,201,Lantus (Glargine) 1 time per day,Injection,1,In use at time of enrollment,,,,,,
170,220,566,Lantus (Glargine) 1 time per day,Injection,1,In use at time of enrollment,,,,,,
199,256,320,Lantus (Glargine) 1 time per day,Injection,1,In use at time of enrollment,,,,,,
209,267,368,Lantus (Glargine) 1 time per day,Injection,1,In use at time of enrollment,,,,,,


Number of participants on Glargine at Start with no Stop:  29
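Note that `len(...)` counts records, not participants, and a participant can have several insulin records (PtID 235 appears in multiple Glargine rows in this notebook). A small sketch of the difference, using a handful of `RecID`/`PtID` pairs taken from the outputs shown here:

```python
import pandas as pd

# RecID/PtID pairs copied from Glargine rows displayed in this notebook.
records = pd.DataFrame({
    'RecID': [24, 77, 670, 804, 30],
    'PtID': [235, 235, 235, 235, 577],
})

n_records = len(records)                   # one per row
n_participants = records['PtID'].nunique() # one per distinct patient

print('records:', n_records, 'participants:', n_participants)
```

Whether the two numbers differ for the filtered table above would have to be checked on the full data.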


In [26]:
display(glargine_users[glargine_users.InsTypeStart == 'Started after enrollment'])

Unnamed: 0,RecID,PtID,InsulinName,InsRoute,InsInjectionFreq,InsTypeStart,InsTypeStartDt,InsTypeStartUnknown,InsTypeStopDt,InsTypeStopUnknown,InsTypeStartEstimate,InsTypeStopEstimate
47,67,76,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,7/20/2020,,1/21/2021,,,
50,72,362,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,2/4/2020,,,,,
54,77,235,Lantus (Glargine) 2 times per day,Injection,2,Started after enrollment,7/26/2020,,10/11/2020,,,
58,81,440,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,11/8/2019,,3/15/2020,,,
65,97,233,"Basaglar (Glargine, U100)",Injection,2,Started after enrollment,9/12/2020,,3/2/2021,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
1064,1210,593,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,10/27/2020,,,,,
1067,1213,302,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,9/23/2020,,,,,
1072,1218,366,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,12/10/2020,,,,,
1074,1220,429,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,3/17/2021,,,,,


In [27]:
display(glargine_users[(glargine_users.InsTypeStopDt.isnull()) & (glargine_users.InsTypeStart == 'Started after enrollment')])

Unnamed: 0,RecID,PtID,InsulinName,InsRoute,InsInjectionFreq,InsTypeStart,InsTypeStartDt,InsTypeStartUnknown,InsTypeStopDt,InsTypeStopUnknown,InsTypeStartEstimate,InsTypeStopEstimate
50,72,362,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,2/4/2020,,,,,
618,741,148,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,12/5/2020,,,,,
656,781,59,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,12/9/2020,,,,,
666,791,304,Lantus (Glargine) 1 time per day,Injection,Unknown,Started after enrollment,12/4/2020,,,,,
668,793,135,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,6/6/2020,,,,,
679,804,235,Lantus (Glargine) 2 times per day,Injection,2,Started after enrollment,1/13/2021,,,,,
725,852,76,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,1/22/2021,,,,,
782,921,236,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,8/27/2020,,,,,
825,968,315,"Toujeo (Glargine, U300)",Injection,1,Started after enrollment,9/27/2020,,,,,
827,970,314,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,9/14/2020,,,,,


In [28]:
display(glargine_users[(glargine_users.InsTypeStopDt.notnull()) & (glargine_users.InsTypeStart == 'Started after enrollment')])

Unnamed: 0,RecID,PtID,InsulinName,InsRoute,InsInjectionFreq,InsTypeStart,InsTypeStartDt,InsTypeStartUnknown,InsTypeStopDt,InsTypeStopUnknown,InsTypeStartEstimate,InsTypeStopEstimate
47,67,76,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,7/20/2020,,1/21/2021,,,
54,77,235,Lantus (Glargine) 2 times per day,Injection,2,Started after enrollment,7/26/2020,,10/11/2020,,,
58,81,440,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,11/8/2019,,3/15/2020,,,
65,97,233,"Basaglar (Glargine, U100)",Injection,2,Started after enrollment,9/12/2020,,3/2/2021,,,
120,162,59,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,7/29/2020,,9/9/2020,,,
485,593,304,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,9/6/2020,,9/16/2020,,,
525,641,157,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,2/7/2020,,2/11/2020,,,
550,670,235,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,11/18/2020,,11/18/2020,,,
565,686,502,Lantus (Glargine) 1 time per day,Injection,1,Started after enrollment,6/30/2020,,7/1/2020,,,
593,715,534,"Basaglar (Glargine, U100)",Injection,1,Started after enrollment,4/21/2020,,4/23/2020,,,
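Some of these records span only a few days, or a single day. A rough sketch for quantifying the duration of use from the start and stop dates follows, using four rows from the output above. The threshold `SHORT_USE_DAYS` is a hypothetical value chosen for illustration, not something defined by the study.

```python
import pandas as pd

# Start/stop dates copied from four rows of the output above.
glargine_started = pd.DataFrame({
    'PtID': [76, 157, 235, 502],
    'InsTypeStartDt': ['7/20/2020', '2/7/2020', '11/18/2020', '6/30/2020'],
    'InsTypeStopDt': ['1/21/2021', '2/11/2020', '11/18/2020', '7/1/2020'],
})

for col in ['InsTypeStartDt', 'InsTypeStopDt']:
    glargine_started[col] = pd.to_datetime(glargine_started[col], format='%m/%d/%Y')

# Whole days between start and stop date.
glargine_started['DaysOnGlargine'] = (
    glargine_started['InsTypeStopDt'] - glargine_started['InsTypeStartDt']
).dt.days

# HYPOTHETICAL threshold: flag records covering a week or less.
SHORT_USE_DAYS = 7
glargine_started['ShortUse'] = glargine_started['DaysOnGlargine'] <= SHORT_USE_DAYS

print(glargine_started)
```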


In [29]:
path = os.path.join(original_data_path, 'IOBP2 RCT Public Dataset', 'Data Tables', 'IOBP2DiabTreatment.txt')
df_treatment = pd.read_csv(path, sep="|", low_memory=False)
df_treatment.head()                      

Unnamed: 0,RecID,PtID,ParentLoginVisitID,Visit,InsModPump,InsModInjections,InsModInhaled,InsModNone,PumpUse,PumpType,...,UnitsInsBasilOrLongAct,UnitsInsBasilOrLongActUnk,NumPumpBolusOrShortAct,NumPumpBolusOrShortActUnk,BGTestAvgNumMeter,BGTestMetDatNotAvail,BGTestAvgNumPtRep,BGTestPtRepNotAvail,BGTestMeterFreq,BGTestPtRepFreq
0,1,71,1291,Week 2,1.0,,,,,Beta Bionics Gen 4 iLet,...,70.0,,3.0,,15.0,,15.0,,per week,per week
1,2,529,1350,Week 2,1.0,,,,,Beta Bionics Gen 4 iLet,...,,1.0,4.0,,0.0,,0.0,,per day,per day
2,3,586,1351,Week 2,1.0,,,,,Beta Bionics Gen 4 iLet,...,,1.0,4.0,,4.0,,,1.0,per week,
3,4,23,1352,Week 2,1.0,,,,,Beta Bionics Gen 4 iLet,...,,1.0,3.0,,0.0,,0.0,,per day,per day
4,5,254,1354,Week 2,1.0,,,,,Beta Bionics Gen 4 iLet,...,,1.0,4.0,,0.0,,0.0,,per day,per day


In [30]:
df_treatment.columns

Index(['RecID', 'PtID', 'ParentLoginVisitID', 'Visit', 'InsModPump',
       'InsModInjections', 'InsModInhaled', 'InsModNone', 'PumpUse',
       'PumpType', 'PumpTypeUnk', 'UnitsInsTotal', 'UnitsInsUnk',
       'UnitsInsBasilOrLongAct', 'UnitsInsBasilOrLongActUnk',
       'NumPumpBolusOrShortAct', 'NumPumpBolusOrShortActUnk',
       'BGTestAvgNumMeter', 'BGTestMetDatNotAvail', 'BGTestAvgNumPtRep',
       'BGTestPtRepNotAvail', 'BGTestMeterFreq', 'BGTestPtRepFreq'],
      dtype='object')

In [31]:
df_treatment[df_treatment.UnitsInsBasilOrLongAct>0]

Unnamed: 0,RecID,PtID,ParentLoginVisitID,Visit,InsModPump,InsModInjections,InsModInhaled,InsModNone,PumpUse,PumpType,...,UnitsInsBasilOrLongAct,UnitsInsBasilOrLongActUnk,NumPumpBolusOrShortAct,NumPumpBolusOrShortActUnk,BGTestAvgNumMeter,BGTestMetDatNotAvail,BGTestAvgNumPtRep,BGTestPtRepNotAvail,BGTestMeterFreq,BGTestPtRepFreq
0,1,71,1291,Week 2,1.0,,,,,Beta Bionics Gen 4 iLet,...,70.0,,3.0,,15.0,,15.0,,per week,per week
5,6,287,1355,Week 2,,1.0,,,,,...,40.0,,5.0,,,1.0,1.0,,,per week
7,8,217,1359,Week 2,1.0,,,,,Beta Bionics Gen 4 iLet,...,25.8,,4.0,,1.0,,1.0,,per day,per week
8,9,24,1414,Week 2,1.0,,,,,Tandem t:slim X2,...,21.8,,8.0,,,1.0,1.0,,,per day
9,10,212,1424,Week 2,1.0,,,,,Medtronic 670G,...,24.9,,5.0,,,1.0,2.0,,,per week
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
491,492,330,6202,Week 13,1.0,,,,,Tandem t:slim X2,...,27.9,,20.0,,0.0,,0.0,,per day,per day
492,493,559,6207,Week 13,1.0,,,,,Tandem t:slim X2 with Control:IQ,...,73.5,,6.0,,0.0,,0.0,,per day,per day
493,494,230,6237,Week 13,1.0,,,,,Tandem t:slim X2,...,23.9,,6.0,,0.0,,0.0,,per day,per day
494,495,239,6274,Week 13,1.0,,,,,Insulet OmniPod Insulin Management System,...,29.1,,5.0,,,1.0,,1.0,,
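The records with reported basal or long-acting units span several pump types and visits. One way to get an overview is to summarize them per `Visit` and `PumpType`. The sketch below uses only the rows and columns visible in the output above; the label `'No pump recorded'` for a missing `PumpType` is a placeholder chosen for this example.

```python
import pandas as pd

# Rows transcribed from the output above (visible columns only).
treatment = pd.DataFrame({
    'PtID': [71, 287, 217, 24, 212, 330, 559, 230, 239],
    'Visit': ['Week 2'] * 5 + ['Week 13'] * 4,
    'PumpType': ['Beta Bionics Gen 4 iLet', None, 'Beta Bionics Gen 4 iLet',
                 'Tandem t:slim X2', 'Medtronic 670G', 'Tandem t:slim X2',
                 'Tandem t:slim X2 with Control:IQ', 'Tandem t:slim X2',
                 'Insulet OmniPod Insulin Management System'],
    'UnitsInsBasilOrLongAct': [70.0, 40.0, 25.8, 21.8, 24.9, 27.9, 73.5, 23.9, 29.1],
})

# Placeholder label so that rows without a pump type are kept in the grouping.
treatment['PumpType'] = treatment['PumpType'].fillna('No pump recorded')

summary = (treatment[treatment['UnitsInsBasilOrLongAct'] > 0]
           .groupby(['Visit', 'PumpType'])['UnitsInsBasilOrLongAct']
           .agg(['count', 'median'])
           .reset_index())
print(summary)
```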
