# TIR Collection

## Loading relevant libraries

`tidyverse` is a collection of R packages used to read, analyze, and plot data.

`rGV` is an R package for reading and analyzing continuous glucose monitor (CGM) data in different formats. The paper linked below explains how and why the package was created, as well as its relevance to clinical studies.


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9631526/

In [3]:
library(tidyverse)
library(rGV)

── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: chron


Attaching package: 'chron'


The following objects are 

## Dexcom
Because the .csv files exported by Dexcom and Freestyle Libre are formatted differently, reading and wrangling must be carried out differently for each. Once the glucose values and timestamps have been extracted, the process for TIR collection is the same for both CGMs.
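
As a rough illustration of what the range metrics measure, the sketch below computes the percentage of readings that fall inside a glucose interval using base R only. The helper `percent_in_range` and the readings are hypothetical, and the inclusive bounds are an assumption; this is not rGV's `tir` implementation.

```r
# Hypothetical helper (not part of rGV): percentage of readings within [low, high].
percent_in_range <- function(glucose, low, high) {
  glucose <- glucose[!is.na(glucose)]           # ignore missing readings
  100 * mean(glucose >= low & glucose <= high)  # inclusive bounds (an assumption)
}

# Made-up readings in mmol/L
readings <- c(3.2, 4.5, 6.1, 7.8, 9.9, 10.4, 12.0, 5.5)

tir_pct <- percent_in_range(readings, low = 3.9, high = 10.0)    # time in range
tbr_pct <- percent_in_range(readings, low = 0.0, high = 3.8)     # time below range
tar_pct <- percent_in_range(readings, low = 10.01, high = Inf)   # time above range

print(c(TIR = tir_pct, TBR = tbr_pct, TAR = tar_pct))
```

With these cut-offs, a reading strictly between 3.8 and 3.9, or between 10.0 and 10.01, would be counted in none of the three categories, so the percentages sum to 100 only when readings are recorded to one decimal place.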

### Defining Function

In [38]:
dexcom <- function(file) {
    suppressWarnings({
        read <- read_csv(file, show_col_types = FALSE)
        
        # renaming columns for ease of modifying
        names(read)[8] = 'Glucose_Value'
        names(read)[2] = 'Timestamp'
        names(read)[14] = 'Transmitter_ID'
        names(read)[5] = 'Patient_Info'
        
        # creating string based on user name and birthdate for id
        info <- filter(read, !is.na(Patient_Info))
        info_string <- info %>% pull(Patient_Info)
        id_string <- paste(info_string, collapse = '')
        
        # dropping the leading metadata rows (user's name, alert types), which have
        # no transmitter ID, along with any rows that lack a glucose value
        rows <- filter(read, !is.na(Transmitter_ID), !is.na(Glucose_Value))
        
        # selecting only timestamp and glucose value columns
        cols <- select(rows, Timestamp, Glucose_Value)
        
        # pulling vectors from columns
        glucose_vector_str <- cols %>% pull(Glucose_Value)
        glucose_vector <- as.double(glucose_vector_str) # text to numeric; non-numeric entries become NA
        
        time_vector <- cols %>% pull(Timestamp)
        times <- minute(time_vector) # minute-of-hour (0-59) of each timestamp
        
        # using rGV library to perform calculations
        MAG <- mag(x=glucose_vector, times=times) # mean absolute glucose
        CV <- cv(x=glucose_vector, times=times, overall=TRUE) # coefficient of variation
        SD <- st_dev(x=glucose_vector, times=times, overall=TRUE) # standard deviation
        TIR <- tir(x=glucose_vector, low=3.9, high=10.0) # time in range
        TBR <- tir(x=glucose_vector, low=0.0, high=3.8) # time below range
        TAR <- tir(x=glucose_vector, low=10.01) # time above range
        
        # one-row data frame of metrics, returned as the function's value
        data.frame(id_string, TIR, TBR, TAR, MAG, SD, CV)
        
    })
}
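
The `times` vector in the function above holds the minute component of each timestamp, which wraps back to 0 at every hour. If the rGV functions expect elapsed time in minutes (an assumption worth checking against the package documentation), the base R sketch below shows one way to derive it. The timestamps are made up for illustration.

```r
# Made-up timestamps five minutes apart, crossing an hour boundary
stamps <- as.POSIXct(c("2023-01-01 10:50:00", "2023-01-01 10:55:00",
                       "2023-01-01 11:00:00", "2023-01-01 11:05:00"),
                     tz = "UTC")

# Minute-of-hour wraps around when the hour changes
minute_of_hour <- as.POSIXlt(stamps)$min

# Elapsed minutes since the first reading keep increasing
elapsed_minutes <- as.numeric(difftime(stamps, stamps[1], units = "mins"))

print(minute_of_hour)
print(elapsed_minutes)
```

Time-dependent metrics such as MAG would differ between the two representations, so it may be worth comparing results from both on a single file before processing a whole folder.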

### Applying Function
Calling the `dexcom` function on all .csv files in the Dexcom folder to create a data frame with all data from participants who use Dexcom. 

In the final data frame and .csv file, which combine Dexcom and Freestyle Libre users, each participant's `id_string` (their name and birthday) is not visible; it is replaced by a unique ID.

In [39]:
# output_file <- 'test_df.csv' (used for testing purposes)

dex_data <- list.files(path = 'data/Dexcom',
                       pattern = "\\.csv$", full.names = TRUE) %>% # pattern is a regular expression
  lapply(dexcom) %>%
  bind_rows()
  
dex_data

| id_string | TIR | TBR | TAR | MAG | SD | CV |
|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| JonDoe1/3/1959 | 49.54582 | 1.472793 | 48.98139 | 4265.263 | 3.764559 | 0.3655763 |
| JonDoe1/3/1959 | 49.537 | 1.472793 | 48.99021 | 4256.526 | 3.764705 | 0.3655777 |
| LisaDoe1/3/1960 | 49.537 | 1.472793 | 48.99021 | 4256.526 | 3.764705 | 0.3655777 |


## Freestyle Libre

### Defining Function

In [40]:
libre <- function(file) {
    suppressWarnings({
        
        # initial reading to create id
        read_id <- read_csv(file, show_col_types=FALSE)
        names(read_id)[1] = 'Patient_report'
        names(read_id)[2] = 'Generated_on'
        
        # creating string based on user name and birthdate for id
        info <- filter(read_id, Patient_report != 'FreeStyle Libre 2') %>%
                filter(Patient_report != 'Device') %>%
                select(Patient_report, Generated_on)
        patient_string <- info %>% pull(Patient_report)
        date_string <- info %>% pull(Generated_on)
        id_string <- paste(patient_string, date_string, collapse = '')
        
        read <- read_csv(file, skip=2, show_col_types=FALSE)
        
        # renaming columns for ease of modifying
        names(read)[3]='Timestamp'
        names(read)[5]='Glucose_Value'
        
        # removing NA values in Glucose_Value column
        rows <- filter(read, !is.na(Glucose_Value))
        
        # selecting only timestamp and glucose value columns
        cols <- select(rows, Timestamp, Glucose_Value)
        
        # pulling vectors from columns
        glucose_vector <- cols %>% pull(Glucose_Value)
        
        time_vector <- cols %>% pull(Timestamp)
        time_dttm <- ymd_hms(time_vector)
        times <- minutes(time_dttm)
        
        # using rGV library to perform calculations
        MAG <- mag(x=glucose_vector, times=times) # mean absolute glucose
        CV <- cv(x=glucose_vector, times=times, overall=TRUE) # coefficient of variation
        SD <- st_dev(x=glucose_vector, times=times, overall=TRUE) # standard deviation
        TIR <- tir(x=glucose_vector, low=3.9, high=10.0) # time in range
        TBR <- tir(x=glucose_vector, low=0.0, high=3.8) # time below range
        TAR <- tir(x=glucose_vector, low=10.01) # time above range
        
        # one-row data frame of metrics, returned as the function's value
        data.frame(id_string, TIR, TBR, TAR, MAG, SD, CV)
        
    })
}

### Applying Function

Calling the `libre` function on all .csv files in the Libre folder to create a data frame with all data from participants who use Freestyle Libre.

In [41]:
# output_file <- 'final_df.csv' (used for testing purposes)

libre_data <- list.files(path = 'data/Libre',
                         pattern = "\\.csv$", full.names = TRUE) %>% # pattern is a regular expression
  lapply(libre) %>%
  bind_rows()

libre_data

New names:
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
• `` -> `...14`
• `` -> `...15`
• `` -> `...16`
• `` -> `...17`
• `` -> `...18`
• `` -> `...19`


| id_string | TIR | TBR | TAR | MAG | SD | CV |
|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Jonathan Doe 11-09-1971 | 98.65501 | 0.549923806 | 0.7950706 | 12995.7391 | 1.137257 | 0.1811168 |
| Jon Doe 19-11-1963 | 56.29213 | 0.0 | 43.7078652 | 805.3043 | 1.248841 | 0.1241643 |
| Lisa Doe 27-11-1964 | 61.84714 | 0.006173602 | 38.1466848 | 15482.3478 | 1.954583 | 0.2011599 |


## Putting it all together

The data frames for Dexcom users and Freestyle Libre users are bound into a single data frame, which is then written to a .csv file that can be downloaded.

In [42]:
output_file <- 'new_df.csv'

collection <- rbind(dex_data, libre_data)

collection_id <- transform(collection, ID = as.numeric(factor(id_string)))

final_collection <- collection_id %>% select(TIR, TBR, TAR, MAG, SD, CV, ID) %>% arrange(ID)
final_collection
write.csv(final_collection, output_file)

| TIR | TBR | TAR | MAG | SD | CV | ID |
|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 56.29213 | 0.0 | 43.7078652 | 805.3043 | 1.248841 | 0.1241643 | 1 |
| 98.65501 | 0.549923806 | 0.7950706 | 12995.7391 | 1.137257 | 0.1811168 | 2 |
| 49.54582 | 1.472793015 | 48.9813917 | 4265.2632 | 3.764559 | 0.3655763 | 3 |
| 49.537 | 1.472793015 | 48.9902108 | 4256.5263 | 3.764705 | 0.3655777 | 3 |
| 61.84714 | 0.006173602 | 38.1466848 | 15482.3478 | 1.954583 | 0.2011599 | 4 |
| 49.537 | 1.472793015 | 48.9902108 | 4256.5263 | 3.764705 | 0.3655777 | 5 |


## A note on user ID

A participant's ID is generated from the unique string formed by their name (first and last) and birthday, which is found in their point-in-time .csv output file. In the final data frame, this string is omitted to keep participants anonymous.

This will be particularly beneficial when comparing a participant's time in range at the three-month and six-month mark. The data frames from each mark can be bound together with the ID number arranged sequentially so that a participant's data can be easily compared. 

As an example, the `id_string` was identical for two of the Dexcom files, which is reflected in the two `3` values in the ID column of the `final_collection` data frame above.
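
The ID assignment and the three-month versus six-month comparison described above can be sketched in base R as follows. The participant strings, the TIR values, and the `mark` column are hypothetical and only illustrate the idea; they are not taken from the study data.

```r
# Hypothetical summaries from two collection points
three_month <- data.frame(id_string = c("participant_a", "participant_b"),
                          TIR = c(60, 72))
six_month <- data.frame(id_string = c("participant_b", "participant_a"),
                        TIR = c(75, 64))

# Bind first, then derive the ID, so identical strings share one number
both <- rbind(cbind(three_month, mark = "3 months"),
              cbind(six_month, mark = "6 months"))
both$ID <- as.numeric(factor(both$id_string))

# Drop the identifying string and arrange so each participant's rows are adjacent
both <- both[order(both$ID, both$mark), c("ID", "mark", "TIR")]
print(both)
```

Because `factor` numbers its levels in sorted order, the IDs are only consistent across collection points if they are derived after the data frames have been bound together.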