# IVSA data processing of Cohort01

**Note:**
The original IVSA file !2019-04-05 was edited to remove the Cohort02 data.

**METADATA:**

|Sex    | ICSS+ShA | ICSS+LgA   | NO-ICSS+ShA | NO-ICSS+LgA |
|:-----:|:--------:|:----------:|:-----------:|:-----------:|
|Males  | SG17     | SG15       | SG11        | SG23        |
|Females| SG14     | SG24, SG20 | SG16        | SG26        |
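
For the annotation step, the table above can be encoded as a lookup table keyed by subject ID. The following is a minimal base R sketch, not the method used in this notebook; the object name `IVSA_metadata` and the column names `sex`, `icss`, and `access` are illustrative assumptions.

```r
# Hypothetical lookup table transcribed from the METADATA table above.
# Column names (sex, icss, access) are illustrative assumptions.
IVSA_metadata <- data.frame(
  subjectID = c("SG17", "SG15", "SG11", "SG23",
                "SG14", "SG24", "SG20", "SG16", "SG26"),
  sex       = c(rep("MALES", 4), rep("FEMALES", 5)),
  icss      = c("ICSS", "ICSS", "NO-ICSS", "NO-ICSS",
                "ICSS", "ICSS", "ICSS", "NO-ICSS", "NO-ICSS"),
  access    = c("ShA", "LgA", "ShA", "LgA",
                "ShA", "LgA", "LgA", "ShA", "LgA"),
  stringsAsFactors = TRUE
)

# Cross-tabulate sex against condition to compare with the table above
conditionLabel <- paste(IVSA_metadata$icss, IVSA_metadata$access, sep = "+")
print(table(IVSA_metadata$sex, conditionLabel))
```

A table like this could later be joined to the event data on `subjectID`, which keeps the group assignments in one place.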

**OUTLINE:**

1. Preprocessing data
    1. Cleaning
    2. Annotating
    3. Overview
2. Normalizing data
    1.
    2.
    3. 
3. Plotting data
    1. Raw values
        1. Pass 1-4
        2. Pass 2-4
    2. Normalized values (Pass 2-4)
    3. Normalized values (Mean of Passes 2-4)
4. Saving data in 'Prism' friendly format
---

In [37]:
%%bash
# # Use the preprocessIVSAFiles script to parse the MedPC files and generate CSV files for the different event types
# # Define userName from $USER so that the cell runs in a machine-agnostic manner
# userName="$USER"
# 
# # We move to the folder where the cohort data is stored
# # cd </path/to/cohort/data>
# cd /Users/$userName/Dropbox\ \(Partners\ HealthCare\)/Projects/R01_2017_OxycSA-NASh-Glutamate/_data_R01_2017/_data_R01_2017_IVSA/Male-Female/ShA+LgA/PM_ICSS/Cohort01/_rigFiles
# 
# # Run the preprocessing script over all matching files in the directory (globs expand in sorted order)
# for fileName in \!2019-0*
# do
#     preprocessIVSAFiles --file "$fileName"
# done

In [1]:
%load_ext rpy2.ipython

In [5]:
%%capture
%%R
library(tidyverse)
library(lubridate)

In [None]:
%%capture
%%bash
# # Use collateIVSAData to collate all the data into a single data frame with the following format:
# # DATA TABLE DESCRIPTION:
# # 
# #   date       |  cohort      | regimen   | group | subjectID | eventType   | eventTime
# #   -----------|--------------|-----------|-------|-----------|-------------|-----------
# #   2018-10-30 |  ICSS+IVSA   | 6H        | MALES | SG7       | rewards     | ...
# #   -----------|--------------|-----------|-------|-----------|-------------|-----------
# #   2018-10-30 |  ICSS+IVSA   | 6H        | MALES | SG7       | corrLever   | ...
# #   -----------|--------------|-----------|-------|-----------|-------------|-----------
# #   2018-10-30 |  ICSS+IVSA   | 6H        | MALES | SG7       | incorrLever | ...
# #   ...
# #   ...
# #   ...
# 
# userName="$USER"
# 
# # We move to the folder where the cohort files are stored
# # cd </path/to/cohort/data>
# cd /Users/$userName/Dropbox\ \(Partners\ HealthCare\)/Projects/R01_2017_OxycSA-NASh-Glutamate/_data_R01_2017/_data_R01_2017_IVSA/Male-Female/ShA+LgA/PM_ICSS/Cohort01/_csvFiles
# 
# # Run the collation script over all rewards files in the directory (globs expand in sorted order)
# for fileName in *rewards.csv
# do
#     collateIVSAData "$fileName"
# done

In [6]:
%%capture
%%R
# loading data: set the path to the directory that holds the collated CSV files
IVSA_dataDir <- "~/Dropbox (Partners HealthCare)/Projects/R01_2017_OxycSA-NASh-Glutamate/_data_R01_2017/_data_R01_2017_IVSA/Male-Female/ShA+LgA/PM_ICSS/Cohort01/_csvFiles"
IVSA_fileList <- list.files(path = IVSA_dataDir, pattern = "collated\\.csv$")
## generating combined data table
IVSA_data <- IVSA_fileList %>% map(~ read_csv(file.path(IVSA_dataDir, .))) %>% reduce(rbind)
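
One way to make the combined table less sensitive to per-file type guessing is to declare the column types when each file is read. The sketch below uses base R only, with two temporary files standing in for the `*collated.csv` files; the file contents and the helper `readCollated` are made up for illustration and are not part of the original pipeline.

```r
# Illustrative stand-ins for two collated CSV files (made-up rows)
tmpDir <- tempfile("ivsa_csv_")
dir.create(tmpDir)
writeLines(c("date,cohort,regimen,group,subjectID,eventType,eventTime",
             "2019-02-20,ICSS+IVSA,1H,FEMALES,SG14,rewards,49"),
           file.path(tmpDir, "a_collated.csv"))
writeLines(c("date,cohort,regimen,group,subjectID,eventType,eventTime",
             "2019-02-21,NO-ICSS+IVSA,1H,MALES,SG11,corrLever,"),
           file.path(tmpDir, "b_collated.csv"))

# Hypothetical helper: read one file with every column type declared up front
readCollated <- function(path) {
  read.csv(path,
           colClasses = c(date = "Date", cohort = "factor", regimen = "factor",
                          group = "factor", subjectID = "factor",
                          eventType = "factor", eventTime = "integer"))
}

# Read every matching file and stack the results row-wise
fileList <- list.files(tmpDir, pattern = "collated\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(fileList, readCollated))
str(combined)
```

The second toy file has an empty `eventTime` field; because the column is declared as integer, it is read as `NA` rather than changing the column type.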

In [7]:
%%R
IVSA_data %>% print

# A tibble: 26,708 x 7
   date       cohort    regimen group   subjectID eventType eventTime
   <date>     <chr>     <chr>   <chr>   <chr>     <chr>         <int>
 1 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards          49
 2 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards         323
 3 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards         853
 4 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards        2600
 5 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards        2616
 6 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards        2697
 7 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      corrLever        49
 8 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      corrLever        49
 9 2019-02-20 ICS

In [8]:
%%R
# Re-assign data types to individual columns; the types get broken when there are missing values
# Ideal data types:
# date   |  cohort  |  regimen   |  group   |  subjectID   |  eventType  | eventTime
# <date> | <factor> |  <factor>  | <factor> |  <factor>    |  <factor>   |  <int> 

IVSA_data$cohort <- IVSA_data$cohort %>% as.factor
IVSA_data$regimen <- IVSA_data$regimen %>% as.factor
IVSA_data$group <- IVSA_data$group %>% as.factor
IVSA_data$subjectID <- IVSA_data$subjectID %>% as.factor
IVSA_data$eventType <- IVSA_data$eventType %>% as.factor
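
The five assignments above can also be written as a single step over a vector of column names. This is a minimal base R sketch on a made-up two-row table; `toy` and `factorCols` are illustrative names.

```r
# Made-up rows standing in for IVSA_data
toy <- data.frame(
  cohort    = c("ICSS+IVSA", "NO-ICSS+IVSA"),
  regimen   = c("1H", "6H"),
  group     = c("FEMALES", "MALES"),
  subjectID = c("SG14", "SG11"),
  eventType = c("rewards", "corrLever"),
  eventTime = c(49L, 323L),
  stringsAsFactors = FALSE
)

# Columns that should be factors; eventTime is left as an integer
factorCols <- c("cohort", "regimen", "group", "subjectID", "eventType")
toy[factorCols] <- lapply(toy[factorCols], as.factor)

# Inspect the resulting column classes
print(sapply(toy, class))
```

Keeping the column names in one vector means a new categorical column only has to be added in one place.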

In [9]:
%%R
IVSA_data %>% print

# A tibble: 26,708 x 7
   date       cohort    regimen group   subjectID eventType eventTime
   <date>     <fct>     <fct>   <fct>   <fct>     <fct>         <int>
 1 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards          49
 2 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards         323
 3 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards         853
 4 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards        2600
 5 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards        2616
 6 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      rewards        2697
 7 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      corrLever        49
 8 2019-02-20 ICSS+IVSA 1H      FEMALES SG14      corrLever        49
 9 2019-02-20 ICS

In [10]:
%%R
IVSA_data$cohort %>% unique

[1] ICSS+IVSA     NO-ICSS+IVSA  ICS+IVSA      NO-ICSSS+IVSA
Levels: ICS+IVSA ICSS+IVSA NO-ICSS+IVSA NO-ICSSS+IVSA


In [40]:
%%R
# cleaning up data
# replace "ICS+IVSA" with "ICSS+IVSA"
# replace "NO-ICSSS+IVSA" with "NO-ICSS+IVSA"
IVSA_data$cohort[IVSA_data$cohort == "ICS+IVSA"] <- "ICSS+IVSA"
IVSA_data$cohort[IVSA_data$cohort == "NO-ICSSS+IVSA"] <- "NO-ICSS+IVSA"

In [41]:
%%R
IVSA_data$cohort %>% unique()

[1] ICSS+IVSA    NO-ICSS+IVSA
Levels: ICS+IVSA ICSS+IVSA NO-ICSS+IVSA NO-ICSSS+IVSA
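
The `Levels:` line above still lists the misspelled labels `ICS+IVSA` and `NO-ICSSS+IVSA`: assigning into a factor changes the values but keeps the unused levels. A minimal sketch on a toy factor (the vector `cohort` is made up for illustration) shows how `droplevels()` removes them.

```r
# Toy factor reproducing the misspelled cohort labels
cohort <- factor(c("ICSS+IVSA", "NO-ICSS+IVSA", "ICS+IVSA", "NO-ICSSS+IVSA"))

# Same replacement as in the cleaning cell above
cohort[cohort == "ICS+IVSA"] <- "ICSS+IVSA"
cohort[cohort == "NO-ICSSS+IVSA"] <- "NO-ICSS+IVSA"
print(levels(cohort))   # still four levels, two of them unused

# Drop the levels that no longer occur in the data
cohort <- droplevels(cohort)
print(levels(cohort))   # only the two corrected labels remain
```

Applied to the real table, this would amount to `IVSA_data$cohort <- droplevels(IVSA_data$cohort)`. Leftover empty levels can otherwise show up as empty groups in later summaries and plots.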


In [42]:
%%R
IVSA_data$regimen %>% unique()

[1] 1H 6H 2H
Levels: 1H 2H 6H


In [43]:
%%R
IVSA_data$group %>% unique()

[1] FEMALES MALES  
Levels: FEMALES MALES


In [44]:
%%R
IVSA_data$subjectID %>% unique()

 [1] SG14  SG20  SG24  SG13  SG15  SG17  SG19  SG16  SG22  SG26  SG11  SG21 
[13] SG23  SG126
14 Levels: SG11 SG126 SG13 SG14 SG15 SG16 SG17 SG19 SG20 SG21 SG22 ... SG26
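
The listing above contains IDs that are absent from the METADATA table, including `SG126`. A quick way to flag such IDs is a set difference against the expected subjects. In the sketch below both vectors are typed in by hand from this notebook; whether `SG126` is a mistyped `SG26` is not established here and would need to be checked against the rig files.

```r
# Subject IDs listed in the METADATA table at the top of the notebook
expectedIDs <- c("SG17", "SG15", "SG11", "SG23",
                 "SG14", "SG24", "SG20", "SG16", "SG26")

# Subject IDs printed by unique(IVSA_data$subjectID) above
observedIDs <- c("SG14", "SG20", "SG24", "SG13", "SG15", "SG17", "SG19",
                 "SG16", "SG22", "SG26", "SG11", "SG21", "SG23", "SG126")

# IDs present in the data but not in the metadata
unexpectedIDs <- setdiff(observedIDs, expectedIDs)
print(unexpectedIDs)

# IDs in the metadata that never appear in the data
missingIDs <- setdiff(expectedIDs, observedIDs)
print(missingIDs)
```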


In [45]:
%%R
IVSA_data$eventType %>% unique

[1] rewards     corrLever   incorrLever
Levels: corrLever incorrLever rewards


In [None]:
%%R
# cleaning up data
# removing subjects that we do not intend to use for further analysis
