# Initial Medication Date and Category

2018-09-13 Hirotaka Iwaki

Medication information is recorded in several files. 
Here we use the following files to derive the initiation date of each drug category: LD (levodopa), DA (dopamine agonist), DRT (levodopa or dopamine agonist), and ALL (any anti-PD drug).    
The CSVs to use:
1. Concomitant_Medications.csv
2. MDS_UPDRS_Part_III.csv (NOTE: Use_of_PD_Medication.csv seems to be merged into this file) 
3. Initiation_of_PD_Medication-_incidents.csv
If these files are inconsistent, take the earliest recorded date.
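
The "earliest recorded date" rule can be illustrated with a minimal base-R sketch. The data frame `starts`, its column names, and all values are hypothetical and only stand in for per-source start dates; this is not the notebook's actual merging code, which appears in the cells below.

```r
# Hypothetical per-source start dates (days since 1970-01-01); NA = not recorded.
starts <- data.frame(
  PATNO    = c("1001", "1002", "1003"),
  from_cm  = c(16000, NA, 15500),   # e.g. derived from Concomitant_Medications.csv
  from_up3 = c(15900, 16200, NA),   # e.g. first medicated visit in MDS_UPDRS_Part_III.csv
  stringsAsFactors = FALSE
)

# pmin() with na.rm = TRUE takes the row-wise minimum and ignores missing sources.
starts$earliest <- pmin(starts$from_cm, starts$from_up3, na.rm = TRUE)
print(starts)
```

If every source is missing for a patient, `pmin(..., na.rm = TRUE)` returns `NA` for that row, so no sentinel value is needed.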

New Variables
1. DNAME: standardized PD drug name    
  "LDOPA", Levodopa; "SPPRT", Support drug (DCI, COMT inhibitor); "RPNRL", Ropinirole; "PRMXL", Pramipexole; "ROTIG", Rotigotine; "PRBDL", Piribedil; "APMLP", Apomorphine; "MAOBI", MAO-B inhibitor; "AMTDN", Amantadine; "ANCHL", Anticholinergics, etc.    
2. STARTDT_XX: start date for drug category XX.
    XX is one of LD (Levodopa), DA (Dopamine Agonist), DRT (LD or DA), ALL (Any anti-PD drug)

Get the medication start date for each category of medication. Output files:
1. PDmed_record.csv: Observations with PD medication, extracted from Concomitant_Medications.csv
2. MED_INIT.csv: Initiation date for each drug category, with the length of the continuously prescribed period from initiation. Derived only from Concomitant_Medications.csv
3. MED_INIT2.csv: Initiation date for each drug category, also made consistent with MDS_UPDRS_Part_III.csv.

In [1]:
library(data.table)
library(dplyr)
library(zoo)
FOLDER = c("PPMI180910")
OUTPUT = c("out180910")
options("width"=130)


Attaching package: 'dplyr'

The following objects are masked from 'package:data.table':

    between, first, last

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric



In [2]:
ref = fread(paste(OUTPUT, "PATNO_EVENTID_DATE.csv", sep = "/"), colClasses = c("PATNO"="character")) %>% 
  select(PATNO, EVENT_ID, DATE)


read_func <- function(file, vec_str, event = T, num_conv = T){
  # event = F -> do not match with ref by EVENT_ID (the file has no EVENT_ID)
  # num_conv = T -> convert the vec_str columns from character to numeric
  data = fread(paste(FOLDER, file, sep='/'), header = T, quote="\"", 
               colClasses = c("PATNO"="character")) %>% na_if("") %>%data.frame
  if(num_conv ==T){ # convert string values to numbers
    data = data %>% mutate_at(vars(vec_str), funs(as.numeric))
  }
  if(event != T){ # no matching with ref by EVENT_ID
    data = data %>%       
      select(PATNO, vec_str) %>%
      filter(!is.na(PATNO))
  }else{
    data = data %>%
      select(PATNO, EVENT_ID, vec_str) %>%
      left_join(., ref, by=c("PATNO", "EVENT_ID")) %>% 
      filter(!is.na(PATNO)) %>% arrange(PATNO, DATE)
    missing_DATE = data %>% select(PATNO, EVENT_ID, DATE) %>% filter(is.na(DATE))
    if(nrow(missing_DATE) > 0){
      print(missing_DATE)
      warning("Not in EVENT_ID_DATE file") # comfunc() is not defined in this notebook
    }
  }
  return(data)
}
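
The date-matching branch of `read_func` can be sketched on toy data using base R only. The tables `ref_toy` and `obs_toy` and their values are hypothetical; `merge(..., all.x = TRUE)` is used here as a stand-in for the `left_join` in the function above.

```r
# Hypothetical visit-date reference and observation table (toy values).
ref_toy <- data.frame(PATNO = c("1001", "1001"), EVENT_ID = c("BL", "V01"),
                      DATE = c(15000, 15090), stringsAsFactors = FALSE)
obs_toy <- data.frame(PATNO = c("1001", "1001", "1001"),
                      EVENT_ID = c("BL", "V01", "V02"),
                      VALUE = c(1, 2, 3), stringsAsFactors = FALSE)

# Left join: keep every observation, attach DATE where the visit is known.
merged <- merge(obs_toy, ref_toy, by = c("PATNO", "EVENT_ID"), all.x = TRUE)

# Rows whose visit has no date in the reference table.
missing_date <- merged[is.na(merged$DATE), c("PATNO", "EVENT_ID")]
if (nrow(missing_date) > 0) {
  print(missing_date)
  message("Not in EVENT_ID_DATE file")
}
```

Here the `V02` visit has no entry in the reference table, so it is the only row reported.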

# Concomitant_Medications.csv

In [3]:
cat("Create a list of standardized drug names and categorize PD drugs from the drug usage records in the following steps\n")
drugRec = read_func("Concomitant_Medications.csv", 
    c("CMTRT", "CMDOSE", "CMDOSU", "CMDOSFRQ", "STARTDT","STOPDT","ONGOING","CMINDC","DISMED","TOTDDOSE","LEDD","CMTRT", "WHODRUG"), F,F) 
drugRecPD = drugRec %>% .[grepl("park|^PD$| PD$", ignore.case = T, .$CMINDC),] 
cat("\nList of CMINDC values selected as probably indicating Parkinson's disease. The list is used to identify anti-PD drugs.
NOTE: This list may not be exhaustive. In a later step, double-check CMINDC values that are not listed here but are associated with anti-PD drugs.")
drugRecPD  %>% distinct(CMINDC) %>% t %>% as.vector
cat("DISMED is an indicator for PD medication, but some drug-use records for the above indications have DISMED != 1\n")
drugRecPD %>% with(table(DISMED, useNA = "always"))
drugRecPD %>% filter(DISMED !=1) %>% distinct(WHODRUG) %>% t %>% as.vector
cat("Additionally, check the relationship between whether a stop date is recorded and the ONGOING indicator.
Records that are not ongoing (ONGOING == 0) all have a stop date")
drugRecPD %>% mutate(STOP_DATE_RECORDED = !is.na(STOPDT)) %>% with(table(STOP_DATE_RECORDED, ONGOING, useNA = "always"))

Create a list of standardized drug names and categorize PD drugs from the drug usage records in the following steps

List of CMINDC values selected as probably indicating Parkinson's disease. The list is used to identify anti-PD drugs.
NOTE: This list may not be exhaustive. In a later step, double-check CMINDC values that are not listed here but are associated with anti-PD drugs.

DISMED is an indicator for PD medication, but some drug-use records for the above indications have DISMED != 1


DISMED
   0    1 <NA> 
  25 4907   16 

Additionally, check the relationship between whether a stop date is recorded and the ONGOING indicator.
Records that are not ongoing (ONGOING == 0) all have a stop date

                  ONGOING
STOP_DATE_RECORDED    0    1 <NA>
             FALSE    0 1629  470
             TRUE  2840    2    7
             <NA>     0    0    0

In [4]:
cat("Standardize the drug names for the records selected by the above CMINDC. Every drug should be given a category name.\n")
drugmatch = drugRecPD %>% 
  filter(!(is.na(WHODRUG) | WHODRUG=="")) %>% 
  distinct(WHODRUG) %>% 
  mutate(DNAME = case_when(
    grepl("botox|DASANTAFIL|INOSINE|INVESTIG|israd|mucuna|NILOTINIB|RITALIN|STEM|TIZANIDINE|TRIAMTERENE|COENZYME|DOMPERIDONA|GLUTATHIONE|NALTREXONE", ignore.case = T, WHODRUG) ~ "ETC",
    grepl("comtan|^entacapone|lodosy|^BENSERAZIDE$|^OPICAPONE$|^CARBIDOPA$", ignore.case = T, WHODRUG)~"SPPRT",
    grepl("dop|levo|sinemet|aktipar|nacom|rytary", ignore.case = T, WHODRUG) ~ "LDOPA",
    grepl("ropini|requip", ignore.case = T, WHODRUG) ~"RPNRL",
    grepl("pram|mirap|sifrol", ignore.case = T, WHODRUG) ~"PRMXL",
    grepl("rotig|neupro", ignore.case = T, WHODRUG) ~"ROTIG",
    grepl("piribedil|clarium", ignore.case = T, WHODRUG) ~"PRBDL",
    grepl("apo-go|apokyn|apomor",ignore.case = T, WHODRUG) ~"APMLP",
    grepl("benz|trih|artane|akine|biperi|BORNAP|cogentin|dekinet|parkinsan",ignore.case = T, WHODRUG) ~"ANCHL",
    grepl("AZILECT|ELDEPRYL|jumex|rasagi|safina|selegil",ignore.case = T, WHODRUG) ~ "MAOBI",
    grepl("amant|MANTADIX|merz|SYMMETREL",ignore.case = T, WHODRUG) ~ "AMTDN",
    grepl("ALEVE|ACETYLSALICYL",ignore.case = T, WHODRUG) ~ "PAINK",
    grepl("canna", ignore.case = T, WHODRUG)~"HERBS",
    grepl("clona", ignore.case = T, WHODRUG)~"CLNZP",
    grepl("cloza", ignore.case = T, WHODRUG)~"CLZPN",
    grepl("gabap", ignore.case = T, WHODRUG)~"GBPNT",
    grepl("NORTRIPT", ignore.case = T, WHODRUG)~"DPRES",
    grepl("PROPRA|PROPAN", ignore.case = T, WHODRUG)~"BETAB",
    grepl("NAMENDA", ignore.case = T, WHODRUG)~"MEMTN",
    grepl("RIVASTI|exelon", ignore.case = T, WHODRUG)~"RVSTG",
    grepl("Donepezil", ignore.case = T, WHODRUG)~"DNPZL",
    TRUE ~ "others"
    )) %>% 
  arrange(DNAME, WHODRUG)
print(drugmatch)

Standardize the drug names for the records selected by the above CMINDC. Every drug should be given a category name.
                                 WHODRUG  DNAME
1                              AMANTADIN  AMTDN
2                             AMANTADINA  AMTDN
3                             AMANTADINE  AMTDN
4                         AMANTADINE HCL  AMTDN
5               AMANTADINE HYDROCHLORIDE  AMTDN
6                               MANTADIX  AMTDN
7                                PK MERZ  AMTDN
8                                PK-MERZ  AMTDN
9                              SYMMETREL  AMTDN
10                              AKINETON  ANCHL
11                                ARTANE  ANCHL
12                          BENZOTROPINE  ANCHL
13                           BENZTROPINE  ANCHL
14                  BENZTROPINE MESYLATE  ANCHL
15                             BIPERIDEN  ANCHL
16              BORNAPRINE HYDROCHLORIDE  ANCHL
17                              COGENTIN  ANCHL
18                       
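
The order of the patterns in the `case_when` call matters: the first matching rule wins, so `^CARBIDOPA$` must be tested before the broad `dop` pattern or plain carbidopa would be labeled `LDOPA`. A minimal base-R sketch of that first-match behavior follows; `classify_drug`, the shortened `rules` table, and the example drug strings are hypothetical and only mimic the logic above.

```r
# Ordered pattern table: the first matching row wins, mirroring case_when().
rules <- data.frame(
  pattern = c("^CARBIDOPA$|^entacapone|comtan", "dop|levo|sinemet", "ropini|requip"),
  dname   = c("SPPRT", "LDOPA", "RPNRL"),
  stringsAsFactors = FALSE
)

classify_drug <- function(x, rules, default = "others") {
  out <- rep(default, length(x))
  assigned <- rep(FALSE, length(x))
  for (k in seq_len(nrow(rules))) {
    # Only names not yet assigned by an earlier rule may be labeled here.
    hit <- !assigned & grepl(rules$pattern[k], x, ignore.case = TRUE)
    out[hit] <- rules$dname[k]
    assigned <- assigned | hit
  }
  out
}

drugs <- c("CARBIDOPA", "CARBIDOPA/LEVODOPA", "SINEMET", "REQUIP XL", "ASPIRIN")
print(classify_drug(drugs, rules))

# Swapping the first two rules changes the label of plain carbidopa.
print(classify_drug("CARBIDOPA", rules[c(2, 1, 3), ]))
```

Keeping the narrow, anchored patterns ahead of the broad substring patterns is what makes the broad patterns safe to use.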

In [5]:
cat("List the CMINDC values, other than those above, under which anti-PD drugs are recorded. 
Then count how many drug records with each of these CMINDC values involve a drug standardized above. 
Some of these CMINDC values do suggest PD but were missed by the pattern above, and the drugs recorded under them are mostly anti-PD drugs already standardized above. 
So the standardization of PD drugs looks valid.
")
drugRec %>% left_join(., drugmatch, by ="WHODRUG") %>% 
  filter(DNAME %in% c("LDOPA", "RPNRL", "SPPRT", "PRMXL", "ROTIG", "PRBDL", "APMLP", "MAOBI", "AMTDN")) %>% 
  .[!grepl("park|^PD$| PD$", ignore.case = T, .$CMINDC),] %>% # !grepl() is safe even when nothing matches, unlike -grep()
  distinct(CMINDC) %>% # list of the diseases
  inner_join(drugRec, ., by ="CMINDC") %>% # Obs of involving the list of diseases
  left_join(., drugmatch, by = "WHODRUG") %>% #  Join with the drug name
  arrange(CMINDC) %>% mutate(No_of_OBS_with_PDDRUG = ifelse(is.na(DNAME), "NO", "YES")) %>% 
  with(table(CMINDC, No_of_OBS_with_PDDRUG, useNA = 'always'))

List the CMINDC values, other than those above, under which anti-PD drugs are recorded. 
Then count how many drug records with each of these CMINDC values involve a drug standardized above. 
Some of these CMINDC values do suggest PD but were missed by the pattern above, and the drugs recorded under them are mostly anti-PD drugs already standardized above. 
So the standardization of PD drugs looks valid.


                                                No_of_OBS_with_PDDRUG
CMINDC                                            NO YES <NA>
  AS NEEDED FOR DYSTONIA                           0   1    0
  CLINICAL STUDY DRUG PARTICIPANT                  0   1    0
  DOPAMINE AGONIST                                 0   1    0
  DYSKINESIA                                       0   3    0
  DYSKINESIAS                                      0   2    0
  DYSTONIA                                         5   2    0
  HEALTH PROPHYLAXIS                             288   7    0
  LEG TREMOR. MUSCLE STIFFNESS. NEUROPROTECTIVE.   0   1    0
  LEWY BODY DISEASE                                0   2    0
  MILD PAIN/HEADACHE                               0   1    0
  NAUSEA                                          34   3    0
  NEUROPREOTECTION                                 0   1    0
  P.D                                              0   4    0
  PAKINSON'S DISEASE                               0   4    0


In [6]:
cat("Some data structure checks:\n
Number of records per anti-PD drug vs LEDD calculated")
drugRecPD_std = drugRec %>% left_join(., drugmatch, by ="WHODRUG") %>% 
  filter(DNAME %in% c("LDOPA", "SPPRT", "RPNRL", "PRMXL", "ROTIG", "PRBDL", "APMLP", "MAOBI", "AMTDN", "ANCHL"))
drugRecPD_std %>% mutate(LEDD_calculated=!is.na(LEDD)) %>% with(table(DNAME, LEDD_calculated, useNA = 'always'))
cat("Anti-PD med vs DISMED")
drugRecPD_std %>% with(table(DNAME, DISMED))
cat("Most drug records are labeled DISMED==1, but not all.
NOTE: Some could be prescribed for other diseases (e.g. PRMXL for RLS)")

Some data structure checks:

Number of records per anti-PD drug vs LEDD calculated

       LEDD_calculated
DNAME   FALSE TRUE <NA>
  AMTDN    35  278    0
  ANCHL    77    0    0
  APMLP     7   12    0
  LDOPA   399 1893    0
  MAOBI    68  512    0
  PRBDL     1   24    0
  PRMXL    50  538    0
  ROTIG    20  157    0
  RPNRL    29  477    0
  SPPRT   256   95    0
  <NA>      0    0    0

Anti-PD med vs DISMED

       DISMED
DNAME      0    1
  AMTDN    4  308
  ANCHL    1   76
  APMLP    0   19
  LDOPA    5 2283
  MAOBI    2  576
  PRBDL    0   25
  PRMXL   13  570
  ROTIG    1  176
  RPNRL   16  489
  SPPRT    5  344

Most drug records are labeled DISMED==1, but not all.
NOTE: Some could be prescribed for other diseases (e.g. PRMXL for RLS)

In [7]:
write.csv(drugRecPD_std, paste(OUTPUT, "PDmed_record.csv", sep = "/"), row.names=F)

In [8]:
cat("Calculate the initial date for each of the following medication categories
1. Levodopa
2. Dopamine agonists
3. Dopamine replacement therapy (levodopa or dopamine agonist)
4. Any anti-PD medication")
drugdur_func = function(DATA){
  data = DATA %>% 
    filter(!is.na(STARTDT)) %>% 
    mutate(START = as.Date(paste("01", STARTDT, sep = "/"), format = "%d/%m/%Y") %>% as.numeric,
           STOP = as.Date(paste("01", STOPDT, sep = "/"), format = "%d/%m/%Y") %>% as.numeric) %>% 
    arrange(PATNO, START) %>% select(PATNO, STARTDT, START, STOP)
  IDs = unique(data$PATNO)
  DRUGDAYS = rep(NA, length(IDs))
  for(i in seq_along(IDs)){ # seq_along() avoids iterating over c(1, 0) when there are no patients
    data1 = data %>% filter(PATNO==IDs[i])
    for(j in seq_len(nrow(data1))){
      if(is.na(data1$STOP[j])){
        DRUGDAYS[i] = 99999 # give the large number
        break
      }else if(is.na(data1$START[j+1])){ # No further data
        DRUGDAYS[i] = as.numeric(data1$STOP[j] - data1$START[1])
      }else if(data1$START[j+1] - data1$STOP[j] <= 62){ # Allow a gap of up to 62 days (raw dates are month-level, so roughly a one-month gap is tolerated)
        next
      }else{
        DRUGDAYS[i] = as.numeric(data1$STOP[j] - data1$START[1])
        break
      }
    }
  }

  data2 = data %>% distinct(PATNO, .keep_all = T) %>% bind_cols(., DRUGDAYS = DRUGDAYS) %>% 
    select(PATNO, START, DRUGDAYS)
  return(data2)
}
# data = drugRecPD_std %>% 
#   filter(DNAME == "LDOPA") %>% filter(PATNO=="3400")
# head(data)
    
    
Init_LD = drugRecPD_std %>% 
  filter(DNAME == "LDOPA") %>% 
  drugdur_func(.) %>% 
  rename(STARTDT_LD = START, DRUGDAYS_LD = DRUGDAYS)
Init_DA = drugRecPD_std %>% 
  filter(DNAME %in% c("RPNRL", "PRMXL", "ROTIG", "PRBDL", "APMLP")) %>% 
  drugdur_func(.) %>% 
  rename(STARTDT_DA = START, DRUGDAYS_DA = DRUGDAYS)

Init_DRT = drugRecPD_std %>% 
  filter(DNAME %in% c("LDOPA", "RPNRL", "PRMXL", "ROTIG", "PRBDL", "APMLP")) %>% 
  drugdur_func(.) %>% 
  rename(STARTDT_DRT = START, DRUGDAYS_DRT = DRUGDAYS)

Init_ALL = drugRecPD_std %>% 
  filter(DNAME %in% c("LDOPA", "RPNRL", "SPPRT", "PRMXL", "ROTIG", "PRBDL", "APMLP", "MAOBI", "AMTDN", "ANCHL")) %>% 
  drugdur_func(.) %>% 
  rename(STARTDT_ALL = START, DRUGDAYS_ALL = DRUGDAYS)
drug_init = full_join(Init_LD, Init_DA, by = "PATNO") %>% 
  full_join(., Init_DRT, by = "PATNO") %>%
  full_join(., Init_ALL, by = "PATNO")
summary(drug_init)
cat("STARTDT is the drug initiation date (days since 1970-01-01). 
DRUGDAYS is the number of days on the drug in the first continuous period after initiation. 99999 means the medication is ongoing.")

Calculate the initial date for each of the following medication categories
1. Levodopa
2. Dopamine agonists
3. Dopamine replacement therapy (levodopa or dopamine agonist)
4. Any anti-PD medication

    PATNO             STARTDT_LD     DRUGDAYS_LD      STARTDT_DA     DRUGDAYS_DA     STARTDT_DRT     DRUGDAYS_DRT  
 Length:839         Min.   : 6756   Min.   :    0   Min.   :10743   Min.   :    0   Min.   : 6756   Min.   :    0  
 Class :character   1st Qu.:15614   1st Qu.:99999   1st Qu.:15461   1st Qu.: 1165   1st Qu.:15438   1st Qu.:99999  
 Mode  :character   Median :16130   Median :99999   Median :15857   Median :99999   Median :15872   Median :99999  
                    Mean   :15945   Mean   :90806   Mean   :15741   Mean   :68146   Mean   :15704   Mean   :88051  
                    3rd Qu.:16648   3rd Qu.:99999   3rd Qu.:16375   3rd Qu.:99999   3rd Qu.:16405   3rd Qu.:99999  
                    Max.   :17866   Max.   :99999   Max.   :17683   Max.   :99999   Max.   :17713   Max.   :99999  
                    NA's   :126     NA's   :126     NA's   :369     NA's   :369     NA's   :33      NA's   :33     
  STARTDT_ALL     DRUGDAYS_ALL  
 Min.   : 6756   Min.   :    0  
 1st Q

STARTDT is the drug initiation date (days since 1970-01-01). 
DRUGDAYS is the number of days on the drug in the first continuous period after initiation. 99999 means the medication is ongoing.
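
The duration rule in `drugdur_func` can be shown for a single patient in a few lines of base R. This is a simplified paraphrase, not the function itself: `month_to_days`, `first_episode_days`, and the example dates are hypothetical names and values, and the sketch assumes the records are already sorted by start date.

```r
# "MM/YYYY" strings -> days since 1970-01-01, pinned to the first of the month.
month_to_days <- function(x) {
  as.numeric(as.Date(paste("01", x, sep = "/"), format = "%d/%m/%Y"))
}

# Length of the first continuous treatment episode for one patient.
# start/stop: numeric days sorted by start; an NA stop means still ongoing.
first_episode_days <- function(start, stop, max_gap = 62, ongoing = 99999) {
  for (j in seq_along(start)) {
    if (is.na(stop[j])) return(ongoing)              # open-ended record
    last <- j == length(start)
    if (last || start[j + 1] - stop[j] > max_gap) {
      return(stop[j] - start[1])                     # episode ends here
    }
  }
  NA_real_                                           # no records at all
}

start <- month_to_days(c("01/2015", "04/2015", "01/2016"))
stop  <- month_to_days(c("03/2015", "06/2015", "03/2016"))

# Gap 1 (Mar -> Apr 2015) is 31 days and is bridged; gap 2 (Jun 2015 -> Jan 2016) is not.
print(first_episode_days(start, stop))                    # 2015-01-01 to 2015-06-01
print(first_episode_days(start, c(stop[1], NA, stop[3]))) # second record is ongoing
```

In this example the first episode runs from 2015-01-01 to 2015-06-01, which is 151 days; the record starting in 2016 belongs to a later episode and is ignored.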

In [9]:
drug_init %>% mutate_at(vars(starts_with("DRUGDAYS")), funs(case_when(
    .==0 ~"No Continuation", 
    . <60 ~"less than 2 month", 
    . <99999~"more than 2 month", 
    . ==99999~"Ongoing", 
    TRUE~"NA"))) %>% select(starts_with("DRUGDAYS")) %>% mutate_if(is.character, funs(as.factor)) %>% summary
cat("Once a drug is started, most patients keep using it for more than 2 months. (DA has a somewhat higher rate of discontinuation within 2 months, though.)
\n
Save this file as MED_INIT.csv")
write.csv(drug_init, paste(OUTPUT, "MED_INIT.csv", sep="/"), row.names=F)

            DRUGDAYS_LD             DRUGDAYS_DA             DRUGDAYS_DRT            DRUGDAYS_ALL
 NA               :126   NA               :369   NA               : 33   No Continuation  :  2  
 No Continuation  :  4   No Continuation  :  6   No Continuation  :  7   Ongoing          :750  
 Ongoing          :647   Ongoing          :319   Ongoing          :709   less than 2 month:  7  
 less than 2 month:  8   less than 2 month: 13   less than 2 month:  8   more than 2 month: 80  
 more than 2 month: 54   more than 2 month:132   more than 2 month: 82                          

Once a drug is started, most patients keep using it for more than 2 months. (DA has a somewhat higher rate of discontinuation within 2 months, though.)


Save this file as MED_INIT.csv

# Combine with the information from UPDRS3 records    

In [10]:
cat("MDS_UPDRS_Part_III.csv has medication information for the observation.
Compare with this file.")
UPDRS3 = fread(paste(FOLDER, "MDS_UPDRS_Part_III.csv", sep="/"), header = T, quote="\"", colClasses = c("PATNO"="character")) %>%
    na_if("") %>%
    mutate(PDMED = PD_MED_USE) %>%
    select(PATNO, EVENT_ID, PDMED) %>%
    left_join(., ref, by=c("PATNO", "EVENT_ID")) %>% 
    filter(!is.na(PATNO))
inner_join(drug_init, UPDRS3, by = "PATNO") %>% 
    mutate(any_drug_started = ifelse(DATE>=STARTDT_ALL, "yes", "no")) %>% with(table(any_drug_started, PDMED, useNA = 'always'))
cat("PDMED (= PD_MED_USE): 0=None, 1=Levodopa, 2=Dopamine agonist, 3=Other, 4=1+3, 5=1+2, 6=2+3, 7=1+2+3")
inner_join(drug_init, UPDRS3, by = "PATNO") %>% 
    mutate(LD_started = ifelse(DATE>=STARTDT_LD, "yes", "no")) %>% with(table(LD_started, PDMED, useNA = 'always'))
cat("PDMED 1, 4, 5 and 7 should all be 'yes', but some are not. 
According to MDS_UPDRS_Part_III.csv, some people are using LD before the STARTDT_LD in drug_init.
STARTDT_LD, STARTDT_DA, STARTDT_DRT and STARTDT_ALL need to be updated according to MDS_UPDRS_Part_III.csv.
STARTDT_LD will be updated when PDMED is 1, 4, 5 or 7 at a visit before the current STARTDT_LD.")


MDS_UPDRS_Part_III.csv has medication information for the observation.
Compare with this file.

                PDMED
any_drug_started    0    1    2    3    4    5    6    7 <NA>
            no   1909    6    1    9    1    6    0    2    0
            yes   631 2061  590  554  891  647  452  726    0
            <NA>    0    0    0    0    0    0    0    0    0

PDMED (= PD_MED_USE): 0=None, 1=Levodopa, 2=Dopamine agonist, 3=Other, 4=1+3, 5=1+2, 6=2+3, 7=1+2+3

          PDMED
LD_started    0    1    2    3    4    5    6    7 <NA>
      no   1761   18  348  356    3   18  248    6    0
      yes   315 2037   64   47  873  622   40  710    0
      <NA>  464   12  179  160   16   13  164   12    0

PDMED 1, 4, 5 and 7 should all be 'yes', but some are not. 
According to MDS_UPDRS_Part_III.csv, some people are using LD before the STARTDT_LD in drug_init.
STARTDT_LD, STARTDT_DA, STARTDT_DRT and STARTDT_ALL need to be updated according to MDS_UPDRS_Part_III.csv.
STARTDT_LD will be updated when PDMED is 1, 4, 5 or 7 at a visit before the current STARTDT_LD.
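
The PD_MED_USE legend can be unpacked into one flag per drug class, which makes the code sets used in the next cell (`c(1,4,5,7)` for LD, `c(2,5,6,7)` for DA) easier to check. The helper `decode_pdmed` and its column names are hypothetical; the `on_OTH` set is derived here from the legend and does not appear in the notebook.

```r
# Legend: 0=None, 1=Levodopa, 2=Dopamine agonist, 3=Other, 4=1+3, 5=1+2, 6=2+3, 7=1+2+3
decode_pdmed <- function(code) {
  data.frame(
    PDMED  = code,
    on_LD  = code %in% c(1, 4, 5, 7),   # any code that includes levodopa
    on_DA  = code %in% c(2, 5, 6, 7),   # any code that includes a dopamine agonist
    on_OTH = code %in% c(3, 4, 6, 7)    # any code that includes another anti-PD drug
  )
}

print(decode_pdmed(0:7))
```

Because `%in%` returns `FALSE` for `NA`, a visit with a missing code is treated as "not on the drug" by this sketch rather than propagating `NA`.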

In [11]:
drug_init2 = UPDRS3 %>% select(PATNO, DATE, PDMED) %>%
    left_join(., drug_init, by ="PATNO") %>%
    mutate(STARTDT_LD = case_when(
        is.na(STARTDT_LD) & PDMED %in% c(1,4,5,7) ~ DATE,
        is.na(STARTDT_LD)~ NA_integer_,
        (STARTDT_LD > DATE) & PDMED %in% c(1,4,5,7) ~ DATE,
        TRUE ~ as.integer(STARTDT_LD)
    )) %>% 
    mutate(STARTDT_DA = case_when(
        is.na(STARTDT_DA) & PDMED %in% c(2,5,6,7) ~ DATE,
        is.na(STARTDT_DA)~ NA_integer_,
        (STARTDT_DA > DATE) & PDMED %in% c(2,5,6,7) ~ DATE,
        TRUE ~ as.integer(STARTDT_DA)
    )) %>% 
    mutate(STARTDT_DRT = case_when(
        is.na(STARTDT_DRT) & PDMED %in% c(1,2,4,5,6,7) ~ DATE,
        is.na(STARTDT_DRT)~ NA_integer_,
        (STARTDT_DRT > DATE) & PDMED %in% c(1,2,4,5,6,7) ~ DATE,
        TRUE ~ as.integer(STARTDT_DRT)
    )) %>% 
    mutate(STARTDT_ALL = case_when(
        is.na(STARTDT_ALL) & PDMED %in% c(1,2,3,4,5,6,7) ~ DATE,
        is.na(STARTDT_ALL)~ NA_integer_,
        (STARTDT_ALL > DATE) & PDMED %in% c(1,2,3,4,5,6,7) ~ DATE,
        TRUE ~ as.integer(STARTDT_ALL)
    )) %>% 
    group_by(PATNO) %>% mutate(
        STARTDT_LD = min(STARTDT_LD, na.rm = T),
        STARTDT_DA = min(STARTDT_DA, na.rm = T),
        STARTDT_DRT = min(STARTDT_DRT, na.rm = T),
        STARTDT_ALL = min(STARTDT_ALL, na.rm = T),
    ) %>% ungroup %>%
    select(names(drug_init)) %>% select(-one_of("DRUGDAYS_LD", "DRUGDAYS_DA", "DRUGDAYS_DRT", "DRUGDAYS_ALL")) %>%
    distinct(PATNO, .keep_all = T)
drug_init2[drug_init2==Inf] = NA

"no non-missing arguments to min; returning Inf"

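The warning above arises because `min(x, na.rm = T)` is applied to patients whose start dates are all `NA`; the result is `Inf`, which the cell then resets to `NA` afterwards. One possible alternative, sketched here with a hypothetical helper `min_or_na` that is not part of the notebook, returns `NA` directly and raises no warning.

```r
# min() over an empty set returns Inf with a warning; this helper returns NA instead.
min_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else as.numeric(min(x))
}

print(min_or_na(c(NA, 16130, 15614)))  # smallest non-missing value
print(min_or_na(c(NA, NA)))            # all missing -> NA, no warning
```

The helper always returns a double so that the all-missing and non-missing cases have the same type when combined within a group.
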
In [12]:
inner_join(drug_init2, UPDRS3, by = "PATNO") %>% 
    mutate(LD_started = ifelse(DATE>=STARTDT_LD, "yes", "no")) %>% with(table(LD_started, PDMED, useNA = 'always'))
inner_join(drug_init2, UPDRS3, by = "PATNO") %>% 
    mutate(LD_or_DA_started = ifelse(DATE>=STARTDT_DRT, "yes", "no")) %>% with(table(LD_or_DA_started, PDMED, useNA = 'always'))
inner_join(drug_init2, UPDRS3, by = "PATNO") %>% 
    mutate(any_drug_started = ifelse(DATE>=STARTDT_ALL, "yes", "no")) %>% with(table(any_drug_started, PDMED, useNA = 'always'))

          PDMED
LD_started    0    1    2    3    4    5    6    7 <NA>
      no   1853    0  370  366    0    0  270    0    0
      yes   327 2102   86   58  933  654   63  756    0
      <NA> 4586    0  136  147    0    0  125    0    0

                PDMED
LD_or_DA_started    0    1    2    3    4    5    6    7 <NA>
            no   1974    0    0  320    0    0    0    0    0
            yes   503 2102  592  183  933  654  458  756    0
            <NA> 4289    0    0   68    0    0    0    0    0

                PDMED
any_drug_started    0    1    2    3    4    5    6    7 <NA>
            no   1936    0    0    0    0    0    0    0    0
            yes   646 2102  592  571  933  654  458  756    0
            <NA> 4184    0    0    0    0    0    0    0    0

In [19]:
cat("save this file as MED_INIT2.csv.")
write.csv(drug_init2, paste(OUTPUT, "MED_INIT2.csv", sep="/"), row.names=F) # row.names=F to match MED_INIT.csv

save this file as MED_INIT2.csv.

In [14]:
## To see if Use_of_PD_Medication.csv has additional information beyond UPDRS3 -> Probably not
# drug1 = read_func('Use_of_PD_Medication.csv', c("PDMEDYN","ONLDOPA", "ONDOPAG", "ONOTHER"))
# drug2 = drug1 %>%  
#   mutate(PDMEDYN_sum = rowSums(.[, c("ONLDOPA", "ONDOPAG", "ONOTHER"),], na.rm = T)) # > 0 if on any drug.
# drug2 %>% with(table(PDMEDYN, PDMEDYN_sum, useNA = 'always'))
# cat("Not using any PD medications: PDMEDYN==0 & PDMEDYN_sum== 0.
# Using any of the PD medications  : PDMEDYN==1 & PDMEDYN_sum > 0. 
# Excluded other types of observations")
# drug3 = drug2 %>% filter(I(PDMEDYN==0 & PDMEDYN_sum==0) | I(PDMEDYN==1 & PDMEDYN_sum > 0)) %>% select(-PDMEDYN_sum)
# drug3 %>% mutate(PDMEDYN_sum = rowSums(.[, c("ONLDOPA", "ONDOPAG", "ONOTHER"),], na.rm = T)) %>%
#  with(table(PDMEDYN, PDMEDYN_sum, useNA = 'always'))
# inner_join(drug3, UPDRS3, by = c("PATNO", "EVENT_ID", "DATE")) %>% with(table(ONLDOPA, PDMED, useNA = 'always'))
# inner_join(drug3, UPDRS3, by = c("PATNO", "EVENT_ID", "DATE")) %>% with(table(ONDOPAG, PDMED, useNA = 'always'))
# cat("Compare this file (REF) with the working dataset")
# inner_join(drug_init2, drug3, by="PATNO") %>% 
#     mutate(Consistent_with_Use_of_PD_Medication.csv_LD = case_when(
#         is.na(ONLDOPA) & is.na(STARTDT_LD)~"OK",
#         is.na(ONLDOPA) & DATE <= STARTDT_LD~"OK",
#         is.na(ONLDOPA) ~ "NotUsedInREF",
#         is.na(STARTDT_LD)~ "OnlyInREF",
#         DATE <  STARTDT_LD~"EarlierInREF",
#         DATE >=  STARTDT_LD~"OK"
#     ))%>%
#     with(table(Consistent_with_Use_of_PD_Medication.csv_LD, useNA = "always"))
# inner_join(drug_init2, drug3, by="PATNO") %>% 
#     mutate(Consistent_with_Use_of_PD_Medication.csv_DA = case_when(
#         is.na(ONDOPAG)~ "NotUsedInREF",
#         is.na(STARTDT_DA)~ "OnlyInREF",
#         DATE < STARTDT_DA~"EarlierInREF",
#         DATE >=  STARTDT_DA~"OK"
#     ))%>%
#     with(table(Consistent_with_Use_of_PD_Medication.csv_DA, useNA = "always"))

# Validate with Initiation_of_PD_Medication-_incidents.csv
Cross-reference this file against the working dataset.

In [15]:
cat("Compare the file MED_INIT.csv against 'Initiation_of_PD_Medication-_incidents.csv'(REF)")
init1 = read_func("Initiation_of_PD_Medication-_incidents.csv", c("INITMDDT", "INITMDVS"), F,F) %>% 
  filter(!is.na(INITMDDT)) %>% 
  mutate(INITMDDT=as.Date(paste("01", INITMDDT, sep = "/"), format = "%d/%m/%Y") %>% as.numeric) 
full_join(drug_init, init1, by="PATNO") %>% 
    mutate(Match_with_Initiation_of_PD_Medication_incidents = case_when(
    is.na(STARTDT_ALL) ~ "0. No information of drug initiation in both",
    STARTDT_ALL== INITMDDT ~ "1. Match as initiation of Any anti-PD Drug",
    STARTDT_DRT== INITMDDT ~ "2. Not Above, But DRT Initiation",
    STARTDT_LD== INITMDDT ~ "3. Not Above, But LD Initiation",
    STARTDT_DA== INITMDDT ~ "4. Not Above, But DA Initiation",
    TRUE~ "5. No Match with REF"
    ),
    INITMDDT_exists = !is.na(INITMDDT)) %>%
  with(table(Match_with_Initiation_of_PD_Medication_incidents, INITMDDT_exists, useNA = "always"))

Compare the file MED_INIT.csv against 'Initiation_of_PD_Medication-_incidents.csv'(REF)

                                                INITMDDT_exists
Match_with_Initiation_of_PD_Medication_incidents FALSE TRUE <NA>
    0. No information of drug initiation in both     0    1    0
    1. Match as initiation of Any anti-PD Drug       0  391    0
    2. Not Above, But DRT Initiation                 0    1    0
    5. No Match with REF                           414   35    0
    <NA>                                             0    0    0

In [16]:
full_join(drug_init2, init1, by="PATNO") %>% 
    mutate(Match_with_Initiation_of_PD_Medication_incidents = case_when(
    is.na(STARTDT_ALL) ~ "0. No information of drug initiation in both",
    STARTDT_ALL== INITMDDT ~ "1. Match as initiation of Any anti-PD Drug",
    STARTDT_DRT== INITMDDT ~ "2. Not Above, But DRT Initiation",
    STARTDT_LD== INITMDDT ~ "3. Not Above, But LD Initiation",
    STARTDT_DA== INITMDDT ~ "4. Not Above, But DA Initiation",
    TRUE~ "5. No Match with REF"
    ),
    INITMDDT_exists = !is.na(INITMDDT)) %>%
  with(table(Match_with_Initiation_of_PD_Medication_incidents, INITMDDT_exists, useNA = "always"))

                                                INITMDDT_exists
Match_with_Initiation_of_PD_Medication_incidents FALSE TRUE <NA>
    0. No information of drug initiation in both  1143    0    0
    1. Match as initiation of Any anti-PD Drug       0  384    0
    2. Not Above, But DRT Initiation                 0    2    0
    3. Not Above, But LD Initiation                  0    1    0
    5. No Match with REF                           447   41    0
    <NA>                                             0    0    0