This repository includes the R code used for the paper "Non-parametric Treatment Effect Bounds for Left-Censored Outcomes: Estimating the Effect of Herbicide Use on 2,4-D Exposure."
The simulation study code is in the folder simulation_study_1.
The study is designed to be run on a parallel computing cluster using the SLURM workload manager. The workload manager spawns an array of jobs (for the manuscript, we ran 250 jobs), where each job is assigned a unique job index through the SLURM_ARRAY_TASK_ID environment variable. The simulation_study.R script uses the assigned job index to set a unique random number seed for its simulated datasets.
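As a rough illustration of the seeding step described above, the sketch below reads the job index from the environment and uses it as the seed. Using the index directly as the seed is an assumption made for illustration; simulation_study.R may derive its seed differently.

```r
# Read the job index that SLURM assigns to this array task.
# Falls back to "1" when the variable is unset (e.g. an interactive session).
job_index <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
stopifnot(!is.na(job_index))

# Assumption: the seed is the job index itself; the actual script may
# compute its seed from the index in some other way.
set.seed(job_index)

# Draws are now reproducible for a given job index and differ across jobs.
x <- rnorm(3)
```

Because each array task sees a different SLURM_ARRAY_TASK_ID, each job simulates different datasets, while rerunning a job with the same index reproduces its results.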
To run a single job without using SLURM, you can set the required environment variables manually:
Sys.setenv(SLURM_ARRAY_TASK_ID = 1)
Sys.setenv(CACHE_PATH = "/path/to/cache/directory")
source("simulation_study_1/simulation_study.R")
The CACHE_PATH environment variable specifies the directory where the results of the job will be saved.
Each job saves its own results file in the cache path. To combine these cached results into a single analysis dataset, use the script simulation_study_1/collect.R, with the RESULTS_PATH environment variable specifying the folder where the combined results file is saved:
Sys.setenv(CACHE_PATH = "/path/to/cache/directory")
Sys.setenv(RESULTS_PATH = "/path/to/results/folder")
source("simulation_study_1/collect.R")
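Conceptually, the collection step stacks the per-job files into one dataset. The sketch below shows one way this could look. The helper name combine_cached_results, the .rds file format, and the output file name are all hypothetical; collect.R may store and combine results differently.

```r
# Hypothetical helper: stack every cached per-job result into one data frame.
# Assumes each job saved a data frame as an .rds file in cache_path.
combine_cached_results <- function(cache_path) {
  files <- list.files(cache_path, pattern = "\\.rds$", full.names = TRUE)
  if (length(files) == 0) {
    stop("No cached result files found in ", cache_path)
  }
  # Read each file and bind the rows together.
  do.call(rbind, lapply(files, readRDS))
}

# Example usage (the output file name "results.rds" is an assumption):
# results <- combine_cached_results(Sys.getenv("CACHE_PATH"))
# saveRDS(results, file.path(Sys.getenv("RESULTS_PATH"), "results.rds"))
```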
Finally, the script simulation_study_1/analyze.R loads the results, calculates bias/variance metrics, and generates the tables for the manuscript:
Sys.setenv(RESULTS_PATH = "/path/to/results/folder")
source("simulation_study_1/analyze.R")
# Table 1
results_table_1
# Table 1, formatted for LaTeX
results_table_1_latex
# Table 2
results_table_2
# Table 2, formatted for LaTeX
results_table_2_latex
Code for replicating the NHANES data analysis is located in the folder analysis.
The case study requires NHANES data from the 2009-2010, 2011-2012, 2013-2014, and 2015-2016 cycles, which can be downloaded from the National Center for Health Statistics. The required files are:
- Demographics: DEMO_F.xpt, DEMO_G.xpt, DEMO_H.xpt, DEMO_I.xpt
- Pesticide Use: PUQMEC_F.xpt, PUQMEC_G.xpt, PUQMEC_H.xpt, PUQMEC_I.xpt
- Albumin & Creatinine - Urine: ALB_CR_F.xpt, ALB_CR_G.xpt, ALB_CR_H.xpt, ALB_CR_I.xpt
- Pyrethroids, Herbicides, & Organophosphorus Metabolites - Urine: UPHOPM_F.xpt, UPHOPM_G.xpt, UPHOPM_H.xpt, UPHOPM_I.xpt
Save the files to the analysis folder. Then run the analysis/analysis.R script to generate the results in the manuscript:
source("analysis/analysis.R")
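Before sourcing the script, it may help to confirm that all 16 files listed above are in place. The following check is a sketch that is not part of the repository, and it assumes the working directory is the repository root.

```r
# Cycle suffixes used by NHANES for 2009-2010 through 2015-2016.
cycles <- c("F", "G", "H", "I")
prefixes <- c("DEMO", "PUQMEC", "ALB_CR", "UPHOPM")

# Build all 16 expected file names, e.g. "DEMO_F.xpt".
required_files <- paste0(
  as.vector(outer(prefixes, cycles, paste, sep = "_")),
  ".xpt"
)

# Assumption: the files were saved to "analysis/" under the working directory.
missing_files <- required_files[
  !file.exists(file.path("analysis", required_files))
]
if (length(missing_files) > 0) {
  message("Missing NHANES files: ", paste(missing_files, collapse = ", "))
}
```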