This report provides examples of conducting initial data analysis (IDA) reproducibly in the context of intended regression analyses.
Objective: Develop an initial data analysis plan (IDAP) and provide sample IDA reports before executing the statistical analysis plan (SAP) for regression modeling.
The six steps of the IDA framework [Ref 1] are:
- Meta data set-up
- Data cleaning
- Data screening
- Initial data reporting
- Updating/refining the statistical analysis plan
- Reporting of IDA findings in research papers [Ref 2]
For our objective, we assume that meta data exist and that data cleaning has already been done. We created a hypothetical statistical analysis plan for each of the data sets.
- Sample IDA plans and IDA reports to illustrate data screening and initial reporting (steps 3 and 4 of the IDA framework)
- Recommendations for numerical and graphical summaries (step 3 of the IDA framework)
- Explanation and elaboration of potential consequences for the SAP arising from IDA findings (step 5 of the IDA framework)
- Recommendations for reporting of IDA for regression analyses (step 6 of the IDA framework)
- Manuscript covering the scope of the regression model, a generic IDA strategy, and examples of IDA discoveries and their consequences
- main - General report files
- data-raw - Repository for original data sets and their data dictionaries
- data - Repository for analysis data sets
- R - R functions for data visualization and transformations used in the R markdown files
- docs - Report in website format
- report - Report in MS Word format
https://docs.google.com/spreadsheets/d/1Ft5eyenvDnMBoLvJmcBaklfrYcwyW-rkt-ivIkaphdA/edit?usp=sharing
[1] Huebner M, le Cessie S, Schmidt CO, Vach W. A contemporary conceptual framework for initial data analysis. Observational Studies 2018; 4: 171-192.
[2] Huebner M, Vach W, le Cessie S, Schmidt C, Lusa L. Hidden Analyses: a review of reporting practice and recommendations for more transparent reporting of initial data analyses. BMC Med Res Meth 2020; 20: 61.
[3] Ratzinger F, Dedeyan M, Rammerstorfer M, Perkmann T, Burgmann H, et al. A Risk Prediction Model for Screening Bacteremic Patients: A Cross Sectional Study. PLoS ONE 2014; 9(9): e106765. doi:10.1371/journal.pone.0106765
SAP - statistical analysis plan
IDA - initial data analysis
IDAP - initial data analysis plan
None.
Contributors are from the STRATOS Initiative.
- TG2: Selection of variables and functional forms in multivariable analyses
- TG3: Initial data analysis
Mark Baillie
Novartis
Email: mark.baillie@novartis.com
Georg Heinze
Medical University, Vienna, Austria
Email: georg.heinze@meduniwien.ac.at
Marianne Huebner
Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
Email: huebner@msu.edu