Analysis source code for: Machine Learning and Risk Assessment: Random Forest Does not Outperform Logistic Regression in the Prediction of Sexual Recidivism
Paper by Sonja Etzler, Felix D. Schönbrodt, Florian Pargent, Reinhard Eher, & Martin Rettenberger
Analysis code by Florian Pargent & Felix Schönbrodt.
This source code is distributed under an MIT license (see LICENSE file).
Software Heritage Identifier: swh:1:dir:0a8b190c964211498953b329cdb4454535b8b8da
We are not allowed to share the sensitive raw data, so the results cannot be fully reproduced. Nonetheless, we provide the full source code so that our analytical approach can be evaluated. Only two files are relevant:
- analysis.Rmd: This rmarkdown file performs all computations, generates the plots and results tables, and creates an HTML file as a report.
- preprocessing.R: This script cleans the raw data file. It is sourced automatically when you run analysis.Rmd.
Using the Software Heritage ID (SWHID) given above, you can verify that you have the exact version of the source code that was used for the final computations of the paper.
The final computations were run in R version 4.1.3; all package versions are recorded in the renv.lock lockfile. When you open the .Rproj file for the first time in RStudio, renv will warn you that "The project library is out of sync with the lockfile". You can download the correct package versions into the local project environment with renv::restore() (nothing is changed in your usual local package location). If you do not work with RStudio, you have to call renv::restore() manually.
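The restore step described above can be sketched as follows. This is a minimal sketch, not part of the original repository. It assumes that the R working directory is the project root (the folder containing renv.lock) and that you have internet access to download packages. The final rendering call is left commented out because it needs the raw data, which is not distributed.

```r
# Assumption: the working directory is the project root containing renv.lock.

# Install renv itself if it is not yet available (normally the project's
# renv autoloader takes care of this when the project is opened).
if (!requireNamespace("renv", quietly = TRUE)) {
  install.packages("renv")
}

# Show the running R version; the paper's computations used R 4.1.3.
print(getRversion())

# Install the package versions recorded in renv.lock into the
# project-local library (your usual package library is not touched).
renv::restore()

# Report whether the project library and the lockfile are now in sync.
renv::status()

# Render the report; this requires the raw data, which is not distributed.
# rmarkdown::render("analysis.Rmd")
```

If renv::status() still reports differences after the restore, a mismatch between your R version and 4.1.3 is one possible cause.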
The code should run on a regular PC within a few minutes (about 8 minutes on a 2021 MacBook Pro); there are no unusual requirements regarding working memory or other resources.