Analysis code for "Machine Learning and Risk Assessment: Random Forest Does not Outperform Logistic Regression in the Prediction of Sexual Recidivism" by Etzler et al.

nicebread/sexual_recidivism

Analysis source code for: Machine Learning and Risk Assessment: Random Forest Does not Outperform Logistic Regression in the Prediction of Sexual Recidivism

Paper by Sonja Etzler, Felix D. Schönbrodt, Florian Pargent, Reinhard Eher, & Martin Rettenberger

Analysis code by Florian Pargent & Felix Schönbrodt.

This source code is distributed under an MIT license (see LICENSE file).

Software Heritage Identifier: swh:1:dir:0a8b190c964211498953b329cdb4454535b8b8da

Reproducing the analysis

We are not allowed to share the sensitive raw data, so the results cannot be fully reproduced. Nonetheless, we provide the full source code so that our analytical approach can be evaluated. Only two files are relevant:

  • analysis.Rmd: This R Markdown file runs all computations, generates the plots and results tables, and creates an HTML file as a report.
  • preprocessing.R: This script cleans the raw data file. It is sourced automatically when you run analysis.Rmd.
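
As a minimal sketch of how the report could be rendered from the R console: the helper name `render_report` is hypothetical and not part of the repository, and the sketch assumes that the working directory is the repository root and that the rmarkdown package is installed. Without the raw data, rendering is expected to stop at the data loading step.

```r
# Hypothetical helper (not part of the repository): render the report
# from the repository root, failing early if the file cannot be found.
render_report <- function(rmd = "analysis.Rmd") {
  if (!file.exists(rmd)) {
    stop("Cannot find ", rmd, "; set the working directory to the repository root.")
  }
  # rmarkdown::render() knits the file and writes the report;
  # preprocessing.R is sourced from within analysis.Rmd, so it needs no separate call.
  rmarkdown::render(rmd)
}

# render_report()  # uncomment to run; requires the raw data to complete
```

The call is left commented out so that the snippet can be pasted without triggering a full run.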

Using the Software Heritage ID (SWHID) you can verify that you have the exact version of the source code that was used for the final computations of the paper.

Computing environment

The final computations were run in R version 4.1.3; all package versions are recorded in the renv.lock lockfile. When you open the .Rproj file for the first time in RStudio, renv will warn you that "The project library is out of sync with the lockfile". You can download the correct package versions into the local project environment with renv::restore() (nothing is changed in your usual local package library). If you do not work with RStudio, you have to call renv::restore() manually.
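
The restore step might look like the following sketch when run from an R session started in the project root. It assumes that renv is already installed (otherwise install it first); the `renv::status()` call and the version check are optional additions for illustration and are not prescribed by the repository.

```r
# Assumption: the R session was started in the project root, where renv.lock lives.
# install.packages("renv")  # only needed if renv is not installed yet

# Install the package versions recorded in renv.lock into the project library
renv::restore()

# Optional: report whether the project library and the lockfile are now in sync
renv::status()

# Optional: compare the running R version with the one used for the paper (4.1.3)
if (getRversion() != "4.1.3") {
  message("Running R ", getRversion(), "; the final computations used R 4.1.3.")
}
```

By default `renv::restore()` asks for confirmation before installing anything; `renv::restore(prompt = FALSE)` skips that prompt in non-interactive use.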

Computational requirements

The code should run on a regular PC within a couple of minutes (~8 min. on a 2021 MacBook Pro); there are no unusual requirements regarding working memory or other resources.
