SATISFY EASY LADDERS: Accessible machine learning to facilitate analysis of larger sensitive text data samples in criminology

This repository contains the Python code used to train, evaluate and apply accessible machine-learning models to a sample of Child Safeguarding Practice Review (CSPR) PDF documents publicly available in the NSPCC National Case Review Repository. Written permission was acquired to use the CSPRs for the research.

Models were developed to detect sentences that mention any of the following four concepts:

Missing from home or care (MFHC)
Exploitation
Special educational needs or disabilities (SEND)
School exclusion

This code was used in the paper:

Fox, N. (2026). SATISFY EASY LADDERS: Accessible machine learning to facilitate analysis of larger sensitive text data samples in criminology. Evidence Base. https://doi.org/10.1080/30679125.2025.2612199

What does the code do?

1. Sentencise documents
2. Split labelled data
3. Experiment:
   - Develop ML models with 14 labelled CSPRs
   - Classify using a keyword (KW) approach
   - Compare ML and KW approaches
   - Evaluate ML models on 'test' set
4. Apply ML models to 193 unlabelled CSPRs
5. Analyse human-in-the-loop (HITL) review stats
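
To give a feel for the sentencising and keyword (KW) steps listed above, here is a minimal sketch using only the Python standard library. It is not the code in this repository: the regular-expression sentence splitter, the `sentencise` and `keyword_labels` function names, and the keyword lists are all assumptions made up for illustration.

```python
import re

# Hypothetical keyword lists, for illustration only (not the lists used in the research).
KEYWORDS = {
    "MFHC": ["missing from home", "missing from care"],
    "School exclusion": ["excluded from school", "school exclusion"],
}


def sentencise(text):
    """Split text into sentences wherever '.', '!' or '?' is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def keyword_labels(sentence, keywords=KEYWORDS):
    """Return {concept: True/False} depending on whether any keyword occurs in the sentence."""
    lowered = sentence.lower()
    return {
        concept: any(term in lowered for term in terms)
        for concept, terms in keywords.items()
    }


# Invented example text, not taken from any CSPR.
text = (
    "The child went missing from home on two occasions. "
    "School staff raised concerns. "
    "He was later excluded from school."
)

sentences = sentencise(text)
labels = [keyword_labels(sentence) for sentence in sentences]

for sentence, label in zip(sentences, labels):
    matched = [concept for concept, found in label.items() if found]
    print(f"{sentence!r} -> {matched}")
```

A simple splitter like this will mis-split on abbreviations such as "Dr." or "e.g.", so a dedicated sentence tokeniser is usually a better choice for real documents.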

Data

The Child Safeguarding Practice Review (CSPR) documents are publicly available in the NSPCC National Case Review Repository. NSPCC permission was obtained to use them in this research.

Author

Nicola Fox created this code as part of a Criminology PhD at the University of Manchester (UK), supervised by Dr Réka Solymosi, Dr Caroline Miles, and Dr Riza Batista-Navarro.

Citation

Fox, N. (2026). SATISFY EASY LADDERS: Accessible machine learning to facilitate analysis of larger sensitive text data samples in criminology. Evidence Base. https://doi.org/10.1080/30679125.2025.2612199

Licence

This code is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this repository are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/

Resources

The resources below provide a useful introduction for those beginning to explore natural language processing (NLP) applications with machine learning (ML) in Python or R.

Books

Müller, A. C., & Guido, S. (2016). Introduction to machine learning with Python: A guide for data scientists. O’Reilly Media.

Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with Python. O’Reilly Media.

Hvitfeldt, E., & Silge, J. (2022). Supervised machine learning for text analysis in R. CRC Press.

Silge, J., & Robinson, D. (2017). Text mining with R. O’Reilly Media.

Burger, S. V. (2018). Introduction to machine learning with R. O’Reilly Media.

James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2021). An introduction to statistical learning (2nd ed.). Springer. Either the R version (2021) or the Python version (2023) is suitable.

Python scikit-learn documentation

The official scikit-learn documentation explains the Python scikit-learn functions used in the code in this repository and includes some helpful tutorials. For example:

scikit-learn. (n.d.). Working with text data. scikit-learn. https://scikit-learn.org/1.4/tutorial/text_analytics/working_with_text_data.html
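
As a rough illustration of the kind of sentence classifier that scikit-learn makes accessible, the sketch below trains a tiny model to flag sentences mentioning one concept (MFHC). The choice of `TfidfVectorizer` with `LogisticRegression` is an assumption made for this example and is not necessarily the model used in this repository; the training sentences are invented and far too few for real use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training sentences: 1 = mentions missing from home or care, 0 = does not.
train_sentences = [
    "The child was reported missing from home twice.",
    "She went missing from care overnight.",
    "He was missing from his placement for three days.",
    "The school arranged a meeting with the parents.",
    "A health visitor attended the home.",
    "The review makes several recommendations.",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Turn each sentence into TF-IDF word weights, then fit a linear classifier on them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_sentences, train_labels)

# Classify previously unseen sentences.
new_sentences = [
    "The young person went missing from care.",
    "The school held a meeting.",
]
predictions = model.predict(new_sentences)

for sentence, prediction in zip(new_sentences, predictions):
    print(f"{prediction}: {sentence}")
```

Wrapping the vectoriser and classifier in one pipeline means the same text preprocessing is applied at training time and at prediction time.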

Blogs and videos

Markham, K. (2016, May 24). Machine learning with text in scikit-learn [Video]. YouTube. https://www.youtube.com/watch?v=8QmkFAthuPU

The Machine Learning Mastery website (machinelearningmastery.com) contains many helpful explainer blog posts. For example:

Brownlee, J. (2020, August 2). Precision, recall and F-measure for imbalanced classification. Machine Learning Mastery. https://machinelearningmastery.com/precision-recall-and-f-measure-for-imbalanced-classification/

Brownlee, J. (2021, May 1). Tour of evaluation metrics for imbalanced classification. Machine Learning Mastery. https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/
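
The precision, recall and F-measure covered in the posts above can be computed by hand for a binary sentence classifier. The sketch below uses the standard definitions; the function name `precision_recall_f1` and the example labels are invented for illustration and are not results from the research.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = concept mentioned)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Precision: of the sentences flagged, what share were correct?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of the sentences that should have been flagged, what share were found?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Invented labels: most sentences do not mention the concept (an imbalanced sample).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Accuracy is deliberately left out here: when most sentences are negative, a classifier that flags nothing can still score a high accuracy, which is why these metrics are preferred for imbalanced data.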

Folders and files

1_sentencise_documents
2_split_labelled_data
3_experiment
4_apply_ml_models
5_analyse_hitl_review_stats
6_other
README.md
