foxnic/accessible_machine_learning

SATISFY EASY LADDERS: Accessible machine learning to facilitate analysis of larger sensitive text data samples in criminology

This repository contains the Python code used to train, evaluate and apply accessible machine-learning models to a sample of Child Safeguarding Practice Review (CSPR) PDF documents publicly available in the NSPCC National Case Review Repository. Written permission was acquired to use the CSPRs for the research.

Models were developed to detect sentences that mention the following four concepts:

  1. Missing from home or care (MFHC)
  2. Exploitation
  3. Special educational needs or disabilities (SEND)
  4. School exclusion
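
As an illustration only (this is not the repository's code), a sentence-level detector for a single concept could be sketched with scikit-learn roughly as follows. The toy sentences, their labels, the TF-IDF features and the logistic regression classifier are all assumptions made for this example; see the paper for the models actually used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy sentences (NOT taken from any CSPR).
# Label 1 = sentence mentions missing from home or care (MFHC), 0 = it does not.
train_sentences = [
    "The child was reported missing from home on two occasions.",
    "She went missing from her placement overnight.",
    "He was missing from care for three days.",
    "Police were informed when the young person went missing.",
    "School attendance was poor throughout the year.",
    "The family moved to a new area in the spring.",
    "A health visitor saw the child at the clinic.",
    "The review makes five recommendations.",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed model choice: word and bigram TF-IDF features feeding a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(class_weight="balanced"),
)
model.fit(train_sentences, train_labels)

# Score unseen sentences: a 0/1 prediction plus the estimated probability of class 1.
new_sentences = [
    "The young person went missing from home again.",
    "The school nurse attended the meeting.",
]
predictions = model.predict(new_sentences)
probabilities = model.predict_proba(new_sentences)[:, 1]

for sentence, label, prob in zip(new_sentences, predictions, probabilities):
    print(f"{label}  {prob:.2f}  {sentence}")
```

One such binary model per concept keeps each classifier simple to inspect, at the cost of training four models instead of one. With so few training sentences the predictions here are not meaningful; the sketch only shows the shape of the approach.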

This code was used in the paper:

Fox, N. (2026). SATISFY EASY LADDERS: Accessible machine learning to facilitate analysis of larger sensitive text data samples in criminology. Evidence Base. https://doi.org/10.1080/30679125.2025.2612199

What does the code do?

  • Sentencise documents
  • Split labelled data
  • Experiment:
    • Develop ML models with 14 labelled CSPRs
    • Classify using a keyword (KW) approach
    • Compare ML and KW approaches
    • Evaluate ML models on 'test' set
  • Apply ML models to 193 unlabelled CSPRs
  • Analyse human-in-the-loop (HITL) review stats
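
The first steps above (sentencising, keyword classification and comparison against human labels) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the repository's implementation: the regex sentence splitter, the keyword list, the example text and the function names `sentencise`, `keyword_flag` and `precision_recall_f1` are all hypothetical.

```python
import re


def sentencise(text):
    """Naively split text into sentences after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Hypothetical keyword list for one concept (missing from home or care).
KEYWORDS = ("missing from home", "missing from care", "went missing", "absconded")


def keyword_flag(sentence, keywords=KEYWORDS):
    """Return 1 if any keyword occurs in the sentence (case-insensitive), else 0."""
    lowered = sentence.lower()
    return int(any(k in lowered for k in keywords))


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example text (NOT taken from any CSPR).
document = (
    "The child went missing on two occasions. "
    "School attendance was poor. "
    "Professionals noted that she was missing appointments. "
    "He absconded from the placement. "
    "She did not return to the care home overnight."
)
sentences = sentencise(document)

# Hypothetical human labels: 1 = sentence is about going missing, 0 = it is not.
human_labels = [1, 0, 0, 1, 1]
kw_predictions = [keyword_flag(s) for s in sentences]

precision, recall, f1 = precision_recall_f1(human_labels, kw_predictions)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy text the keyword list misses the last sentence, which describes going missing without using any listed phrase, so recall drops while precision stays perfect. The same metric function could be applied to ML predictions to compare the two approaches on identical labelled sentences.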

Data

The Child Safeguarding Practice Review (CSPR) documents are publicly available in the NSPCC National Case Review Repository. The NSPCC gave written permission for their use in this research.

Author

Nicola Fox created this code as part of a Criminology PhD at the University of Manchester (UK), supervised by Dr Réka Solymosi, Dr Caroline Miles, and Dr Riza Batista-Navarro.

Citation

Fox, N. (2026). SATISFY EASY LADDERS: Accessible machine learning to facilitate analysis of larger sensitive text data samples in criminology. Evidence Base. https://doi.org/10.1080/30679125.2025.2612199

Licence

This code is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this repository are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/

Resources

The resources below provide a useful introduction for those beginning to explore natural language processing (NLP) applications with machine learning (ML) in Python or R.

Books

Müller, A. C., & Guido, S. (2016). Introduction to machine learning with Python: A guide for data scientists. O’Reilly Media.

Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with Python. O’Reilly Media.

Hvitfeldt, E., & Silge, J. (2022). Supervised machine learning for text analysis in R. CRC Press.

Silge, J., & Robinson, D. (2017). Text mining with R. O’Reilly Media.

Burger, S. V. (2018). Introduction to machine learning with R. O’Reilly Media.

James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2021). An introduction to statistical learning (2nd ed.). Springer. Available in an R version (2021) and a Python version (2023).

Python scikit-learn documentation

The official scikit-learn documentation explains the Python scikit-learn functions used in the code in this repository and includes some helpful tutorials. For example:

scikit-learn. (n.d.). Working with text data. scikit-learn. https://scikit-learn.org/1.4/tutorial/text_analytics/working_with_text_data.html

Blogs and videos

Markham, K. (2016, May 24). Machine learning with text in scikit-learn [Video]. YouTube. https://www.youtube.com/watch?v=8QmkFAthuPU

The Machine Learning Mastery website (machinelearningmastery.com) contains many helpful explainer blog posts. For example:

Brownlee, J. (2020, August 2). Precision, recall and F-measure for imbalanced classification. Machine Learning Mastery. https://machinelearningmastery.com/precision-recall-and-f-measure-for-imbalanced-classification/

Brownlee, J. (2021, May 1). Tour of evaluation metrics for imbalanced classification. Machine Learning Mastery. https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/
