This repository contains the code and data for a paper on the automated extraction of literary genres based on distributions of parts of speech.
The data are taken from the ETCBC.
The script experiment.R is intended to be run from within RStudio.
The results include a PCA biplot in 2D and one in 3D as well as a correlation plot. They can be inspected in the results folder.
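As a rough illustration of the kind of analysis behind these plots, the following R sketch runs a PCA and computes a correlation matrix on a small table of part-of-speech proportions. It is not the code in experiment.R: the matrix pos, its row and column labels, and all of its values are invented for this example, and only base R functions (prcomp, cor, biplot) are used.

```r
# Toy data: rows are texts, columns are relative frequencies of parts of speech.
# All labels and values are hypothetical; they are not ETCBC data.
pos <- matrix(
  c(0.30, 0.25, 0.10, 0.35,
    0.28, 0.27, 0.09, 0.36,
    0.40, 0.15, 0.20, 0.25,
    0.42, 0.13, 0.22, 0.23,
    0.35, 0.20, 0.15, 0.30),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("text_a", "text_b", "text_c", "text_d", "text_e"),
    c("noun", "verb", "adjective", "other")
  )
)

# PCA on centred and scaled columns.
pca <- prcomp(pos, center = TRUE, scale. = TRUE)

# Share of variance explained by each principal component.
explained <- pca$sdev^2 / sum(pca$sdev^2)

# Pairwise correlations between the part-of-speech columns.
correlations <- cor(pos)

# 2D biplot of the first two components (texts as points, parts of speech as arrows).
biplot(pca)
```

In a sketch like this, texts with similar part-of-speech profiles would be expected to lie close together in the biplot, which is the intuition behind reading genre groupings from such a plot.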
Please cite the paper in the Journal of Northwest Semitic Languages when using this repository:
Johan de Joode, "The Distribution of Parts of Speech in the Literary Genres of the Hebrew Bible: A Digital Stylistic Approach," Journal of Northwest Semitic Languages 46/1 (2020), pp. 67-90.
The research for this article was conducted as part of the project The Genes of Genre: Classifying Literary Text Types Using Statistical Modelling (3H180173, KU Leuven; principal investigators: Eibert Tigchelaar, Pierre Van Hecke, and Dirk Speelman).