Using LDA to uncover proto-registers = pregisters (SFB1412/A04 Humboldt-Universität)
In this project, we use Latent Dirichlet Allocation (LDA) to uncover proto-registers (pregisters) in large corpora from lexical and grammatical surface features. We consider this approach much more plausible than Douglas Biber's Multi-Dimensional Analysis (MDA) because LDA allows for a probabilistic many-to-many mapping between documents and pregisters as well as between pregisters and features.
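To make the many-to-many mapping concrete, here is a minimal sketch of the generative process that LDA assumes, written with the Python standard library only. It is an illustration, not the project's code: the function names, the feature labels, and the hyperparameter values are all hypothetical, and the sketch only generates toy data. It does not fit a model to a corpus.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, n_pregisters, features, doc_len, seed=0):
    """Generate toy documents under the LDA generative assumptions.

    Returns (theta, phi, docs):
      theta[d] is document d's probability distribution over pregisters,
      phi[k]   is pregister k's probability distribution over features,
      docs[d]  maps each feature to its count in document d.
    """
    rng = random.Random(seed)

    # Each pregister is a distribution over ALL features (pregister -> features).
    # The symmetric prior of 1.0 is an arbitrary illustrative choice.
    phi = [sample_dirichlet([1.0] * len(features), rng) for _ in range(n_pregisters)]

    # Each document is a distribution over ALL pregisters (document -> pregisters).
    theta = [sample_dirichlet([1.0] * n_pregisters, rng) for _ in range(n_docs)]

    docs = []
    for d in range(n_docs):
        counts = {f: 0 for f in features}
        for _ in range(doc_len):
            # Pick a pregister for this feature occurrence, then a feature from it.
            k = rng.choices(range(n_pregisters), weights=theta[d])[0]
            f = rng.choices(features, weights=phi[k])[0]
            counts[f] += 1
        docs.append(counts)
    return theta, phi, docs


if __name__ == "__main__":
    # Hypothetical surface features, for illustration only.
    features = ["passive", "first_person_pronoun", "subjunctive", "noun_compound"]
    theta, phi, docs = generate_corpus(
        n_docs=3, n_pregisters=2, features=features, doc_len=100
    )
    for d, (mix, counts) in enumerate(zip(theta, docs)):
        print(d, [round(p, 2) for p in mix], counts)
```

Every document receives a weight for every pregister, and every pregister receives a weight for every feature, so neither mapping is a hard assignment. Fitting LDA to real data inverts this process: it estimates theta and phi from the observed feature counts.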
The project builds on previous work from the COW initiative, where we developed CoREX, a surface feature extractor, which in turn is based on COWTek16 – the COW annotation pipeline – for German.
Investigators (alphabetically):
Felix Bildhauer
Elizabeth Pankratz
Roland Schäfer