Code for training and evaluating subword segmental language models, as in the Findings of EMNLP paper "Subword Segmental Language Modelling for Nguni Languages" (Meyer and Buys, 2022)

francois-meyer/subword-segmental-lm

Subword Segmental Language Modelling for Nguni Languages

Paper: Subword Segmental Language Modelling for Nguni Languages

Francois Meyer and Jan Buys, Findings of EMNLP 2022

Train subword segmental language models (SSLMs) - language models that learn how to segment words while being trained for autoregressive language modelling.
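To make the idea of learning segmentation during language modelling more concrete, here is a minimal sketch of how a word's probability can be summed over all of its possible subword segmentations with a forward dynamic program. It is an illustration under stated assumptions, not this repository's implementation: `segment_logprob` is a hypothetical stand-in for a model's score of a single subword (in a real model this score would also depend on the preceding context), and `max_seg_len` is an assumed cap on subword length.

```python
import math
from typing import Callable, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a non-empty sequence."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def word_log_marginal(
    word: str,
    segment_logprob: Callable[[str], float],  # hypothetical subword scorer
    max_seg_len: int = 5,                     # assumed maximum subword length
) -> float:
    """Log probability of `word`, summed over every segmentation into
    subwords of length at most `max_seg_len`."""
    n = len(word)
    # alpha[i] holds the log of the total probability of all segmentations of word[:i].
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0
    for end in range(1, n + 1):
        # Consider every subword word[start:end] that could end at position `end`.
        candidates = [
            alpha[start] + segment_logprob(word[start:end])
            for start in range(max(0, end - max_seg_len), end)
        ]
        alpha[end] = logsumexp(candidates)
    return alpha[n]
```

With a toy scorer such as `lambda seg: len(seg) * math.log(0.5)`, every segmentation of a four-character word has probability 0.5^4, and there are 8 segmentations when `max_seg_len` is at least 4, so the marginal probability is 0.5.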

Evaluate trained SSLMs on intrinsic language modelling performance (BPC) and as unsupervised morphological segmenters.
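For reference, bits per character (BPC) is the average negative base-2 log-likelihood that a model assigns per character. The sketch below assumes you already have the total natural-log likelihood of a test corpus and its character count. The function name and arguments are illustrative and are not taken from this repository.

```python
import math


def bits_per_character(total_log_likelihood: float, num_characters: int) -> float:
    """Convert a total natural-log likelihood (in nats) into bits per character."""
    if num_characters <= 0:
        raise ValueError("num_characters must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the count averages per character.
    return -total_log_likelihood / (num_characters * math.log(2))
```

Whether spaces or end-of-sequence symbols count as characters is a convention that must be fixed before comparing models. This sketch leaves that choice to the caller.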

The datasets and models produced for this paper are publicly available:

Dependencies (Python)

  • NumPy
  • PyTorch
  • Torchtext
  • tqdm
  • NLTK
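
A quick way to confirm that these packages are importable in your environment is sketched below. The import names in `REQUIRED_MODULES` are assumed from the list above (for example, PyTorch is imported as `torch`), and the helper `missing_modules` is hypothetical rather than part of this repository.

```python
import importlib.util

# Import names assumed for the dependencies listed above.
REQUIRED_MODULES = ["numpy", "torch", "torchtext", "tqdm", "nltk"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```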
