This repo contains code and data for "An information-theoretic characterization of morphological fusion" (at EMNLP 2021).
Contact neilrathi@gmail.com with any questions!
- `code` contains the code for creating fusion data for a language, as well as analysis code
- `result_plots` contains the plots used in the paper (main figure, paradigm size vs. fusion, frequency vs. fusion)
- `langdata` has data for
  - fusion by part-of-speech and language
  - paradigm size by part-of-speech and language (vs. fusion)
  - form frequency by feature and language (vs. fusion)

Dependencies:
- R. We used version 4.0.3. Analyses and plot generation require `tidyr`, `dplyr`, `ggplot2`, and `rPref`.
- Python 3.8
- GPU TensorFlow. We used version 2.2.0.
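
Before running the Python code, it may help to confirm that the local setup matches the versions listed above. The following is a minimal sketch, not part of this repo: the function names (`version_matches`, `installed_tensorflow_version`, `check_environment`) are hypothetical, and the pip distribution names `tensorflow` and `tensorflow-gpu` are assumptions about how TensorFlow was installed.

```python
import sys
from importlib import metadata  # available in the standard library from Python 3.8

# Versions listed in this README, used here as the reference setup.
EXPECTED_PYTHON = "3.8"
EXPECTED_TENSORFLOW = "2.2.0"


def version_matches(found, expected):
    """Return True if `found` begins with the dotted components of `expected`."""
    expected_parts = expected.split(".")
    return found.split(".")[: len(expected_parts)] == expected_parts


def installed_tensorflow_version():
    """Return the installed TensorFlow version string, or None if none is found.

    Assumption: TensorFlow was installed under one of these pip names.
    """
    for dist_name in ("tensorflow", "tensorflow-gpu"):
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_environment():
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    found_python = ".".join(str(part) for part in sys.version_info[:3])
    if not version_matches(found_python, EXPECTED_PYTHON):
        problems.append(
            f"Python {found_python} found; README lists {EXPECTED_PYTHON}"
        )
    found_tf = installed_tensorflow_version()
    if found_tf is None:
        problems.append("TensorFlow is not installed")
    elif not version_matches(found_tf, EXPECTED_TENSORFLOW):
        problems.append(
            f"TensorFlow {found_tf} found; README lists {EXPECTED_TENSORFLOW}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Warning:", problem)
```

A mismatch is reported as a warning rather than an error, since other versions may also work; the versions above are only the ones we used.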