homophony-as-renyi-entropy

This code accompanies the paper "On Homophony and Rényi Entropy" (Pimentel et al., EMNLP 2021). The paper studies the pressures of homophony in language, analysing homophony through the lens of Rényi collision entropy.

Data

Download the CELEX data and place the raw LDC96L14.tar.gz file in the data/celex/raw/ directory. You can then extract its data with the command:

$ make get_celex

Install

To install dependencies run:

$ conda env create -f environment.yml

Activate the created conda environment with the command:

$ source activate.sh

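Finally, install the appropriate version of PyTorch. The first command below targets CUDA 10.1; the commented-out second command is the CPU-only alternative: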

$ conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
# $ conda install pytorch torchvision cpuonly -c pytorch

Preprocess data

To preprocess a language's data run:

$ make get_data MONOMORPHEMIC=True LANGUAGE=<language>

where <language> is one of: eng (English), deu (German), or nld (Dutch).

Train models

To train a language's phonotactic model run:

$ make train MONOMORPHEMIC=True LANGUAGE=<language> MODEL=<model>

where <model> is either lstm or ngram.

Evaluate models

There are three commands to evaluate the trained phonotactic models. The first evaluates a model on the test set to get its cross-entropy:

$ make eval MONOMORPHEMIC=True LANGUAGE=<language> MODEL=<model>
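As a rough illustration of what a test-set cross-entropy measures, here is a minimal sketch that averages the negative log-probability a model assigns to held-out words. The function name `cross_entropy_bits` and the toy probabilities are hypothetical, and whether the repository normalises per word or per phone is not stated here, so treat the per-word choice as an assumption.

```python
import math


def cross_entropy_bits(word_probs):
    """Mean negative log2-probability of held-out words, in bits per word.

    `word_probs` holds the probability the model assigns to each test word.
    Per-word normalisation is an assumption made for this sketch.
    """
    if not word_probs:
        raise ValueError("need at least one test word")
    return -sum(math.log2(p) for p in word_probs) / len(word_probs)


# Made-up probabilities for three test words, for illustration only.
toy_probs = [0.5, 0.25, 0.25]
print(cross_entropy_bits(toy_probs))
```

A lower value means the model assigns higher probability to the held-out words.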

The second analyses all words with probability above a threshold delta to approximate the model's Rényi entropy:

$ make get_renyi MONOMORPHEMIC=True LANGUAGE=<language> MODEL=<model>
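The idea behind thresholding can be sketched as follows. The Rényi collision entropy of a distribution p is H_2(p) = -log sum_w p(w)^2, and keeping only the words with p(w) above delta drops small non-negative terms from the sum, so the truncated value can only overestimate the exact entropy. The function name `truncated_collision_entropy` and the toy distribution are hypothetical; this is not the repository's implementation, which enumerates words under a trained phonotactic model.

```python
import math


def truncated_collision_entropy(word_probs, delta):
    """Approximate H_2 = -log2(sum_w p(w)^2) using only words with p(w) > delta."""
    collision_mass = sum(p * p for p in word_probs if p > delta)
    if collision_mass == 0.0:
        raise ValueError("no word has probability above delta")
    return -math.log2(collision_mass)


# Toy distribution over four wordforms (made up for illustration).
toy = [0.5, 0.25, 0.125, 0.125]
exact = truncated_collision_entropy(toy, delta=0.0)   # every word kept
approx = truncated_collision_entropy(toy, delta=0.2)  # only the two most likely words
print(exact, approx)
```

Lowering delta tightens the approximation at the cost of enumerating more words.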

Finally, the third samples artificial lexica from the language models to run the null hypothesis test:

$ make sample_renyi MONOMORPHEMIC=True LANGUAGE=<language> MODEL=<model>
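For intuition, a sampling-based null hypothesis test of this kind can be sketched as a Monte Carlo comparison: draw many artificial lexica from a model, compute a statistic on each, and see where the observed lexicon falls. Everything below is an assumption made for illustration (the helper names, the plug-in collision entropy as the statistic, the one-sided comparison, and the toy categorical "model"); the paper and the repository define the actual procedure.

```python
import math
import random
from collections import Counter


def sample_collision_entropy(lexicon):
    """Plug-in collision entropy (bits) of the empirical wordform distribution."""
    n = len(lexicon)
    counts = Counter(lexicon)
    return -math.log2(sum((c / n) ** 2 for c in counts.values()))


def null_test(observed_lexicon, forms, weights, n_samples=1000, seed=0):
    """Return the observed statistic and the fraction of sampled lexica
    whose collision entropy is at or below it."""
    rng = random.Random(seed)
    observed = sample_collision_entropy(observed_lexicon)
    size = len(observed_lexicon)
    null_stats = [
        sample_collision_entropy(rng.choices(forms, weights=weights, k=size))
        for _ in range(n_samples)
    ]
    # Low entropy means more repeated wordforms (homophony) than chance predicts.
    p_low = sum(s <= observed for s in null_stats) / n_samples
    return observed, p_low


# Toy stand-in for a phonotactic model: a uniform distribution over four forms.
forms = ["ba", "da", "ka", "ta"]
weights = [1, 1, 1, 1]
observed, p_low = null_test(["ba"] * 20, forms, weights)
print(observed, p_low)
```

In this toy run the observed lexicon repeats a single form, so almost no sampled lexicon is expected to reach an entropy that low.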

Analyse models

Finally, to analyse the models and print the results, run:

$ make analyse MONOMORPHEMIC=True LANGUAGE=<language>

Extra Information

Citation

If this code or the paper was useful to you, consider citing it:

@inproceedings{pimentel-etal-2021-homophony,
    title = "On Homophony and Rényi Entropy",
    author = "Pimentel, Tiago and
    Meister, Clara and
    Teufel, Simone and
    Cotterell, Ryan",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2109.13766",
}

Contact

To ask questions or report problems, please open an issue.
