hidden_folk_repository

This is the code used for the following paper:

Manex Agirrezabal, Sidsel Boldsen, Nora Hollenstein (2023), The Hidden Folk: Linguistic Properties encoded in Multilingual Contextual Character Representations, Proceedings of the Workshop on Computation and Written Language (CAWL 2023), pages 6–13, Association for Computational Linguistics

Utils

This script downloads a word list from the OPUS collection[1]. You need to specify the language code, e.g. da, fi, fo, en, and so on.

sh 01_get_freq_list.sh <LANGCODE>

This script gets the relevant pronunciation dictionaries from Wikipron. These dictionaries are obtained through the Wikipron Github repository[2].

sh 02_get_prons.sh

The next script uses m2m-aligner to align letters and phones from the Wikipron dictionary. We need to specify the pronunciation dictionaries name and the language code. The command shows how this is done for the Danish language.

sh 03_align_charsphones.sh dan_latn_narrow.tsv da

In the "ipa" directory, we have to run the following command, which will obtains the phonetic features from phones using the package Ipapy. The command shows how this is done for the Danish language.

python3 get_ipa_embeddings_bogstavlyd_ipapy.py da

In the "canine" directory, we will run the following command, which obtains the Canine embeddings of the letters in a list of words.

python3 get_canine_embeddings_general.py da

To conclude these two commands are used to run the experiments using a Train/test validation mechanism and also the Leave-One-Letter -Out (LOLO).

python3 run_experiment_lolo_extra.py da

python3 run_experiment.py da

[1] https://opus.nlpl.eu/ [2] https://github.com/CUNY-CL/wikipron/tree/master/data/scrape/tsv

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
canine		canine
ipa		ipa
utils		utils
README.md		README.md
char_freqs.csv		char_freqs.csv
char_frequency.py		char_frequency.py
generate_results_table-v2-lolo.py		generate_results_table-v2-lolo.py
generate_results_table-v2.py		generate_results_table-v2.py
plot_lolo_heatmaps.py		plot_lolo_heatmaps.py
plot_results.py		plot_results.py
run_experiment.py		run_experiment.py
run_experiment_lolo_extra.py		run_experiment_lolo_extra.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

canine

canine

ipa

ipa

utils

utils

README.md

README.md

char_freqs.csv

char_freqs.csv

char_frequency.py

char_frequency.py

generate_results_table-v2-lolo.py

generate_results_table-v2-lolo.py

generate_results_table-v2.py

generate_results_table-v2.py

plot_lolo_heatmaps.py

plot_lolo_heatmaps.py

plot_results.py

plot_results.py

run_experiment.py

run_experiment.py

run_experiment_lolo_extra.py

run_experiment_lolo_extra.py

Repository files navigation

hidden_folk_repository

Utils

About

Releases

Packages

Languages

manexagirrezabal/hidden_folk_repository

Folders and files

Latest commit

History

Repository files navigation

hidden_folk_repository

Utils

About

Resources

Stars

Watchers

Forks

Languages