Benchmark Sound Law LSTM

Part of a project that tries to automatically derive sound laws from a list of cognates.
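
A sound law is commonly written as a rewrite rule "A > B / L _ R", which is applied to every word of the ancestor language. The following minimal Python sketch shows what applying such laws means. It is not the project's own representation: the functions apply_sound_law and derive, the segment-list encoding, the "#" boundary marker, and the toy forms are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Segments = List[str]
# Hypothetical law encoding: (target, replacement, left context, right context).
Law = Tuple[str, str, Optional[str], Optional[str]]


def apply_sound_law(word: Segments, target: str, replacement: str,
                    left: Optional[str] = None,
                    right: Optional[str] = None) -> Segments:
    """Rewrite every `target` segment whose neighbours match the context.

    A context of None matches anything; "#" marks a word boundary.
    Contexts are read from the input word, so all sites change at once.
    """
    padded = ["#"] + list(word) + ["#"]
    out: Segments = []
    for i in range(1, len(padded) - 1):
        seg = padded[i]
        matched = (seg == target
                   and (left is None or padded[i - 1] == left)
                   and (right is None or padded[i + 1] == right))
        if not matched:
            out.append(seg)
        elif replacement:  # an empty replacement deletes the segment
            out.append(replacement)
    return out


def derive(word: Segments, laws: Iterable[Law]) -> Segments:
    """Apply an ordered sequence of sound laws to a proto-form."""
    for target, replacement, left, right in laws:
        word = apply_sound_law(word, target, replacement, left, right)
    return word
```

For example, derive(list("pata"), [("p", "f", "#", None), ("t", "d", "a", "a")]) gives ["f", "a", "d", "a"]. Within one law, every site changes at once. The laws themselves are applied in order, and that order matters when one law feeds or bleeds another.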

This project uses the ielex dataset as provided in Jäger et al. (2017), "Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists".

Prepare data

Obtain the NorthEuraLex dataset by running wget http://www.sfs.uni-tuebingen.de/~jdellert/northeuralex/0.9/northeuralex-0.9-forms.tsv.
Obtain the cognate set dataset and merge it with NorthEuraLex using the wikt_reader library. This produces a family file.
Prepare the input data by running

python scripts/process_data_wikt.py --data_path <path_to_family_file> --source <src> --targets <tgt_langs> --no_need_transcriber

For instance, for the Germanic language family, run

python scripts/process_data_wikt.py --data_path data/Germanic.tsv --source gem-pro --targets eng deu isl nor swe dan nld --no_need_transcriber
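
These preparation steps can also be scripted. The minimal Python sketch below downloads the NorthEuraLex forms file from the same URL as the wget step, then runs scripts/process_data_wikt.py with exactly the flags shown above. It makes three assumptions: it is run from the repository root, the family file was already produced with wikt_reader (whose interface is not covered here), and the download goes to the current directory as wget would. The helper names download_northeuralex, build_command, and prepare_family are hypothetical.

```python
import subprocess
import sys
import urllib.request
from pathlib import Path
from typing import List, Sequence

NEL_URL = ("http://www.sfs.uni-tuebingen.de/~jdellert/northeuralex/0.9/"
           "northeuralex-0.9-forms.tsv")


def download_northeuralex(dest: Path = Path("northeuralex-0.9-forms.tsv")) -> Path:
    """Fetch the NorthEuraLex 0.9 forms file unless it is already on disk."""
    if not dest.exists():
        urllib.request.urlretrieve(NEL_URL, str(dest))
    return dest


def build_command(data_path: str, source: str,
                  targets: Sequence[str]) -> List[str]:
    """Assemble the process_data_wikt.py invocation documented in this README.

    sys.executable stands in for `python`, so the current interpreter is used.
    """
    return [sys.executable, "scripts/process_data_wikt.py",
            "--data_path", data_path,
            "--source", source,
            "--targets", *targets,
            "--no_need_transcriber"]


def prepare_family(data_path: str, source: str, targets: Sequence[str]) -> None:
    """Run the preprocessing script; raises CalledProcessError if it fails."""
    subprocess.run(build_command(data_path, source, targets), check=True)
```

For example, prepare_family("data/Germanic.tsv", "gem-pro", ["eng", "deu", "isl", "nor", "swe", "dan", "nld"]) runs the same command as the Germanic example above. Passing the arguments as a list instead of a shell string avoids quoting problems with paths.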

Dependencies

Various Python packages, listed in requirements.txt. Run pip install -r requirements.txt.
The Boost libraries. On Ubuntu, run sudo apt-get install libboost-all-dev.
spdlog, installed as the static library version.
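
The following sketch is one way to check the Python dependencies before running anything. It is a hypothetical helper and not part of the repository. It reads package names from requirements.txt with a deliberately simple parser that ignores comments, pip options, URL lines, and version specifiers. It then asks importlib.metadata (Python 3.8+) whether each distribution is installed. Boost and spdlog are C++ libraries, so this check does not cover them.

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Iterable, List

# Leading distribution name of a requirement line, e.g. "numpy>=1.18" -> "numpy".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract distribution names, skipping blanks, comments, options, and URLs."""
    names: List[str] = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names for which no installed distribution is found."""
    missing: List[str] = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_requirements(path: str = "requirements.txt") -> List[str]:
    """List requirements from `path` that are not installed."""
    return missing_packages(requirement_names(Path(path).read_text().splitlines()))
```

An empty list from check_requirements() means every listed distribution was found. The helper only checks that each package is installed, not which version.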

Repository: j-luo93/ASLI
