Part of a project that tries to automatically derive sound laws from a list of cognates.
This project uses the ielex dataset as provided in Jäger et al. 2017, "Using support vector machines and state-of-the-art algorithms for phonetic alignment to ientify cognates in multi-lingual wordlists".
- Obtain NorthEuraLex dataset by running
wget http://www.sfs.uni-tuebingen.de/~jdellert/northeuralex/0.9/northeuralex-0.9-forms.tsv
. - Obtain cognate set dataset and merge it with NorthEuraLex by using
wikt_reader
library. You would get a family file. - Prepare input data by running
python scripts/process_data_wikt.py --data_path <path_to_family_file> --source <src> --targets <tgt_langs> --no_need_transcriber
For instance, for the Germanic language family, run
python scripts/process_data_wikt.py --data_path data/Germanic.tsv --source gem-pro --targets eng deu isl nor swe dan nld --no_need_transcriber
- various packages in
requirements.txt
. Runpip install -r requirements.txt
. boost
packages are needed. On Ubuntu, runsudo apt-get install libboost-all-dev
.- Install
spdlog
with the static lib version.