non-empty languages: 164, dedup+entry>10: 105
data/preprocessed/colex_pron_langs.csv
-
with (lat, long) geo information
data/preprocessed/colex_pron_geo.csv
-
deduplicated:
data/preprocessed/colex_pron_geo_dedup.csv
data/preprocessed/langs_nr.json
phonemes vs. concreteness
IPA
start with plosive/fricative (binary)
kiki/bouba effect, concreteness
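A minimal sketch of the initial-segment feature noted above, assuming each word form is available as an IPA string. The notes do not say how the binary variable is encoded, so this returns the class of the first segment and also derives one possible 0/1 indicator. The symbol inventories and the function names `initial_class` and `starts_with_plosive_or_fricative` are illustrative assumptions, not the project's actual code.

```python
import unicodedata

# Illustrative, non-exhaustive IPA inventories (assumption for this sketch).
PLOSIVES = set("pbtdʈɖcɟkɡgqɢʔ")
FRICATIVES = set("ɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦ")
# Stress marks, syllable breaks and spaces are skipped before the first segment.
SKIP = set("ˈˌ. ")


def initial_class(ipa: str) -> str:
    """Return 'plosive', 'fricative' or 'other' for the first segment.

    Affricates such as t͡ʃ count as plosive here because only the
    first symbol is inspected.
    """
    for ch in unicodedata.normalize("NFC", ipa):
        if ch in SKIP:
            continue
        if ch in PLOSIVES:
            return "plosive"
        if ch in FRICATIVES:
            return "fricative"
        return "other"
    return "other"  # empty string or only skipped marks


def starts_with_plosive_or_fricative(ipa: str) -> int:
    """One possible binary encoding: 1 if the form starts with an obstruent."""
    return int(initial_class(ipa) != "other")
```

If the intended contrast is plosive vs. fricative only, forms of class `other` would be filtered out before analysis rather than coded as 0.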
NEXT STEPS:
- COMPARE ALSO WITH Gast paper, phylogenetic relations (DataStageV)
- compare different sets of lexicons against different previous work
- make some nice visualizations of colexifications.
- entries: 945382
- lexicalizations: 129677, colexifications: 510198
- ignoring the lexical form per (colex, lang): 859296
- 6713 pairs of languages.
- concepts in total: 22348, attested in more than 1 language: 20176
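A minimal sketch of the "ignoring the lexical form per (colex, lang)" count above, assuming entries are rows with a colexification key, a language key and a form. The field names `colex`, `lang` and `form` are hypothetical; the real column names in the CSV files are not given in these notes.

```python
def dedup_by_colex_lang(rows):
    """Keep the first row for each (colex, lang) pair, ignoring the form."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["colex"], row["lang"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Toy rows, made up only to exercise the function.
toy_rows = [
    {"colex": "ARM~HAND", "lang": "aaa", "form": "x"},
    {"colex": "ARM~HAND", "lang": "aaa", "form": "y"},  # same pair, other form
    {"colex": "ARM~HAND", "lang": "bbb", "form": "z"},
]
unique_rows = dedup_by_colex_lang(toy_rows)
```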
data/phon/lang2lang_phon.csv
-
clear support for hypothesis: concepts closer in concreteness are more likely to colexify
- analyze it monolingually and crosslingually
- linear regression, coefficients.
-
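A minimal sketch of the regression step, assuming one predictor (concreteness distance between two concepts) and one response (how often the pair colexifies). This is plain ordinary least squares written by hand; the function name `ols_fit` and the toy numbers are illustrative assumptions, not results from the data.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy values, made up only to exercise the function.
concreteness_distance = [0.0, 1.0, 2.0, 3.0]
colex_rate = [0.9, 0.7, 0.5, 0.3]
slope, intercept = ols_fit(concreteness_distance, colex_rate)
```

Under the hypothesis above, the slope on the real data would be expected to be negative: larger concreteness distance, lower colexification rate.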
related work about conceptual background for colexification
- clear support for phonology/geo
- the farther apart the languages are, the lower their phonetic similarity
- colex doesn't have this clear-cut support
top-family, isocode, etc. https://glottolog.org/resourcemap.json?rsc=language