This folder contains (i) the code for and (ii) the appendix to the paper 'Indefinite pronouns optimize the simplicity/informativeness trade-off' by Milica Denić, Shane Steinert-Threlkeld and Jakub Szymanik.
The appendix file is appendix_negative_indefinites.pdf.
In the remainder, we describe the code for Experiments 1 and 2. Experiments 1-i and 2-i, Experiments 1-ii and 2-ii, and Experiments 1-iii and 2-iii work similarly, with the relevant files in their respective folders (note that the same languages are used across the four experimental settings).
Python and R scripts are in the folder src. The CSV files that the scripts read or generate are in the folder data.
-
- Description: It extracts the prior probability distribution over flavors from the annotated corpus of Beekhuizen et al.'s (2017) study and stores it in the file Beekhuizen_priors.csv.
- Dependencies: beekhuizen_full_set.csv
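
As an illustration of what extracting a prior over flavors can look like, the sketch below turns a list of flavor annotations into relative frequencies. It is a minimal sketch under assumptions, not the script shipped in src: the function name `flavor_priors` and the labels `flavor_a`, `flavor_b`, `flavor_c` are hypothetical, and the actual column layout of beekhuizen_full_set.csv is not reproduced here.

```python
from collections import Counter


def flavor_priors(annotations):
    """Return the relative frequency of each flavor label.

    `annotations` is assumed to be an iterable with one flavor label per
    annotated corpus item (hypothetical input format).
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no annotations given")
    # Normalize raw counts so that the values sum to 1.
    return {flavor: n / total for flavor, n in counts.items()}


# Toy input with placeholder labels, not real corpus data.
toy_annotations = ["flavor_a", "flavor_b", "flavor_b", "flavor_c"]
priors = flavor_priors(toy_annotations)
```

Estimating the prior as a normalized count is only one option; whether the original script applies smoothing or any filtering of corpus items is not stated in this README.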
-
- Description: It generates the minimum-length feature-based descriptions of all logically possible indefinite pronouns (in terms of which combination of flavors they can express) and stores them in minimum-desc-indef.csv file.
-
- Description: Definitions of a series of useful functions for Experiments 1 and 2.
- Dependencies: Beekhuizen_priors.csv
-
- Description: It imports the data file with Haspelmath's 40 natural languages and generates the 10 000 artificial languages used in Experiment 1. It stores the languages of Experiment 1 in languages_exp1.csv.
- Dependencies: Indefinites_functions.R, languages_real_40_updated.csv
-
- Description: It imports the data file with languages of Experiment 1, and performs overlap and coverage matching and stores the matched languages in Exp1_languages_matched001_timeout.csv.
- Dependencies: Indefinites_functions.R, languages_exp1.csv
-
Exp1_languages_cost_complexity.R
- Description: It computes communicative cost and complexity of languages of Experiment 1 matched for overlap and coverage and stores them into all_complexity_cost_exp1.csv.
- Dependencies: Indefinites_functions.R, languages_exp1.csv, Exp1_languages_matched001_timeout.csv, minimum-desc-indef.csv
-
- Description: It generates the 10 000 artificial languages used in Experiment 2 (5000 Haspel-ok and 5000 Not Haspel-ok languages). It stores the languages of Experiment 2 in languages_exp2.csv.
- Dependencies: Indefinites_functions.R
-
- Description: It imports the data file with languages of Experiment 2, and performs overlap and coverage matching and stores the matched languages in Exp2_languages_matched001_timeout.csv.
- Dependencies: Indefinites_functions.R, languages_exp2.csv
-
Exp2_languages_cost_complexity.R
- Description: It computes communicative cost and complexity of languages of Experiment 2 matched for overlap and coverage and stores them into all_complexity_cost_exp2.csv.
- Dependencies: Indefinites_functions.R, languages_exp2.csv, Exp2_languages_matched001_timeout.csv, minimum-desc-indef.csv
-
- Description: It runs an evolutionary algorithm selecting for Pareto optimal languages for 100 generations. It stores the complexity and communicative cost measures of the final generation in finalgencostcom.csv. Finally, it selects dominant languages in terms of complexity and communicative cost from the final generation and the languages used in Experiment 1 and stores them in pareto_dominant.csv.
- Dependencies: all_complexity_cost_exp1.csv, minimum-desc-indef.csv, allitems.csv (a file with all logically possible items), Beekhuizen_priors.csv
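
The final selection step described above can be sketched as a Pareto dominance filter. The sketch assumes that "dominant" means not dominated by any other language when both complexity and communicative cost are to be minimized; the function names and the toy values are hypothetical, and the evolutionary algorithm itself is not shown.

```python
def dominates(p, q):
    """True if p is at least as good as q on both measures and strictly
    better on at least one (lower is better for both)."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


def pareto_dominant(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy (complexity, communicative cost) pairs, not values from the paper.
languages = [(1.0, 5.0), (2.0, 3.0), (3.0, 3.5), (4.0, 1.0)]
front = pareto_dominant(languages)
```

Here (3.0, 3.5) is dropped because (2.0, 3.0) is better on both measures. The pairwise comparison is quadratic in the number of languages, which is usually acceptable for pools of a few tens of thousands of points but could be replaced by a sort-based sweep if needed.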
-
- Description: It imports the data file with dominant languages in terms of complexity and communicative cost and, based on them, estimates the Pareto frontier for indefinite pronouns. It plots the languages of Experiment 1 and Experiment 2 with respect to the frontier. It computes the minimum Euclidean distances of the languages of Experiments 1 and 2 from the Pareto frontier and stores them in natural_distances_pareto.csv, artificial_distances_pareto.csv, Haspok_distances_pareto.csv and Haspnotok_distances_pareto.csv. Finally, it establishes that (i) natural languages are closer to the frontier than artificial languages; and (ii) languages which satisfy Haspelmath's universals are closer to the frontier than languages which do not satisfy them.
- Dependencies: Indefinites_functions.R, all_complexity_cost_exp1.csv, all_complexity_cost_exp2.csv, pareto_dominant.csv
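
The distance computation described above can be illustrated as follows. This is a minimal sketch that assumes the estimated frontier is available as a finite list of (complexity, communicative cost) points; how the script actually estimates the frontier, and whether the two measures are rescaled first, is not specified in this README. The function name and the toy values are hypothetical.

```python
import math


def min_distance_to_frontier(point, frontier):
    """Smallest Euclidean distance from `point` to any frontier point."""
    px, py = point
    return min(math.hypot(px - fx, py - fy) for fx, fy in frontier)


# Toy frontier and language, not values from the paper.
frontier = [(0.0, 9.0), (1.0, 2.0), (6.0, 0.0)]
language = (4.0, 6.0)
d = min_distance_to_frontier(language, frontier)
```

With a frontier represented by sampled points, the result is only as precise as the sampling is dense; a more finely interpolated frontier gives a closer approximation to the true minimum distance.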