This repository accompanies the following paper:
- Iris van de Pol, Shane Steinert-Threlkeld, and Jakub Szymanik, "Complexity and learnability in the explanation of semantic universals of quantifiers", Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019).
This project explores the complexity of quantifiers in the explanation of semantic universals, and compares these to earlier results on learnability in the explanation of semantic universals (see Steinert-Threlkeld and Szymanik (in press)). In particular, we compute the Lempel-Ziv (LZ) complexity (as an approximation to Kolmogorov complexity) of several quantifiers, some satisfying proposed universals (monotonicity, quantity, and conservativity) and some not.
Our results indicate that the monotonicity universal can be explained by complexity while the conservativity universal cannot. For quantity we did not find a robust result. We also found that learnability and complexity pattern together in the monotonicity and conservativity cases that we consider, while that pattern is less robust in the quantity cases
This repository contains all the code needed to replicate the data reported in that paper. It also contains the data and figures that are reported therein.
If you have any questions and/or want to extend the code-base and/or generate your own quantifier complexity data, feel free to get in touch!
Python 3, lempel_ziv_complexity, Pandas, plotnine, scipy, statsmodels
The core method for generating the data is LZ_dist_quantifier_cumulative in gen_data.py.
The LZ_dist_quantifier_cumulative method accepts the following parameters:
quantifiers: a list of quantifier objectsmax_model_size: an integer, determining the size (i.e., number of elements) of the quantifier modelsnum_model_orders: an integer, determing the number of different model-order permutationsnum_chars: an integer, determining the number of characters used in the quantifier representation, (the number of subareas—--AB, AnotB, BnotA, and neither---of the quantifier model that are considered) needs to be either 2, 3, or 4num_procs: an integer, determining the number of processors used for parallel computingout_file: a string, determining the name of the csv file in which the results are storedorder_type: either 'lex' (for lexicographic model orders) or 'rand' (for random model orders)
Example of a run from the command-line:
python gen_data.py --num_procs 4 --max_model_size 10 --jobtype cumulative --num_chars 4 --quantifiers all_quantifiers --num_model_orders 12 --order_type lex --out_file output.csv
LZ_dist_quantifier_cumulative stores the results in an .csv file.
Note that in addition to the LZ complexity of quantifiers, this code also computes the complexity based on two other compression methods, gzip (based on LZ compression), and bzip2 (a block sorting compressor). Plots showing LZ and and bzip2 complexities can be found in this repository.
The plots in the data report average complexity with confidence interval zones and were created with method make_summary_complexity_per_size_plot in analysis.py.
The make_summary_complexity_per_size_plot method accepts the following parameters:
data: a pandas data frameorder: an integer, the name of the model order of the quantifier representationquants: a list of quantifier namesout_file: a string, name for a csv file (including .csv)add_measure: True or False, set to True when using facet_wrap
The make_summary_complexity_per_size_plot method is called by the plot method. The plots are stored as .png files.
All of the quantifiers are defined in quantifiers.py. The core is a verification method, named models_quantifier_name(model). This takes in a sequence of elements from (0,1,2,3) and outputs True or False. This method implements the core semantics of a quantifier.
The verification method is then wrapped inside a Quantifier object, which provides a name for the quantifier and assigns it certain semantic properties. The Quantifier object calls the models_ function in its __call__ method, so that you can treat it as a method.
- Iris van de Pol
- Shane Steinert-Threlkeld
- Jakub Szymanik