pic-analysis

Analyses for "Word forms - not just their lengths - are optimized for efficient communication"

To download, clean, and compute the frequency and in-context surprisal estimates yourself, see smeylan/ngrawk. To download precomputed estimates instead, already arranged in the appropriate directory structure:

wget cocosci.berkeley.edu/smeylan/pic/results.zip && unzip results.zip
wget cocosci.berkeley.edu/smeylan/pic/token_results.zip && unzip token_results.zip

The analysis requires other data sources including the Clearpond database, dates of first use from the Oxford English Dictionary, word lists from OPUS, and a list of plurals. To download these:

wget cocosci.berkeley.edu/smeylan/pic/data.zip && unzip data.zip
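
The three downloads above can be combined into one helper that stops at the first failure. This is a minimal sketch, not part of the repository: the function name `fetch_pic_archives` and the variable `PIC_BASE_URL` are hypothetical, while the URL and archive names are the ones used in the commands above. It assumes `wget` and `unzip` are installed and that it is run from the directory where the archives should be unpacked.

```shell
# Base URL copied from the wget commands above.
PIC_BASE_URL="cocosci.berkeley.edu/smeylan/pic"

# Hypothetical helper: download and unpack each archive in turn.
fetch_pic_archives() {
    for archive in results.zip token_results.zip data.zip; do
        # Return early if a download or an extraction fails.
        wget "$PIC_BASE_URL/$archive" || return 1
        unzip "$archive" || return 1
    done
}

# Usage: fetch_pic_archives
```

Returning on the first error avoids running the notebooks against a partially populated directory tree.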

To limit the analysis to morphologically simple words, you will need a copy of the CELEX2 corpus. After decompressing it, add a symlink to the data/ directory with ln -s.
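
One way to create the symlink is sketched below. The README does not state the link name the analysis expects, so `celex2` is an assumption, as is reading the instruction as "place the link inside data/"; check the notebooks for the path they actually read. The helper name `link_celex` is hypothetical.

```shell
# Hypothetical helper: link a decompressed CELEX2 directory into data/.
# Arguments: $1 = path to the decompressed corpus,
#            $2 = link name inside data/ (assumed default: celex2).
link_celex() {
    celex_src=$1
    link_name=${2:-celex2}
    if [ ! -d "$celex_src" ]; then
        echo "not a directory: $celex_src" >&2
        return 1
    fi
    mkdir -p data
    # Use an absolute target so the link resolves from inside data/.
    ln -s "$(cd "$celex_src" && pwd)" "data/$link_name"
}

# Usage (run from the repository root): link_celex /path/to/celex2
```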

To check against the Piantadosi et al. (2011) results, download and unzip the publicly available data from the Colala website and place them in data/Google1T_Piantadosi. To filter the words, you will also need to request the OPUS wordlists from the Colala lab and place them in data/OPUS_Piantadosi.
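
Before running the Piantadosi et al. (2011) comparison, it may help to confirm that both directories exist and are not empty. The check below is a sketch under that assumption only: the function name `check_piantadosi_data` is hypothetical, and it does not verify which files the analysis expects inside each directory.

```shell
# Hypothetical helper: report any expected directory that is missing or empty.
# Returns 0 if both directories are present and non-empty, 1 otherwise.
check_piantadosi_data() {
    missing=0
    for dir in data/Google1T_Piantadosi data/OPUS_Piantadosi; do
        if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
            echo "missing or empty: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (run from the repository root): check_piantadosi_data
```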

Folders and files:

Clearpond Neighbors Per Word Length.ipynb
Cross-linguistic PIC (1T, Books 2012, OPUS, BNC).ipynb
PIC by Century of First Appearance.ipynb
Piantadosi et al. (2011) Revisited.ipynb
README.md
piantadosi_analysis.R
ss_analysis.R
