Roland Schäfer¹ & Elizabeth Pankratz² (accepted with minor revisions on 3 June 2018 by Morphology [JOMO])
¹Freie Universität Berlin, ²Humboldt-Universität zu Berlin
- Paper website http://rolandschaefer.net/?p=1532
- Data storage https://github.com/rsling/linkingelements
- Contact email mail@rolandschaefer.net
This is the README file for the data package and the paper's LaTeX/knitr sources.
- To replicate the full analysis, you first need some large data files (they are not included if you checked this repository out from GitHub). To download them, change into this folder and run the following in Bash (macOS or GNU/Linux, or hypothetically Windows with Bash and POSIX tools):
Data/get_bigfiles.sh
- To see how the count files and other databases were generated, consult the following Bash scripts:
Data/Database/make_counts.sh
Data/Database/make_data.sh
Data/Database/make_real_blacklists.sh
This is usually not required, however. The generated files are included in the distribution, and it takes a long time to regenerate them.
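Should you want to regenerate the database files anyway, a run might look like the following sketch. The helper function `regenerate_db` is hypothetical (not part of the repository), and the ordering counts → data → blacklists is an assumption based on the file names:

```shell
# Hypothetical regeneration sketch; run from the repository root.
# The order of the three scripts is an assumption.
regenerate_db() {
  local dir="$1"
  local script
  for script in make_counts.sh make_data.sh make_real_blacklists.sh; do
    if [ -x "$dir/$script" ]; then
      # Run each generator in its own subshell inside the database folder.
      ( cd "$dir" && bash "$script" )
    else
      echo "skipping $script (not found under $dir)"
    fi
  done
}

regenerate_db Data/Database
```

Expect this to run for a long time on the real data.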
- To see how we generated the data for the corpus study and created the stimuli, see the following file:
Data/R/corpus.data.R
- To check how we analysed the (manually annotated) corpus data in R, see this file:
Data/R/corpus.analyse.R
A large number of original query results (also downloaded by Data/get_bigfiles.sh, see the download step above) can be found in
Data/Corpusstudy/Queries/Output/
- To check how we analysed the split-100 experiment in R, see this file:
Data/R/split100.analyse.R
- The data from the experiment (including the PsychoPy files), which are analysed by the script mentioned in the previous step, are located in
Data/Split100/
- To see how we integrated the data into the main paper, see the knitr sources and the accompanying Makefile:
Paper/leglossa.Rnw
Paper/Makefile
Note that it was not feasible to include the whole data generation and analysis process in the knitr file; compiling the document would take hours.
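Putting the steps above together, an end-to-end replication from the repository root might look like the following sketch. The helper `run_step` is hypothetical, and the bare `Rscript` and `make` invocations are assumptions (the README does not spell out exact flags or targets):

```shell
# Hypothetical end-to-end replication sketch; run from the repository root.
run_step() {
  # Run a command only if its entry point exists; report otherwise.
  local entry="$1"; shift
  if [ -e "$entry" ]; then
    "$@"
  else
    echo "missing: $entry (did you run Data/get_bigfiles.sh?)"
  fi
}

run_step Data/get_bigfiles.sh   bash Data/get_bigfiles.sh
run_step Data/R/corpus.analyse.R   Rscript Data/R/corpus.analyse.R
run_step Data/R/split100.analyse.R Rscript Data/R/split100.analyse.R
run_step Paper/Makefile            make -C Paper
```

Since the generated count files and databases ship with the distribution, the database regeneration scripts are deliberately left out of this pipeline.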