The scripts in this repository contain the analyses used in our paper and can be used to reproduce our findings.
- clean_raw_dataCL.py
- sentiment_analyse_clean_dataCL.py
- syntax_analyse_clean_dataCL.py
- feature_extract_clean_dataCL.py
The next three scripts combine these analyses into a single file for straightforward plotting and statistics in R:
- combine_BIG4_csvs.r
- combine_features_sentiment_syntax.r
- extract_final_features.r
The remaining R scripts run the lme4 regressions and plot the figures:
- run_lme4_lin_regressions.r
- run_lme4_log_regressions.r
- plot_figure_1.r
- plot_figure_2.r
- plot_figure_3_a.r
- plot_figure_3_b.r
- plot_figure_4.r
- plot_fig_5.r
The datasets used in the analyses can be obtained as follows:
- NOW corpus: the license we purchased cannot be shared, but researchers can purchase their own license at https://www.english-corpora.org/now/
- ABC News Australia: this dataset is freely available for download on Kaggle at https://www.kaggle.com/datasets/therohk/million-headlines/data
- The Times of India: this dataset is freely available for download on the Harvard Dataverse at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DPQMQH
- The Guardian and The New York Times: headlines for these outlets can be collected through the official APIs for The Guardian (https://open-platform.theguardian.com/documentation/) and The New York Times (https://developer.nytimes.com/apis).
- Upworthy headlines: The exploratory package can be downloaded from OSF at https://osf.io/3vqmp
- arXiv: The arXiv dataset can be downloaded from Hugging Face at https://huggingface.co/datasets/arxiv_dataset
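
For the API-based sources listed above, the following is a minimal sketch of how headlines might be collected from The Guardian's Content API. It is not the collection script used for the paper. The function names, date range, and page size are illustrative assumptions, and you need your own API key from The Guardian's Open Platform.

```python
import json
import urllib.parse
import urllib.request

# Search endpoint of The Guardian's Content API.
GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search"


def build_search_url(api_key, from_date, to_date, page=1, page_size=50):
    """Build the URL for one page of search results (hypothetical helper)."""
    params = {
        "api-key": api_key,
        "from-date": from_date,  # ISO date, e.g. "2020-01-01"
        "to-date": to_date,
        "page": page,
        "page-size": page_size,
    }
    return GUARDIAN_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def extract_headlines(payload):
    """Return (publication date, headline) pairs from a decoded response."""
    results = payload.get("response", {}).get("results", [])
    return [(r.get("webPublicationDate"), r.get("webTitle")) for r in results]


def fetch_headlines(api_key, from_date, to_date, page=1):
    """Download one page of results and return its headlines."""
    url = build_search_url(api_key, from_date, to_date, page=page)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_headlines(json.load(resp))
```

Calling `fetch_headlines("YOUR_API_KEY", "2020-01-01", "2020-01-31")` would return the first page of results for that month. Collecting a full archive means looping over pages and staying within the rate limits of your API key. The New York Times APIs use different endpoints and response fields, so they need their own parsing.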