mkrzyzanska/TwitterArchaeology

TwitterArchaeology

Code to accompany the article: Positive sentiment and expertise predict the diffusion of archaeological content on social media.

Introduction:

This repository contains the code used for data collection, data management and the analysis undertaken for the article 'Positive sentiment and expertise predict the diffusion of archaeological content on social media'. The code was run in R, Python and the MongoDB shell.

Data collection:

Historical Twitter data was extracted via the Academic API using the full-archive search endpoint. We extracted tweets containing the hashtag #archaeology for the period from 01.02.2023 to 08.02.2023, inclusive. The data was collected using the academictwitteR library. As part of the setup, a Twitter App was created within the approved academic project and the bearer token was set as an R environment variable. The collected data included both tweets and additional files with user data. The code is available here.

Data processing:

Subsequently, the data was imported into MongoDB and cleaned. This involved checking for duplicates, separating replies from original tweets and setting indexes on text fields (see here).
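
The cleaning steps can also be illustrated outside the database. The following is a minimal Python sketch, not the repository's MongoDB code. It assumes each tweet is a dict with an `id` field and that replies carry a `referenced_tweets` entry of type `replied_to` (the shape used by Twitter API v2 tweet objects). The function names are hypothetical.

```python
def drop_duplicates(tweets):
    """Keep the first occurrence of each tweet id, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            unique.append(tweet)
    return unique


def is_reply(tweet):
    """Assumed convention: a reply references another tweet as 'replied_to'."""
    refs = tweet.get("referenced_tweets") or []
    return any(ref.get("type") == "replied_to" for ref in refs)


def split_replies(tweets):
    """Return (originals, replies) as two lists."""
    originals = [t for t in tweets if not is_reply(t)]
    replies = [t for t in tweets if is_reply(t)]
    return originals, replies


# Illustrative records only; not taken from the collected data.
sample = [
    {"id": "1", "text": "New dig season #archaeology"},
    {"id": "2", "text": "Congratulations!",
     "referenced_tweets": [{"type": "replied_to", "id": "1"}]},
    {"id": "1", "text": "New dig season #archaeology"},
]
originals, replies = split_replies(drop_duplicates(sample))
```

In this sketch, retweets and quote tweets stay with the originals because only the `replied_to` reference type is treated as a reply. Whether the repository makes the same choice is not stated here.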

Analysis:

Sentiment analysis:

Sentiment scores were calculated using the Valence Aware Dictionary and sEntiment Reasoner (VADER) and assigned to each tweet in the database (see here).

Threat:

Threat words were identified based on the dictionary provided in Table S2 of the supplementary material to Choi et al. 2022. The dictionary was imported into MongoDB and used to identify the threat level in tweets.
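
As a rough illustration of dictionary-based scoring, the sketch below counts how many tokens of a tweet appear in a threat word list. It rests on several assumptions: the word list is a placeholder (the actual entries are in Table S2 of Choi et al. 2022), the tokenisation is a simple regular expression, and the repository's own scoring may differ.

```python
import re

# Placeholder entries for illustration; NOT the Choi et al. 2022 dictionary.
PLACEHOLDER_THREAT_WORDS = {"danger", "destroy", "loot"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def threat_count(text, dictionary=PLACEHOLDER_THREAT_WORDS):
    """Count tokens that exactly match a dictionary entry."""
    return sum(1 for token in tokenize(text) if token in dictionary)


example = "Looters destroy sites and put heritage in danger"
score = threat_count(example)
```

Exact matching, as used here, misses inflected forms such as "looters". A real pipeline would need to decide whether to stem tokens or match on prefixes.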

Topic models:

LDA topic models were run on the tweet text with the number of topics (n) between 2 and 29 (see here), and coherence scores were calculated for each n. The model with the highest coherence score was selected and an interactive visualisation was constructed to help with the analysis of the topics. Subsequently, the topic probabilities were assigned to the tweets in the database. The same procedure was repeated for user categories, with the visualisation available here.
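
The selection step can be sketched independently of any topic-modelling library. In the hypothetical helper below, `score_fn` stands in for "fit an LDA model with n topics and return its coherence score". It is an assumed callable, not a function from the repository.

```python
def select_topic_number(score_fn, n_min=2, n_max=29):
    """Score every n in [n_min, n_max] and return (best_n, scores)."""
    scores = {n: score_fn(n) for n in range(n_min, n_max + 1)}
    best_n = max(scores, key=scores.get)
    return best_n, scores


# Toy scoring function used only to exercise the helper; it peaks at n = 7.
def toy_coherence(n):
    return -abs(n - 7)


best_n, scores = select_topic_number(toy_coherence)
```

If several values of n tie for the highest score, this helper returns the smallest of them, since `max` keeps the first maximum it encounters.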

Bayesian model:

The zero-inflated model, which includes threat, sentiment, tweet topics and user categories as variables, was constructed in Stan and run on the collected data. Subsequently, the model results were analysed and plotted here, and the model itself is available here. Furthermore, a model that includes additional variables, together with its associated files, is available in the extra_variables folder.
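
For readers unfamiliar with zero-inflated count models, the sketch below evaluates the log-probability of a single count under a zero-inflated Poisson distribution. The Poisson count component is an assumption made for illustration: the text does not state which count distribution the Stan model uses. The parameter names `pi_zero` and `rate` are hypothetical.

```python
import math


def zip_log_prob(y, pi_zero, rate):
    """Log-probability of count y under a zero-inflated Poisson.

    pi_zero: probability of a structural zero (0 < pi_zero < 1).
    rate: mean of the Poisson component (rate > 0).
    """
    poisson_log_prob = y * math.log(rate) - rate - math.lgamma(y + 1)
    if y == 0:
        # A zero comes either from the inflation part or from the Poisson part.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(-rate))
    return math.log(1.0 - pi_zero) + poisson_log_prob


# Probabilities over a wide range of counts should sum to approximately 1.
total = sum(math.exp(zip_log_prob(y, pi_zero=0.3, rate=2.0)) for y in range(100))
```

The mixture form is what lets such a model produce more zeros than a plain count distribution would predict.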
