
Incivility classifier used in Theocharis et al (2020, Sage Open)


Replication materials: Theocharis et al, 2020, Sage Open

Replication materials for "The Dynamics of Political Incivility on Twitter", by Yannis Theocharis, Pablo Barberá, Zoltán Fazekas, and Sebastian Adrian Popa, published in Sage Open.

Abstract: Online incivility and harassment in political communication have become an important topic of concern among politicians, journalists, and academics. This study provides a descriptive account of uncivil interactions between citizens and politicians on Twitter. We develop a conceptual framework for understanding the dynamics of incivility at three distinct levels: macro (temporal), meso (contextual), and micro (individual). Using longitudinal data from Twitter communication mentioning Members of Congress in the United States across a time span of over a year, and relying on supervised machine learning methods and topic models, we offer new insights about the prevalence and dynamics of incivility toward legislators. We find that uncivil tweets consistently represent around 18% of all tweets mentioning legislators, but with spikes that correspond to controversial policy debates and political events. Although we find evidence of coordinated attacks, our analysis reveals that the use of uncivil language is common to a large number of users.

This README file provides an overview of the materials we are releasing in addition to the article:

  • code/01-creating-synthetic-labels.R contains the code we used to create the synthetic labels to expand the training dataset. We used Google's Perspective API to create high-quality features that allowed us to expand our labeled set of tweets at a low cost. See the article for more details.
  • code/02-classifier.R contains the code to train the incivility classifier we use in the paper.
  • code/03-predict.R contains examples showing how to predict incivility on new, unseen tweets.
  • data is a folder with the training dataset, document-feature matrix, and classifier objects.
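Beyond the object name `lasso`, this README does not describe the model. The sketch below assumes an L1-penalized (lasso) logistic regression fitted with the glmnet package on a quanteda document-feature matrix. It is not the code in code/02-classifier.R: the toy tweets and labels are invented for illustration, and the choice of the smallest penalty on the path is arbitrary (with real data, `cv.glmnet()` would be the usual way to pick it).

```r
library(quanteda)
library(glmnet)

# Invented toy training data (illustration only): 1 = uncivil, 0 = civil
texts <- c("you are a traitor", "shut up idiot", "what a loser",
           "spineless coward", "you are a moron", "pathetic liar",
           "disgusting hypocrite", "shut up loser",
           "thank you for your service", "I respect your opinion",
           "great speech today", "thanks for the update",
           "looking forward to the hearing", "good work on this bill",
           "I appreciate your leadership", "thank you senator")
labels <- rep(c(1, 0), each = 8)

# Bag-of-words document-feature matrix (dfm() lowercases by default)
train_dfm <- dfm(tokens(texts))
x <- as(train_dfm, "dgCMatrix")

# alpha = 1 selects the lasso (L1) penalty; family = "binomial" gives logistic regression
lasso_fit <- glmnet(x = x, y = labels, family = "binomial", alpha = 1)

# In-sample predicted probabilities at the smallest penalty on the fitted path
probs <- predict(lasso_fit, newx = x, s = min(lasso_fit$lambda), type = "response")
```

In a workflow like the one described here, the training DFM and the fitted model would presumably correspond to the `dfm` and `lasso` objects passed to `predict_incivility()`.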

How to use our classifier

The code we provide allows any researcher to apply our incivility classifier to new tweets (English only) without having to re-train the classifier.

First, load the quanteda package (which we use for preprocessing the text), the classifier functions available in functions.r and the DFM/classifier objects.
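The README does not give the exact file names of the DFM and classifier objects, so the setup sketch below does not guess them. It assumes you run R from the repository root, that functions.r sits there, and that the objects are stored as .rdata/.RData files in data/; all three are assumptions to check against the repository.

```r
library(quanteda)  # text preprocessing used by the classifier functions

# Assumption: working directory is the repository root, where functions.r lives
if (file.exists("functions.r")) source("functions.r")  # should define predict_incivility()

# Assumption: the DFM and classifier objects are saved as .rdata/.RData files in data/;
# list.files() returns character(0) if the folder is missing
rdata_files <- list.files("data", pattern = "\\.[Rr][Dd]ata$", full.names = TRUE)

# load() each file into the workspace (objects such as `dfm` and `lasso`)
for (f in rdata_files) load(f)
```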


Here’s how to compute the probability that a single tweet is uncivil, according to the definition we use in the paper.

# predicting a single tweet
tweet <- "politicians are morons"
predict_incivility(text=tweet, old_dfm = dfm, classifier = lasso)
## [1] 0.9025192
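The `old_dfm` argument suggests that a new tweet is mapped onto the vocabulary of the training DFM before scoring, since a lasso model can only use features it saw during training. We have not checked how functions.r does this; the sketch below shows one way to perform the alignment with quanteda's `dfm_match()`, using a small hypothetical stand-in for the training DFM.

```r
library(quanteda)

# Hypothetical stand-in for the training DFM passed as `old_dfm`
old_dfm <- dfm(tokens(c("you are a traitor", "I respect your opinion")))

# A new tweet containing words ("politicians", "morons") absent from the training vocabulary
new_dfm <- dfm(tokens("politicians are morons"))

# Align to the training features: unseen words are dropped, and training
# features missing from the tweet are added as zero-count columns
aligned <- dfm_match(new_dfm, features = featnames(old_dfm))
```

After alignment, the new tweet has exactly the same columns, in the same order, as the training matrix, so it can be passed to the fitted model's `predict()` method. Words never seen in training (here "morons") contribute nothing to the prediction.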

And here’s how to do the same, but for multiple tweets.

# predicting multiple tweets
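df <- data.frame(
  text = c( # no incivility
            "I respect your opinion", "you are an example of leadership",
            # some incivility
            "oh shut up", "you are a traitor",
            # very uncivil
            "what an asshole and a loser", "spineless piece of shit"),
  stringsAsFactors = FALSE)  # keep text as character (default only since R 4.0)
# predict_incivility() accepts a character vector and returns one probability per tweet
predict_incivility(text = df$text,
                   old_dfm = dfm,
                   classifier = lasso)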
## [1] 0.2181500 0.1727687 0.6525642 0.6325972 0.9681808 0.9774939
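To summarize a batch of tweets as a share of uncivil messages (as in the roughly 18% figure reported in the abstract), the probabilities must be dichotomized or averaged. This README does not state the cutoff used in the paper, so the 0.5 threshold below is purely an illustrative assumption; the probabilities are the six values printed above.

```r
# Probabilities returned by predict_incivility() for the six example tweets
probs <- c(0.2181500, 0.1727687, 0.6525642, 0.6325972, 0.9681808, 0.9774939)

# Illustrative cutoff only; not a value taken from the paper
cutoff <- 0.5
uncivil <- probs > cutoff

# Share of tweets classified as uncivil (4 of the 6 examples exceed the cutoff)
share_uncivil <- mean(uncivil)
```

With this cutoff, the two "some incivility" examples fall on the uncivil side, so results for borderline tweets depend heavily on the chosen threshold.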

