Topic Analysis, Constructiveness and Toxicity for online articles and comments

Project Overview

We set out to investigate trends in the toxicity, constructiveness and topics of online comments. Comments are interesting on their own, but they are easier to interpret in context, so we also use information about the articles on which they appeared.

To find these trends, we use various machine-learning-based systems and approaches. For constructiveness, we use CHECK::REFERENCE, a constructiveness system developed at Simon Fraser University by Dr Maite Taboada and Dr Varada Kolhatkar. For toxicity, we use Google's Perspective API. For topic modelling, we settled on creating a Latent Dirichlet Allocation (LDA) model, the details of which are discussed in subsequent sections.
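As a rough illustration of the toxicity step, the sketch below builds a request body for Perspective's `comments:analyze` method and reads the summary `TOXICITY` score out of a response. It is not the code in `src`: the helper names `build_toxicity_request` and `extract_toxicity_score` and the sample response are hypothetical, and the HTTP call itself (which needs an API key) is left out.

```python
import json

# Endpoint for Perspective's comments:analyze method; an API key is
# passed as the `key` query parameter when the request is actually sent.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_toxicity_request(comment_text, language="en"):
    """Return the JSON body for a TOXICITY analysis request (hypothetical helper)."""
    return {
        "comment": {"text": comment_text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity_score(response_body):
    """Return the summary TOXICITY probability (0 to 1) from a parsed response."""
    return response_body["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


# Made-up response with the same shape as the API's output; not project data.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}}
    }
}

payload = json.dumps(build_toxicity_request("Thanks, this was a helpful article."))
score = extract_toxicity_score(sample_response)
```

Here `payload` is the string that would be POSTed to `ANALYZE_URL`, and `score` is the value a pipeline like this one would store alongside each comment.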

The results of this project will provide insights to online news publications that monitor the comment sections on their websites.

Key findings

  1. Non-constructive and non-toxic comments make up the majority of online comments
  2. Non-constructive comments are more common than constructive comments
  3. Constructive comments almost always contain a small proportion of toxicity
  4. The most common topics discussed in articles in The Globe and Mail are related to politics (global, national and regional)
  5. The proportions of the topics discussed in comments correlate directly with those of articles
  6. People comment more about politics than other topics, but they bring in personal experience and anecdotes when they do so
  7. There is a higher degree of constructiveness in the comments relating to topics about which more articles are written
  8. Toxicity in comments is not higher in certain topics over others; it is likely a fixed feature of online language
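
Findings 1 to 3 come down to cross-tabulating constructiveness against toxicity. A minimal sketch of that kind of tabulation follows, assuming each comment carries a constructiveness score and a toxicity score between 0 and 1. The 0.5 threshold, the `label` helper and the five scores are illustrative assumptions, not the project's settings or data.

```python
from collections import Counter

# Hypothetical (constructiveness, toxicity) scores in [0, 1]; not project data.
scored_comments = [
    (0.91, 0.05),
    (0.12, 0.03),
    (0.20, 0.81),
    (0.75, 0.64),
    (0.08, 0.10),
]

THRESHOLD = 0.5  # assumed cut-off applied to both scores


def label(constructiveness, toxicity, threshold=THRESHOLD):
    """Map a pair of scores to a (constructiveness, toxicity) category pair."""
    c = "constructive" if constructiveness >= threshold else "non-constructive"
    t = "toxic" if toxicity >= threshold else "non-toxic"
    return c, t


# Count comments per category pair, then convert counts to proportions.
counts = Counter(label(c, t) for c, t in scored_comments)
total = sum(counts.values())
proportions = {pair: n / total for pair, n in counts.items()}
```

Grouping the same counts by article topic would give the per-topic comparisons behind findings 7 and 8.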

Folder Structure

  • doc: Contains documentation on the methods and the results
  • src: Contains all the code for this project - preprocessing, getting constructiveness and toxicity predictions, topic modelling, and visualizations
  • img: Contains images, some generated by code in this project, others generated using external websites


Vasundhara Gautam (

Maite Taboada (
