This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight.

imranansari/russian-troll-analysis

 
 

Time Series Analysis of Russian IRA Tweets

This repository contains a recipe for bootstrapping a project that performs time series analysis on tweets from the Internet Research Agency (IRA), a dataset open-sourced by FiveThirtyEight. The analysis runs on Apache Pinot and Apache Superset.

Warning!

This dataset contains some of the most offensive and toxic text I've ever seen. The tweets contained within the original dataset attempted to hide or obscure the ideological nature of text that the trolls intended to bleed into mainstream media.

The raw text of tweets contained within the dataset will elicit an emotional response, as it was designed to do, and as such, I do not recommend exposing the raw text to any reader without providing this warning.

Usage

The example application in this repository bootstraps an Apache Pinot recipe for importing tweets by fake IRA Twitter accounts for analysis with Apache Superset.

To start the cluster, run the following commands.

$ docker network create PinotNetwork
$ docker-compose up -d
$ docker-compose logs -f --tail=100
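Rather than watching the logs by hand, you can poll until a service responds before moving on to the bootstrap step. The following is a minimal sketch, not part of this repository: `wait_for` is a hypothetical helper, and the example URL assumes the Pinot controller is published on `localhost:9000` (Pinot's default controller port), which this project's `docker-compose.yml` may map differently.

```shell
# wait_for ATTEMPTS CMD...: retry CMD until it succeeds or ATTEMPTS run out.
# Hypothetical helper for illustration; not part of this repository.
# Set WAIT_INTERVAL (seconds, default 5) to change the pause between tries.
wait_for() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command quietly; success ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "${WAIT_INTERVAL:-5}"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes the controller answers on localhost:9000):
#   wait_for 30 curl -sf http://localhost:9000/health && sh ./bootstrap.sh
```

The function returns a non-zero status when every attempt fails, so it can gate the bootstrap command with `&&`.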

Once the Docker containers are running, bootstrap the cluster with the Twitter data and charts. The following command downloads the raw CSV data from this repository and starts the Pinot ingestion job.

$ sh ./bootstrap.sh

After the bootstrap script has completed, you should be able to see data in Apache Pinot and log in to the Superset website. After logging in to Superset, navigate to the dashboards to view the time series analysis of the IRA tweets.
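One way to confirm that the ingestion job loaded data is to send a row-count query to the Pinot broker's SQL endpoint. This is a sketch under stated assumptions: the table name `tweets` is hypothetical (check the Pinot UI for the table this project actually creates), `pinot_count_payload` is an illustrative helper, and the broker is assumed to be reachable on `localhost:8099`, Pinot's default broker port.

```shell
# Build the JSON request body for a row-count query against a Pinot table.
# Hypothetical helper for illustration; not part of this repository.
pinot_count_payload() {
  printf '{"sql":"SELECT COUNT(*) FROM %s"}' "$1"
}

# Example (assumes a broker on localhost:8099 and a table named "tweets"):
#   curl -s -X POST -H 'Content-Type: application/json' \
#     -d "$(pinot_count_payload tweets)" http://localhost:8099/query/sql
```

A non-zero count in the JSON response suggests the ingestion job finished; an error response usually means the table name or port differs from the assumptions above.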

Example Dashboard

The screenshot below is the default dashboard that comes with the example project.

[Screenshot: Superset Russian Troll Dashboard]
