soliblue/Reddit-Politics: Code for a large-scale analysis of political subcommunities on Reddit, including content, relationships with other subreddits, and distribution of attention. Based on a dataset of over 100 million posts from around 5 million users.

A Characterization of Political Communities on Reddit

This repository contains the code and results of our paper "A Characterization of Political Communities on Reddit".

To run the code, first download the Reddit data from https://files.pushshift.io/reddit/
Then update the data path in every notebook to point to the downloaded files.
The folders and notebooks must be run in numerical order: 01 before 02, and so on.
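As a hedged sketch (not code from this repository), the helper below keeps the data location in a single variable and streams comments for selected subreddits from one monthly dump. It assumes the dumps are newline-delimited JSON compressed with bz2, xz or gzip (dumps compressed with Zstandard would need the third-party zstandard package and are not handled here) and that each comment object has a `subreddit` field. `DATA_DIR`, `iter_comments` and the example file name are hypothetical.

```python
import bz2
import gzip
import json
import lzma
from pathlib import Path
from typing import Iterable, Iterator

# Hypothetical location of the downloaded Pushshift dumps; update to your own path.
DATA_DIR = Path("/path/to/pushshift/comments")

# Map file extensions to the matching standard-library opener (used in text mode).
_OPENERS = {".bz2": bz2.open, ".xz": lzma.open, ".gz": gzip.open}


def iter_comments(path: Path, subreddits: Iterable[str]) -> Iterator[dict]:
    """Yield comment dicts from an NDJSON dump whose 'subreddit' is in `subreddits`."""
    wanted = set(subreddits)
    opener = _OPENERS.get(path.suffix, open)  # fall back to plain text files
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            comment = json.loads(line)
            if comment.get("subreddit") in wanted:
                yield comment


# Example (hypothetical file name):
# for c in iter_comments(DATA_DIR / "RC_2016-11.bz2", ["The_Donald", "politics"]):
#     print(c["body"])
```

Streaming line by line keeps memory use flat, which matters for monthly dumps that are far larger than RAM.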

Required Libraries

numpy
fastText
nltk
plotly
matplotlib
gzip (Python standard library)
pickle (Python standard library)
vaderSentiment
sklearn (installed as scikit-learn)

Results

The results are stored in two folders.

Files larger than 100 MB are available only on Google Drive: https://drive.google.com/drive/folders/1VetvLETa-9_jao9ihAtZjwvc4ofI8xrO?usp=sharing

Plots: The Plots folder contains the generated plots and tables.
Results: The Results folder contains the outputs of the calculations.
- Links
  - comment_links.pickle.gz
    - stores the frequency of every domain shared in comments on the studied subreddits.
    - Example: comment_links['The_Donald'] returns all domains posted on The_Donald together with their frequencies.
    - Each value is a Counter object, so comment_links['The_Donald'].most_common(100) returns the 100 most frequently shared domains.
  - comment_full_links.pickle.gz
    - stores the frequency of every individual link posted on the studied subreddits.
    - Example: comment_full_links['politics'] returns all links posted on politics together with their frequencies.
    - Each value is a Counter object, so comment_full_links['politics'].most_common(100) returns the 100 most frequently shared links.
- Word Frequencies
  - word_freq.pickle.gz
    - stores the relative count of every word posted on the studied subreddits.
    - Example: word_freq['The_Donald'] returns all words posted on The_Donald together with their frequencies.
    - Each value is a Counter object, so word_freq['The_Donald'].most_common(100) returns the 100 most frequent words.
  - word_freq_unique.pickle.gz
    - stores information about the relative diff count of all words posted on the studied subreddits.
- Sentiment
  - sentiment_The_Donald.pickle.gz: contains the sentiment scores of all comments posted on The_Donald
  - sentiment_politics.pickle.gz: contains the sentiment scores of all comments posted on politics
  - ...
  - sentiment_altright.pickle.gz: contains the sentiment scores of all comments posted on altright
- WordEmbeddings
  - Word embeddings for each of the studied subreddits.
- Subreddit Embeddings
  - Subreddit embeddings, calculated as described in this article: https://fivethirtyeight.com/features/dissecting-trumps-most-rabid-online-following/
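The .pickle.gz files in the Results folder can be opened with the standard-library gzip and pickle modules. A minimal sketch, assuming each file was written with pickle inside a gzip stream and holds a dict-like object mapping subreddit names to collections.Counter objects, as the descriptions above suggest. `load_pickle_gz`, `relative_frequencies` and the example path are hypothetical. Unpickling can execute arbitrary code, so only load files from a source you trust.

```python
import gzip
import pickle
from collections import Counter
from pathlib import Path
from typing import Any, List, Tuple


def load_pickle_gz(path: Path) -> Any:
    """Load a gzip-compressed pickle file (only use on trusted files)."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


def relative_frequencies(counts: Counter, top: int = 10) -> List[Tuple[str, float]]:
    """Return the `top` most common items with their share of the total count."""
    total = sum(counts.values())
    if not total:
        return []
    return [(item, n / total) for item, n in counts.most_common(top)]


# Example (hypothetical path, relative to wherever the results were downloaded):
# comment_links = load_pickle_gz(Path("Results/Links/comment_links.pickle.gz"))
# print(comment_links["The_Donald"].most_common(100))
# print(relative_frequencies(comment_links["The_Donald"], top=20))
```

Normalizing by the total count makes subreddits of very different sizes comparable; if a file already stores relative values, the division simply rescales them to sum to 1.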
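The FiveThirtyEight article builds subreddit vectors from commenter overlap. The sketch below illustrates one way to implement that idea: count how many commenters each pair of subreddits shares, weight the counts with positive pointwise mutual information (PPMI), and compare the resulting row vectors with cosine similarity. Treat it as an assumption rather than a reproduction of the article's or this repository's exact procedure (filtering thresholds or dimensionality reduction may differ). The function names and the `user_subs` input format are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

import numpy as np


def cooccurrence_matrix(user_subs: Dict[str, Set[str]]) -> Tuple[List[str], np.ndarray]:
    """Count, for each pair of subreddits, how many users commented in both."""
    subs = sorted({s for ss in user_subs.values() for s in ss})
    index = {s: i for i, s in enumerate(subs)}
    counts = np.zeros((len(subs), len(subs)))
    for ss in user_subs.values():
        for a, b in combinations(sorted(ss), 2):
            counts[index[a], index[b]] += 1
            counts[index[b], index[a]] += 1
    return subs, counts


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information: max(log(p(a,b) / (p(a) p(b))), 0)."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero co-occurrence or empty rows contribute nothing
    return np.maximum(pmi, 0.0)


def cosine_similarity(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row vectors (all-zero rows stay at 0)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = np.divide(vectors, norms, out=np.zeros_like(vectors), where=norms > 0)
    return unit @ unit.T
```

With real data, `user_subs` would map each comment author to the set of studied subreddits they commented in; row i of the PPMI matrix then serves as the vector for subreddit `subs[i]`.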

Remarks

We are still cleaning the code and results to make them easier to understand and reuse.

Repository Contents

- 01_Content_Analysis
- 02_Cross_Reddit_Activity
- 03_General
- Plots
- Results
- Unsctructured_Code
- .gitignore
- README.md
