Skip to content
master
Switch branches/tags
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Reddit Clusters

Understanding hateful subreddits through text clustering.

Directory Structure

.
├── clustering
│   ├── results
│   │   ├── CringeAnarchy.txt
│   │   ├── The_Donald.txt
│   │   └── TheRedPill.txt
│   ├── run_nmfs.sh
│   └── tfidf_nmf.py
├── data
│   ├── bigquery
│   │   └── 2017
│   │       └── 11-12
│   │           └── download_data.sh
│   └── stoplist.txt
├── LICENSE
├── README.md
└── wordclouds
    ├── blackdisc.jpeg
    ├── images
    │   ├── CringeAnarchy
    │   │   ├── 0.06%.png
    │   │   ├── 0.12%.png
    │   │   ├── ...
    │   │   ├── ...
    │   ├── The_Donald
    │   │   ├── 0.36%.png
    │   │   ├── 0.51%.png
    │   │   ├── ...
    │   │   ├── ...
    │   └── TheRedPill
    │       ├── 0.56%.png
    │       ├── 0.91%.png
    │       ├── ...
    │       └── ...
    ├── make_wordclouds.ipynb
    └── OCR-A-Std-Regular.ttf

The clustering directory contains all code used to vectorize and cluster the subreddits. tfidf_nmf.py is the main program, and run_nmfs.sh is simply a driver script. The results subdirectory contains the log files when running tfidf_nmf.py on /r/TheRedPill, /r/The_Donald and /r/CringeAnarchy.

The data directory contains download_data.sh (which downloads the Reddit data from my Google Cloud Storage) and stoplist.txt (which includes Reddit-specific words such as "moderator", "karma", etc.).

The wordclouds directory contains images (which contains png files of the wordclouds themselves) and make_wordclouds.ipynb (which generates the wordclouds). Note that 1) blackdisc.jpeg and OCR-A-Std-Regular.ttf are just helper files to create the wordclouds, and 2) the png files are named after their cluster importance (e.g. 0.50%.png is a wordcloud whose cluster has an importance of 0.50%).

About

Understanding hateful subreddits through text clustering

Topics

Resources

License

Releases

No releases published

Packages

No packages published