Skip to content
Pre-release

This dataset contains z^2 scores for all possible pairs of 4,924 subreddits (these subreddits had more than 500 comments made by more than 100 unique users in the month of June 2016).

Each file in the double_z folder corresponds to one subreddit and each line in a file is formatted as follows:
anchor_subreddit pair_subreddit z^2_score author_similarity_rank phrase_similarity_rank raw_author_similarity raw_phrase_similarity

The graph (.gml) files contain author and term similarity networks where nodes are subreddits and edges are determined by author and term similarity respectively.
Each graph has the following node attributes:
id, name (subreddit name), modularity (community id determined by multilevel algorithm) and subcount (subscriber count of the subreddit as per June 2016).

If you make use of this dataset, please cite the following:
Srayan Datta, Chanda Phelan, and Eytan Adar. 2017. Identifying Misaligned Inter-Group Links and Communities. Proc. ACM Hum.-Comput. Interact. 1, 2, Article 37 (November 2017), 23 pages.
https://doi.org/10.1145/3134672

Assets 3
You can’t perform that action at this time.