Overview

This repository provides replication data and code for the following paper:

@article{wihbey_exploring_2017,
  title = {The social silos of journalism? Twitter, news media and partisan segregation},
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1708.06727},
  primaryClass = {cs},
  journal = {arXiv:1708.06727 [cs]},
  author = {Wihbey, John and Coleman, Thalita Dias and Joseph, Kenneth and Lazer, David},
  month = aug,
  year = {2017},
  keywords = {Computer Science - Social and Information Networks}
}

The primary file for replication is analysis.R.

To keep the data anonymous, the only data provided are the anonymized inputs used to create the figures in the paper and to run the regression model. If you would like to replicate other parts of the paper that involve additional, deanonymized data, please contact the authors. While those we study are public figures, and thus such data can potentially be made available, we do not wish to make fully public any results that could be used against individuals.

Twitter Ideology Method

We provide the code used to create our Twitter ideology score, but note that the required data are only available from the authors (as noted above). Given these data, results can be replicated as follows:

1. Enter the data directory and untar heavy_user_friends.tgz and reporter_friends.tgz.
2. Download the follower data for Congresspeople, put it into the data directory, and untar it.

From the command line, you can run steps 1 and 2 as follows:

cd data
tar -xzvf heavy_user_friends.tgz
tar -xzvf reporter_friends.tgz
wget https://www.dropbox.com/s/y5hfrgah0ldcei7/congress_followers.tgz?dl=0
mv congress_followers.tgz?dl=0 congress_followers.tgz
tar -xzvf congress_followers.tgz

3. Run twitter_ideology_method.ipynb.
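The extraction in steps 1 and 2 can also be scripted from Python, which may be convenient when working from a notebook. The following is a minimal sketch, not part of this repository: the function name extract_archives is hypothetical, and it assumes the three archives have already been placed in the data directory (including congress_followers.tgz, renamed as shown above).

```python
import tarfile
from pathlib import Path

# Archive names taken from the steps above.
ARCHIVES = [
    "heavy_user_friends.tgz",
    "reporter_friends.tgz",
    "congress_followers.tgz",
]


def extract_archives(data_dir="data", archives=ARCHIVES):
    """Extract each archive into data_dir and return the names of any missing archives.

    Hypothetical helper for illustration; it mirrors `tar -xzvf` run inside data_dir.
    """
    data_path = Path(data_dir)
    missing = []
    for name in archives:
        archive = data_path / name
        if not archive.is_file():
            # Record the gap instead of failing, so all missing files are reported at once.
            missing.append(name)
            continue
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=data_path)
    return missing


if __name__ == "__main__":
    not_found = extract_archives()
    if not_found:
        print("Missing archives:", ", ".join(not_found))
```

If the returned list is non-empty, the notebook in step 3 will most likely fail for lack of input data, so it is worth checking before running it.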

News Ideology Method

We also cannot release the newspaper data we collected. However, we do provide the script newspaper.py, which shows how, given a list of articles from an author (extracted from MuckRack), we preprocess the data for input into our method. We then run news_ideology_method.ipynb to generate the text-based ideology score.

Putting it all together to replicate paper

Run analysis.R. Note that in order to compare our results to the work from Bakshy et al., you will have to request the file top500.tab from their Dataverse repository, rename it to top500.csv, and put it into the data directory.
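The renaming step for top500.tab can be sketched in Python as follows. This is an illustrative helper only: the function name stage_top500 and the download location passed to it are assumptions, not part of this repository. It copies the file rather than moving it, so the original download is left in place, and it does not alter the file contents.

```python
import shutil
from pathlib import Path


def stage_top500(downloaded, data_dir="data"):
    """Copy a downloaded top500.tab into data_dir under the name top500.csv.

    Hypothetical helper: the file itself must first be requested from the
    Bakshy et al. Dataverse repository.
    """
    source = Path(downloaded)
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")
    target = Path(data_dir) / "top500.csv"
    # Create the data directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Contents are copied unchanged; only the file name differs.
    shutil.copyfile(source, target)
    return target
```

For example, `stage_top500("top500.tab")` run from the repository root would place the copy at data/top500.csv, where the text above says analysis.R expects it.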

More info on `data` directory

We collected bill sponsorship data from GovTrack and congressional social media data from the awesome congress-legislators GitHub repository using the following commands:

wget https://www.govtrack.us/data/us/115/stats/sponsorshipanalysis_h.txt
wget https://www.govtrack.us/data/us/115/stats/sponsorshipanalysis_s.txt
wget https://github.com/unitedstates/congress-legislators/raw/master/legislators-current.yaml
wget https://github.com/unitedstates/congress-legislators/raw/master/legislators-social-media.yaml

We used the twitter_dm GitHub library to collect basic information about the Twitter accounts of the journalists and heavy political users we study. These data are in data/basic_twitter_info.tsv and data/heavy_pol_basic_twitter_info.tsv, and were collected during May 2017.
We also used twitter_dm to collect the followers of Congressional accounts and the friends of the heavy political users and journalists, also during May 2017. These are the tar files from steps 1 and 2 above.
The file data/org_info.tsv provides hand-constructed information on the news organizations we considered for this study, plus several others.
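Before running the analysis, it can help to take a quick look at the .tsv files named above. The sketch below is a hedged illustration, not code from this repository: the helper name peek_tsv is hypothetical, and it assumes the files are UTF-8, tab-separated, and readable with the csv module's default quoting. It makes no assumption about the column names, which are not documented here.

```python
import csv
from itertools import islice


def peek_tsv(path, n_rows=3):
    """Return the first n_rows rows of a tab-separated file as lists of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return list(islice(reader, n_rows))


if __name__ == "__main__":
    # File names as listed in this section.
    for name in (
        "data/basic_twitter_info.tsv",
        "data/heavy_pol_basic_twitter_info.tsv",
        "data/org_info.tsv",
    ):
        for row in peek_tsv(name):
            print(name, row)
```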
