RedBlue

A political language classifier for news articles

Here is a presentation that provides a high-level overview of the project.

Here is a report that goes over the entire project in detail.

Quick Start

This quick start is intended to help you replicate our process.

Clone the repository:

$ git clone https://github.com/samgoodgame/RedBlue.git
$ cd RedBlue

Create a virtualenv and install the dependencies:

$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

Normally, you would run the dem_parse.py and rep_parse.py scripts to pull the training data from the internet and parse it into usable form. Because this repository already includes the training data in the /data/debate_data/ directory, you can skip these scripts.

Build the models by running the classification script. Before you run it, edit the script so that it pickles the models into the right directory (the paths are on lines 68, 357, 365, and 371).

$ cd scripts
$ python classify_svm.py

The script prints several results. The most important is the last number, which is the accuracy of the SVM model.
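If editing several hard-coded paths feels error-prone, one option is to define the model directory in a single place. The following is a minimal sketch of that idea, not the code in classify_svm.py; the names MODEL_DIR, save_model, and load_model and the .pickle file suffix are assumptions made for illustration.

```python
import pickle
from pathlib import Path

# Hypothetical location; the real scripts hard-code their own paths.
MODEL_DIR = Path("models")


def save_model(model, name, model_dir=MODEL_DIR):
    """Pickle `model` to <model_dir>/<name>.pickle and return the file path."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    path = model_dir / f"{name}.pickle"
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(name, model_dir=MODEL_DIR):
    """Load the object previously saved under `name` from `model_dir`."""
    with (Path(model_dir) / f"{name}.pickle").open("rb") as f:
        return pickle.load(f)
```

With helpers like these, a training script and a prediction script can share one directory setting instead of repeating the path in each place.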

Classify the RSS data. In predict.py, set the path to the dataset (news source) that you wish to analyze, and make sure the script loads the pickled models from the right directory (the paths are on lines 51 and 81).

$ python predict.py

The results are printed to your terminal. To see results for each news source, point the classify_svm.py script at each news source's directory under /data/sources/text/.
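To check which news sources are available before pointing a script at them, a small helper can list the subdirectories and count the articles in each. This is a sketch under assumptions: the function names are hypothetical, and it assumes each source directory holds its articles as .txt files, which the repository layout may or may not match.

```python
from pathlib import Path


def source_directories(root):
    """Return the immediate subdirectories of `root`, sorted by name."""
    return sorted(p for p in Path(root).iterdir() if p.is_dir())


def count_articles(root):
    """Map each source directory name to its number of .txt files (assumed format)."""
    return {
        source.name: len(list(source.rglob("*.txt")))
        for source in source_directories(root)
    }
```

Calling count_articles("data/sources/text") from the repository root would then return one entry per news source.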

About

RedBlue is a political language classifier for news articles. It trains a Support Vector Machine (SVM) on training data from the 2016 Democratic and Republican presidential primary debates. It then uses Baleen to ingest RSS feeds into MongoDB, parse the feeds, remove stop words, and vectorize the data.
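To make the stop-word removal and vectorization steps concrete, here is a small standard-library sketch. It is not the project's code: the stop-word list is a tiny illustrative one, and the tokenize and vectorize helpers and the two example sentences are invented for this illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the project's actual list is not shown here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def vectorize(documents):
    """Return (vocabulary, rows), where rows[i] maps column index -> word count."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # One column per distinct word, assigned in alphabetical order.
    vocabulary = {w: i for i, w in enumerate(sorted(set().union(*counts)))}
    rows = [{vocabulary[w]: n for w, n in c.items()} for c in counts]
    return vocabulary, rows


docs = ["The economy is growing", "Taxes and the economy"]  # invented examples
vocabulary, rows = vectorize(docs)
```

Each row stores only the nonzero counts, which is the idea behind a sparse document-by-word matrix.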

Once the RSS data is in the proper format (a sparse matrix with words as features and documents as instances), we pass it to our fitted model, which predicts whether each article is "red" (Republican) or "blue" (Democratic).
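The fit-then-predict flow described above can be sketched as follows. This assumes scikit-learn (CountVectorizer and LinearSVC); the README does not say which library or SVM variant the project uses, and the sentences and labels below are invented toy data, not the debate transcripts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Invented toy training data standing in for the debate text.
train_texts = [
    "expand healthcare coverage and raise wages",
    "protect unions and raise wages",
    "cut taxes and secure the border",
    "cut taxes and shrink regulation",
]
train_labels = ["blue", "blue", "red", "red"]

# Words become features and documents become rows of a sparse matrix.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)

model = LinearSVC()
model.fit(X_train, train_labels)

# New articles must be transformed with the same fitted vectorizer.
articles = ["senator proposes plan to cut taxes"]  # invented example
X_new = vectorizer.transform(articles)
predictions = model.predict(X_new)
```

The key design point is that the vectorizer is fitted once on the training text and then reused on the news articles, so both matrices share the same columns.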

Attribution

We generated our word cloud from an open-source Python word cloud package. The words are from Democratic and Republican presidential primary debates.
