Political News Filter

Political News Filter classifies English news articles according to whether they cover policy topics.

It uses a broad characterization of politics: Politics is about "who gets what, when, and how" (Lasswell, 1936). As a result, Political News Filter may classify business or tech news as political, depending on the article's actual content.

Requirements

  • Python 3.6
  • Pandas 0.24.1
  • NumPy 1.18.1
  • Keras 2.3.1
  • TensorFlow 2.1.0

Political News Filter supports both CPU and GPU processing. The latter is faster but requires a CUDA-capable graphics card and the CUDA toolkit.

Setup

  1. Clone this repository:

    $ git clone https://github.com/lukasgebhard/Political-News-Filter.git
    $ cd Political-News-Filter
  2. Download and extract pon_classifier.zip into the repository folder. Its inflated size is 1.2 GB.

  3. Install Python dependencies. For example, create a virtual environment:

    $ virtualenv --python=python3.6 venv
    $ source venv/bin/activate
    $ pip install -r requirements.txt
  4. Verify the installation was successful:

    $ ./check_installation.sh
    Hooray! Political News Filter is properly installed and ready to use.

Usage Demo

Start a Python session:

$ python3

Create two example articles:

>>> political_article = '''White House declares war against terror. The US government officially announced a ''' \
                        '''large-scale military offensive against terrorism. Today, the Senate agreed to spend an ''' \
                        '''additional 300 billion dollars on the advancement of combat drones to be used against ''' \
                        '''global terrorism. Opposition members sharply criticize the government. ''' \
                        '''"War leads to fear and suffering. ''' \
                        '''Fear and suffering is the ideal breeding ground for terrorism. So talking about a ''' \
                        '''war against terror is cynical. It's actually a war supporting terror."'''
>>> nonpolitical_article = '''Table tennis world cup 2025 takes place in South Korea. ''' \
                           '''The 2025 world cup in table tennis will be hosted by South Korea, ''' \
                           '''the Table Tennis World Committee announced yesterday. ''' \
                           '''Three-time world champion, Hu Ho Han, did not pass the qualification round, ''' \
                           '''to the advantage of underdog Bob Bobby who has been playing outstanding matches ''' \
                           '''in the National Table Tennis League this year.'''

To filter a list of news articles, call filter_news:

>>> from political_news_filter import filter_news
>>> political_article == filter_news([political_article, nonpolitical_article])[0]
True

If you need more flexibility, you can directly call the underlying classifier:

>>> from political_news_filter import Classifier
>>> classifier = Classifier()
>>> probabilities = classifier.estimate([political_article, nonpolitical_article])
>>> probabilities[0] > 0.99
True
>>> probabilities[1] < 0.01
True
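
Because the classifier exposes probabilities, you can apply your own decision threshold instead of relying on the default filtering. The following is a minimal sketch, assuming that estimate returns one probability per article as in the demo above. The helper name filter_by_probability and its threshold parameter are hypothetical and not part of the package, and filter_news itself may work differently.

```python
def filter_by_probability(articles, probabilities, threshold=0.5):
    """Keep articles whose estimated probability of being political is >= threshold.

    Hypothetical helper for illustration; not part of political_news_filter.
    """
    if len(articles) != len(probabilities):
        raise ValueError("articles and probabilities must have the same length")
    return [a for a, p in zip(articles, probabilities) if p >= threshold]


# Stand-in values; in practice these would come from classifier.estimate(articles).
articles = ["senate passes budget bill", "table tennis world cup results"]
probabilities = [0.997, 0.004]

# A stricter threshold trades recall for precision.
strict = filter_by_probability(articles, probabilities, threshold=0.9)
print(strict)  # ['senate passes budget bill']
```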

Please read the docstrings for further information.

Runtime Performance

The following benchmarks were measured on a notebook with 6 CPU cores at 2.6 GHz, 32 GB of RAM, a PCIe SSD, and a GPU with 4 GB of graphics memory and CUDA compute capability 7.5:

Task                               On CPU    On GPU
One-time Initialization            30 sec    15 sec
Classification of 1,000 articles   1.8 sec   1.3 sec
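
To get a rough feeling for larger workloads, the benchmark figures above can be extrapolated. The sketch below assumes that classification time grows linearly with the number of articles, which the benchmarks do not confirm, so treat the results as order-of-magnitude estimates for comparable hardware only. The function name estimate_seconds is made up for this example.

```python
# Benchmark figures from this section, in seconds.
BENCHMARKS = {
    "cpu": {"init": 30.0, "per_1000": 1.8},
    "gpu": {"init": 15.0, "per_1000": 1.3},
}


def estimate_seconds(n_articles, device):
    """Estimate total runtime: one-time initialization plus per-article cost.

    Assumes linear scaling in the number of articles (an assumption, not a measurement).
    """
    b = BENCHMARKS[device]
    return b["init"] + b["per_1000"] * n_articles / 1000


for device in ("cpu", "gpu"):
    print(device, round(estimate_seconds(100_000, device)), "sec")
# cpu 210 sec
# gpu 145 sec
```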

Architecture

The classifier is based on a model by Heng Zheng submitted to Kaggle under the Apache 2.0 license. It is a convolutional neural network with a 100-dimensional GloVe embedding layer, followed by three convolutional layers, each with a ReLU activation and a pooling layer, and a final softmax output layer. Training minimizes a cross-entropy loss, with dropout used for regularization.
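
The layer sequence described above could look roughly like this in Keras. This is a sketch for orientation, not the trained model: the vocabulary size, sequence length, filter count, kernel size, pool size, and dropout rate are all assumed values, the global pooling step before the output layer is an assumption needed to obtain a fixed-size vector, and the embedding layer is randomly initialized here rather than loaded with GloVe weights.

```python
from tensorflow import keras
from tensorflow.keras import layers

# All sizes are illustrative assumptions, not the values of the shipped model.
VOCAB_SIZE = 20000    # assumed vocabulary size
MAX_TOKENS = 500      # assumed padded article length in tokens
EMBEDDING_DIM = 100   # 100-dimensional embeddings, as described above


def build_cnn(vocab_size=VOCAB_SIZE, max_tokens=MAX_TOKENS):
    """Build a three-block 1D CNN text classifier with a softmax output."""
    inputs = keras.Input(shape=(max_tokens,), dtype="int32")
    # The real model would initialize this layer with pretrained GloVe vectors.
    x = layers.Embedding(vocab_size, EMBEDDING_DIM)(inputs)
    for _ in range(3):
        x = layers.Conv1D(filters=128, kernel_size=5)(x)  # assumed hyperparameters
        x = layers.ReLU()(x)
        x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.GlobalMaxPooling1D()(x)  # assumption: collapse the time axis
    x = layers.Dropout(0.5)(x)          # assumed dropout rate
    outputs = layers.Dense(2, activation="softmax")(x)  # political vs. non-political
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_cnn()
model.summary()
```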

Training & Evaluation

I created a labeled set of 0.57M news articles, selected from:

After fitting the classifier on 87.5 % of the articles, testing it on the remaining 12.5 % yields:

  • F1 = 94.4 %
  • Precision = 95.6 %
  • Recall = 93.2 %
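
As a quick consistency check, the F1 score is the harmonic mean of precision and recall, and the reported values agree with that definition when read as percentages. The split sizes below are approximate, since they are derived from the rounded figure of 0.57M articles.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 95.6, 93.2  # reported values, read as percentages
print(round(f1_score(precision, recall), 1))  # 94.4

# Approximate split sizes, based on the rounded total of 0.57M articles.
total = 570_000
train_size = int(total * 0.875)
test_size = total - train_size
print(train_size, test_size)  # 498750 71250
```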

How to Cite

If you use Political News Filter, please cite our poster:

@InProceedings{POLUSA,
  author     = {Gebhard, Lukas and Hamborg, Felix},
  title      = {The POLUSA Dataset: 0.9M Political News Articles Balanced by Time and Outlet Popularity},
  year       = {2020},
  month      = {August},
  booktitle  = {Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL '20)},
  venue      = {Virtual event, China},
  publisher  = {Association for Computing Machinery},
  doi        = {10.1145/3383583.3398567}
}
