A classifier that distinguishes political from non-political news articles.

fhamborg/Political-News-Filter

Political News Filter

Political News Filter classifies English news articles based on whether they cover policy topics.

It uses a broad characterization of politics: politics is about "who gets what, when, and how" (Lasswell, 1936). As a result, Political News Filter may classify business or tech news as political, depending on the article's actual content.

Setup

  1. Clone this repository.
  2. Create and activate a conda environment, then install the dependencies:
    conda create --yes -n polnewsfilter python=3.7
    conda activate polnewsfilter
    conda install --yes pandas
    conda install --yes -c conda-forge keras

  3. Download and extract pon_classifier.zip into the repository folder. Its inflated size is 1.2 GB.

Usage

Start a Python session:

$ python3

Create example articles:

>>> political_article = '''White House declares war against terror. The US government officially announced a ''' \
                        '''large-scale military offensive against terrorism. Today, the Senate agreed to spend an ''' \
                        '''additional 300 billion dollars on the advancement of combat drones to be used against ''' \
                        '''global terrorism. Opposition members sharply criticize the government. ''' \
                        '''"War leads to fear and suffering. ''' \
                        '''Fear and suffering is the ideal breeding ground for terrorism. So talking about a ''' \
                        '''war against terror is cynical. It's actually a war supporting terror."'''
>>> nonpolitical_article = '''Table tennis world cup 2025 takes place in South Korea. ''' \
                           '''The 2025 world cup in table tennis will be hosted by South Korea, ''' \
                           '''the Table Tennis World Committee announced yesterday. ''' \
                           '''Three-time world champion, Hu Ho Han, did not pass the qualification round, ''' \
                           '''to the advantage of underdog Bob Bobby who has been playing outstanding matches ''' \
                           '''in the National Table Tennis League this year.'''

To filter a list of news articles, call filter_news:

>>> from political_news_filter import filter_news
>>> political_article == filter_news([political_article, nonpolitical_article])[0]
True

If you need more flexibility, you can directly call the underlying classifier:

>>> from political_news_filter import Classifier
>>> classifier = Classifier()
>>> probabilities = classifier.estimate([political_article, nonpolitical_article])
>>> probabilities[0] > 0.99
True
>>> probabilities[1] < 0.01
True

Please read the docstrings for further information.
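
The probabilities returned by classifier.estimate can also be thresholded manually, for example to trade recall for precision. The following is a minimal sketch, not part of this repository: the helper split_by_threshold and its threshold parameter are hypothetical, and it assumes one probability per article, in input order, with higher values meaning "more likely political" (as the example above suggests). The stand-in values replace a real call to classifier.estimate so the snippet runs without the model.

```python
from typing import List, Sequence, Tuple


def split_by_threshold(
    articles: Sequence[str],
    probabilities: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Partition articles into (political, non_political) lists.

    Hypothetical helper: assumes probabilities[i] belongs to articles[i]
    and that higher values mean "more likely political".
    """
    if len(articles) != len(probabilities):
        raise ValueError("articles and probabilities must have the same length")

    political: List[str] = []
    non_political: List[str] = []
    for article, probability in zip(articles, probabilities):
        if probability >= threshold:
            political.append(article)
        else:
            non_political.append(article)
    return political, non_political


# Stand-in values; in practice the probabilities would come from
# classifier.estimate(example_articles).
example_articles = ["senate passes budget", "table tennis final"]
example_probabilities = [0.97, 0.02]

political, non_political = split_by_threshold(
    example_articles, example_probabilities, threshold=0.9
)
```

A higher threshold keeps only articles the classifier is very confident about; a lower one lets more borderline articles through.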

Runtime Performance

The benchmarks below were measured on a laptop with 6 CPU cores @ 2.6 GHz, a GPU with 4 GB of graphics memory and CUDA compute capability 7.5, 32 GB RAM, and a PCIe SSD:

| Task | On CPU | On GPU |
| --- | --- | --- |
| One-time initialization | 30 sec | 15 sec |
| Classification of 1,000 articles | 1.8 sec | 1.3 sec |
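
For a rough idea of what these numbers mean for larger batches, the sketch below extrapolates from the benchmark. It assumes that classification time scales linearly with the number of articles, which the benchmark does not establish; the function estimate_runtime_seconds is hypothetical and only does the arithmetic.

```python
def estimate_runtime_seconds(
    num_articles: int,
    init_seconds: float,
    seconds_per_1000: float,
) -> float:
    """Extrapolate total runtime from the benchmark figures.

    Assumption: cost per article is constant, so time grows linearly
    with the number of articles after a one-time initialization.
    """
    return init_seconds + (num_articles / 1000.0) * seconds_per_1000


# Benchmark figures from the table above, applied to 100,000 articles.
cpu_estimate = estimate_runtime_seconds(100_000, init_seconds=30.0, seconds_per_1000=1.8)
gpu_estimate = estimate_runtime_seconds(100_000, init_seconds=15.0, seconds_per_1000=1.3)
```

Under this assumption, 100,000 articles would take roughly 210 seconds on the CPU and 145 seconds on the GPU, including initialization.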

Architecture

The classifier is based on a model by Heng Zheng submitted to Kaggle under the Apache 2.0 license. It is a convolutional neural network with a 100-dimensional GloVe embedding layer, three convolutional layers, each followed by a ReLU layer and a pooling layer, and a final softmax output layer. During training, a cross-entropy loss function is minimized, with dropout used for regularization.

Training & Evaluation

I created a labeled set of 0.57M news articles, selected from:

After fitting the classifier on 87.5 % of the articles, testing it on the remaining 12.5 % yields:

  • F1 = 94.4
  • Precision = 95.2
  • Recall = 91.8
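
An 87.5 % / 12.5 % split corresponds to holding out one article in eight. The sketch below shows one way to produce such a split. It assumes a seeded random shuffle, which the text above does not confirm, and the function split_articles is hypothetical rather than part of this repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_articles(
    items: Sequence[T],
    train_fraction: float = 0.875,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Shuffle items reproducibly and split them into (train, test).

    Assumption: a plain random split; the actual procedure used for the
    reported scores may differ (e.g. splitting by source or by date).
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy data: 16 stand-in article IDs give 14 for training and 2 for testing.
train_ids, test_ids = split_articles(list(range(16)))
```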
