TruthAI -- News Article Political Bias Analyzer

Analyzes news articles for political bias using a fine-tuned BERT neural network. Paste text or provide a URL the app reviews the article and classifies it on a left-right political spectrum.

Works on macOS, Windows, and Linux.

How It Works

TruthAI uses a model called TruthNN: a BERT transformer fine-tuned on ~8,000 politically labeled news articles. When you submit an article, the model:

Tokenizes the text into subword pieces using BERT's WordPiece vocabulary
Encodes the tokens through 12 transformer layers, building contextual representations of each word
Classifies the article using a regression head on top of BERT's [CLS] token embedding
Estimates confidence by running 30 forward passes with dropout enabled (Monte Carlo Dropout) and measuring prediction variance
Identifies influential words using gradient-based attribution: computing how much each token pushes the score left or right

The output is a bias score from -10 (far-left) to +10 (far-right), a confidence percentage, and a list of the most influential words in the article.

Setup

pip install -r requirements.txt

Download the trained model from Hugging Face and place truthnn.pth in the project root.

Usage

1. Prepare Training Data

Pull labeled data from Hugging Face:

# AllSides dataset (~5K articles, editor-labeled per article)
python data_preparation.py --allsides

# BABE dataset (expert-annotated sentences)
python data_preparation.py --babe

# Both at once
python data_preparation.py --all

Merge local CSV files into the combined dataset (automatically deduplicates):

python data_preparation.py new_articles.csv

# Deduplicate the existing combined file
python data_preparation.py

2. Train the Model

python train_bert_improved.py combined_full_articles_training.csv

Auto-detects the best available device:

NVIDIA GPU (CUDA): fastest
Apple Silicon (MPS): fast on M1/M2/M3/M4 Macs
CPU: works everywhere, uses half your cores to keep the machine usable

Training runs for up to 20 epochs with early stopping. Saves the best model to truthnn.pth.

3. Run the Web App

python app.py

Open http://localhost:5000 enter a URL and/or paste article text to analyze. (More reliable looking at a whole article)

JSON API

# Analyze by URL
curl -X POST http://localhost:5000/analyze \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Analyze by text
curl -X POST http://localhost:5000/analyze \
  -H "Content-Type: application/json" \
  -d '{"text": "Article text here..."}'

Example response:

{
  "coordinates": {"x": -4.52},
  "confidence": 78.3,
  "classification": "Left",
  "model_type": "TruthNN",
  "influential_words": {
    "left": ["inequality", "reform", "workers"],
    "right": ["taxes", "enforcement"]
  }
}

Model Details

Architecture

Input Text
    |
BERT Tokenizer (WordPiece, max 128 tokens)
    |
BERT Base (12 layers, 768-dim hidden, 110M params)
    |
[CLS] Token Embedding (768-dim)
    |
Dropout (0.3) → Linear (768 → 256) → ReLU
    |
Dropout (0.3) → Linear (256 → 128) → ReLU
    |
Linear (128 → 1) → Tanh × 10
    |
Bias Score (-10 to +10)

Training

Base model: bert-base-uncased (Google, 110M parameters)
Training data: ~8,000 articles from AllSides, BABE, and manually collected sources
Loss function: Mean Squared Error (regression)
Optimizer: AdamW (lr=2e-5, weight decay=0.01)
Balanced sampling: Inverse-frequency weighting so underrepresented bias categories get more training time
Data augmentation: Random truncation of 20% of training samples to improve robustness on partial articles
Early stopping: Patience of 5 epochs on validation loss

Bias Scale

Score	Classification
-10 to -6	Far-Left
-6 to -3	Left
-3 to +3	Center
+3 to +6	Right
+6 to +10	Far-Right

Dataset Format

text,bias_x
"Article text here...",-6.0
"Another article...",4.0

Or provide text and source columns... known sources are auto-mapped to bias scores:

text,source
"Article text here...",cnn
"Another article...",foxnews

Project Structure

python_nn_news_bias/
├── app.py                              # Flask web app (URL scraping + classification)
├── truthnn.py                       # TruthNN model definition + inference
├── data_preparation.py                 # Dataset loading, merging, Hugging Face import
├── train_bert_improved.py              # Training script (CUDA / MPS / CPU)
├── combined_full_articles_training.csv # Training dataset (~8K articles)
├── templates/
│   └── index.html                      # Web UI
├── requirements.txt                    # Dependencies
└── README.md

Limitations

Most training labels are source-based (for efficiency in training) every article from the same outlet gets the same score regardless of content (EX: A News Oppinion vs another A News article)
Political bias is inherently subjective and multidimensional; a single left-right score is a simplification
URL scraping may not work on paywalled or JavaScript-rendered sites
Confidence estimates can be miscalibrated on out-of-distribution text (e.g., non-English articles, opinion columns, satire)
Per-article human labels would significantly improve accuracy

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TruthAI -- News Article Political Bias Analyzer

How It Works

Setup

Usage

1. Prepare Training Data

2. Train the Model

3. Run the Web App

JSON API

Model Details

Architecture

Training

Bias Scale

Dataset Format

Project Structure

Limitations

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
templates		templates
README.md		README.md
TruthAI.png		TruthAI.png
app.py		app.py
combined_full_articles_training.csv		combined_full_articles_training.csv
data_preparation.py		data_preparation.py
requirements.txt		requirements.txt
train_bert_improved.py		train_bert_improved.py
truthnn.py		truthnn.py

Folders and files

Latest commit

History

Repository files navigation

TruthAI -- News Article Political Bias Analyzer

How It Works

Setup

Usage

1. Prepare Training Data

2. Train the Model

3. Run the Web App

JSON API

Model Details

Architecture

Training

Bias Scale

Dataset Format

Project Structure

Limitations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages