# Understanding Political Bias in News Text Classification

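In this notebook we fine-tune a `bert-base-uncased` classifier on the cleaned Political Bias dataset to predict three bias classes (left, center, right) and inspect its predictions with token-level attributions. We also mask structural phrases such as 'READ MORE' to test whether page boilerplate, rather than article content, influences the model.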

## Step 1: Setup

Import the libraries needed for data handling, modeling, and evaluation.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.utils.class_weight import compute_class_weight
from collections import defaultdict
import numpy as np
from transformers_interpret import SequenceClassificationExplainer
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, AutoTokenizer
from datasets import Dataset




## Step 2: Data preparation

Load the cleaned dataset exported from the preprocessing notebook, which provides the text and label columns used for modeling.

In [2]:
df = pd.read_csv("datasets/Political_cleaned.csv")

Define features (text) and target labels for the 3-class setting.

In [3]:
X = df["Text"]
y = df["Bias_3"]


## Step 3: Model training

Create train/test splits with stratification to keep class balance.

In [4]:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13, stratify=y
)
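A quick way to confirm that stratification kept the class balance is to compare label shares across the two splits. The sketch below uses toy labels rather than the notebook's data, and `label_shares` is a hypothetical helper, not part of scikit-learn.

```python
from collections import Counter

def label_shares(labels):
    """Return each label's share of the list, rounded to 3 decimals."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(count / total, 3) for label, count in counts.items()}

# Toy labels standing in for y_train / y_test (not the notebook's data).
toy_train = ["left"] * 12 + ["center"] * 4 + ["right"] * 4
toy_test = ["left"] * 3 + ["center"] * 1 + ["right"] * 1

print(label_shares(toy_train))
print(label_shares(toy_test))
```

In the notebook, the same check could be run as `label_shares(y_train.tolist())` and `label_shares(y_test.tolist())`; with a stratified split the two dictionaries should be nearly identical.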


Build train/test DataFrames used by Hugging Face Datasets.

In [5]:
train_df = df.loc[X_train.index][["Text", "Bias_3"]].copy()
test_df  = df.loc[X_test.index][["Text", "Bias_3"]].copy()
 

Define label-id mappings used by the model and trainer.

In [6]:
label2id = {
    "left": 0,
    "center": 1,
    "right": 2
}

id2label = {v: k for k, v in label2id.items()}


Convert string labels into numeric ids.

In [7]:
train_df["label"] = train_df["Bias_3"].map(label2id)
test_df["label"]  = test_df["Bias_3"].map(label2id)


Confirm that every label was mapped by counting missing ids in each split.

In [8]:
train_df["label"].isna().sum(), test_df["label"].isna().sum()


(0, 0)

Convert Pandas DataFrames to Hugging Face Dataset objects.

In [9]:
train_dataset = Dataset.from_pandas(train_df[["Text", "label"]])
test_dataset  = Dataset.from_pandas(test_df[["Text", "label"]])


In [10]:
train_dataset, test_dataset


(Dataset({
     features: ['Text', 'label', '__index_level_0__'],
     num_rows: 5116
 }),
 Dataset({
     features: ['Text', 'label', '__index_level_0__'],
     num_rows: 1280
 }))

Load the BERT tokenizer.

In [11]:
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)


Define a tokenization function with truncation and padding.

In [12]:
MAX_LEN = 512

def tokenize(batch):
    return tokenizer(
        batch["Text"],
        truncation=True,
        padding="max_length",
        max_length=MAX_LEN
    )


Tokenize the datasets and set PyTorch format.

In [13]:
train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset  = test_dataset.map(tokenize, batched=True)


Map: 100%|██████████| 5116/5116 [00:01<00:00, 4164.42 examples/s]
Map: 100%|██████████| 1280/1280 [00:00<00:00, 5484.08 examples/s]


In [14]:
train_dataset = train_dataset.remove_columns(["Text"])
test_dataset  = test_dataset.remove_columns(["Text"])

train_dataset.set_format("torch")
test_dataset.set_format("torch")


Drop the leftover pandas index column (`__index_level_0__`) that `Dataset.from_pandas` added.

In [15]:
train_dataset = train_dataset.remove_columns(["__index_level_0__"])
test_dataset  = test_dataset.remove_columns(["__index_level_0__"])


### BERT fine-tuning

In [16]:
model_name = "bert-base-uncased"

bert_model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3,
    id2label=id2label,
    label2id=label2id
)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Define evaluation metrics (macro F1).

In [17]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=1)
    return {
        "macro_f1": f1_score(labels, preds, average="macro")
    }


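Compute balanced class weights from the training labels so that the rarer center and right classes count more in the loss.

In [18]: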
class_weights = compute_class_weight(
    class_weight="balanced",
    classes=np.array([0, 1, 2]),
    y=train_df["label"].values
)

class_weights = torch.tensor(class_weights, dtype=torch.float)
class_weights


tensor([0.4998, 3.5977, 1.3864])
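The "balanced" heuristic assigns each class the weight `n_samples / (n_classes * class_count)`. The sketch below re-implements that formula in plain Python. The class counts are an inference from the printed weights and the 5116 training rows, not figures stated in the notebook, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}

# Class counts inferred from the printed weights and the 5116 training rows
# (an assumption for illustration, not a figure stated in the notebook).
inferred_counts = {0: 3412, 1: 474, 2: 1230}
labels = [c for c, n in inferred_counts.items() for _ in range(n)]

weights = balanced_class_weights(labels)
print({c: round(w, 4) for c, w in weights.items()})
```

Under these assumed counts the weights agree with the tensor above to within rounding, which shows why the center class (id 1) receives by far the largest weight.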

Custom Trainer that applies class-weighted loss.

In [19]:
class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")

        loss_fct = torch.nn.CrossEntropyLoss(
            weight=class_weights.to(logits.device)
        )
        loss = loss_fct(logits, labels)

        return (loss, outputs) if return_outputs else loss
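To make the effect of the weights concrete, here is a plain-Python sketch of weighted cross-entropy. It follows the documented behavior of `torch.nn.CrossEntropyLoss` with `weight` set and the default `reduction="mean"`, where the sum of weighted losses is divided by the sum of the target weights rather than by the batch size. The function name and the toy logits are illustrative assumptions.

```python
import math

def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean cross-entropy, normalized by the sum of target weights."""
    total, norm = 0.0, 0.0
    for row, y in zip(logits, labels):
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_z - row[y]
        total += weights[y] * nll
        norm += weights[y]
    return total / norm

# Toy batch: an easy "left" example and an uncertain "center" example.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_labels = [0, 1]
class_w = [0.4998, 3.5977, 1.3864]   # weights printed earlier in the notebook

unweighted = weighted_cross_entropy(toy_logits, toy_labels, [1.0, 1.0, 1.0])
weighted = weighted_cross_entropy(toy_logits, toy_labels, class_w)
print(round(unweighted, 4), round(weighted, 4))
```

With the class weights applied, the loss is dominated by the harder center example, which is the intended effect for an under-represented class.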



Set training hyperparameters and evaluation strategy.

In [20]:
training_args = TrainingArguments(
    output_dir="./bert_bias",
    eval_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",
    greater_is_better=True,
    report_to="none"
)


Train the BERT model.

In [21]:
trainer = WeightedTrainer(
    model=bert_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)

trainer.train()


Epoch,Training Loss,Validation Loss,Macro F1
1,0.4638,0.494037,0.789393
2,0.4684,0.537777,0.834903
3,0.1678,0.569954,0.845153
4,0.0911,0.791647,0.858719
5,0.0652,0.769442,0.854929


TrainOutput(global_step=3200, training_loss=0.3080393965728581, metrics={'train_runtime': 484.0183, 'train_samples_per_second': 52.849, 'train_steps_per_second': 6.611, 'total_flos': 6730441225482240.0, 'train_loss': 0.3080393965728581, 'epoch': 5.0})
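The reported `global_step` can be sanity-checked from the dataset size and batch size. This small calculation assumes a single device and no gradient accumulation, neither of which is stated explicitly in the notebook.

```python
import math

train_rows = 5116   # num_rows of train_dataset
batch_size = 8      # per_device_train_batch_size
epochs = 5          # num_train_epochs

# Assumes a single device and no gradient accumulation.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # → 640 3200
```

The result matches `global_step=3200` in the training output above.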

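Evaluate on the test set and print the classification report. With `load_best_model_at_end=True`, the trainer has restored the epoch-4 checkpoint, which had the highest macro F1. Because the same split was used to select that checkpoint, these scores are likely somewhat optimistic.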

In [22]:
preds = trainer.predict(test_dataset)
y_true = preds.label_ids 
y_pred = np.argmax(preds.predictions, axis=1)

print(classification_report(
    y_true,
    y_pred,
    target_names=[id2label[i] for i in range(3)],
    digits=3
))


              precision    recall  f1-score   support

        left      0.928     0.948     0.938       854
      center      0.785     0.712     0.747       118
       right      0.903     0.880     0.891       308

    accuracy                          0.910      1280
   macro avg      0.872     0.847     0.859      1280
weighted avg      0.909     0.910     0.909      1280
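The two averaged F1 rows can be reproduced from the per-class rows, which also shows why macro F1 is the stricter metric here: it gives the weak center class the same say as the large left class. The sketch below recomputes both averages from the rounded precision and recall values in the report, so the results are approximate.

```python
# Per-class (precision, recall, support) copied from the classification report.
report = {
    "left":   (0.928, 0.948, 854),
    "center": (0.785, 0.712, 118),
    "right":  (0.903, 0.880, 308),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

per_class_f1 = {name: f1(p, r) for name, (p, r, _) in report.items()}
total_support = sum(s for _, _, s in report.values())

# Macro: unweighted mean over classes. Weighted: mean weighted by support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weighted_f1 = sum(per_class_f1[n] * s for n, (_, _, s) in report.items()) / total_support

print({n: round(v, 3) for n, v in per_class_f1.items()})
print(round(macro_f1, 3), round(weighted_f1, 3))
```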



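Switch the model to evaluation mode, which disables dropout, before running inference and attribution.

In [23]: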
bert_model.eval()


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

Compute probabilities and confidence scores from model logits.

In [24]:
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / exp_x.sum(axis=1, keepdims=True)

probs = softmax(preds.predictions)
conf = probs.max(axis=1)
y_pred = np.argmax(probs, axis=1)


Aggregate predictions and confidence into a results table.

In [25]:
results_bert = pd.DataFrame({
    "text": test_df["Text"],   
    "true": [id2label[i] for i in y_true],
    "pred": [id2label[i] for i in y_pred],
    "confidence": conf
})


Helper to sample examples by correctness and confidence.

In [26]:
def select_instances(
    df,
    true_label,
    correct=True,
    conf_min=None,
    conf_max=None,
    n=10
):
    subset = df[df["true"] == true_label]

    if correct is not None:
        if correct:
            subset = subset[subset["true"] == subset["pred"]]
        else:
            subset = subset[subset["true"] != subset["pred"]]

    if conf_min is not None:
        subset = subset[subset["confidence"] >= conf_min]

    if conf_max is not None:
        subset = subset[subset["confidence"] <= conf_max]

    return subset.sort_values("confidence", ascending=False).head(n)


Pick high-confidence correct examples per class.

In [27]:
left_high = select_instances(results_bert, "left", True, 0.75, None, n=10)
right_high = select_instances(results_bert, "right", True, 0.75, None, n=10)
center_high = select_instances(results_bert, "center", True, 0.75, None, n=10)

Quickly inspect example texts and predictions.

In [28]:
print(left_high.iloc[0]["text"][:1000])
print(left_high.iloc[0]["true"])
print(left_high.iloc[0]["pred"])
print("==============================")
print(right_high.iloc[0]["text"][:1000])
print(right_high.iloc[0]["true"])
print(right_high.iloc[0]["pred"]) 



Cardinal Dr. Robert Prevost has officially taken the name Pope Leo XIV, becoming the first American head of the Catholic Church in history. And a tweet Prevost wrote earlier this year about Vice President JD Vance is going viral. In February, Prevost amplified an op-ed published in the National Catholic Reporter in which author Kat Armas criticized the vice president (who converted to Catholicism in 2019) over his remarks suggesting there was a hierarchy of Christian priorities. Vance told Fox News in late January: "There is a Christian concept that you love your family and then you love your neighbor, and then you love your community, and then you love your fellow citizens, and then after that, prioritize the rest of the world. A lot of the far left has completely inverted that." Prevost's tweet repeated the headline of the op-ed: "JD Vance is wrong: Jesus doesn't ask us to rank our love for others." READ MORE: (Opinion) Jasmine Crockett shames Republicans  again "[The Apostle] Paul r

Select low-confidence correct and incorrect cases.

In [29]:
low_conf_correct = results_bert[
    (results_bert["true"] == results_bert["pred"]) &
    (results_bert["confidence"] >= 0.50) &
    (results_bert["confidence"] <= 0.55)
].sort_values("confidence").head(1)

low_conf_wrong = results_bert[
    (results_bert["true"] != results_bert["pred"]) &
    (results_bert["confidence"] >= 0.50) &
    (results_bert["confidence"] <= 0.55)
].sort_values("confidence").head(4)


Select confident errors for analysis.

In [30]:
confident_error = results_bert[
    results_bert["true"] != results_bert["pred"]
].sort_values("confidence", ascending=False).head(4)

Inspect the low-confidence correct example.

In [31]:
print(low_conf_correct.iloc[0]["text"][:1000])
print(low_conf_correct.iloc[0]["true"])
print(low_conf_correct.iloc[0]["pred"])
print(low_conf_correct.iloc[0]["confidence"])



President Donald Trump has never kept his contempt for the Federal Emergency Management Agency a secret, contending that the agency has been operating poorly and rarely helped disaster victims. On Friday, while touring North Carolina neighborhoods that were ravaged by Hurricane Helene, the president said he was planning an executive order that would "begin the process of fundamentally reforming and overhauling FEMA, or maybe getting rid of them." His order would create a task force that would look for reforms, according to sources. However, Trump's authority does not give him the power to terminate the agency unilaterally, according to federal laws. Doing so would require congressional action. FEMA is part of the Department of Homeland Security, which has an operating budget and disaster relief fund that needs to be replenished by Congress every year to help states deal with disaster recovery, preparedness, response and mitigation efforts. The U.S. government uses the funds to reimburs

Assemble a balanced set of candidates for explanation.

In [32]:
bert_candidates = pd.concat([
    left_high.iloc[[0]],
    left_high.iloc[[3]],
    right_high.iloc[[0]],
    right_high.iloc[[1]],
    center_high.iloc[[0]],
    low_conf_correct,
    low_conf_wrong.iloc[[0]],
    confident_error.iloc[[1]]
]).reset_index(drop=True)

bert_candidates

index,text,true,pred,confidence
0,Cardinal Dr. Robert Prevost has officially tak...,left,left,0.999684
1,Less than one week after Donald Trump's inaugu...,left,left,0.999678
2,"A Phoenix, Arizona, woman confronted and shot ...",right,right,0.999269
3,The Biden administration allegedly discriminat...,right,right,0.999245
4,President Donald Trump campaigned on a promise...,center,center,0.999611
5,President Donald Trump has never kept his cont...,left,left,0.504865
6,Musk steps down from government efficiency role,center,left,0.519047
7,President Donald Trump on Friday fired at leas...,center,left,0.999586


## Step 4: Explainability (BERT)

We estimate token importance with transformers-interpret, which computes integrated-gradients attributions through Captum. Because texts can exceed BERT's 512-token limit, we split them into overlapping chunks and aggregate token attributions across chunks.

Split long texts into overlapping chunks for attribution.

In [33]:
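def chunk_text_with_overlap(text, tokenizer, max_length=512, overlap=50):
    # max_length counts content tokens only; callers should leave room for
    # the special tokens ([CLS], [SEP]) added when a chunk is re-encoded.
    if overlap >= max_length:
        raise ValueError("overlap must be smaller than max_length")

    # Encoding the full text can trigger a length warning for long inputs;
    # the ids are only sliced here, never passed to the model in one piece.
    tokens = tokenizer.encode(text, add_special_tokens=False)
    
    chunks = []
    start = 0
    
    while start < len(tokens):
        end = start + max_length
        chunk_tokens = tokens[start:end]
        
        # Decode back to text because the explainer expects a string.
        chunk_text = tokenizer.decode(chunk_tokens, skip_special_tokens=True)
        chunks.append(chunk_text)
        
        # Advance so consecutive chunks share `overlap` tokens.
        start += max_length - overlap
    
    return chunks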
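To see which token ranges the sliding window covers, the loop can be replayed on token counts alone. `window_bounds` is a hypothetical helper that mirrors the loop above with the `max_length=400`, `overlap=50` settings used later in the notebook; 685 is the sequence length reported in the tokenizer warning further down.

```python
def window_bounds(n_tokens, max_length=400, overlap=50):
    """(start, end) token offsets produced by the sliding-window loop."""
    if overlap >= max_length:
        raise ValueError("overlap must be smaller than max_length")
    bounds = []
    start = 0
    while start < n_tokens:
        end = min(start + max_length, n_tokens)
        bounds.append((start, end))
        start += max_length - overlap
    return bounds

print(window_bounds(685))  # → [(0, 400), (350, 685)]
print(window_bounds(380))  # → [(0, 380), (350, 380)]
```

The second call shows a side effect worth knowing about: a text slightly shorter than `max_length` still yields a short tail chunk made up entirely of tokens already covered, so those tokens are attributed twice.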


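Aggregate token attributions across chunks: average the scores of every occurrence of each token, then keep the `top_k` tokens with the largest absolute mean.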

In [34]:

def explain_bert_instance(
    text,
    explainer,
    tokenizer,
    max_length=400,
    overlap=50,
    top_k=15
):
    chunks = chunk_text_with_overlap(
        text,
        tokenizer,
        max_length=max_length,
        overlap=overlap
    )
    
    all_attributions = defaultdict(list)
    
    for chunk in chunks:
        word_attributions = explainer(chunk)
        #explainer.visualize()
        for token, score in word_attributions:
            all_attributions[token].append(score)
    
    aggregated_attributions = {
        token: np.mean(scores)
        for token, scores in all_attributions.items()
    }
    
    top_tokens = sorted(
        aggregated_attributions.items(),
        key=lambda x: abs(x[1]),
        reverse=True
    )[:top_k]
    
    return top_tokens
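The aggregation step can be tried in isolation on hand-written scores. The sketch below repeats the mean-then-rank logic without the explainer; the attribution values are made up for illustration and `aggregate_attributions` is a hypothetical name.

```python
from collections import defaultdict

def aggregate_attributions(chunk_attributions, top_k=3):
    """Average scores per token string, then rank by absolute mean score."""
    scores_by_token = defaultdict(list)
    for word_attributions in chunk_attributions:
        for token, score in word_attributions:
            scores_by_token[token].append(score)
    means = {tok: sum(s) / len(s) for tok, s in scores_by_token.items()}
    return sorted(means.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]

# Made-up attribution scores for two overlapping chunks (illustrative only).
toy_chunks = [
    [("president", 0.30), ("the", 0.02), ("fired", -0.20)],
    [("president", -0.10), ("fired", -0.40), ("agency", 0.15)],
]
print(aggregate_attributions(toy_chunks))
```

Here "president" has the largest single score but ranks last of the three, because its positive and negative occurrences partly cancel when averaged. The same effect can hide tokens whose influence depends on context.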


Generate token attributions for each selected example.

In [35]:

explainer = SequenceClassificationExplainer(
    bert_model,
    tokenizer
)

for i in range(len(bert_candidates)):
    row = bert_candidates.iloc[i]
    
    text = row["text"]
    true_label = row["true"]
    pred_label = row["pred"]
    conf = row["confidence"]
    
    print("=" * 70)
    print(f"INSTANCE {i+1}")
    print(f"TRUE: {true_label} | PRED: {pred_label} | CONF: {conf:.2f}")
    
    top_tokens = explain_bert_instance(
        text,
        explainer,
        tokenizer
    )
    
    print("Top tokens:")
    for tok, score in top_tokens:
        print(f"{tok:15s} {score:+.3f}")


Token indices sequence length is longer than the specified maximum sequence length for this model (685 > 512). Running this sequence through the model will result in indexing errors


INSTANCE 1
TRUE: left | PRED: left | CONF: 1.00
Top tokens:
robert          +0.182
head            -0.168
author          +0.104
family          +0.101
earlier         -0.097
d               -0.095
february        +0.078
news            -0.077
history         +0.076
national        +0.074
he              -0.073
uk              +0.072
reporter        +0.067
2019            +0.062
stop            -0.057
INSTANCE 2
TRUE: left | PRED: left | CONF: 1.00
Top tokens:
one             +0.176
building        -0.104
of              +0.104
week            +0.100
joseph          +0.097
"               +0.089
than            -0.076
president       +0.075
signed          +0.060
127             +0.050
donald          +0.047
has             +0.046
office          +0.043
martin          +0.041
rhode           +0.041
INSTANCE 3
TRUE: right | PRED: right | CONF: 1.00
Top tokens:
shot            +0.237
##eit           +0.140
br              +0.125
inside          +0.109
female          +0.061
winning      

## Step 5: Masking structural phrases

To mimic the earlier LIME-based analysis, we mask common structural phrases (e.g., 'READ MORE') and compare predicted probabilities. If confidence drops sharply after masking, the model may be relying on structure rather than content.
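One simple way to turn "confidence drops sharply" into a number is to track the probability of the originally predicted class before and after masking. The helper below is a hypothetical sketch, and the probability vectors are made up rather than taken from the model.

```python
def confidence_drop(original_probs, masked_probs):
    """Drop in probability of the originally predicted class after masking."""
    predicted = max(range(len(original_probs)), key=lambda i: original_probs[i])
    return predicted, original_probs[predicted] - masked_probs[predicted]

# Made-up probability vectors in (left, center, right) order, for illustration.
original = [0.97, 0.02, 0.01]
masked = [0.60, 0.30, 0.10]

cls, drop = confidence_drop(original, masked)
print(cls, round(drop, 3))
```

A drop near zero suggests the masked phrases carried little weight for that example, while a large positive drop points to reliance on structure.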

In [36]:

def bert_predict_proba(texts, model, tokenizer, max_length=512):
    device = next(model.parameters()).device
    enc = tokenizer(
        texts,
        truncation=True,
        padding=True,
        max_length=max_length,
        return_tensors="pt"
    )
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits
        probs = F.softmax(logits, dim=-1)
    return probs.cpu().numpy()

# Boilerplate phrases to mask; longer phrases come before their substrings
# (e.g. the reporter byline before "Breitbart News").
STRUCTURAL_PHRASES = [
    "READ MORE:",
    "Read More",
    "read more",
    "Olivia Rondeau is a politics reporter for Breitbart News based in Washington, DC. Find her on X/Twitter and Instagram.",
    "Breitbart News",
    "RELATED STORIES",
    "Learn more about our emails. Unsubscribe anytime. By entering your email, you agree to the Terms & Conditions and acknowledge the Privacy Policy.",
    "Download the SAN app today to stay up-to-date with Unbiased. Straight Facts. Point phone camera here",
]

def mask_structure(text):
    # Replace each structural phrase with BERT's unknown-token placeholder.
    for phrase in STRUCTURAL_PHRASES:
        text = text.replace(phrase, "[UNK]")
    return text

class_names = [id2label[i] for i in range(3)]
examples = {
    "left": left_high.iloc[0]["text"],
    "right": right_high.iloc[0]["text"],
    "center": center_high.iloc[0]["text"],
}

for label, text in examples.items():
    masked = mask_structure(text)
    probs = bert_predict_proba([text, masked], bert_model, tokenizer)
    print("=" * 70)
    print(masked)
    print(f"Example: {label}")
    print("Original:", {cls: f"{p:.3f}" for cls, p in zip(class_names, probs[0])})
    print("Masked:  ", {cls: f"{p:.3f}" for cls, p in zip(class_names, probs[1])})


Cardinal Dr. Robert Prevost has officially taken the name Pope Leo XIV, becoming the first American head of the Catholic Church in history. And a tweet Prevost wrote earlier this year about Vice President JD Vance is going viral. In February, Prevost amplified an op-ed published in the National Catholic Reporter in which author Kat Armas criticized the vice president (who converted to Catholicism in 2019) over his remarks suggesting there was a hierarchy of Christian priorities. Vance told Fox News in late January: "There is a Christian concept that you love your family and then you love your neighbor, and then you love your community, and then you love your fellow citizens, and then after that, prioritize the rest of the world. A lot of the far left has completely inverted that." Prevost's tweet repeated the headline of the op-ed: "JD Vance is wrong: Jesus doesn't ask us to rank our love for others." [UNK] (Opinion) Jasmine Crockett shames Republicans  again "[The Apostle] Paul remind

### Token attributions on masked text

We can also run the explainer on the masked version to see which tokens are most influential once structural phrases are removed. Comparing the two top-token lists shows how the attributions shift.
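One rough way to quantify the shift is the overlap between the two top-token lists. The sketch below applies a Jaccard score to the top-15 lists that the cell below prints for the left example; `jaccard` is a hypothetical helper, and set overlap ignores both rank and score magnitude.

```python
def jaccard(a, b):
    """Share of distinct tokens that appear in both lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Top-15 tokens printed for the left example (original vs. masked text).
original_top = ["robert", "head", "author", "family", "earlier", "d", "february",
                "news", "history", "national", "he", "uk", "reporter", "2019", "stop"]
masked_top = ["jasmine", "opinion", "uk", "him", "(", "republicans", "if", "trade",
              "speech", "by", "business", "michael", "left", "##uche", "c"]

shared = sorted(set(original_top) & set(masked_top))
print(shared, round(jaccard(original_top, masked_top), 3))
```

Only one token is shared between the two lists, so a single masked phrase is associated with a nearly complete change in the top attributions for this example.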

In [37]:
def explain_with_mask(text, explainer, tokenizer, max_length=400, overlap=50, top_k=15):
    masked = mask_structure(text)
    orig_top = explain_bert_instance(text, explainer, tokenizer, max_length=max_length, overlap=overlap, top_k=top_k)
    masked_top = explain_bert_instance(masked, explainer, tokenizer, max_length=max_length, overlap=overlap, top_k=top_k)
    return masked, orig_top, masked_top

examples = {
    "left": left_high.iloc[0]["text"],
    "right": right_high.iloc[0]["text"],
    "center": center_high.iloc[0]["text"],
}

for label, text in examples.items():
    masked_text, orig_top, masked_top = explain_with_mask(text, explainer, tokenizer)
    print("=" * 70)
    print(f"Example: {label}")
    print("Top tokens (original):")
    for tok, score in orig_top:
        print(f"{tok:15s} {score:+.3f}")
    print("Top tokens (masked):")
    for tok, score in masked_top:
        print(f"{tok:15s} {score:+.3f}")


Example: left
Top tokens (original):
robert          +0.182
head            -0.168
author          +0.104
family          +0.101
earlier         -0.097
d               -0.095
february        +0.078
news            -0.077
history         +0.076
national        +0.074
he              -0.073
uk              +0.072
reporter        +0.067
2019            +0.062
stop            -0.057
Top tokens (masked):
jasmine         +0.250
opinion         +0.147
uk              +0.122
him             +0.118
(               +0.100
republicans     -0.097
if              +0.087
trade           -0.080
speech          -0.078
by              +0.076
business        -0.073
michael         +0.072
left            +0.070
##uche          +0.069
c               +0.068
Example: right
Top tokens (original):
shot            +0.237
##eit           +0.140
br              +0.125
inside          +0.109
female          +0.061
winning         +0.060
award           +0.054
##bar           +0.054
found           +0.051
phoenix