# Understanding Political Bias in News Text Classification

This notebook trains a BERT classifier on MBIC and compares baseline predictions to a masked-text variant to test reliance on explicitly biased terms.

## Step 1: Setup

Import libraries for data handling, modeling, and explainability.

In [1]:
import matplotlib.pyplot as plt 
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.utils.class_weight import compute_class_weight
import torch
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments, AutoTokenizer
from collections import defaultdict
from datasets import Dataset
from transformers_interpret import SequenceClassificationExplainer




## Step 2: Data loading and split

Load the cleaned MBIC dataset.

In [2]:
df = pd.read_excel("datasets/labeled_dataset_cleaned.xlsx")

Split the dataset into train/test with stratification.

In [3]:
X_train, X_test, y_train, y_test = train_test_split(
    df["sentence"],
    df["type"],
    test_size=0.2,
    stratify=df["type"],
    random_state=42
)


Create train/test DataFrames aligned with the split indices.

In [4]:
train_df = df.loc[X_train.index].copy()
test_df  = df.loc[X_test.index].copy()

Define label-id mappings for the three classes.

In [5]:
label2id = {
    "left": 0,
    "center": 1,
    "right": 2
}

id2label = {v: k for k, v in label2id.items()}


Map string labels to numeric ids.

In [6]:
train_df["label"] = train_df["type"].map(label2id)
test_df["label"]  = test_df["type"].map(label2id)
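`Series.map` returns NaN for any value missing from the dictionary, so a stray label spelling would silently produce missing ids. A minimal sketch of a check on a made-up toy series (the `"Center"` typo is hypothetical, not taken from MBIC):

```python
import pandas as pd

label2id = {"left": 0, "center": 1, "right": 2}

# Toy labels for illustration only; "Center" is a deliberate, hypothetical typo.
types = pd.Series(["left", "Center", "right"])

mapped = types.map(label2id)        # unknown keys become NaN
unmapped = types[mapped.isna()]     # rows whose label has no id

print(unmapped.tolist())
```

An empty list here would mean every label was mapped.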

Convert Pandas DataFrames to Hugging Face Datasets.

In [7]:
train_dataset = Dataset.from_pandas(train_df[["sentence", "label"]])
test_dataset  = Dataset.from_pandas(test_df[["sentence", "label"]])

Inspect the dataset objects (optional sanity check).

In [8]:
train_dataset, test_dataset


(Dataset({
     features: ['sentence', 'label', '__index_level_0__'],
     num_rows: 782
 }),
 Dataset({
     features: ['sentence', 'label', '__index_level_0__'],
     num_rows: 196
 }))

## Step 3: Tokenization and dataset preparation

Load the BERT tokenizer.

In [9]:
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)


Define the tokenization function and maximum sequence length.

In [10]:
MAX_LEN = 512

def tokenize(batch):
    return tokenizer(
        batch["sentence"],
        truncation=True,
        padding="max_length",
        max_length=MAX_LEN
    )


Tokenize the train/test datasets.

In [11]:
train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset  = test_dataset.map(tokenize, batched=True)


Map: 100%|██████████| 782/782 [00:00<00:00, 7553.74 examples/s]
Map: 100%|██████████| 196/196 [00:00<00:00, 8012.28 examples/s]


Remove text columns and set dataset format to PyTorch.

In [12]:
train_dataset = train_dataset.remove_columns(["sentence"])
test_dataset  = test_dataset.remove_columns(["sentence"])

train_dataset.set_format("torch")
test_dataset.set_format("torch")  


Remove index columns added during conversion.

In [13]:
train_dataset = train_dataset.remove_columns(["__index_level_0__"])
test_dataset  = test_dataset.remove_columns(["__index_level_0__"])


## Step 4: BERT training

Initialize the BERT classifier.

In [14]:

model_name = "bert-base-uncased" 

bert_model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3,
    id2label=id2label,
    label2id=label2id
)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Define evaluation metrics (macro F1).

In [15]:

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=1)
    return {
        "macro_f1": f1_score(labels, preds, average="macro")
    }


Compute class weights to address imbalance.

In [16]:

class_weights = compute_class_weight(
    class_weight="balanced",
    classes=np.array([0, 1, 2]),
    y=train_df["label"].values
)

class_weights = torch.tensor(class_weights, dtype=torch.float)
class_weights


tensor([0.6896, 5.2133, 0.7363])
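With `class_weight="balanced"`, scikit-learn sets each weight to `n_samples / (n_classes * count_c)`. A quick check of the tensor above; the per-class counts (378 left, 50 center, 354 right) are not printed in the notebook and are inferred here from the weights and the 782 training rows:

```python
import numpy as np

# Assumed training counts for left, center, right (inferred, not shown in the notebook).
counts = np.array([378, 50, 354])

n_samples = counts.sum()            # 782 training rows
n_classes = len(counts)

weights = n_samples / (n_classes * counts)
print(np.round(weights, 4))
```

The center class receives a weight roughly seven times larger than the other two because it has far fewer training examples.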

Create a custom Trainer with class-weighted loss.

In [17]:

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")

        loss_fct = torch.nn.CrossEntropyLoss(
            weight=class_weights.to(logits.device)
        )
        loss = loss_fct(logits, labels)

        return (loss, outputs) if return_outputs else loss
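For reference, `CrossEntropyLoss` with a `weight` vector and the default mean reduction computes a weighted average of per-example negative log-likelihoods, normalized by the summed weights of the target classes. A NumPy sketch of that arithmetic on made-up logits (the function name `weighted_cross_entropy` and the toy batch are illustrative only):

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted mean of per-example negative log-likelihoods."""
    # Log-softmax with the usual max-subtraction for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    nll = -log_probs[np.arange(len(labels)), labels]   # loss per example
    w = class_weights[labels]                          # weight of each target class
    return (w * nll).sum() / w.sum()

# Toy batch: two examples, three classes.
logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]])
labels = np.array([0, 1])
weights = np.array([0.6896, 5.2133, 0.7363])

print(weighted_cross_entropy(logits, labels, weights))
print(weighted_cross_entropy(logits, labels, np.ones(3)))  # unweighted mean
```

In this toy batch the second example has the center label and a poor prediction, so the weighted loss comes out higher than the unweighted one.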



Set training hyperparameters and evaluation strategy.

In [18]:

training_args = TrainingArguments(
    output_dir="./bert_bias",
    eval_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",
    greater_is_better=True,
    report_to="none"
)


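Train the model. The test split also serves as the evaluation set for per-epoch checkpoint selection, so the test metrics reported below are optimistic.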

In [19]:
trainer = WeightedTrainer(
    model=bert_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)

trainer.train()


Epoch,Training Loss,Validation Loss,Macro F1
1,1.116,1.058791,0.373476
2,0.95,1.118879,0.409931
3,0.6184,1.148632,0.486962
4,0.3679,1.240505,0.447467
5,0.2656,1.311401,0.451178


TrainOutput(global_step=490, training_loss=0.657240999961386, metrics={'train_runtime': 79.9133, 'train_samples_per_second': 48.928, 'train_steps_per_second': 6.132, 'total_flos': 1028773463316480.0, 'train_loss': 0.657240999961386, 'epoch': 5.0})

Evaluate the model on the test set and print metrics.

In [20]:
preds = trainer.predict(test_dataset)
y_true = preds.label_ids
y_pred = np.argmax(preds.predictions, axis=1)

print(classification_report(
    y_true,
    y_pred,
    target_names=[id2label[i] for i in range(3)],
    digits=3
))


              precision    recall  f1-score   support

        left      0.682     0.632     0.656        95
      center      0.250     0.077     0.118        13
       right      0.635     0.750     0.688        88

    accuracy                          0.648       196
   macro avg      0.522     0.486     0.487       196
weighted avg      0.632     0.648     0.634       196
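Macro F1 is the unweighted mean of the per-class F1 scores, so the poorly predicted center class pulls it well below accuracy. Recomputing it from the rounded values in the report above:

```python
# Per-class F1 scores copied from the classification report (rounded to 3 digits).
f1_per_class = {"left": 0.656, "center": 0.118, "right": 0.688}

macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
print(f"{macro_f1:.3f}")
```

The result agrees with the `macro avg` row of the report (0.487).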



Switch to evaluation mode.

In [21]:
bert_model.eval()


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

## Step 5: Evaluation and sampling

Convert logits to probabilities and compute confidence scores.

In [22]:

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / exp_x.sum(axis=1, keepdims=True)

probs = softmax(preds.predictions)
conf = probs.max(axis=1)
y_pred = np.argmax(probs, axis=1)
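As a quick sanity check of the helper, each row of the output should sum to 1 and large logits should not overflow. A self-contained sketch on made-up logits (the function is repeated so the snippet runs on its own, and the `toy_` names are illustrative):

```python
import numpy as np

def softmax(x):
    # Subtract the row maximum first so np.exp cannot overflow.
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / exp_x.sum(axis=1, keepdims=True)

# Toy logits for two examples and three classes (illustrative values).
toy_logits = np.array([[2.0, 0.0, -1.0], [1000.0, 999.0, 998.0]])

toy_probs = softmax(toy_logits)
toy_conf = toy_probs.max(axis=1)          # confidence = top-class probability
toy_pred = np.argmax(toy_probs, axis=1)   # predicted class id

print(toy_probs.sum(axis=1), toy_pred)
```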


Build a results table with predictions and confidence.

In [23]:
results_bert = pd.DataFrame({
    "text": test_df["sentence"],   
    "true": [id2label[i] for i in y_true],
    "pred": [id2label[i] for i in y_pred],
    "confidence": conf
})


Helper function to sample instances by confidence and correctness.

In [24]:
def select_instances(
    df,
    true_label,
    correct=True,
    conf_min=None,
    conf_max=None,
    n=10
):
    subset = df[df["true"] == true_label]

    if correct is not None:
        if correct:
            subset = subset[subset["true"] == subset["pred"]]
        else:
            subset = subset[subset["true"] != subset["pred"]]

    if conf_min is not None:
        subset = subset[subset["confidence"] >= conf_min]

    if conf_max is not None:
        subset = subset[subset["confidence"] <= conf_max]

    return subset.sort_values("confidence", ascending=False).head(n)


Select high-confidence correct examples for the left and right classes.

In [25]:
left_high = select_instances(results_bert, "left", True, 0.60, None, n=10)
right_high = select_instances(results_bert, "right", True, 0.60, None, n=10)

Preview a couple of example texts and predictions.

In [26]:
print(left_high.iloc[0]["text"][:1000])
print(left_high.iloc[0]["true"])
print(left_high.iloc[0]["pred"])
print("==============================")
print(right_high.iloc[0]["text"][:1000])
print(right_high.iloc[0]["true"])
print(right_high.iloc[0]["pred"]) 



Trump has also in his years in the White House frequently praised the use of violence. He told police officers in 2017 that they shouldn’t worry about hitting suspects’ heads as they place them in patrol cars. And on a conference call with governors on Monday, he berated them for not “dominating” the streets of their cities.
left
left
Democrats blocked a coronavirus package on Sunday that would provide economic relief to businesses and Americans suffering from the impact of the coronavirus outbreak
right
right


Select low-confidence correct and incorrect cases (confidence between 0.50 and 0.55).

In [27]:
low_conf_correct = results_bert[
    (results_bert["true"] == results_bert["pred"]) &
    (results_bert["confidence"] >= 0.50) &
    (results_bert["confidence"] <= 0.55)
].sort_values("confidence").head(1)

low_conf_wrong = results_bert[
    (results_bert["true"] != results_bert["pred"]) &
    (results_bert["confidence"] >= 0.50) &
    (results_bert["confidence"] <= 0.55)
].sort_values("confidence").head(4)


Select the four most confident misclassifications for analysis.

In [28]:
confident_error = results_bert[
    results_bert["true"] != results_bert["pred"]
].sort_values("confidence", ascending=False).head(4)

Preview a low-confidence example.

In [29]:
print(low_conf_correct.iloc[0]["text"][:1000])
print(low_conf_correct.iloc[0]["true"])
print(low_conf_correct.iloc[0]["pred"])
print(low_conf_correct.iloc[0]["confidence"])



It’s also why attempts to portray pro-lifers as racists smack of disingenuousness.
right
right
0.50949603


Assemble a candidate set for explanation.

In [30]:
bert_candidates = pd.concat([
    left_high.iloc[[0]],
    right_high.iloc[[0]],
    low_conf_correct,
    low_conf_wrong.iloc[[0]],
    confident_error.iloc[[1]]
]).reset_index(drop=True)

bert_candidates

index,text,true,pred,confidence
0,Trump has also in his years in the White House...,left,left,0.907582
1,Democrats blocked a coronavirus package on Sun...,right,right,0.847086
2,It’s also why attempts to portray pro-lifers a...,right,right,0.509496
3,"On Wednesday, democrats got a chance to grill ...",center,left,0.506826
4,Dem Senators Say Mismanaged Student Loan Progr...,left,right,0.830945


## Step 6: Explainability (token attributions)

Split long texts into overlapping chunks for attribution.

In [31]:
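def chunk_text_with_overlap(text, tokenizer, max_length=512, overlap=50):
    # max_length counts content tokens only, so keep it below the model limit
    # to leave room for special tokens added when a chunk is re-tokenized.
    if overlap >= max_length:
        raise ValueError("overlap must be smaller than max_length")

    # Tokenize once without special tokens so windows cover content tokens only.
    tokens = tokenizer.encode(text, add_special_tokens=False)

    chunks = []
    start = 0

    while start < len(tokens):
        end = start + max_length
        chunk_tokens = tokens[start:end]

        # Decode back to text because the explainer takes raw strings.
        chunk_text = tokenizer.decode(chunk_tokens, skip_special_tokens=True)
        chunks.append(chunk_text)

        # Stop once a window reaches the end; a further chunk would only repeat the overlap.
        if end >= len(tokens):
            break
        start += max_length - overlap

    return chunks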
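The stride arithmetic is easier to see on plain token positions. A sketch with a hypothetical helper, `window_spans`, that mirrors the window stepping of the chunking function without needing a tokenizer:

```python
def window_spans(n_tokens, max_length=400, overlap=50):
    """Return (start, end) token positions of each chunk."""
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + max_length, n_tokens)   # slicing clips at the end
        spans.append((start, end))
        start += max_length - overlap             # stride between windows
    return spans

# A 1000-token text with max_length=400 and overlap=50.
spans = window_spans(1000)
print(spans)
```

Each window starts 350 tokens after the previous one, so neighbouring windows share 50 tokens.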


Attribute each chunk, average the scores of repeated tokens across chunks, and keep the top-k tokens by absolute attribution.

In [32]:
def explain_bert_instance(
    text,
    explainer,
    tokenizer,
    max_length=400,
    overlap=50,
    top_k=15
):
    chunks = chunk_text_with_overlap(
        text,
        tokenizer,
        max_length=max_length,
        overlap=overlap
    )
    
    all_attributions = defaultdict(list)
    
    for chunk in chunks:
        word_attributions = explainer(chunk)
        #explainer.visualize()
        for token, score in word_attributions:
            all_attributions[token].append(score)
    
    aggregated_attributions = {
        token: np.mean(scores)
        for token, scores in all_attributions.items()
    }
    
    top_tokens = sorted(
        aggregated_attributions.items(),
        key=lambda x: abs(x[1]),
        reverse=True
    )[:top_k]
    
    return top_tokens
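The aggregation step can be illustrated without running the explainer. In this sketch the per-chunk attributions are made-up numbers standing in for the explainer's `(token, score)` pairs:

```python
from collections import defaultdict

# Hypothetical attributions for two overlapping chunks (not real model output).
chunk_attributions = [
    [("trump", 0.75), ("the", 0.125)],
    [("trump", 0.25), ("violence", -0.25)],
]

all_scores = defaultdict(list)
for word_attributions in chunk_attributions:
    for token, score in word_attributions:
        all_scores[token].append(score)

# Average repeated tokens, then rank by absolute attribution.
mean_scores = {tok: sum(s) / len(s) for tok, s in all_scores.items()}
top_tokens = sorted(mean_scores.items(), key=lambda x: abs(x[1]), reverse=True)[:2]

print(top_tokens)
```

Because scores are keyed by token string, every occurrence of a word is pooled into one average, which can hide a token that is positive in one context and negative in another.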
 

Run token-level explanations on selected candidates.

In [33]:
explainer = SequenceClassificationExplainer(
    bert_model,
    tokenizer
)

for i in range(len(bert_candidates)):
    row = bert_candidates.iloc[i]
    
    text = row["text"]
    true_label = row["true"]
    pred_label = row["pred"]
    conf = row["confidence"]
    
    print("=" * 70)
    print(f"INSTANCE {i+1}")
    print(f"TRUE: {true_label} | PRED: {pred_label} | CONF: {conf:.2f}")
    
    top_tokens = explain_bert_instance(
        text,
        explainer,
        tokenizer
    )
    
    print("Top tokens:")
    for tok, score in top_tokens:
        print(f"{tok:15s} {score:+.3f}")


INSTANCE 1
TRUE: left | PRED: left | CONF: 0.91
Top tokens:
trump           +0.752
also            -0.222
praised         -0.212
2017            +0.201
house           +0.197
police          +0.161
their           +0.153
monday          -0.116
he              +0.104
suspects        +0.099
use             +0.099
violence        +0.097
##rated         -0.083
white           +0.082
cars            +0.080
INSTANCE 2
TRUE: right | PRED: right | CONF: 0.85
Top tokens:
democrats       +0.594
the             +0.296
americans       +0.280
a               -0.234
blocked         -0.162
package         +0.130
to              -0.121
of              +0.107
businesses      +0.105
and             -0.094
would           -0.089
sunday          +0.086
##virus         +0.058
from            +0.057
that            +0.048
INSTANCE 3
TRUE: right | PRED: right | CONF: 0.51
Top tokens:
smack           +0.683
.               +0.403
##s             -0.329
s               -0.268
racist          -0.192
di         

## Step 7: Masked BERT pipeline

Load the masked MBIC dataset.

In [34]:
df_masked = pd.read_excel("datasets/labeled_dataset_masked.xlsx")

Prepare masked datasets using the same train/test split and tokenize them.

In [35]:
text_col_masked = "sentence_masked"

train_df_masked = df_masked.loc[X_train.index].copy()
test_df_masked  = df_masked.loc[X_test.index].copy()

train_df_masked["label"] = train_df_masked["type"].map(label2id)
test_df_masked["label"]  = test_df_masked["type"].map(label2id)

train_df_masked["sentence"] = train_df_masked[text_col_masked]
test_df_masked["sentence"] = test_df_masked[text_col_masked]

train_dataset_masked = Dataset.from_pandas(train_df_masked[["sentence", "label"]])
test_dataset_masked  = Dataset.from_pandas(test_df_masked[["sentence", "label"]])

train_dataset_masked = train_dataset_masked.map(tokenize, batched=True)
test_dataset_masked  = test_dataset_masked.map(tokenize, batched=True)

train_dataset_masked = train_dataset_masked.remove_columns(["sentence"])
test_dataset_masked  = test_dataset_masked.remove_columns(["sentence"])

train_dataset_masked = train_dataset_masked.remove_columns(["__index_level_0__"])
test_dataset_masked  = test_dataset_masked.remove_columns(["__index_level_0__"])

train_dataset_masked.set_format("torch")
test_dataset_masked.set_format("torch")


Map: 100%|██████████| 782/782 [00:00<00:00, 8559.92 examples/s]
Map: 100%|██████████| 196/196 [00:00<00:00, 8328.61 examples/s]
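Reusing `X_train.index` and `X_test.index` assumes the masked spreadsheet has the same rows in the same order as the original. A minimal sketch of a check on toy frames (all cell values, including the `[MASK]` placeholder, are invented; in the notebook the comparison would be between `df` and `df_masked`):

```python
import pandas as pd

# Toy stand-ins for df and df_masked.
df_toy = pd.DataFrame({"sentence": ["a b", "c d"], "type": ["left", "right"]})
df_masked_toy = pd.DataFrame(
    {"sentence_masked": ["a [MASK]", "[MASK] d"], "type": ["left", "right"]}
)

same_index = df_toy.index.equals(df_masked_toy.index)
same_labels = (df_toy["type"] == df_masked_toy["type"]).all()

print(same_index, same_labels)
```

If either check failed on the real data, the masked split would no longer line up with the baseline split and the two runs would not be comparable.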


Fine-tune a new BERT model on masked text and evaluate performance.

In [36]:
bert_model_masked = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3,
    id2label=id2label,
    label2id=label2id
)

class_weights_masked = compute_class_weight(
    class_weight="balanced",
    classes=np.array([0, 1, 2]),
    y=train_df_masked["label"].values
)
class_weights_masked = torch.tensor(class_weights_masked, dtype=torch.float)

class WeightedTrainerMasked(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")

        loss_fct = torch.nn.CrossEntropyLoss(
            weight=class_weights_masked.to(logits.device)
        )
        loss = loss_fct(logits, labels)
        return (loss, outputs) if return_outputs else loss

training_args_masked = TrainingArguments(
    output_dir="./bert_bias_masked",
    eval_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",
    greater_is_better=True,
    report_to="none"
)

trainer_masked = WeightedTrainerMasked(
    model=bert_model_masked,
    args=training_args_masked,
    train_dataset=train_dataset_masked,
    eval_dataset=test_dataset_masked,
    compute_metrics=compute_metrics
)

trainer_masked.train()

preds_masked = trainer_masked.predict(test_dataset_masked)
y_true_masked = preds_masked.label_ids
y_pred_masked = np.argmax(preds_masked.predictions, axis=1)

print(classification_report(
    y_true_masked,
    y_pred_masked,
    target_names=[id2label[i] for i in range(3)],
    digits=3
))


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Macro F1
1,1.1292,1.073252,0.365878
2,1.0492,1.001859,0.365544
3,0.8405,0.950182,0.437341
4,0.5943,1.087277,0.418412
5,0.377,1.183393,0.379664


              precision    recall  f1-score   support

        left      0.557     0.463     0.506        95
      center      0.172     0.385     0.238        13
       right      0.568     0.568     0.568        88

    accuracy                          0.505       196
   macro avg      0.433     0.472     0.437       196
weighted avg      0.536     0.505     0.516       196
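To compare the two runs directly, the headline numbers from both classification reports can be placed side by side. The values below are copied from the reports in this notebook:

```python
# (baseline, masked) values copied from the two classification reports.
metrics = {
    "accuracy": (0.648, 0.505),
    "macro_f1": (0.487, 0.437),
    "center_recall": (0.077, 0.385),
}

deltas = {name: round(masked - base, 3) for name, (base, masked) in metrics.items()}

for name, delta in deltas.items():
    print(f"{name:14s} {delta:+.3f}")
```

Masking lowers accuracy and macro F1, while recall on the small center class rises. With only 13 center examples in the test set, that last change rests on a handful of predictions.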



Compute masked results and select candidates for explanation.

In [37]:
bert_model_masked.eval()

probs_masked = softmax(preds_masked.predictions)
conf_masked = probs_masked.max(axis=1)
y_pred_masked = np.argmax(probs_masked, axis=1)

results_bert_masked = pd.DataFrame({
    "text": test_df_masked["sentence"],
    "true": [id2label[i] for i in y_true_masked],
    "pred": [id2label[i] for i in y_pred_masked],
    "confidence": conf_masked
})

left_high_m = select_instances(results_bert_masked, "left", True, 0.60, None, n=10)
right_high_m = select_instances(results_bert_masked, "right", True, 0.60, None, n=10)

low_conf_correct_m = results_bert_masked[(results_bert_masked["true"] == results_bert_masked["pred"]) &
    (results_bert_masked["confidence"] >= 0.50) &
    (results_bert_masked["confidence"] <= 0.55)].sort_values("confidence").head(1)

low_conf_wrong_m = results_bert_masked[(results_bert_masked["true"] != results_bert_masked["pred"]) &
    (results_bert_masked["confidence"] >= 0.50) &
    (results_bert_masked["confidence"] <= 0.55)].sort_values("confidence").head(4)

confident_error_m = results_bert_masked[results_bert_masked["true"] != results_bert_masked["pred"]]
confident_error_m = confident_error_m.sort_values("confidence", ascending=False).head(4)

bert_candidates_masked = pd.concat([
    left_high_m.iloc[[0]],
    right_high_m.iloc[[0]],
    low_conf_correct_m,
    low_conf_wrong_m.iloc[[0]],
    confident_error_m.iloc[[1]]
]).reset_index(drop=True)


## Step 8: Masked explainability

Explain masked-model predictions with token attributions.

In [38]:
explainer_masked = SequenceClassificationExplainer(
    bert_model_masked,
    tokenizer
)

for i in range(len(bert_candidates_masked)):
    row = bert_candidates_masked.iloc[i]
    text = row["text"]
    true_label = row["true"]
    pred_label = row["pred"]
    conf = row["confidence"]

    print("=" * 70)
    print(f"MASKED INSTANCE {i+1}")
    print(f"TRUE: {true_label} | PRED: {pred_label} | CONF: {conf:.2f}")

    top_tokens = explain_bert_instance(
        text,
        explainer_masked,
        tokenizer
    )

    print("Top tokens:")
    for tok, score in top_tokens:
        print(f"{tok:15s} {score:+.3f}")


MASKED INSTANCE 1
TRUE: left | PRED: left | CONF: 0.79
Top tokens:
amazon          +0.572
ceo             -0.362
fires           +0.275
did             -0.240
announcement    -0.197
##zos           -0.161
is              +0.155
his             -0.152
getting         +0.146
.               -0.146
making          +0.145
company         -0.144
we              +0.142
$               -0.139
know            +0.135
MASKED INSTANCE 2
TRUE: right | PRED: right | CONF: 0.74
Top tokens:
how             +0.345
explained       +0.345
was             +0.314
s               +0.289
an              +0.263
patrick         +0.220
moore           +0.206
president       +0.197
original        +0.196
former          +0.172
news            +0.158
vision          +0.154
and             +0.148
br              +0.131
##t             +0.127
MASKED INSTANCE 3
TRUE: right | PRED: right | CONF: 0.50
Top tokens:
.               -0.639
unlike          -0.358
pregnancy       +0.325
and             -0.221
abortion     