Evaluating how accurately BERT, BART, GPT, and Llama models (which themselves have political leanings) classify politically biased statements.

In [None]:
# importing BERT: BertForSequenceClassification adds a sequence-level classification head for text classification
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification
import torch
from torch.nn.functional import softmax
from sklearn.metrics import accuracy_score, classification_report

# load dataset: 
df = pd.read_csv("data/IBC/sample_ibc.csv")

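tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=3 attaches a new 3-way classification head; its weights are randomly
# initialized until the model is fine-tuned on labeled data
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# switch to inference mode (disables dropout)
model.eval()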

# map labels to indices
label_mapping = {"Liberal": 0, "Conservative": 1, "Neutral": 2}
reverse_label_mapping = {v: k for k, v in label_mapping.items()}

# tokenize
inputs = tokenizer(
    list(df["sentence"]),
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

# run inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)

# eval

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
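
The `# eval` step above is left empty. A minimal sketch of what it might compute is given here, assuming the CSV has a gold-label column holding the strings "Liberal", "Conservative", and "Neutral"; the column name `label` and the helper `evaluate` are hypothetical, and the toy lists stand in for the real predictions.

```python
from collections import Counter

# Label scheme copied from the cell above.
label_mapping = {"Liberal": 0, "Conservative": 1, "Neutral": 2}
reverse_label_mapping = {v: k for k, v in label_mapping.items()}


def evaluate(pred_indices, gold_labels):
    """Return overall accuracy and per-class recall for predicted class indices.

    Hypothetical helper: not part of the original notebook.
    """
    gold_labels = list(gold_labels)
    pred_labels = [reverse_label_mapping[int(i)] for i in pred_indices]
    if len(pred_labels) != len(gold_labels):
        raise ValueError("predictions and gold labels differ in length")
    if not gold_labels:
        raise ValueError("no examples to evaluate")

    correct = Counter()  # correctly predicted examples per gold class
    total = Counter()    # all examples per gold class
    for pred, gold in zip(pred_labels, gold_labels):
        total[gold] += 1
        if pred == gold:
            correct[gold] += 1

    accuracy = sum(correct.values()) / len(gold_labels)
    recall = {label: correct[label] / total[label] for label in total}
    return accuracy, recall


# Toy values standing in for predictions.tolist() and the assumed df["label"] column.
toy_predictions = [0, 1, 2, 1]
toy_gold = ["Liberal", "Conservative", "Neutral", "Liberal"]
accuracy, recall = evaluate(toy_predictions, toy_gold)
print(accuracy, recall)
```

With the notebook's variables this could be called as `evaluate(predictions.tolist(), df["label"])`, if such a column exists. The `accuracy_score` and `classification_report` functions already imported from sklearn report accuracy and per-class recall as well, along with precision and F1.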


In [5]:
from transformers import BartTokenizer, BartForSequenceClassification

# prep tokenizer and model
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForSequenceClassification.from_pretrained("facebook/bart-large", num_labels=3)

model.eval()

label_mapping = {"Liberal": 0, "Conservative": 1, "Neutral": 2}
reverse_label_mapping = {v: k for k, v in label_mapping.items()}

inputs = tokenizer(
    list(df["sentence"]),
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)

# eval 

Some weights of BartForSequenceClassification were not initialized from the model checkpoint at facebook/bart-large and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
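
Both cells tokenize the whole dataset and run it through the model in a single forward pass, which may exhaust memory on a larger CSV, especially with `facebook/bart-large`. One possible workaround is to predict in fixed-size batches. The sketch below shows only the batching logic; `batched`, `predict_in_batches`, and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in for the real model call.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(sentences, predict_fn, batch_size=16):
    """Apply `predict_fn` to each batch and concatenate the predicted indices."""
    predictions = []
    for batch in batched(list(sentences), batch_size):
        predictions.extend(predict_fn(batch))
    return predictions


# Stand-in for the real model: returns 1 for sentences longer than three words, else 0.
def toy_predict(batch):
    return [1 if len(s.split()) > 3 else 0 for s in batch]


sentences = ["a b", "a b c d e", "x", "one two three four", "short one"]
print(predict_in_batches(sentences, toy_predict, batch_size=2))
```

In the notebook, `predict_fn` could tokenize the batch as in the cell above, run the model under `torch.no_grad()`, and return `torch.argmax(logits, dim=-1).tolist()`.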
