# Spam Classification using Encoder LLMs with Linear Probing 
In this part, we use encoder Large Language Models (LLMs) for spam classification. Rather than fine-tuning, we freeze the pre-trained LLM weights and train only a lightweight classifier head (an MLP) on top of their representations.

**Dataset:** Enron Spam Dataset

**Expected Performance (Best Model):** {Accuracy: >85%, F1: >85%, Precision: >85%, Recall: >82%}

1. Load the Enron Spam dataset. Use the train/val/test splits and tokenize the text using your pre-trained LLM’s tokenizer. Use your best judgement for the relevant input fields.

In [1]:
!pip install --upgrade fsspec datasets huggingface_hub




In [2]:
# Load Enron Spam dataset (consider using Hugging Face Datasets or manual loading if necessary)
from datasets import load_dataset
ds = load_dataset("SetFit/enron_spam")




In [3]:
# Split the original train set 90/10 into train/validation; keep the provided test split
from datasets import DatasetDict
from transformers import AutoTokenizer

split = ds["train"].train_test_split(test_size=0.10, seed=42)
hf_datasets = DatasetDict({
    "train": split["train"],
    "validation": split["test"],
    "test": ds["test"],
})

# DistilBERT tokenizer (uncased WordPiece vocabulary)
MODEL_NAME = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_fn(batch):
    # Use the "text" field as model input; truncate/pad every example to 512 tokens
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

# Keep only input_ids, attention_mask and the label column
tokenized = hf_datasets.map(
    tokenize_fn,
    batched=True,
    remove_columns=["text", "date", "message", "subject", "message_id", "label_text"],
)
tokenized = tokenized.rename_column("label", "labels")
tokenized.set_format("torch")
print(tokenized)


DatasetDict({
    train: Dataset({
        features: ['labels', 'input_ids', 'attention_mask'],
        num_rows: 28544
    })
    validation: Dataset({
        features: ['labels', 'input_ids', 'attention_mask'],
        num_rows: 3172
    })
    test: Dataset({
        features: ['labels', 'input_ids', 'attention_mask'],
        num_rows: 2000
    })
})
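Padding every email to 512 tokens means the frozen encoder also processes long runs of `[PAD]` tokens for short messages. A common alternative is dynamic padding, where each batch is padded only to its longest member. The sketch below is a hypothetical pure-PyTorch `collate_fn` (the name `dynamic_pad_collate` is illustrative) that assumes the examples were tokenized with truncation but without `padding="max_length"`, and that each example is a dict of Python lists; the default pad id 0 matches DistilBERT's `pad_token_id`.

```python
import torch

def dynamic_pad_collate(examples, pad_token_id=0):
    """Hypothetical collate_fn: pad a batch only to its longest sequence.

    Assumes each example is a dict with "input_ids" (list[int]) and "labels" (int).
    """
    max_len = max(len(ex["input_ids"]) for ex in examples)
    input_ids = torch.full((len(examples), max_len), pad_token_id, dtype=torch.long)
    attention_mask = torch.zeros((len(examples), max_len), dtype=torch.long)
    for i, ex in enumerate(examples):
        n = len(ex["input_ids"])
        input_ids[i, :n] = torch.tensor(ex["input_ids"], dtype=torch.long)
        attention_mask[i, :n] = 1  # 1 marks real tokens, 0 marks padding
    labels = torch.tensor([ex["labels"] for ex in examples], dtype=torch.long)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# Usage on toy token ids (101/102 stand in for [CLS]/[SEP], for illustration only)
batch = dynamic_pad_collate([
    {"input_ids": [101, 2023, 102], "labels": 0},
    {"input_ids": [101, 2023, 2003, 8622, 102], "labels": 1},
])
print(batch["input_ids"].shape)  # torch.Size([2, 5])
```

With a `DataLoader` this would be passed as `collate_fn=dynamic_pad_collate`; for tokenizer outputs, the Hugging Face `DataCollatorWithPadding` class provides the same behavior.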


2. Model Setup – Probing:

   a. Load a pre-trained LLM (e.g., DistilBERT, BART-encoder) for sequence classification. Choose a lightweight encoder model that fits your GPU memory. Consider using DistilBERT, TinyBERT, MobileBERT, ALBERT, or others. **Specify the chosen LLM below.**

   **Chosen Encoder LLM:** <span style='color:green'>DistilBERT (distilbert-base-uncased)</span>





In [4]:
from transformers import AutoConfig, AutoTokenizer, AutoModel

# Load the DistilBERT config, tokenizer and bare encoder (no task-specific head).
# Freezing and the classification head are set up in step b.
MODEL_NAME = "distilbert-base-uncased"
config = AutoConfig.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base_model = AutoModel.from_pretrained(MODEL_NAME, config=config)

print(base_model.config)



DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.52.4",
  "vocab_size": 30522
}



   b. Freeze all base model weights and attach a lightweight MLP (the classification head) that maps the model’s representations to binary labels. You may want to create a separate model class that defines these components and a forward function or use out of the box 🤗 classification wrappers.

In [5]:
import torch.nn as nn
for param in base_model.parameters():
    param.requires_grad=False
class ProbeModel(nn.Module):
    def __init__(self,encoder,hidden_size,num_labels=2):
        super().__init__()
        self.encoder=encoder
        self.classifier=nn.Sequential(
            nn.Linear(hidden_size,hidden_size//2),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size//2,num_labels)
        )
    def forward(self,input_ids,attention_mask):
        outputs=self.encoder(input_ids=input_ids,attention_mask=attention_mask)
        cls_emb=outputs.last_hidden_state[:,0]
        logits=self.classifier(cls_emb)
        return logits
hidden_size=base_model.config.hidden_size
probe_model=ProbeModel(base_model,hidden_size)

print(probe_model)


ProbeModel(
  (encoder): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(i
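A quick sanity check for this step is to confirm that only the classifier head is trainable. The sketch below uses a small randomly initialized `nn.Embedding` plus one `nn.TransformerEncoder` layer as a hypothetical stand-in for DistilBERT (so it runs offline), with a head of the same shape as in `ProbeModel`; with the real model, the same two `sum(...)` lines can be applied to `probe_model`.

```python
import torch.nn as nn

hidden_size = 32  # stand-in width; DistilBERT uses 768

# Hypothetical stand-in encoder: embedding + one Transformer layer
encoder = nn.Sequential(
    nn.Embedding(100, hidden_size),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True),
        num_layers=1,
    ),
)
# Same head shape as ProbeModel: hidden -> hidden // 2 -> 2 labels
classifier = nn.Sequential(
    nn.Linear(hidden_size, hidden_size // 2),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(hidden_size // 2, 2),
)

# Freeze the encoder, as done for base_model
for p in encoder.parameters():
    p.requires_grad = False

model = nn.ModuleDict({"encoder": encoder, "classifier": classifier})
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable: {trainable} / total: {total}")
```

For the DistilBERT head (768 → 384 → 2) the same count gives 295,296 + 770 = 296,066 trainable parameters, while all encoder weights stay frozen.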

   c. Use the [CLS] token if available or mean-pooled final hidden states from the LLM as input to your classifier head.

In [6]:
import torch.nn as nn
class ProbeModel(nn.Module):
    def __init__(self,encoder:nn.Module,hidden_size:int,num_labels:int = 2,pooling:str = "cls"):
        super().__init__()
        for p in encoder.parameters():
            p.requires_grad=False
        self.encoder=encoder
        self.pooling=pooling.lower()
        self.classifier=nn.Sequential(
            nn.Linear(hidden_size,hidden_size//2),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size//2,num_labels)
        )

    def forward(self,input_ids,attention_mask):
        outputs=self.encoder(input_ids=input_ids,attention_mask=attention_mask)
        hidden_states=outputs.last_hidden_state
        if self.pooling=="cls":
            rep=hidden_states[:, 0, :]
        elif self.pooling=="mean":
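            # Cast the 0/1 mask to float so padding positions are zeroed out,
            # then average over the real tokens only (clamp avoids division by zero)
            mask=attention_mask.unsqueeze(-1).to(hidden_states.dtype)
            sum_hidden=(hidden_states*mask).sum(dim=1)
            lengths=mask.sum(dim=1).clamp(min=1e-9)
            rep=sum_hidden/lengths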
        else:
            raise ValueError(f"unknown pooling type: {self.pooling}")
        logits=self.classifier(rep)
        return logits

from transformers import AutoModel
base_model=AutoModel.from_pretrained("distilbert-base-uncased")
hidden_size=base_model.config.hidden_size
model_cls=ProbeModel(base_model,hidden_size,pooling="cls")
model_mean=ProbeModel(base_model,hidden_size,pooling="mean")
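The mean-pooling branch must ignore padding positions; otherwise short emails are averaged together with many `[PAD]` vectors. A standalone sketch of the masked mean (the function name `masked_mean_pool` is hypothetical), checked on a toy tensor whose padded position holds large values that must not leak into the result:

```python
import torch

def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors over non-padding positions only.

    hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                     # avoid division by zero
    return summed / counts

# Toy batch: 1 sequence, 4 positions, hidden size 2; the last position is padding
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 1, 0]])
print(masked_mean_pool(hidden, mask))  # tensor([[3., 4.]])
```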


3. Configure your training parameters (learning rate, batch size, epochs) and train the model using only the classifier head while the LLM remains frozen.
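With the settings used in the training cell (28,544 training examples, batch size 16, 5 epochs, warmup ratio 0.1), `len(train_loader)` is 28544 / 16 = 1,784 batches, so the schedule spans 8,920 steps with 892 warmup steps. The sketch below reproduces, as a plain function, the learning-rate multiplier that `get_linear_schedule_with_warmup` applies: a linear ramp from 0 to 1 over the warmup steps, then linear decay to 0 at the final step. The name `linear_warmup_multiplier` is illustrative and not part of the library.

```python
import math

def linear_warmup_multiplier(step, num_warmup_steps, num_training_steps):
    """Learning-rate multiplier: linear warmup to 1.0, then linear decay to 0.0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

# Step counts implied by the training configuration in this notebook
num_examples, batch_size, num_epochs, warmup_ratio = 28544, 16, 5, 0.1
steps_per_epoch = math.ceil(num_examples / batch_size)  # len(train_loader) when drop_last=False
total_steps = steps_per_epoch * num_epochs
num_warmup = int(warmup_ratio * total_steps)
print(steps_per_epoch, total_steps, num_warmup)  # 1784 8920 892

base_lr = 5e-4
for step in (0, 446, 892, 4906, 8920):
    print(step, base_lr * linear_warmup_multiplier(step, num_warmup, total_steps))
```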

In [8]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# hyper-parameters
learning_rate = 5e-4
batch_size = 16
num_epochs = 5
warmup_ratio = 0.1

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
probe_model.to(device)

train_loader = DataLoader(tokenized["train"], batch_size=batch_size, shuffle=True)
val_loader = DataLoader(tokenized["validation"], batch_size=batch_size, shuffle=False)

optimizer = AdamW(probe_model.classifier.parameters(), lr=learning_rate, weight_decay=0.01)
total_steps = len(train_loader) * num_epochs
num_warmup = int(warmup_ratio * total_steps)

scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup, num_training_steps=total_steps)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(1, num_epochs + 1):
    # Training loop
    probe_model.train()
    total_loss = 0
    for step, batch in enumerate(train_loader):
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)

        optimizer.zero_grad()
        logits = probe_model(input_ids, attention_mask)
        loss = loss_fn(logits, labels)
        total_loss += loss.item()

        loss.backward()
        optimizer.step()
        scheduler.step()

    avg_loss = total_loss / len(train_loader)

    # Validation loop
    probe_model.eval()
    preds, gts = [], []
    with torch.no_grad():
        for batch in val_loader:
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)
            logits = probe_model(input_ids, attention_mask)
            preds.extend(logits.argmax(dim=-1).cpu().tolist())
            gts.extend(labels.cpu().tolist())

    prec, rec, f1, _ = precision_recall_fscore_support(gts, preds, average="binary", zero_division=0)
    acc = accuracy_score(gts, preds)

    print("Epoch{:>2}—Train Loss:{:.4f}|"
          "Val Acc:{:.4f},Prec:{:.4f}, "
          "Rec:{:.4f},F1:{:.4f}".format(epoch, avg_loss, acc, prec, rec, f1))

Epoch 1—Train Loss:0.1681|Val Acc:0.9757,Prec:0.9704, Rec:0.9829,F1:0.9766
Epoch 2—Train Loss:0.0769|Val Acc:0.9701,Prec:0.9818, Rec:0.9596,F1:0.9706
Epoch 3—Train Loss:0.0621|Val Acc:0.9808,Prec:0.9781, Rec:0.9847,F1:0.9814
Epoch 4—Train Loss:0.0530|Val Acc:0.9782,Prec:0.9821, Rec:0.9755,F1:0.9788
Epoch 5—Train Loss:0.0452|Val Acc:0.9820,Prec:0.9793, Rec:0.9859,F1:0.9826
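Because the encoder is frozen, its outputs do not change between epochs, so a common optimization is to run the encoder once in `eval()` mode under `torch.no_grad()`, cache the pooled [CLS] vectors, and train only the head on that cache. Note also that `probe_model.train()` in the loop above switches the frozen encoder's dropout layers on, so its features vary slightly from step to step; caching in eval mode makes them deterministic. This is a sketch of that alternative, not the code that produced the logs above: the tiny `StandInEncoder` and the random data are hypothetical stand-ins so the example runs offline. With the real model you would pass `probe_model.encoder` and loaders over the tokenized splits.

```python
from types import SimpleNamespace

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
hidden_size, vocab_size, seq_len = 32, 100, 8

class StandInEncoder(nn.Module):
    """Hypothetical stand-in for DistilBERT exposing .last_hidden_state."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)
        self.drop = nn.Dropout(0.1)  # active in train(), disabled in eval()

    def forward(self, input_ids, attention_mask):
        return SimpleNamespace(last_hidden_state=self.drop(self.emb(input_ids)))

@torch.no_grad()
def cache_cls_features(encoder, loader, device="cpu"):
    """Run the frozen encoder once (eval mode, no autograd) and keep [CLS] vectors."""
    encoder.eval()
    feats, labels = [], []
    for input_ids, attention_mask, y in loader:
        out = encoder(input_ids=input_ids.to(device), attention_mask=attention_mask.to(device))
        feats.append(out.last_hidden_state[:, 0].cpu())  # position 0 = [CLS]
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

# Toy data standing in for the tokenized training split
ids = torch.randint(0, vocab_size, (64, seq_len))
mask = torch.ones_like(ids)
y = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(ids, mask, y), batch_size=16)

encoder = StandInEncoder()
X1, Y = cache_cls_features(encoder, loader)
X2, _ = cache_cls_features(encoder, loader)
print(X1.shape, torch.equal(X1, X2))  # torch.Size([64, 32]) True

# Train only the head on the cached features; no encoder forward passes per epoch
head = nn.Sequential(nn.Linear(hidden_size, hidden_size // 2), nn.ReLU(), nn.Dropout(0.1),
                     nn.Linear(hidden_size // 2, 2))
optimizer = torch.optim.AdamW(head.parameters(), lr=5e-4, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()
feat_loader = DataLoader(TensorDataset(X1, Y), batch_size=16, shuffle=True)
for epoch in range(5):
    head.train()
    for xb, yb in feat_loader:
        optimizer.zero_grad()
        loss = loss_fn(head(xb), yb)
        loss.backward()
        optimizer.step()
```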


4. Evaluation and Analysis:

   a. Evaluate the model on the test set using accuracy, precision, recall, and F1-score.

In [9]:
import torch
from sklearn.metrics import accuracy_score,precision_recall_fscore_support
test_loader=DataLoader(tokenized["test"],batch_size=batch_size,shuffle=False)

probe_model.eval()
all_preds,all_labels=[],[]

with torch.no_grad():
    for batch in test_loader:
        input_ids=batch["input_ids"].to(device)
        attention_mask=batch["attention_mask"].to(device)
        labels=batch["labels"].to(device)
        logits=probe_model(input_ids, attention_mask)
        preds=logits.argmax(dim=-1)
        all_preds.extend(preds.cpu().tolist())
        all_labels.extend(labels.cpu().tolist())
prec, rec, f1, _ = precision_recall_fscore_support(all_labels,all_preds,average="binary",zero_division=0)
acc= accuracy_score(all_labels,all_preds)
print("Test  — Acc: {:.4f},"
      "Prec: {:.4f}, Rec: {:.4f}, F1: {:.4f}".format(acc, prec, rec, f1))


Test  — Acc: 0.9875,Prec: 0.9891, Rec: 0.9861, F1: 0.9876
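For reference, `precision_recall_fscore_support(..., average="binary")` treats label 1 as the positive class (`pos_label=1` is the default), and the reported metrics reduce to simple ratios of confusion-matrix counts. A small check on hypothetical toy labels (not the Enron predictions):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

# Hypothetical toy labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]

# For binary labels [0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

prec_sk, rec_sk, f1_sk, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(tn, fp, fn, tp)  # 4 1 1 4
```

Printing `confusion_matrix(all_labels, all_preds)` after the evaluation loop would show the actual false-positive and false-negative counts behind the reported test precision and recall.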


   b. Select **two** encoder LLMs, repeat steps 2-4 for the second LLM, and compare and discuss any performance trends between the two models. **Specify the second chosen LLM below and report performance comparison.**

   **Second Chosen Encoder LLM:** <span style='color:green'>BERT-Tiny (prajjwal1/bert-tiny)</span>

In [None]:
from transformers import AutoModel
import pandas as pd
import torch
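# Both checkpoints use the uncased BERT WordPiece vocabulary (30,522 tokens), so the
# DistilBERT-tokenized loaders are reused here; re-tokenize if you swap in a model
# with a different vocabulary.
encoder_names=[
    "distilbert-base-uncased",
    "prajjwal1/bert-tiny"
]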
def train_and_eval(encoder_name):
    base=AutoModel.from_pretrained(encoder_name)
    model=ProbeModel(base,base.config.hidden_size,pooling="cls").to(device)
    optim=torch.optim.AdamW(model.classifier.parameters(),lr=learning_rate)
    total_steps=len(train_loader)*num_epochs
    sched=get_linear_schedule_with_warmup(optim,int(warmup_ratio * total_steps),total_steps)
    loss_fn=torch.nn.CrossEntropyLoss()
    for _ in range(num_epochs):
        model.train()
        for batch in train_loader:
            optim.zero_grad()
            ids, mask, labs=batch["input_ids"].to(device),batch["attention_mask"].to(device),batch["labels"].to(device)
            logits=model(ids, mask)
            loss=loss_fn(logits, labs)
            loss.backward()
            optim.step(); sched.step()
    model.eval()
    preds, gts=[], []
    with torch.no_grad():
        for batch in test_loader:
            ids, mask, labs=batch["input_ids"].to(device),batch["attention_mask"].to(device),batch["labels"].to(device)
            out=model(ids, mask).argmax(dim=-1)
            preds.extend(out.cpu().tolist())
            gts.extend(labs.cpu().tolist())
    prec, rec, f1, _ = precision_recall_fscore_support(gts, preds, average="binary", zero_division=0)
    acc=accuracy_score(gts, preds)
    return {"model": encoder_name, "acc": acc, "prec": prec, "rec": rec, "f1": f1}
results=[train_and_eval(name) for name in encoder_names]
df=pd.DataFrame(results)
print(df)


                     model     acc      prec       rec        f1
0  distilbert-base-uncased  0.9880  0.990040  0.986111  0.988072
1      prajjwal1/bert-tiny  0.9535  0.939481  0.970238  0.954612
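The gaps between the two models can be quantified directly from the reported test results. This snippet simply re-enters the numbers from the table above into a DataFrame and computes the differences in percentage points:

```python
import pandas as pd

# Test-set results copied from the comparison table above
df = pd.DataFrame(
    {
        "model": ["distilbert-base-uncased", "prajjwal1/bert-tiny"],
        "acc": [0.9880, 0.9535],
        "prec": [0.990040, 0.939481],
        "rec": [0.986111, 0.970238],
        "f1": [0.988072, 0.954612],
    }
).set_index("model")

# Difference in percentage points (DistilBERT minus BERT-Tiny)
gap_pp = (df.loc["distilbert-base-uncased"] - df.loc["prajjwal1/bert-tiny"]) * 100
print(gap_pp.round(2))
```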


   **Performance Comparison and Trend Discussion:**

<span style='color:green'>DistilBERT outperformed BERT-Tiny by roughly 3.4 percentage points in both accuracy (98.80% vs. 95.35%) and F1 (98.81% vs. 95.46%), which suggests that its deeper and wider encoder produces more discriminative [CLS] embeddings for spam detection. The largest gap is in precision (99.00% vs. 93.95%, about 5 points), so BERT-Tiny flags more legitimate emails as spam, while recall stays above 97% for both models (98.61% vs. 97.02%). BERT-Tiny is still a strong compromise: with 2 layers and a hidden size of 128 it has roughly 4.4M parameters versus about 66M for DistilBERT (around 15× fewer), so it should be considerably cheaper and faster at inference, although inference time was not measured here. The final choice depends on deployment constraints: DistilBERT is preferable when classification quality is the priority, while BERT-Tiny suits resource-constrained or near-real-time settings where a few points of precision can be traded for large efficiency gains.</span>

   c. The best model is expected to attain {Accuracy: >85%, F1: >85%, Precision: >85%, Recall: >82%}. Report whether your best model achieves these metrics and discuss.

   **Performance vs. Expected Metrics Discussion:**

<span style='color:green'>All four test metrics of the best model (DistilBERT) exceed their targets by a wide margin: accuracy 98.75% (target >85%), precision 98.91% (>85%), recall 98.61% (>82%) and F1 98.76% (>85%). High recall means almost all spam is caught, and high precision means very few legitimate emails are misclassified as spam. Training-set metrics were not computed, but the final-epoch validation scores (accuracy 98.20%, F1 98.26%) and the test scores differ by less than one percentage point, which indicates that the classifier head generalizes well rather than overfitting the training split. This supports probing frozen embeddings from a pre-trained encoder (here with a small MLP head rather than a strictly linear one) as an efficient and effective approach for binary text classification tasks such as spam detection.</span>
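The comparison against the expected thresholds can be made explicit with a small check. The metric values below are the DistilBERT test results printed by the evaluation cell, and the thresholds come from the task statement (strict inequalities):

```python
# Thresholds from the task statement
targets = {"accuracy": 0.85, "f1": 0.85, "precision": 0.85, "recall": 0.82}
# DistilBERT test-set results reported in the evaluation cell
best = {"accuracy": 0.9875, "f1": 0.9876, "precision": 0.9891, "recall": 0.9861}

for name, threshold in targets.items():
    margin_pp = (best[name] - threshold) * 100
    status = "meets" if best[name] > threshold else "misses"
    print(f"{name:<9} {best[name]:.4f} {status} >{threshold:.2f} by {margin_pp:+.2f} pp")

all_met = all(best[k] > v for k, v in targets.items())
```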

5. References. Include details on all the resources used to complete this part.

<span style='color:green'>Enron Spam dataset: https://huggingface.co/datasets/SetFit/enron_spam

DistilBERT model: https://huggingface.co/distilbert-base-uncased

BERT-Tiny (prajjwal1/bert-tiny): https://huggingface.co/prajjwal1/bert-tiny</span>