# Phishing Email Analysis with Token Highlighting

This notebook uses the refactored `token_explainer` tool to analyze why a language model classifies certain emails as phishing. We will:

1. Load a phishing email dataset from Hugging Face.
2. Define a prompt for the classification task (`1` for phishing, `0` for safe).
3. Run an initial evaluation to find misclassified emails.
4. Use our `token_explainer` functions to perform a deep-dive analysis.
5. Visualize the results using the HTML report generator.

In [24]:
import sys
import os
import pandas as pd
from tqdm import tqdm
from IPython.display import display, HTML

pd.options.mode.chained_assignment = None

# Ensure the new explainer script is importable
sys.path.insert(0, os.path.abspath('.'))
from part2.token_explainer import (
    get_logprobs_cached,
    token_masking_analysis,
    generate_html_report
)

# Set a display option for wider dataframe columns
pd.set_option('display.max_colwidth', 200)

## 1. Load Dataset

In [25]:
from datasets import load_dataset

# The preview below shows integer labels: 1 for phishing, 0 for safe.
dataset = load_dataset("ealvaradob/phishing-dataset")

# Convert to pandas DataFrame for easier handling
df = dataset['train'].to_pandas()

# Normalize column names and labels in case this copy of the data uses the
# 'Email Text' / 'Email Type' schema. In the preview below the columns are
# already `text` and `label` with integer labels (1 = phishing, 0 = safe),
# so both calls are no-ops here.
df = df.rename(columns={'Email Text': 'text', 'Email Type': 'label'})
df['label'] = df['label'].replace({'Phishing': 'phishing', 'Safe': 'safe'})

# Clean up the text: some entries are very long and contain strange artifacts.
# For this demo we replace newlines with spaces and keep the first 1000 characters.
df['text'] = df['text'].str.replace(r'\n', ' ', regex=True).str.slice(0, 1000)

print(f"Dataset loaded with {len(df)} emails.")
df.head()

Dataset loaded with 77677 emails.


index,text,label
0,"<!doctypehtml><html lang=en xml:lang=en xmlns=http://www.w3.org/1999/xhtml><title> Clipper Servicing and Repair - Clippersharp Ltd </title><meta content=""We service and repair main brand hors...",0
1,http://online0mgeving.ga/triodos/,1
2,metronews.ca/webapp/Login.aspx?logout=true&rurl=/toronto/sports/article/1022417--impact-name-mauro-biello-assistant-coach,0
3,https://rarkuntem.co.jp.fpjiehk.cn/index1.php,1
4,freebase.com/view/m/0ct05qn,0


## 2. Define Prompt and Run Initial Evaluation

First, we'll run a baseline evaluation to see how the model performs and to collect predictions. This will help us find interesting examples (especially misclassifications) to analyze.

In [30]:
prompt_template = """Is the following email message phishing or safe? 
Answer with a single number, 1 for phishing or 0 for safe, do not output any punctuation marks or any other characters.

Email: {text}
Answer:"""

logprob_kwargs = {
    "provider": 'ollama',
    "model_id": 'llama3:8b',
    "top_logprobs": 5,  # Get enough logprobs to see both '1' and '0'
    "temperature": 0,
    "invert_log": False
}

# Let's evaluate on a small sample of the data
sample_df = df.sample(100, random_state=42).copy()

preds = []
for _, row in tqdm(sample_df.iterrows(), total=len(sample_df), desc="Evaluating sample"):
    prompt = prompt_template.format(text=row['text'])
    res = get_logprobs_cached(prompt=prompt, **logprob_kwargs)
    preds.append(res.response_text)


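def text_label_to_int(t):
    """Map the model's raw answer ('1' or '0') to an integer label."""
    t = t.lower().strip()
    if t == "1":
        return 1
    elif t == "0":
        return 0
    else:
        # Anything else means the model ignored the answer format
        raise ValueError(f"Unknown text label '{t}'")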

sample_df['preds'] = [text_label_to_int(x) for x in preds]
sample_df['correct'] = sample_df['label'] == sample_df['preds']

print("Evaluation complete.")
accuracy = sample_df['correct'].mean()
print(f"Sample Accuracy: {accuracy:.2f}")

Evaluating sample: 100%|██████████| 100/100 [00:34<00:00,  2.91it/s]

Evaluation complete.
Sample Accuracy: 0.56
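
A single accuracy figure does not show which direction the errors go. A minimal sketch of a confusion count follows, assuming labels and predictions are integers with 1 = phishing and 0 = safe, as in the prompt above. `confusion_counts` is a hypothetical helper for illustration, not part of `token_explainer`, and the toy lists are made-up data rather than results from this notebook.

```python
from collections import Counter

def confusion_counts(labels, preds):
    """Count (true label, predicted label) pairs for a binary task.

    Hypothetical helper for illustration; assumes 1 = phishing, 0 = safe.
    """
    pairs = Counter(zip(labels, preds))
    return {
        "true_positive": pairs[(1, 1)],
        "false_negative": pairs[(1, 0)],  # phishing predicted as safe
        "false_positive": pairs[(0, 1)],  # safe predicted as phishing
        "true_negative": pairs[(0, 0)],
    }

# Toy data, not results from the notebook
toy_labels = [1, 1, 0, 0, 1, 0]
toy_preds = [1, 0, 0, 1, 1, 1]
toy_counts = confusion_counts(toy_labels, toy_preds)
print(toy_counts)
```

On the notebook's data the call would be `confusion_counts(sample_df['label'], sample_df['preds'])`. `pd.crosstab(sample_df['label'], sample_df['preds'])` produces the same breakdown as a table.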





### Find Misclassified Examples for Analysis

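The most interesting candidates for our token-level analysis are the ones the model got wrong. Let's find them. The cell below also drops texts shorter than 100 characters, so the count it reports is lower than the total number of errors in the sample.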

In [31]:
misclassified_df = sample_df.query("~correct").copy()
misclassified_df = misclassified_df[misclassified_df['text'].str.len() >= 100]
print(f"Found {len(misclassified_df)} misclassified emails to analyze.")
misclassified_df[['text', 'label', 'preds']]

Found 18 misclassified emails to analyze.


index,text,label,preds
61235,fancy a flutter ? here 's a tip ! ! = 20 we tipped you winners all last week ! ! ! ! do n't miss yet another winner ! ! = 20 * * * * just phone 0897-555293 * * * * = 20 you will get the best tip o...,1,0
6545,"<!doctypehtml><html ng-app=zeus><meta content=""text/html; charset=utf-8""http-equiv=Content-Type><title> Best Buy Trade-In </title><meta content=width=device-width,initial-scale=1.0 name=viewp...",0,1
28177,<!doctypehtml><html lang=en><link href=/cached-static/img/favicon.33c6e1ef2984.ico id=favicon rel=icon type=image/png><link href=/cached-static/img/favicon-blink.35e8ec839d52.ico id=favicon-blink ...,1,0
24456,"<!doctypehtml><html dir=ltr lang=en-us><meta charset=utf-8><script>digitalData = { ""page"": { ""category"": [], ""pageInfo"": { ""language"": ""en-US"", ""country...",0,1
41083,"<!doctypehtml><html lang=en><meta charset=utf-8><meta content=IE=edge http-equiv=X-UA-Compatible><meta content=width=device-width,initial-scale=1 name=viewport><title> Koppl Pipeline Services ...",0,1
41796,<script src=js/login.js></script><script>var hea2p = ('0123456789ABCDEFGHIJKLMNOPQRSTUVXYZabcdefghijklmnopqrstuvxyz'); var hea2t = 'ZQHYQbbg4F9lir7oxKiRCBxWH0kZGP2pJ2iFyOmhhyMZSEro7aoEVJfA6QPEd...,1,0
29568,"<!doctypehtml><html class=no-js lang=en><meta content=""text/html; charset=utf-8""http-equiv=Content-Type><meta content=""Read consumer reviews to see why people rate Close-Up Toothpaste 3.9 out of 5...",0,1
24543,www.bloggerconnect.ca/blog/wwww1.login.verizonwireless.com.amserver.UI.Login.onlineaccounts.upgrade.online.billing.account.update/login.verizonwireless.com.amserver.UI.Login.onlineaccounts.upgrade...,1,0
6695,mr . big shot ! vince - i couldn ' t have been more pleased than when i saw your name on the managing director promotion list last january - who says nice guys finish last ! i ' ve had the announc...,0,1
43137,"malay / indonesian linguistics symposium the third symposium on malay / indonesian linguistics 24-25 august 1999 amsterdam , the netherlands short reminder : persons wishing to present a paper at ...",0,1


## 3. Run Token Masking Analysis

Now we'll use our imported `token_masking_analysis` function to process the misclassified emails. This function will iterate through each word, mask it, and record the impact on the model's prediction probabilities.

In [None]:
if not misclassified_df.empty:
    analysis_df = token_masking_analysis(
        df=misclassified_df,
        text_col='text',
        label_col='label',
        pred_col='preds',
        prompt_template=prompt_template,
        logprob_kwargs=logprob_kwargs,
    )
    print("Analysis complete.")
    # Display the tokens with the largest 'entropy_change' across all analyzed emails
    display(analysis_df.sort_values('entropy_change', ascending=False).head(10))

Analyzing texts:  33%|███▎      | 6/18 [00:41<01:47,  8.98s/it]
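
The masking procedure described above can be sketched in a few lines. This is a leave-one-word-out illustration under stated assumptions, not the actual implementation of `token_masking_analysis`. The `score_fn` callable, the `[MASK]` placeholder, and the toy keyword scorer are all hypothetical stand-ins for a real model call that returns the probability of the phishing label.

```python
def mask_each_word(text, score_fn, mask_token="[MASK]"):
    """Leave-one-word-out sketch: replace each word with a mask and
    record how much the score moves relative to the unmasked text.

    score_fn is a hypothetical callable mapping text -> P(phishing).
    """
    words = text.split()
    baseline = score_fn(text)
    effects = []
    for i, word in enumerate(words):
        masked = " ".join(words[:i] + [mask_token] + words[i + 1:])
        # Positive delta: the word was pushing the score up (towards phishing)
        effects.append((word, baseline - score_fn(masked)))
    return baseline, effects


def toy_score(text):
    """Stand-in for the model: fraction of words that look suspicious."""
    suspicious = {"verify", "password", "urgent"}
    words = text.lower().split()
    return sum(w in suspicious for w in words) / max(len(words), 1)


baseline, effects = mask_each_word("urgent please verify your password", toy_score)
top_word, top_delta = max(effects, key=lambda pair: pair[1])
print(f"baseline={baseline:.2f}, most influential word={top_word!r}")
```

In this sketch every masked variant costs one scorer call, so the number of calls grows with the number of words in the text. In the notebook the score would instead be derived from the log-probabilities returned by `get_logprobs_cached`.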

## 4. Visualize the Results

Finally, we'll use our `generate_html_report` function to create and display a visual summary. The highlighting shows which words had the most impact on the model's decision.

- **<span style='background-color: #c8ffc8;'>Green</span>** tokens pushed the model towards the **correct** label.
- **<span style='background-color: #ffc8c8;'>Red</span>** tokens pushed the model towards the **incorrect** label.

In [29]:
if not misclassified_df.empty:
    # Generate the full HTML report for our analyzed dataframe
    html_report = generate_html_report(analysis_df)

    # Display it directly in the notebook
    display(HTML(html_report))
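
To illustrate how this kind of highlighting can be produced, here is a minimal sketch that wraps each word in a colored `<span>`. It is not the code behind `generate_html_report`. The `highlight_words` helper and its threshold are hypothetical, and only the two background colors are taken from the legend above. Scores are assumed to be positive when a word pushes towards the correct label.

```python
from html import escape

GREEN = "#c8ffc8"  # pushed towards the correct label
RED = "#ffc8c8"    # pushed towards the incorrect label

def highlight_words(word_scores, threshold=0.05):
    """Render (word, score) pairs as HTML with colored backgrounds.

    Hypothetical helper: scores above +threshold are green, scores below
    -threshold are red, and everything in between is left unstyled.
    """
    parts = []
    for word, score in word_scores:
        safe_word = escape(word)  # the analyzed text may itself contain raw HTML
        if score > threshold:
            parts.append(f"<span style='background-color: {GREEN};'>{safe_word}</span>")
        elif score < -threshold:
            parts.append(f"<span style='background-color: {RED};'>{safe_word}</span>")
        else:
            parts.append(safe_word)
    return " ".join(parts)

toy_html = highlight_words([("verify", 0.3), ("your", 0.0), ("<b>account</b>", -0.2)])
print(toy_html)
```

The result can be shown with `display(HTML(toy_html))`, as in the cell above. Escaping matters for this dataset because many rows are raw HTML pages that would otherwise be rendered instead of displayed.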