# Nemotron Phishing Detection Workshop

This notebook walks through the full workflow for fine-tuning Nemotron as a phishing detector on the Enron email dataset:
1. Download the dataset
2. Convert to JSONL
3. Fine-tune Nemotron with LoRA
4. Evaluate the model


## Install dependencies
If you're running in a fresh environment, install the workshop requirements.

In [None]:
%pip install -r ../requirements.txt

## Configure Kaggle API
Set your Kaggle credentials as environment variables before downloading the dataset. Replace the placeholder values with your own username and API key.

In [None]:
import os
os.environ['KAGGLE_USERNAME'] = 'YOUR_KAGGLE_USERNAME'
os.environ['KAGGLE_KEY'] = 'YOUR_KAGGLE_KEY'
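
Before starting the download, it can help to confirm that the credentials were actually replaced. The following is a minimal sketch, assuming only the two variable names and placeholder strings used in the cell above; the helper `missing_kaggle_credentials` is hypothetical and not part of the workshop scripts.

```python
import os

# Placeholder values from the cell above; treated here as "not configured".
PLACEHOLDERS = {"YOUR_KAGGLE_USERNAME", "YOUR_KAGGLE_KEY"}


def missing_kaggle_credentials(env):
    """Return the Kaggle variable names that are unset, empty, or still placeholders."""
    required = ("KAGGLE_USERNAME", "KAGGLE_KEY")
    return [name for name in required if not env.get(name) or env[name] in PLACEHOLDERS]


missing = missing_kaggle_credentials(os.environ)
if missing:
    print("Set these before downloading:", ", ".join(missing))
else:
    print("Kaggle credentials look configured.")
```

The function takes a plain mapping rather than reading `os.environ` directly, which keeps it easy to check against a test dictionary.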

## Download the dataset

In [None]:
!python ../scripts/download_dataset.py --output_dir ../data/raw

## Convert to JSONL
The conversion script labels each email as phishing or benign using a simple keyword heuristic, so treat the labels as approximate rather than ground truth.

In [None]:
!python ../scripts/prepare_jsonl.py --input_dir ../data/raw/maildir --output_dir ../data/processed
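
To illustrate what a keyword heuristic of this kind might look like, here is a minimal sketch. The keyword list, the `min_hits` threshold, and the `text`/`label` field names are all assumptions made for illustration; they are not taken from `prepare_jsonl.py`, which may work differently.

```python
import json

# Hypothetical keyword list for illustration; not taken from prepare_jsonl.py.
SUSPICIOUS_KEYWORDS = ("verify your account", "password", "urgent", "click here")


def label_email(text, min_hits=1):
    """Label an email 'phishing' if it contains at least min_hits keywords, else 'benign'."""
    lowered = text.lower()
    hits = sum(1 for keyword in SUSPICIOUS_KEYWORDS if keyword in lowered)
    return "phishing" if hits >= min_hits else "benign"


def to_jsonl_line(text):
    """Serialize one email and its heuristic label as a single-line JSON object."""
    return json.dumps({"text": text, "label": label_email(text)})


print(to_jsonl_line("URGENT: click here to verify your account"))
print(to_jsonl_line("Meeting moved to 3pm, see you there"))
```

A substring match like this is cheap but will mislabel legitimate emails that happen to mention passwords or urgency, which is one reason to inspect the label distribution before training.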

## Inspect dataset stats

In [None]:
import json
from pathlib import Path
stats = json.loads(Path('../data/processed/stats.json').read_text())
stats
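
If you want to check class balance directly from a JSONL file rather than from `stats.json`, a sketch along these lines may help. It assumes each record carries a `label` field, which is a guess about the output schema; adjust `label_field` to match what the conversion script actually writes.

```python
import json
from collections import Counter


def count_labels(lines, label_field="label"):
    """Count label values across an iterable of JSONL lines, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)[label_field]] += 1
    return counts


# Inline sample records so the sketch runs without the processed files.
sample = [
    '{"text": "click here to verify your account", "label": "phishing"}',
    '{"text": "lunch at noon?", "label": "benign"}',
    '{"text": "quarterly report attached", "label": "benign"}',
]
counts = count_labels(sample)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
```

Because `count_labels` accepts any iterable of lines, an open file handle for `../data/processed/test.jsonl` can be passed in place of `sample`.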

## Fine-tune the model
Adjust the batch size and maximum sequence length to fit your GPU memory, and the number of epochs to fit your time budget.

In [None]:
!python ../scripts/train.py --data_dir ../data/processed --output_dir ../outputs --model_name nvidia/Nemotron-4-Mini-HF
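
LoRA keeps the base weights frozen and trains a pair of low-rank matrices per adapted layer, which is why it fits on smaller GPUs. The sketch below estimates how many trainable parameters that adds. The hidden size, layer count, number of adapted matrices, and rank are hypothetical values chosen for illustration; they are not read from the Nemotron configuration or from `train.py`.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA leaves the original weight frozen and learns two low-rank factors,
    A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


# Hypothetical shapes for illustration only; not read from the Nemotron config.
hidden_size = 3072
num_layers = 32
adapted_matrices_per_layer = 2  # e.g. two square attention projections
rank = 16

per_matrix = lora_params(hidden_size, hidden_size, rank)
adapter_total = per_matrix * adapted_matrices_per_layer * num_layers
full_total = hidden_size * hidden_size * adapted_matrices_per_layer * num_layers

print(f"LoRA parameters: {adapter_total:,}")
print(f"Frozen parameters in the adapted matrices: {full_total:,}")
print(f"Trainable fraction of those matrices: {adapter_total / full_total:.2%}")
```

The adapter size grows linearly with the rank, so lowering the rank is one more knob to consider alongside batch size and sequence length when memory is tight.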

## Quick local evaluation
Run the fine-tuned adapter on up to 20 test samples as a sanity check.

In [None]:
!python ../scripts/test_model.py --test_file ../data/processed/test.jsonl --max_samples 20 --adapter_dir ../outputs/adapter --model_name nvidia/Nemotron-4-Mini-HF
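
For a phishing detector, accuracy alone can hide a model that misses most phishing emails, so precision and recall on the phishing class are worth checking too. The sketch below computes these from two lists of labels. The label strings and the helper `binary_metrics` are assumptions for illustration; how `test_model.py` reports its results is not shown here.

```python
def binary_metrics(y_true, y_pred, positive="phishing"):
    """Compute accuracy plus precision, recall, and F1 for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions for five emails.
y_true = ["phishing", "phishing", "benign", "benign", "benign"]
y_pred = ["phishing", "benign", "benign", "phishing", "benign"]
for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.2f}")
```

With only 20 samples these numbers are noisy, so treat them as a smoke test rather than a benchmark.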