Author: Karthik Nakkeeran
This project is a hands-on study of IBM's Adversarial Robustness Toolbox (ART) — an open-source library for securing machine learning models against adversarial threats.
The goal is to understand:
- What ART does and how it works in practice.
- How adversarial attacks can exploit standard ML models deployed in critical domains like financial fraud detection.
- How adversarial training (ART's primary defense mechanism) makes models resilient to these attacks.
- Whether the robustness trade-offs are acceptable for real-world deployment.
model_security_study/
├── README.md
├── requirements.txt
├── code/
│ ├── Fraud_Security_Demo.ipynb # Demo with synthetic data
│ ├── Kaggle_Fraud_Security_Demo.ipynb # Demo with real Kaggle dataset
│ └── fraud_detection_demo.py # Standalone Python script
└── data/
└── bank_transactions_data_2.csv # Kaggle bank transactions dataset
Fraud_Security_Demo.ipynb uses 5,000 programmatically generated bank check transactions with 5 features:
- Amount, Account Age, Bounced Checks, Distance, Signature Match
The top 20% riskiest transactions (based on a composite risk equation) are labeled as fraud. This controlled environment lets us isolate and clearly demonstrate the attack/defense mechanics without noisy real-world data.
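The labeling step can be sketched as follows. The feature distributions and the equal-weight risk formula are assumptions made for illustration, since the notebook's actual composite risk equation is not reproduced here. Only the sample count, the five feature names, and the top-20% rule come from the description above.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Hypothetical feature distributions; the notebook's generators may differ.
amount = rng.lognormal(mean=6.0, sigma=1.0, size=n)   # check amount
account_age = rng.uniform(0.1, 20.0, size=n)          # years since account opened
bounced_checks = rng.poisson(0.5, size=n)             # prior bounced checks
distance = rng.exponential(50.0, size=n)              # distance from home branch
signature_match = rng.uniform(0.0, 1.0, size=n)       # 1.0 = perfect match

X = np.column_stack([amount, account_age, bounced_checks, distance, signature_match])


def zscore(v):
    """Standardize a feature so each one contributes on a comparable scale."""
    return (v - v.mean()) / v.std()


# Hypothetical composite risk: higher amount, more bounced checks, and larger
# distance raise risk; older accounts and better signature matches lower it.
risk = (zscore(amount) + zscore(bounced_checks) + zscore(distance)
        - zscore(account_age) - zscore(signature_match))

# Label the top 20% riskiest transactions as fraud (1), the rest as legitimate (0).
threshold = np.quantile(risk, 0.80)
y = (risk > threshold).astype(int)
```

Labeling by quantile rather than by a fixed cutoff keeps the class balance at 20% regardless of how the risk score is scaled.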
Kaggle_Fraud_Security_Demo.ipynb uses a real Kaggle bank transactions dataset (bank_transactions_data_2.csv, 2,512 samples, 16 columns). Since the dataset has no explicit fraud label, a heuristic risk score is derived from numeric features and the top 20% are flagged as fraud. This notebook includes:
- Full EDA on all features (distributions, correlations, category breakdowns).
- The same attack/defense pipeline applied to messier, real-world data.
Both notebooks use the same neural network — FraudNet:
FraudNet (PyTorch MLP)
├── Linear(input_features → 32) + ReLU
├── Linear(32 → 16) + ReLU
└── Linear(16 → 2) [binary classification output]
- Input: Scaled numeric features (StandardScaler applied).
- Output: 2 class logits (Legitimate vs Fraudulent); CrossEntropyLoss applies the softmax internally.
- Optimizer: Adam (lr=0.01)
- Loss: CrossEntropyLoss
- Training: Mini-batch (batch_size=64), 25 epochs for standard model.
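A minimal PyTorch sketch of this architecture and training setup is shown below. The layer sizes, optimizer, loss, batch size, and epoch count follow the list above. The `train` helper, the random placeholder data, and the placeholder labels are assumptions for illustration and are not taken from the notebooks.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class FraudNet(nn.Module):
    """MLP with the layer sizes listed above; returns raw logits."""

    def __init__(self, input_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_features, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 2),  # no softmax here: CrossEntropyLoss expects logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train(model, X, y, epochs=25, batch_size=64, lr=0.01):
    """Hypothetical helper: mini-batch training with Adam and cross-entropy."""
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
    return model


# Placeholder data standing in for the scaled features and fraud labels.
torch.manual_seed(0)
X = torch.randn(256, 5)
y = (X.sum(dim=1) > 0).long()

model = train(FraudNet(input_features=5), X, y)
logits = model(X[:4])  # shape (4, 2): one logit per class for each sample
```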
Type: White-box evasion attack (the attacker has access to model gradients).
How it works:
- Compute the gradient of the loss function with respect to the input features.
- Perturb each input feature by a small amount (eps) in the direction that maximizes the loss.
- The result: inputs that look nearly identical to the original data but cause the model to misclassify.
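The steps above amount to the update x_adv = x + eps * sign(dL/dx). The sketch below applies that update by hand to a tiny logistic-regression model in NumPy, so the gradient can be written in closed form. It is an illustration of the technique only: the weights and the sample are made up, and this is neither ART's implementation nor the FraudNet model.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x, y, w, b, eps):
    """Return x + eps * sign(dL/dx) for binary cross-entropy on a logistic model."""
    p = sigmoid(x @ w + b)           # predicted probability of fraud
    grad_x = (p - y)[:, None] * w    # closed-form gradient of the loss w.r.t. x
    return x + eps * np.sign(grad_x)


# Hypothetical model weights and one fraudulent sample (label 1).
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([[0.2, -0.1, 0.3]])
y = np.array([1.0])

x_adv = fgsm(x, y, w, b, eps=0.35)

p_clean = sigmoid(x @ w + b)[0]      # above 0.5: correctly flagged as fraud
p_adv = sigmoid(x_adv @ w + b)[0]    # below 0.5: now classified as legitimate
```

Every feature moves by exactly eps, yet the prediction flips, because each move is aligned with the direction that increases the loss.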
Configuration used: eps = 0.35 (Fraud_Security_Demo) and multiple epsilon values [0.1, 0.25, 0.35, 0.5] (Kaggle notebook adversarial training).
Why FGSM? It's fast, well-understood, and represents the minimum sophistication an attacker needs. If a model can't withstand FGSM, it certainly can't survive more advanced attacks (PGD, C&W, etc.).
- Trained only on clean, historical data.
- Learns the steepest path to high accuracy on normal inputs.
- Creates rigid, brittle decision boundaries with exploitable blind spots.
- Result: Achieves ~98% accuracy on clean data but collapses to ~15-20% under FGSM attack.
- Trained on both clean data AND adversarial examples generated during training.
- The training loop iteratively:
  - Wraps the model in ART's PyTorchClassifier.
  - Generates FGSM adversarial examples against the model's current state.
  - Combines clean + adversarial data and retrains.
  - Repeats for multiple rounds (3-5 rounds with varying epsilon strengths).
- Forces the model to smooth out decision boundaries and eliminate blind spots.
- Result: Maintains ~85-92% accuracy even under the same FGSM attack, while conceding only ~1-2% on clean data.
Both models face the exact same evaluation protocol:
| Test | What it measures |
|---|---|
| Clean Accuracy | Model performance on unmodified test data (baseline capability) |
| Adversarial Accuracy | Model performance on FGSM-perturbed test data (attack resilience) |
| Accuracy Drop | Difference between clean and adversarial accuracy (vulnerability magnitude) |
The attack is generated specifically targeting each model — the robust model isn't just tested on the standard model's adversarial examples. ART generates a fresh FGSM attack using the robust model's own gradients, making it a fair and rigorous test.
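A sketch of this protocol as a small helper is given below. The function name `evaluate` and its two callable arguments are hypothetical. With ART, `predict_fn` would plausibly be `classifier.predict` and `attack_fn` the `generate` method of a `FastGradientMethod` built on that same classifier, so each model is attacked through its own gradients. The toy model and toy attack here are stand-ins that only exercise the arithmetic.

```python
import numpy as np


def evaluate(predict_fn, attack_fn, x, y):
    """Compute clean accuracy, adversarial accuracy, and the drop in percentage points."""
    clean_pred = np.argmax(predict_fn(x), axis=1)
    x_adv = attack_fn(x)  # attack generated against the model being evaluated
    adv_pred = np.argmax(predict_fn(x_adv), axis=1)
    clean_acc = float(np.mean(clean_pred == y))
    adv_acc = float(np.mean(adv_pred == y))
    return {
        "clean_acc": clean_acc,
        "adv_acc": adv_acc,
        "drop_pp": 100.0 * (clean_acc - adv_acc),
    }


# Toy stand-ins: a one-feature model that predicts class 1 when x > 0,
# and an attack that pushes every input 0.35 toward the decision boundary.
def toy_predict(x):
    return np.column_stack([-x[:, 0], x[:, 0]])


def toy_attack(x):
    return x - 0.35 * np.sign(x)


x_test = np.array([[-1.0], [-0.2], [0.1], [0.8]])
y_test = np.array([0, 0, 1, 1])

report = evaluate(toy_predict, toy_attack, x_test, y_test)
```

In the toy run, the two samples closest to the boundary flip, so accuracy falls from 1.0 to 0.5 and `drop_pp` is 50.0.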
| Metric | Standard AI | Secured AI |
|---|---|---|
| Clean Accuracy | ~98% | ~96-98% |
| Accuracy Under FGSM Attack | ~15-20% (catastrophic) | ~85-92% (resilient) |
| Accuracy Drop | ~78-83 pp | ~4-10 pp |
Key insight: The standard model is worse than a coin flip under attack — it's being actively steered toward wrong decisions. The secured model maintains operational integrity.
Both notebooks include a dynamic "Business Impact" cell that computes real-world implications from the actual run results:
- Fraud leakage reduction: ~74-85% fewer missed fraudulent transactions under attack.
- Robustness trade-off: Conceding ~1-2% clean accuracy to close a ~80 pp vulnerability gap.
- Operational scenario: For a bank processing 10,000 daily transactions (15% fraud rate), the standard model misses ~500+ frauds per day under attack, versus ~100-150 for the secured model.
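The arithmetic behind this kind of estimate can be sketched as follows. The two helper functions and the recall values under attack (0.65 and 0.92) are hypothetical, chosen only so that the outputs fall inside the ranges quoted above. The notebooks compute their figures from the actual run results.

```python
def missed_frauds(daily_transactions, fraud_rate, fraud_recall_under_attack):
    """Expected number of fraudulent transactions that slip through per day."""
    frauds = daily_transactions * fraud_rate
    return frauds * (1.0 - fraud_recall_under_attack)


def leakage_reduction(missed_standard, missed_secured):
    """Percentage reduction in missed frauds when moving to the secured model."""
    return 100.0 * (missed_standard - missed_secured) / missed_standard


# Hypothetical recall values under attack, for illustration only.
missed_standard = missed_frauds(10_000, 0.15, fraud_recall_under_attack=0.65)
missed_secured = missed_frauds(10_000, 0.15, fraud_recall_under_attack=0.92)
reduction = leakage_reduction(missed_standard, missed_secured)
```

With these inputs, 1,500 of the 10,000 transactions are fraudulent, the standard model misses about 525 of them, the secured model about 120, and the reduction is about 77%.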
# Clone the repository
git clone https://github.com/karthik-dataiq/model_security_study.git
# Set up the project
cd model_security_study
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Python 3.10+
- PyTorch
- adversarial-robustness-toolbox (ART v1.17+)
- scikit-learn
- pandas, numpy
- plotly (interactive charts)
- matplotlib, seaborn (legacy)
Both notebooks can be exported as standalone, self-contained HTML files with fully interactive Plotly charts — no Jupyter or Python required to view them.
# Activate virtual environment
source .venv/bin/activate
# Install export dependencies (first time only)
pip install nbconvert ipykernel
python -m ipykernel install --user --name python3
# Export both notebooks (--execute re-runs all cells to capture fresh output)
jupyter nbconvert --to html --execute code/Fraud_Security_Demo.ipynb --output-dir=exports --ExecutePreprocessor.timeout=300
jupyter nbconvert --to html --execute code/Kaggle_Fraud_Security_Demo.ipynb --output-dir=exports --ExecutePreprocessor.timeout=300
The --execute flag is important: it ensures the Plotly charts are rendered with the notebook renderer (pio.renderers.default = "notebook"), which embeds the full Plotly JavaScript directly into the HTML output (~5 MB per file).
Output files:
- exports/Fraud_Security_Demo.html
- exports/Kaggle_Fraud_Security_Demo.html
Open in any browser for full interactivity (hover tooltips, zoom, pan).
Alternatively, in VS Code: Ctrl+Shift+P → "Jupyter: Export to HTML".
ART provides a practical, framework-agnostic way to stress-test and harden ML models before deployment. For any model operating in a critical domain (finance, healthcare, autonomous systems), adversarial robustness is not optional; it is a deployment requirement. This project demonstrates that, for a small cost in clean accuracy (~1-2%), a model can become dramatically more resilient to gradient-based evasion attacks.