Model Security Study: Adversarial Robustness Toolbox (ART)

Author: Karthik Nakkeeran

Purpose

This project is a hands-on study of IBM's Adversarial Robustness Toolbox (ART) — an open-source library for securing machine learning models against adversarial threats.

The goal is to understand:

What ART does and how it works in practice.
How adversarial attacks can exploit standard ML models deployed in critical domains like financial fraud detection.
How adversarial training (one of the defenses ART provides, and the one studied here) makes models more resilient to these attacks.
Whether the robustness trade-offs are acceptable for real-world deployment.

Project Structure

model_security_study/
├── README.md
├── requirements.txt
├── code/
│   ├── Fraud_Security_Demo.ipynb          # Demo with synthetic data
│   ├── Kaggle_Fraud_Security_Demo.ipynb   # Demo with real Kaggle dataset
│   └── fraud_detection_demo.py            # Standalone Python script
└── data/
    └── bank_transactions_data_2.csv       # Kaggle bank transactions dataset

Notebooks

1. `Fraud_Security_Demo.ipynb` — Synthetic Data Demo

Uses 5,000 programmatically generated bank check transactions with 5 features:

Amount, Account Age, Bounced Checks, Distance, Signature Match

The top 20% riskiest transactions (based on a composite risk equation) are labeled as fraud. This controlled environment lets us isolate and clearly demonstrate the attack/defense mechanics without noisy real-world data.
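The labeling scheme described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the feature distributions, their units, and the composite risk formula are all assumptions made up for the example. Only the 5,000 rows, the five feature names, and the top-20% rule come from the description.

```python
import numpy as np

def make_synthetic_checks(n=5000, fraud_fraction=0.20, seed=0):
    """Generate n synthetic check transactions and label the riskiest ones as fraud."""
    rng = np.random.default_rng(seed)
    # All distributions below are hypothetical stand-ins.
    amount = rng.lognormal(mean=6.0, sigma=1.0, size=n)    # Amount
    account_age = rng.uniform(0.0, 20.0, size=n)           # Account Age
    bounced = rng.poisson(lam=0.5, size=n).astype(float)   # Bounced Checks
    distance = rng.exponential(scale=50.0, size=n)         # Distance
    signature = rng.uniform(0.0, 1.0, size=n)              # Signature Match
    X = np.column_stack([amount, account_age, bounced, distance, signature])

    # Hypothetical composite risk: z-score each feature, add the ones assumed
    # to raise risk, subtract the ones assumed to lower it.
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    risk = z[:, 0] - z[:, 1] + z[:, 2] + z[:, 3] - z[:, 4]

    # Label the top fraud_fraction of risk scores as fraud (1), the rest as 0.
    threshold = np.quantile(risk, 1.0 - fraud_fraction)
    y = (risk > threshold).astype(np.int64)
    return X.astype(np.float32), y

X, y = make_synthetic_checks()
print(X.shape, y.mean())
```

Because the label is a deterministic function of the features, a small network can learn it almost perfectly, which is what makes the later accuracy collapse under attack easy to see.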

2. `Kaggle_Fraud_Security_Demo.ipynb` — Real Data Demo

Uses a real Kaggle bank transactions dataset (bank_transactions_data_2.csv, 2,512 samples, 16 columns). Since the dataset has no explicit fraud label, a heuristic risk score is derived from numeric features and the top 20% are flagged as fraud. This notebook includes:

Full EDA on all features (distributions, correlations, category breakdowns).
The same attack/defense pipeline applied to messier, real-world data.
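One way such a heuristic label could be derived with pandas is sketched below. The scoring rule (sum of absolute z-scores) and the toy column names are assumptions for illustration; the notebook's actual risk score may weight features differently.

```python
import pandas as pd

def add_heuristic_fraud_label(df, fraud_fraction=0.20):
    """Return a copy of df with 'risk_score' and 'is_fraud' columns added."""
    numeric = df.select_dtypes(include="number")
    # Z-score each numeric column so no single unit dominates the sum.
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    out = df.copy()
    # Assumption: values far from the column mean are treated as risky.
    out["risk_score"] = z.abs().sum(axis=1)
    cutoff = out["risk_score"].quantile(1.0 - fraud_fraction)
    out["is_fraud"] = (out["risk_score"] > cutoff).astype(int)
    return out

# Toy frame standing in for bank_transactions_data_2.csv; columns are hypothetical.
toy = pd.DataFrame({
    "amount": [20.0, 35.0, 18.0, 950.0, 40.0, 25.0, 30.0, 22.0, 28.0, 33.0],
    "login_attempts": [1, 1, 1, 5, 1, 2, 1, 1, 1, 1],
    "channel": ["ATM", "Online", "ATM", "Online", "Branch",
                "ATM", "ATM", "Online", "Branch", "ATM"],
})
labeled = add_heuristic_fraud_label(toy)
print(labeled[["amount", "login_attempts", "risk_score", "is_fraud"]])
```

Non-numeric columns such as `channel` are ignored by the score but kept in the output, so they remain available for EDA.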

Model Architecture

Both notebooks use the same neural network — FraudNet:

FraudNet (PyTorch MLP)
├── Linear(input_features → 32) + ReLU
├── Linear(32 → 16) + ReLU
└── Linear(16 → 2)  [binary classification output]

Input: Scaled numeric features (StandardScaler applied).
Output: 2 logits, one per class (Legitimate vs Fraudulent); CrossEntropyLoss applies the softmax internally.
Optimizer: Adam (lr=0.01)
Loss: CrossEntropyLoss
Training: Mini-batch (batch_size=64), 25 epochs for standard model.
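A minimal PyTorch sketch of the architecture and training settings listed above. The class body is reconstructed from the layer list, so attribute names such as `layers` are assumptions, and the random batch stands in for real scaled features.

```python
import torch
import torch.nn as nn

class FraudNet(nn.Module):
    """Three-layer MLP: input_features -> 32 -> 16 -> 2 logits."""

    def __init__(self, input_features: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_features, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 2),  # raw logits; no softmax layer here
        )

    def forward(self, x):
        return self.layers(x)

model = FraudNet(input_features=5)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One mini-batch step (batch_size=64) on random stand-in data.
x_batch = torch.randn(64, 5)
y_batch = torch.randint(0, 2, (64,))
optimizer.zero_grad()
logits = model(x_batch)
loss = criterion(logits, y_batch)
loss.backward()
optimizer.step()
print(tuple(logits.shape), loss.item())
```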

Attack: Fast Gradient Sign Method (FGSM)

Type: White-box evasion attack (the attacker has access to model gradients).

How it works:

Compute the gradient of the loss function with respect to the input features.
Shift each input feature by a fixed step (eps) in the direction given by the sign of that gradient, which is the direction that increases the loss.
The result: inputs that stay within eps of the original on every scaled feature but can cause the model to misclassify.

Configuration used: eps = 0.35 (Fraud_Security_Demo) and multiple epsilon values [0.1, 0.25, 0.35, 0.5] (Kaggle notebook adversarial training).

Why FGSM? It is fast, well understood, and needs only a single gradient step, so it represents a low bar of attacker sophistication. A model that cannot withstand FGSM is unlikely to survive stronger iterative attacks (PGD, C&W, etc.).
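The update rule is `x_adv = x + eps * sign(gradient of loss w.r.t. x)`. The sketch below applies it to a tiny logistic-regression model in NumPy, where the input gradient has a closed form. The model, weights, and inputs are made up for illustration; in the notebooks ART computes the gradient through FraudNet instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    """Per-sample binary cross-entropy of a logistic model."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, b, x, y, eps):
    """One FGSM step: move every feature by eps along the sign of dLoss/dx."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # closed form for logistic regression
    return x + eps * np.sign(grad_x)

# Hypothetical weights and two scaled input rows.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([[0.2, -0.4, 1.0], [-1.0, 0.3, 0.0]])
y = np.array([1.0, 0.0])

x_adv = fgsm(w, b, x, y, eps=0.35)
print("max change per feature:", np.abs(x_adv - x).max())
print("loss before:", bce_loss(w, b, x, y))
print("loss after: ", bce_loss(w, b, x_adv, y))
```

Every feature moves by exactly eps, and the loss rises on both rows. That is the whole attack: one gradient evaluation and one signed step.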

Standard Model vs Secured Model

Standard Model (Vulnerable)

Trained only on clean, historical data.
Optimizes only for accuracy on normal inputs, with nothing in training that penalizes wrong answers on slightly perturbed ones.
Creates rigid, brittle decision boundaries with exploitable blind spots.
Result: Achieves ~98% accuracy on clean data but collapses to ~15-20% under FGSM attack.

Secured Model (Adversarially Trained)

Trained on both clean data AND adversarial examples generated during training.
The training loop iteratively:
1. Wraps the model in ART's PyTorchClassifier.
2. Generates FGSM adversarial examples against the model's current state.
3. Combines clean + adversarial data and retrains.
4. Repeats for multiple rounds (3-5 rounds with varying epsilon strengths).
Forces the model to smooth out decision boundaries and eliminate blind spots.
Result: Maintains ~85-92% accuracy under the same FGSM attack while conceding only ~1-2 percentage points of clean accuracy.

How We Test Both Models

Both models face the exact same evaluation protocol:

Test	What it measures
Clean Accuracy	Model performance on unmodified test data (baseline capability)
Adversarial Accuracy	Model performance on FGSM-perturbed test data (attack resilience)
Accuracy Drop	Difference between clean and adversarial accuracy (vulnerability magnitude)

The attack is generated specifically targeting each model — the robust model isn't just tested on the standard model's adversarial examples. ART generates a fresh FGSM attack using the robust model's own gradients, making it a fair and rigorous test.
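The three metrics reduce to simple arithmetic over predictions. The helper below is a hypothetical sketch (the function name and the toy arrays are invented). In the notebooks, `clean_pred` would come from a model on the unmodified test set and `adv_pred` from the same model on FGSM examples generated against that model.

```python
import numpy as np

def robustness_report(y_true, clean_pred, adv_pred):
    """Clean accuracy, adversarial accuracy, and the drop in percentage points."""
    clean_acc = float(np.mean(clean_pred == y_true))
    adv_acc = float(np.mean(adv_pred == y_true))
    return {
        "clean_accuracy": clean_acc,
        "adversarial_accuracy": adv_acc,
        "accuracy_drop_pp": 100.0 * (clean_acc - adv_acc),
    }

# Toy labels and predictions, made up purely to exercise the function.
y_true     = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])
clean_pred = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 1])
adv_pred   = np.array([1, 0, 1, 0, 1, 0, 0, 0, 1, 1])

report = robustness_report(y_true, clean_pred, adv_pred)
print(report)
```

Running the same function once per model, each with its own adversarial predictions, gives the side-by-side comparison.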

Results Summary

Metric	Standard AI	Secured AI
Clean Accuracy	~98%	~96-98%
Accuracy Under FGSM Attack	~15-20% (catastrophic)	~85-92% (resilient)
Accuracy Drop	~78-83 pp	~4-10 pp

Key insight: The standard model is worse than a coin flip under attack — it's being actively steered toward wrong decisions. The secured model maintains operational integrity.

Business Implications

Both notebooks include a dynamic "Business Impact" cell that computes real-world implications from the actual run results:

Fraud leakage reduction: ~74-85% fewer missed fraudulent transactions under attack.
Robustness trade-off: Conceding ~1-2 percentage points of clean accuracy shrinks the accuracy drop under attack from ~80 pp to ~4-10 pp.
Operational scenario: For a bank processing 10,000 daily transactions (15% fraud rate), this is the difference between ~500+ missed frauds per day (standard) and ~100-150 (secured).
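The arithmetic behind the operational scenario is sketched below. The function name and the specific missed-fraud counts plugged in are illustrative values taken from within the quoted ranges; the notebooks' Business Impact cell computes its figures from the actual run results.

```python
def leakage_reduction(missed_standard, missed_secured):
    """Fraction of missed frauds eliminated by switching to the secured model."""
    return 1.0 - missed_secured / missed_standard

daily_transactions = 10_000
fraud_rate = 0.15
daily_frauds = round(daily_transactions * fraud_rate)
print("fraudulent transactions per day:", daily_frauds)

# Illustrative missed-fraud counts picked from the ranges quoted above.
for missed_standard, missed_secured in [(500, 150), (500, 125), (500, 100)]:
    reduction = leakage_reduction(missed_standard, missed_secured)
    print(f"{missed_standard} -> {missed_secured} missed: {reduction:.0%} fewer")
```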

Setup & Installation

# Clone the repository
git clone https://github.com/karthik-dataiq/model_security_study.git

# Set up the project
cd model_security_study
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Requirements

Python 3.10+
PyTorch
adversarial-robustness-toolbox (ART v1.17+)
scikit-learn
pandas, numpy
plotly (interactive charts)
matplotlib, seaborn (legacy)

Exporting Notebooks as HTML (for Presentations)

Both notebooks can be exported as standalone, self-contained HTML files with fully interactive Plotly charts — no Jupyter or Python required to view them.

# Activate virtual environment
source .venv/bin/activate

# Install export dependencies (first time only)
pip install nbconvert ipykernel
python -m ipykernel install --user --name python3

# Export both notebooks (--execute re-runs all cells to capture fresh output)
jupyter nbconvert --to html --execute code/Fraud_Security_Demo.ipynb --output-dir=exports --ExecutePreprocessor.timeout=300
jupyter nbconvert --to html --execute code/Kaggle_Fraud_Security_Demo.ipynb --output-dir=exports --ExecutePreprocessor.timeout=300

The --execute flag matters: it re-runs every cell so the charts are rendered with the notebook renderer (pio.renderers.default = "notebook"), which embeds the full Plotly JavaScript directly into the HTML output (~5 MB per file).

Output files:

exports/Fraud_Security_Demo.html
exports/Kaggle_Fraud_Security_Demo.html

Open in any browser for full interactivity (hover tooltips, zoom, pan).

Alternatively, in VS Code: Ctrl+Shift+P → "Jupyter: Export to HTML".

Key Takeaway

ART provides a practical, framework-agnostic way to stress-test and harden ML models before deployment. For any model operating in a critical domain (finance, healthcare, autonomous systems), adversarial robustness is not optional; it is a deployment requirement. This project demonstrates that, at a small cost in clean accuracy (~1-2 percentage points) and a few extra rounds of training, a model can become dramatically more resilient to gradient-based evasion attacks.
