Model Security Study: Adversarial Robustness Toolbox (ART)

Author: Karthik Nakkeeran


Purpose

This project is a hands-on study of IBM's Adversarial Robustness Toolbox (ART) — an open-source library for securing machine learning models against adversarial threats.

The goal is to understand:

  • What ART does and how it works in practice.
  • How adversarial attacks can exploit standard ML models deployed in critical domains like financial fraud detection.
  • How adversarial training (one of ART's defense mechanisms, and the one studied here) makes models more resilient to these attacks.
  • Whether the robustness trade-offs are acceptable for real-world deployment.

Project Structure

model_security_study/
├── README.md
├── requirements.txt
├── code/
│   ├── Fraud_Security_Demo.ipynb          # Demo with synthetic data
│   ├── Kaggle_Fraud_Security_Demo.ipynb   # Demo with real Kaggle dataset
│   └── fraud_detection_demo.py            # Standalone Python script
└── data/
    └── bank_transactions_data_2.csv       # Kaggle bank transactions dataset

Notebooks

1. Fraud_Security_Demo.ipynb — Synthetic Data Demo

Uses 5,000 programmatically generated bank check transactions with 5 features:

  • Amount, Account Age, Bounced Checks, Distance, Signature Match

The top 20% riskiest transactions (based on a composite risk equation) are labeled as fraud. This controlled environment lets us isolate and clearly demonstrate the attack/defense mechanics without noisy real-world data.
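The labeling step can be sketched as follows. This is a minimal illustration, not the notebook's code: the feature distributions and the equal-weight risk formula are assumptions made up for the example, since the README does not state the actual composite risk equation.

```python
import numpy as np

def make_synthetic_checks(n=5000, fraud_fraction=0.20, seed=0):
    """Generate n synthetic check transactions and label the riskiest fraction as fraud."""
    rng = np.random.default_rng(seed)
    amount = rng.lognormal(mean=6.0, sigma=1.0, size=n)   # check amount (assumed distribution)
    account_age = rng.uniform(0.0, 20.0, size=n)          # years
    bounced = rng.poisson(0.5, size=n)                    # prior bounced checks
    distance = rng.exponential(50.0, size=n)              # distance from home branch
    signature = rng.uniform(0.0, 1.0, size=n)             # 1.0 = perfect signature match

    X = np.column_stack([amount, account_age, bounced, distance, signature])

    # Hypothetical composite risk: standardize each feature, add the ones that
    # raise risk, subtract the ones that lower it.
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    risk = z[:, 0] - z[:, 1] + z[:, 2] + z[:, 3] - z[:, 4]

    # Everything above the (1 - fraud_fraction) quantile is labeled fraud.
    threshold = np.quantile(risk, 1.0 - fraud_fraction)
    y = (risk > threshold).astype(np.int64)
    return X.astype(np.float32), y

X, y = make_synthetic_checks()
```

Because the label is a deterministic function of the features, a small MLP can learn it almost perfectly, which is what makes the later accuracy collapse under attack easy to attribute to the perturbation rather than to label noise.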

2. Kaggle_Fraud_Security_Demo.ipynb — Real Data Demo

Uses a real Kaggle bank transactions dataset (bank_transactions_data_2.csv, 2,512 samples, 16 columns). Since the dataset has no explicit fraud label, a heuristic risk score is derived from numeric features and the top 20% are flagged as fraud. This notebook includes:

  • Full EDA on all features (distributions, correlations, category breakdowns).
  • The same attack/defense pipeline applied to messier, real-world data.
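For the real dataset, a heuristic label of this kind could be derived roughly as below. The equal-weight z-score sum and the column names in the toy frame are assumptions for illustration; the notebook's actual heuristic may select or weight features differently.

```python
import pandas as pd

def label_top_risk(df, fraud_fraction=0.20):
    """Add 'risk_score' and 'is_fraud' columns derived from the numeric features."""
    numeric = df.select_dtypes(include="number")
    # Assumed heuristic: equal-weight sum of per-column z-scores.
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    out = df.copy()
    out["risk_score"] = z.sum(axis=1)
    cutoff = out["risk_score"].quantile(1.0 - fraud_fraction)
    out["is_fraud"] = (out["risk_score"] > cutoff).astype(int)
    return out

# Hypothetical five-row frame standing in for bank_transactions_data_2.csv.
toy = pd.DataFrame({
    "TransactionAmount": [20.0, 35.0, 50.0, 900.0, 40.0],
    "LoginAttempts": [1, 1, 1, 5, 1],
    "Channel": ["ATM", "Online", "Branch", "Online", "ATM"],
})
labeled = label_top_risk(toy)
```

Non-numeric columns such as the channel are ignored by the score but kept in the output, so they remain available for EDA.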

Model Architecture

Both notebooks use the same neural network — FraudNet:

FraudNet (PyTorch MLP)
├── Linear(input_features → 32) + ReLU
├── Linear(32 → 16) + ReLU
└── Linear(16 → 2)  [binary classification output]
  • Input: Scaled numeric features (StandardScaler applied).
  • Output: Two logits (Legitimate vs Fraudulent); CrossEntropyLoss applies the softmax internally.
  • Optimizer: Adam (lr=0.01)
  • Loss: CrossEntropyLoss
  • Training: Mini-batch (batch_size=64), 25 epochs for standard model.
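A PyTorch module matching the diagram might look like the sketch below. It is reconstructed from the layer sizes and hyperparameters listed above, not copied from the notebooks, so attribute names such as `layers` are assumptions.

```python
import torch
import torch.nn as nn

class FraudNet(nn.Module):
    """Small MLP: input_features -> 32 -> 16 -> 2 logits."""

    def __init__(self, input_features: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_features, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 2),  # raw logits, one per class
        )

    def forward(self, x):
        return self.layers(x)

model = FraudNet(input_features=5)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One forward pass on a batch of 64 scaled transactions (random stand-in data).
logits = model(torch.randn(64, 5))
```

Returning logits rather than probabilities is what `nn.CrossEntropyLoss` expects; apply `torch.softmax(logits, dim=1)` only when class probabilities are needed for reporting.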

Attack: Fast Gradient Sign Method (FGSM)

Type: White-box evasion attack (the attacker has access to model gradients).

How it works:

  1. Compute the gradient of the loss function with respect to the input features.
  2. Perturb each input feature by a fixed step (eps) in the direction given by the sign of that gradient, i.e. the direction that increases the loss.
  3. The result: inputs that look nearly identical to the original data but cause the model to misclassify.

Configuration used: eps = 0.35 (Fraud_Security_Demo) and multiple epsilon values [0.1, 0.25, 0.35, 0.5] (Kaggle notebook adversarial training).
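The three steps above reduce to a single update, x_adv = x + eps * sign(dLoss/dx). The sketch below applies it to a toy logistic-regression model rather than FraudNet, so that the input gradient has a closed form and no autograd is needed. The weights, bias, and sample are made-up values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on one sample."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, b, x, y, eps):
    """One FGSM step: move every feature by eps in the loss-increasing direction."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w              # closed-form gradient of the loss w.r.t. x
    return x + eps * np.sign(grad_x)

w = np.array([1.5, -2.0, 0.5])        # hypothetical model weights
b = -0.6
x = np.array([0.2, -0.4, 1.0])        # one scaled transaction, true label = fraud
y = 1.0

x_adv = fgsm(w, b, x, y, eps=0.35)
p_clean = sigmoid(x @ w + b)
p_adv = sigmoid(x_adv @ w + b)
```

Here the predicted fraud probability falls from about 0.73 to about 0.40, so the transaction flips from "fraud" to "legitimate" even though no feature moved by more than 0.35 in scaled units.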

Why FGSM? It's fast, well-understood, and represents the minimum sophistication an attacker needs. If a model can't withstand a single-step attack like FGSM, it is very unlikely to survive stronger attacks (PGD, C&W, etc.).


Standard Model vs Secured Model

Standard Model (Vulnerable)

  • Trained only on clean, historical data.
  • Optimizes only for accuracy on normal inputs, with no incentive to behave sensibly on inputs just off the training distribution.
  • Creates rigid, brittle decision boundaries with exploitable blind spots.
  • Result: Achieves ~98% accuracy on clean data but collapses to ~15-20% under FGSM attack.

Secured Model (Adversarially Trained)

  • Trained on both clean data AND adversarial examples generated during training.
  • The training loop iteratively:
    1. Wraps the model in ART's PyTorchClassifier.
    2. Generates FGSM adversarial examples against the model's current state.
    3. Combines clean + adversarial data and retrains.
    4. Repeats for multiple rounds (3-5 rounds with varying epsilon strengths).
  • Forces the model to smooth out decision boundaries and eliminate blind spots.
  • Result: Maintains ~85-92% accuracy even under the same FGSM attack, while conceding only ~1-2% on clean data.

How We Test Both Models

Both models face the exact same evaluation protocol:

| Test | What it measures |
| --- | --- |
| Clean Accuracy | Model performance on unmodified test data (baseline capability) |
| Adversarial Accuracy | Model performance on FGSM-perturbed test data (attack resilience) |
| Accuracy Drop | Difference between clean and adversarial accuracy (vulnerability magnitude) |

The attack is generated specifically targeting each model — the robust model isn't just tested on the standard model's adversarial examples. ART generates a fresh FGSM attack using the robust model's own gradients, making it a fair and rigorous test.
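The protocol amounts to the small helper below. It is a sketch with hypothetical names: `predict` and `attack` are plain callables, and the toy threshold model at the bottom stands in for FraudNet so the example runs without PyTorch or ART.

```python
import numpy as np

def evaluate_robustness(predict, attack, x_test, y_test):
    """Return clean accuracy, adversarial accuracy, and the drop in percentage points.

    predict: maps an (n, d) array to (n,) predicted class indices.
    attack:  maps an (n, d) array to an adversarial (n, d) array crafted
             against the same model that `predict` wraps.
    """
    clean_acc = float(np.mean(predict(x_test) == y_test))
    adv_acc = float(np.mean(predict(attack(x_test)) == y_test))
    return {
        "clean_accuracy": clean_acc,
        "adversarial_accuracy": adv_acc,
        "accuracy_drop_pp": 100.0 * (clean_acc - adv_acc),
    }

# Toy model: class 1 if the first feature is positive.
def toy_predict(x):
    return (x[:, 0] > 0).astype(int)

# Toy attack: push the first feature 0.35 toward the decision boundary.
def toy_attack(x):
    x_adv = x.copy()
    x_adv[:, 0] -= 0.35 * np.sign(x[:, 0])
    return x_adv

x_test = np.array([[-1.0], [-0.2], [0.2], [1.0]])
y_test = np.array([0, 0, 1, 1])
report = evaluate_robustness(toy_predict, toy_attack, x_test, y_test)
```

With an ART-wrapped model, `predict` would typically be `lambda x: classifier.predict(x).argmax(axis=1)` and `attack` the `generate` method of a `FastGradientMethod` built on that same classifier, which is what keeps the comparison fair for both models.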


Results Summary

| Metric | Standard AI | Secured AI |
| --- | --- | --- |
| Clean Accuracy | ~98% | ~96-98% |
| Accuracy Under FGSM Attack | ~15-20% (catastrophic) | ~85-92% (resilient) |
| Accuracy Drop | ~78-83 pp | ~4-10 pp |

Key insight: The standard model is worse than a coin flip under attack — it's being actively steered toward wrong decisions. The secured model maintains operational integrity.


Business Implications

Both notebooks include a dynamic "Business Impact" cell that computes real-world implications from the actual run results:

  • Fraud leakage reduction: ~74-85% fewer missed fraudulent transactions under attack.
  • Robustness trade-off: Conceding ~1-2% clean accuracy shrinks the accuracy drop under attack from ~80 pp to ~4-10 pp.
  • Operational scenario: For a bank processing 10,000 daily transactions (15% fraud rate), the standard model misses ~500+ frauds per day under attack, versus ~100-150 for the secured model.
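The arithmetic behind the operational scenario can be reproduced as below. The two recall values are hypothetical, picked only so the results land inside the ranges quoted above; the notebooks compute them from the actual run.

```python
def missed_frauds(daily_transactions, fraud_rate, fraud_recall_under_attack):
    """Expected number of fraudulent transactions that slip through per day."""
    frauds = daily_transactions * fraud_rate
    return frauds * (1.0 - fraud_recall_under_attack)

def leakage_reduction(missed_standard, missed_secured):
    """Fractional reduction in missed frauds when moving to the secured model."""
    return (missed_standard - missed_secured) / missed_standard

# 10,000 transactions/day at a 15% fraud rate = 1,500 fraudulent transactions.
missed_std = missed_frauds(10_000, 0.15, fraud_recall_under_attack=0.66)  # hypothetical recall
missed_sec = missed_frauds(10_000, 0.15, fraud_recall_under_attack=0.92)  # hypothetical recall
reduction = leakage_reduction(missed_std, missed_sec)
```

With these inputs the standard model misses about 510 frauds per day and the secured model about 120, a reduction of roughly 76%.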

Setup & Installation

# Clone the repository
git clone https://github.com/karthik-dataiq/model_security_study.git

# Set up the project
cd model_security_study
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Requirements

  • Python 3.10+
  • PyTorch
  • adversarial-robustness-toolbox (ART v1.17+)
  • scikit-learn
  • pandas, numpy
  • plotly (interactive charts)
  • matplotlib, seaborn (legacy)

Exporting Notebooks as HTML (for Presentations)

Both notebooks can be exported as standalone, self-contained HTML files with fully interactive Plotly charts — no Jupyter or Python required to view them.

# Activate virtual environment
source .venv/bin/activate

# Install export dependencies (first time only)
pip install nbconvert ipykernel
python -m ipykernel install --user --name python3

# Export both notebooks (--execute re-runs all cells to capture fresh output)
jupyter nbconvert --to html --execute code/Fraud_Security_Demo.ipynb --output-dir=exports --ExecutePreprocessor.timeout=300
jupyter nbconvert --to html --execute code/Kaggle_Fraud_Security_Demo.ipynb --output-dir=exports --ExecutePreprocessor.timeout=300

The --execute flag is important: it re-runs every cell, so the Plotly charts are rendered fresh with the notebook renderer (pio.renderers.default = "notebook"), which embeds the full Plotly JavaScript directly into the HTML output (~5 MB per file).

Output files:

  • exports/Fraud_Security_Demo.html
  • exports/Kaggle_Fraud_Security_Demo.html

Open in any browser for full interactivity (hover tooltips, zoom, pan).

Alternatively, in VS Code: Ctrl+Shift+P → "Jupyter: Export to HTML".


Key Takeaway

ART provides a practical, framework-agnostic way to stress-test and harden ML models before deployment. For any model operating in a critical domain (finance, healthcare, autonomous systems), adversarial robustness is not optional; it is a deployment requirement. This project demonstrates that, at a small cost in clean accuracy (~1-2%), a model can become dramatically more resilient to gradient-based evasion attacks such as FGSM.
