# Day 71: Model Extraction Defense

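In a model extraction attack, an adversary repeatedly queries a deployed model and uses the returned predictions to train a 'shadow model' that replicates its behavior. The copy compromises the owner's intellectual property, and because the attacker can run it without any serving-side controls, it can also be used to bypass safety guardrails.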
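
To make the threat concrete, here is a minimal sketch of an extraction attack against a toy model. Everything in it is an assumption made for illustration: the victim is a hypothetical two-feature logistic regression (`secret_w`, `secret_b`, `victim_predict`), and the attacker sees the full predicted probability for each query.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical victim: a logistic-regression model with secret parameters.
secret_w, secret_b = np.array([2.0, -3.0]), 0.5

def victim_predict(X):
    """Return P(class 1) for each row of X, as a prediction API might."""
    return 1.0 / (1.0 + np.exp(-(X @ secret_w + secret_b)))

# Attacker: send random queries and record the returned probabilities.
queries = rng.normal(size=(500, 2))
p = victim_predict(queries)

# The log-odds are linear in the inputs, so least squares recovers the parameters.
logits = np.log(p) - np.log1p(-p)
design = np.hstack([queries, np.ones((len(queries), 1))])
stolen, *_ = np.linalg.lstsq(design, logits, rcond=None)

print("recovered [w1, w2, b]:", np.round(stolen, 4))
```

In this toy setting, exact probabilities are enough to recover the parameters from a few hundred queries. The defenses in this lab aim at the two things the attacker relies on here: unrestricted query volume and unperturbed outputs.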

In this lab, we implement:
1. **Rate Limiting**: Throttling users who query too frequently.
2. **Watermarking**: Deterministically perturbing outputs so they carry a 'fingerprint'.
3. **Query Monitoring**: Flagging queries whose predictions have high entropy, which indicates they sit near a decision boundary.

In [None]:
import sys
import os
import numpy as np

# Add the repository root (two levels up) to sys.path so that `src` is importable
sys.path.append(os.path.abspath('../../'))

from src.security.extraction_defense import ExtractionDefender, QueryMonitor

## 1. Rate Limiting

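We simulate a user attempting to scrape the model: the defender is configured with `rate_limit=3` and `window_seconds=10`, and the user sends five back-to-back queries.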

In [None]:
defender = ExtractionDefender(rate_limit=3, window_seconds=10)
user_id = "scraper_bot"

for i in range(5):
    allowed = defender.check_rate_limit(user_id)
    status = "ALLOWED" if allowed else "BLOCKED"
    print(f"Query {i+1}: {status}")
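
The source of `ExtractionDefender` is not shown in this notebook. As a rough illustration of the idea, the sketch below implements a sliding-window limiter; the class name `SlidingWindowLimiter`, its `check` method, and the injectable `clock` parameter are hypothetical and not part of `src.security.extraction_defense`.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `rate_limit` queries per user in any `window_seconds` span."""

    def __init__(self, rate_limit=3, window_seconds=10.0, clock=time.monotonic):
        self.rate_limit = rate_limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._hits = defaultdict(deque)  # user_id -> timestamps of allowed queries

    def check(self, user_id):
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.rate_limit:
            return False  # over the limit: block and do not record the attempt
        hits.append(now)
        return True

limiter = SlidingWindowLimiter(rate_limit=3, window_seconds=10)
results = [limiter.check("scraper_bot") for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

One design choice in this sketch is that blocked attempts are not recorded, so a throttled user regains access once their earlier allowed queries age out of the window. Counting blocked attempts as well would penalize persistent scrapers more heavily.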

## 2. Output Watermarking

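We add a small, deterministic bias to the predicted probabilities. A shadow model trained on these outputs would be expected to inherit the bias, which then serves as the fingerprint. The last line of the cell prints the per-entry shift so its size can be inspected.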

In [None]:
probs = np.array([[0.9, 0.1], [0.4, 0.6]])
watermarked = defender.apply_watermark(probs)

print("Original Probs:\n", probs)
print("Watermarked Probs:\n", watermarked)
print("Shift:", watermarked - probs)
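
How `apply_watermark` computes its perturbation is not shown here. One possible scheme, sketched below under stated assumptions, derives a fixed per-class bias from a secret key, adds it to every row, and renormalizes. The function name `apply_watermark_sketch` and the parameters `secret_key` and `strength` are hypothetical.

```python
import numpy as np

def apply_watermark_sketch(probs, secret_key=1234, strength=0.01):
    """Add a small key-derived bias to each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    # The same key always yields the same per-class bias (deterministic).
    rng = np.random.default_rng(secret_key)
    bias = rng.uniform(-1.0, 1.0, size=probs.shape[1])
    bias -= bias.mean()  # zero-mean, so rows still sum to 1 before clipping
    shifted = np.clip(probs + strength * bias, 1e-12, None)
    # Renormalize in case clipping changed a row's total.
    return shifted / shifted.sum(axis=1, keepdims=True)

probs = np.array([[0.9, 0.1], [0.4, 0.6]])
marked = apply_watermark_sketch(probs)
print("Shift:\n", marked - probs)
```

With `strength=0.01` each probability moves by at most 0.01 in this example, so the predicted class is unchanged, while anyone holding the key can recompute the bias and test a suspect model's outputs for it.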

## 3. Query Suspicion (Decision Boundaries)

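Adversaries often concentrate their queries near the decision boundary ($P \approx 0.5$ for a binary classifier), where the model's outputs reveal the most about where that boundary lies. Predictions in this region have high entropy, so we use the entropy of the output probabilities as a suspicion signal.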

In [None]:
monitor = QueryMonitor()

# Case A: Benign queries (model outputs are high-confidence)
safe_batch = np.array([[0.99, 0.01], [0.98, 0.02]])

# Case B: Extraction-style queries (model outputs sit near the boundary)
extraction_batch = np.array([[0.51, 0.49], [0.48, 0.52]])

print(f"Safe Batch Suspicion: {monitor.estimate_query_suspicion(safe_batch):.4f}")
print(f"Extraction Batch Suspicion: {monitor.estimate_query_suspicion(extraction_batch):.4f}")
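
The scoring rule inside `estimate_query_suspicion` is not shown in this notebook. A plausible entropy-based score, sketched below as an assumption, is the mean Shannon entropy of each row divided by $\log K$ for $K$ classes, which gives 0 for fully confident predictions and 1 for uniform ones. The function name `suspicion_sketch` is hypothetical.

```python
import numpy as np

def suspicion_sketch(probs, eps=1e-12):
    """Mean normalized entropy of a batch of predicted class probabilities."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    entropy = -(probs * np.log(probs)).sum(axis=1)  # Shannon entropy per row
    return float(entropy.mean() / np.log(probs.shape[1]))  # scale to [0, 1]

safe_score = suspicion_sketch([[0.99, 0.01], [0.98, 0.02]])
extraction_score = suspicion_sketch([[0.51, 0.49], [0.48, 0.52]])
print(f"safe: {safe_score:.4f}, extraction: {extraction_score:.4f}")
```

Under this rule the boundary-hugging batch scores close to 1 and the confident batch scores close to 0. In practice a single uncertain query is not evidence of an attack, so a score like this is more useful when aggregated per user over many queries.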