<a href="https://colab.research.google.com/github/kk25081998/Modelguard/blob/main/examples/1-Detect-Bad-Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🛡️ ModelGuard Quick Start: Detect Bad Models

Welcome to ModelGuard! This notebook demonstrates how to protect your ML applications from malicious model files.

**What you'll learn:**
- How malicious ML models can execute arbitrary code
- How to detect and block dangerous models
- Safe loading techniques for PyTorch and scikit-learn
- Real-world security best practices

**No setup required** - this runs entirely in Google Colab!

## 📦 Installation

First, let's install ModelGuard and the ML frameworks we'll use for demonstration:

In [None]:
# Install ModelGuard and dependencies
!pip install ml-modelguard torch scikit-learn --quiet

print("✅ Installation complete!")

## 🚨 The Problem: Malicious Models

ML models saved with Python's `pickle` format are not plain data: a pickle can instruct the loader to call arbitrary Python callables, so attacker-chosen code runs the moment the file is loaded. Let's create a "malicious" model to demonstrate this risk:

In [None]:
import pickle
import tempfile
import os
from pathlib import Path

# Create a malicious model that executes code when loaded
class MaliciousModel:
    def __reduce__(self):
        # This code will execute when the model is unpickled!
        return (print, ("🚨 DANGER: Malicious code executed! This could have been anything...",))

# Save the malicious model
malicious_path = "/tmp/malicious_model.pkl"
with open(malicious_path, 'wb') as f:
    pickle.dump(MaliciousModel(), f)

print(f"Created malicious model at: {malicious_path}")
print("⚠️  This model will execute code when loaded with standard pickle.load()")
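
To make the mechanism concrete: `pickle` stores the callable and arguments returned by `__reduce__`, and the unpickler calls that callable to rebuild the object. The sketch below uses a harmless callable (`len`) and a hypothetical class named `Demo` to show that loading returns whatever the callable returns, not an instance of the original class.

```python
import pickle


class Demo:
    """Hypothetical class used only for this illustration."""

    def __reduce__(self):
        # pickle records "call len('hello')" instead of the object's state.
        return (len, ("hello",))


payload = pickle.dumps(Demo())

# Unpickling calls len("hello") and hands back its result.
restored = pickle.loads(payload)
print(type(restored).__name__, restored)
```

Swapping `len` for a callable such as `os.system` is all an attacker needs, which is why the file itself must be treated as untrusted code.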

### Demonstrating the Danger

Let's see what happens when we load this model with standard Python pickle:

In [None]:
# WARNING: This demonstrates unsafe loading - DON'T do this with untrusted models!
print("Loading malicious model with standard pickle...")

with open(malicious_path, 'rb') as f:
    dangerous_model = pickle.load(f)  # Code executes here; load() returns print()'s result (None)

print("\n💀 As you can see, the malicious code executed automatically!")
print("   In a real attack, this could:")
print("   - Steal your data")
print("   - Install malware")
print("   - Access your cloud credentials")
print("   - Anything the attacker wants!")

## ✅ The Solution: ModelGuard

Now let's see how ModelGuard protects you from these attacks:

In [None]:
from modelguard.core.scanner import ModelScanner
from pathlib import Path

# Scan the malicious model
scanner = ModelScanner()
result = scanner.scan_file(Path(malicious_path))

print("🔍 ModelGuard Scan Results:")
print(f"   Is Safe: {result.is_safe}")
print(f"   Threats Found: {len(result.threats)}")
print(f"   Threat Details: {result.threats}")

if not result.is_safe:
    print("\n🛡️ ModelGuard flagged the malicious content before anything was loaded!")
else:
    print("\n⚠️  Model appears safe (this shouldn't happen with our malicious example)")
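
ModelGuard's scanning internals are not shown in this notebook. As a rough illustration of static scanning in general, the sketch below uses the standard library's `pickletools.genops` to list the globals a pickle would import, without executing it. The helper `find_imports` and the `SUSPICIOUS` set are hypothetical names for this example, and a real scanner applies far more thorough rules.

```python
import pickle
import pickletools

# Hypothetical denylist for illustration; "print" is included only because
# the demo payload uses it. A real scanner needs a much broader policy.
SUSPICIOUS = {("builtins", "print"), ("builtins", "eval"), ("builtins", "exec"),
              ("os", "system"), ("posix", "system"), ("nt", "system")}


def find_imports(data: bytes) -> set:
    """Return (module, name) pairs referenced by GLOBAL/STACK_GLOBAL opcodes."""
    found = set()
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: assume the two most recent strings are module and name.
            found.add((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


class Payload:
    def __reduce__(self):
        return (print, ("executed on load",))


imports = find_imports(pickle.dumps(Payload()))
flagged = imports & SUSPICIOUS
print(f"Imports: {imports}")
print(f"Flagged: {flagged}")
```

Because the opcodes are only read and never executed, the payload's `print` call does not run during this inspection.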

## 🔒 Safe Loading with ModelGuard

Let's try to load the malicious model using ModelGuard's safe loading:

**Note**: ModelGuard has different enforcement modes. By default, it only logs warnings. For this demo, we'll enable strict enforcement mode to actually block malicious models.

In [None]:
from modelguard import sklearn
import os

print("Attempting to load malicious model with ModelGuard...")

# Enable enforcement mode for this demonstration
os.environ["MODELGUARD_ENFORCE"] = "true"
os.environ["MODELGUARD_SCAN_ON_LOAD"] = "true"

try:
    # This should fail safely
    model = sklearn.safe_load(malicious_path)
    print("❌ Unexpected: Model loaded (this shouldn't happen!)")
except Exception as e:
    print(f"✅ ModelGuard blocked the malicious model: {type(e).__name__}")
    print(f"   Error details: {str(e)[:100]}...")
    print("\n🛡️ Your system is protected!")
finally:
    # Clean up environment variables
    os.environ.pop("MODELGUARD_ENFORCE", None)
    os.environ.pop("MODELGUARD_SCAN_ON_LOAD", None)
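
How ModelGuard implements safe loading is not shown here. One general technique, described in the Python `pickle` documentation, is to subclass `pickle.Unpickler` and override `find_class` so that only allowlisted globals can be resolved. The sketch below follows that approach; `ALLOWED`, `AllowlistUnpickler`, and `restricted_loads` are hypothetical names for this example.

```python
import io
import pickle

# Hypothetical allowlist: only these globals may be resolved during unpickling.
ALLOWED = {("builtins", "dict"), ("builtins", "list"), ("builtins", "set")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Unpickle data, refusing any global that is not on the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (print, ("executed on load",))


# Plain containers and numbers reference no globals, so they load normally.
print(restricted_loads(pickle.dumps({"weights": [0.1, 0.2]})))

# The payload references builtins.print, which is not on the allowlist.
try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print(f"Blocked: {exc}")
```

Real model files reference many framework classes, so a practical allowlist is considerably longer than this one.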

## ✅ Working with Legitimate Models

Now let's create and work with legitimate models to show that ModelGuard doesn't interfere with normal usage:

In [None]:
import torch
import torch.nn as nn
from sklearn.linear_model import LinearRegression
import numpy as np

# Create a legitimate PyTorch model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)
    
    def forward(self, x):
        return self.linear(x)

pytorch_model = SimpleNet()
pytorch_path = "/tmp/safe_pytorch_model.pth"
torch.save(pytorch_model.state_dict(), pytorch_path)

# Create a legitimate scikit-learn model
X = np.random.randn(100, 5)
y = np.random.randn(100)
sklearn_model = LinearRegression().fit(X, y)
sklearn_path = "/tmp/safe_sklearn_model.pkl"

with open(sklearn_path, 'wb') as f:
    pickle.dump(sklearn_model, f)

print("✅ Created legitimate models:")
print(f"   PyTorch model: {pytorch_path}")
print(f"   Scikit-learn model: {sklearn_path}")

### Scanning Legitimate Models

In [None]:
# Scan the legitimate models
print("🔍 Scanning legitimate models...\n")

for name, path in [("PyTorch", pytorch_path), ("Scikit-learn", sklearn_path)]:
    result = scanner.scan_file(Path(path))
    print(f"{name} Model:")
    print(f"   ✅ Is Safe: {result.is_safe}")
    print(f"   🔍 Threats: {len(result.threats)}")
    if result.threats:
        print(f"   ⚠️  Details: {result.threats}")
    print()

### Safe Loading of Legitimate Models

In [None]:
from modelguard import torch as safe_torch
from modelguard import sklearn as safe_sklearn

print("🔒 Loading legitimate models with ModelGuard...\n")

# Load PyTorch model safely
try:
    loaded_pytorch = safe_torch.safe_load(pytorch_path)
    print("✅ PyTorch model loaded successfully!")
    print(f"   Model keys: {list(loaded_pytorch.keys())[:3]}...")
except Exception as e:
    print(f"❌ PyTorch loading failed: {e}")

# Load scikit-learn model safely
try:
    loaded_sklearn = safe_sklearn.safe_load(sklearn_path)
    print("\n✅ Scikit-learn model loaded successfully!")
    print(f"   Model type: {type(loaded_sklearn).__name__}")
    print(f"   Coefficients shape: {loaded_sklearn.coef_.shape}")
except Exception as e:
    print(f"❌ Scikit-learn loading failed: {e}")

## 🎯 Advanced: Context Manager (Recommended)

The most convenient way to use ModelGuard is with the context manager, which automatically secures calls to framework loaders such as `torch.load()` and `joblib.load()`:

In [None]:
import os

import joblib
import modelguard
import torch

print("🎯 Using ModelGuard context manager...\n")

# Set enforcement policy for this demo
os.environ["MODELGUARD_ENFORCE"] = "true"
os.environ["MODELGUARD_SCAN_ON_LOAD"] = "true"

# Everything inside this context is automatically secured
with modelguard.patched():
    print("1. Loading PyTorch model with torch.load()...")
    try:
        model = torch.load(pytorch_path)  # Automatically secured!
        print("   ✅ Success! PyTorch model loaded safely")
    except Exception as e:
        print(f"   ❌ Failed: {e}")
    
    print("\n2. Loading scikit-learn model with joblib.load()...")
    try:
        # Save model with joblib for this test
        joblib_path = "/tmp/sklearn_joblib_model.pkl"
        joblib.dump(sklearn_model, joblib_path)
        
        model = joblib.load(joblib_path)  # Automatically secured!
        print("   ✅ Success! Scikit-learn model loaded safely")
    except Exception as e:
        print(f"   ❌ Failed: {e}")

print("\n🛡️ Context manager provides seamless protection for framework loaders!")
print("💡 Note: For direct pickle.load(), use modelguard.sklearn.safe_load() instead")

# Clean up environment
os.environ.pop("MODELGUARD_ENFORCE", None)
os.environ.pop("MODELGUARD_SCAN_ON_LOAD", None)
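
For readers curious how a patching context manager can work in general, here is a minimal sketch. It is not ModelGuard's implementation: `fake_framework`, `patched_loader`, and `reject_pkl` are hypothetical stand-ins that show the usual wrap-and-restore pattern with `contextlib.contextmanager`.

```python
import contextlib
import types

# Stand-in for a framework module such as torch; hypothetical, for illustration.
fake_framework = types.SimpleNamespace(load=lambda path: f"loaded {path}")


@contextlib.contextmanager
def patched_loader(module, check):
    """Temporarily wrap module.load so that check(path) runs first."""
    original = module.load

    def guarded_load(path, *args, **kwargs):
        check(path)  # expected to raise if the file is rejected
        return original(path, *args, **kwargs)

    module.load = guarded_load
    try:
        yield
    finally:
        module.load = original  # always restore, even if the body raised


def reject_pkl(path):
    # Toy rule for the demo: refuse anything ending in .pkl
    if path.endswith(".pkl"):
        raise ValueError(f"refusing to load {path}")


with patched_loader(fake_framework, reject_pkl):
    inside = fake_framework.load("model.pth")
    try:
        fake_framework.load("model.pkl")
        rejected = False
    except ValueError:
        rejected = True

restored = fake_framework.load("model.pkl")  # original loader is back
print(inside, rejected, restored)
```

A patch of this kind only checks calls that go through the replaced attribute, so loaders that were not wrapped stay unprotected.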

## ⚙️ Policy Configuration

ModelGuard allows you to configure security policies for your organization:

In [None]:
from modelguard.core.policy import Policy, PolicyConfig
import os

print("⚙️ Configuring ModelGuard policies...\n")

# Method 1: Environment variables
os.environ["MODELGUARD_ENFORCE"] = "true"
os.environ["MODELGUARD_SCAN_ON_LOAD"] = "true"
os.environ["MODELGUARD_MAX_FILE_SIZE_MB"] = "100"

print("Environment-based policy:")
print(f"   MODELGUARD_ENFORCE: {os.environ.get('MODELGUARD_ENFORCE')}")
print(f"   MODELGUARD_SCAN_ON_LOAD: {os.environ.get('MODELGUARD_SCAN_ON_LOAD')}")
print(f"   MODELGUARD_MAX_FILE_SIZE_MB: {os.environ.get('MODELGUARD_MAX_FILE_SIZE_MB')}")

# Method 2: Programmatic configuration
config = PolicyConfig(
    enforce=True,
    require_signatures=False,  # Set to False for this demo
    scan_on_load=True,
    max_file_size_mb=50
)

policy = Policy(config)

print("\nProgrammatic policy:")
print(f"   Enforce mode: {policy.should_enforce()}")
print(f"   Scan on load: {policy.should_scan()}")
print(f"   Max file size: {policy.get_max_file_size()} bytes")
print(f"   Requires signatures: {policy.requires_signatures()}")
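
As a rough picture of how environment variables like the ones above could be turned into a typed policy object, consider the sketch below. `DemoPolicy` and `policy_from_env` are hypothetical and are not ModelGuard's `PolicyConfig`; the accepted truthy strings and the 100 MB default are assumptions made for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoPolicy:
    """Hypothetical policy holder; not ModelGuard's PolicyConfig."""
    enforce: bool
    scan_on_load: bool
    max_file_size_bytes: int


def _flag(env, key, default=False):
    # Assumption: "1", "true", and "yes" (any case) mean enabled.
    return env.get(key, str(default)).strip().lower() in {"1", "true", "yes"}


def policy_from_env(env=os.environ) -> DemoPolicy:
    """Build a DemoPolicy from a mapping of environment variables."""
    size_mb = int(env.get("MODELGUARD_MAX_FILE_SIZE_MB", "100"))  # assumed default
    return DemoPolicy(
        enforce=_flag(env, "MODELGUARD_ENFORCE"),
        scan_on_load=_flag(env, "MODELGUARD_SCAN_ON_LOAD"),
        max_file_size_bytes=size_mb * 1024 * 1024,
    )


demo = policy_from_env({"MODELGUARD_ENFORCE": "true", "MODELGUARD_MAX_FILE_SIZE_MB": "50"})
print(demo)
```

Passing a plain dictionary instead of `os.environ` keeps the parsing easy to test without touching the real process environment.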

## 🌍 Real-World Usage Patterns

Here are some common patterns for using ModelGuard in production:

In [None]:
print("🌍 Real-world usage patterns:\n")

# Pattern 1: Drop-in replacement
print("1. Drop-in Replacement Pattern:")
print("   # Before: import torch")
print("   # After:  import modelguard.torch as torch")
print("   # All torch.load() calls are now secured!")

# Pattern 2: Explicit safe loading
print("\n2. Explicit Safe Loading Pattern:")
print("   model = modelguard.torch.safe_load('model.pth')")
print("   # Clear intent, explicit security")

# Pattern 3: Context manager for mixed loading
print("\n3. Context Manager Pattern:")
print("   with modelguard.patched():")
print("       # Framework loaders are secured")
print("       pytorch_model = torch.load('model.pth')")
print("       sklearn_model = joblib.load('model.pkl')")

# Pattern 4: Enterprise security
print("\n4. Enterprise Security Pattern:")
print("   # Set strict policies via environment")
print("   export MODELGUARD_ENFORCE=true")
print("   export MODELGUARD_REQUIRE_SIGNATURES=true")
print("   # All applications automatically secured")

print("\n✨ Choose the pattern that fits your workflow!")
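
Pattern 4 mentions requiring signatures. ModelGuard's signature mechanism is not shown in this notebook, so the sketch below illustrates a much simpler relative of the idea: pinning a SHA-256 digest and refusing to load a file whose digest has changed. `sha256_of` and `load_if_pinned` are hypothetical helpers, and a digest pin only detects tampering; it does not prove who produced the file.

```python
import hashlib
import os
import pickle
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large models do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_if_pinned(path, expected_sha256):
    """Load the pickle only if its digest matches the pinned value."""
    if sha256_of(path) != expected_sha256:
        raise ValueError(f"digest mismatch for {path}")
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump({"bias": 0.5}, f)
    pinned = sha256_of(path)  # recorded when the model is published
    model = load_if_pinned(path, pinned)

    with open(path, "ab") as f:  # simulate tampering
        f.write(b"extra")
    try:
        load_if_pinned(path, pinned)
        tampered_rejected = False
    except ValueError:
        tampered_rejected = True

print(model, tampered_rejected)
```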

## ⚡ Performance Demonstration

Let's measure ModelGuard's performance impact:

In [None]:
import time

print("⚡ Measuring scan and load times...\n")

# Create a larger model for testing
large_model = {
    'weights': np.random.randn(1000, 1000).tolist(),
    'bias': np.random.randn(1000).tolist(),
    'metadata': {'version': '1.0', 'framework': 'test'}
}

large_model_path = "/tmp/large_model.pkl"
with open(large_model_path, 'wb') as f:
    pickle.dump(large_model, f)

file_size = os.path.getsize(large_model_path) / (1024 * 1024)  # MB
print(f"Test model size: {file_size:.1f} MB")

# Time scanning (perf_counter is a monotonic, high-resolution clock)
start_time = time.perf_counter()
scan_result = scanner.scan_file(Path(large_model_path))
scan_time = time.perf_counter() - start_time

print(f"\n🔍 Scanning performance:")
print(f"   Time: {scan_time*1000:.1f} ms")
print(f"   Rate: {file_size/scan_time:.1f} MB/s")
print(f"   Result: {'✅ Safe' if scan_result.is_safe else '❌ Unsafe'}")

# Time safe loading
start_time = time.perf_counter()
try:
    loaded_model = safe_sklearn.safe_load(large_model_path)
    load_time = time.perf_counter() - start_time
    print(f"\n🔒 Safe loading performance:")
    print(f"   Time: {load_time*1000:.1f} ms")
    print(f"   Rate: {file_size/load_time:.1f} MB/s")
    print(f"   Status: ✅ Success")
except Exception as e:
    print(f"\n🔒 Safe loading: ❌ {e}")

print("\n🚀 Time a plain pickle.load() on the same file to see how much overhead scanning adds.")
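
A single timing says little without a baseline. The sketch below shows one way to compare a plain load against a scan-then-load path, taking the best of several runs to reduce noise. It does not use ModelGuard: the "scan" is a stand-in that walks every opcode with `pickletools.genops`, and `best_of` is a hypothetical helper, so the absolute numbers will differ from ModelGuard's.

```python
import pickle
import pickletools
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time of `repeats` calls to fn, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


data = pickle.dumps({"weights": [float(i) for i in range(50_000)]})


def plain_load():
    pickle.loads(data)


def scan_then_load():
    # Stand-in "scan": walk every opcode once before loading.
    for _ in pickletools.genops(data):
        pass
    pickle.loads(data)


baseline = best_of(plain_load)
guarded = best_of(scan_then_load)
print(f"Plain load:     {baseline * 1000:.2f} ms")
print(f"Scan then load: {guarded * 1000:.2f} ms")
print(f"Overhead:       {(guarded - baseline) * 1000:.2f} ms")
```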

## 🧹 Cleanup

In [None]:
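# Clean up temporary files (including the joblib copy created earlier)
for path in [malicious_path, pytorch_path, sklearn_path, joblib_path, large_model_path]:
    if os.path.exists(path):
        os.remove(path)

# Clear the policy environment variables set in the configuration cell
for var in ["MODELGUARD_ENFORCE", "MODELGUARD_SCAN_ON_LOAD", "MODELGUARD_MAX_FILE_SIZE_MB"]:
    os.environ.pop(var, None)

print("🧹 Cleanup complete!")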

## 🎉 Summary

Congratulations! You've learned how to protect your ML applications with ModelGuard:

### ✅ What We Covered
- **The Risk**: ML models can contain malicious code that executes when loaded
- **Detection**: ModelGuard scans models for dangerous patterns
- **Protection**: Safe loading blocks malicious models when enforcement mode is enabled (by default, ModelGuard only logs warnings)
- **Flexibility**: Multiple usage patterns for different workflows
- **Performance**: How to time scanning and safe loading on your own models

### 🚀 Next Steps
1. **Install ModelGuard** in your projects: `pip install ml-modelguard`
2. **Choose your pattern**: Drop-in replacement, explicit safe loading, or context manager
3. **Configure policies**: Set up organizational security policies
4. **Stay secure**: Always scan models from untrusted sources

### 📚 Learn More
- **GitHub**: https://github.com/kk25081998/Modelguard
- **PyPI**: https://pypi.org/project/ml-modelguard/
- **Documentation**: Check the README for advanced features

### 🛡️ Remember
**Never load untrusted models without ModelGuard protection!**

---
*Made with ❤️ for the ML community's security*