# Creating New MIA Attackers in Synth-MIA

This notebook demonstrates how to create custom Membership Inference Attack (MIA) implementations using the Synth-MIA framework.

## Overview

The Synth-MIA library provides a framework for implementing membership inference attacks through the `BaseAttacker` class. Every custom attacker inherits from this base class, which provides:

- Standardized data validation and preprocessing
- Built-in evaluation metrics (ROC, classification, privacy, epsilon)
- Consistent API across all attackers
- Parameter management

**Key insight: you only need to implement one method, `_compute_attack_scores()`.**

In [1]:
import numpy as np
import pandas as pd
from typing import Optional
from scipy.spatial import cKDTree

# Import the base class
from synth_mia.base import BaseAttacker

## Understanding the BaseAttacker Class

The cell below lists the public methods of the `BaseAttacker` class, together with `__init__`:

In [2]:
print("BaseAttacker provides these methods:")
for method in dir(BaseAttacker):
    if not method.startswith('_') or method == '__init__':
        print(f"  - {method}")

print("\nKey point: You only need to implement _compute_attack_scores()")
print("All other functionality is inherited automatically.")

BaseAttacker provides these methods:
  - __init__
  - attack
  - eval
  - get_properties

Key point: You only need to implement _compute_attack_scores()
All other functionality is inherited automatically.
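
To make the "implement one method, inherit the rest" idea concrete, here is a minimal sketch of the template-method pattern that a base class like this could follow. The names `ToyBaseAttacker` and `ToyMeanDistance` are hypothetical and exist only for illustration; this is not the Synth-MIA source code, and the real `BaseAttacker` may handle parameters, validation, and labels differently.

```python
import numpy as np


class ToyBaseAttacker:
    """Hypothetical stand-in for a base class (NOT the Synth-MIA implementation)."""

    def __init__(self, **hyper_parameters):
        # Assumption: keyword arguments become attributes, e.g. self.distance_type.
        for key, value in hyper_parameters.items():
            setattr(self, key, value)
        self.name = type(self).__name__

    def _compute_attack_scores(self, X_test, synth, ref=None):
        # The single hook that every subclass has to provide.
        raise NotImplementedError("Subclasses must implement this method.")

    def attack(self, mem, non_mem, synth, ref=None):
        # Stack members and non-members, label them 1 and 0, then score them.
        X_test = np.concatenate([mem, non_mem], axis=0)
        true_labels = np.concatenate([np.ones(len(mem)), np.zeros(len(non_mem))])
        scores = self._compute_attack_scores(X_test, synth, ref)
        return true_labels, scores


class ToyMeanDistance(ToyBaseAttacker):
    def _compute_attack_scores(self, X_test, synth, ref=None):
        # Purely illustrative score: negative distance to the synthetic mean.
        return -np.linalg.norm(X_test - synth.mean(axis=0), axis=1)


rng = np.random.default_rng(0)
toy_mem = rng.normal(size=(5, 3))
toy_non_mem = rng.normal(size=(4, 3))
toy_synth = rng.normal(size=(6, 3))

toy_labels, toy_scores = ToyMeanDistance().attack(toy_mem, toy_non_mem, toy_synth)
print(toy_labels.shape, toy_scores.shape)
```

The subclass contains only scoring logic, while the shared `attack()` workflow lives in the base class.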


## Example: Distance to Closest Record (DCR) Attack

We will implement the Distance to Closest Record (DCR) attack to demonstrate the simplicity of creating custom attackers. This attack computes the distance from each test point to its closest synthetic record.

In [3]:
class MyDCR(BaseAttacker):
    """
    Distance to Closest Record (DCR) attack implementation.
    
    The attack hypothesis is that if a test point is very close to a synthetic record,
    it is more likely to be a member of the original training data.
    """
    
    def __init__(self, distance_type=2):
        """Initialize DCR attacker.
        
        Args:
            distance_type: Order p of the Minkowski norm passed to
                cKDTree.query (default: 2, the Euclidean/L2 norm)
        """
        super().__init__(distance_type=distance_type)
        self.name = "MyDCR"  # Set name for identification in results
    
    def _compute_attack_scores(self, X_test: np.ndarray, synth: np.ndarray, 
                              ref: Optional[np.ndarray] = None) -> np.ndarray:
        """
        This is the only method that needs to be implemented for a custom attacker.
        
        Compute attack scores based on distance to closest synthetic record.
        
        Args:
            X_test: Test data (member and non-member combined)
            synth: Synthetic data
            ref: Reference data (not used in DCR)
            
        Returns:
            Attack scores (higher scores indicate higher likelihood of membership)
        """
        # Build a k-d tree over the synthetic records for fast nearest-neighbor search
        tree = cKDTree(synth)
        
        # Find distance to closest synthetic record for each test point
        distances, _ = tree.query(X_test, k=1, p=self.distance_type)
        
        # Return negative distances (closer distance = higher membership score)
        return -distances

print("MyDCR attacker implemented successfully.")

MyDCR attacker implemented successfully.
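
The scoring rule used in `_compute_attack_scores()` can be checked by hand on a tiny dataset. The sketch below applies the same `cKDTree` query to made-up toy points (the arrays `toy_synth` and `toy_test` are illustrative, not part of the example data) and shows that negating the distances ranks the closest test point first.

```python
import numpy as np
from scipy.spatial import cKDTree

# Two synthetic records and three test points (toy values for illustration).
toy_synth = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_test = np.array([[0.0, 1.0], [5.0, 5.0], [10.0, 10.0]])

tree = cKDTree(toy_synth)

# Euclidean distance (p=2) from each test point to its nearest synthetic record.
l2_distances, _ = tree.query(toy_test, k=1, p=2)

# Same query with the Manhattan norm (p=1), for comparison.
l1_distances, _ = tree.query(toy_test, k=1, p=1)

# Negate so that a smaller distance yields a higher membership score.
toy_scores = -l2_distances

# Test-point indices ordered from most to least likely member.
ranking = np.argsort(-toy_scores)

print("L2 distances:", l2_distances)
print("L1 distances:", l1_distances)
print("Ranking:", ranking)
```

The third test point coincides with a synthetic record, so its distance is 0 and it receives the highest score; the second point is equally far from both synthetic records and ranks last.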


## Load Test Data

In [4]:
# Load example data
mem = pd.read_csv('../example_data/housing/mem.csv').values
non_mem = pd.read_csv('../example_data/housing/non_mem.csv').values
synth = pd.read_csv('../example_data/housing/synth.csv').values
ref = pd.read_csv('../example_data/housing/ref.csv').values

print(f"Data shapes: mem={mem.shape}, non_mem={non_mem.shape}, synth={synth.shape}")

Data shapes: mem=(200, 9), non_mem=(200, 9), synth=(200, 9)


## Test the Custom Attacker

Now we test our custom DCR implementation:

In [5]:
# Create our custom attacker
my_dcr = MyDCR(distance_type=2)

print("Running attack...")
# The attack() method is inherited - we get it automatically
true_labels, scores = my_dcr.attack(mem, non_mem, synth, ref)

print("Evaluating results...")
# The eval() method is also inherited
results = my_dcr.eval(true_labels, scores, metrics=['roc'])

print("\nResults:")
print(results)

Running attack...
Evaluating results...

Results:
{'auc_roc': 0.5274249999999999, 'tpr_at_fpr_0': 0.025, 'tpr_at_fpr_0.001': 0.025, 'tpr_at_fpr_0.01': 0.035, 'tpr_at_fpr_0.1': 0.145}
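
An AUC of 0.5 corresponds to random guessing, so the value of about 0.53 above means this attack barely separates members from non-members on this dataset. With 200 non-members, the smallest nonzero false-positive rate is 1/200 = 0.005, which is consistent with `tpr_at_fpr_0` and `tpr_at_fpr_0.001` being identical. The sketch below shows one way such metrics can be computed from labels and scores using only NumPy. The helper names `auc_from_scores` and `tpr_at_fpr` are hypothetical, and this is not necessarily how Synth-MIA computes its values.

```python
import numpy as np


def auc_from_scores(true_labels, scores):
    """AUC as the probability that a random member outscores a random
    non-member, counting ties as one half."""
    is_member = np.asarray(true_labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[is_member]
    neg = scores[~is_member]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def tpr_at_fpr(true_labels, scores, max_fpr):
    """Largest true-positive rate among thresholds whose false-positive
    rate does not exceed max_fpr (no interpolation between thresholds)."""
    is_member = np.asarray(true_labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best = 0.0
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        fpr = (predicted & ~is_member).sum() / (~is_member).sum()
        tpr = (predicted & is_member).sum() / is_member.sum()
        if fpr <= max_fpr:
            best = max(best, float(tpr))
    return best


# Toy labels and scores (made up for illustration).
toy_labels = np.array([1, 1, 1, 0, 0, 0])
toy_scores = np.array([0.9, 0.4, 0.8, 0.5, 0.1, 0.3])

print("AUC:", auc_from_scores(toy_labels, toy_scores))
print("TPR at FPR 0:", tpr_at_fpr(toy_labels, toy_scores, 0.0))
print("TPR at FPR 0.5:", tpr_at_fpr(toy_labels, toy_scores, 0.5))
```

In the toy data, 8 of the 9 member/non-member pairs are ordered correctly, and two of the three members score above every non-member.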


## Key Takeaways

### Creating a Custom MIA Attacker

The process of creating a new MIA attacker is straightforward:

1. **Inherit from `BaseAttacker`**
2. **Set `self.name` in `__init__()`** for identification in results
3. **Implement `_compute_attack_scores()`** - this is the only required method
4. **All other functionality is inherited**: data validation, evaluation metrics, consistent API

### `_compute_attack_scores()` Requirements

This method must:
- **Accept**: `X_test` (test data), `synth` (synthetic data), `ref` (reference data, optional)
- **Return**: Array of scores where higher scores indicate higher likelihood of membership
- **Implement**: The core logic of your attack strategy
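
While developing a new attacker, it can help to sanity-check the returned scores before passing them to `eval()`. The helper below is a hypothetical sketch, not part of Synth-MIA. It assumes the scores should form a one-dimensional array of finite values with one entry per row of `X_test`, which matches what `MyDCR` returns but is not a documented requirement of the library.

```python
import numpy as np


def check_attack_scores(scores, X_test):
    """Hypothetical helper: validate the assumed shape and content of scores."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1:
        raise ValueError(f"Expected a 1-D score array, got {scores.ndim} dimensions.")
    if scores.shape[0] != len(X_test):
        raise ValueError(
            f"Expected {len(X_test)} scores (one per test row), got {scores.shape[0]}."
        )
    if not np.all(np.isfinite(scores)):
        raise ValueError("Scores contain NaN or infinite values.")
    return scores


# Toy usage with made-up data: three test rows and three scores.
toy_test = np.zeros((3, 2))
checked = check_attack_scores([-0.2, -1.5, 0.0], toy_test)
print(checked)
```

Calling such a check at the end of `_compute_attack_scores()` during development surfaces shape mistakes early, for example returning a distance matrix instead of one distance per test point.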