# L0 regularization for sparse weights

L0 regularization produces sparse weights during calibration: it drives many weights to exactly zero, so the corresponding records can be dropped from the dataset. This is particularly useful when:

- You need to reduce computational costs in downstream processing
- You want to identify the most important records in your dataset
- You need a smaller, representative sample that still matches population targets

## How L0 regularization works

L0 regularization adds a penalty term to the calibration loss that encourages weights to be exactly zero. L1 regularization penalizes the magnitude of every weight and so shrinks them all toward zero; the L0 penalty instead counts how many weights are non-zero. Because that count is not differentiable, it is approximated with gates drawn from the Hard Concrete distribution, which can take the value zero exactly.
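
To make the gating idea concrete, here is a minimal NumPy sketch of Hard Concrete sampling and the expected-L0 surrogate. It follows the standard formulation of the distribution and is not taken from microcalibrate; the stretch limits `GAMMA` and `ZETA`, the `log_alpha` values, and the function names are all assumptions made for illustration.

```python
import numpy as np

# Stretch limits commonly used with the Hard Concrete distribution;
# assumed here for illustration, not read from microcalibrate.
GAMMA, ZETA = -0.1, 1.1


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_hard_concrete(log_alpha, temperature, rng):
    """Draw one gate per entry of log_alpha; every gate lies in [0, 1]."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / temperature)
    stretched = s * (ZETA - GAMMA) + GAMMA
    # Clipping is what allows gates to be exactly 0 or exactly 1
    return np.clip(stretched, 0.0, 1.0)


def expected_l0(log_alpha, temperature):
    """Differentiable surrogate: expected number of non-zero gates."""
    return sigmoid(log_alpha - temperature * np.log(-GAMMA / ZETA)).sum()


rng = np.random.default_rng(0)
log_alpha = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # hypothetical gate parameters
gates = sample_hard_concrete(log_alpha, temperature=0.5, rng=rng)
base_weights = np.full(5, 2.0)

print("gates:", gates.round(3))
print("gated weights:", (base_weights * gates).round(3))
print("expected non-zero count:", round(float(expected_l0(log_alpha, 0.5)), 3))
```

In this formulation, lower temperatures push sampled gates closer to 0 or 1, while the penalty term is the `expected_l0` value scaled by the regularization strength.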

In [18]:
from microcalibrate import Calibration
import numpy as np
import pandas as pd
import logging

calibration_logger = logging.getLogger("microcalibrate.calibration")
calibration_logger.setLevel(logging.WARNING)

np.random.seed(42)

## Example 1: Basic L0 regularization

Let's create a synthetic dataset and apply L0 regularization to reduce its size while maintaining calibration accuracy.

In [19]:
# Create synthetic data
n_samples = 5000
n_targets = 10

# Generate random data with some structure
age_groups = np.random.choice(['18-30', '31-50', '51-65', '65+'], n_samples)
income = np.random.lognormal(10.5, 0.8, n_samples)  # Log-normal income distribution
employed = np.random.binomial(1, 0.65, n_samples)

# Create estimate matrix with various demographic combinations
estimate_matrix = pd.DataFrame()
for age in ['18-30', '31-50', '51-65', '65+']:
    mask = age_groups == age
    estimate_matrix[f'income_{age}'] = mask * income
    estimate_matrix[f'employed_{age}'] = mask * employed

estimate_matrix['total_income'] = income
estimate_matrix['total_employed'] = employed

# Set realistic targets (scaled population values)
targets = estimate_matrix.sum().values * 1.1  # 10% higher than unweighted

print(f"Dataset size: {n_samples} records")
print(f"Number of targets: {len(targets)}")
print(f"Target names: {list(estimate_matrix.columns)}")

Dataset size: 5000 records
Number of targets: 10
Target names: ['income_18-30', 'employed_18-30', 'income_31-50', 'employed_31-50', 'income_51-65', 'employed_51-65', 'income_65+', 'employed_65+', 'total_income', 'total_employed']


## Comparing standard vs L0 calibration

In [20]:
# Standard calibration (no sparsity)
weights_init = np.ones(n_samples)

cal_standard = Calibration(
    weights=weights_init.copy(),
    targets=targets,
    estimate_matrix=estimate_matrix,
    epochs=200,
    learning_rate=1e-3,
    regularize_with_l0=False
)

print("Running standard calibration...")
perf_standard = cal_standard.calibrate()
weights_standard = cal_standard.weights

print(f"\nStandard calibration results:")
print(f"Non-zero weights: {np.sum(weights_standard != 0)} ({100*np.mean(weights_standard != 0):.1f}%)")
print(f"Weight range: [{weights_standard.min():.3f}, {weights_standard.max():.3f}]")

Running standard calibration...


Reweighting progress: 100%|██████████| 200/200 [00:00<00:00, 2660.14epoch/s, loss=13.2, weights_mean=5.1, weights_std=2.42, weights_min=0.842]


Standard calibration results:
Non-zero weights: 5000 (100.0%)
Weight range: [0.835, 9.164]





In [21]:
# L0 regularized calibration
cal_l0 = Calibration(
    weights=weights_init.copy(),
    targets=targets,
    estimate_matrix=estimate_matrix,
    epochs=200,
    learning_rate=1e-3,
    regularize_with_l0=True,
    l0_lambda=5e-6,      # Regularization strength
    init_mean=0.999,     # Start with most weights active
    temperature=0.5,     # Controls sparsity gradient
)

print("Running L0 regularized calibration...")
perf_l0 = cal_l0.calibrate()
weights_l0 = cal_l0.sparse_weights

print(f"\nL0 calibration results:")
print(f"Non-zero weights: {np.sum(weights_l0 != 0)} ({100*np.mean(weights_l0 != 0):.1f}%)")
print(f"Dataset reduction: {100*(1 - np.mean(weights_l0 != 0)):.1f}%")
print(f"Weight range: [{weights_l0[weights_l0>0].min():.3f}, {weights_l0.max():.3f}]")

Running L0 regularized calibration...


Reweighting progress: 100%|██████████| 200/200 [00:00<00:00, 1776.92epoch/s, loss=13.5, weights_mean=5.13, weights_std=2.43, weights_min=0.84] 
Sparse reweighting progress: 100%|██████████| 400/400 [00:00<00:00, 722.65epoch/s, loss=0.0103, loss_rel_change=-0.691]



L0 calibration results:
Non-zero weights: 1998 (40.0%)
Dataset reduction: 60.0%
Weight range: [0.010, 14.061]
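
Once the weights are sparse, the reduced dataset is just the subset of records with non-zero weight. The sketch below uses synthetic stand-ins (`records` and `sparse_weights` are hypothetical, not objects from the cells above) to show that dropping zero-weight rows leaves every weighted total unchanged.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the microdata and the sparse weights
records = pd.DataFrame({"income": rng.lognormal(10.5, 0.8, 1000)})
sparse_weights = rng.uniform(0.5, 10.0, 1000)
sparse_weights[rng.random(1000) < 0.6] = 0.0  # mimic roughly 60% sparsity

# Keep only records that still carry weight
keep = sparse_weights != 0
reduced = records.loc[keep].assign(weight=sparse_weights[keep])

full_total = float((records["income"] * sparse_weights).sum())
reduced_total = float((reduced["income"] * reduced["weight"]).sum())

print(f"Kept {len(reduced)} of {len(records)} records")
print(f"Weighted totals match: {np.isclose(full_total, reduced_total)}")
```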


# Hyperparameter tuning for L0 regularization

Finding the optimal L0 regularization parameters is crucial for achieving the right balance between sparsity and calibration accuracy. This notebook demonstrates how to use the automatic hyperparameter tuning feature to find the best parameters for your specific dataset.

## Why hyperparameter tuning matters

L0 regularization has three key parameters that interact in complex ways:
- **l0_lambda**: Controls the strength of sparsity penalty
- **init_mean**: Sets the initial proportion of active weights
- **temperature**: Determines how "hard" the sparsity decisions are

Manual tuning can be time-consuming and may miss optimal combinations. The automatic tuning uses Optuna to efficiently search the parameter space.

## Basic hyperparameter tuning

Let's start with a simple tuning run to find good L0 parameters. The tuning process will:
1. Create multiple holdout sets for cross-validation
2. Try different parameter combinations
3. Evaluate each combination on both training and validation targets
4. Select the best parameters based on a multi-objective criterion
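
Step 1 can be sketched as follows. The holdout is taken over targets, not records, so with 10 targets and a fraction of 0.2 each set holds out 2 targets. The function `make_holdout_sets` and its sampling scheme are assumptions for illustration; the library's own selection logic may differ.

```python
import numpy as np


def make_holdout_sets(n_targets, n_holdout_sets, holdout_fraction, seed=0):
    """Return (train_idx, holdout_idx) pairs of target indices."""
    rng = np.random.default_rng(seed)
    n_holdout = max(1, int(round(n_targets * holdout_fraction)))
    splits = []
    for _ in range(n_holdout_sets):
        # Sample held-out targets without replacement within each set
        holdout_idx = np.sort(rng.choice(n_targets, size=n_holdout, replace=False))
        train_idx = np.setdiff1d(np.arange(n_targets), holdout_idx)
        splits.append((train_idx, holdout_idx))
    return splits


for train_idx, holdout_idx in make_holdout_sets(10, 3, 0.2):
    print("train targets:", train_idx, "held-out targets:", holdout_idx)
```

Each trial would then calibrate against the training targets only and score the resulting weights on the held-out ones.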

In [22]:
# Initialize calibration object
weights_init = np.ones(n_samples)

cal = Calibration(
    weights=weights_init,
    targets=targets,
    estimate_matrix=estimate_matrix,
    epochs=100,  # Will be overridden during tuning
    learning_rate=1e-3,
)

print("Starting hyperparameter tuning...")
print("This will take a few minutes as it explores different parameter combinations.\n")

# Run hyperparameter tuning
best_params = cal.tune_l0_hyperparameters(
    n_trials=20,  # Number of parameter combinations to try
    objectives_balance={
        'loss': 1.0,       # Weight for calibration loss
        'accuracy': 100.0, # Weight for accuracy (targets within 10%)
        'sparsity': 10.0,  # Weight for sparsity
    },
    n_holdout_sets=3,      # Number of cross-validation folds
    holdout_fraction=0.2,  # Fraction of targets to hold out
    epochs_per_trial=50,   # Epochs per trial (faster for tuning)
)

print("\n" + "="*50)
print("Tuning completed!")
print("="*50)

INFO:microcalibrate.hyperparameter_tuning:Multi-holdout hyperparameter tuning:
  - 3 holdout sets
  - 2 targets per holdout (20.0%)
  - Aggregation: mean



Starting hyperparameter tuning...
This will take a few minutes as it explores different parameter combinations.



  0%|          | 0/20 [00:00<?, ?it/s]

Reweighting progress: 100%|██████████| 50/50 [00:00<00:00, 1549.46epoch/s, loss=17.2, weights_mean=5.7, weights_std=2.74, weights_min=0.962]
Sparse reweighting progress: 100%|██████████| 100/100 [00:00<00:00, 347.13epoch/s, loss=0.0279, loss_rel_change=-0.485]
Reweighting progress: 100%|██████████| 50/50 [00:00<00:00, 1961.55epoch/s, loss=68, weights_mean=10.2, weights_std=3.83, weights_min=1.05]
Sparse reweighting progress: 100%|██████████| 100/100 [00:00<00:00, 357.60epoch/s, loss=0.0297, loss_rel_change=-0.998]
Reweighting progress: 100%|██████████| 50/50 [00:00<00:00, 1983.82epoch/s, loss=148, weights_mean=14.5, weights_std=4.58, weights_min=1.48]
Sparse reweighting progress: 100%|██████████| 100/100 [00:00<00:00, 296.82epoch/s, loss=0.0652, loss_rel_change=-0.999]
INFO:microcalibrate.hyperparameter_tuning:Trial 0:
  Objectives by holdout: ['109.9197', '110.0278', '10.0005']
  Mean objective: 76.6493
  Mean val accuracy: 33.33% (±47.14%)
  Sparsity: 0.00%
Reweighting progress: 100%


Tuning completed!


## Analyzing tuning results

In [23]:
# Display best parameters
print("Best parameters found:")
print(f"  l0_lambda: {best_params['l0_lambda']:.2e}")
print(f"  init_mean: {best_params['init_mean']:.4f}")
print(f"  temperature: {best_params['temperature']:.2f}")
print()
print("Performance metrics:")
print(f"  Mean validation loss: {best_params['mean_val_loss']:.6f} (±{best_params['std_val_loss']:.6f})")
print(f"  Mean validation accuracy: {best_params['mean_val_accuracy']:.1%} (±{best_params['std_val_accuracy']:.1%})")
print(f"  Sparsity achieved: {best_params['sparsity']:.1%}")
print()
print("Cross-validation results:")
print(f"  Holdout objectives: {best_params['holdout_objectives']}")
print(f"  Number of holdout sets: {best_params['n_holdout_sets']}")
print(f"  Aggregation method: {best_params['aggregation']}")

Best parameters found:
  l0_lambda: 7.43e-05
  init_mean: 0.8443
  temperature: 1.79

Performance metrics:
  Mean validation loss: 0.002145 (±0.000429)
  Mean validation accuracy: 100.0% (±0.0%)
  Sparsity achieved: 0.0%

Cross-validation results:
  Holdout objectives: [np.float64(9.999662299386227), np.float64(10.002067347057164), np.float64(9.998705346327275)]
  Number of holdout sets: 3
  Aggregation method: mean
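
The holdout objectives near 10 are consistent with a weighted sum in which lower is better and the accuracy and sparsity terms enter as shortfalls from 1. The formula below is inferred from the logged numbers in this notebook (for example, roughly 110 when holdout accuracy is 0% and roughly 10 when it is 100%, both at 0% sparsity), not taken from the microcalibrate source, so `combined_objective` is a hypothetical reconstruction and the loss values passed in are illustrative.

```python
def combined_objective(val_loss, val_accuracy, sparsity, balance):
    """Lower is better; val_accuracy and sparsity are fractions in [0, 1]."""
    return (
        balance["loss"] * val_loss
        + balance["accuracy"] * (1.0 - val_accuracy)
        + balance["sparsity"] * (1.0 - sparsity)
    )


balance = {"loss": 1.0, "accuracy": 100.0, "sparsity": 10.0}

# All held-out targets within tolerance, no sparsity
perfect_accuracy = combined_objective(0.002, 1.0, 0.0, balance)
# No held-out target within tolerance, no sparsity
zero_accuracy = combined_objective(0.02, 0.0, 0.0, balance)

print(f"accuracy 100%, sparsity 0%: {perfect_accuracy:.3f}")
print(f"accuracy 0%, sparsity 0%: {zero_accuracy:.3f}")
```

Under this reading, an objective close to 10 with these weights means the trial met the accuracy criterion but achieved almost no sparsity during the short tuning run.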


## Applying the best parameters

Now let's apply the best parameters found through tuning and compare with default parameters.

In [24]:
# Calibration with tuned parameters
cal_tuned = Calibration(
    weights=weights_init.copy(),
    targets=targets,
    estimate_matrix=estimate_matrix,
    epochs=200,
    learning_rate=1e-3,
    regularize_with_l0=True,
    l0_lambda=best_params['l0_lambda'],
    init_mean=best_params['init_mean'],
    temperature=best_params['temperature'],
)

print("Calibrating with tuned parameters...")
perf_tuned = cal_tuned.calibrate()
weights_tuned = cal_tuned.sparse_weights

# Calibration with default parameters
cal_default = Calibration(
    weights=weights_init.copy(),
    targets=targets,
    estimate_matrix=estimate_matrix,
    epochs=200,
    learning_rate=1e-3,
    regularize_with_l0=True,
    l0_lambda=5e-6,  # Default
    init_mean=0.999,  # Default
    temperature=0.5,  # Default
)

print("Calibrating with default parameters...")
perf_default = cal_default.calibrate()
weights_default = cal_default.sparse_weights

print("\nComparison complete!")

Calibrating with tuned parameters...


Reweighting progress: 100%|██████████| 200/200 [00:00<00:00, 2654.50epoch/s, loss=12.8, weights_mean=5.02, weights_std=2.41, weights_min=0.84]
Sparse reweighting progress: 100%|██████████| 400/400 [00:00<00:00, 678.76epoch/s, loss=0.105, loss_rel_change=-0.786]


Calibrating with default parameters...


Reweighting progress: 100%|██████████| 200/200 [00:00<00:00, 2813.69epoch/s, loss=13.1, weights_mean=5.09, weights_std=2.43, weights_min=0.842]
Sparse reweighting progress: 100%|██████████| 400/400 [00:00<00:00, 593.57epoch/s, loss=0.0103, loss_rel_change=-0.691]



Comparison complete!


In [25]:
# Compare results
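def evaluate_calibration(weights, estimate_matrix, targets, label):
    """Summarize sparsity and relative target errors for one set of weights."""
    # Weighted column totals: one estimate per target
    estimates = (estimate_matrix.T * weights).sum(axis=1).values
    # Absolute relative error of each estimate against its target
    rel_errors = np.abs((estimates - targets) / targets)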
    
    return {
        'Label': label,
        'Non-zero weights': np.sum(weights != 0),
        'Sparsity': f"{100 * np.mean(weights == 0):.1f}%",
        'Mean rel error': f"{np.mean(rel_errors):.4f}",
        'Max rel error': f"{np.max(rel_errors):.4f}",
        'Within 1%': f"{100 * np.mean(rel_errors < 0.01):.1f}%",
        'Within 5%': f"{100 * np.mean(rel_errors < 0.05):.1f}%",
        'Within 10%': f"{100 * np.mean(rel_errors < 0.10):.1f}%",
    }

comparison = pd.DataFrame([
    evaluate_calibration(weights_default, estimate_matrix, targets, 'Default params'),
    evaluate_calibration(weights_tuned, estimate_matrix, targets, 'Tuned params'),
])

print("\nParameter comparison:")
print(comparison.to_string(index=False))

print("\n" + "="*50)
print("Improvement summary:")
sparsity_default = np.mean(weights_default == 0)
sparsity_tuned = np.mean(weights_tuned == 0)
print(f"Sparsity improvement: {sparsity_tuned - sparsity_default:.1%}")
print(f"Dataset reduction: {100*sparsity_tuned:.1f}% of records can be dropped")
print(f"Remaining records: {np.sum(weights_tuned != 0):,} out of {len(weights_tuned):,}")


Parameter comparison:
         Label  Non-zero weights Sparsity Mean rel error Max rel error Within 1% Within 5% Within 10%
Default params              1998    60.0%         0.0250        0.0620     30.0%     90.0%     100.0%
  Tuned params              1372    72.6%         0.1008        0.1155      0.0%      0.0%      40.0%

Improvement summary:
Sparsity improvement: 12.5%
Dataset reduction: 72.6% of records can be dropped
Remaining records: 1,372 out of 5,000

In this run the tuned parameters raise sparsity from 60.0% to 72.6%, but at a cost in accuracy: the mean relative error rises from 0.0250 to 0.1008, and only 40.0% of targets land within 10%, compared with 100.0% under the defaults. Check the error columns, not just sparsity, before adopting tuned parameters.


## Advanced tuning with custom objectives

You can customize the tuning process by adjusting the objective balance. Here's how different balances affect the results:

In [26]:
# Different objective balances for different use cases
objective_configs = {
    'Accuracy-focused': {'loss': 1.0, 'accuracy': 200.0, 'sparsity': 1.0},
    'Sparsity-focused': {'loss': 1.0, 'accuracy': 50.0, 'sparsity': 50.0},
    'Balanced': {'loss': 1.0, 'accuracy': 100.0, 'sparsity': 10.0},
}

results = []

for name, objectives in objective_configs.items():
    print(f"\nTuning with {name} objectives...")
    
    cal_temp = Calibration(
        weights=weights_init.copy(),
        targets=targets,
        estimate_matrix=estimate_matrix,
        epochs=100,
        learning_rate=1e-3,
    )
    
    params = cal_temp.tune_l0_hyperparameters(
        n_trials=10,  # Fewer trials for demonstration
        objectives_balance=objectives,
        n_holdout_sets=2,
        holdout_fraction=0.2,
        epochs_per_trial=30,
    )
    
    results.append({
        'Config': name,
        'l0_lambda': f"{params['l0_lambda']:.2e}",
        'Accuracy': f"{params['mean_val_accuracy']:.1%}",
        'Sparsity': f"{params['sparsity']:.1%}",
    })

results_df = pd.DataFrame(results)
print("\n" + "="*50)
print("Objective balance comparison:")
print(results_df.to_string(index=False))

INFO:microcalibrate.hyperparameter_tuning:Multi-holdout hyperparameter tuning:
  - 2 holdout sets
  - 2 targets per holdout (20.0%)
  - Aggregation: mean




Tuning with Accuracy-focused objectives...


  0%|          | 0/10 [00:00<?, ?it/s]

Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1640.43epoch/s, loss=18.8, weights_mean=5.88, weights_std=2.83, weights_min=0.98]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 334.73epoch/s, loss=0.0282, loss_rel_change=-0.479]
Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1762.34epoch/s, loss=75, weights_mean=10.6, weights_std=3.91, weights_min=1.27]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 327.34epoch/s, loss=0.112, loss_rel_change=-0.993]
INFO:microcalibrate.hyperparameter_tuning:Trial 0:
  Objectives by holdout: ['201.0131', '201.0256']
  Mean objective: 201.0193
  Mean val accuracy: 0.00% (±0.00%)
  Sparsity: 0.00%
Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1506.32epoch/s, loss=163, weights_mean=15.2, weights_std=4.78, weights_min=2]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 308.44epoch/s, loss=0.0668, loss_rel_change=-0.996]
Reweighting progress: 100%|██████████| 30/30 [00


Tuning with Sparsity-focused objectives...


  0%|          | 0/10 [00:00<?, ?it/s]

Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 567.32epoch/s, loss=18.4, weights_mean=5.85, weights_std=2.81, weights_min=0.981]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 257.40epoch/s, loss=0.0282, loss_rel_change=-0.479]
Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1167.63epoch/s, loss=74.1, weights_mean=10.6, weights_std=3.97, weights_min=1.08]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 229.01epoch/s, loss=0.106, loss_rel_change=-0.993]
INFO:microcalibrate.hyperparameter_tuning:Trial 0:
  Objectives by holdout: ['99.9935', '100.0218']
  Mean objective: 100.0076
  Mean val accuracy: 0.00% (±0.00%)
  Sparsity: 0.00%
Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1799.85epoch/s, loss=164, weights_mean=15.2, weights_std=4.77, weights_min=2.37]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 279.03epoch/s, loss=0.0647, loss_rel_change=-0.997]
Reweighting progress: 100%|██████████| 30/30


Tuning with Balanced objectives...


  0%|          | 0/10 [00:00<?, ?it/s]

Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1818.97epoch/s, loss=19.6, weights_mean=5.94, weights_std=2.84, weights_min=0.982]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 325.64epoch/s, loss=0.0282, loss_rel_change=-0.479]
Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1941.75epoch/s, loss=76.5, weights_mean=10.7, weights_std=3.96, weights_min=1.13]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 336.02epoch/s, loss=0.114, loss_rel_change=-0.993]
INFO:microcalibrate.hyperparameter_tuning:Trial 0:
  Objectives by holdout: ['110.0095', '110.0267']
  Mean objective: 110.0181
  Mean val accuracy: 0.00% (±0.00%)
  Sparsity: 0.00%
Reweighting progress: 100%|██████████| 30/30 [00:00<00:00, 1966.42epoch/s, loss=165, weights_mean=15.3, weights_std=4.77, weights_min=1.73]
Sparse reweighting progress: 100%|██████████| 60/60 [00:00<00:00, 266.74epoch/s, loss=0.0703, loss_rel_change=-0.996]
Reweighting progress: 100%|██████████| 30/


Objective balance comparison:
          Config l0_lambda Accuracy Sparsity
Accuracy-focused  1.31e-06    50.0%     0.0%
Sparsity-focused  1.58e-05    50.0%     1.6%
        Balanced  1.58e-05    50.0%     1.5%


## Best practices for hyperparameter tuning

### 1. Start with fewer trials
Begin with 10-20 trials to get a sense of the parameter space, then increase if needed.

### 2. Adjust objective balance based on your needs
- **High accuracy weight**: When precision is critical
- **High sparsity weight**: When dataset reduction is the priority
- **Balanced**: Good starting point for most use cases

### 3. Use appropriate cross-validation
- **More holdout sets**: Better generalization estimates but slower
- **Larger holdout fraction**: More robust validation but fewer targets left for training

### 4. Consider computational resources
- Reduce `epochs_per_trial` for faster exploration
- Use `n_jobs=-1` for parallel trials if you have multiple cores

### 5. Monitor for overfitting
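Watch for large gaps between training and validation performance. A large gap suggests the parameters fit the training targets but do not generalize to held-out targets.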

### 6. Data leakage awareness
Remember that targets often share information (e.g., 'income_18-30' is a component of 'total_income'), so validation metrics on held-out targets may be optimistic.
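
One way to spot this kind of overlap is to check whether some target columns are linear combinations of others. The sketch below rebuilds a small matrix in the style of the synthetic example (the data and variable names here are illustrative) and shows that a total column adds no independent information beyond its group components.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
ages = ["18-30", "31-50", "51-65", "65+"]
age_groups = rng.choice(ages, n)
income = rng.lognormal(10.5, 0.8, n)

# Group-specific income columns plus an overall total
matrix = pd.DataFrame(
    {f"income_{age}": (age_groups == age) * income for age in ages}
)
matrix["total_income"] = income

# Scale each column so the rank tolerance is meaningful
scaled = matrix / matrix.abs().max()
rank = np.linalg.matrix_rank(scaled.values)
print(f"{matrix.shape[1]} target columns, rank {rank}")

# The total is fully determined by the group columns
group_sum = matrix.drop(columns="total_income").sum(axis=1)
implied = np.allclose(group_sum, matrix["total_income"])
print("total_income implied by group columns:", implied)
```

If a held-out target is implied by the training targets in this way, matching it on validation says little about how the weights generalize.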

## Next steps

After finding optimal hyperparameters:
1. Apply them to your full calibration with more epochs
2. Evaluate robustness using the [Robustness evaluation](robustness_evaluation.ipynb) notebook
3. Save the parameters for future use
4. Consider fine-tuning if results aren't satisfactory
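
For step 3, one simple option is to write the parameters to JSON. The dictionary below is a hypothetical stand-in whose keys mirror those printed earlier; whether the real result contains NumPy integers or arrays is an assumption, which is why the sketch passes a `default` converter to handle them.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical tuning result; values echo the ones printed earlier
tuned = {
    "l0_lambda": np.float64(7.43e-05),
    "init_mean": np.float64(0.8443),
    "temperature": np.float64(1.79),
    "n_holdout_sets": np.int64(3),
    "holdout_objectives": np.array([9.9997, 10.0021, 9.9987]),
}


def to_builtin(value):
    """Convert NumPy scalars and arrays that json does not handle natively."""
    if isinstance(value, np.generic):
        return value.item()
    if isinstance(value, np.ndarray):
        return value.tolist()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# The file name and location are arbitrary choices for this sketch
path = Path(tempfile.gettempdir()) / "l0_params.json"
path.write_text(json.dumps(tuned, default=to_builtin, indent=2))

restored = json.loads(path.read_text())
print(restored["l0_lambda"], restored["holdout_objectives"])
```

The restored values are plain Python floats, ints, and lists, so they can be passed back as keyword arguments without further conversion.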

The tuned parameters are specific to your dataset and target configuration, so re-tune if these change significantly.