# Reweighting Function Documentation

## Purpose

This notebook documents a Python function `reweight` that adjusts a set of initial weights to better match target statistics. It's particularly useful for calibrating survey data weights in microsimulation models, such as those used in PolicyEngine UK.

## Import Required Libraries

In [1]:
import pandas as pd
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

## Function Definition and Overview

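The `reweight` function treats the logarithm of each weight as a trainable parameter. It uses the Adam optimizer to minimize the mean squared relative error between the weighted column sums of `estimate_matrix` and `target_values`, so that the weighted estimates more closely match the targets.
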
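
The objective can be written compactly. The notation here is introduced for this explanation and is not taken from the code: $A$ is `estimate_matrix` with $n$ rows (records) and $m$ columns (targets), $t$ is `target_values`, and $u = \log w$ holds the log weights that the optimizer adjusts. The loss computed in the training loop is then:

$$
\hat{t}_j(u) = \sum_{i=1}^{n} e^{u_i} A_{ij}, \qquad
L(u) = \frac{1}{m} \sum_{j=1}^{m} \left( \frac{\hat{t}_j(u) - t_j}{t_j} \right)^2
$$

The returned weights are $w_i = e^{u_i}$ at the final iterate.
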
### Parameters

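* `initial_weights` (torch.Tensor): Initial survey weights, one per record (shape `(n_records,)`). They must be strictly positive, because the function takes their logarithm.

* `estimate_matrix` (torch.Tensor): Matrix of estimates from a microsimulation model, with one row per record and one column per target (shape `(n_records, n_targets)`).

* `target_names` (iterable): Names of the target statistics, in the same order as the columns of `estimate_matrix`. Accepted for bookkeeping but not used in the function body.

* `target_values` (torch.Tensor): Values of the target statistics to match (shape `(n_targets,)`). They must be non-zero, because the loss divides by them.

* `epochs` (int, optional): Number of optimization iterations. Default is 1000.

* `epoch_step` (int, optional): Interval, in epochs, at which the loss is printed. Default is 100.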
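
The constraints above are easy to violate silently, for example with a zero weight or a float64 matrix mixed with float32 weights. The sketch below is a hypothetical helper, `check_reweight_inputs`, that is not part of the original notebook; it assumes the shapes listed above and checks them before `reweight` is called:

```python
import torch


def check_reweight_inputs(initial_weights, estimate_matrix, target_values):
    """Hypothetical helper: validate shapes, signs and dtypes before calling reweight."""
    # Unpacking fails with ValueError unless estimate_matrix is 2-D
    n_records, n_targets = estimate_matrix.shape
    if initial_weights.shape != (n_records,):
        raise ValueError(
            f"initial_weights has shape {tuple(initial_weights.shape)}, "
            f"expected ({n_records},): one weight per row of estimate_matrix"
        )
    if target_values.shape != (n_targets,):
        raise ValueError(
            f"target_values has shape {tuple(target_values.shape)}, "
            f"expected ({n_targets},): one value per column of estimate_matrix"
        )
    # reweight takes log(initial_weights), so every weight must be strictly positive
    if not torch.all(initial_weights > 0):
        raise ValueError("initial_weights must be strictly positive")
    # The loss divides by target_values, so no target may be zero
    if torch.any(target_values == 0):
        raise ValueError("target_values must be non-zero")
    # The matrix product needs one shared floating-point dtype
    dtypes = {initial_weights.dtype, estimate_matrix.dtype, target_values.dtype}
    if len(dtypes) != 1 or not initial_weights.is_floating_point():
        raise TypeError(f"expected one shared floating-point dtype, got {dtypes}")
```

Calling it with the tensors from the usage example passes silently; each failure raises a message naming the offending argument.
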

### Returns

`final_weights` (torch.Tensor): Adjusted, positive weights after optimization, with the same shape as `initial_weights` and detached from the autograd graph.

In [2]:
def reweight(
    initial_weights,
    estimate_matrix,
    target_names,
    target_values,
    epochs=1000,
    epoch_step=100,
):
    """
    Main reweighting function, suitable for PolicyEngine UK use (PolicyEngine US use and testing TK).

    To avoid the need for equivalisation factors, the loss is built on relative error,
    |predicted - actual| / actual, which is squared and averaged over all targets.

    Parameters:
    initial_weights (torch.Tensor): The strictly positive initial weights given to survey
    data (one per record), which are to be adjusted by this function.
    estimate_matrix (torch.Tensor): A matrix of estimates with one row per record and one
    column per target, obtained from e.g. a PolicyEngine Microsimulation instance.
    target_names (iterable): The names of a set of target statistics treated as ground
    truth (not used in the computation).
    target_values (torch.Tensor): The values of these target statistics, one per column
    of estimate_matrix.
    epochs (int): The number of iterations that the optimization loop should run for.
    epoch_step (int): The interval at which to print the loss during the optimization loop.

    Returns:
    final_weights (torch.Tensor): A reweighted set of household weights, obtained by
    minimizing the mean squared relative error with respect to the target values.
    """
    # Initialize a TensorBoard writer (logs to ./runs/ by default)
    writer = SummaryWriter()

    # Optimize log weights so that exp(log_weights) always stays positive
    log_weights = torch.log(initial_weights).detach()
    log_weights.requires_grad_()

    # Goal: exp(log_weights) @ estimate_matrix approximately equals target_values
    optimizer = torch.optim.Adam([log_weights])

    # Report the initial loss (no gradient is needed for this)
    with torch.no_grad():
        targets_estimate = torch.exp(log_weights) @ estimate_matrix
        loss = torch.mean(((targets_estimate - target_values) / target_values) ** 2)
    print(f"Initial loss: {loss.item()}")

    # Training loop
    for epoch in range(epochs):
        # Estimate the targets from the current weights
        targets_estimate = torch.exp(log_weights) @ estimate_matrix
        # Mean squared relative error across all targets
        loss = torch.mean(((targets_estimate - target_values) / target_values) ** 2)

        writer.add_scalar("Loss/train", loss.item(), epoch)

        # Clear old gradients, backpropagate, and update the log weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Print the loss whenever the one-indexed epoch number is divisible by epoch_step
        if (epoch + 1) % epoch_step == 0:
            print(f"Epoch {epoch + 1}, Loss: {loss.item()}")

    # Write any pending events to disk and release the log file
    writer.flush()
    writer.close()

    return torch.exp(log_weights.detach())
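
Since `target_names` is accepted but never used, one way to put it to work is a per-target error report after reweighting. The sketch below defines two hypothetical helpers, `relative_errors` and `report`, which are not part of the original code. `relative_errors` recomputes each target's signed relative error with the same `weights @ estimate_matrix` product used inside `reweight`:

```python
import torch


def relative_errors(weights, estimate_matrix, target_values):
    """Hypothetical helper: signed relative error of each target under `weights`.

    Uses the same target estimate as reweight (weights @ estimate_matrix), so the
    mean of the squared entries equals reweight's loss for these weights.
    """
    with torch.no_grad():
        targets_estimate = weights @ estimate_matrix
        return (targets_estimate - target_values) / target_values


def report(target_names, errors):
    """Print one line per target, e.g. 'Stat1: +50.00%'."""
    for name, err in zip(target_names, errors.tolist()):
        print(f"{name}: {err:+.2%}")
```

After the usage example, `report(target_names, relative_errors(final_weights, estimate_matrix, target_values))` would list how far each target still is from its goal.
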

## Usage Example

The cell below runs `reweight` on a toy problem with five records and three targets:

In [3]:
# Prepare your data as PyTorch tensors
initial_weights = torch.tensor([1.0, 1.0, 1.0, 1.0, 1.0])
estimate_matrix = torch.tensor([
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0]
])
target_names = ["Stat1", "Stat2", "Stat3"]
target_values = torch.tensor([10.0, 15.0, 20.0])

# Call the function
final_weights = reweight(
    initial_weights,
    estimate_matrix,
    target_names,
    target_values,
    epochs=1000,
    epoch_step=100
)

print("Final weights:", final_weights)


Initial loss: 0.14120370149612427
Epoch 100, Loss: 0.06793717294931412
Epoch 200, Loss: 0.03280560299754143
Epoch 300, Loss: 0.016901666298508644
Epoch 400, Loss: 0.010035503655672073
Epoch 500, Loss: 0.007239286322146654
Epoch 600, Loss: 0.0061649903655052185
Epoch 700, Loss: 0.005761378910392523
Epoch 800, Loss: 0.0055924332700669765
Epoch 900, Loss: 0.005493843927979469
Epoch 1000, Loss: 0.005410326179116964
Final weights: tensor([0.7894, 0.7471, 0.7306, 0.7218, 0.7163])
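
The printed numbers can be checked by hand. With every weight equal to 1, the estimates are the column sums 15, 20 and 25 against targets 10, 15 and 20. This gives relative errors of 0.5, 1/3 and 0.25, whose mean square is about 0.1412: the reported initial loss. The toy targets are also exactly attainable. Columns 2 and 3 equal column 1 plus 1 and plus 2, so any positive weights that sum to 5 and give a column-1 total of 10 fit all three targets. The weights in the sketch below are one hand-picked choice of this kind (an illustration, not output of `reweight`). The remaining loss of about 0.0054 after 1000 epochs is therefore not forced by the targets themselves; more epochs or a larger Adam learning rate may reduce it further.

```python
import torch

# Toy data copied from the usage example above
estimate_matrix = torch.tensor([
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0],
])
target_values = torch.tensor([10.0, 15.0, 20.0])


def relative_loss(weights):
    """Same loss as reweight: mean squared relative error of weights @ estimate_matrix."""
    targets_estimate = weights @ estimate_matrix
    return torch.mean(((targets_estimate - target_values) / target_values) ** 2)


# Starting point of the example: every weight equal to 1
print(relative_loss(torch.ones(5)).item())  # about 0.1412, the "Initial loss" above

# Hand-picked positive weights: they sum to 5 and give sum(w_i * i) = 10,
# which reproduces all three targets exactly
exact_weights = torch.tensor([2.0, 1.75, 0.75, 0.25, 0.25])
print(exact_weights @ estimate_matrix)  # tensor([10., 15., 20.])
print(relative_loss(exact_weights).item())  # 0.0
```
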


## Important Notes

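* The loss is the mean of the squared relative errors, ((predicted - actual)/actual) squared, across all targets. Because each error is scaled by its own target, targets of very different magnitudes contribute comparably, which avoids the need for equivalisation factors.

* The training loss is logged to TensorBoard under the tag `Loss/train`; `SummaryWriter()` writes to a `runs/` subdirectory of the working directory by default.

* The optimization uses the Adam optimizer with PyTorch's default learning rate (`lr=1e-3`) and performs gradient descent on the log of the weights, which keeps every final weight positive.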
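
Why optimize the log weights? Writing $w_i = e^{u_i}$, any real $u_i$ maps to a positive weight, and the chain rule gives $\partial L/\partial u_i = w_i \, \partial L/\partial w_i$, so each weight's update is scaled by its own size. The sketch below checks that identity with autograd on small made-up data, using float64 to keep rounding out of the comparison; `loss_fn` is a hypothetical stand-in that mirrors the loss inside `reweight`:

```python
import torch

# Made-up data: 3 records, 2 targets (float64 so the comparison is not dominated by rounding)
A = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=torch.float64)
t = torch.tensor([20.0, 30.0], dtype=torch.float64)
w0 = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)


def loss_fn(weights):
    """Mean squared relative error, as in reweight."""
    return torch.mean(((weights @ A - t) / t) ** 2)


# Gradient with respect to the weights themselves
w = w0.clone().requires_grad_()
loss_fn(w).backward()
grad_w = w.grad

# Gradient with respect to the log weights, the parameters reweight actually optimizes
u = torch.log(w0).requires_grad_()
loss_fn(torch.exp(u)).backward()
grad_u = u.grad

# Chain rule: dL/du_i = w_i * dL/dw_i
print(torch.allclose(grad_u, w0 * grad_w))  # True
```

A practical consequence is that small weights move slowly in absolute terms and cannot be pushed below zero by a single step.
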

## Warning

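This function expects its inputs as PyTorch tensors. Passing data in any other format (e.g., NumPy arrays or pandas DataFrames) without converting it first will fail at the first tensor operation. In addition, all three tensors must share one floating-point dtype, because the matrix product `exp(log_weights) @ estimate_matrix` does not mix float32 and float64; `initial_weights` must be strictly positive; and `target_values` must be non-zero.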
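
A minimal conversion sketch follows. It assumes the survey estimates arrive as a pandas DataFrame (one row per household, one column per target), the weights as a NumPy array, and the targets as a pandas Series keyed by the same column names. All names and values here are made up for illustration:

```python
import numpy as np
import pandas as pd
import torch

# Made-up survey data: one row per household, one column per target statistic
estimates_df = pd.DataFrame({"Stat1": [1.0, 2.0, 3.0], "Stat2": [2.0, 3.0, 4.0]})
weights_np = np.array([1.0, 1.0, 1.0])  # NumPy defaults to float64
targets = pd.Series({"Stat1": 5.0, "Stat2": 8.0})

# Convert everything to one dtype, since float32 and float64 cannot be mixed in the @ product
estimate_matrix = torch.as_tensor(estimates_df.to_numpy(), dtype=torch.float32)
initial_weights = torch.as_tensor(weights_np, dtype=torch.float32)

# Align the targets with the matrix columns so that target j matches column j
# (.loc raises KeyError if a column has no target)
target_values = torch.as_tensor(
    targets.loc[estimates_df.columns].to_numpy(), dtype=torch.float32
)
target_names = list(estimates_df.columns)
```

Aligning by column name, rather than relying on the order in which the targets happen to be stored, guards against silently matching a target to the wrong column.
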