
fix: l2 loss stddev scaling #101

Merged
ryan112358 merged 3 commits into ryan112358:master from Shoeboxam:l2-loss-fix
Mar 6, 2026
Conversation

Shoeboxam (Contributor) commented Mar 5, 2026

We know that $y \mid x \sim \mathcal{N}(x,\ \sigma^2 I)$. So

$$p(y\mid x)=(2\pi\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\,\lVert y-x\rVert_2^2\right),$$

therefore (ignoring the constant term)

$$-\log p(y\mid x)\equiv\frac{1}{2\sigma^2}\,\lVert y-x\rVert_2^2 = \texttt{(diff @ diff) / (2 * M.stddev**2)}.$$

It's currently $1/\sigma$, not $1/\sigma^2$, meaning high-variance estimates are given too much importance.

Let me know if I've misunderstood your API!

EDIT: The code is consistent with the distribution estimation problem as written in Section 2.3 of the AIM paper, which weights squared errors by $1/\sigma$ (where $\sigma$ is the stddev). But under the paper's stated noise model, the MLE for Gaussian noise should weight squared errors by $1/\sigma^2$ (up to the factor of $1/2$). With this change you recover inverse-variance weighting, which is known to be the MLE under Gaussian noise.
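A minimal standalone sketch of the point above (the names `diff` and `stddev` mirror the expression in the comment, but this is illustrative code, not the library's actual implementation): under Gaussian noise, the negative log-likelihood weights the squared error by $1/\sigma^2$, which is exactly inverse-variance weighting; the current code divides by $\sigma$ instead.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=5)                     # true query answers
stddev = 3.0                               # noise scale sigma
y = x + rng.normal(scale=stddev, size=5)   # noisy measurement, y | x ~ N(x, sigma^2 I)

diff = y - x

# Current (incorrect) weighting: divides the squared error by sigma.
loss_current = (diff @ diff) / (2 * stddev)

# Gaussian NLL weighting (ignoring the constant term): divides by sigma**2.
loss_mle = (diff @ diff) / (2 * stddev**2)

# The NLL form is the same as inverse-variance weighting of the squared error.
assert np.isclose(loss_mle, 0.5 * (1 / stddev**2) * np.sum(diff**2))

# With sigma > 1, the 1/sigma version over-weights this high-variance
# measurement by a factor of sigma relative to the MLE.
assert np.isclose(loss_current, stddev * loss_mle)
```

With several measurements at different noise levels, this factor-of-$\sigma$ discrepancy systematically over-weights the noisiest ones in the combined objective.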

In AIM, this means later rounds won't improve utility as much as they should when they overlap with a query from an earlier round, which repeatedly triggers the "not significant" criterion and escalates the per-round budget until early exhaustion. The effect might also be negligible, though; I haven't tested.

Shoeboxam (Contributor, Author) commented Mar 5, 2026

By the way, have you considered storing whether noise has been added with laplace or gaussian in LinearMeasurement and automatically choosing l1 or l2 optimization if not explicitly set by the user?
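To make the suggestion concrete, here is a hypothetical sketch (the field names, `LinearMeasurement` shape, and `default_loss` helper are all assumptions for illustration, not the library's actual API): record the noise mechanism on the measurement and pick the matching loss unless the user overrides it.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

import numpy as np


@dataclass
class LinearMeasurement:
    """Illustrative stand-in for a noisy linear measurement."""
    values: np.ndarray
    stddev: float
    # Which mechanism produced the noise; defaults to gaussian.
    noise: Literal["laplace", "gaussian"] = "gaussian"


def default_loss(m: LinearMeasurement, override: Optional[str] = None) -> str:
    """Return 'l1' or 'l2' based on the recorded noise, unless overridden."""
    if override is not None:
        return override
    # L1 is the MLE-matching loss for Laplace noise, L2 for Gaussian.
    return "l1" if m.noise == "laplace" else "l2"
```

Usage would look like `default_loss(m)` for the automatic choice, or `default_loss(m, override="l2")` to force L2 regardless of the noise mechanism.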

ryan112358 (Owner) left a comment


Good catch: this is a regression introduced during refactoring; it appears a previous version of the library handled this correctly. Left one comment, then we can merge.

Regarding L1 vs. L2 minimization: I recommend using L2 minimization even when adding Laplace noise, since it appears to generalize better. I never figured out why though! See Section D.1 in https://arxiv.org/pdf/1901.09136.

@Shoeboxam Shoeboxam requested a review from ryan112358 March 6, 2026 03:50
@ryan112358 ryan112358 merged commit a34eefa into ryan112358:master Mar 6, 2026
2 checks passed
