Fix yeojohnson lambda estimate ignoring missing values #28

Merged: petersonR merged 1 commit into petersonR:master from JonSulc:master on Jun 4, 2026

@JonSulc (Contributor) commented Jun 3, 2026

Problem

estimate_yeojohnson_lambda() computed the observation count with n <- length(x) before dropping missing values, so n included the NAs. That inflated n feeds the log-likelihood that optimize() maximizes:

-0.5 * n * log(x_t_var) + (lambda - 1) * constant

Because the inflated n reweights the first term, the optimized lambda shifts. The practical symptom: appending a few NAs to a vector changed the estimated lambda for the otherwise-identical data.
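
To make the reweighting concrete, here is a minimal, self-contained R sketch. It is not the package's code: `yj_transform`, `yj_loglik`, and `estimate_lambda_with_n` are hypothetical names, the search interval of -5 to 5 and `eps = 0.001` are assumptions, and the variance is computed from the data alone so that `n` enters only as the weight on the first term.

```r
# Hypothetical helper: Yeo-Johnson transform of an NA-free numeric vector.
yj_transform <- function(x, lambda, eps = 0.001) {
  pos <- x >= 0
  out <- numeric(length(x))
  out[pos] <- if (abs(lambda) < eps) {
    log1p(x[pos])
  } else {
    ((x[pos] + 1)^lambda - 1) / lambda
  }
  out[!pos] <- if (abs(lambda - 2) < eps) {
    -log1p(-x[!pos])
  } else {
    -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# Log-likelihood in the form quoted above, with n passed in explicitly
# so the effect of an inflated count can be isolated.
yj_loglik <- function(lambda, x, n) {
  x_t <- yj_transform(x, lambda)
  x_t_var <- mean((x_t - mean(x_t))^2)
  constant <- sum(sign(x) * log1p(abs(x)))
  -0.5 * n * log(x_t_var) + (lambda - 1) * constant
}

# Assumed search interval; the real function's bounds may differ.
estimate_lambda_with_n <- function(x, n) {
  optimize(function(lambda) yj_loglik(lambda, x, n),
           lower = -5, upper = 5, maximum = TRUE, tol = 1e-4)$maximum
}

set.seed(1)
x <- rlnorm(200, meanlog = 2, sdlog = 0.5)  # complete data, no NAs

lambda_true_n <- estimate_lambda_with_n(x, n = length(x))
# Same data, but n counted as if 50 NAs were still in the vector.
lambda_inflated_n <- estimate_lambda_with_n(x, n = length(x) + 50)

print(c(true_n = lambda_true_n, inflated_n = lambda_inflated_n))
```

The data passed to both calls are identical. Only `n` differs, and that alone should be enough to move the maximizer.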

Fix

Moving n <- length(x) to after dropping the missing values fixes the issue.

Added a regression test as well.
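
The test added in this PR is not shown in the conversation. As a rough sketch of what such a regression check might look like, the base R snippet below re-implements the estimator under assumptions: `yj` and `estimate_lambda_sketch` are hypothetical names, the -5 to 5 search interval is assumed, and the `count_after_dropping` flag exists only to contrast the old and new ordering of `n <- length(x)`.

```r
# Hypothetical stand-in for the Yeo-Johnson transform (NA-free input).
yj <- function(x, lambda, eps = 0.001) {
  pos <- x >= 0
  out <- numeric(length(x))
  out[pos] <- if (abs(lambda) < eps) log1p(x[pos]) else
    ((x[pos] + 1)^lambda - 1) / lambda
  out[!pos] <- if (abs(lambda - 2) < eps) -log1p(-x[!pos]) else
    -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  out
}

# Sketch of the estimator; count_after_dropping = FALSE mimics the old ordering.
estimate_lambda_sketch <- function(x, count_after_dropping = TRUE) {
  n_before <- length(x)          # old behavior: NAs are counted
  x <- x[!is.na(x)]
  n <- if (count_after_dropping) length(x) else n_before
  constant <- sum(sign(x) * log1p(abs(x)))
  loglik <- function(lambda) {
    x_t <- yj(x, lambda)
    x_t_var <- mean((x_t - mean(x_t))^2)
    -0.5 * n * log(x_t_var) + (lambda - 1) * constant
  }
  optimize(loglik, lower = -5, upper = 5, maximum = TRUE, tol = 1e-4)$maximum
}

set.seed(42)
x <- rlnorm(200, meanlog = 2, sdlog = 0.5)
x_na <- c(x, rep(NA, 50))        # identical data plus appended NAs

# Regression check: with n computed after dropping NAs, lambda is unchanged.
fixed_equal <- isTRUE(all.equal(estimate_lambda_sketch(x),
                                estimate_lambda_sketch(x_na)))

# For contrast: with the old ordering, appending NAs moves the estimate.
buggy_shift <- abs(estimate_lambda_sketch(x, FALSE) -
                   estimate_lambda_sketch(x_na, FALSE))

stopifnot(fixed_equal)
```

In a package test suite the same comparison would more likely be written against the exported functions with the package's test framework. The plain `stopifnot()` here just keeps the sketch dependency-free.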

Updated the version to 1.9.3 to reflect the bug fix.

estimate_yeojohnson_lambda computed n with length(x) before dropping
NAs, so the count of "observations" included missing values. That
inflated n in the log-likelihood, shifting the optimized lambda.
Appending NAs to a vector therefore changed the estimated lambda.

Compute n after removing missing values so the log-likelihood uses the
number of nonmissing observations. Add a regression test asserting the
estimated lambda is unchanged when NAs are appended.

@petersonR (Owner) left a comment

looks good, thanks

@petersonR petersonR merged commit 90698be into petersonR:master Jun 4, 2026
5 of 6 checks passed
