`stats.boxcox` returns all the same values #6873
@Qukaiyi Could you please add a minimal reproducible example, something we can directly run? Also, it would be helpful if you explain what precisely you think is wrong with the result.
@ev-br I'm sorry. Here is my example. First, I use `stats.boxcox` on my data in Python, and we see that the transformed values all come out identical. The following is the result for the same data in R, together with a simple description of R's `boxcox` function, which gives a reasonable transform. Last, I run Python again with the lambda estimated by R. We can see that the results of R and Python still disagree.
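The failure mode can be reproduced without the attached data file: forcing a lambda near the one the reporter quotes (about -5.5) onto any sample of large positive values makes every transformed value identical. A minimal sketch (the synthetic data below is an assumption, not the original data.txt):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the reporter's data: large positive values.
y = np.linspace(4000.0, 6000.0, 50)

# Force the lambda the reporter quotes as "optimal" (~ -5.5).
xt = stats.boxcox(y, lmbda=-5.5)

# y**-5.5 is ~1e-20, far below float64's machine epsilon (~2.2e-16),
# so y**-5.5 - 1 rounds to exactly -1.0 for every element and the
# whole transformed array collapses to the single value -1/-5.5.
print(np.unique(xt).size)
```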
Apparently the code that computes the MLE of lambda in `boxcox` runs into a numerical problem with this data: the optimizer wanders into a region of large negative lambda where `y**lmbda - 1` loses all precision, so the lambda estimate it returns is meaningless. When this happens, any further calculation performed by `boxcox` is garbage as well. Note that you get several warnings when you compute the transform on this data.

A somewhat surprising work-around is to tweak the bracketing interval passed to the optimizer. So instead of using the default bracket, restrict the search to lambdas for which the transform is still computed accurately.
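The bracket-tweaking work-around can be sketched as follows (the synthetic data and the bracket values are assumptions, not the exact call from the original comment):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=8.0, sigma=0.5, size=200)  # large positive values

# Hand the underlying Brent optimizer a narrower starting bracket so the
# lambda search stays in a region where y**lmbda - 1 retains precision,
# then transform with the resulting estimate.
lam = stats.boxcox_normmax(x, brack=(-1.0, 1.0), method='mle')
xt = stats.boxcox(x, lmbda=lam)
print(lam, np.unique(xt).size)
```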
@WarrenWeckesser Thank you for your quick reply!
@Qukaiyi Thanks for providing the data and the additional information. That made it easy to track down the problem. In case anyone else is wondering about the discrepancy between R and scipy in the estimate of lambda: apparently the R function searches for lambda differently, so it does not run into the same numerical problem. In scipy, the estimate comes from numerically maximizing the log-likelihood, and it is that optimization which breaks down here.
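For comparison, R's MASS::boxcox evaluates the profile log-likelihood over a fixed grid of lambda values rather than running an unconstrained optimizer, which is one plausible reason the two disagree here. The same curve can be traced in scipy with `boxcox_llf` (synthetic data; a sketch only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(size=100)

# Trace the profile log-likelihood on a fixed grid, the way R's
# MASS::boxcox does, and take the grid argmax as the estimate.
lams = np.linspace(-2.0, 2.0, 41)
llf = np.array([stats.boxcox_llf(lam, x) for lam in lams])
best = lams[np.argmax(llf)]
print(best)
```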
@WarrenWeckesser Got it!
When should we expect a fix for this? What is the proposed solution?
@omtinez Only when someone gets around to fixing it; no one is working on it at the moment, so no timeline can be given.
Would you accept a pull request for this? My proposed solution would be to find the minimum and maximum lambdas that would not lead to loss of precision, then use the triplet form of the bracketing argument to keep the optimizer inside that range.
Yes, that would be great!
I'm running into this issue, and the proposed workaround does not seem to work for my case, unfortunately; the transformed values still all come out the same.
Here is another case with the same failure.
I have this problem too!
* Fixes scipy#6873
* Short-circuits MLE and Pearson in certain edge cases
* Adds test for Box-Cox transform when all transformed values are equal
* Fixes scipy#6873
* Short-circuits MLE and Pearson in certain edge cases
* Adds test for Box-Cox transform for valid corner cases
* Adds test for Box-Cox transform for invalid corner cases
I finally had some time to look at this. Both the MLE and the Pearson version of this function were broken for different (but similar) reasons. The PR fixes the main problems, but some inputs still result in numerical instability; I think handling those is better left for another PR.
Thanks @omtinez. I agree that that's better left for another PR. Yours should close this issue, and then a new one can be opened for remaining edge cases.
Are the two lines of the annotated code below correct? Take the first one for example. According to the Yeo-Johnson definition, the condition should be `lmbda != 0`. Since `lmbda` can take on both positive and negative values, `lmbda < 1e-19` is a very poor substitute for `lmbda != 0`, is it not? Perhaps these two lines should instead test `abs(lmbda)`? Or am I missing something?
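To make the concern concrete, here is a hypothetical sketch of the Yeo-Johnson transform with `abs()`-based zero checks. This is illustrative code, not scipy's actual implementation; the function name and tolerance are assumptions:

```python
import numpy as np

def yeojohnson_sketch(y, lmbda, tol=1e-19):
    # Tolerance checks use abs(), since lmbda may be negative: a plain
    # `lmbda < tol` would misclassify every negative lmbda as "zero".
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if abs(lmbda) > tol:                       # lmbda != 0 branch
        out[pos] = ((y[pos] + 1) ** lmbda - 1) / lmbda
    else:                                      # lmbda == 0 branch
        out[pos] = np.log1p(y[pos])
    if abs(lmbda - 2) > tol:                   # lmbda != 2 branch
        out[~pos] = -((1 - y[~pos]) ** (2 - lmbda) - 1) / (2 - lmbda)
    else:                                      # lmbda == 2 branch
        out[~pos] = -np.log1p(-y[~pos])
    return out
```

With a plain `lmbda < 1e-19` test, a call like `yeojohnson_sketch(y, -1.0)` would wrongly take the log branch; the `abs()` form routes it through the power formula.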
Can we reopen this issue? It has not been fixed; it is still waiting for #9271 to be merged. For anyone who wants a workaround, you can clone my fork, since the PR with the fix is not getting any traction: https://github.com/omtinez/scipy
I have the same problem too.
Do we have a temporary solution for this issue? Thanks.
For the record, several questions on Stack Overflow are likely about this same issue.
This change reformulates the expression for the log-likelihood function computed by boxcox_llf in a way that is mathematically equivalent to the old version but avoids the loss of precision that can occur in the subtraction y**λ - 1.

For conciseness, let T(y; λ) be the Box-Cox transformation

    T(y; λ) = { (y**λ - 1)/λ   if λ ≠ 0
              { ln(y)          if λ = 0

As explained in a comment in scipygh-6873, a problem arises if y is sufficiently large and λ is sufficiently negative (e.g. 5000**-5 is 3.2e-19). When y**λ approaches the floating point epsilon, the subtraction y**λ - 1 suffers catastrophic loss of precision. When this occurs in the log-likelihood function, it results in the optimizer returning garbage.

The log-likelihood function (for a vector Y) is

    L(Y; λ) = -n*ln(var(T(Y; λ)))/2 + (λ - 1)*sum(ln(Y))

where n is the length of Y. The Box-Cox transformation T only appears as the argument to var; we only use the transform to compute the variance of the transformed data. The variance is invariant with respect to a constant shift, so, assuming λ ≠ 0,

    var(T(Y; λ)) = var(Y**λ/λ - 1/λ) = var(Y**λ/λ)

That is, we can compute var(T(Y; λ)) without the subtraction in the Box-Cox transformation.

Closes scipygh-6873.
Proposed solution: #10072
I applied `stats.boxcox` to my data and the returned values are all the same, which seems really unreasonable! It returns this same result in scipy 0.18.1 and scipy 0.17.1. The optimal `lambda` in my case is -5.501196436791543. You can download my data (data.txt). I also tried the `boxcox` function in R and it returned a reasonable result. So I think the `boxcox_normmax` function should do more testing and should be used carefully.