lowess silently returns nans #1798
Comments
Thanks for reporting. I'm not very familiar with the details (or don't remember them). I tried to see whether it is related to the missing noise; it doesn't seem to be. From the docstring, the number of neighbors should always be frac * nobs, so it is independent of the distance between the x values.
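To illustrate the point about the neighbor count, here is a minimal sketch, not the statsmodels code. The helper name `nearest_neighbors` is made up for this example, and truncating `frac * nobs` to an integer is an assumption about the rounding. It shows that the window size depends only on `frac` and the number of observations, not on how the x values are spaced.

```python
import numpy as np

def nearest_neighbors(x, x0, frac):
    """Return indices of the k points of x closest to x0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Assumption: the neighbor count is frac * nobs truncated to an integer.
    k = int(frac * len(x))
    # Sort by distance to x0 and keep the k closest points.
    order = np.argsort(np.abs(x - x0), kind="stable")
    return np.sort(order[:k])

# Evenly and unevenly spaced x give windows of the same size.
even = nearest_neighbors(np.arange(20), 10, frac=0.4)
uneven = nearest_neighbors(np.arange(20) ** 2, 100, frac=0.4)
print(len(even), len(uneven))
```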
|
The number of nans is increasing with each iteration:
|
This might be a perfect prediction problem. Maybe somewhere an error variance estimate goes to zero.
|
I noticed that biopython also provides an implementation of lowess that suffers from a similar issue:
|
@anntzer Can you check whether |
Yes, it is equal to 0. Now I don't understand why the first two warnings are "invalid value" instead of "divide by zero", though. |
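One possible explanation for the wording of the warnings, as a sketch rather than a diagnosis of the lowess code: NumPy reports 0/0 as "invalid value" and a nonzero numerator over zero as "divide by zero". The helper `division_warning` below is made up for this illustration.

```python
import warnings
import numpy as np

def division_warning(numerator, denominator):
    """Return the text of the warning NumPy emits for the division, or None."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with np.errstate(divide="warn", invalid="warn"):
            np.array([numerator], dtype=float) / np.array([denominator], dtype=float)
    return str(caught[0].message) if caught else None

print(division_warning(0.0, 0.0))  # message mentions "invalid value"
print(division_warning(1.0, 0.0))  # message mentions "divide by zero"
```

So a zero scale combined with residuals that are themselves exactly zero would produce "invalid value" rather than "divide by zero".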
OK, so I checked matlab's implementation of lowess (smooth, in the curve fit toolbox). They explicitly use mldivide-least-squares (which doesn't return the least-squares solution with the smallest norm, but the one with the most zero elements -- but I guess lstsq would just work as well) here. |
No, not using least squares is a "feature" of lowess. The iteration=0, But that might be a good clue. For RLM, I have an open pull request that fixes the weights when the variance is zero.
I don't understand this part, if they use least squares or a version that works for singular matrices, then it returns the minimum, and doesn't select on the number of zero elements. |
based on this, I guess the problem is in
The solution is to set 0/0 = 0 for the std_resid (i.e. replace nans by zero if |
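A minimal sketch of the 0/0 = 0 idea, assuming the robustness weights follow the textbook lowess recipe (residuals scaled by 6 times the median absolute residual, then passed through the bisquare function). The function name `bisquare_weights` is hypothetical, and this is not the statsmodels implementation or the fix in the pull request.

```python
import numpy as np

def bisquare_weights(resid):
    """Bisquare robustness weights, treating 0/0 as a zero standardized residual."""
    resid = np.asarray(resid, dtype=float)
    # Assumption: the scale is 6 * median(|resid|), as in the textbook recipe.
    scale = 6.0 * np.median(np.abs(resid))
    with np.errstate(divide="ignore", invalid="ignore"):
        std_resid = resid / scale
    # With a perfect fit, scale == 0 and resid == 0, so the ratio is nan.
    # Replace exactly those entries by 0 so they keep full weight.
    std_resid[np.isnan(std_resid) & (resid == 0)] = 0.0
    weights = (1.0 - std_resid ** 2) ** 2
    # Observations with |standardized residual| >= 1 get zero weight.
    weights[np.abs(std_resid) >= 1.0] = 0.0
    return weights

# Perfect fit: every residual is zero, and no weight becomes nan.
print(bisquare_weights(np.zeros(5)))
```

With this guard, a perfectly fitted neighborhood keeps weight 1 everywhere instead of propagating nans into the next iteration.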
I agree that my comment about mldivide is not clear -- mostly because indeed I don't exactly know what mldivide is doing. See e.g. http://scicomp.stackexchange.com/questions/5603/in-matlab-what-differences-are-between-linsolve-and-mldivide |
It uses the matlab documentation for smooth 'rlowess', doesn't say what the weight is for "inliers". Our bisquare also sets the weight of observations to zero if the residual is larger than |
MATLAB also uses bisquare. |
With statsmodels 0.5.0, `lowess(arange(20), arange(20), frac=.4)` returns a smoothed array entirely made of `nan`s. This seems to be triggered by a value too small for `frac`, relative to the size of the data. With unevenly spaced data, sometimes only part of the smoothed data is `nan`; e.g. `lowess(arange(20), arange(20) ** 2, frac=.2)`.

I don't know the exact details of the lowess implementation but it seems to me that if `frac` is too small (i.e. there aren't enough data points for the local smoothing?), the return value should just be set to the initially given value.