Issue with lowess() smoother in statsmodels #946

yarden · 2013-07-04T19:12:40Z

Following Josef's suggestion (https://groups.google.com/forum/#!topic/pystatsmodels/A5KMexQA1D8), I am using lowess() from statsmodels.nonparametric.smoothers_lowess to do Lowess smoothing. When I try it on this data set of X and Y values "test_data.txt", available here: https://gist.github.com/yarden/5929702
then I get the error:

  File "/home/yarden/.local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-x86_64.egg/statsmodels/nonparametric/smoothers_lowess.py", line 162, in lowess
    x = np.array(x[sort_index])
IndexError: index 13632 is out of bounds for size 10146

My code just calls lowess on the x, y values in the file:

import pandas
from statsmodels.nonparametric.smoothers_lowess import lowess

df = pandas.read_table("./test_data.txt", sep="\t")
y_vals = lowess(df["Y"], df["X"], return_sorted=False)

where NaN values in the test_data.txt file just represent missing values. What is going wrong here?

Also, if I do:

df = df.dropna(subset=["X", "Y"], how="any")

Then it seems to work, but I thought NaN values are by default dropped (based on the missing argument to lowess()), so I am not sure what caused the problem in this case.

Thanks very much for your help.

josef-pkt · 2013-07-04T19:36:54Z

BUG, my mistake for not having a test case with nans.

argsort on line 161 should use x not exog

sort_index = np.argsort(x)
not
sort_index = np.argsort(exog)

exog is the original full length, x, y have fewer rows if there are missing values
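A minimal NumPy-only sketch of the mismatch described above, using small made-up arrays rather than the data from the gist. The masking step is only a stand-in for what missing="drop" does inside lowess(); the names mask, bad_index and raised are illustrative and not taken from the statsmodels source.

```python
import numpy as np

# Hypothetical full-length inputs, with NaN marking missing values.
exog = np.array([3.0, np.nan, 1.0, 2.0, np.nan, 0.5])
endog = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])

# Drop rows where either value is missing (stand-in for missing="drop").
mask = ~(np.isnan(exog) | np.isnan(endog))
x, y = exog[mask], endog[mask]

# Buggy version: the indices come from the full-length array,
# so some of them can be larger than len(x) - 1.
bad_index = np.argsort(exog)
try:
    x[bad_index]
    raised = False
except IndexError:
    raised = True

# Fixed version: compute the sort order from the filtered array.
sort_index = np.argsort(x)
x_sorted, y_sorted = x[sort_index], y[sort_index]
```

With six input rows and only three valid ones, the indices computed from exog run up to 5 while x has length 3, which is the same kind of out-of-bounds index reported in the traceback.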

yarden · 2013-07-04T19:39:08Z

thanks very much for your prompt reply and fix!

josef-pkt · 2013-07-04T19:42:56Z

Fix will land in master within a day, with test case.

Thanks for reporting it.

yarden · 2013-07-04T20:01:30Z

Quick followup: if I pass lowess the arguments missing="none", return_sorted=False, it runs with no error. Does that mean that missing values (nans) are ignored, and that the x or y values that are missing are simply returned as nan? That's the behavior I hope to achieve, but I wanted to make sure I am not misunderstanding.

josef-pkt · 2013-07-04T20:14:15Z

missing="none" means we don't check for nans. (to save computation when we already know we don't have NaNs or infs.)
If there are nans, then they are treated as nans in the floating point operations. All code just runs with float (double). I never checked how NaNs propagate in this case.
My guess from similar code is that all smoothed values that have a nan in their neighborhood will also turn into nans.

Maybe you want missing="drop", return_sorted=False, which drops nans and sorts the array for the calculations, but then puts the result back in the same order and shape as the original data, with nans in the positions where either x or y had a nan in the data.

That's the intended behavior, but the unit tests might not include a case with nans, given the previous error. (Not all option combinations are unit tested.)
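A short sketch of the missing="drop", return_sorted=False combination, assuming a statsmodels version that includes the fix for this issue. The data are synthetic (a noisy sine curve with a few NaNs inserted), not the data from the gist, and the variable names are illustrative.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic data: a noisy sine curve in deliberately unsorted order.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
rng.shuffle(x)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# Mark a few observations as missing in x, in y, or in both.
x[[3, 17]] = np.nan
y[[8, 17, 40]] = np.nan
valid = np.isfinite(x) & np.isfinite(y)

# NaN rows are dropped for the fit; the result is aligned with the input rows.
smoothed = lowess(y, x, missing="drop", return_sorted=False)

print(smoothed.shape == x.shape)            # same length as the input
print(np.isnan(smoothed[~valid]).all())     # nan where x or y was missing
```

If the behavior matches the description above, smoothed can be assigned straight back to a column of the original DataFrame without calling dropna() first.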

yarden · 2013-07-04T21:12:22Z

Ah I see, on closer look, it does put nans on all values that are near nans, which is definitely not what I intended. Based on your description, I am looking for missing="drop", return_sorted=False: dropping nans for the lowess operation, and then putting nans back so that you get arrays of the same length.

josef-pkt · 2013-07-04T23:33:09Z

I just tried with your data: return_sorted=True and return_sorted=False return exactly the same valid points.

If you want to speed up the calculations with a large dataset, then you could use the delta option which skips points that are not a minimum distance apart. (I never really tried it out. Carl implemented it to have the same fast option as in R.)
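A hedged sketch of how the delta option might be used, on synthetic data. The value of delta (1% of the x range) and frac=0.3 are illustrative choices, not recommendations from this thread, and no timing or accuracy claims are made here.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic data: 2000 noisy points along a sine curve.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, size=2000))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# delta is expressed in the units of x; 1% of the range is an assumption.
delta = 0.01 * (x.max() - x.min())

# With the default return_sorted=True, each result has columns (x, fitted y).
fit_exact = lowess(y, x, frac=0.3)
fit_fast = lowess(y, x, frac=0.3, delta=delta)

# Largest difference between the two fits, for a quick sanity check.
max_diff = np.abs(fit_exact[:, 1] - fit_fast[:, 1]).max()
print(max_diff)
```

Comparing max_diff against the noise level of the data is one way to judge whether a given delta is acceptable for a particular dataset.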

yarden · 2013-07-04T23:45:35Z

Did you change anything in the code? I don't see updates to the master branch, and when I try it with a recently cloned repository and missing="drop" (with either setting of return_sorted) I get the error I mentioned above about the index being out of bounds.

josef-pkt · 2013-07-04T23:55:18Z

no, I just made the change exog -> x locally, in the code of statsmodels that my python is using.

I was mainly looking at plots, and trying to figure out how to write additional unit tests for these options.

You could change it in your installed statsmodels, then you can run it right away.

BUG fix lowess sort when nans closes #946

josef-pkt · 2013-07-05T20:08:05Z

fix and more unit tests are in master

BUG fix lowess sort when nans closes statsmodels#946

josef-pkt mentioned this issue Jul 5, 2013

BUG fix lowess sort when nans closes #946 #949

Merged

josef-pkt closed this as completed in d9b699a Jul 5, 2013

josef-pkt added a commit that referenced this issue Jul 5, 2013

Merge pull request #949 from josef-pkt/fix_lowess_946

95737a1

BUG fix lowess sort when nans closes #946

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014

BUG fix lowess sort when nans closes statsmodels#946

fd1ff85

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014

Merge pull request statsmodels#949 from josef-pkt/fix_lowess_946

13b5014

BUG fix lowess sort when nans closes statsmodels#946
