
iterated lowess is wrong #2108

Open
anntzer opened this issue Nov 24, 2014 · 7 comments · May be fixed by #9220
Comments

@anntzer
Contributor

anntzer commented Nov 24, 2014

Consider

In [1]: sm.nonparametric.lowess([0] * 10 + [1] * 10, np.arange(20), it=2)[:, 1].T
Out[1]: 
array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.26162324,  0.45476355,
        0.56315913,  0.70972324,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ])

(This happens both with the old 0.6.0 version and with the latest "nan-fixed" version.)
This is likely incorrect, as the output array isn't even symmetric around 0.5.

Compare with R's result:

> lowess(c(rep(0, 10), rep(1, 10)), iter=2)
$x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

$y
 [1] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [7] 0.00000000 0.03796574 0.29511209 0.44982749 0.55017251 0.70488791
[13] 0.96203426 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000
[19] 1.00000000 1.00000000

which is symmetric.

Note that the issue doesn't appear if iter=1 (statsmodels gives the same result as R).
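To make the symmetry claim concrete, here is a small check using the two output arrays quoted above (the helper name is mine, not part of either library): since the step input is antisymmetric about its midpoint, a correct smoother should satisfy y[i] + y[-1-i] = 1 for all i.

```python
import numpy as np

# Outputs copied from above: statsmodels (it=2) vs R lowess (iter=2)
# on y = [0]*10 + [1]*10, x = 0..19.
sm_y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0.26162324, 0.45476355,
                 0.56315913, 0.70972324, 1, 1, 1, 1, 1, 1, 1, 1])
r_y = np.array([0, 0, 0, 0, 0, 0, 0, 0.03796574, 0.29511209, 0.44982749,
                0.55017251, 0.70488791, 0.96203426, 1, 1, 1, 1, 1, 1, 1])

def is_symmetric_about_half(y, tol=1e-6):
    # The step data is antisymmetric about its midpoint, so the smoothed
    # curve should satisfy y[i] + y[-1-i] == 1 for every i.
    return np.allclose(y + y[::-1], 1.0, atol=tol)

print(is_symmetric_about_half(sm_y))  # False
print(is_symmetric_about_half(r_y))   # True
```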

@jseabold
Member

@carljv Any comment on this? I'm not familiar enough with this code to say anything without studying it for a bit.

@jseabold
Member

Well, it's not quite as dire as the title sounds. It could just be that, in these odd cases, R is doing something different. If you want to dig in here and figure it out, it would be much appreciated. For 'normal' data, we agree with R.

R> lowess(cars, iter=3)
$x
 [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25

$y
 [1]  4.965  4.965 13.124 13.124 15.859 18.580 21.280 21.280 21.280 24.129
[11] 24.129 27.120 27.120 27.120 27.120 30.027 30.027 30.027 30.027 32.963
[21] 32.963 32.963 32.963 36.758 36.758 36.758 40.435 40.435 43.463 43.463
[31] 43.463 46.885 46.885 46.885 46.885 50.793 50.793 50.793 56.491 56.491
[41] 56.491 56.491 56.491 67.586 73.080 78.643 78.643 78.643 78.643 84.329

Statsmodels

In [1]: cars = sm.datasets.get_rdataset("cars").data

In [2]: sm.nonparametric.lowess(cars.dist, cars.speed, it=3, delta=.01*np.ptp(cars.speed))
Out[2]:
array([[  4.        ,   4.96545928],
       [  4.        ,   4.96545928],
       [  7.        ,  13.12449504],
       [  7.        ,  13.12449504],
       [  8.        ,  15.85863338],
       [  9.        ,  18.57969051],
       [ 10.        ,  21.28031258],
       [ 10.        ,  21.28031258],
       [ 10.        ,  21.28031258],
       [ 11.        ,  24.12927715],
       [ 11.        ,  24.12927715],
       [ 12.        ,  27.11954855],
       [ 12.        ,  27.11954855],
       [ 12.        ,  27.11954855],
       [ 12.        ,  27.11954855],
       [ 13.        ,  30.02727633],
       [ 13.        ,  30.02727633],
       [ 13.        ,  30.02727633],
       [ 13.        ,  30.02727633],
       [ 14.        ,  32.96250613],
       [ 14.        ,  32.96250613],
       [ 14.        ,  32.96250613],
       [ 14.        ,  32.96250613],
       [ 15.        ,  36.75772834],
       [ 15.        ,  36.75772834],
       [ 15.        ,  36.75772834],
       [ 16.        ,  40.43507456],
       [ 16.        ,  40.43507456],
       [ 17.        ,  43.46349178],
       [ 17.        ,  43.46349178],
       [ 17.        ,  43.46349178],
       [ 18.        ,  46.88547894],
       [ 18.        ,  46.88547894],
       [ 18.        ,  46.88547894],
       [ 18.        ,  46.88547894],
       [ 19.        ,  50.79315172],
       [ 19.        ,  50.79315172],
       [ 19.        ,  50.79315172],
       [ 20.        ,  56.49122409],
       [ 20.        ,  56.49122409],
       [ 20.        ,  56.49122409],
       [ 20.        ,  56.49122409],
       [ 20.        ,  56.49122409],
       [ 22.        ,  67.58582423],
       [ 23.        ,  73.07969527],
       [ 24.        ,  78.64316355],
       [ 24.        ,  78.64316355],
       [ 24.        ,  78.64316355],
       [ 24.        ,  78.64316355],
       [ 25.        ,  84.32869809]])

@josef-pkt
Member

I'm trying to partially figure out how the neighborhood is determined; see #2449.

It looks to me like there is an asymmetry. Line 269:

radius = fmax(x[i] - x[left_end], x[right_end-1] - x[i])

which, in my interpretation, centers just to the left of the intersection between x and the midpoint 0.5 * (x_low + x_high).
However, maybe it's correct, because we have already increased left_end by one, so we are trimming one observation on each side.

???
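For illustration, here is a toy version of that radius computation with assumed window endpoints (my own example, not the actual cython source): with an even window size the two distances inside the fmax differ, so the window cannot be centered on x[i].

```python
import numpy as np

x = np.arange(20.0)          # evenly spaced data
i = 10                       # hypothetical target point
left_end, right_end = 8, 14  # assumed half-open window of k = 6 points

# the quoted line 269, rewritten in plain Python
radius = max(x[i] - x[left_end], x[right_end - 1] - x[i])

# left distance is 2.0, right distance is 3.0: the window is asymmetric
print(x[i] - x[left_end], x[right_end - 1] - x[i], radius)  # 2.0 3.0 3.0
```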

@josef-pkt
Member

However, in calculate_weights we are using right_end for slicing, x[left_end:right_end], which does not include the right_end index.

Also, what's the behavior at the lower boundary? Is x[0] necessarily included in the first interval?
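Just to pin down the slicing semantics being discussed: Python slices are half-open, so x[left_end:right_end] contains exactly right_end - left_end points and excludes index right_end.

```python
x = list(range(10))
left_end, right_end = 2, 8

window = x[left_end:right_end]  # indices 2..7; index 8 is excluded
print(window)                   # [2, 3, 4, 5, 6, 7]
print(len(window))              # 6 == right_end - left_end
```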

@jbrockmendel
Contributor

Does this apply to smoothers_lowess or smoothers_lowess_old?

@josef-pkt
Member

It's for the cython code that is actually used.
The cython code is a translation of the python code with some extras, AFAIR.
In #2449 I went through the cython code, and might not have looked much at the python code.

@bashtage added this to bugs in 0.11 Jul 16, 2019
@bashtage added this to To do in 0.12 Jan 24, 2020
@bashtage removed this from Bugs (Not-started) in 0.11 Jan 24, 2020
@kaktus42

So, I looked into the question of indexing, and it looks fine.

Let's say you have 30 datapoints and f = 0.2; then you get a window size of k = 6,
and the following is set: left_end, right_end = 0, k,
which correctly includes 6 datapoints with the indexing x[left_end:right_end].
However, only 5 datapoints effectively contribute, because they are weighted by their distance to x[i].
The algorithm says: use the k nearest neighbours, then divide by the distance h = x[i] - x[a], with a being the point k-furthest away (the "k-smallest number among |x[i] - x[j]| ∀ j"). This means the furthest point within the window is always downweighted to 0.

Does this help?
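A sketch of the weighting described above (a plain-NumPy paraphrase of tricube weighting, not the statsmodels source): with k = 6 points in the window, the furthest one gets weight exactly 0, so only 5 points effectively contribute.

```python
import numpy as np

def tricube_weights(x, i, left_end, right_end):
    # Weight each point in the half-open window x[left_end:right_end]
    # by the tricube of its distance to x[i], scaled by the window radius.
    window = x[left_end:right_end]
    dist = np.abs(window - x[i])
    radius = dist.max()                 # distance to the furthest point
    u = np.clip(dist / radius, 0.0, 1.0)
    return (1.0 - u ** 3) ** 3

x = np.arange(30.0)
w = tricube_weights(x, i=0, left_end=0, right_end=6)  # k = 6 window at the boundary
print(w)  # the furthest point (x[5]) gets weight exactly 0
```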

@jseabold linked a pull request Apr 18, 2024 that will close this issue