WIP ENH: optimize knot location, Segmented Regression #2677

josef-pkt · 2015-10-25T14:39:03Z

Use case: This is a complementary method to penalized splines, where we want to carefully place a few knots instead of penalizing a large number of knots. The main advantage is in cases where we have clear breaks like in piecewise linear regression. This uses power splines and assumes that the estimated function is continuous but not smooth.

This is still largely "research code". I'm trying to figure out how to make this computationally efficient and robust in the optimization.

to the optimization: I guess we can have many local optima. My initial idea was to optimize a knot at a time, restricting it to be within the interval given by the neighboring knots. (article on knot selection in splines, ref?). Because of a mistake in the scipy documentation for brent I actually use the lower and the center point of the bracket which actually works better. Using fminbound to force the interval gives worse results.

One consequence/problem with brent is that it doesn't preserve the increasing order or knots.
It works well in the examples, but it doesn't "feel" robust.

Another option is to increase the number of knots sequentially. This also works well in the example, but we still have the unsorted knot problem.

TODO:

compatibility fixes: I'm working currently only with python 3.4 and numpy 1.9. Final version needs to have compatibility fixes.
no predict yet: We need to be able to create the spline/segment basis for an exog in prediction.
Similar to stateful transform in patsy.
pandas integration, xnames, ...: currently extra data information is not transferred to created models
no model.__class__.__init__ keywords yet, should be relatively easy to add
Other exog: currently uses or assumes the full model: projecting on other exog would be computationally more efficient. (p-inverse of one block in partitioned matrix, instead of full p-inverse)
selecting number of knots: sequential addition of knots also provides, ssr, aic, ... for choosing the number of knots. A proper hypothesis test would have to take the optimization of knot location into account (multiple testing, sup test)

josef-pkt · 2015-10-25T18:07:00Z

extra failure np.percentile doesn't seem to be vectorized in numpy 1.7

I get now the failure in from_model also in my original example script. It looks like a local minimum that shows up because of the default starting knots. The original example used different starting knots and went to the better local minimum.
I get the same bad local minimum if I use fminbound instead of brent. Either we need to use a global optimizer, or we use the sequential addition of knots, and add a drop method.

edit
I still have the problem that knots don't remain sorted, but I use the bracket for brent or bounds for fminbound as if they were sorted.
This is still a possible candidate for messing up convergence.

josef-pkt · 2015-10-25T18:43:25Z

update to previous:
Using correct brackets for brent or fminbound (using searchsorted) does not make a difference to the local minimum. Adding an additional knot moves it to the optimal 4-knot solution.
Adding a knot checks adding it to every segment. (but adding knots also still assumes partially that knots/bounds are sorted)

edit
Everything seems to work fine after fixing some mistakes in using searchsorted bounds for brackets.
from_model also converges to good local minimum.

josef-pkt · 2015-10-27T13:31:01Z

I just saw that R has a segmented package https://cran.r-project.org/web/packages/segmented/index.html
Several answers to "segmented" questions in stats.stackexchange refer to it.

josef-pkt · 2015-10-27T13:38:48Z

AttributeError: 'list' object has no attribute 'copy' in python 2, including 2.7

I haven't decided yet whether to keep bounds as list inside the loops or switch everywhere to ndarrays

josef-pkt · 2015-10-27T16:20:19Z

about R segmented: seems to have a similar API structure as I used
http://cran.utstat.utoronto.ca/doc/Rnews/Rnews_2008-1.pdf#page=20
the optimization there uses a gap variable (jump size) which should go to zero at convergence if connected linear segments are correctly specified. Non convergence might imply misspecification.

The package uses Davies test for the p-value for the Null of zero slope increment, but seems to use standard Wald values for the rest, e.g. confidence intervals (assuming break exists). (Nuisance parameter not identified under alternative.)
They nan the pvalue in the summary results for the slope increments.

standard error and confidence interval of knot location are based on Delta method with the gap parameters.

josef-pkt · 2016-06-19T03:55:50Z

statsmodels/base/_segmented.py

+
+class Segmented(object):
+    """class to search for a variable transformation in regression
+


missing Parameters section

josef-pkt · 2016-06-19T04:10:15Z

Based on the docstring it's not very clear how to use this. rework API or document better.

https://github.com/josef-pkt/misc/blob/segmented/notebooks/ex_segmented_regression_sm.ipynb
the main limitation seems to be the missing predict, i.e. creating new splines given the estimated knots. (based on summary and todos at the end of the notebook)

josef-pkt · 2016-06-19T04:21:58Z

also currently only supports OLS (implicitly, no extra args and minimize ssr)

It might still be useful as is, because it's also possible to construct a regular model with possibly patsy linear splines with the knot locations found by the Segmented class.

aic, bic, and lr test might also be valid for choosing the number of knots, with the caveat that wald inference and standard errors don't take knot search into account.

(I guess LR test will not be a standard case, "parameter not identified under the alternative" ? I don't remember right now.)

josef-pkt · 2017-05-20T01:48:29Z

rebased (merge conflict in compat/numpy easily resolved)

from what I remember this was ready to merge, just waiting for feedback

josef-pkt · 2017-05-20T02:31:12Z

fails on python 2:

  File "/home/travis/miniconda2/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/_segmented.py", line 317, in add_knot
    bounds_all.append(bounds.copy())
AttributeError: 'list' object has no attribute 'copy'

coveralls · 2017-05-20T02:33:23Z

Coverage increased (+0.03%) to 90.464% when pulling 017bf76 on josef-pkt:segmented_knots into 90e2af8 on statsmodels:master.

josef-pkt · 2017-11-01T01:01:43Z

rebased, no merge conflicts

coveralls · 2017-11-01T02:54:43Z

Coverage increased (+0.02%) to 81.909% when pulling 5cccb33 on josef-pkt:segmented_knots into 5b8f20d on statsmodels:master.

josef-pkt · 2018-08-14T12:09:44Z

test failure python 2.7 incompatibility in add_knot

>           bounds_all.append(bounds.copy())
E           AttributeError: 'list' object has no attribute 'copy'

same comment already twice above

josef-pkt · 2022-02-16T16:15:27Z

rebased version in #8124

josef-pkt added type-enh comp-base comp-regression labels Oct 25, 2015

josef-pkt mentioned this pull request Oct 26, 2015

DOC: changed documentation of brent to be consistent with bracket scipy/scipy#5422

Merged

josef-pkt reviewed Jun 19, 2016
View reviewed changes

josef-pkt added this to the 0.9 milestone Jun 19, 2016

josef-pkt added this to 85percent in josef-pulls Feb 26, 2017

josef-pkt force-pushed the segmented_knots branch from 41e2bc7 to 017bf76 Compare May 20, 2017 01:47

josef-pkt added 7 commits October 31, 2017 20:58

ENH: initial commit Segmented module, class, with example

1af9aa1

TST: add first unit tests, from_model fails. knot unsorted problem

1b5ec42

REF: use correct searchsorted bounds in brack (if bounds are not sorted)

b354883

REF: add_knots use sorted knots for knots insertion

2d5dcac

ENH/TST: test for single knot, test function, add idx in function

04420d6

ENH/CLN: small generalization, cleanup and docstrings

6d8275e

REF/MAINT: compatibility numpy < 1.9 percentile

5cccb33

josef-pkt force-pushed the segmented_knots branch from 017bf76 to 5cccb33 Compare November 1, 2017 01:00

josef-pkt mentioned this pull request Oct 26, 2018

WIP/ENH Gam 2744 rebased2 #5296

Closed

josef-pkt mentioned this pull request Feb 16, 2022

ENH: Segmented knots rebased #8124

Open

josef-pkt removed this from the 0.9 milestone Feb 16, 2022

josef-pkt added the superseded Used for PRs with content that is superseded by a newer PR label Feb 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP ENH: optimize knot location, Segmented Regression #2677

WIP ENH: optimize knot location, Segmented Regression #2677

josef-pkt commented Oct 25, 2015

josef-pkt commented Oct 25, 2015

josef-pkt commented Oct 25, 2015

josef-pkt commented Oct 27, 2015

josef-pkt commented Oct 27, 2015

josef-pkt commented Oct 27, 2015

josef-pkt Jun 19, 2016

josef-pkt commented Jun 19, 2016

josef-pkt commented Jun 19, 2016 •

edited

josef-pkt commented May 20, 2017

josef-pkt commented May 20, 2017

coveralls commented May 20, 2017

josef-pkt commented Nov 1, 2017

coveralls commented Nov 1, 2017

josef-pkt commented Aug 14, 2018 •

edited

josef-pkt commented Feb 16, 2022


		class Segmented(object):
		"""class to search for a variable transformation in regression

WIP ENH: optimize knot location, Segmented Regression #2677

Are you sure you want to change the base?

WIP ENH: optimize knot location, Segmented Regression #2677

Conversation

josef-pkt commented Oct 25, 2015

josef-pkt commented Oct 25, 2015

josef-pkt commented Oct 25, 2015

josef-pkt commented Oct 27, 2015

josef-pkt commented Oct 27, 2015

josef-pkt commented Oct 27, 2015

josef-pkt Jun 19, 2016

Choose a reason for hiding this comment

josef-pkt commented Jun 19, 2016

josef-pkt commented Jun 19, 2016 • edited

josef-pkt commented May 20, 2017

josef-pkt commented May 20, 2017

coveralls commented May 20, 2017

josef-pkt commented Nov 1, 2017

coveralls commented Nov 1, 2017

josef-pkt commented Aug 14, 2018 • edited

josef-pkt commented Feb 16, 2022

josef-pkt commented Jun 19, 2016 •

edited

josef-pkt commented Aug 14, 2018 •

edited