WIP ENH: optimize knot location, Segmented Regression #2677
base: main
Extra failure: np.percentile doesn't seem to be vectorized in numpy 1.7. I now also get the failure in from_model in my original example script. It looks like a local minimum that shows up because of the default starting knots. The original example used different starting knots and went to the better local minimum.
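Because different starting knots can end in different local minima, how the defaults are chosen matters. The PR does not show the exact rule, so this is a minimal sketch that assumes the default starting knots sit at evenly spaced interior percentiles of x. The helper default_start_knots is hypothetical. It calls np.percentile once per percentile, so it does not depend on a vectorized q argument in older numpy versions.

```python
import numpy as np


def default_start_knots(x, n_knots):
    """Hypothetical helper: starting knots at evenly spaced interior percentiles.

    np.percentile is called with one scalar percentile at a time, so this does
    not rely on np.percentile accepting an array of percentiles.
    """
    x = np.asarray(x, dtype=float)
    # interior percentiles only, e.g. n_knots=3 -> 25, 50, 75
    qs = np.linspace(0, 100, n_knots + 2)[1:-1]
    return np.array([np.percentile(x, q) for q in qs])


x = np.arange(101, dtype=float)
print(default_start_knots(x, 3))  # -> [25. 50. 75.]
```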
Update to previous:
I just saw that R has a segmented package: https://cran.r-project.org/web/packages/segmented/index.html
I haven't decided yet whether to keep bounds as lists inside the loops or to switch to ndarrays everywhere.
About R segmented: it seems to have an API structure similar to the one I used. The package uses the Davies test for the p-value of the null hypothesis of zero slope increment, but seems to use standard Wald values for the rest, e.g. confidence intervals (assuming the break exists). (The nuisance parameter is not identified under the alternative.) The standard error and confidence interval of the knot location are based on the delta method with the gap parameters.
class Segmented(object):
    """class to search for a variable transformation in regression
Missing Parameters section.
Based on the docstring, it's not very clear how to use this: rework the API or document it better. https://github.com/josef-pkt/misc/blob/segmented/notebooks/ex_segmented_regression_sm.ipynb
It also currently supports only OLS (implicitly: no extra args, and it minimizes the SSR). It might still be useful as is, because it's also possible to construct a regular model, possibly with patsy linear splines, using the knot locations found by the Segmented class. AIC, BIC, and the LR test might also be valid for choosing the number of knots, with the caveat that Wald inference and standard errors don't take the knot search into account. (I guess the LR test will not be a standard case: "parameter not identified under the alternative"? I don't remember right now.)
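Once knots are found, the refit is an ordinary linear regression on a linear spline basis. This is a minimal sketch using plain numpy least squares instead of a statsmodels or patsy model; the helpers linear_spline_basis and ols_ic are hypothetical. The AIC and BIC use the Gaussian log-likelihood and treat the knots as fixed, so, as noted in the comment, they ignore the knot search.

```python
import numpy as np


def linear_spline_basis(x, knots):
    # columns: intercept, x, and one hinge term (x - k)_+ per knot
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)


def ols_ic(y, exog):
    """AIC and BIC of an OLS fit from the Gaussian log-likelihood.

    Knot locations are treated as fixed: they are not counted as parameters,
    so the criteria do not account for the knot search.
    """
    nobs, k_params = exog.shape
    params = np.linalg.lstsq(exog, y, rcond=None)[0]
    ssr = np.sum((y - exog @ params) ** 2)
    llf = -nobs / 2.0 * (np.log(2 * np.pi) + np.log(ssr / nobs) + 1)
    aic = -2 * llf + 2 * k_params
    bic = -2 * llf + np.log(nobs) * k_params
    return aic, bic


# simulated data with a single break at x = 5
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1 + 0.5 * x + 2.0 * np.maximum(x - 5, 0) + rng.normal(scale=0.3, size=x.size)
for knots in ([], [5.0], [3.0, 5.0]):
    aic, bic = ols_ic(y, linear_spline_basis(x, knots))
    print(len(knots), round(aic, 1), round(bic, 1))
```

Comparing the criteria across candidate knot sets is then a loop over fits, with the inference caveat above still applying.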
41e2bc7 to 017bf76
Rebased (merge conflict in compat/numpy, easily resolved). From what I remember, this was ready to merge, just waiting for feedback.
Fails on Python 2:
017bf76 to 5cccb33
Rebased, no merge conflicts.
Test failure: Python 2.7 incompatibility in add_knot.
Same comment already twice above.
Rebased version in #8124.
See #2634.
Use case: this is a complementary method to penalized splines, where we want to carefully place a few knots instead of penalizing a large number of knots. The main advantage is in cases with clear breaks, as in piecewise linear regression. This uses power splines and assumes that the estimated function is continuous but not smooth.
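With degree 1, a power-spline basis has an intercept, a linear term, and one hinge term (x - k)_+ per knot, so any fitted function is continuous and its slope changes only at the knots. This is a minimal sketch of such a basis, not the PR's code; power_spline_basis is a hypothetical name.

```python
import numpy as np


def power_spline_basis(x, knots, degree=1):
    """Truncated power basis: 1, x, ..., x**degree and (x - k)_+**degree per knot."""
    x = np.asarray(x, dtype=float)
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.maximum(x - k, 0.0) ** degree for k in knots]
    return np.column_stack(cols)


# f(x) = 1 + 0.5 x + 2 (x - 5)_+ : continuous at 5, slope jumps from 0.5 to 2.5
params = np.array([1.0, 0.5, 2.0])
xs = np.array([4.0, 4.5, 5.0, 5.5, 6.0])
f = power_spline_basis(xs, [5.0]) @ params
slopes = np.diff(f) / np.diff(xs)  # 0.5, 0.5, 2.5, 2.5: a kink at 5, no jump in f
print(f, slopes)
```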
This is still largely "research code". I'm trying to figure out how to make the optimization computationally efficient and robust.
About the optimization: I guess we can have many local optima. My initial idea was to optimize one knot at a time, restricting it to lie within the interval given by the neighboring knots (article on knot selection in splines, ref?). Because of a mistake in the scipy documentation for brent, I actually use the lower and the center point of the bracket, which actually works better. Using fminbound to force the interval gives worse results. One consequence/problem with brent is that it doesn't preserve the increasing order of the knots. It works well in the examples, but it doesn't "feel" robust.
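The one-knot-at-a-time idea can be sketched as coordinate descent where each knot moves only inside the open interval between its neighbors (the data range at the ends), which keeps the knots sorted by construction. This is a minimal sketch under that assumption: a plain grid search stands in for brent/fminbound so the example only needs numpy, and the helpers hinge_basis, ssr, and optimize_knots are hypothetical.

```python
import numpy as np


def hinge_basis(x, knots):
    # intercept, x, and (x - k)_+ for each knot
    return np.column_stack([np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots])


def ssr(y, x, knots):
    exog = hinge_basis(x, knots)
    params = np.linalg.lstsq(exog, y, rcond=None)[0]
    return np.sum((y - exog @ params) ** 2)


def optimize_knots(y, x, knots, n_iter=5, n_grid=50):
    """Coordinate-wise knot search, each knot restricted to its neighbors' interval."""
    knots = np.sort(np.asarray(knots, dtype=float))
    for _ in range(n_iter):
        for i in range(len(knots)):
            lo = knots[i - 1] if i > 0 else x.min()
            hi = knots[i + 1] if i < len(knots) - 1 else x.max()
            # interior grid points plus the current value, so the SSR never increases
            grid = np.append(np.linspace(lo, hi, n_grid + 2)[1:-1], knots[i])
            values = [ssr(y, x, np.concatenate([knots[:i], [g], knots[i + 1:]]))
                      for g in grid]
            knots[i] = grid[int(np.argmin(values))]
    return knots


# simulated data with breaks at 3 and 7
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 400)
y = (1 + 0.5 * x + 2 * np.maximum(x - 3, 0) - 3 * np.maximum(x - 7, 0)
     + rng.normal(scale=0.2, size=x.size))
est = optimize_knots(y, x, [2.5, 7.5])
print(np.round(est, 2))
```

With these simulated data and starting knots, the estimates should end up close to the true breaks at 3 and 7; other starting knots can still lead to a different local minimum.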
Another option is to increase the number of knots sequentially. This also works well in the example, but we still have the unsorted knot problem.
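Adding knots sequentially can be sketched as greedy forward selection: each new knot goes to the grid point that minimizes the SSR given the knots already chosen. This is a minimal sketch assuming a fixed grid over the data range; the name add_knots_sequentially is hypothetical and unrelated to the PR's add_knot. Since new knots can land on either side of existing ones, the result is sorted at the end.

```python
import numpy as np


def ssr(y, x, knots):
    exog = np.column_stack([np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots])
    params = np.linalg.lstsq(exog, y, rcond=None)[0]
    return np.sum((y - exog @ params) ** 2)


def add_knots_sequentially(y, x, n_knots, n_grid=100):
    """Greedy forward selection of knots; earlier knots are kept fixed."""
    grid = np.linspace(x.min(), x.max(), n_grid + 2)[1:-1]
    knots = []
    for _ in range(n_knots):
        candidates = [g for g in grid if g not in knots]
        values = [ssr(y, x, knots + [g]) for g in candidates]
        knots.append(candidates[int(np.argmin(values))])
    # knots are added in order of SSR reduction, not location, so sort them
    return np.sort(np.array(knots))


rng = np.random.default_rng(1)
x = np.linspace(0, 10, 400)
y = (1 + 0.5 * x + 2 * np.maximum(x - 3, 0) - 3 * np.maximum(x - 7, 0)
     + rng.normal(scale=0.2, size=x.size))
est = add_knots_sequentially(y, x, 2)
print(np.round(est, 2))
```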
TODO:
- Similar to stateful transform in patsy.
- data information is not transferred to created models (model.__class__.__init__).
- keywords yet, should be relatively easy to add
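The stateful-transform item refers to keeping state learned once, here the knots, and reusing it when building the design matrix for new data, e.g. for predict. This is a minimal sketch of that idea in plain numpy, not patsy's stateful transform API; the class LinearSplineTransform is hypothetical.

```python
import numpy as np


class LinearSplineTransform:
    """Hypothetical stateful transform: the knots are fixed once (e.g. after the
    knot search) and reused to build the same design columns for new data."""

    def __init__(self, knots):
        self.knots = np.sort(np.asarray(knots, dtype=float))

    def transform(self, x):
        x = np.asarray(x, dtype=float)
        hinges = [np.maximum(x - k, 0.0) for k in self.knots]
        return np.column_stack([np.ones_like(x), x] + hinges)


rng = np.random.default_rng(2)
x = np.linspace(0, 10, 200)
y = 1 + 0.5 * x + 2.0 * np.maximum(x - 5, 0) + rng.normal(scale=0.3, size=x.size)

spline = LinearSplineTransform([5.0])  # knots found on the estimation sample
params = np.linalg.lstsq(spline.transform(x), y, rcond=None)[0]
x_new = np.array([2.0, 8.0])
y_pred = spline.transform(x_new) @ params  # same knots applied to the new exog
print(y_pred)
```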