GLM.fit does not allow start_params #443

Closed
opened this issue Aug 24, 2012 · 6 comments

Projects
None yet
Member

jseabold commented Aug 24, 2012

 GLM fit signature should allow starting parameters to be specified. This came up as I was reading about how IRLS is not a robust fitting algorithm. http://www.r-bloggers.com/how-robust-is-logistic-regression/
Member

josef-pkt commented Aug 25, 2012

 reading a bit how the initial estimate works, I think `mu = self.family.starting_mu(self.endog)` should depend on data_weights. It uses y.mean() instead of weighted mean in Binomial. an example that breaks with the default starting values is at a comment on the original post http://www.win-vector.com/blog/2012/08/how-robust-is-logistic-regression/
Member

josef-pkt commented Aug 25, 2012

 discrete Logit fails with singular matrix exception when using the default newton method. method='ncg' converges very fast (10 iterations) to the correct solution (result of R glm with adjusted starting values)
Member

josef-pkt commented Aug 25, 2012

 example script for the case in the comment ``````# -*- coding: utf-8 -*- """ Created on Sat Aug 25 10:14:55 2012 Author: Josef Perktold convergence failure of GLM Binomial and Logit example based on comment in http://www.win-vector.com/blog/2012/08/how-robust-is-logistic-regression/ Note: example has one value with predicted probability zero >>> (reslogit.predict() < 1e-10).sum() 1 >>> reslogit.predict().min() 9.6285423080907814e-233 method='ncg' looks good """ import numpy as np from numpy.testing import assert_almost_equal, assert_equal import statsmodels.api as sm y2 = np.array('0 1 0 0 0 1'.split(), int) wt = np.array([50,1,50,1,5,10]) y2 = np.repeat(y2, wt) x2 = np.repeat([0,0,0.001,100,-1,-1], wt) assert_equal(y2.sum(), 11) assert_equal(len(y2), 117) #y2.mean() is 11./117 = 0.094017094017094016 resglm2 = sm.GLM(y2, sm.add_constant(x2, prepend=True), family=sm.families.Binomial()).fit() print resglm2.params #nan try: reslogit = sm.Logit(y2, sm.add_constant(x2, prepend=True)).fit() except np.linalg.linalg.LinAlgError: print 'numpy.linalg.linalg.LinAlgError' #numpy.linalg.linalg.LinAlgError: Singular matrix reslogit = sm.Logit(y2, sm.add_constant(x2, prepend=True)).fit(method='nm') #Warning: Maximum number of iterations has been exceeded reslogit = sm.Logit(y2, sm.add_constant(x2, prepend=True)).fit(method='ncg') #from R glm with starting values (-4, -5) #prints: rglm_params = [-4.603050220003291, -5.296345452725506] rglm_aic = 34.31049560846446 assert_almost_equal(reslogit.params, rglm_params, decimal=8) assert_almost_equal(reslogit.aic, rglm_aic, decimal=12) #with starting values reslogit_s = sm.Logit(y2, sm.add_constant(x2, prepend=True)).fit( start_params=[-4,-5]) assert_almost_equal(reslogit_s.params, rglm_params, decimal=8) assert_almost_equal(reslogit_s.aic, rglm_aic, decimal=12) ``````

Closed

Closed

jseabold added a commit to jseabold/statsmodels that referenced this issue Apr 18, 2014

``` ENH: Allow start_params in GLM. Closes statsmodels#443. ```
``` 5d89811 ```
Member

josef-pkt commented Apr 18, 2014

 For adding the `start_params` I think @yizhong 's change in line 386 is the right way. unit tests could just check the params in the fit history of the results instance. @yizhong pull request would be very welcome
Member Author

jseabold commented Apr 18, 2014

 I think this failing case is orthogonal to the `start_params` issue. Though since I think things like this  are boolean indices anyway then we should index and have two sums right (or use nansum)? This (almost) solves the issue here. I want to refactor all of this now after looking at it again 5 years later...

jseabold added a commit to jseabold/statsmodels that referenced this issue Apr 18, 2014

``` ENH: Allow start_params in GLM. Closes statsmodels#443. ```
``` 30e7327 ```
Member

josef-pkt commented Apr 18, 2014

 AFAIU from the comments, the original blog post was about starting values and non-convergence. That should be a new issue.

jseabold added a commit to jseabold/statsmodels that referenced this issue Apr 18, 2014

``` ENH: Allow start_params in GLM. Closes statsmodels#443. ```
``` f510afa ```

Open

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014

``` ENH: Allow start_params in GLM. Closes statsmodels#443. ```
``` 06013c3 ```