BUG/ENH: fix and improve constant detection #1797

Merged
josef-pkt merged 5 commits into statsmodels:master from josef-pkt:ref_hasconstant on Jul 20, 2014

@josef-pkt
Member

See #1794 and #1792.

Constant detection now has several layers of checks:

  • (1) hasconst
  • (2) find columns with ptp == 0
    • among those, prefer a column with mean == 1
    • otherwise use a column with mean != 0
  • (3) find an implicit constant

A column of zeros no longer counts as a constant; a constant in (2) needs ptp == 0 and mean != 0.
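The rule for layer (2) can be sketched as follows. This is only an illustration under assumptions, not the code in this PR: the helper name `find_explicit_constant` is hypothetical, and `exog` is assumed to be a 2-D NumPy array.

```python
import numpy as np


def find_explicit_constant(exog):
    """Return the position of an explicit constant column, or None.

    Hypothetical helper: a column qualifies if it has no variation
    (ptp == 0) and a nonzero mean; a column of ones is preferred.
    """
    exog = np.asarray(exog, dtype=float)
    # Columns whose values do not vary across rows.
    candidates = np.flatnonzero(np.ptp(exog, axis=0) == 0)
    means = exog[:, candidates].mean(axis=0)
    # A column of zeros has ptp == 0 but is not a constant.
    nonzero = candidates[means != 0]
    if nonzero.size == 0:
        return None
    # Prefer a column of ones if there is one.
    ones = candidates[means == 1]
    if ones.size > 0:
        return int(ones[0])
    return int(nonzero[0])


x = np.arange(6.0)
zeros, ones = np.zeros(6), np.ones(6)
idx_ones = find_explicit_constant(np.column_stack((zeros, x, 2 * ones, ones)))
idx_twos = find_explicit_constant(np.column_stack((x, 2 * ones)))
idx_none = find_explicit_constant(np.column_stack((zeros, x)))
```

In this example the first call picks the column of ones over the column of twos, the second falls back to the nonzero constant, and the third finds nothing because the only column without variation is all zeros.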

Disable the try ... except and handle exog is None explicitly (that was the only case that went through the exception branch).

If we don't detect a nonzero constant column, then we check for an implicit constant in all cases.

(There are lots of cases because I want to minimize or short-circuit the cost of constant detection.)
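For layer (3), one way to detect an implicit constant (for example a full set of dummy columns that sum to one) is to compare matrix ranks with and without an added column of ones. The sketch below assumes that approach; the helper name `has_implicit_constant` is hypothetical. The rank computation is the expensive step, which is one reason to run it only after the cheaper column checks have found no nonzero constant.

```python
import numpy as np


def has_implicit_constant(exog):
    """Return True if a column of ones lies in the column space of exog.

    Hypothetical helper: appending a column of ones leaves the rank
    unchanged exactly when the constant is already spanned by exog.
    """
    exog = np.asarray(exog, dtype=float)
    augmented = np.column_stack((np.ones(exog.shape[0]), exog))
    rank_aug = np.linalg.matrix_rank(augmented)
    rank_orig = np.linalg.matrix_rank(exog)
    return bool(rank_aug == rank_orig)


# A full set of group dummies sums to one in every row.
groups = np.array([0, 1, 2, 0, 1, 2])
dummies = np.eye(3)[groups]
full_set = has_implicit_constant(dummies)
# Dropping one dummy removes the implicit constant.
reduced_set = has_implicit_constant(dummies[:, 1:])
```

Note that this check is also true when an explicit constant column is present, so it only answers the "implicit" question if the explicit checks ran first.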

Still open:

  • hasconst assumes an implicit constant or no constant; it cannot be used to specify const_idx
  • column of zeros prevents implicit constant check (DONE)
  • make it into a standalone function that returns (k_constant, const_idx) instead of attaching them directly
  • incomplete cleanup: the only cleanup left is to remove the special code in RegressionModel
@coveralls

Coverage Status

Coverage increased (+0.02%) when pulling 9a4e437 on josef-pkt:ref_hasconstant into 7df5291 on statsmodels:master.

@josef-pkt
Member

@jseabold The last commit, 38987ca, removes the try ... except.

As far as I can see, it only covered the case where exog is None. In all other cases an exception indicated an error that should be fixed.

The previous diff view, josef-pkt@statsmodels:master...38987ca, is slightly better, but it mainly shows that a few lines have been replaced by many more.

Any comments?

@coveralls

Coverage Status

Coverage increased (+0.02%) when pulling 38987ca on josef-pkt:ref_hasconstant into 7df5291 on statsmodels:master.

@coveralls

Coverage Status

Coverage increased (+0.03%) when pulling 38987ca on josef-pkt:ref_hasconstant into 7df5291 on statsmodels:master.

@josef-pkt josef-pkt commented on an outdated diff Jul 18, 2014
statsmodels/base/data.py
self.const_idx = const_idx
- except: # should be an index error but who knows, means no const
+ else:
+ # we only have a zero column and no other constant
+ check_implicit = True
+ elif self.k_constant > 1:
+ # we have more than one constant column
+ # look for ones
+ values = [] # keep values if we need != 0
+ for idx in const_idx:
+ value = self.exog[:, idx].mean()
+ if value == 1:
+ self.k_constant = 1
+ self.const_idx = const_idx[idx]
@josef-pkt
josef-pkt Jul 18, 2014 Member

Shouldn't this be self.const_idx = idx?
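A minimal illustration of the difference, assuming const_idx holds the column positions of the candidate columns. The array values and variable names here are made up for the example and are not taken from the PR.

```python
import numpy as np

# Hypothetical design matrix: columns 0 and 3 are constant (2s and 1s).
x = np.arange(5.0)
exog = np.column_stack((2 * np.ones(5), x, x ** 2, np.ones(5)))

# Positions of the columns without variation.
const_idx = np.flatnonzero(np.ptp(exog, axis=0) == 0)

found = None
for idx in const_idx:
    # idx is already a column position in exog, not a position in const_idx.
    if exog[:, idx].mean() == 1:
        found = int(idx)
        break

# Indexing const_idx with that column position is a second lookup.
# Here const_idx has only two entries, so position 3 is out of bounds.
try:
    const_idx[found]
    reindex_failed = False
except IndexError:
    reindex_failed = True
```

In cases where the second lookup happens to stay in bounds it would not raise, but it could silently select a different column, which is arguably worse.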

@josef-pkt
Member

Should be ready to merge once Travis CI is green.

@coveralls

Coverage Status

Coverage increased (+0.81%) when pulling 9a4366b on josef-pkt:ref_hasconstant into 7df5291 on statsmodels:master.

@josef-pkt
Member

Rebased and force-pushed.

@coveralls

Coverage Status

Coverage increased (+0.03%) when pulling d888f7c on josef-pkt:ref_hasconstant into 88a4c51 on statsmodels:master.

@josef-pkt josef-pkt merged commit 40bd7e0 into statsmodels:master Jul 20, 2014

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
continuous-integration/travis-ci The Travis CI build passed
@josef-pkt josef-pkt deleted the josef-pkt:ref_hasconstant branch Apr 10, 2015