Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: patsy raises with zero constraint #89

Closed
josef-pkt opened this issue Jul 8, 2016 · 3 comments
Closed

BUG: patsy raises with zero constraint #89

josef-pkt opened this issue Jul 8, 2016 · 3 comments

Comments

@josef-pkt
Copy link

Why does patsy raise a ValueError in this case?

It looks like a perfectly valid degenerate case.
We are reusing t_test which uses patsy's constraint for other things that pure constraints.
statsmodels/statsmodels#3096

patsy\constraint.py", line 62:

        if np.any(np.all(self.coefs == 0, axis=1)):
            raise ValueError("can't test a constant constraint")
>>> res.t_test(np.zeros((1, 7)))
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    res.t_test(np.zeros((1, 7)))
  File "E:\josef_new_notebook\git\statsmodels_py34_pr\statsmodels\base\model.py", line 1204, in t_test
    LC = DesignInfo(names).linear_constraint(r_matrix)
  File "C:\Users\josef\Downloads\WinPython-64bit-3.4.4.2\python-3.4.4.amd64\lib\site-packages\patsy\design_info.py", line 536, in linear_constraint
    return linear_constraint(constraint_likes, self.column_names)
  File "C:\Users\josef\Downloads\WinPython-64bit-3.4.4.2\python-3.4.4.amd64\lib\site-packages\patsy\constraint.py", line 404, in linear_constraint
    return LinearConstraint(variable_names, coefs)
  File "C:\Users\josef\Downloads\WinPython-64bit-3.4.4.2\python-3.4.4.amd64\lib\site-packages\patsy\constraint.py", line 62, in __init__
    raise ValueError("can't test a constant constraint")
ValueError: can't test a constant constraint

>>> res.t_test(1e-20 * np.ones((1, 7)))
<class 'statsmodels.stats.contrast.ContrastResults'>
                             Test for Constraints                             
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0          1.534e-18   5.21e-19      2.945      0.004    4.97e-19    2.57e-18
==============================================================================

(In my opinion patsy should not enforce statistics, but can warn instead.)

njsmith added a commit to njsmith/patsy that referenced this issue Jul 9, 2016
@njsmith
Copy link
Member

njsmith commented Jul 9, 2016

See #90

I don't quite understand what you're trying to do here, but I guess I agree that there's no reason to rule out degenerate constraints like this.

There's another similar case that we also error out on right now: constraint matrices with zero rows. Currently the patch in #90 alters patsy to allow for constraints to have a row with all-zero coefficients, but it still doesn't allow for constraints with no rows at all. But semantically, I guess this would be an "always true" constraint, similar to 0*x1 + 0*x2 = 0, so maybe we should allow both? Do you have any opinion on this?

CC: @kshedden

@josef-pkt
Copy link
Author

I never ran into the empty, zero rows constraints. But in many cases, the consistent empty array handling is convenient.

In the case of t_test, after commenting out the zero rows check in patsy:

repr and string summary of test results raises an exception, but all the computation look like they work correctly based on np.linalg and dot behavior with empty arrays.

So, from the statsmodels side the zero row exception could also be removed. (we have to fix the string summary to fully support it)

>>> tt = resnp.t_test(np.ones((0, 7)))
>>> import pprint
>>> pprint.pprint(vars(tt))
{'df_denom': 78.0,
 'dist': <scipy.stats._continuous_distns.t_gen object at 0x0000000007D35CF8>,
 'dist_args': (78.0,),
 'distribution': 't',
 'effect': array([], dtype=float64),
 'pvalue': array([], shape=(0, 0), dtype=float64),
 'sd': array([], shape=(0, 0), dtype=float64),
 'statistic': array([], shape=(0, 0), dtype=float64),
 'tvalue': array([], shape=(0, 0), dtype=float64)}

>>> tt = resnp.t_test((np.ones((0, 7)), []))
>>> pprint.pprint(vars(tt))
{'df_denom': 78.0,
 'dist': <scipy.stats._continuous_distns.t_gen object at 0x0000000007D35CF8>,
 'dist_args': (78.0,),
 'distribution': 't',
 'effect': array([], dtype=float64),
 'pvalue': array([], shape=(0, 0), dtype=float64),
 'sd': array([], shape=(0, 0), dtype=float64),
 'statistic': array([], shape=(0, 0), dtype=float64),
 'tvalue': array([], shape=(0, 0), dtype=float64)}

the above is for OLS with just numpy arrays, but it works the same way with formulas.

@josef-pkt
Copy link
Author

asides:
I was thinking of skipping the patsy detour in t_test with fully specified numpy constraint matrices to avoid any ValueErrors and checks, but decided so far to keep it because it enforces some "required" attributes across models. ("required" = not required for computation but part of minimum common features in internal API)

all zeros constraint, that I ran in before but, AFAIR, wasn't needed in final design for that (constrained estimation):
statsmodels/statsmodels#1704
ValueError if names not defined:
statsmodels/statsmodels#436 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants