Add full support for contrasts to Formulaic #70
Conversation
Force-pushed from dd6266f to 2ade183
Codecov Report

@@           Coverage Diff           @@
##             main      #70    +/-  ##
=======================================
  Coverage  100.00%  100.00%
=======================================
  Files          44       44
  Lines        1899     2161   +262
=======================================
+ Hits         1899     2161   +262
@bashtage fyi. Let me know if this support for contrast matrices is insufficient for
I'll take a look, thanks.
formulaic/utils/sparse.py (Outdated)

    ) -> spsparse.csc_matrix:
        """
        Categorically encode (via dummy encoding) a `series` as a sparse matrix.

        Args:
            series: The iterable which should be sparse encoded.
            levels: The levels for which to generate dummies (if not specified, a
                dummy variable is generated for every level in `series`).
            reduced_rank: Whether to omit the first column in order to avoid
You removed `reduced_rank` and now have `drop_first`.
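To make the `drop_first`/`reduced_rank` behaviour concrete, here is a minimal sketch of reduced-rank dummy encoding mirroring the docstring above. The function name and signature are illustrative only, not Formulaic's actual implementation.

```python
import numpy as np
import scipy.sparse as spsparse


def encode_sparse_dummies(series, levels=None, drop_first=False):
    """Dummy-encode `series` as a sparse CSC matrix (illustrative sketch)."""
    levels = list(levels) if levels is not None else sorted(set(series))
    if drop_first:
        levels = levels[1:]  # omit the baseline level to avoid collinearity with an intercept
    index = {level: j for j, level in enumerate(levels)}
    coords = [(i, index[v]) for i, v in enumerate(series) if v in index]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    data = np.ones(len(coords))
    return spsparse.csc_matrix((data, (rows, cols)), shape=(len(series), len(levels)))


m = encode_sparse_dummies(["a", "b", "a", "c"], drop_first=True)
# Columns correspond to levels "b" and "c"; level "a" is the dropped baseline.
```

With `drop_first=True` the resulting matrix has one fewer column than there are levels, which is what keeps the model matrix full rank when an intercept is present.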
Ok, so I'll just go with a noob question. Suppose my model is
Hi @bashtage ! Apologies, this PR doesn't add support for linear constraints; that work is separate. I suspect your message in the other issue thread was a typo, then? You wrote "contrasts" but perhaps meant "constraints"? I'll bump the priority of that work too.
I meant in this way: https://en.wikipedia.org/wiki/Contrast_(statistics) . It is what statsmodels calls these things (I don't like the name, but...). You are right that I'm mostly looking for https://patsy.readthedocs.io/en/latest/API-reference.html#linear-constraints Sorry for the confusion.
Huh... interesting. So actually, that's canonically the sense in which I'm referring to "contrasts" as well, but here in the sense of using them to encode a categorical variable into a full rank matrix. It didn't occur to me to think of them as the same thing, since I thought that the constraints (being tracked in #38) were acting on columns of the model matrix (rather than levels of a category), and it's not clear to me how they sum to zero. I thought linear constraints would be anything of the form
Hmm... thinking about it a bit more, I see the equivalence. If you include 1 in your 'x' matrix, then you could write A as a matrix with rows always summing to zero, much like a regular contrast matrix. Got it.
Both features are very useful - the dummy coding for full compatibility with patsy, and constraints for hypothesis testing. Thanks.
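The zero-sum property discussed above can be checked directly: each column of a Helmert contrast matrix sums to zero across levels, which is the same zero-sum structure a linear-constraint row (acting on level means) has. A small numpy sketch, with the construction written out by hand rather than taken from any library:

```python
import numpy as np


def helmert_contrasts(k):
    """Helmert contrast matrix for `k` levels: k rows, k - 1 columns."""
    c = np.zeros((k, k - 1))
    for j in range(k - 1):
        c[: j + 1, j] = -1   # the first j + 1 levels each get -1
        c[j + 1, j] = j + 1  # level j + 1 balances them, so the column sums to 0
    return c


C = helmert_contrasts(4)
# Every column sums to zero, i.e. each contrast is orthogonal to the intercept.
print(C.sum(axis=0))  # → [0. 0. 0.]
```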
Force-pushed from 2ade183 to 6c216c7
Force-pushed from 6c216c7 to bb7498a
This patch set is largely complete, but lacks documentation and a few more unit tests for various edge cases. Nevertheless, everything should work pretty robustly as is.
As of this PR, you can use arbitrary contrasts in a formula, e.g.:

    y ~ C(A, contr.helmert)
    y ~ C(A, contr.treatment("base"))
    y ~ C(A, {"coding": [...], ...})

etc.
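For intuition, here is what `contr.treatment("base")`-style coding amounts to: one dummy column per level except the chosen baseline. pandas is used purely for illustration; the actual encoding happens inside Formulaic, and the baseline level here is an arbitrary example.

```python
import pandas as pd

A = pd.Series(["a", "b", "a", "c"], dtype="category")
base = "b"  # illustrative baseline level, as in contr.treatment("b")
levels = [level for level in A.cat.categories if level != base]
dummies = pd.get_dummies(A)[levels].astype(float)
# Rows where A == "b" are all zeros: the baseline is absorbed by the intercept.
```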