Add full support for contrasts to Formulaic #70

Merged
matthewwardrop merged 5 commits into main from add_support_for_contrasts
Apr 27, 2022

matthewwardrop
Owner

@matthewwardrop matthewwardrop commented Apr 5, 2022

This patch set is largely complete, but still needs documentation and a few more unit tests for edge cases. Nevertheless, everything should work fairly robustly as is.

As of this PR, you can use arbitrary contrasts in a formula, e.g.: y ~ C(A, contr.helmert), or y ~ C(A, contr.treatment("base")), or y ~ C(A, {"coding": [...], ...}), etc.
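
To make the idea of a contrast coding concrete, here is a minimal pure-Python sketch of two common coding matrices: treatment (dummy) coding with a chosen base level, and R-style Helmert coding. The function names `treatment_coding` and `helmert_coding` are hypothetical and this is not Formulaic's implementation; Formulaic's own column ordering, naming, and scaling may differ.

```python
def treatment_coding(levels, base=None):
    """Map each level to its row of a treatment (dummy) coding matrix.

    The column for `base` (default: the first level) is omitted, so the
    base level is encoded as a row of zeros.
    """
    base = levels[0] if base is None else base
    columns = [lvl for lvl in levels if lvl != base]
    return {lvl: [1 if lvl == col else 0 for col in columns] for lvl in levels}


def helmert_coding(levels):
    """Map each level to its row of an R-style Helmert coding matrix.

    Column j (1-indexed) holds -1 for the first j levels, j for level
    j + 1, and 0 for all later levels, so every column sums to zero.
    """
    n = len(levels)
    rows = {}
    for i, lvl in enumerate(levels):
        row = []
        for j in range(1, n):
            if i < j:
                row.append(-1)
            elif i == j:
                row.append(j)
            else:
                row.append(0)
        rows[lvl] = row
    return rows


def encode(values, coding):
    """Expand a sequence of categorical values into coded rows."""
    return [coding[v] for v in values]
```

Under these assumptions, `encode(["b", "a", "c"], treatment_coding(["a", "b", "c"], base="a"))` produces one row per observation, which is roughly what a term such as `C(A, contr.treatment("a"))` contributes to a model matrix.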

  • Add documentation, type annotations, etc.

@matthewwardrop matthewwardrop added this to the 0.3.x milestone Apr 5, 2022
@matthewwardrop matthewwardrop self-assigned this Apr 5, 2022
@codecov

codecov bot commented Apr 5, 2022

Codecov Report

Merging #70 (bb7498a) into main (f70434b) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              main       #70    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           44        44            
  Lines         1899      2161   +262     
==========================================
+ Hits          1899      2161   +262     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

Impacted Files Coverage Δ
formulaic/materializers/base.py 100.00% <100.00%> (ø)
formulaic/materializers/pandas.py 100.00% <100.00%> (ø)
formulaic/materializers/types/factor_values.py 100.00% <100.00%> (ø)
formulaic/transforms/__init__.py 100.00% <100.00%> (ø)
formulaic/transforms/contrasts.py 100.00% <100.00%> (ø)
formulaic/utils/cast.py 100.00% <100.00%> (ø)
formulaic/utils/sparse.py 100.00% <100.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f70434b...bb7498a.

@matthewwardrop
Owner Author

@bashtage fyi. Let me know if this support for contrast matrices is insufficient for statsmodels use-cases.

@bashtage
Contributor

bashtage commented Apr 6, 2022

I'll take a look, thanks.

) -> spsparse.csc_matrix:
    """
    Categorically encode (via dummy encoding) a `series` as a sparse matrix.

    Args:
        series: The iterable which should be sparse encoded.
        levels: The levels for which to generate dummies (if not specified, a
            dummy variable is generated for every level in `series`).
        reduced_rank: Whether to omit the first column in order to avoid
Contributor

You removed reduced_rank and now have drop_first.
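
For readers following this review comment, here is a rough sketch of what a dummy encoder with a `drop_first` flag might look like. It is an illustration only: the function `dummy_encode` and its signature are hypothetical, the meaning of `drop_first` is assumed from the truncated `reduced_rank` description above ("omit the first column"), and the sparse matrix is stood in for by a plain dict mapping each level to the row indices where its column is 1.

```python
def dummy_encode(series, levels=None, drop_first=False):
    """Dummy-encode `series` into a sparse, column-oriented mapping.

    Returns {level: [row indices where the dummy column equals 1]}.
    If `levels` is not given, one column is generated per distinct value.
    If `drop_first` is true, the column for the first level is omitted
    (assumed semantics, see the note above).
    """
    values = list(series)
    if levels is None:
        levels = sorted(set(values))
    levels = list(levels)
    if drop_first:
        levels = levels[1:]
    return {
        level: [row for row, value in enumerate(values) if value == level]
        for level in levels
    }
```

Whatever the parameter is called, the docstring's `Args:` section has to name it the same way as the signature, which is the inconsistency the comment points out.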

@bashtage
Contributor

bashtage commented Apr 6, 2022

Ok, so I'll just go with a noob question. Suppose my model is y ~ 1 + x1 + x2 + x3. How do I get the linear restriction matrix for the contrast x2 + x3 = 1? What about x1 = 0; x2 + x3 = 1?

@matthewwardrop
Owner Author

matthewwardrop commented Apr 6, 2022

Hi @bashtage ! Apologies, this PR doesn't add support for linear constraints; that work is separate. I suspect your message in the other issue thread was a typo, then? You wrote "contrasts" but perhaps meant "constraints"?

I'll bump the priority of that work too.

@bashtage
Contributor

bashtage commented Apr 7, 2022

I meant in this way: https://en.wikipedia.org/wiki/Contrast_(statistics) . It is what statsmodels calls these things (I don't like the name, but...). You are right that I'm mostly looking for https://patsy.readthedocs.io/en/latest/API-reference.html#linear-constraints

Sorry for the confusion.

@matthewwardrop
Owner Author

Huh... interesting. So actually, that's canonically the sense in which I'm referring to "contrasts" as well, but here in the sense of using them to encode a categorical variable into a full rank matrix. It didn't occur to me to think of them as the same thing, since I thought that the constraints (being tracked in #38) were acting on columns of the model matrix (rather than levels of a category), and it's not clear to me how they sum to zero. I thought linear constraints would be anything of the form Ax = b, with A a matrix of coefficients for combinations of the features of a model matrix x, and b a vector of constants. Am I missing something?
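
As an illustration of the Ax = b form, the sketch below builds A and b for the two examples asked about earlier in the thread (x2 + x3 = 1, and x1 = 0 together with x2 + x3 = 1), assuming the columns of the model y ~ 1 + x1 + x2 + x3 are ordered as Intercept, x1, x2, x3. The helper names are hypothetical, and this is not how Formulaic or patsy implement constraints.

```python
def build_constraints(columns, constraints):
    """Build (A, b) such that the constraints read A @ x == b.

    `constraints` is a list of (weights, constant) pairs, where `weights`
    maps column names to coefficients; unnamed columns get weight 0.
    """
    A, b = [], []
    for weights, constant in constraints:
        unknown = set(weights) - set(columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        A.append([float(weights.get(col, 0.0)) for col in columns])
        b.append(float(constant))
    return A, b


def satisfies(A, b, x, tol=1e-12):
    """Check whether the vector `x` satisfies every row of A @ x == b."""
    return all(
        abs(sum(a * xi for a, xi in zip(row, x)) - rhs) <= tol
        for row, rhs in zip(A, b)
    )


columns = ["Intercept", "x1", "x2", "x3"]

# x2 + x3 = 1
A1, b1 = build_constraints(columns, [({"x2": 1, "x3": 1}, 1)])

# x1 = 0 and x2 + x3 = 1
A2, b2 = build_constraints(columns, [({"x1": 1}, 0), ({"x2": 1, "x3": 1}, 1)])
```

Here each row of A picks out one linear combination of the model's columns, and the matching entry of b is the constant it is constrained to equal.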

@matthewwardrop
Owner Author

Hmm... thinking about it a bit more, I see the equivalence. If you include 1 in your 'x' matrix, then you could write A as a matrix with rows always summing to zero, much like a regular contrast matrix. Got it.
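
One way to read this rewriting, as a sketch only: appending a constant 1 to x lets the constant vector b be folded into the matrix, so that Ax = b becomes a homogeneous system [A | -b] applied to [x; 1]. The helper names below are hypothetical, and the snippet only demonstrates this folding step, not any particular library's representation.

```python
def augment(A, b):
    """Fold b into A: return the rows of [A | -b]."""
    return [list(row) + [-rhs] for row, rhs in zip(A, b)]


def residuals(A_aug, x):
    """Evaluate [A | -b] @ [x; 1]; all zeros means A @ x == b holds."""
    x_aug = list(x) + [1]
    return [sum(a * xi for a, xi in zip(row, x_aug)) for row in A_aug]


# The constraint x2 + x3 = 1 over the columns (Intercept, x1, x2, x3).
A = [[0, 0, 1, 1]]
b = [1]
A_aug = augment(A, b)
```

With the constant absorbed this way, every constraint is a linear combination that must equal zero, which is the sense in which it starts to resemble a contrast.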

@bashtage
Contributor

bashtage commented Apr 7, 2022

Both features are very useful - the dummy coding for full compat with patsy, and constraints for hypothesis testing. Thanks.

@matthewwardrop matthewwardrop changed the title Draft: Add full support for contrasts to Formulaic Add full support for contrasts to Formulaic Apr 27, 2022
@matthewwardrop matthewwardrop merged commit 88efa81 into main Apr 27, 2022
@matthewwardrop matthewwardrop deleted the add_support_for_contrasts branch April 27, 2022 05:39
2 participants