Add full support for contrasts to Formulaic #70

matthewwardrop · 2022-04-05T03:20:38Z

This patch set is largely complete, but lacks documentation and a few more unit tests for various edge cases. Nevertheless, everything should work pretty robustly as is.

As of this PR, you can use arbitrary contrasts in a formula, e.g. `y ~ C(A, contr.helmert)`, `y ~ C(A, contr.treatment("base"))`, or `y ~ C(A, {"coding": [...], ...})`.
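
To make concrete what such a coding produces, here is a minimal pure-Python sketch of the treatment and Helmert contrast matrices, following the conventions of R's `contr.treatment` and `contr.helmert`. The helper names `treatment_contrast` and `helmert_contrast` are hypothetical and are not Formulaic's API; they only illustrate the kind of matrix (rows are levels, columns are encoded features) that a `C(A, ...)` term would use to encode the levels of `A`.

```python
def treatment_contrast(levels, base=None):
    """Dummy coding with a reference level: one column per non-base level."""
    base = levels[0] if base is None else base
    columns = [lvl for lvl in levels if lvl != base]
    return [[1 if lvl == col else 0 for col in columns] for lvl in levels]


def helmert_contrast(levels):
    """Helmert coding: column j compares level j+1 with the levels before it."""
    n = len(levels)
    return [
        [-1 if i <= j else (j + 1 if i == j + 1 else 0) for j in range(n - 1)]
        for i in range(n)
    ]


levels = ["a", "b", "c"]
print(treatment_contrast(levels, base="a"))  # [[0, 0], [1, 0], [0, 1]]
print(helmert_contrast(levels))              # [[-1, -1], [1, -1], [0, 2]]
```

Each Helmert column sums to zero across levels, which is the sense in which these are contrasts; the treatment coding instead drops the base level's column to keep the matrix full rank alongside an intercept.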

Add documentation, type annotations, etc.

codecov · 2022-04-05T05:41:43Z

Codecov Report

Merging #70 (bb7498a) into main (f70434b) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              main       #70    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           44        44            
  Lines         1899      2161   +262     
==========================================
+ Hits          1899      2161   +262

| Flag | Coverage Δ |
|---|---|
| unittests | `100.00% <100.00%> (ø)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| formulaic/materializers/base.py | `100.00% <100.00%> (ø)` |
| formulaic/materializers/pandas.py | `100.00% <100.00%> (ø)` |
| formulaic/materializers/types/factor_values.py | `100.00% <100.00%> (ø)` |
| formulaic/transforms/__init__.py | `100.00% <100.00%> (ø)` |
| formulaic/transforms/contrasts.py | `100.00% <100.00%> (ø)` |
| formulaic/utils/cast.py | `100.00% <100.00%> (ø)` |
| formulaic/utils/sparse.py | `100.00% <100.00%> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

matthewwardrop · 2022-04-05T05:43:04Z

@bashtage fyi. Let me know if this support for contrast matrices is insufficient for statsmodels use-cases.

bashtage · 2022-04-06T16:47:21Z

I'll take a look, thanks.

bashtage · 2022-04-06T16:49:24Z

formulaic/utils/sparse.py

 ) -> spsparse.csc_matrix:
    """
    Categorically encode (via dummy encoding) a `series` as a sparse matrix.

    Args:
        series: The iterable which should be sparse encoded.
+        levels: The levels for which to generate dummies (if not specified, a
+            dummy variable is generated for every level in `series`).
        reduced_rank: Whether to omit the first column in order to avoid


You removed `reduced_rank` and now have `drop_first`, so this docstring entry is out of date.
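
For reference, the behaviour the docstring describes (optional `levels`, and dropping the first column to avoid collinearity) can be sketched in pure Python as below. This is only an illustration under stated assumptions: the function name `dummy_encode_csc` is hypothetical, it returns plain CSC-style lists rather than a `spsparse.csc_matrix`, and the default level order (first appearance in `series`) is an assumption rather than what `formulaic/utils/sparse.py` necessarily does.

```python
def dummy_encode_csc(series, levels=None, drop_first=False):
    """Dummy-encode `series` into CSC-style lists (data, indices, indptr)."""
    values = list(series)
    if levels is None:
        # Assumed default: one dummy per distinct value, in order of first appearance.
        levels = list(dict.fromkeys(values))
    # Dropping the first level gives a reduced-rank encoding.
    kept = list(levels)[1:] if drop_first else list(levels)

    data, indices, indptr = [], [], [0]
    for level in kept:
        rows = [i for i, value in enumerate(values) if value == level]
        indices.extend(rows)          # row positions of the ones in this column
        data.extend([1] * len(rows))  # every stored entry is a 1
        indptr.append(len(indices))   # where the next column starts
    return data, indices, indptr, kept


data, indices, indptr, kept = dummy_encode_csc(
    ["a", "b", "a", "c"], levels=["a", "b", "c"], drop_first=True
)
print(kept)     # ['b', 'c']
print(indices)  # [1, 3]
print(indptr)   # [0, 1, 2]
```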

bashtage · 2022-04-06T17:53:27Z

OK, so I'll just go with a noob question. Suppose my model is `y ~ 1 + x1 + x2 + x3`. How do I get the linear restriction matrix for the contrast `x2 + x3 = 1`? What about `x1 = 0; x2 + x3 = 1`?

matthewwardrop · 2022-04-06T20:50:11Z

Hi @bashtage ! Apologies, this PR doesn't add support for linear constraints; that work is separate. I suspect your message in the other issue thread was a typo, then? You wrote "contrasts" but perhaps meant "constraints"?

I'll bump the priority of that work too.

bashtage · 2022-04-07T07:13:26Z

I meant in this way: https://en.wikipedia.org/wiki/Contrast_(statistics) . It is what statsmodels calls these things (I don't like the name, but...). You are right that I'm mostly looking for https://patsy.readthedocs.io/en/latest/API-reference.html#linear-constraints

Sorry for the confusion.

matthewwardrop · 2022-04-07T09:50:51Z

Huh... interesting. So actually, that's canonically the sense in which I'm referring to "contrasts" as well, but here in the sense of using them to encode a categorical variable into a full rank matrix. It didn't occur to me to think of them as the same thing, since I thought that the constraints (being tracked in #38) were acting on columns of the model matrix (rather than levels of a category), and it's not clear to me how they sum to zero. I thought linear constraints would be anything of the form Ax = b, with A a matrix of coefficients for combinations of the features of a model matrix x, and b a vector of constants. Am I missing something?
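
To make the Ax = b form concrete for the model asked about earlier in this thread (`y ~ 1 + x1 + x2 + x3`), here is a small pure-Python sketch. It assumes the model matrix columns are ordered `Intercept, x1, x2, x3`; the helpers `constraint_row` and `satisfies` are hypothetical and not part of Formulaic or patsy.

```python
columns = ["Intercept", "x1", "x2", "x3"]  # assumed column order


def constraint_row(coefficients, columns):
    """Expand {column: coefficient} into a dense row aligned with `columns`."""
    unknown = set(coefficients) - set(columns)
    if unknown:
        raise KeyError(f"Unknown columns: {sorted(unknown)}")
    return [coefficients.get(name, 0) for name in columns]


def satisfies(A, b, x):
    """Check A x = b for plain lists of numbers."""
    return all(
        sum(a * v for a, v in zip(row, x)) == rhs for row, rhs in zip(A, b)
    )


# Single restriction: x2 + x3 = 1
A = [constraint_row({"x2": 1, "x3": 1}, columns)]
b = [1]
print(A)  # [[0, 0, 1, 1]]

# Joint restriction: x1 = 0 and x2 + x3 = 1
A2 = [
    constraint_row({"x1": 1}, columns),
    constraint_row({"x2": 1, "x3": 1}, columns),
]
b2 = [0, 1]
print(A2)  # [[0, 1, 0, 0], [0, 0, 1, 1]]

print(satisfies(A2, b2, [5, 0, 0.25, 0.75]))  # True
```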

matthewwardrop · 2022-04-07T10:05:51Z

Hmm... thinking about it a bit more, I see the equivalence. If you include 1 in your 'x' matrix, then you could write A as a matrix with rows always summing to zero, much like a regular contrast matrix. Got it.

bashtage · 2022-04-07T22:07:24Z

Both features are very useful - the dummy coding for full compat with patsy, and constraints for hypothesis testing. Thanks.

matthewwardrop added the WIP label Apr 5, 2022

matthewwardrop added this to the 0.3.x milestone Apr 5, 2022

matthewwardrop self-assigned this Apr 5, 2022

matthewwardrop force-pushed the add_support_for_contrasts branch from dd6266f to 2ade183 Compare April 5, 2022 05:38

matthewwardrop mentioned this pull request Apr 5, 2022

Add the missing transforms to bring parity with patsy / R. #18

Open

7 tasks

bashtage reviewed Apr 6, 2022

matthewwardrop added 4 commits April 27, 2022 13:37

Rename encode_categorical transform to contrasts.

7ac3099

Add support for delegated/deferred encoding.

9b12dab

Propagate FactorValues metadata through cast as_columns operations.

50f904f

Don't assume that dictionary values are encoded.

22f1a16

matthewwardrop force-pushed the add_support_for_contrasts branch from 2ade183 to 6c216c7 Compare April 27, 2022 05:32

Refactor and flesh out support for categorical encodings.

bb7498a

matthewwardrop force-pushed the add_support_for_contrasts branch from 6c216c7 to bb7498a Compare April 27, 2022 05:36

matthewwardrop removed the WIP label Apr 27, 2022

matthewwardrop changed the title ~~Draft: Add full support for contrasts to Formulaic~~ Add full support for contrasts to Formulaic Apr 27, 2022

matthewwardrop merged commit 88efa81 into main Apr 27, 2022

matthewwardrop deleted the add_support_for_contrasts branch April 27, 2022 05:39
