Intercept is not added after being removed #148

tomicapretto · 2023-07-15T02:30:37Z

In the following example I remove the intercept and then I add it. It turns out "it's never added". I'm not sure if this is a feature you would like to support, but if I mentally parse from left to right I would expect the final formula to contain the intercept.

import pandas as pd
import formulaic
from formulaic import model_matrix
print("version", formulaic.__version__)
df = pandas.DataFrame({"g": ["a", "a", "a", "b", "b"]})
print(model_matrix("0 + g + 1", df))

version 0.6.4
   g[T.a]  g[T.b]
0       1       0
1       1       0
2       1       0
3       0       1
4       0       1

The text was updated successfully, but these errors were encountered:

matthewwardrop · 2023-07-15T16:28:40Z

Hi @tomicapretto ,

Thanks for taking the time to report this. However, this is actually expected behaviour due to the ordered nature of formulae and the automatic full-rank algorithm (where terms to the left take precedence over terms to the right in terms of materialization).

That is:

>>> print(model_matrix("0 + g + 1", df))
   g[T.a]  g[T.b]
0       1       0
1       1       0
2       1       0
3       0       1
4       0       1
>>> print(model_matrix("0 + 1 + g", df))
   Intercept  g[T.b]
0        1.0       0
1        1.0       0
2        1.0       0
3        1.0       1
4        1.0       1

These model matrices are equivalent, and the columns span the vector space (i.e. the model matrix is full rank). In both cases the intercept is spanned, but in the former case it is spanned by the categorical factors.

I'll close this one out for now, but let me know if have further questions!

matthewwardrop · 2023-07-15T16:32:37Z

Hmm... but this does, upon further reflection, not match the behaviour expected by users of R or patsy. It might be worth special casing the full rank algorithm to deal with the intercept. Point taken. I'll fix this!

tomicapretto · 2023-07-16T14:14:10Z

@matthewwardrop your explanation makes perfect sense (i.e. terms to the left take precedence over terms to the right). But as you said, it may be surprising to users coming from Patsy or R. I don't have a strong preference. But if you decide to continue with the current approach, it would be good to leave a note in the documentation explaining why it works the way it works in this specific case. I could open a PR 😄

matthewwardrop · 2023-09-25T23:46:28Z

Thanks again for reporting this @tomicapretto . This should be resolved as of 0.6.5+.

matthewwardrop closed this as completed Jul 15, 2023

matthewwardrop reopened this Jul 15, 2023

matthewwardrop added the enhancement New feature or request label Aug 12, 2023

matthewwardrop self-assigned this Aug 12, 2023

matthewwardrop mentioned this issue Sep 25, 2023

Do not count literal factors toward degree (fixing sorting of e.g. global intercept). #156

Merged

matthewwardrop closed this as completed in #156 Sep 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Intercept is not added after being removed #148

Intercept is not added after being removed #148

tomicapretto commented Jul 15, 2023

matthewwardrop commented Jul 15, 2023

matthewwardrop commented Jul 15, 2023

tomicapretto commented Jul 16, 2023

matthewwardrop commented Sep 25, 2023

Intercept is not added after being removed #148

Intercept is not added after being removed #148

Comments

tomicapretto commented Jul 15, 2023

matthewwardrop commented Jul 15, 2023

matthewwardrop commented Jul 15, 2023

tomicapretto commented Jul 16, 2023

matthewwardrop commented Sep 25, 2023