Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intercept in Contrast Coding Schemes #370

Open
PaulWestenthanner opened this issue Sep 26, 2022 · 7 comments
Open

Intercept in Contrast Coding Schemes #370

PaulWestenthanner opened this issue Sep 26, 2022 · 7 comments

Comments

@PaulWestenthanner
Copy link
Collaborator

PaulWestenthanner commented Sep 26, 2022

Expected Behavior

The constant (all values 1) intercept column should not be added when applying contrast coding schemes (i.e. backward difference, sum, polynomial and helmert coding)

I don't think this intercept column is needed. If you fit a supervised learning model it is probably gonna help to remove the intercept column. I think it is there because when fitting linear models with statsmodels you have to add the intercept.
However I don't like that the output of an encoder would then depend on whether the intercept column is already there or not, e.g. if I first apply encoder A on column A and then encoder B on column B the intercept column of B overwrite A's intercept column hence not adding a new column. Also if I have (for some reason) a column called intercept that is not constant it would get overwritten.

Any opinion? Am I missing something? Is the intercept necessary?

Actual Behavior

A constant column with all values 1 is added

Steps to Reproduce the Problem

Run transform on any fitted contrast coding encoder, e.g.

        train = ['A', 'B', 'C']
        encoder = encoders.BackwardDifferenceEncoder(handle_unknown='value', handle_missing='value')
        encoder.fit_transform(train)
@glevv
Copy link
Contributor

glevv commented Nov 6, 2022

Can intercept be added as a class parameter?
If so, then this is the way to go, imo. Then these classes could be tested with different values of intercept to catch errors and bugs.

@PaulWestenthanner
Copy link
Collaborator Author

Yes we could I think, that should be rather straight forward as well. Would you set with_intercept=True as a default for backwards compatibility or not (which might be more correct)?

@glevv
Copy link
Contributor

glevv commented Nov 6, 2022

Yes we could I think, that should be rather straight forward as well. Would you set with_intercept=True as a default for backwards compatibility or not (which might be more correct)?

Yep, I would set it to True to keep the default behavior intact

@glevv
Copy link
Contributor

glevv commented Jan 27, 2023

Is this one still relevant?

@PaulWestenthanner
Copy link
Collaborator Author

I think it is, I'm slightly in favor of setting with_intercept to False as I don't think many people will need the intercept.
However I'd like to keep this issue open for a little while and hope for more voices from the community, maybe from actual users of the contrast coding schemes. I feel a little uneasy to take this decision by myself without further discussion

@jacekkrawiec
Copy link

jacekkrawiec commented Jun 9, 2023

user's perspective: imo encoders should do what their name says they do - encode existing columns. If there's a need to add an intercept, it has to be done consciously by the user. Especially for someone using statsmodels, where the default way is to use
sm.add_constant. If you're in favor of keeping this feature as a parameter I'd suggest to make it False by default, otherwise I'd remove it completely.

@PaulWestenthanner
Copy link
Collaborator Author

Ok this seems to be a consensus, let's remove it - at least by default

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants