Question: Difference between TargetEncoder and LeaveOneOutEncoder #167

Closed
amueller opened this issue Feb 5, 2019 · 13 comments


amueller commented Feb 5, 2019

It's not really clear to me what the difference between TargetEncoder and LeaveOneOutEncoder is, as both encode using the target with leave-one-out. Could you clarify, and also clarify this in the docs?
Does either work for multi-class classification?


janmotl commented Feb 6, 2019

It is best to look at some references:

  1. http://dx.doi.org/10.1145/507533.507538
  2. https://pkghosh.wordpress.com/2018/06/18/leave-one-out-encoding-for-categorical-feature-variables-on-spark/

There are two differences. Assuming binary classification:

  1. TargetEncoder returns a weighted average of p(y|x) and p(y). LeaveOneOut does not calculate the average; it just returns an estimate of p(y|x).
  2. LeaveOneOut performs leave-one-out estimation of p(y|x): it excludes the current row from the estimate. TargetEncoder does not do that; it uses even the current row.
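
To make both differences concrete, here is a minimal hand-rolled sketch in plain pandas (the smoothing weight `k` and the toy data are illustrative assumptions, not the library's exact formula):

```python
import pandas as pd

# Toy binary-target data.
df = pd.DataFrame({"cat": ["a", "a", "a", "b", "b"],
                   "y":   [1, 0, 1, 1, 0]})

prior = df["y"].mean()                                # p(y)
stats = df.groupby("cat")["y"].agg(["sum", "count"])  # per-category sums and counts

# Difference 1 -- target-style encoding: a weighted average of the category
# mean p(y|x) and the prior p(y); k controls the weight of the prior.
k = 2.0
target_enc = (stats["sum"] + k * prior) / (stats["count"] + k)
print(df["cat"].map(target_enc))

# Difference 2 -- leave-one-out on the training rows: the category mean
# computed with each row's own target excluded.
loo = (df["cat"].map(stats["sum"]) - df["y"]) / (df["cat"].map(stats["count"]) - 1)
print(loo)
```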

The references and the documentation in the code should possibly be updated. Feel free to submit a pull request.


amueller commented Feb 6, 2019

Thanks for the quick reply; will do. The documentation explicitly says that TargetEncoder uses leave-one-out. So that's wrong?


janmotl commented Feb 6, 2019

The documentation for TargetEncoder is wrong (likely a copy-paste leftover from refactoring).
Proof: 'enc.transform(X)' and 'enc.transform(X, y)' give the same result in TargetEncoder. On the other hand, if we used LeaveOneOut, we would get different results.
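
A quick way to check this claim (a sketch assuming the category_encoders API discussed in this thread; the toy data is made up):

```python
import pandas as pd
import category_encoders as ce

X = pd.DataFrame({"cat": ["a", "b", "a", "b"]})
y = pd.Series([1, 0, 0, 1])

te = ce.TargetEncoder(cols=["cat"]).fit(X, y)
# TargetEncoder ignores the target passed at transform time:
print(te.transform(X).equals(te.transform(X, y)))    # True

loo = ce.LeaveOneOutEncoder(cols=["cat"]).fit(X, y)
# LeaveOneOutEncoder uses y to exclude each row's own target:
print(loo.transform(X).equals(loo.transform(X, y)))  # False
```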


amueller commented Feb 6, 2019

Wait, transform takes a y? That's not the scikit-learn API... but I guess that's a different issue.


janmotl commented Feb 6, 2019

Yes, that's an incompatibility with scikit-learn. LeaveOneOut needs 'y' in order to transform the training data correctly. Of course, the encoder could remember the training 'y', but then the trained encoder would be large even in deployment...


amueller commented Feb 6, 2019

Large meaning number of categories times number of classes, right? That doesn't seem so bad. What does transform do if you only have a single test example? Usually scikit-learn assumes that the test examples are independent, so running them through one by one should give the same result.


janmotl commented Feb 7, 2019

> Large meaning number of categories times number of classes, right?

Large in the sense that we have to remember the whole 'y', because we have to know the target value for each training sample.

Another workaround could be for 'fit()' to return the transformed training set. But I am not sure that would improve compatibility with scikit-learn.

> What does transform do if you only have a single test example? Usually scikit-learn assumes that the test examples are independent, so running them through one by one should give the same result.

The encoders adhere to this logic as well. Leave-one-out is applied only on the training data, in order to decrease the overfitting of the model when we observe just a few samples per category. Leave-one-out is not applied on the test set. First, we generally do not have the target for the test set. Second, even if we had the target, it would not decrease the amount of overfitting; it would only increase the error.
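
A small sketch of that behavior, assuming the API above: test rows are encoded from the stored per-category statistics, so one-by-one and batch transforms agree.

```python
import pandas as pd
import category_encoders as ce

X_train = pd.DataFrame({"cat": ["a", "a", "b", "b"]})
y_train = pd.Series([1, 0, 1, 1])

enc = ce.LeaveOneOutEncoder(cols=["cat"]).fit(X_train, y_train)

# Test rows are encoded independently from the stored per-category means,
# so transforming them one by one matches transforming them in a batch.
X_test = pd.DataFrame({"cat": ["a", "b"]})
batch = enc.transform(X_test)
one_by_one = pd.concat([enc.transform(X_test.iloc[[i]]) for i in range(len(X_test))])
print(batch.equals(one_by_one))  # True
```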


amueller commented Feb 7, 2019

Wait, so how do you distinguish between training and test set for transform?
You have the y for training and not for the test set?

In sklearn I think we're slowly going in the direction of allowing fit_transform to do something other than fit().transform(), where fit_transform is for transforming the training set.


janmotl commented Feb 7, 2019

> Wait, so how do you distinguish between training and test set for transform?
> You have the y for training and not for the test set?

Correct.

> In sklearn I think we're slowly going in the direction of allowing fit_transform to do something other than fit().transform(), where fit_transform is for transforming the training set.

In our case, fit_transform returns self.fit(X, y, **fit_params).transform(X, y). So it is intended for the training set.
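
In other words (a minimal sketch on made-up data):

```python
import pandas as pd
import category_encoders as ce

X = pd.DataFrame({"cat": ["a", "b", "a", "b"]})
y = pd.Series([1, 0, 1, 0])

# Both paths pass y through to transform, so both apply leave-one-out:
a = ce.LeaveOneOutEncoder(cols=["cat"]).fit_transform(X, y)
b = ce.LeaveOneOutEncoder(cols=["cat"]).fit(X, y).transform(X, y)
print(a.equals(b))  # True -- fit_transform is the training-set path
```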


bdubreu-adeo commented Jun 3, 2019

Hello!

> LeaveOneOut performs leave-one-out estimation of p(y|x): it excludes the current row from the estimate. TargetEncoder does not do that; it uses even the current row.

Well, except it doesn't exclude anything:

```python
import pandas as pd
import category_encoders as ce

liste1 = ['a', 'b', 'a', 'b', 'a', 'b']
liste2 = [1, 2, 1, 4, 1, 6]
# Build the frame from a dict: np.array([liste1, liste2]).transpose()
# would upcast the targets to strings.
df = pd.DataFrame({'category': liste1, 'target': liste2})
df
```

gives this:

```
  category  target
0        a       1
1        b       2
2        a       1
3        b       4
4        a       1
5        b       6
```

```python
# sigma is a constructor parameter, not a fit() argument:
encoder = ce.LeaveOneOutEncoder(cols=['category'], sigma=0.05, return_df=True)
encoder.fit(df['category'], df['target'])
test = encoder.transform(df['category'])
test
```

```
   category
0       1.0
1       4.0
2       1.0
3       4.0
4       1.0
5       4.0
```

Excluding rows from the calculation should give me:

1 5 1 4 1 3

instead of 1 4 1 4 1 4, which is just the mean of the target for groups a and b, not excluding any rows...


janmotl commented Jun 3, 2019

I think the documentation should be clearer about this. LeaveOneOut excludes the current row only in the fit_transform(X, y) method. When transform(X2) is used, no exclusion is performed (as the count of rows in X2 can be different from the count of rows in y... we do not have any other choice).

The idea is that the leave-one-out estimate is used only for training the downstream model, in order to decrease its overfitting. For scoring, we use estimates that are as exact as we can get.
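
For the data above, a minimal sketch of both paths (sigma omitted so the result is deterministic):

```python
import pandas as pd
import category_encoders as ce

df = pd.DataFrame({'category': ['a', 'b', 'a', 'b', 'a', 'b'],
                   'target':   [1, 2, 1, 4, 1, 6]})

enc = ce.LeaveOneOutEncoder(cols=['category'])
# Training path: each row's own target is excluded from its category mean.
print(enc.fit_transform(df[['category']], df['target']))
# expected encodings: 1, 5, 1, 4, 1, 3
# Scoring path: no y, so the plain per-category means are returned.
print(enc.transform(df[['category']]))
# expected encodings: 1, 4, 1, 4, 1, 4
```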

If you come up with a concrete proposal for how to change the documentation, I am happy to apply it.


amueller commented Jun 3, 2019

FYI, the difference between .fit().transform() and fit_transform() is something we have discussed in sklearn but haven't reached consensus on yet. It's surprising behavior for the user because it violates one of the sklearn API contracts, but it also makes a lot of sense here, and it's hard to come up with a better API.

bdubreu-adeo commented

Thank you for your answers. By perusing the other threads I had managed to figure this out. I have no clue about the documentation. Perhaps an example of classic target encoding should be followed by a leave-one-out version, using the same example as the slides from Owen?
Anyway, thank you for your work!
