Handle unseen labels in `LabelEncoder` #13423
Several issues reference what this PR addresses: #8136, #3599, #9151, #6231
What does this implement/fix? Explain your changes.
The problem here is that
Any other comments?
One use case is ordinal features. I have also run into cases where, because of memory constraints, I cannot or do not want to expand a feature into one-hot encoded vectors, so keeping it as a single ordinal feature is very useful.
Apologies for the confusion, @jnothman. Following the thread of the issue and the related PR, I ended up in #9151. I see that the class has now been refactored into
Nevertheless, I believe my comment and the issue still stand,
I just saw that you raised a similar point in #11997, where you suggest that the user supply the value to impute. I considered that for this PR, but the user would then have to compute that value outside the pipeline and construct the pipeline with it already calculated. I found it nicer to implement the logic in the pipeline itself. That said, I'm willing to change the proposed behavior, or to support both options: the user supplies the value to impute, and otherwise the most common value is imputed. Please let me know, thanks.
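To make the two options concrete, here is a minimal sketch of how "both" could behave. It is pure Python and does not use or reflect scikit-learn's actual `LabelEncoder` code. The class name `TolerantLabelEncoder`, the `fill_value` parameter, and the `fallback_` attribute are hypothetical names chosen for illustration only.

```python
from collections import Counter


class TolerantLabelEncoder:
    """Hypothetical sketch, NOT scikit-learn's LabelEncoder.

    Unseen labels at transform time are mapped to a fallback label:
    either a user-supplied ``fill_value`` or, if none is given, the
    most frequent label observed during ``fit``.
    """

    def __init__(self, fill_value=None):
        # fill_value is an assumed parameter name, not an existing API.
        self.fill_value = fill_value

    def fit(self, labels):
        labels = list(labels)
        if not labels:
            raise ValueError("cannot fit on an empty sequence of labels")
        # Sorted unique labels define the integer codes.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        if self.fill_value is None:
            counts = Counter(labels)
            # Most common label; ties go to the first label in sorted order.
            self.fallback_ = max(self.classes_, key=lambda c: counts[c])
        else:
            if self.fill_value not in self._index:
                raise ValueError("fill_value must be a label seen during fit")
            self.fallback_ = self.fill_value
        return self

    def transform(self, labels):
        fallback_code = self._index[self.fallback_]
        # Known labels keep their code; unseen labels get the fallback code.
        return [self._index.get(label, fallback_code) for label in labels]
```

For example, after `TolerantLabelEncoder().fit(["low", "high", "low", "mid"])`, calling `transform(["mid", "unknown"])` returns `[2, 1]`, because `"low"` is the most common label. With `fill_value="high"`, the unseen label is encoded as `0` instead. Since the fallback is computed inside `fit`, nothing has to be calculated outside the pipeline unless the user wants to override it.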