We need to figure out how we handle character features and levels #275

Closed
mb706 opened this issue Oct 1, 2019 · 4 comments
Comments

@mb706
Collaborator

mb706 commented Oct 1, 2019

We should probably treat character features as something that may have an unlimited number of levels, and leave them out of encoding, factor level fixing, etc. For factors and ordereds we should trust that levels(task$data()[[feature]]) is the same as task$levels()[[feature]]. In that case we can remove the "levels" argument from the data.table functions of PipeOpTaskPreproc.
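
To illustrate the distinction, here is a minimal base R sketch, not the mlr3pipelines implementation. It assumes a toy data frame with made-up column names, and `feature_levels` is a hypothetical helper introduced only for this example. It shows that only factor and ordered columns carry a level set, while a character column has none.

```r
# Toy data: one factor feature and one character feature (hypothetical names).
dat <- data.frame(
  color = factor(c("red", "blue", "red"), levels = c("blue", "green", "red")),
  note  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect level sets for factor/ordered columns only.
# Character columns are skipped, so they get no entry at all.
feature_levels <- function(data) {
  is_fct <- vapply(data, is.factor, logical(1))
  lapply(data[is_fct], levels)
}

lvls <- feature_levels(dat)
names(lvls)       # only the factor column appears
levels(dat$note)  # NULL: a character vector has no levels attribute
```

Under this view, code that needs levels can read them straight from the factor columns, and character columns simply never take part in level handling.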

@mllg
Sponsor Member

mllg commented Oct 22, 2019

We can do this, if this helps. Currently blocked by mlr3db: Many databases do not provide a native type for factors, everything is a character. At least we need an option there to auto-convert character -> factor in the backend.
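
As a rough sketch of what such an auto-conversion could look like, assuming the backend data arrives as a plain data frame: `chr_to_factor` below is a hypothetical helper written for illustration, not an mlr3db option or function.

```r
# Hypothetical helper (not part of mlr3db): convert every character column
# of a data frame to a factor and leave all other columns untouched.
chr_to_factor <- function(data) {
  is_chr <- vapply(data, is.character, logical(1))
  if (any(is_chr)) {
    data[is_chr] <- lapply(data[is_chr], factor)
  }
  data
}

# Example input as a database might return it: strings come back as character.
raw <- data.frame(x = c("a", "b", "a"), y = 1:3, stringsAsFactors = FALSE)
converted <- chr_to_factor(raw)
str(converted)
```

Note that levels derived this way reflect only the values present in the rows that were fetched, which is one reason an explicit option may be preferable to silent conversion.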

@mb706
Collaborator Author

mb706 commented Feb 10, 2020

@mllg, given mlr-org/mlr3#369, do you believe this is no longer blocked?

@mllg
Sponsor Member

mllg commented Feb 10, 2020

I guess so.

@mb706 mb706 self-assigned this Feb 10, 2020
@mb706
Collaborator Author

mb706 commented Jun 21, 2020

We now consistently handle character features as features without levels.

@mb706 mb706 closed this as completed Jun 21, 2020