[query] implement hl.dummy_code
#13601
Comments
After some reading, I am still not sure what exactly the difference is between dummy coding and one-hot encoding. Suppose there is a categorical variable with [...]. However, from the prototype implementation in this issue, the scikit-learn one-hot encoder documentation, and the dummy variable Wikipedia article, I get the impression that dummy coding and one-hot encoding are synonyms and that there is no real distinction.

Anyway, I would like to work on this issue. I will base my implementation on the prototype, and perhaps we can add a parameter to drop one of the indicator variables, similar to what the scikit-learn one-hot encoder has.
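To make the proposed "drop one indicator" parameter concrete, here is a minimal pure-Python sketch. It is not Hail code and not the prototype from this issue; the function name `encode_categories` and the parameter `drop_first` are hypothetical, chosen only for illustration. It assumes one common convention in which one-hot encoding keeps an indicator column for every level, while dummy coding drops one level as the reference.

```python
def encode_categories(values, drop_first=False):
    """Return (levels, rows) of 0/1 indicator columns for a list of categories.

    Hypothetical helper for illustration only; not part of Hail.
    """
    # Sort the distinct levels so the column order is deterministic.
    levels = sorted(set(values))
    if drop_first:
        # Treat the first level as the reference category and drop its column.
        levels = levels[1:]
    # One row per input value, one 0/1 entry per retained level.
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


colors = ["red", "green", "blue", "green"]

# All levels kept: one column per level.
one_hot_levels, one_hot = encode_categories(colors)

# Reference level dropped: one fewer column, reference rows are all zeros.
dummy_levels, dummy = encode_categories(colors, drop_first=True)
```

With `drop_first=True`, the dropped level ("blue" here, because it sorts first) is represented by a row of all zeros, which is what avoids perfectly collinear indicator columns when an intercept is included in a regression.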
What happened?
Categorical data requires the user to preprocess their data. The subtle distinctions between dummy coding and one-hot encoding are not obvious to all users. We should provide a simple method, clear docs, and clear examples to ease the analysis of categorical variables.
Here's a prototype implementation
References
Version
0.2.122
Relevant log output
No response