[sklearn] OneHotEncoder doesn't work correctly #684

Open
faterazer opened this issue Feb 9, 2023 · 6 comments · May be fixed by #696
Labels: bug, enhancement, help wanted

Comments

@faterazer

Hello, I found this project last week. Thank you for all of this work.

I installed Hummingbird-ml==0.47 with pip, and I would like to know which version of sklearn I should use.

I want to use sklearn's one-hot encoder to preprocess my categorical features, but the output dimension from sklearn differs from the output dimension of the converted PyTorch model. With sklearn, 15 features become 69 dimensions, but with the converted PyTorch model, 15 features become 76 dimensions.

After checking, I'm sure the problem is this argument of sklearn's OneHotEncoder:

Changed in version 1.1: 'infrequent_if_exist' was added to automatically handle unknown categories and infrequent categories.

Is there any way to solve this problem? Thanks for any solution!
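One way such a width mismatch can arise is sketched below in plain Python, without sklearn or Hummingbird. It assumes the encoder groups categories seen fewer than `min_frequency` times into a single shared "infrequent" column, while a converter that ignores this option emits one column per category. The helper `encoded_width` and the toy data are hypothetical and are not taken from the reporter's attachment.

```python
from collections import Counter


def encoded_width(columns, min_frequency=None):
    """Count one-hot output columns for a dict of {feature name: list of values}.

    Assumption: categories seen fewer than `min_frequency` times share one
    extra "infrequent" column per feature; with None, every category gets
    its own column.
    """
    width = 0
    for values in columns.values():
        counts = Counter(values)
        if min_frequency is None:
            width += len(counts)
            continue
        frequent = [cat for cat, n in counts.items() if n >= min_frequency]
        has_infrequent = len(frequent) < len(counts)
        width += len(frequent) + (1 if has_infrequent else 0)
    return width


# Hypothetical toy data: "green" and "teal" each appear only once.
columns = {
    "color": ["red"] * 5 + ["blue"] * 4 + ["green", "teal"],
    "size": ["S"] * 4 + ["M"] * 4 + ["L"] * 3,
}

print(encoded_width(columns))                   # 7: one column per category
print(encoded_width(columns, min_frequency=2))  # 6: rare colors share one column
```

If the two sides disagree on whether rare categories are grouped, the narrower output would come from the side that groups them, which matches the direction of the 69 versus 76 difference reported above.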

@ksaur
Collaborator

ksaur commented Feb 9, 2023

Hi @faterazer, thanks for reaching out! We use whatever the most current version of SKL is, so right now 1.2.1.

Was your model trained on the same version of scikit-learn that you're trying to use Hummingbird with? Just trying to make sure it's not a simple fix. (Lots of times, users have issues if the model is trained with an older version of SKL and then they call Hummingbird on a saved model.)

Can you post a little bit of your code so we can take a look? Maybe we need to add the new field.

@faterazer
Author

faterazer commented Feb 13, 2023 via email

@ksaur
Collaborator

ksaur commented Feb 13, 2023

Hello! I think that the attachment (test.zip) got dropped. If it's easier, you could check them into a fork in github and put a link!

@faterazer
Author

> Hello! I think that the attachment (test.zip) got dropped. If it's easier, you could check them into a fork in github and put a link!

test.zip

How about this time? I replied directly through GitHub.

@ksaur
Collaborator

ksaur commented Feb 15, 2023

Thank you for your in-depth example with details! I was able to reproduce everything you said.

Yes, it looks like we need to add this feature to the list of supported options (and we should at least raise an error for the options we don't support). We'll add that to the queue!
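A minimal sketch of the "raise an error for unsupported options" idea follows. It is not Hummingbird's actual code: the function `check_one_hot_params` and the set of supported values are assumptions made for illustration. The input is a plain dict shaped like the result of an estimator's `get_params()`.

```python
# Assumption for illustration only: the converter handles these two values.
SUPPORTED_HANDLE_UNKNOWN = {"error", "ignore"}


def check_one_hot_params(params):
    """Raise NotImplementedError if the encoder uses options the converter ignores."""
    problems = []

    handle_unknown = params.get("handle_unknown", "error")
    if handle_unknown not in SUPPORTED_HANDLE_UNKNOWN:
        problems.append(f"handle_unknown={handle_unknown!r} is not supported")

    # Infrequent-category grouping changes the output width, so fail loudly.
    for name in ("min_frequency", "max_categories"):
        if params.get(name) is not None:
            problems.append(f"{name}={params[name]!r} is not supported")

    if problems:
        raise NotImplementedError("; ".join(problems))


# Passes silently for a configuration assumed to be supported.
check_one_hot_params({"handle_unknown": "ignore"})

# Fails early instead of silently producing a different output width.
try:
    check_one_hot_params({"handle_unknown": "infrequent_if_exist", "min_frequency": 5})
except NotImplementedError as exc:
    print(f"conversion refused: {exc}")
```

Failing at conversion time makes the mismatch visible immediately, rather than surfacing later as a differently shaped output.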

@ksaur ksaur added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Feb 15, 2023
@faterazer
Author

faterazer commented Feb 15, 2023 via email

@ksaur ksaur linked a pull request Apr 4, 2023 that will close this issue
@ksaur ksaur self-assigned this Apr 4, 2023
2 participants