-
-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
whitespace is a terrible separator for feature names in PolynomialFeatures.get_feature_names #10742
Comments
@amueller I will try to reproduce the issue, if confirmed I will push a PR to fix it by adding separator as an option. |
Here's an example that reproduces the issue
When including a feature which has a whitespace, in this case the feature "A B" output is ambiguous. |
Changing the code in here from
to this
removes the ambiguity. Here's the above example run with the suggested change:
|
that's not so much more readable!
|
@jnothman That might be true, but at least I thought about these: - But I think they are ugly. We can also throw a warning and replace any whitespace within a feature name with underscore:
@amueller what do you think? |
In terms of backwards compatibility and avoiding cluttering the interface,
I like the underscores. On the other hand, users need only modify the input
in that case.
We're really we're unlikely to get this perfect with strings in any case.
…On 29 Mar 2018 5:04 pm, "Mohamed Ali Jamaoui" ***@***.***> wrote:
@jnothman <https://github.com/jnothman> That might be true, but at least 'A
B*C' adds a bit of readability compared to 'A B C'.
Can you think of something else that would be more readable?
I thought about these:
-'"A B"*"C"'
-'(A B)*(C)'
But I think they are ugly. We can also throw a warning and replace any
whitespace within a feature name with underscore:
- 'A B C' => 'A_B C'or
- 'A B C' => 'A_B*C'
@amueller <https://github.com/amueller> what do you think?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10742 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz65lPrjh51ADBmEQJwkuMUn_NJvZ6ks5tjHmKgaJpZM4SZMcL>
.
|
@jnothman, I agree that there is no perfect solution to this, but I think we need at least to throw a warning in case the input_features contain a whitespace. |
@amueller what do you think? |
Add a custom separator='' parameter to the function, then the dev can make it what they want. I wouldn't bother with also processing the A^2 stuff, leave that as a formatting example afterwards. default is space for backwards compatibility |
If your feature names have white spaces in them, it's hard to see which features are interactions right now.
Maybe we should make the separator an option of
get_feature_names
or use something like*
?The text was updated successfully, but these errors were encountered: