Can not resolve column names that are also functions in the environment #65
Comments
Can you reply with a complete reproducible example?
In other words, a snippet of code I can cut'n'paste into Python to test.
…On Tue, Aug 28, 2018 at 4:11 PM Holger Brandl ***@***.***> wrote:
Consider the following example:
diamonds >> mutate(rank=min_rank(X.carat)) >> filter_by(X.rank <10)
This fails with
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 142, in __rrshift__
result = self.function(other_copy)
File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 149, in <lambda>
return pipe(lambda x: self.function(x, *args, **kwargs))
File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 329, in __call__
return self.function(*args, **kwargs)
File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 282, in __call__
return self.function(df, *args, **kwargs)
File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/subset.py", line 62, in mask
if arg.dtype != bool:
AttributeError: 'NotImplementedType' object has no attribute 'dtype'
but seems legit to me.
|
Isn't that what I did? The only thing I've skipped is the from dfply import * preamble, which I took for granted here. |
I think "rank" is a keyword. Try this:
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
from dfply import *
diamonds >> mutate(my_rank=min_rank(X.carat)) >> mask(X.my_rank < 10)
My version of dfply doesn't support filter_by(...), so I've used mask instead, which is exactly equivalent.
p.s. I might be wrong, but it's usually better to include a complete reproducible example, including imports and Python version. Sometimes it's the simple things that can throw a spanner in the works.
…On Wed, Aug 29, 2018 at 9:49 AM Holger Brandl ***@***.***> wrote:
Isn't that what I did? The only thing I've skipped is the from dfply
import * preamble, which I took for granted here.
|
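I think it's rather a member function of pandas.DataFrame. But when symbols are being resolved internally by dfply, I'd expect variables to have precedence. I'll try to submit the next ticket in a more reproducible way. |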
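For anyone following along, here is a minimal pure-Python sketch of why the member function wins over the column. ToyFrame is a hypothetical stand-in, not pandas or dfply code; it only mimics the fallback-to-column lookup in __getattr__, which Python calls only after normal attribute lookup has failed. The last line assumes that dfply ends up calling __lt__ directly on whatever the attribute lookup returned, which would be consistent with the NotImplementedType in the traceback.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame (illustration only)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def rank(self):
        """A real method, playing the role of pandas.DataFrame.rank."""
        return "method result"

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails,
        # so this fallback can never shadow the rank() method above.
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # String access goes straight to the columns.
        return self._columns[name]


frame = ToyFrame({"carat": [0.23, 0.21], "rank": [2, 1]})

carat_attr = frame.carat   # no such method, falls through to the column
rank_attr = frame.rank     # the bound method, not the column
rank_item = frame["rank"]  # the column

# Assumption: the comparison is dispatched as a direct __lt__ call.
compare_result = frame.rank.__lt__(10)
```

Calling __lt__ directly on a bound method does not raise; it returns the NotImplemented singleton, which has no dtype attribute.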
Glad it's working. All the best!
…On Wed, 29 Aug 2018 14:59 Holger Brandl, ***@***.***> wrote:
I think it's rather a member function of pandas.DataFrame. But when
symbols are being resolved internally by dfply, I'd expect variables to
have precedence.
I'll try to submit the next ticket in a more reproducible way.
|
Could you please close this issue? Thanks! |
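But the problem is not solved at all?! It also affects dozens of other names which happen to be used by pandas. rank was just an example. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html for a complete listing. For sure @kieferk if you think it's not worth fixing or too hard, feel free to do so. |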
Perhaps have a more meaningful error message?
Something like "Cannot use 'rank' as a variable name as this is a reserved word."
…On Sun, 2 Sep 2018 14:06 Holger Brandl, ***@***.***> wrote:
But the problem is not solved at all?! It also affects dozens of other
names which happen to be used by pandas. rank was just an example. See
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
for a complete listing.
For sure @kieferk <https://github.com/kieferk> if you think it's not
worth fixing or too hard, feel free to do so.
|
This, or giving column names priority over pandas functions when resolving |
I'm open to fixing this if possible, but it's tricky. The
The ugly way to deal with this would be to do a check on the context object before it's sent to the function and have special logic in place to "override" the pandas behavior. To be honest I'm not really keen on doing that.
Pandas would expect you to access your variable by string name in the case that it duplicates a built-in function, and so I'd advise you to do the same. For example:
from dfply import *
diamonds >> head()
   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75
diamonds >> select(X['cut']) >> head()
       cut
0    Ideal
1  Premium
2     Good
3  Premium
4     Good
In your case of course you would have |
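The same advice can be tried in plain pandas, without dfply. This is only a sketch: the frame below is invented for illustration (it is not the diamonds dataset) and simply has a column that happens to be called rank.

```python
import pandas as pd

# Made-up data with a column whose name collides with DataFrame.rank.
df = pd.DataFrame(
    {"carat": [0.23, 0.21, 0.29, 0.31], "rank": [2, 1, 3, 4]}
)

# Attribute access resolves to the DataFrame.rank method, not the column.
attribute_is_method = callable(df.rank)

# Access by string name always returns the column, so filtering works.
small = df[df["rank"] < 3]
```

Access by string name is unambiguous whatever the column is called, which is why it is the safer habit for names that pandas already uses.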
Perhaps just give a meaningful error in this case? |
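One way such a check could look, as a rough sketch only: nothing below is part of dfply, and shadowed_names, check_column_names and ToyFrame are hypothetical names made up for this illustration. The idea is to compare the column names against the attributes of the frame type and fail early with a readable message. With pandas installed, passing pandas.DataFrame as frame_type should flag rank in the same way.

```python
def shadowed_names(column_names, frame_type):
    """Return the column names that an attribute of frame_type would hide."""
    return [
        name
        for name in column_names
        if isinstance(name, str) and hasattr(frame_type, name)
    ]


def check_column_names(column_names, frame_type):
    """Raise a readable error if attribute-style access would be ambiguous."""
    clashes = shadowed_names(column_names, frame_type)
    if clashes:
        raise ValueError(
            f"Column name(s) {clashes} are hidden by attributes of "
            f"{frame_type.__name__}; access them by string instead, "
            f"e.g. X['{clashes[0]}']."
        )


class ToyFrame:
    """Hypothetical stand-in for a frame type that has a rank() method."""

    def rank(self):
        return None


clashes = shadowed_names(["carat", "rank"], ToyFrame)
```

A check like this only reports the clash; it does not change how names are resolved.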
@kieferk thanks for the details. I did not know about the Thanks both of you for your help. |