Can not resolve column names that are also functions in the environment #65

holgerbrandl · 2018-08-28T15:09:18Z

Consider the following example:

diamonds >> mutate(rank=min_rank(X.carat)) >> filter_by(X.rank <10)

This fails with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 329, in __call__
    return self.function(*args, **kwargs)
  File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/base.py", line 282, in __call__
    return self.function(df, *args, **kwargs)
  File "/Users/brandl/anaconda3/envs/scikit_playground/lib/python3.6/site-packages/dfply/subset.py", line 62, in mask
    if arg.dtype != bool:
AttributeError: 'NotImplementedType' object has no attribute 'dtype'

but seems legit to me.


sharpe5 · 2018-08-28T16:19:58Z

Can you reply with a complete reproducible example? In other words, a snippet of code I can cut and paste into Python to test.

holgerbrandl · 2018-08-29T08:48:45Z

Isn't that what I did? The only thing I've skipped is the from dfply import * preamble, which I took for granted here.

sharpe5 · 2018-08-29T13:15:19Z

I think "rank" is a keyword. Try this:

Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32

from dfply import *

diamonds >> mutate(my_rank=min_rank(X.carat)) >> mask(X.my_rank < 10)

My version of dfply doesn't support filter_by(...), so I've used mask instead which is exactly equivalent.

p.s. I might be wrong, but it's usually better to include a complete reproducible example, including imports and Python version. Sometimes it's the simple things that can throw a spanner in the works.

holgerbrandl · 2018-08-29T13:57:56Z

I think it's rather a member function of pandas.DataFrame. But when symbols are being resolved internally by dfply, I'd expect variables to have precedence.

I'll try to submit the next ticket in a more reproducible way.
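
The name collision can be reproduced with pandas alone, without dfply. The following is a minimal sketch with made-up data (the values are illustrative, not taken from the diamonds dataset). It also shows that calling the comparison method directly on a bound method returns NotImplemented; that dfply evaluates X.rank < 10 this way is an assumption, although it would be consistent with the 'NotImplementedType' in the traceback.

```python
import pandas as pd

# Illustrative data: a column deliberately named like a DataFrame method.
df = pd.DataFrame({"carat": [0.23, 0.21, 0.29], "rank": [2, 1, 3]})

# Attribute access finds the DataFrame.rank method first, not the column.
attribute = df.rank
is_method = callable(attribute)

# Item access always returns the column.
column = df["rank"]
is_series = isinstance(column, pd.Series)

# Calling the comparison dunder directly on a bound method does not raise;
# it returns the NotImplemented singleton.
compare_result = attribute.__lt__(10)

print(is_method, is_series, compare_result is NotImplemented)
```

Columns whose names do not collide with a DataFrame attribute, such as carat here, are still reachable through attribute access.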

sharpe5 · 2018-08-29T14:26:48Z

Glad it's working. All the best!

sharpe5 · 2018-09-01T07:51:56Z

Could you please close this issue? Thanks!

holgerbrandl · 2018-09-02T13:05:39Z

But the problem is not solved at all?! It also affects dozens of other names that happen to be used by pandas. rank was just an example. See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html for a complete listing.

Of course, @kieferk, if you think it's not worth fixing or too hard, feel free to close it.

sharpe5 · 2018-09-03T07:13:50Z

Perhaps have a more meaningful error message? Something like "Cannot use 'rank' as a variable name as this is a reserved word.".

holgerbrandl · 2018-09-03T07:36:21Z

This, or giving column names priority over pandas functions when resolving X.foo. The latter seems more correct to me, but I haven't used dfply much yet.

kieferk · 2018-09-04T23:46:18Z

I'm open to fixing this if possible, but it's tricky. The X symbol is just a generic instance of the Intention class, and as such is at some point evaluated against a "context" object. If the context passed is a pandas DataFrame, which is typically the case, it will apply the function to that DataFrame. The function in this case would be the __getattr__ call for foo (or rank, or whatever it may be).

The ugly way to deal with this would be to do a check on the context object before it's sent to the function and have special logic in place to "override" the pandas behavior. To be honest I'm not really keen on doing that. Pandas would expect you to access your variable by string name in the case that it duplicates a built-in function, and so I'd advise you to do the same. For example:

from dfply import *

diamonds >> head()
   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75

diamonds >> select(X['cut']) >> head()
       cut
0    Ideal
1  Premium
2     Good
3  Premium
4     Good

In your case of course you would have 'rank' instead of 'cut'.
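
The mechanism described above can be sketched with a toy deferred-expression class. This is not dfply's actual code: the Intention and Frame classes below are hypothetical stand-ins written only to illustrate why deferred attribute access finds a method before a column of the same name, while deferred item access reaches the column.

```python
class Intention:
    """Hypothetical stand-in for a deferred expression such as dfply's X."""

    def __init__(self, function=lambda context: context):
        self.function = function

    def evaluate(self, context):
        # Apply the recorded operations to the concrete context object.
        return self.function(context)

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        # X.rank is recorded now and resolved with getattr() later.
        return Intention(lambda context: getattr(self.function(context), name))

    def __getitem__(self, key):
        # X['rank'] is recorded now and resolved with [] later.
        return Intention(lambda context: self.function(context)[key])


class Frame:
    """Toy table standing in for a DataFrame; columns live in a dict."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def rank(self):
        """A method whose name collides with a column name."""
        return "rank() method called"

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so methods always win.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __getitem__(self, key):
        return self._columns[key]


X = Intention()
frame = Frame({"carat": [0.23, 0.21], "rank": [2, 1]})

by_attribute = X.rank.evaluate(frame)  # the bound method Frame.rank
by_item = X["rank"].evaluate(frame)    # the column values
carat = X.carat.evaluate(frame)        # no collision, so the column

print(callable(by_attribute), by_item, carat)
```

In this sketch the deferred expression never sees the column names when X.rank is written, so giving columns priority would need extra logic at evaluation time, which is the override the comment above argues against.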

sharpe5 · 2018-09-05T06:39:23Z

Perhaps just give a meaningful error in this case?

holgerbrandl · 2018-09-05T06:42:23Z

@kieferk thanks for the details. I did not know about the X['rank'] way of accessing the columns, which is a reasonable and readable way of doing it. I initially thought that it would not be possible to use names such as rank for columns at all.

Thanks both of you for your help.

holgerbrandl closed this as completed Sep 5, 2018
