String comparison in query() #6155

Closed
michaelbilow opened this issue Jan 29, 2014 · 10 comments · Fixed by #6158

Comments

@michaelbilow
commented Jan 29, 2014

Hi, it seems that string comparisons aren't supported in query() yet, so maybe this doesn't count as a bug. Either way, hopefully this behavior will be fixed in a future version of pandas.

import pandas as pd
import numexpr as ne

a = list('abcdef')
b = range(6)
df = pd.DataFrame({'X':pd.Series(a),'Y': pd.Series(b)})

df_Y = df.query('Y < 3')             ## Works fine.
ne_works = ne.evaluate('"a" < "d"')  ## ne_works == np.array([True])
df_X = df.query('X < "d"')           ## RuntimeError: max recursion depth exceeded

@jreback

Contributor

commented Jan 29, 2014

cc @cpcloud

IIRC this is not implemented ATM

@cpcloud

Member

commented Jan 29, 2014

Yep, only == and != are tested for strings.

@cpcloud

Member

commented Jan 29, 2014

Let me take a look at why this happens and I'll clean up the error message or I'll implement it.

@cpcloud

Member

commented Jan 29, 2014

Interesting ... it only fails on the numexpr engine ... yay for writing tests first!
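For readers who want to check the engine-specific behavior themselves, here is a minimal sketch that forces the pure-Python engine through the `engine` keyword of `DataFrame.query` and compares the result against plain boolean indexing. The frame mirrors the one in the report; the variable names `by_python` and `expected` are just illustrative.

```python
import pandas as pd

# Same shape of data as in the original report
df = pd.DataFrame({"X": list("abcdef"), "Y": range(6)})

# engine="python" bypasses numexpr and evaluates the expression
# with ordinary Python/NumPy operators
by_python = df.query('X < "d"', engine="python")

# Plain boolean indexing serves as the reference answer
expected = df[df["X"] < "d"]

print(by_python)
print(by_python.equals(expected))
```

Passing `engine="python"` is also a reasonable workaround on a pandas version that predates the fix, assuming the failure really is confined to the numexpr engine as described here.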

@cpcloud

Member

commented Jan 29, 2014

OK, this was simply a case of making those comparisons evaluate in Python space when object dtypes are being compared.

I think we need a doc note about the fact that expressions are evaluated on a case-by-case basis, depending on the type of their result ... I could've sworn I wrote something about this, but maybe just in our conversations ...

@jreback

Contributor

commented Jan 29, 2014

hahah...

@cpcloud

Member

commented Jan 29, 2014

@chuyelchulo Couple of details that might interest you:

  • Expressions that would result in an object dtype (including simple variable evaluation) have to be evaluated in Python space. The main reason is back compat with older versions of numpy, which truncate strings of more than 60 chars when you call astype(str) on an array. Since we can't pass object arrays to numexpr, string comparisons are evaluated in Python space.
  • The upshot is that this only applies to strings. So, for example, if you have a string comparison and-ed together with a boolean expression from a numeric comparison, the numeric comparison will still be evaluated by numexpr. In general, query/eval will "pick out" the subexpressions that are eval-able by numexpr and those that must be evaluated in Python space, transparently to the user.
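
The split described in the two points above can be imitated by hand. This is only a rough sketch of the idea, not how pandas implements it internally: the string half is compared with ordinary operators, the numeric half is handed to numexpr when it is installed (falling back to NumPy otherwise), and the two masks are and-ed together. The names `string_mask`, `numeric_mask`, `manual` and `via_query` are made up for the illustration.

```python
import pandas as pd

try:
    import numexpr as ne  # optional; the sketch falls back to NumPy without it
except ImportError:
    ne = None

df = pd.DataFrame({"X": list("abcdef"), "Y": range(6)})

# String half: compared in "Python space" with ordinary operators
string_mask = (df["X"] < "d").to_numpy(dtype=bool)

# Numeric half: the kind of subexpression numexpr can evaluate
y = df["Y"].to_numpy()
if ne is not None:
    numeric_mask = ne.evaluate("y > 0", local_dict={"y": y})
else:
    numeric_mask = y > 0

# Combine the two halves, as the and-ed query expression would
manual = df[string_mask & numeric_mask]

# The same filter written as a single query; pandas decides how to
# evaluate each part
via_query = df.query('X < "d" and Y > 0')

print(manual)
print(via_query.equals(manual))
```

From the caller's side the single `query` string is all that is needed; the manual version only makes the two evaluation paths visible.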
@cpcloud

Member

commented Jan 29, 2014

I'm adapting that for the docs, PR coming shortly

@cpcloud

Member

commented Jan 29, 2014

@chuyelchulo Thanks for reporting this.

@michaelbilow

Author

commented Jan 29, 2014

Thanks, that was really interesting!

3 participants