Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/API: Change query/eval local variable API #5987

Closed
cpcloud opened this issue Jan 17, 2014 · 10 comments · Fixed by #6366
Closed

ENH/API: Change query/eval local variable API #5987

cpcloud opened this issue Jan 17, 2014 · 10 comments · Fixed by #6366
Assignees
Milestone

Comments

@cpcloud
Copy link
Member

cpcloud commented Jan 17, 2014

Currently, with query and eval you can use local variables a la the @ symbol. It's a bit confusing since you're not allowed to have a local variable and a column name with the same name, but it will try to pull the local if possible.

Current API:

Fails with a NameError:

a = 1
df  = DataFrame({'a': randn(10), 'b': randn(10)})
df.query('a > b')

But this works:

df.query('@a > b')

And so does this, which is confusing:

a = 1
df = DataFrame({'b': randn(10), 'c': randn(10)})
df.query('a < b < c')

As suggested by @y-p and @jreback, the following API is less confusing IMO.

From now on, all local variables will need an explicit reference and if there is a column name and a local with the same name then the column will be used. Thus you can always be sure that you're referring to a column, or it doesn't exist, in which case you'll get an error. And if you use @ then you can be sure that you're referring to local, and likewise get an error if it doesn't exist. As a bonus ( 🐺 in 🐑 's clothing), this allows you to use both a local and a column name with the same name.

Examples:

a = 1
df = DataFrame({'a': randn(10), 'b': randn(10)})

# uses the column 'a'
df.query('a > b')

# uses the local
df.query('@a > b')

# fails because I didn't reference the local and there's no 'c' column
c = 1
df.query('a > c')

# local and a column name
df.query('b < @a < a')
@ghost ghost assigned cpcloud Jan 17, 2014
@ghost
Copy link

ghost commented Jan 17, 2014

+1 for adopting "say what you mean" as a guiding principle.

@jreback
Copy link
Contributor

jreback commented Jan 17, 2014

+1 for cool emoji!

@cpcloud
Copy link
Member Author

cpcloud commented Jan 20, 2014

Slight update here: We'll have to keep the current scheme because otherwise locals in pd.eval would have to also have an @ prefix (essentially all vars would need @). I'll submit the PR as soon as it passes travis so folks can take a look

@cpcloud
Copy link
Member Author

cpcloud commented Jan 20, 2014

That means that

c = 1
df.query('a > c')

will still work because first frame cols are searched then locals, then globals. If they weren't, then toplevel pd.eval would require all vars to have a @ prefix.

However, you'll get a warning with a new parameter conflict='warn' and an exception with conflict='error'

@cpcloud
Copy link
Member Author

cpcloud commented Jan 20, 2014

I kind of feel like adding a keyword for this might betray a poor design decision on my part and this might require some more deliberation and discussion.

@ghost
Copy link

ghost commented Jan 20, 2014

I agree re the conflict keyword, it's just throwing the problem at the user, not solving it.
If you think about the semantics a little differently, you could make the case that @ should
mean different things in df context and top level context.

Instead of saying @ means vars and no prefix means columns, think 'scope'.
no prefix means referncing local-scope, while referencing external-scope should
require an explicit @.

  • in df context, i.e. df.eval() the scope consists of the df and variables are external.
  • in top level context, i.e pd.eval(), locals are the scope and globals are external and should require @.
    You can (and should) modify this to mean variables in (local, global) resolution order are the scope
    and @ is an error that issues a warning to help orient users.

When you asked me to review PR a few months ago, I mentioned this ambiguity
problem as a smell, I still think it should go.

@cpcloud
Copy link
Member Author

cpcloud commented Jan 20, 2014

@y-p Agreed. Thanks.

@jreback
Copy link
Contributor

jreback commented Jan 29, 2014

doing this for 0.13.1?

@cpcloud
Copy link
Member Author

cpcloud commented Jan 29, 2014

Assuming that's in a few days...don't think so. This is going to take me a bit ... Just need to sit down with it for a while and hash it out in my head.

@jreback
Copy link
Contributor

jreback commented Jan 29, 2014

figured that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants