dataframe.query() select columns #16226

Closed
dingo9 opened this issue May 4, 2017 · 3 comments
Labels
Indexing (Related to indexing on series/frames, not to indexes themselves), Usage Question

dingo9 commented May 4, 2017

In the query method in pandas/core/frame.py:

I found that query evaluates the expression with self.eval and then passes the result to self.loc to build the new DataFrame. I am curious: in which situations does DataFrame.loc raise a ValueError?

        inplace = validate_bool_kwarg(inplace, 'inplace')
        if not isinstance(expr, compat.string_types):
            msg = "expr must be a string to be evaluated, {0} given"
            raise ValueError(msg.format(type(expr)))
        kwargs['level'] = kwargs.pop('level', 0) + 1
        kwargs['target'] = None
        res = self.eval(expr, **kwargs)

        try:
            new_data = self.loc[res]
        except ValueError:
            # when res is multi-dimensional loc raises, but this is sometimes a
            # valid query
            new_data = self[res]

        if inplace:
            self._update_inplace(new_data)
        else:
            return new_data
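To make the fallback concrete, here is a minimal sketch, assuming a recent pandas release, that copies the try/except above into a standalone helper. The name loc_then_getitem is hypothetical and not part of pandas. A one-dimensional boolean mask is handled by loc. A two-dimensional boolean DataFrame makes loc raise ValueError (loc rejects multi-dimensional keys), so the self[res] branch masks the frame instead. A list of column names is looked up by loc as row labels and raises KeyError, which this except clause does not catch.

```python
import numpy as np
import pandas as pd


def loc_then_getitem(df, res):
    """Hypothetical helper reproducing the try/except fallback in query."""
    try:
        # loc interprets res as a row selector (labels or a boolean mask)
        return df.loc[res]
    except ValueError:
        # loc raises ValueError for multi-dimensional keys such as a boolean
        # DataFrame; df[res] then masks cell values (like df.where(res))
        return df[res]


df = pd.DataFrame(np.arange(-6, 6).reshape(4, 3), columns=list("ABC"))

# 1-D boolean Series: handled by loc, keeps only the rows where A > 0
row_filtered = loc_then_getitem(df, df["A"] > 0)

# 2-D boolean DataFrame: loc raises ValueError, the fallback keeps the
# original shape and puts NaN in cells where the mask is False
masked = loc_then_getitem(df, df > 0)

# List of column names: loc treats ['A'] as row labels -> KeyError (not caught)
try:
    loc_then_getitem(df, ["A"])
except KeyError as exc:
    print("KeyError:", exc)
```

Under these assumptions the KeyError in this report is expected behaviour: whatever eval returns is handed to loc, whose first axis is the row index, and only a ValueError triggers the column-oriented fallback.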

Problem description

I use DataFrame.query() as the API of my data model, and I want to select columns through it as well. Googling turned up nothing, so I read the source of query and found an interesting fragment: if loc raises an exception, query falls back to self[res], which can select columns. However, it only catches ValueError, not KeyError.

Version

Current pandas version, query in pandas/core/frame.py (https://github.com/pandas-dev/pandas/commit/20fda2223d5121be3f8204702b5ce1e6037e5b18)
jorisvandenbossche (Member) commented:

@dingo9 Please provide a minimal, reproducible example (runnable code snippet) illustrating the problem you have.

dingo9 (Author) commented May 5, 2017

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6,4),columns=list('ABCD'))
df.query("['A']")

I expected it to return column 'A' of the DataFrame, but it raises a KeyError at:

        try:
            new_data = self.loc[res]
        except ValueError:
            # when res is multi-dimensional loc raises, but this is sometimes a
            # valid query
            new_data = self[res]

In this fragment, if loc raised a ValueError instead, the fallback self[res] would return column 'A' as I expect.

jreback (Contributor) commented May 5, 2017

Please read the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method-experimental

In [7]: df = pd.DataFrame(np.random.randn(6,4),columns=list('ABCD'))
   ...: df.query("A>0")

Out[7]: 
          A         B         C         D
5  1.105172  1.219918 -1.555644  0.937555

In [8]: df.eval('A')
Out[8]: 
0   -1.534979
1   -0.328682
2   -0.181169
3   -1.026832
4   -0.984711
5    1.105172
dtype: float64
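As a hedged illustration of the distinction shown above (this workaround does not appear in the thread itself): query filters rows with a boolean expression, eval returns the evaluated column, and columns are selected afterwards with ordinary indexing on the result. The small frame below is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})

# query() keeps the rows where the boolean expression is True
rows = df.query("A > 0")

# Columns are then selected with regular indexing on the filtered frame
subset = rows[["A"]]

# eval() on a bare column name returns that column as a Series
col_a = df.eval("A")

print(subset)
print(col_a.tolist())
```

rows.loc[:, ["A"]] would select the same column; either way, column selection happens outside the query string.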

jreback closed this as completed May 5, 2017
jreback added the Indexing and Usage Question labels May 5, 2017
jreback added this to the No action milestone May 5, 2017