.ne fails with ambiguous message if comparing a list of columns containing column name 'dtype' #22383

Closed
MBlistein opened this issue Aug 16, 2018 · 9 comments · Fixed by #22416
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

@MBlistein

MBlistein commented Aug 16, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame([[0,1,2,'aa'],[0,1,2,'aa'],[0,1,5,'bb'],[0,1,5,'bb'],[0,1,5,'bb'],['cc',4,4,4]], columns=['a','b','c','dtype'])

df.loc[:, ['a', 'dtype']].ne(df.loc[:, ['a', 'dtype']])

In [10]: df
Out[10]: 
    a  b  c dtype
0   0  1  2    aa
1   0  1  2    aa
2   0  1  5    bb
3   0  1  5    bb
4   0  1  5    bb
5  cc  4  4     4

Problem description

Instead of the expected output, I receive:

In [8]: df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-a0710f18822f> in <module>()
----> 1 df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/ops.pyc in f(self, other, axis, level)
   1588                 self, other = self.align(other, 'outer',
   1589                                          level=level, copy=False)
-> 1590             return self._compare_frame(other, na_op, str_rep)
   1591 
   1592         elif isinstance(other, ABCSeries):

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _compare_frame(self, other, func, str_rep)
   4790                 return {col: func(a[col], b[col]) for col in a.columns}
   4791 
-> 4792             new_data = expressions.evaluate(_compare, str_rep, self, other)
   4793             return self._constructor(data=new_data, index=self.index,
   4794                                      columns=self.columns, copy=False)

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
    201         """
    202 
--> 203     use_numexpr = use_numexpr and _bool_arith_check(op_str, a, b)
    204     if use_numexpr:
    205         return _evaluate(op, op_str, a, b, **eval_kwargs)

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in _bool_arith_check(op_str, a, b, not_allowed, unsupported)
    173         unsupported = {'+': '|', '*': '&', '-': '^'}
    174 
--> 175     if _has_bool_dtype(a) and _has_bool_dtype(b):
    176         if op_str in unsupported:
    177             warnings.warn("evaluating in Python space because the {op!r} "

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577 
   1578     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Expected Output

       a  dtype
0  False  False
1  False  False
2  False  False
3  False  False
4  False  False
5  False  False

Alternative: a descriptive Error Message, telling me I can't use 'dtype' as column name.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: 2.8.7
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.5.0
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: 0.7.2.None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd added the Bug and Indexing labels on Aug 17, 2018
@WillAyd
Member

WillAyd commented Aug 17, 2018

Thanks for the report - investigation and PRs are always welcome

@baidoosik
Contributor

@WillAyd Could I try to solve this issue?

@baidoosik
Contributor

@WillAyd
I'd like to discuss a couple of points.

This issue occurs when a column is named dtype. I see two options:

  1. Raise an exception when a DataFrame is created with a column named dtype (that is, disallow the name).

  2. Add a check inside the ne function,
    e.g. if 'dtype' in df.columns: ...

Which approach is better? I'll wait for your response.

Thank you!

@WillAyd
Member

WillAyd commented Aug 17, 2018

Can’t say for sure what the right solution is but I would not do the first option you mention. If you want to try your hand at a PR feel free to find the root cause and just push up a change with tests - the core team can review from there

@MBlistein
Author

Hey, thanks for the responses. I am currently overloaded with work. I'm going to keep following here, though, and if it's not solved, I'll come back to it at some point!

@TomAugspurger
Contributor

The user's code from the original issue should work.

I think _has_bool_dtype in pandas/core/computation/expressions.py will need to be updated to work with DataFrames that have a column named dtype.
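
To illustrate the root cause suggested here: on a DataFrame, attribute access falls back to column lookup, so x.dtype returns the 'dtype' column (a Series) instead of raising AttributeError, and the traceback above suggests that result then ends up in a boolean context. Below is a minimal sketch of a helper that checks for a DataFrame before touching .dtype. The name has_bool_dtype is a hypothetical stand-in written for this illustration; it is not the code in pandas and not necessarily the change merged in #22416.

```python
import numpy as np
import pandas as pd


def has_bool_dtype(x):
    """Hypothetical helper: report whether x is, or contains, boolean data."""
    # Handle DataFrames first so a column named 'dtype' cannot shadow
    # the attribute lookup below.
    if isinstance(x, pd.DataFrame):
        return any(dt == bool for dt in x.dtypes)
    try:
        # Series and NumPy arrays expose a single dtype.
        return x.dtype == bool
    except AttributeError:
        # Plain Python or NumPy scalars have no usable .dtype here.
        return isinstance(x, (bool, np.bool_))


df = pd.DataFrame({"a": [0, "cc"], "dtype": ["aa", 4]})

# Attribute access resolves to the column, not to a NumPy dtype.
shadowed = df.dtype
print(type(shadowed).__name__)

# The explicit DataFrame branch never evaluates df.dtype.
print(has_bool_dtype(df))
```

Checking the type up front, rather than relying on AttributeError, is one way to keep the helper independent of whatever column names the user chose.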

@baidoosik
Contributor

@WillAyd Thank you for your reply! I will open the pull request soon!

@baidoosik
Contributor

@TomAugspurger Hi, I fixed the _has_bool_dtype function, but I don't know which test directory is the right one. This is my first attempt at contributing to open source. Could you tell me which location is most appropriate?
Thank you.

@TomAugspurger
Contributor

Maybe pandas/tests/test_expressions.py
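
As a rough idea of what a regression test for this issue might look like, here is a sketch based on the example from the original report. The test name and the shortened data are assumptions made for illustration, not the test that was actually added; it only checks that comparing a frame with a 'dtype' column against itself yields all False.

```python
import pandas as pd


def test_ne_with_dtype_column():
    # Hypothetical regression test: a column literally named 'dtype'
    # should not break DataFrame.ne.
    df = pd.DataFrame({"a": [0, 0, "cc"], "dtype": ["aa", "bb", 4]})

    result = df.loc[:, ["a", "dtype"]].ne(df.loc[:, ["a", "dtype"]])

    # A frame is never unequal to itself here, so every cell is False.
    expected = pd.DataFrame(False, index=df.index, columns=["a", "dtype"])
    pd.testing.assert_frame_equal(result, expected)


test_ne_with_dtype_column()
```

On a pandas version that still has the bug, this would be expected to fail with the ValueError shown in the traceback above.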
