.ne fails with ambiguous message if comparing a list of columns containing column name 'dtype' #22383

MBlistein · 2018-08-16T09:22:06Z

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame([[0,1,2,'aa'],[0,1,2,'aa'],[0,1,5,'bb'],[0,1,5,'bb'],[0,1,5,'bb'],['cc',4,4,4]], columns=['a','b','c','dtype'])

df.loc[:, ['a', 'dtype']].ne(df.loc[:, ['a', 'dtype']])

In [10]: df
Out[10]: 
    a  b  c dtype
0   0  1  2    aa
1   0  1  2    aa
2   0  1  5    bb
3   0  1  5    bb
4   0  1  5    bb
5  cc  4  4     4

Problem description

Instead of the expected output, I receive:

In [8]: df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-a0710f18822f> in <module>()
----> 1 df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/ops.pyc in f(self, other, axis, level)
   1588                 self, other = self.align(other, 'outer',
   1589                                          level=level, copy=False)
-> 1590             return self._compare_frame(other, na_op, str_rep)
   1591 
   1592         elif isinstance(other, ABCSeries):

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _compare_frame(self, other, func, str_rep)
   4790                 return {col: func(a[col], b[col]) for col in a.columns}
   4791 
-> 4792             new_data = expressions.evaluate(_compare, str_rep, self, other)
   4793             return self._constructor(data=new_data, index=self.index,
   4794                                      columns=self.columns, copy=False)

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
    201         """
    202 
--> 203     use_numexpr = use_numexpr and _bool_arith_check(op_str, a, b)
    204     if use_numexpr:
    205         return _evaluate(op, op_str, a, b, **eval_kwargs)

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in _bool_arith_check(op_str, a, b, not_allowed, unsupported)
    173         unsupported = {'+': '|', '*': '&', '-': '^'}
    174 
--> 175     if _has_bool_dtype(a) and _has_bool_dtype(b):
    176         if op_str in unsupported:
    177             warnings.warn("evaluating in Python space because the {op!r} "

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577 
   1578     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Expected Output

       a  dtype
0  False  False
1  False  False
2  False  False
3  False  False
4  False  False
5  False  False

Alternatively: a descriptive error message telling me that I can't use 'dtype' as a column name.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: 2.8.7
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.5.0
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: 0.7.2.None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


WillAyd · 2018-08-17T02:44:19Z

Thanks for the report - investigation and PRs are always welcome

baidoosik · 2018-08-17T06:46:32Z

@WillAyd Could I try to solve this issue?

baidoosik · 2018-08-17T08:02:25Z

@WillAyd
I'd like to discuss a few points.

This issue occurs when a column is named dtype. I see two options:

1. Raise an exception when a DataFrame is created with a column named dtype (that is, disallow the name).
2. Raise an exception in the ne function, e.g. if a column name == 'dtype': ...

Which approach is better? I'll wait for your response.

Thank you!

WillAyd · 2018-08-17T08:50:27Z

Can’t say for sure what the right solution is but I would not do the first option you mention. If you want to try your hand at a PR feel free to find the root cause and just push up a change with tests - the core team can review from there

MBlistein · 2018-08-17T10:50:02Z

Hey, thanks for the responses. I am currently overloaded with work. I am going to keep following this issue, though, and if it is not solved, I will come back to it at some point!

TomAugspurger · 2018-08-17T11:50:15Z

The user's code from the original issue should work.

I think _has_bool_dtype in pandas/core/computation/expressions.py will need to be updated to work with DataFrames that have a column named dtype.
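
To illustrate why a column named dtype can trip a check like this, here is a minimal sketch. It assumes the check reads `x.dtype` on its argument, which the traceback suggests but does not show. A DataFrame has no `dtype` attribute of its own, so attribute access falls back to the column, and using a Series in a boolean context raises the ValueError from the report. The helper `frame_has_bool_dtype` is a hypothetical name for illustration, not the function in pandas and not necessarily how the fix was implemented.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

df = pd.DataFrame({"a": [0, 1, 2], "dtype": ["aa", "bb", "cc"]})

# DataFrame defines `dtypes` (plural) but not `dtype`, so `df.dtype`
# resolves to the column named 'dtype' via attribute-style column access.
shadowed = df.dtype
print(type(shadowed).__name__)  # Series

# Using a Series where a single True/False is expected raises ValueError.
try:
    bool(shadowed)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)


def frame_has_bool_dtype(frame):
    """Hypothetical helper: inspect per-column dtypes through `frame.dtypes`,
    which is a real property and cannot be shadowed by a column name."""
    return any(is_bool_dtype(dt) for dt in frame.dtypes)


print(frame_has_bool_dtype(df))
print(frame_has_bool_dtype(pd.DataFrame({"dtype": [True, False]})))
```

The point of the sketch is only that going through `dtypes` avoids the name clash; whether that is the right change for pandas itself is for the PR review to decide.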

baidoosik · 2018-08-18T10:39:32Z

@WillAyd Thank you for your reply! I will open a pull request soon!

baidoosik · 2018-08-18T10:55:21Z

@TomAugspurger Hi, I fixed the _has_bool_dtype function, but I don't know which test directory is the right one. This is my first attempt at contributing to open source. Could you tell me which directory is most appropriate?
Thank you.

TomAugspurger · 2018-08-18T14:09:01Z

Maybe pandas/tests/test_expressions.py
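
For reference, a regression test for this report might look roughly like the sketch below. It is built only from the example in the original post and is not the test that was added in #22416; the function names `build_frame` and `test_ne_with_dtype_column` are made up for illustration. It is expected to pass only on a pandas version that includes the fix.

```python
import pandas as pd


def build_frame():
    # Same data as in the original report, including the 'dtype' column.
    return pd.DataFrame(
        [[0, 1, 2, 'aa'], [0, 1, 2, 'aa'], [0, 1, 5, 'bb'],
         [0, 1, 5, 'bb'], [0, 1, 5, 'bb'], ['cc', 4, 4, 4]],
        columns=['a', 'b', 'c', 'dtype'],
    )


def test_ne_with_dtype_column():
    df = build_frame()
    subset = df.loc[:, ['a', 'dtype']]

    # Comparing a frame with itself should give all False, not raise.
    result = subset.ne(subset)

    expected = pd.DataFrame(False, index=subset.index, columns=subset.columns)
    pd.testing.assert_frame_equal(result, expected)
```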


WillAyd added the Bug and Indexing (related to indexing on series/frames, not to indexes themselves) labels Aug 17, 2018

baidoosik mentioned this issue Aug 19, 2018

.ne fails if comparing a list of columns containing column name 'dtype' #22383 #22416

Merged

4 tasks

jreback added this to the 0.24.0 milestone Aug 23, 2018

jreback modified the milestones: 0.24.0, Contributions Welcome Dec 2, 2018

jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 30, 2018

jreback closed this as completed in #22416 Dec 31, 2018

jreback pushed a commit that referenced this issue Dec 31, 2018

.ne fails if comparing a list of columns containing column name 'dtype'

cae4616

#22383 (#22416)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

.ne fails if comparing a list of columns containing column name 'dtype'

ee0a7f1

pandas-dev#22383 (pandas-dev#22416)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

.ne fails if comparing a list of columns containing column name 'dtype'

d232832

pandas-dev#22383 (pandas-dev#22416)
