Pandas any() returning false with true values present #23070

Closed
maanukuttan opened this issue Oct 10, 2018 · 3 comments

3 participants
@maanukuttan

commented Oct 10, 2018

Code Sample

In [1]: from io import StringIO

In [2]: import pandas as pd

In [3]: data = StringIO("""issue_date,issue_date_dt
   ...: ,
   ...: ,
   ...: 19600215.0,1960-02-15
   ...: ,
   ...: ,""")

In [4]: df = pd.read_csv(data, parse_dates=[1])

In [5]: df
Out[5]:
   issue_date issue_date_dt
0         NaN           NaT
1         NaN           NaT
2  19600215.0    1960-02-15
3         NaN           NaT
4         NaN           NaT

In [6]: df.any(axis=0)
Out[6]:
issue_date       True
issue_date_dt    True
dtype: bool

In [7]: df.any(axis=1)
Out[7]:
0    False
1    False
2    False
3    False
4    False
dtype: bool

Problem description

df.any(axis=0) behaves as expected and returns True for both columns, but df.any(axis=1) returns False for every row, including row 2, which holds non-null values.
Note: A question about a similar issue can be found here

Note: Calling notnull first gives the required output

In [9]: df.notnull().any(1)
Out[9]:
0    False
1    False
2     True
3    False
4    False
dtype: bool
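
The workaround above can be reproduced as a standalone script. This is a minimal sketch that assumes a recent pandas release, where the axis has to be passed by keyword (`axis=1`); `notna` is used as an alias of `notnull`, and the variable names are illustrative only.

```python
from io import StringIO

import pandas as pd

# Same CSV payload as in the report: only the third data row has values.
csv_text = """issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,"""

df = pd.read_csv(StringIO(csv_text), parse_dates=[1])

# Test each cell for "is not missing" first, then reduce across columns.
# The intermediate frame is all-bool, so no mixed-dtype handling is involved.
row_has_value = df.notna().any(axis=1)
print(row_has_value.tolist())
```

This answers a slightly different question (is any cell non-missing?) than `df.any(axis=1)` (is any cell truthy?): a non-missing `0.0` counts as a value here.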

Expected Output

df.any(axis=1) should return True for every row that contains a truthy value (row 2 in the sample above).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger

Contributor

commented Oct 10, 2018

Simpler repro:

In [21]: df = pd.DataFrame({"A": [1.0], "B": [pd.Timestamp('1960-02-15')]})

In [22]: df.any(1)
Out[22]:
0    False
dtype: bool

@TomAugspurger added this to the 0.24.0 milestone Oct 10, 2018

@TomAugspurger

Contributor

commented Oct 10, 2018

Hmm, this seems wrong. We have mixed dtypes in

pandas/pandas/core/frame.py

Lines 7142 to 7143 in 362f2e2

if axis == 1 and self._is_mixed_type and self._is_datelike_mixed_type:
    numeric_only = True

so we set numeric_only=True.

Then we go down to

pandas/pandas/core/frame.py

Lines 7196 to 7199 in 362f2e2

if filter_type is None or filter_type == 'numeric':
    data = self._get_numeric_data()
elif filter_type == 'bool':
    data = self._get_bool_data()

and our filter_type is bool, so we select boolean data only.

(Pdb) l
7202            else:
7203                if numeric_only:
7204                    if filter_type is None or filter_type == 'numeric':
7205                        data = self._get_numeric_data()
7206                    elif filter_type == 'bool':
7207 ->                     data = self._get_bool_data()
7208                    else:  # pragma: no cover
7209                        msg = ("Generating numeric_only data with filter_type {f}"
7210                               "not supported.".format(f=filter_type))
7211                        raise NotImplementedError(msg)
7212                    values = data.values
(Pdb) n
> /Users/taugspurger/sandbox/pandas/pandas/core/frame.py(7212)_reduce()
-> values = data.values
(Pdb) data
Empty DataFrame
Columns: []
Index: [0]

And since that frame has no columns, the .any(1) will be False for every row. Off the top of my head, I'm not sure what the fix is right now.
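
The effect described above can be imitated through public API. The sketch below is an approximation, not the internal code path: it uses `select_dtypes(include="bool")` as a stand-in for the private `_get_bool_data()` call to show that no columns survive the filter, and that reducing a frame with no columns along axis 1 yields False for each row.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0], "B": [pd.Timestamp("1960-02-15")]})

# Stand-in for the private _get_bool_data(): keep only bool columns.
# Neither the float column nor the datetime column qualifies.
bool_only = df.select_dtypes(include="bool")
print(bool_only.shape)  # one row, zero columns

# With nothing left to inspect, any() has no truthy value to find in any row.
row_any = bool_only.any(axis=1)
print(row_any.tolist())
```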

@TomAugspurger

Contributor

commented Nov 6, 2018

Moving off of 0.24, but would certainly welcome a fix if anyone wants to work on this.

@jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 30, 2018

jreback added a commit that referenced this issue Dec 30, 2018

thoo added a commit to thoo/pandas that referenced this issue Dec 30, 2018

Merge remote-tracking branch 'upstream/master' into read_excel-docstring
* upstream/master:
  DOC: Fixing broken references in the docs (pandas-dev#24497)
  DOC: Splitting api.rst in several files (pandas-dev#24462)
  Fix misdescription in escapechar (pandas-dev#24490)
  Floor and ceil methods during pandas.eval which are provided by numexpr (pandas-dev#24355)
  BUG: Pandas any() returning false with true values present (GH pandas-dev#23070) (pandas-dev#24434)
  Misc separable pieces of pandas-dev#24024 (pandas-dev#24488)
  use capsys.readouterr() as named tuple (pandas-dev#24489)
  REF/TST: replace capture_stderr with pytest capsys fixture (pandas-dev#24496)
  TST- Fixing issue with test_parquet test unexpectedly passing (pandas-dev#24480)
  DOC: Doc build for a single doc made much faster, and clean up (pandas-dev#24428)
  BUG: Fix+test timezone-preservation in DTA.repeat (pandas-dev#24483)
  Implement reductions from pandas-dev#24024 (pandas-dev#24484)

thoo added a commit to thoo/pandas that referenced this issue Dec 30, 2018

Merge remote-tracking branch 'upstream/master' into excel-related-docstrings

* upstream/master:
  TST: Skip db tests unless explicitly specified in -m pattern (pandas-dev#24492)
  Mix EA into DTA/TDA; part of 24024 (pandas-dev#24502)
  DOC: Fix building of a single API document (pandas-dev#24506)
  DOC: Fixing broken references in the docs (pandas-dev#24497)
  DOC: Splitting api.rst in several files (pandas-dev#24462)
  Fix misdescription in escapechar (pandas-dev#24490)
  Floor and ceil methods during pandas.eval which are provided by numexpr (pandas-dev#24355)
  BUG: Pandas any() returning false with true values present (GH pandas-dev#23070) (pandas-dev#24434)
  Misc separable pieces of pandas-dev#24024 (pandas-dev#24488)
  use capsys.readouterr() as named tuple (pandas-dev#24489)
  REF/TST: replace capture_stderr with pytest capsys fixture (pandas-dev#24496)
  TST- Fixing issue with test_parquet test unexpectedly passing (pandas-dev#24480)
  DOC: Doc build for a single doc made much faster, and clean up (pandas-dev#24428)
  BUG: Fix+test timezone-preservation in DTA.repeat (pandas-dev#24483)
  Implement reductions from pandas-dev#24024 (pandas-dev#24484)

devin-petersohn added a commit to devin-petersohn/pandas that referenced this issue Feb 3, 2019

Fixing regression in `DataFrame.all` and `DataFrame.any`
* `bool_only` parameter is supported again
* Commit 36ab8c9 created this regression
  due to a bug in all/any (pandas-dev#23070)
* Reverted the regression and fixed the bug with a condition
* Added tests for `bool_only` parameter
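
For context, `bool_only` restricts the reduction to boolean columns. Below is a small sketch of the difference it makes, using a made-up frame (the column names are hypothetical, and a recent pandas release is assumed):

```python
import pandas as pd

df = pd.DataFrame({"flag": [True, False], "count": [0, 5]})

# Default: every column takes part, so the nonzero count makes row 1 truthy.
all_columns = df.any(axis=1)

# bool_only=True: only the bool column "flag" is considered.
bool_columns = df.any(axis=1, bool_only=True)

print(all_columns.tolist())
print(bool_columns.tolist())
```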