DataFrame.groupby().std() fails on filtered DataFrame #16174
Comments
We exclude non-numeric columns in aggregations. However, bool is valid for some of them.
So we could fix this generally by simply astyping bool columns (we already cast certain columns for computation anyhow), or we could pull back and remove bool from numeric aggregations like sum/mean.
sqrt and var can also make sense for booleans, but we seem to fail when the column being aggregated has no variance.
In [5]: pd.DataFrame({"A": [1, 1, 1], "B": [True, True, True], "C": [1, 1, 1]}).groupby("A").std()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-375645985fa5> in <module>()
----> 1 pd.DataFrame({"A": [1, 1, 1], "B": [True, True, True], "C": [1, 1, 1]}).groupby("A").std()
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/groupby.py in std(self, ddof, *args, **kwargs)
1080 # TODO: implement at Cython level?
1081 nv.validate_groupby_func('std', args, kwargs)
-> 1082 return np.sqrt(self.var(ddof=ddof, **kwargs))
1083
1084 @Substitution(name='groupby')
AttributeError: 'bool' object has no attribute 'sqrt'
In [6]: pd.DataFrame({"A": [1, 1, 2], "B": [True, True, True], "C": [1, 1, 1]}).groupby("A").std()
Out[6]:
B C
A
1 0.0 0.0
2 NaN NaN
In [7]: pd.DataFrame({"A": [1, 1, 1], "B": [True, True, False], "C": [1, 1, 1]}).groupby("A").std()
Out[7]:
B C
A
1 0.57735 0.0
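As a rough illustration of the "astype the bool columns" idea mentioned above, here is a minimal user-side sketch. It is a workaround under stated assumptions, not the fix that went into pandas: the names `bool_cols` and `df_numeric` are made up for this example, and it assumes that treating `True`/`False` as `1.0`/`0.0` is acceptable for your data.

```python
import pandas as pd

# Same frame as In [5] above: column B is bool and has no variance.
df = pd.DataFrame({"A": [1, 1, 1], "B": [True, True, True], "C": [1, 1, 1]})

# Assumption: casting bool columns to float64 before aggregating is acceptable.
bool_cols = df.select_dtypes(include="bool").columns
df_numeric = df.astype({col: "float64" for col in bool_cols})

# The aggregation now only sees numeric dtypes.
result = df_numeric.groupby("A").std()
print(result)
```

Casting up front keeps the grouping logic unchanged and only alters the dtype of the columns that caused the problem.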
Really, the underlying issue is probably unrelated to groupby.
In [45]: pd.DataFrame({"A": [1, 1, 1, 1], "B": [True, True, True, True], "C": [1, 1, 1, 2]}).groupby("A").var()
Out[45]:
B C
A
1 False 0.25
Should the
In [46]: np.var([1, 1, 1, 1])
Out[46]: 0.0
Whoops, I still had a groupby in there. My bad, so it is related to groupby. We do handle the regular case correctly. Still, the issue is that
master is giving the expected output.
This could use a test.
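A regression test along these lines might look like the sketch below. It is only a guess at the shape of such a test: the function name is hypothetical, it is not the test that was actually added to pandas, and it assumes a pandas version in which `groupby().std()` accepts bool columns, as reported for master above.

```python
import numpy as np
import pandas as pd


def test_groupby_std_bool_column_no_variance():
    # Hypothetical test name; the frame is the failing case from In [5].
    df = pd.DataFrame({"A": [1, 1, 1], "B": [True, True, True], "C": [1, 1, 1]})

    result = df.groupby("A").std()

    # A constant column has zero standard deviation, whether bool or int.
    assert np.isclose(float(result.loc[1, "B"]), 0.0)
    assert np.isclose(float(result.loc[1, "C"]), 0.0)


# Run directly here; under pytest the function would be collected automatically.
test_groupby_std_bool_column_no_variance()
```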
Code Sample, a copy-pastable example if possible
Problem description
Required elements for the error to appear are:
In my more-complicated real-world data where I ran into the error, I would also see an Exception complaining about type float:
However, even in that case, deleting the bool column would resolve the issue.
Presumably I'll be able to work around the issue by calling .std() on individual columns of the DataFrameGroupBy object, but it seems like pandas should be able to handle this case w/o choking.
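The per-column workaround mentioned above could be sketched roughly as follows. The original code sample did not survive on this page, so the frame below is invented for illustration, and `numeric_cols` and `per_column` are hypothetical names.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Made-up data standing in for the missing code sample.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [True, True, True, False],
        "C": [1.0, 2.0, 3.0, 5.0],
    }
)

grouped = df.groupby("A")

# Skip the grouping key and any bool column, then aggregate column by column.
numeric_cols = [c for c in df.columns if c != "A" and not is_bool_dtype(df[c])]
per_column = pd.DataFrame({c: grouped[c].std() for c in numeric_cols})
print(per_column)
```

This avoids the bool column entirely, at the cost of dropping it from the result.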
Expected Output
Output of pd.show_versions()
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.16-gentoo
machine: x86_64
processor: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.1
nose: None
pip: 7.1.2
setuptools: 30.4.0
Cython: 0.25.1
numpy: 1.10.4
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.5
boto: None
pandas_datareader: None