Rolling(axis='columns').count() ignores axis= keyword #13503

Closed
mrocklin opened this issue Jun 23, 2016 · 7 comments

Comments

@mrocklin
Contributor

commented Jun 23, 2016

Additional example in #13753

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': range(5), 'y': range(5)})

In [3]: df
Out[3]: 
   x  y
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

In [4]: df.rolling(2, axis='columns').sum()  # meets expectations
Out[4]: 
    x    y
0 NaN  0.0
1 NaN  2.0
2 NaN  4.0
3 NaN  6.0
4 NaN  8.0

In [5]: df.rolling(2, axis='columns').count()  # appears to be the same as axis='rows'
Out[5]: 
     x    y
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0
3  2.0  2.0
4  2.0  2.0

In [6]: df.rolling(2, axis='rows').count()  # yeah, exactly the same
Out[6]: 
     x    y
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0
3  2.0  2.0
4  2.0  2.0
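
Until `count()` honors `axis=`, a column-wise count can be approximated without it. The following is a minimal sketch, not the pandas implementation: the helper name `rolling_count_across_columns` is hypothetical, and it assumes that "count" means the number of non-missing values in a trailing window, with partial windows kept.

```python
import pandas as pd

df = pd.DataFrame({"x": range(5), "y": range(5)})


def rolling_count_across_columns(frame, window):
    """Count non-missing values in a trailing window that moves across columns.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # 1.0 where a value is present, 0.0 where it is missing.
    present = frame.notna().astype(float)
    # Transpose so that columns become rows, roll along the rows,
    # then transpose back to the original orientation.
    # min_periods=1 keeps partial windows instead of turning them into NaN.
    return present.T.rolling(window, min_periods=1).sum().T


result = rolling_count_across_columns(df, 2)
print(result)
```

For the frame above, every row should come out as 1.0 in `x` (a window holding only `x`) and 2.0 in `y` (a window holding `x` and `y`), which is what `axis='columns'` would be expected to return.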

output of pd.show_versions()

In [7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Arose from tests in dask/dask#1280

@jreback

Contributor

commented Jun 23, 2016

Yes, .count() is one of the few methods that is implemented in a special way, so it takes a different code path and does not use the axis argument :<

@jreback jreback added this to the Next Major Release milestone Jun 23, 2016

@mrocklin

Contributor Author

commented Jun 23, 2016

Should I consider this a bug or is count(axis=) not supported?

@jreback

Contributor

commented Jun 23, 2016

No, it's a bug.

@jorisvandenbossche

Member

commented Jun 24, 2016

Actually, using axis=1 seems broken for the other methods (which use _apply) as well if you have different dtypes:

In [48]: import numpy as np

In [49]: df = pd.DataFrame({'x': range(5), 'y': np.arange(0, 1, 0.2)})

In [50]: df
Out[50]:
   x    y
0  0  0.0
1  1  0.2
2  2  0.4
3  3  0.6
4  4  0.8

In [51]: df.rolling(2, axis=1).sum()
Out[51]:
    x   y
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

This happens because _apply runs the operation separately on each block of uniform dtype, which is fine for axis=0 but not for axis=1.
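
One way to sidestep the per-block behavior is to avoid `axis=1` entirely and roll over the transposed frame instead. This is a sketch of a workaround under the assumption that upcasting all columns to a common float dtype is acceptable; it is not a fix for `_apply` itself.

```python
import pandas as pd

# Same shape of data as above: one integer column and one float column.
df = pd.DataFrame({"x": range(5), "y": [0.0, 0.2, 0.4, 0.6, 0.8]})

# Transposing a mixed int/float frame upcasts every value to float64,
# so the transposed frame has a single dtype.
transposed = df.T

# Roll down the rows of the transposed frame (the original columns),
# then transpose back to the original orientation.
summed = transposed.rolling(2).sum().T
print(summed)
```

Here `x` should stay NaN, because its window holds a single value, while `y` should hold the row-wise sum `x + y`.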

@yhaque1213

Contributor

commented Apr 6, 2019

I'm interested in helping out with this! Can I work on it?

@yhaque1213 yhaque1213 referenced this issue Apr 11, 2019

Merged

BUG: rolling.count with axis=1 #26055

3 of 4 tasks complete

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Apr 16, 2019

@yhaque1213

Contributor

commented Apr 17, 2019

@jorisvandenbossche

Actually, using axis=1 seems broken for the other methods (which use _apply) as well if you have different dtypes:

In [49]: df = pd.DataFrame({'x': range(5), 'y': np.arange(0, 1, 0.2)})

In [50]: df
Out[50]:
   x    y
0  0  0.0
1  1  0.2
2  2  0.4
3  3  0.6
4  4  0.8

In [51]: df.rolling(2, axis=1).sum()
Out[51]:
    x   y
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

Because the _apply does this for each block of uniform dtype, which is fine for axis=0, but not for axis=1.

I think this is not a problem with the axis itself but with the way the blocks are created in the call to _to_dict_of_blocks() in generic.py: blocks are grouped by homogeneous data type. This splits the original DataFrame by dtype, which causes the sum() problem later on.

If the DataFrame is of a uniform type, it creates one block that includes both columns.
If the DataFrame’s columns are of different types, it creates two separate blocks, one for each column. When the rolling sum is calculated for each block separately, the prepended NaNs take over, resulting in this error.
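
The effect can be imitated with public API only. The sketch below does not call `_to_dict_of_blocks()`; it groups columns by dtype by hand (an assumption about how the blocks end up being formed) and rolls across the columns of each group separately, using a transpose in place of `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({"x": range(5), "y": [0.0, 0.2, 0.4, 0.6, 0.8]})

# Group column names by dtype, imitating one block per homogeneous dtype.
groups = {}
for name, dtype in df.dtypes.items():
    groups.setdefault(str(dtype), []).append(name)

# Roll across the columns of each group on its own, then glue the pieces
# back together in the original column order.
pieces = []
for names in groups.values():
    block = df[names]
    pieces.append(block.T.rolling(2).sum().T)
per_block = pd.concat(pieces, axis=1)[list(df.columns)]
print(per_block)
```

With one integer column and one float column there are two single-column groups, so a window of 2 never sees two values and every entry should be NaN, matching Out[51] above.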

It seems like this should be filed under a different issue, since it does not have anything to do with axis.

@jreback

Contributor

commented Apr 17, 2019

axis=1 is generally not implemented at all: unless you have a single dtype, you have mixed types across columns.
