Error with pivot_table and categorical data when add dropna args in version 0.23 #21133

diegogarcilazo · 2018-05-19T14:33:16Z

pd.__version__
'0.23.0'

labels = ["< 1 mes","1 a 11 meses","1 a 5 años","6 a 11 años","12 a 17 años","18 a + años","alli"]

df = pd.DataFrame(
    {'a': pd.cut(range(0,50), bins = [0,5,10,20,30,40,50,60], labels = labels, right = True),
     'b':(' '.join(['One'] * 20) + ' ' +' '.join(['Two'] * 30)).split(), 'c': range(10,60)})

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = False)

output
               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         50


df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = True)

output

               c
a               
NaN            5
< 1 mes        5
1 a 11 meses  10
1 a 5 años    10
6 a 11 años   10
12 a 17 años   9
18 a + años    0
Total         49

pd.__version__
'0.22.0'

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = False)

               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         50

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = True)

               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         49

When entering the argument dropna = False with the function pivot_table the value and order of the categories is correct. But when dropna = True both the order and the value of the categories do not perform as expected, add a NaN category and drop last category. With other dataset w/o NaN values dropna = True causes categories lose coherence with the values.

The same goes for the pd.crosstab () function

INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: Spanish_Argentina.1252

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.3.1
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

gfyoung · 2018-05-21T06:57:09Z

@diegogarcilazo : Thanks for reporting this! Did this behavior work in a previous version of pandas?

jreback · 2018-05-21T10:29:36Z

@diegogarcilazo pls edit the top to show the output

diegogarcilazo · 2018-05-21T13:20:40Z

Thanks you for pay attention. @jreback I just added the outputs. @gfyoung In version 0.22 this behavior does not appear I just added the output with 0.22 version too.

jorisvandenbossche · 2018-05-24T19:51:46Z

@diegogarcilazo Thanks for the example. I made a simplified version that also illustrates the regression:

In [59]: df = pd.DataFrame({'A': pd.Categorical([np.nan, 'low', 'high', 'low', 'high'], categories=['low', 'high'], ordered=True), 
    ...:                    'B': range(5)})

In [60]: df.pivot_table(index='A', values='B', dropna=False)
Out[60]: 
      B
A      
low   2
high  3

In [61]: df.pivot_table(index='A', values='B', dropna=True)
Out[61]: 
        B
A        
NaN   2.0
low   3.0
high  NaN

For that last one, the correct output on 0.22:

In [23]: df.pivot_table(index='A', values='B', dropna=True)
Out[23]: 
      B
A      
low   2
high  3

closes pandas-dev#21133

diegogarcilazo changed the title ~~Pandas error with pivot_table and categorical data when add dropna args in version 0.23~~ Error with pivot_table and categorical data when add dropna args in version 0.23 May 19, 2018

gfyoung added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Categorical Categorical Data Type labels May 21, 2018

diegogarcilazo mentioned this issue May 23, 2018

BUG: Wrong grouping of categoricals when observed=True #21151

Closed

jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label May 24, 2018

jorisvandenbossche added this to the 0.23.1 milestone May 24, 2018

jreback self-assigned this May 25, 2018

jreback added a commit to jreback/pandas that referenced this issue May 29, 2018

BUG: dropna incorrect with categoricals in pivot_table

b89e9cf

closes pandas-dev#21133

jreback mentioned this issue May 29, 2018

BUG: dropna incorrect with categoricals in pivot_table #21252

Merged

jreback added a commit to jreback/pandas that referenced this issue May 31, 2018

BUG: dropna incorrect with categoricals in pivot_table

1745fb5

closes pandas-dev#21133

jreback added a commit to jreback/pandas that referenced this issue Jun 7, 2018

BUG: dropna incorrect with categoricals in pivot_table

985013b

closes pandas-dev#21133

jorisvandenbossche closed this as completed in #21252 Jun 7, 2018

jschendel mentioned this issue Jun 21, 2018

crosstab gives wrong result if a categorical Series contains NaNs #21565

Closed

endremborza mentioned this issue Sep 20, 2019

Pivot nans fix #28540

Closed

5 tasks

rhshadrach mentioned this issue Jun 4, 2023

DEPR: Some dropna behaviors in DataFrame.pivot_table #53521

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error with pivot_table and categorical data when add dropna args in version 0.23 #21133

Error with pivot_table and categorical data when add dropna args in version 0.23 #21133

diegogarcilazo commented May 19, 2018 •

edited

Loading

gfyoung commented May 21, 2018

jreback commented May 21, 2018

diegogarcilazo commented May 21, 2018

jorisvandenbossche commented May 24, 2018

Error with pivot_table and categorical data when add dropna args in version 0.23 #21133

Error with pivot_table and categorical data when add dropna args in version 0.23 #21133

Comments

diegogarcilazo commented May 19, 2018 • edited Loading

gfyoung commented May 21, 2018

jreback commented May 21, 2018

diegogarcilazo commented May 21, 2018

jorisvandenbossche commented May 24, 2018

diegogarcilazo commented May 19, 2018 •

edited

Loading