Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with pivot_table and categorical data when add dropna args in version 0.23 #21133

Closed
diegogarcilazo opened this issue May 19, 2018 · 4 comments · Fixed by #21252
Closed
Assignees
Labels
Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@diegogarcilazo
Copy link

diegogarcilazo commented May 19, 2018

pd.__version__
'0.23.0'

labels = ["< 1 mes","1 a 11 meses","1 a 5 años","6 a 11 años","12 a 17 años","18 a + años","alli"]

df = pd.DataFrame(
    {'a': pd.cut(range(0,50), bins = [0,5,10,20,30,40,50,60], labels = labels, right = True),
     'b':(' '.join(['One'] * 20) + ' ' +' '.join(['Two'] * 30)).split(), 'c': range(10,60)})

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = False)

output
               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         50


df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = True)

output

               c
a               
NaN            5
< 1 mes        5
1 a 11 meses  10
1 a 5 años    10
6 a 11 años   10
12 a 17 años   9
18 a + años    0
Total         49

pd.__version__
'0.22.0'

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = False)

               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         50

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = True)

               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         49

When entering the argument dropna = False with the function pivot_table the value and order of the categories is correct. But when dropna = True both the order and the value of the categories do not perform as expected, add a NaN category and drop last category. With other dataset w/o NaN values dropna = True causes categories lose coherence with the values.

The same goes for the pd.crosstab () function

INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: Spanish_Argentina.1252

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.3.1
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@diegogarcilazo diegogarcilazo changed the title Pandas error with pivot_table and categorical data when add dropna args in version 0.23 Error with pivot_table and categorical data when add dropna args in version 0.23 May 19, 2018
@gfyoung gfyoung added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Categorical Categorical Data Type labels May 21, 2018
@gfyoung
Copy link
Member

gfyoung commented May 21, 2018

@diegogarcilazo : Thanks for reporting this! Did this behavior work in a previous version of pandas?

@jreback
Copy link
Contributor

jreback commented May 21, 2018

@diegogarcilazo pls edit the top to show the output

@diegogarcilazo
Copy link
Author

Thanks you for pay attention. @jreback I just added the outputs. @gfyoung In version 0.22 this behavior does not appear I just added the output with 0.22 version too.

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label May 24, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.23.1 milestone May 24, 2018
@jorisvandenbossche
Copy link
Member

@diegogarcilazo Thanks for the example. I made a simplified version that also illustrates the regression:

In [59]: df = pd.DataFrame({'A': pd.Categorical([np.nan, 'low', 'high', 'low', 'high'], categories=['low', 'high'], ordered=True), 
    ...:                    'B': range(5)})

In [60]: df.pivot_table(index='A', values='B', dropna=False)
Out[60]: 
      B
A      
low   2
high  3

In [61]: df.pivot_table(index='A', values='B', dropna=True)
Out[61]: 
        B
A        
NaN   2.0
low   3.0
high  NaN

For that last one, the correct output on 0.22:

In [23]: df.pivot_table(index='A', values='B', dropna=True)
Out[23]: 
      B
A      
low   2
high  3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants