pivot_table not displaying values columns in expected order #17041

ericbhanson · 2017-07-20T19:14:16Z

Code Sample, a copy-pastable example if possible

# Your code here
column_order = ['Year', 'Month', 'Currency', 'Total Net', 'Fees']
df_fills = df_fills.reindex_axis(column_order, axis = 1)

pd.pivot_table(df_fills, values = ['Total Net', 'Fees'], index = ['Year', 'Month'], 
               columns = ['Currency'], aggfunc = np.sum, margins = True, fill_value = 0)

Problem description

When creating the dataframe, Fees column comes first (it's from an external data set), while Total Net comes second (it's calculated). I reordered them using reindex_axis and when asking Python to show the dataframe, I get the expected order. However, when creating a pivot table, Fees always comes first, no matter what.

Expected Output

pivot_table should display columns of values in the order entered in the function.

Output of `pd.show_versions()`

# Paste the output here pd.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 2.7.13.final.0 python-bits: 64 OS: Darwin OS-release: 16.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

gfyoung · 2017-07-20T19:54:22Z

@ericbhanson : Thanks for the issue! For easier evaluation, could you provide (in DataFrame form) the current and expected output from the code?

ericbhanson · 2017-07-20T20:01:08Z

@gfyoung happy to - are you looking for a simple copy and paste of what I'm seeing in Jupyter versus what I'm expecting to see?

gfyoung · 2017-07-20T20:04:03Z

Yes, that would work (also the code to construct your expected output would be good too)

ericbhanson · 2017-07-20T20:26:03Z

@gfyoung

Turns out to be a little more difficult than I expected 😄. Since Git doesn't do pivot tables, and the data is sensitive information, I took a screen shot of the top of the pivot table.

Here's the code, with the source of the data removed.

import pandas as pd
import numpy as np

pd.options.display.float_format = '${:,.2f}'.format

df_fills = pd.DataFrame()

for fills in # [API data source] #:
    for fill in fills:
        df_fills = df_fills.append(pd.Series(fill), ignore_index = True)

df_fills.rename(columns = {'product_id' : 'Currency', 'fee' : 'Fees'}, inplace = True)
df_fills['Year'] = pd.DatetimeIndex(df_fills['created_at']).year
df_fills['Month'] = pd.DatetimeIndex(df_fills['created_at']).month
df_fills['price'] = pd.to_numeric(df_fills['price'])
df_fills['size'] = pd.to_numeric(df_fills['size'])
df_fills['Fees'] = pd.to_numeric(df_fills['Fees'])
df_fills['buy_net'] = (df_fills[(df_fills["side"] == 'buy')]['price'] * 
                       df_fills[(df_fills["side"] == 'buy')]['size']) - df_fills[(df_fills["side"] == 'buy')]['Fees']
df_fills['sell_net'] = ((df_fills[(df_fills["side"] == 'sell')]['price'] * 
                       df_fills[(df_fills["side"] == 'sell')]['size']) - df_fills[(df_fills["side"] == 'sell')]['Fees'])
df_fills = df_fills.fillna(0)
df_fills['Total Net'] = df_fills['sell_net'] - df_fills['buy_net']
df_fills.drop(['liquidity', 'order_id', 'profile_id', 'settled', 'trade_id', 'user_id', 'price', 'size',
               'created_at', 'side', 'buy_net', 'sell_net'], axis = 'columns', inplace = True)

column_order = ['Year', 'Month', 'Currency', 'Total Net', 'Fees']
df_fills = df_fills.reindex_axis(column_order, axis = 1)

pd.pivot_table(df_fills, values = ['Total Net', 'Fees'], index = ['Year', 'Month'], 
               columns = ['Currency'], aggfunc = np.sum, margins = True, fill_value = 0)

As you can see, the Fees column shows up first, but - based on the call to reindex_axis() and how I listed the columns in pivot_table() - it should be showing up after Total Net, which is what I want. Let me know if that makes sense.

gfyoung · 2017-07-20T21:37:22Z

@ericbhanson : That definitely helps! Thanks for doing this, especially given the sensitive nature of the data (which is actually not even necessary here).

Could you put pictures of what is currently output and what you expected in your original issue description? That would be useful for anyone who just want to get an idea of what's going on.

BTW, feel free to also contribute a PR to patch this!

jreback · 2017-07-21T10:24:26Z

I am not at all clear what is going on. this needs a code reproducible example.

ericbhanson · 2017-07-21T14:30:53Z

@jreback in the screen shot I enclosed above, do you see how the column marked Fees is displayed before the column marked Total Net? This is the opposite order of what I would expect, given that I listed Total Net first in my list of values when creating the pivot table.

I can see if I can reproduce with a more generic dataset so I can include a complete code example, but please let me know if what I'm saying makes sense.

jreback · 2017-07-21T14:35:34Z

@ericbhanson screen shots are not very useful. it makes it impossible to even look at what you are doing. pls post a copy-pastable minimal example.

ericbhanson · 2017-07-21T15:10:29Z

@jreback A dummy CSV data file that closely replicates my original data source and works with the code below can be found here: https://drive.google.com/open?id=0B5QZk8ir7q-yTEJVOGdPeWtyZ1U

The code I'm using:

import pandas as pd
import numpy as np

pd.options.display.float_format = '${:,.2f}'.format

df_fills = pd.read_csv('pivot_example.csv')

df_fills.rename(columns = {'product_id' : 'Currency', 'fee' : 'Fees'}, inplace = True)
df_fills['Year'] = pd.DatetimeIndex(df_fills['created_at']).year
df_fills['Month'] = pd.DatetimeIndex(df_fills['created_at']).month
df_fills['price'] = pd.to_numeric(df_fills['price'])
df_fills['size'] = pd.to_numeric(df_fills['size'])
df_fills['Fees'] = pd.to_numeric(df_fills['Fees'])
df_fills['buy_net'] = (df_fills[(df_fills["side"] == 'buy')]['price'] * 
                       df_fills[(df_fills["side"] == 'buy')]['size']) - df_fills[(df_fills["side"] == 'buy')]['Fees']
df_fills['sell_net'] = ((df_fills[(df_fills["side"] == 'sell')]['price'] * 
                       df_fills[(df_fills["side"] == 'sell')]['size']) - df_fills[(df_fills["side"] == 'sell')]['Fees'])
df_fills = df_fills.fillna(0)
df_fills['Total Net'] = df_fills['sell_net'] - df_fills['buy_net']
df_fills.drop(['liquidity', 'order_id', 'profile_id', 'settled', 'trade_id', 'user_id', 'price', 'size',
               'created_at', 'side', 'buy_net', 'sell_net'], axis = 'columns', inplace = True)

column_order = ['Year', 'Month', 'Currency', 'Total Net', 'Fees']
df_fills = df_fills.reindex_axis(column_order, axis = 1)

pd.pivot_table(df_fills, values = ['Total Net'], index = ['Year', 'Month'], 
               columns = ['Currency'], aggfunc = np.sum, margins = True, fill_value = 0)

jreback · 2017-07-21T23:52:18Z

@ericbhanson pls make a much simpler example.all of the things before pivot_table are not relevant. simply construct a sample (short) frame.

ericbhanson · 2017-07-23T01:34:34Z

@jreback to avoid further back and forth, please let me know specifically what you need to see in the data frame in order to validate the existence of the bug.

ericbhanson · 2017-07-24T22:45:25Z

After further investigation with other column names, I have determined that the default behavior of pivot_table() is to display the contents of the value parameter in alphabetical order. I will look into making a code contribution to allow for user-defined sorting.

gfyoung · 2017-07-24T23:04:54Z

@ericbhanson : Excellent! Contributions are definitely welcome and appreciated! Thanks for looking into this despite the inability to share your data 😄

helikopterodaktyl · 2017-11-22T10:13:29Z

I am having the same issue, here's a simple example code:

import pandas as pd

tuples_list = [("John", "Foo", 47, 173), ("Michael", "Bar", 33, 182)]
df = pd.DataFrame(tuples_list, columns=["firstname", "lastname", "age", "height"])
print(df.pivot_table(index=["firstname", "lastname"], values=["age", "height"]))
print(df.pivot_table(index=["firstname", "lastname"], values=["height", "age"]))

Both pivot_tables return the same output, however I'd expect the second one to have the height and age columns swapped. If I change the order in 'index=' field, it will be reflected in the resulting pivot_table

vhlongmore · 2018-06-23T04:59:24Z

@ericbhanson If you haven't already made a code contribution but it is forthcoming, you may also want to consider that when margin=True, the margin row gets sorted automatically as well (and probably shouldn't ever as it is for sub/grand totals).

princeinzion · 2019-05-05T15:06:58Z

import pandas as pd

tuples_list = [("John", "Foo", 47, 173), ("Michael", "Bar", 33, 182)]
df = pd.DataFrame(tuples_list, columns=["firstname", "lastname", "age", "height"])
print(df.pivot_table(index=["firstname", "lastname"], values=["age", "height"]))
print(df.pivot_table(index=["firstname", "lastname"], values=["height", "age"]))

Create a re-index for your pivot data. See my example below.

import pandas as pd

tuples_list = [("John", "Foo", 47, 173), ("Michael", "Bar", 33, 182)]
df = pd.DataFrame(tuples_list, columns=["firstname", "lastname", "age", "height"])
my_report = df.pivot_table(index=["firstname", "lastname"], values=["age", "height"])

new_order= ["height", "age"]
my_report = my_report.reindex(new_order, axis=1)
my_report

sreehari54 · 2019-07-03T14:49:13Z

princeinzion .. thanks a lot . reindex resolved the issue

skinTest · 2019-07-24T08:38:39Z

import pandas as pd
tuples_list = [("John", "Foo", 47, 173), ("Michael", "Bar", 33, 182)]
df = pd.DataFrame(tuples_list, columns=["firstname", "lastname", "age", "height"])
print(df.pivot_table(index=["firstname", "lastname"], values=["age", "height"]))
print(df.pivot_table(index=["firstname", "lastname"], values=["height", "age"]))

Create a re-index for your pivot data. See my example below.
import pandas as pd

tuples_list = [("John", "Foo", 47, 173), ("Michael", "Bar", 33, 182)]
df = pd.DataFrame(tuples_list, columns=["firstname", "lastname", "age", "height"])
my_report = df.pivot_table(index=["firstname", "lastname"], values=["age", "height"])

new_order= ["height", "age"]
my_report = my_report.reindex(new_order, axis=1)
my_report

will this work when multiple columns are passed to the function like pivot_tables(columns=['foo', 'bar'])?

AngieZhang1025 · 2019-09-09T11:31:31Z

Any workaround on this?

jc085 · 2020-03-18T16:38:34Z

the solution from princeinzion is the way, I just want to add that depending on df you are working with , yo could need to add the level when reindexing:

my_report = df.pivot_table(index=[ "ticker"], values=["c_within_l1", "c_beyond_l1", "c_beyond_l2", "c_beyond_l3"], aggfunc=[np.mean])
new_order=["c_within_l1", "c_beyond_l1", "c_beyond_l2", "c_beyond_l3"]
my_report = my_report.reindex([ new_order, axis=1, level=1)
my_report

in this case if you don't add the level, it throws the tuple error

mohitthenewj · 2020-05-17T21:23:51Z

the solution from princeinzion is the way, I just want to add that depending on df you are working with , yo could need to add the level when reindexing:

my_report = df.pivot_table(index=[ "ticker"], values=["c_within_l1", "c_beyond_l1", "c_beyond_l2", "c_beyond_l3"], aggfunc=[np.mean])
new_order=["c_within_l1", "c_beyond_l1", "c_beyond_l2", "c_beyond_l3"]
my_report = my_report.reindex([ new_order, axis=1, level=1)
my_report

in this case if you don't add the level, it throws the tuple error

This was helpful, solved in my case.

joelvanveluwen · 2021-06-03T13:42:36Z

Is someone working on this? I'm not convinced that alphabetical ordering is desirable compared to just taking the order of the values supplied.

stefansimik · 2021-11-18T23:00:02Z

This is really not expected behavior.
Many students on my courses are asking the same question. It is very confusing behavior.
Sorting columns in order - as they are specified in pivot-table - would be much more reasonable & intuitive.

I wanted to add, that not only columns - are currently sorted alphabetically (in output), but also applied functions, if they are specified in list like here:
df.pivot_table(index='Region', aggfunc={'OrderAmount': ['sum', 'count', 'mean']})

Final columns will be in different order - 'count', 'mean', 'sum' (i.e. alphabetically again, not in specified order)

abedhammoud · 2022-01-07T10:59:43Z

I agree with the general sentiment in this discussion. In my case, I have a report where I would like to have the output dataframe showing QTY (quantity), followed by AMT (amount). When I pass {'QTY': 'sum', 'AMT': 'sum'} to pd.pivot_table, I expected the output dataframe to maintain this order. Currently, pd.pivot_table sorts the output columns alphabetically, putting AMT before QTY, which is not how usually such variables are presented.

ShawnZhong · 2022-05-11T05:36:36Z

It seems that pivot_table explicitly sorts the index:

pandas/pandas/core/reshape/pivot.py

Lines 219 to 220 in d49a244

    
           if isinstance(table, ABCDataFrame): 
        
               table = table.sort_index(axis=1)

Since v1.3.0, a new parameter sort=True is added to pivot_table. Should we disable index sorting when sort=False is explicitly passed?

gfyoung added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jul 20, 2017

gfyoung added the Bug label Jul 20, 2017

jreback removed Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jul 21, 2017

gfyoung added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jul 24, 2017

john-bodley mentioned this issue Jan 28, 2020

[fix] Pivot table metric ordering apache/superset#9031

Merged

12 tasks

ShawnZhong mentioned this issue May 11, 2022

ENH: do not sort resulting columns when sort=False #46994

Merged

4 tasks

jreback added this to the 1.5 milestone May 19, 2022

jreback closed this as completed in #46994 May 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pivot_table not displaying values columns in expected order #17041

pivot_table not displaying values columns in expected order #17041

ericbhanson commented Jul 20, 2017

gfyoung commented Jul 20, 2017

ericbhanson commented Jul 20, 2017

gfyoung commented Jul 20, 2017

ericbhanson commented Jul 20, 2017

gfyoung commented Jul 20, 2017

jreback commented Jul 21, 2017

ericbhanson commented Jul 21, 2017

jreback commented Jul 21, 2017

ericbhanson commented Jul 21, 2017

jreback commented Jul 21, 2017

ericbhanson commented Jul 23, 2017

ericbhanson commented Jul 24, 2017

gfyoung commented Jul 24, 2017 •

edited

helikopterodaktyl commented Nov 22, 2017 •

edited

vhlongmore commented Jun 23, 2018

princeinzion commented May 5, 2019 •

edited

sreehari54 commented Jul 3, 2019

skinTest commented Jul 24, 2019

AngieZhang1025 commented Sep 9, 2019

jc085 commented Mar 18, 2020

mohitthenewj commented May 17, 2020

joelvanveluwen commented Jun 3, 2021

stefansimik commented Nov 18, 2021 •

edited

abedhammoud commented Jan 7, 2022

ShawnZhong commented May 11, 2022

pivot_table not displaying values columns in expected order #17041

pivot_table not displaying values columns in expected order #17041

Comments

ericbhanson commented Jul 20, 2017

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

gfyoung commented Jul 20, 2017

ericbhanson commented Jul 20, 2017

gfyoung commented Jul 20, 2017

ericbhanson commented Jul 20, 2017

gfyoung commented Jul 20, 2017

jreback commented Jul 21, 2017

ericbhanson commented Jul 21, 2017

jreback commented Jul 21, 2017

ericbhanson commented Jul 21, 2017

jreback commented Jul 21, 2017

ericbhanson commented Jul 23, 2017

ericbhanson commented Jul 24, 2017

gfyoung commented Jul 24, 2017 • edited

helikopterodaktyl commented Nov 22, 2017 • edited

vhlongmore commented Jun 23, 2018

princeinzion commented May 5, 2019 • edited

sreehari54 commented Jul 3, 2019

skinTest commented Jul 24, 2019

AngieZhang1025 commented Sep 9, 2019

jc085 commented Mar 18, 2020

mohitthenewj commented May 17, 2020

joelvanveluwen commented Jun 3, 2021

stefansimik commented Nov 18, 2021 • edited

abedhammoud commented Jan 7, 2022

ShawnZhong commented May 11, 2022

Output of `pd.show_versions()`

gfyoung commented Jul 24, 2017 •

edited

helikopterodaktyl commented Nov 22, 2017 •

edited

princeinzion commented May 5, 2019 •

edited

stefansimik commented Nov 18, 2021 •

edited