BUG: DataFrame.groupby() on tuple column works only when column name is "key" #14848
Comments
jorisvandenbossche added the Bug and Groupby labels Dec 10, 2016
Tuples in columns are not that well supported/tested, but this does indeed look like a bug. You're welcome to look into it!
dragonator4 commented Dec 10, 2016
It was working perfectly in 0.19.0. Updating to 0.19.1 broke my code, and this turned out to be the cause. I don't usually work with tuples in columns, but my current project called for it. I could have used three columns to store the three values in each tuple, but since tuples are immutable, I chose to use them. I have workarounds, so this is not high priority for me.
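For anyone who needs the three-column alternative mentioned above, here is a minimal sketch. The data and the column names x, k0, k1 and k2 are made up for illustration and are not taken from the original report; it assumes a reasonably recent pandas where describe() returns one row per group.

```python
import pandas as pd

# Hypothetical data: one numeric column plus a tuple-valued key column "k".
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
df["k"] = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] * 2

# Expand each 3-tuple into three scalar columns (k0, k1, k2 are made-up names).
parts = pd.DataFrame(df["k"].tolist(), index=df.index, columns=["k0", "k1", "k2"])
flat = pd.concat([df.drop(columns="k"), parts], axis=1)

# Group on the three scalar columns instead of the single tuple column.
summary = flat.groupby(["k0", "k1", "k2"]).describe()
print(summary)
```

The group keys end up as a three-level index rather than a single level of tuples, which sidesteps the tuple handling entirely at the cost of two extra columns.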
There's a correlation between the length of the column name and the number of items in the tuples. So |
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Jan 10, 2017: 67cb94c
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Jan 10, 2017: fed9801
Dr-Irv referenced this issue Jan 11, 2017: BUG: Fix for GH #14848 for groupby().describe() with tuples as the Index #15110 (Closed)
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Jan 12, 2017: 5a0f0f7
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Jan 12, 2017: c781d5a
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Jan 12, 2017: 9489cb2
jreback added this to the 0.20.0 milestone Jan 13, 2017
jreback closed this in ab0d236 Jan 13, 2017
mroeschke referenced this issue Feb 1, 2017: API: Reformat output of groupby.describe (#4792) #15260 (Closed)
AnkurDedania added a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017: be4e1cf (Dr-Irv + AnkurDedania)
dragonator4 commented Dec 10, 2016 (edited)
This is the weirdest bug I have seen in Pandas. But I am guessing (hoping) the fix will not be too difficult.
Code Sample
Consider the following two code blocks:
Block 1: key column is called "k"
Block 2: key column is called "key"
Note that the same, static data is used, so that nothing else may be different, and hence culpable.
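A rough sketch of this kind of comparison, for readers who want to try it: the data below is made up and is not the data from the two blocks, and the helper describe_outcome is a hypothetical name introduced only for this illustration. It records whether groupby().describe() succeeds or raises for each column name, so the two outcomes can be compared directly without assuming what any particular pandas version prints.

```python
import pandas as pd


def describe_outcome(frame, key):
    """Return ('ok', shape) if groupby(key).describe() works, else ('error', exception name)."""
    try:
        return ("ok", frame.groupby(key).describe().shape)
    except Exception as exc:  # broad on purpose: only the kind of failure is recorded
        return ("error", type(exc).__name__)


# Hypothetical static data with a tuple-valued key column called "k".
df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [10, 20, 30, 40, 50, 60]})
df1["k"] = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] * 2

# Same data; only the key column's name differs.
df2 = df1.rename(columns={"k": "key"})

outcome_k = describe_outcome(df1, "k")
outcome_key = describe_outcome(df2, "key")
print("k:  ", outcome_k)
print("key:", outcome_key)
print("same outcome:", outcome_k == outcome_key)
```

On the 0.19.1 setup reported here the two outcomes would be expected to differ. Other pandas versions may behave differently, possibly raising for both names, so the script only checks whether the column name changes the result.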
Problem description
Running a simple .groupby().describe() operation produces the following results:
Note that groupby().mean(), sum(), and a few others work fine. describe() is the only one I think is causing the problem.
Expected Output
Obviously, the expected output for df1.groupby('k').describe() should be the same as df2.groupby('key').describe().
Output of pd.show_versions()
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.4
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None