Deprecate groupby() squeeze option #32380

dechamps · 2020-02-29T21:46:36Z

Code Sample

import pandas as pd
print(pd.DataFrame([{
    'A': 1,
    'B': 1,
}, {
    'A': 2,
    'B': 2,
}, {
    'A': 2,
    'B': 3,
}]).groupby('A', squeeze=True).count())

Problem description

I expected .groupby(squeeze=True) to, well, squeeze, and count() to return a Series. Instead squeeze=True doesn't seem to do anything, and count() returns a DataFrame.

A workaround is to write .groupby('A').count().squeeze(), which does work.
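As a minimal sketch of that workaround (the variable names `counts`, `squeezed` and `direct` are only illustrative): `count()` on the grouped frame returns a one-column DataFrame, and squeezing the column axis turns it into a Series. Passing `axis="columns"` is a precaution I'm adding here, since a bare `squeeze()` would collapse all the way to a scalar if there happened to be only one group.

```python
import pandas as pd

df = pd.DataFrame([
    {"A": 1, "B": 1},
    {"A": 2, "B": 2},
    {"A": 2, "B": 3},
])

# count() on the grouped frame gives a DataFrame with the single column "B".
counts = df.groupby("A").count()

# Squeeze only the column axis so the result is always a Series,
# even when the data contains a single group.
squeezed = counts.squeeze(axis="columns")

# Selecting the column before aggregating yields a Series directly.
direct = df.groupby("A")["B"].count()
```

Both `squeezed` and `direct` are Series named `B` and indexed by `A`, which matches the expected output below.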

Expected Output

A
1    1
2    2
Name: B, dtype: int64

Actual output

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-4-amd64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.1
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 18.1
setuptools : 44.0.0
Cython : None
pytest : 4.6.9
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.6.9
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

TomAugspurger · 2020-03-02T20:22:39Z

It doesn't appear to be documented, and I'm not familiar with it. @dechamps are you interested in walking through the code to see what it's intended for?

jreback · 2020-03-02T20:33:55Z

we should deprecate this option - I don't think the original use cases that I added are worth it

dechamps · 2020-03-02T22:43:49Z

It doesn't appear to be documented

Well the groupby() squeeze parameter does have documentation, it seems.

Personally I don't care much about the parameter - it's simple enough to just call squeeze() later anyway. However it is of course confusing if the parameter is there but does nothing.

WillAyd · 2020-03-03T01:05:54Z

+1 to deprecate as well; seems out of place as an argument here

MarcoGorelli · 2020-03-10T15:12:07Z

However it is of course confusing if the parameter is there but does nothing.

Here's an example of where it does something

In [2]: from pandas import DataFrame

In [3]: df3 = DataFrame(
   ...:     [
   ...:         {"val1": 1, "val2": 20, 'val3': 1},
   ...:         {"val1": 1, "val2": 20, 'val3': 2},
   ...:         {"val1": 1, "val2": 20, 'val3': 3},
   ...:         {"val1": 1, "val2": 20, 'val3': 4},
   ...:     ]
   ...: )

In [4]: df3.set_index(['val1', 'val2']).groupby(['val1', 'val2'], squeeze=True).apply(sum)
Out[4]:
val3    10
dtype: int64

In [5]: df3.set_index(['val1', 'val2']).groupby(['val1', 'val2'], squeeze=False).apply(sum)
Out[5]:
           val3
val1 val2
1    20      10

Anyway, @dechamps , are you interested in submitting a pull request to deprecate it? If so, see https://pandas.pydata.org/docs/development/contributing.html - else, I'd happily take it and see the groupby logic simplified :)

dechamps · 2020-03-10T19:27:43Z

Honestly I'd prefer you do it - I have zero familiarity with pandas development workflows.

mlyons-tcc · 2020-09-29T17:24:17Z

Here's a rant: You deprecated squeeze in 1.1.0 in violation of your own deprecation policy introduced in 1.0.0.

edit: Perhaps it was intended to throw a DeprecationWarning instead of the FutureWarning that was used. FutureWarning indicates that it has already been deprecated and user is still using it.

jreback · 2020-09-30T03:58:10Z

we can and will deprecate things in almost every version

the policy is not to remove deprecated features until the next major version, e.g. 2.0

we don't use DeprecationWarning because it's not shown by default and IMHO just useless

FutureWarning is visible

you don't have to change your code and can continue to use it if you would like

mlyons-tcc · 2020-09-30T18:40:00Z

Great to hear that it is not going away until 2.0! I'm much appreciative of the deprecation policy you provided, and I took the FutureWarning to mean something else since I expected a DeprecationWarning.

Thanks for providing the rationale of FutureWarning.

Arguments Against Using FutureWarning for Deprecations
As I mentioned, I found the usage of FutureWarning to be ambiguous in terms of what the intentions were. It seemed to me that it could go away at any point in time and that a major release post deprecation had already happened. Or more scary that behavior was going to change since there is "existing use of FutureWarning to warn about constructs that will remain valid code in the future, but will have different semantics" (pep-0565). Otherwise, wouldn't I get a DeprecationWarning?

I think the big cause of confusion is that Python changed its definition/recommendation of Deprecation and Future warnings in PEP 565, implemented in Python 3.7. Now, instead of differentiating based on behavior, they differentiate based on audience: "intended for other Python developers" as opposed to "intended for end users of applications that are written in Python". I think PEP 565 provides clear guidance that the type of warning pandas should be using for deprecations is in fact DeprecationWarning.

With regard to visibility, as of 3.7, DeprecationWarnings are shown by default only when triggered by code in "__main__". That's great for hiding the warnings from application users. The recommendation is to use a test suite to make them visible. Warning visibility is also controllable in a number of ways, so I definitely would not consider DeprecationWarning useless; it's actually quite proper.
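To illustrate the point about visibility being controllable, here is a small sketch using only the standard `warnings` module. `legacy_call` and `visible_categories` are hypothetical helpers written for this example, not pandas functions, and the warning message is made up. It shows a test-suite style filter that surfaces both warning types, and a filter that hides only DeprecationWarning while FutureWarning still gets through.

```python
import warnings


def legacy_call(category):
    # Hypothetical stand-in for a library function that warns about a
    # deprecated argument; the message text is invented for this sketch.
    warnings.warn("the squeeze argument is deprecated", category, stacklevel=2)


def visible_categories(configure_filters):
    # Record every warning that passes the filters set by configure_filters.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # baseline: let everything through
        configure_filters()              # later filters take precedence
        legacy_call(DeprecationWarning)
        legacy_call(FutureWarning)
    return [w.category.__name__ for w in caught]


# What a test suite would typically want: see both kinds of warning.
everything = visible_categories(lambda: None)

# Hide developer-facing deprecations only; FutureWarning is still recorded.
only_future = visible_categories(
    lambda: warnings.simplefilter("ignore", DeprecationWarning)
)
```

Here `everything` contains both category names, while `only_future` contains only `FutureWarning`. An application that wants to hide deprecations from its users can therefore filter DeprecationWarning cleanly, but that filter has no effect on deprecations emitted as FutureWarning.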

Argument in Support of FutureWarning
Unfortunately, none of the REPLs are making DeprecationWarnings generated from modules visible as far as I can tell. IPython even went so far as to say they want to hide deprecation warnings from modules because some of their dependencies produce a bunch :eyeroll:. Many in the scientific computing landscape depend solely on Jupyter so it would be unfortunate if they never get visibility into these warnings as they code/execute in a notebook environment. Because of that, as much as I truly believe these should be DeprecationWarnings, I see the importance for them to be FutureWarning if it is of most importance to surface these warnings to that community of users that are not first and foremost software devs.

Final Thought
Using FutureWarning does cause a problem in that someone using pandas to create applications cannot easily ignore pandas deprecation warnings intended for developers without also ignoring warnings intended for the user of the application. If/when Jupyter decides to start surfacing DeprecationWarnings by default, I think it would be a good time to change the type of warning generated in pandas.

jreback · 2020-10-01T03:46:52Z

we are unlikely to change the warning type as visibility is most important

AlexanderNenninger · 2022-07-04T07:48:07Z

Hi,

I need exactly this behavior when applying functions to the GroupBy. Is there guidance on alternatives?

…re. Besides, the current code is cause error when working with a newer pandas version 1.4.4. So using a work around as recommended in pandas-dev/pandas#32380

brandonrwin · 2023-08-01T03:08:55Z

Hi,

I need exactly this behavior when applying functions to the GroupBy. Is there guidance on alternatives?

I believe it's

df3.set_index(['val1', 'val2']).groupby(['val1', 'val2']).apply(sum).squeeze()

DataFrameGroupBy.apply() returns a DataFrame, and you use DataFrame.squeeze() on that.
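One caveat, sketched below under the assumption that you want a Series like the old squeeze=True output: the grouped result in this example has a single row and a single column, and a bare `squeeze()` on a 1x1 DataFrame collapses to a scalar. Passing `axis="index"` squeezes only the row axis and keeps a Series indexed by column name. I use `.sum()` instead of `.apply(sum)` here so the sketch does not depend on how a given pandas version treats the builtin `sum`; the variable names are illustrative.

```python
import pandas as pd

df3 = pd.DataFrame(
    [{"val1": 1, "val2": 20, "val3": v} for v in (1, 2, 3, 4)]
)

# One group (1, 20), one value column: the result is a 1x1 DataFrame.
result = df3.set_index(["val1", "val2"]).groupby(["val1", "val2"]).sum()

# A bare squeeze() collapses both length-1 axes and returns a scalar.
scalar = result.squeeze()

# Squeezing only the row axis keeps a Series indexed by column name.
as_series = result.squeeze(axis="index")
```

With more than one group the row axis is no longer of length 1, so `squeeze(axis="index")` leaves the DataFrame unchanged; which axis to squeeze depends on the shape you expect.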

MarcoGorelli added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Feb 29, 2020

WillAyd added the Deprecate Functionality to remove in pandas label Mar 3, 2020

WillAyd added this to the Contributions Welcome milestone Mar 3, 2020

WillAyd changed the title ~~The groupby() squeeze option doesn't seem to do anything~~ Deprecate groupby() squeeze option Mar 3, 2020

phofl mentioned this issue Apr 1, 2020

32380 deprecate squeeze in groupby #33218

Merged

jreback modified the milestones: Contributions Welcome, 1.1 May 22, 2020

jreback closed this as completed in #33218 May 22, 2020

phofl mentioned this issue Oct 14, 2022

DEP: Enforce deprecation of squeeze argument in groupby #49082

Merged
