Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: merge no longer accepts a list for suffixes keyword #34741

Closed
jorisvandenbossche opened this issue Jun 13, 2020 · 15 comments · Fixed by #34810
Closed

REGR: merge no longer accepts a list for suffixes keyword #34741

jorisvandenbossche opened this issue Jun 13, 2020 · 15 comments · Fixed by #34810
Labels
Blocker Blocking issue or pull request for an upcoming release
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 13, 2020

see #34208 (comment)

The linked PR changed the merge routines to now only accept tuples for the suffixes keyword, and no longer general sequence.

This is a breaking change (eg it breaks some GeoPandas functions), but actually also changing documented behaviour: the merge docstring indeed says "tuple", but eg merge_asof still says "tuple or list". And in our user guide, we were ourselves using a list in one example, and it explicitly says "tuple or list" (see https://pandas.pydata.org/pandas-docs/version/1.0/user_guide/merging.html#overlapping-value-columns)

In general, I am certainly fine with restricting the set of accepted types (eg to avoid the confusing issue with sets, the original report), but this is 1) breaking a documented behaviour and 2) a change that could also easily be done with a deprecation warning.
And given our versioning policy, if it is easy to do a change with a deprecation, then we should try to do that, IMO.

@jorisvandenbossche jorisvandenbossche added the Blocker Blocking issue or pull request for an upcoming release label Jun 13, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone Jun 13, 2020
@jreback
Copy link
Contributor

jreback commented Jun 13, 2020

why are you opening this issue? and how is this a regression?

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Jun 13, 2020

Maybe read my linked comment? EDIT: updated the top post to include that comment

@jreback
Copy link
Contributor

jreback commented Jun 13, 2020

i read your comment

the code was wrong and it’s now fixed
if it happened to work that’s not a reason for a deprecation warning; this is why we have release notes

@jorisvandenbossche
Copy link
Member Author

It was documented to work with a list of strings.

@jreback
Copy link
Contributor

jreback commented Jun 13, 2020

@jorisvandenbossche
Copy link
Member Author

Jeff, please read my original comment: I clearly said that the merge user guide indeed indicates that it should be a tuple (the one you linked to), but the merge_asof docstring actually mentions list as well (https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html), and the same for the user guide (https://pandas.pydata.org/pandas-docs/version/1.0/user_guide/merging.html#overlapping-value-columns)

@jreback
Copy link
Contributor

jreback commented Jun 13, 2020

and your point? deprecating for merge_asof is fine but no action for merge is warranted

just because we can deprecate doesn’t mean we should - this just leaves code around for way too long

so happy to have a patch to deprecate / fix merge_asof as an addendum (which was a doc error initially iirc)

@jorisvandenbossche
Copy link
Member Author

deprecating for merge_asof is fine but no action for merge is warranted

It's documented for merge as well, in the user guide.

just because we can deprecate doesn’t mean we should - this just leaves code around for way too long

I quote our versioning policy (https://pandas.pydata.org/docs/dev/development/policies.html):

Whenever possible, a deprecation path will be provided rather than an outright breaking change.

And doing it with a deprecation would be easily possible here.

@jreback
Copy link
Contributor

jreback commented Jun 13, 2020

it doesn’t matter if it’s easy
and since when does the user guide trump the doc string itself?

as i said i am sympathetic for merge_asof but merge is a good change

if nothing broke in our test suite then there is no reason to revert / change this

geopandas needs a patch here -

@TomAugspurger
Copy link
Contributor

#34208 looks like an API breaking change. We'll need to adjust for it here in pandas. Warning that we'll restrict the types in the future seems easiest (and personally I'd be fine with just checking for sets and accepting anything else, but don't have a strong opinion).

@jorisvandenbossche
Copy link
Member Author

personally I'd be fine with just checking for sets and accepting anything else

+1, I think certainly lists would be perfectly fine to keep supporting

@TomAugspurger
Copy link
Contributor

FWIW, this also broke Dask.

@jreback
Copy link
Contributor

jreback commented Jun 15, 2020

if u want to change to a deprecation warning ok i guess

but this should be a tuple never a list or anything else iterable

@jorisvandenbossche
Copy link
Member Author

but this should be a tuple never a list or anything else iterable

Why not? What is inherently wrong about using a list for this?

@TomAugspurger
Copy link
Contributor

I think we generally duck-type user arguments like this where there's no ambiguity (i.e. not indexing).

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jun 15, 2020
Closes pandas-dev#34741, while
retaining the spirit of the spirit of
pandas-dev#34208.
TomAugspurger added a commit to TomAugspurger/dask that referenced this issue Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocker Blocking issue or pull request for an upcoming release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants