Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Allow non-tuples in pandas.merge #34810

Merged
merged 8 commits into from Jun 30, 2020

Conversation

TomAugspurger
Copy link
Contributor

Closes #34741, while
retaining the spirit of the spirit of
#34208.

@@ -746,6 +745,7 @@ Deprecations
- :meth:`DataFrame.to_dict` has deprecated accepting short names for ``orient`` in future versions (:issue:`32515`)
- :meth:`Categorical.to_dense` is deprecated and will be removed in a future version, use ``np.asarray(cat)`` instead (:issue:`32639`)
- The ``fastpath`` keyword in the ``SingleBlockManager`` constructor is deprecated and will be removed in a future version (:issue:`33092`)
- Providing ``suffixes`` as a ``set`` in :func:`pandas.merge` is deprecated. Provide a tuple instead (:issue:`33740`, :issue:`34741`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to do this (I guess @jorisvandenbossche really wants this), then let's update the doc-string of pd.merge to be correct (needs list-like) and the docs in reshape.rst as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pd.merge docs already state that we want a sequence.

    suffixes : Sequence, default is ("_x", "_y")
        A length-2 sequence where

Updated the DataFrame.merge docs to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we don't use sequence elsewhere do we, don't we use list-like everywhere?

raise TypeError(
f"suffixes should be tuple of (str, str). But got {type(suffixes).__name__}"
if isinstance(suffixes, set):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just raise? also what about a dict or non-list-like for example?

do we check is_list_like anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need a warning according to our backwards compatibility poilcy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to do it with a warning, so I would also just do that.

@TomAugspurger if we don't want to have it set-specific, could also do if not is_list_like(suffixes, allow_sets=False), although in practice that might give the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that should use what @jorisvandenbossche suggests here , e.g. dict would pass this test but is not acceptable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dicts are ordered though, so they shouldn't be a problem. I thought the original issue was non-deterministic outputs because sets aren't ordered?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually sets are not ordered, but I think we need an actual sequence here (sure .keys() satisfies), but i'd rather be explicit. we do not treat dicts as list-like anywhere else.

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jun 15, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that passes the suffixes as a list? (to ensure we don't accidentally remove this again)

@@ -746,6 +745,7 @@ Deprecations
- :meth:`DataFrame.to_dict` has deprecated accepting short names for ``orient`` in future versions (:issue:`32515`)
- :meth:`Categorical.to_dense` is deprecated and will be removed in a future version, use ``np.asarray(cat)`` instead (:issue:`32639`)
- The ``fastpath`` keyword in the ``SingleBlockManager`` constructor is deprecated and will be removed in a future version (:issue:`33092`)
- Providing ``suffixes`` as a ``set`` in :func:`pandas.merge` is deprecated. Provide a tuple instead (:issue:`33740`, :issue:`34741`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we don't use sequence elsewhere do we, don't we use list-like everywhere?

raise TypeError(
f"suffixes should be tuple of (str, str). But got {type(suffixes).__name__}"
if isinstance(suffixes, set):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that should use what @jorisvandenbossche suggests here , e.g. dict would pass this test but is not acceptable

def test_merge_suffix_type_error(col1, col2, suffixes):
a = pd.DataFrame({col1: [1, 2, 3]})
b = pd.DataFrame({col2: [3, 4, 5]})
def test_merge_suffix_set():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this non_list_like parmaterize on dict as well

@jreback jreback added this to the 1.1 milestone Jun 30, 2020
@jreback jreback merged commit eb48e10 into pandas-dev:master Jun 30, 2020
@jreback
Copy link
Contributor

jreback commented Jun 30, 2020

thanks @TomAugspurger

with pytest.raises(TypeError, match=msg):
pd.merge(a, b, left_index=True, right_index=True, suffixes=suffixes)
with tm.assert_produces_warning(FutureWarning):
pd.merge(a, b, left_index=True, right_index=True, suffixes={"left", "right"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not actually passing the parameterized suffixes here, so I think that the dict case is not tested, and will also not raise a warning (as dicts are considered list-like)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: merge no longer accepts a list for suffixes keyword
3 participants