Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: set should not be considered list_like #23061

Closed
h-vetinari opened this issue Oct 9, 2018 · 7 comments

Comments

Projects
None yet
4 participants
@h-vetinari
Copy link
Contributor

commented Oct 9, 2018

The name of the function is_list_like from pandas.core.dtypes.common suggests that whatever gets True returned will be "like a list". As such, it's used in a handful of places - e.g. I got asked to use it in #20347, and now again in #22486. What I found out in the latter one is that it's true for sets:

>>> from pandas.core.dtypes.common import is_list_like
>>> is_list_like({1, 2, 3})
True

This has some uncomfortable consequences - for str.cat it's a bug (#23009), and for df.set_index it would be too.

@jreback asked me to try out removing set from is_list_like (#22486 (comment)), so this issue is following up on that.

@h-vetinari why don't you try (separate PR) excluding set from is_list_like and see what the implications of that are.

There are some rare cases like .isin, where sets should also be included. I'd say to have a looser definition is_set_like that's basically is_list_like or set. Alternatively, one could think of is_list_like(strict=False) to include sets.

Another question is if this needs a deprecation cycle and how?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2018

FWIW, changing these is_* in the past have led to us breaking user-code. I don't think it's possible to have a high-level is_list_like that does the right thing in for every use case.

@jschendel

This comment has been minimized.

Copy link
Member

commented Oct 9, 2018

I imagine this issue extends beyond just set, e.g. dict is considered list-like too:

In [2]: d = dict(zip('xyz', [10, 20, 30]))

In [3]: d
Out[3]: {'x': 10, 'y': 20, 'z': 30}

In [4]: is_list_like(d)
Out[4]: True

In [5]: s = pd.Series(list('abc'))

In [6]: s.str.cat(d)
Out[6]:
0    ax
1    by
2    cz
dtype: object

Quickly reading over the associated issues, it looks like what you want is to basically place an additional restriction on is_list_like so unordered objects return False? Given that is_list_like relies on collections.abc.Iterable, it seems like such a change would cut out a significant number of things that were previously considered list-like, and to @TomAugspurger's point, would lead to breaking user code. I wouldn't be opposed to adding a function along the lines of is_ordered_list_like though.

@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Oct 9, 2018

@jschendel

dict is ordered from 3.6 onwards, but in any case, it could be included in the path I've now chosen to pursue for a first PR:

  • add a strict-kwarg to is_list_like, which defaults to False (raising FutureWarning if not passed), in which case the behaviour is unchanged. For strict=True, sets are excluded.
  • this has the advantage that it can't break things.

@h-vetinari h-vetinari referenced this issue Oct 9, 2018

Merged

Add allow_sets-kwarg to is_list_like #23065

4 of 4 tasks complete
@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Oct 9, 2018

Added a first draft of a PR in #23065. Happy to add dict (with PY35) to the list of excluded types for strict=True.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2018

I don't think that's a good idea. In the past changing these have led to surprising breaks in user code. These checks are used all over pandas, so it's hard to predict the ramifications of changing one.

If a user has a set-like object, then what would they do when they see that warning cropping up from their object?

I think if the caller needs to know that the collection is ordered, then they should assert that independently.

@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Oct 9, 2018

@TomAugspurger

  • It's not a breaking change
  • I've hunted down all the places where it's used internally
  • No tests break

If a user has a set-like object, then what would they do when they see that warning cropping up from their object?

They're not gonna see it from pandas, unless they from pandas.core.dtypes.common import is_list_like. And then, the warning tells them that sets will not be considered list-like in the future unless strict=False is specified. Can't see how this is ambiguous...?

@h-vetinari h-vetinari referenced this issue Oct 9, 2018

Merged

API: better error-handling for df.set_index #22486

4 of 4 tasks complete

@jreback jreback added this to the 0.24.0 milestone Oct 17, 2018

@h-vetinari

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

@jreback @gfyoung @WillAyd this issue has been closed by #23065.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.