ENH: set_index for Series #21684

Open
h-vetinari opened this issue Jun 29, 2018 · 3 comments

Comments

@h-vetinari
Contributor

h-vetinari commented Jun 29, 2018

I was surprised to find that DataFrame.set_index has no analogue for Series. This is especially surprising because I remember reading in several comments here that it is strongly preferred (i.a. by @jreback) that users do not set attributes like .name or .index directly -- but currently, the only options for changing the index of a Series are either that, or reconstructing with pd.Series(s.values, index=desired_index).
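For readers who want to see the two workarounds side by side, here is a minimal sketch. The sample data and the names `s`, `new_labels`, `s1`, `s2` and `s3` are made up for illustration. Note that when reconstructing, the raw values have to be passed: handing the Series itself to the constructor aligns on the existing labels instead of replacing them.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
new_labels = ['x', 'y', 'z']  # illustrative labels, not from the issue

# Workaround 1: assign the attribute directly (mutates the object it is set on).
s1 = s.copy()
s1.index = new_labels

# Workaround 2: rebuild the Series from its raw values and the new labels.
s2 = pd.Series(s.to_numpy(), index=new_labels, name=s.name)

# Pitfall: passing the Series itself aligns on the old labels ('a', 'b', 'c'),
# so none of the new labels find a match and every value becomes NaN.
s3 = pd.Series(s, index=new_labels)
```

Both workarounds give the same result, but neither chains the way a `set_index` method would.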

This is relevant in many scenarios, but in case someone would like a more concrete example -- I'm currently working on having .duplicated be able to return an inverse for DataFrame / Series / Index, see #21645. This inverse needs to link two different indexes -- the one of the original object, and the index of the deduplicated one. To reconstruct from the unique values, one needs exactly such a .set_index operation, because .reindex in itself cannot read a set of indexes and assign them to a different set of indexes in one go (xref #21685).

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
isdup, inv = s.duplicated(keep='last', return_inverse=True)
isdup
# 0     True
# 1     True
# 2     True
# 3    False
# 4    False
# 5    False
# dtype: bool

inv
# 0    4
# 1    5
# 2    4
# 3    3
# 4    4
# 5    5
# dtype: int64

unique = s.loc[~isdup]
unique
# 3    c
# 4    a
# 5    b
# dtype: object

reconstruct = unique.reindex(inv)
reconstruct 
# 4    a
# 5    b
# 4    a
# 3    c
# 4    a
# 5    b
# dtype: object

This object obviously still has the wrong index to be equal to the original. For DataFrames, the reconstruction would work as unique.reindex(inv.values).set_index(inv.index), and consequently, this should be available for Series as well:

Desired:

reconstruct = unique.reindex(inv.values).set_index(inv.index)
reconstruct
# 0    a
# 1    b
# 2    a
# 3    c
# 4    a
# 5    b
# dtype: object
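
As a rough sketch of how the same round trip can be written without `Series.set_index`, the snippet below uses `set_axis` for the relabelling step. It assumes pandas 1.0 or later, where `set_axis` returns a new object. Because `return_inverse` is only proposed in #21645, the inverse is built by hand here; `last_label` is a hypothetical helper name.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

isdup = s.duplicated(keep='last')
unique = s.loc[~isdup]

# Hand-built stand-in for the proposed inverse: map each value to the
# label of its last occurrence, then look that label up for every row.
last_label = pd.Series(unique.index, index=unique.values)
inv = s.map(last_label)

# Select the unique rows in the original order, then relabel with the
# original index. set_axis plays the role of the requested set_index.
reconstruct = unique.reindex(inv.to_numpy()).set_axis(inv.index)
```

Under these assumptions, `reconstruct.equals(s)` holds.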
gfyoung added the Enhancement and Indexing labels Jul 1, 2018
@gfyoung
Member

gfyoung commented Jul 1, 2018

@h-vetinari: IINM, the implementation of DataFrame.set_index could be adapted to Series pretty easily. Perhaps the two could even share an implementation in core/generic.py.

@ghost

ghost commented Jul 22, 2019

The PR discussion for this turned quite ugly. For those finding this in the future, here's a summary:

  1. DataFrame.set_index does support passing an array as the new index labels.
  2. That usage is not meant to be relied on, is only lightly documented, and core wants to deprecate and remove it.
  3. set_axis is available on both DataFrame and Series, and is dedicated to exactly this case.
  4. I've suggested in "ENH: Add Series.set_index" #27504 that Series.set_index be updated to have the same behavior, and that both be modified to raise a deprecation warning on use, asking users to use set_axis instead.

We'll see how the latter goes.
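
To make points 1 and 3 concrete, here is a small sketch comparing the two calls. The frame and labels are invented for illustration, and pandas 1.0 or later is assumed so that `set_axis` returns a new object. The labels are wrapped in `pd.Index` because a plain list passed to `set_index` is read as a list of column names.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30]})
labels = pd.Index(['a', 'b', 'c'])  # illustrative labels

# Point 1: set_index accepts an array-like of new labels, not just column names.
by_set_index = df.set_index(labels)

# Point 3: set_axis does the same relabelling, on DataFrame and on Series.
by_set_axis = df.set_axis(labels, axis=0)
series_relabelled = df['x'].set_axis(labels)
```

Under these assumptions, both DataFrame results carry the same index and the same data.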

ghost mentioned this issue Jul 22, 2019
@giuliobeseghi

It seems that set_axis is a bit of a mess:

  1. DataFrame and Series:
    a. The default for inplace is True; it should be False.
    b. It should probably follow the convention of other methods (e.g. rename_axis) and take index/columns arguments.

  2. Series:
    a. The docs suggest that a value for axis can be passed, and that it can be different from zero. That doesn't make sense: a Series has only one axis, the index. But maybe the docs are just out of date?
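
For comparison, a short sketch of the two calling conventions mentioned above, assuming pandas 1.0 or later (where, as far as I can tell, `set_axis` no longer modifies the object in place by default). The data and the axis name `'key'` are made up.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# set_axis takes the labels positionally; axis defaults to 0 for a Series.
relabelled = s.set_axis(['x', 'y', 'z'])

# rename_axis uses the index= keyword convention referred to in 1b.
renamed = s.rename_axis(index='key')
```

In both cases the original `s` keeps its labels and its unnamed index; only the returned objects differ.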

jbrockmendel removed the Indexing label Feb 18, 2020