BUG: index name lost when indexing with another index #9943

Closed
sergeny opened this issue Apr 20, 2015 · 8 comments
Labels: Bug, good first issue, Indexing (related to indexing on series/frames, not to indexes themselves), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

sergeny commented Apr 20, 2015

Very subtle: the index name is kept when indexing with .ix[list], but lost when indexing with .ix[Int64Index].

```python
import pandas as pd
from pandas.util.testing import assert_frame_equal
import numpy as np

assert pd.__version__ == '0.16.0'
df = pd.DataFrame([np.nan, np.nan], columns=['tags'],
                  index=pd.Int64Index([4815961, 4815962], dtype='int64', name='id'))

assert str(df) == '         tags\nid           \n4815961   NaN\n4815962   NaN'
# OK.

L = [4815962]

assert list(L) == list(df.index.intersection(L))
# Succeeds: the two differ only in type (list vs. Int64Index).

print(df.ix[L].tags.index.name)
# id
print(df.ix[df.index.intersection(L)].tags.index.name)
# None  <- the index name is lost

assert df.ix[L].tags.index.name == df.ix[df.index.intersection(L)].tags.index.name
# AssertionError, but this should succeed.

assert_frame_equal(df.ix[L], df.ix[df.index.intersection(L)])
# AssertionError, but this should succeed:
# AssertionError: attr is not equal [names]: FrozenList([u'id']) != FrozenList([None])
```
jreback (Contributor) commented Apr 20, 2015

There are 2 things going on here:

@shoyer

jreback added this to the Next Major Release milestone Apr 20, 2015
jreback added the Bug, Indexing, Reshaping, and Difficulty Novice labels Apr 20, 2015
shoyer (Member) commented Apr 20, 2015

I agree, indexing should never change index names.

hsperr added a commit to hsperr/pandas that referenced this issue Apr 22, 2015
jreback modified the milestones: 0.16.1, Next Major Release Apr 22, 2015
jreback modified the milestones: 0.17.0, 0.16.1 Apr 28, 2015
jreback modified the milestones: Next Major Release, 0.17.0 Aug 15, 2015
Dr-Irv (Contributor) commented Feb 22, 2018

@jreback When this was opened, you wrote:

.intersection (and prob .union) are zonking the name if its None.

I'd like to confirm whether .union and .intersection are meant to handle names differently. That is: for .union, keep the name if both names are the same or if only one of them is set, and set it to None if the names differ; for .intersection, keep the name only if both names are exactly the same.

If that is the case, then the originally reported behavior is expected, because the intersection is taken between a named Index and a list, which has no name.
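The two candidate rules can be written down as plain functions over name values. This is a minimal sketch, not pandas code: `union_name` and `intersection_name` are hypothetical helpers, and None stands for "no name".

```python
from collections.abc import Hashable
from typing import Optional

# None stands for "no name", as with pandas Index.name.
Name = Optional[Hashable]


def union_name(left: Name, right: Name) -> Name:
    """Hypothetical helper for the proposed .union rule: keep a shared
    name, or the only name that is set; drop two different names."""
    if left == right:
        return left
    if left is None:
        return right
    if right is None:
        return left
    return None  # two different, non-None names


def intersection_name(left: Name, right: Name) -> Name:
    """Hypothetical helper for the proposed .intersection rule: keep the
    name only when both sides carry exactly the same name."""
    return left if left == right else None


# The original report: an Index named 'id' intersected with a plain list (no name).
print(intersection_name("id", None))  # None, so the name is dropped
print(union_name("id", None))         # id
```

Under the intersection rule, a list (which carries no name) always produces a mismatch, which is why the name disappears in the original report.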

Dr-Irv (Contributor) commented Feb 23, 2018

Following up on my question to @jreback above, I think .union and .intersection should handle names the same way, using the intersection pattern: if the names are the same, use that name; otherwise, return None. The reason is that, under the union pattern, a chained union can give a different name depending on the order of the operations. For example, suppose you have three indexes:

```python
import pandas as pd

i1 = pd.Index([1, 2], name='i1')
i2 = pd.Index([3, 4], name='i2')
i3 = pd.Index([5, 6], name='i3')
```

If you then compute i1.union(i2.union(i3)), the resulting name is None under the "intersection" behavior, but "i1" under the "union" behavior. Changing the order to (i1.union(i2)).union(i3) gives the name "i3" under the "union" behavior.
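The order dependence can be reproduced by applying each rule to the names alone, without pandas objects. This is an illustrative sketch; `union_rule` and `match_rule` are hypothetical helpers standing in for the "union" and "intersection" behaviors described above.

```python
from functools import reduce
from typing import Optional

Name = Optional[str]


def union_rule(left: Name, right: Name) -> Name:
    # "union" behavior: keep a shared name, or the only one that is set.
    if left is None or right is None:
        return left if right is None else right
    return left if left == right else None


def match_rule(left: Name, right: Name) -> Name:
    # "intersection" behavior: keep the name only if both names are equal.
    return left if left == right else None


names = ["i1", "i2", "i3"]

for rule in (union_rule, match_rule):
    # i1.union(i2.union(i3)): nest from the right.
    right_nested = rule(names[0], rule(names[1], names[2]))
    # (i1.union(i2)).union(i3): fold from the left.
    left_nested = reduce(rule, names)
    print(rule.__name__, right_nested, left_nested)

# Output:
# union_rule i1 i3
# match_rule None None
```

Only the matching rule gives the same name for both groupings.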

In fact, pandas 0.22.0 has a bug in the following case, which occurs only when one of the indexes in a union is empty, or when taking the union or intersection of two indexes with the same values but different names:

```python
import pandas as pd

j1 = pd.Index([1, 2], name='j1')
j2 = pd.Index([], name='j2')
j3 = pd.Index([], name='j3')
```

In this case, j1.union(j2).union(j3) returns Int64Index([1, 2], dtype='int64', name='j3'), while just changing the order to j3.union(j1).union(j2) returns Int64Index([1, 2], dtype='int64', name='j2').

I hope to have this straightened out once I get things right in pull request #19849.

jreback (Contributor) commented Feb 23, 2018

We have very specific name-matching behavior for Index operations: if the names match, you get the name back; if they differ, you get None (this includes the case where one name is None and the other has a value).

So I'm not sure this should be any different.
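The matching rule can be sketched as a single function over any number of names. This is an illustrative model, not pandas' internal implementation; `consensus_name` is a hypothetical helper. Because it only asks whether all names are equal, its result cannot depend on the order of a chained operation, which is the property the j1/j2/j3 example above was missing.

```python
from collections.abc import Hashable, Sequence
from itertools import permutations
from typing import Optional


def consensus_name(names: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Hypothetical helper: return the shared name if every name is equal,
    else None. Assumes at least one name is given.

    A None mixed with a real name counts as a mismatch, so ("id", None) -> None.
    """
    first = names[0]
    return first if all(name == first for name in names) else None


# Every ordering of the three names gives the same answer.
results = {consensus_name(order) for order in permutations(["j1", "j2", "j3"])}
print(results)                       # {None}
print(consensus_name(["id", None]))  # None
print(consensus_name(["id", "id"]))  # id
```

Treating a None paired with a real name as a mismatch keeps the rule symmetric, so chained unions and intersections agree on the result name regardless of order.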

Dr-Irv (Contributor) commented Feb 23, 2018

@jreback I agree, and the latest version of my pull request #19849 addresses this and has the behavior you describe. Some boundary cases were not handled that way (see the j1, j2, j3 example above).

gfyoung (Member) commented Nov 6, 2018

#19849 is now merged. @jreback, how does this affect the status of this issue?

mroeschke (Member) commented

It appears that #19849 was supposed to close this issue, but if that's not the case, I'm happy to reopen it.


8 participants