Index.intersection bug? #8362

Closed
tom-alcorn opened this issue Sep 22, 2014 · 6 comments
Labels
Bug, Indexing (Related to indexing on series/frames, not to indexes themselves)
Milestone
0.15.0

Comments

@tom-alcorn
Contributor

I ran into an odd behaviour when taking the intersection of two Indexes. Specifically,

import pandas as pd
left = pd.Index(['A','B','A','C'])
right = pd.Index(['B','D'])
left.intersection(right)

returns

Index(['B', 'C'], dtype='object')

However, I would expect this to return

Index(['B'], dtype='object')

If Index(['B', 'C'], dtype='object') is the intended behaviour, can someone explain the rationale behind it?

@jreback
Contributor

jreback commented Sep 22, 2014

can you show your pandas version?

@tom-alcorn
Contributor Author

>>> pd.version.version
'0.14.1'

@jreback
Contributor

jreback commented Sep 22, 2014

yep, this is an untested case of non-monotonic, non-unique indexers

which results in index.take([1,-1]) in this case (this incorrectly works: the -1 is a placeholder indexer that should be filtered out, but take treats it as the last position, so 'C' ends up in the result).
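A plain-Python sketch of that mechanism, assuming the duplicates branch gets every position of each label of `right` in `left`, with -1 for a missing label (the `[1, -1]` above). `non_unique_indexer` is a hypothetical stand-in, not a pandas function; Python list indexing is used because, like `take` without a fill value, it treats -1 as the last position.

```python
# Hypothetical stand-in for the indexer part of get_indexer_non_unique:
# every position of each target in `labels`, or a single -1 when the
# target is missing.
def non_unique_indexer(labels, targets):
    indexer = []
    for target in targets:
        positions = [i for i, label in enumerate(labels) if label == target]
        indexer.extend(positions if positions else [-1])
    return indexer


left = ['A', 'B', 'A', 'C']
right = ['B', 'D']

indexer = non_unique_indexer(left, right)
print(indexer)  # [1, -1]

# Buggy path: take the positions without filtering out -1.
# -1 wraps around to the last element, so 'C' leaks into the result.
print([left[i] for i in indexer])  # ['B', 'C']
```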

care to do a pull request? (last case in Index.intersection, core/index.py)

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Sep 22, 2014
@jreback jreback added this to the 0.15.0 milestone Sep 22, 2014
@tom-alcorn
Contributor Author

This looks like the problem:

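try:
    indexer = self.get_indexer(other.values)
    # unique case: drop the -1 entries that mark labels missing from self
    indexer = indexer.take((indexer != -1).nonzero()[0])
except:
    # duplicates: get_indexer raises on a non-unique index, so we land here
    # (the -1 entries are NOT filtered out in this branch)
    indexer = self.get_indexer_non_unique(other.values)[0].unique()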

Do you think this would make more sense?

try:
    indexer = self.get_indexer_non_unique(other.values)[0].unique()
    indexer = self.get_indexer(other.values)
    indexer = indexer.take((indexer != -1).nonzero()[0])
except:
    raise

@jreback
Contributor

jreback commented Sep 22, 2014

no, much simpler

before the take, do

indexer = indexer[indexer != -1]

-1 is the get_indexer placeholder for a missing label, so it needs to be taken out
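A minimal sketch of the suggested fix against the current pandas API, assuming `Index.get_indexer_non_unique` returns an `(indexer, missing)` pair with -1 for absent labels. `intersection_non_unique` is a hypothetical helper, not the code merged in #8374; it mirrors the thread by filtering -1 before the take and keeping unique positions.

```python
import numpy as np
import pandas as pd


def intersection_non_unique(left: pd.Index, right: pd.Index) -> pd.Index:
    """Hypothetical helper: intersection for a non-unique `left`."""
    # indexer holds positions in `left`, with -1 for labels of `right`
    # that are absent from `left`.
    indexer, _missing = left.get_indexer_non_unique(right)
    # The fix from the thread: drop the -1 placeholders before taking.
    indexer = indexer[indexer != -1]
    # De-duplicate positions; np.unique also sorts, so results follow left's order.
    indexer = np.unique(indexer)
    return left.take(indexer)


left = pd.Index(['A', 'B', 'A', 'C'])
right = pd.Index(['B', 'D'])
print(list(intersection_non_unique(left, right)))  # ['B']
```

Because positions rather than labels are de-duplicated, a label repeated in `left` (e.g. 'A' when `right` contains 'A') is kept once per occurrence.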

tom-alcorn added 3 commits to tom-alcorn/pandas that referenced this issue Sep 23, 2014
@jreback
Contributor

jreback commented Sep 29, 2014

closed by #8374

@jreback jreback closed this as completed Sep 29, 2014
2 participants