BUG: Fix KeyError in merge on CategoricalIndex #20777

Merged
merged 5 commits into from May 3, 2018

2 participants
@fjetter
Contributor

fjetter commented Apr 21, 2018

For indices of categorical type, a KeyError is raised when the index level name is used as the key in a merge.

Example:

import pandas as pd

left = pd.DataFrame(
    {"left_data": [1, 2]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col'),
)
right = pd.DataFrame(
    {"right_data": [1.0, 2.0]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col'),
)
result = left.merge(right, on=['index_col'])

With this fix, however, the behavior of the test test_merge_datetime_index(self, klass) changed. IMHO, the behavior in this PR is more consistent, since it is the same for all input types, but I'm not sure what the intended behavior is, and I couldn't find a section in the documentation explaining this code path.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@jreback

Contributor

jreback commented Apr 21, 2018

Can you show a short example of what you are trying to do?

@fjetter

Contributor

fjetter commented Apr 21, 2018

The following code raises a KeyError if the index is of categorical type, but works for all other index types:

import pandas as pd

left = pd.DataFrame(
    {"left_data": [1, 2]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col'),
)
right = pd.DataFrame(
    {"right_data": [1.0, 2.0]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col'),
)
result = left.merge(right, on=['index_col'])

I updated the title and the description
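The contrast described above can be checked with a small script. This is only a sketch: the helper name `try_merge` is hypothetical, and the outcome depends on the installed pandas version (on a version that includes this fix, both cases are expected to succeed; before the fix, the CategoricalIndex case is the one reported to raise).

```python
import pandas as pd


def try_merge(index):
    """Merge two frames sharing `index` on the index level name.

    Hypothetical helper for illustration: returns "ok" if the merge
    succeeds, otherwise a short description of the KeyError.
    """
    left = pd.DataFrame({"left_data": [1, 2]}, index=index)
    right = pd.DataFrame({"right_data": [1.0, 2.0]}, index=index)
    try:
        left.merge(right, on=["index_col"])
    except KeyError as exc:
        return "KeyError: {}".format(exc)
    return "ok"


# Same labels, once as a plain Index and once as a CategoricalIndex.
outcomes = {
    "Index": try_merge(pd.Index(["A", "B"], name="index_col")),
    "CategoricalIndex": try_merge(
        pd.CategoricalIndex(["A", "B"], categories=["A", "B"], name="index_col")
    ),
}
print(outcomes)
```

If the two entries differ, the installed version still shows the behavior reported in this PR.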

@fjetter fjetter changed the title from BUG: Fix group key inference for CategoricalIndex to BUG: Fix KeyError in merge on CategoricalIndex Apr 21, 2018

@jreback

If you can fix up the formatting, I will have another look. Please add a whatsnew note as well (under reshaping bug fixes).

@pytest.mark.parametrize('index',
[

@jreback

jreback Apr 21, 2018

Contributor

Write it like

@pytest.mark.parametrize(
    'index',
    [

so that it is closer to the left margin; each of the indexes can then be written more simply. Also, don't use the data kwarg, e.g.
`Index(['A', 'B'], name='index_col')`
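For illustration, a test written in the suggested style might look roughly like the sketch below. It is not the test that was merged in this PR: the list of index types and the assertions are assumptions, chosen only to show the layout (decorator arguments close to the left margin, one-line index constructors, no data kwarg).

```python
import pytest
from pandas import CategoricalIndex, DataFrame, DatetimeIndex, Index, RangeIndex


@pytest.mark.parametrize(
    "index",
    [
        # Illustrative selection of index types (an assumption, not the PR's list).
        Index(["A", "B"], name="index_col"),
        CategoricalIndex(["A", "B"], categories=["A", "B"], name="index_col"),
        RangeIndex(start=0, stop=2, name="index_col"),
        DatetimeIndex(["2018-01-01", "2018-01-02"], name="index_col"),
    ],
)
def test_merge_index_types(index):
    # Both frames share the same named index; merge on the index level name.
    left = DataFrame({"left_data": [1, 2]}, index=index)
    right = DataFrame({"right_data": [1.0, 2.0]}, index=index)

    result = left.merge(right, on=["index_col"])

    # Only the data columns are checked here; the exact shape of the
    # resulting index is deliberately left out of this sketch.
    assert result["left_data"].tolist() == [1, 2]
    assert result["right_data"].tolist() == [1.0, 2.0]
```

Running it with pytest exercises the same merge once per index type, so a regression for any single type shows up as its own failing case.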

name='index_col'),
])
def test_merge_index_types(index):
    left = DataFrame(

@jreback

jreback Apr 21, 2018

Contributor

should be a 1-liner

@codecov

codecov bot commented Apr 22, 2018

Codecov Report

Merging #20777 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20777      +/-   ##
==========================================
+ Coverage   91.81%   91.85%   +0.03%     
==========================================
  Files         153      153              
  Lines       49471    49310     -161     
==========================================
- Hits        45422    45292     -130     
+ Misses       4049     4018      -31

Flag         Coverage Δ
#multiple    90.24% <100%> (+0.03%) ⬆️
#single      41.89% <50%> (+0.04%) ⬆️

Impacted Files                         Coverage Δ
pandas/core/reshape/merge.py           94.25% <ø> (ø) ⬆️
pandas/core/algorithms.py              94.38% <100%> (-0.11%) ⬇️
pandas/core/indexing.py                93.08% <0%> (-0.48%) ⬇️
pandas/core/strings.py                 98.32% <0%> (-0.31%) ⬇️
pandas/core/dtypes/base.py             91.89% <0%> (-0.22%) ⬇️
pandas/core/dtypes/cast.py             87.85% <0%> (-0.21%) ⬇️
pandas/core/indexes/api.py             98.78% <0%> (-0.15%) ⬇️
pandas/core/series.py                  93.9% <0%> (-0.13%) ⬇️
pandas/core/dtypes/missing.py          92.85% <0%> (-0.09%) ⬇️
pandas/core/indexes/datetimelike.py    96.72% <0%> (-0.08%) ⬇️
... and 31 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d3d3352...0664858.

@fjetter

Contributor

fjetter commented May 3, 2018

@jreback I fixed the formatting and added a changelog entry for the bug I fixed. Since I wasn't sure whether the other change in behavior is intended, I left it out for now. I can add another entry once this is settled.

@jreback jreback added this to the 0.23.0 milestone May 3, 2018

@jreback

jreback approved these changes May 3, 2018

@jreback jreback merged commit 21f5fb1 into pandas-dev:master May 3, 2018

0 of 3 checks passed

ci/circleci: Your tests are queued behind your running builds
continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr: The Travis CI build is in progress

@jreback

Contributor

jreback commented May 3, 2018

thanks @fjetter nice patch!
