BUG: multi-index joining returns wrong multiindex #16182

Closed
OXPHOS opened this issue May 1, 2017 · 1 comment

Comments

@OXPHOS
Contributor

OXPHOS commented May 1, 2017

Code Sample, a copy-pastable example if possible

modified from TestMergeMulti.test_join_multi_levels

import pandas as pd

household = (
    pd.DataFrame(
        dict(A=[1, 2, 3],
             B=[0, 1, 0],
             C=[19.3, 31.7, 29]),
        columns=['A', 'B', 'C'])
        .set_index('A'))
portfolio = (
    pd.DataFrame(
        dict(A=[1, 2, 2, 3, 3, 3, 4],
             d=["nl0", "nl3", "gb0",
                "gb0", "lu4", "nl5", 'EMPTY'],
             e=["ABN", "Robeco", "Royal", "Royal",
                "AAB", "Postbank", 'EMPTY'],
             f=[1.0, 0.4, 0.6, 0.15, 0.6, 0.25, 1.0]),
        columns=['A', 'd', 'e', 'f'])
        .set_index(['A', 'd']))
result = household.join(portfolio, how='inner')

print(household)
     B     C
  A         
  1  0  19.3
  2  1  31.7
  3  0  29.0

print(portfolio)
                  e     f
  A d                    
  1 nl0         ABN  1.00
  2 nl3      Robeco  0.40
    gb0       Royal  0.60
  3 gb0       Royal  0.15
    lu4         AAB  0.60
    nl5    Postbank  0.25
  4 EMPTY     EMPTY  1.00

print(result)
         B     C         e     f
  A d                           
  1 nl0  0  19.3       ABN  1.00
  2 nl3  1  31.7    Robeco  0.40
    gb0  1  31.7     Royal  0.60
  3 gb0  0  29.0     Royal  0.15
    lu4  0  29.0       AAB  0.60
    nl5  0  29.0  Postbank  0.25

print(result.index)
  MultiIndex(levels=[[1, 2, 3], [u'EMPTY', u'gb0', u'lu4', u'nl0', u'nl3', u'nl5']],
             labels=[[0, 1, 1, 2, 2, 2], [3, 4, 1, 1, 2, 5]],
             names=[u'A', u'd'])

Problem description

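The joined values look correct, but I think 'EMPTY' should be dropped from the MultiIndex. It only belongs to the A=4 row of portfolio, which has no match in household and is excluded by the inner join, yet it still appears in the second level of result.index.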
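A minimal sketch for checking this directly, reusing the frames from the report: compare the values stored in the 'd' level (result.index.levels[1]) with the values the rows actually use (result.index.get_level_values('d')). Given the levels and labels printed above, the difference on 0.19.2 would be ['EMPTY']; newer pandas releases may behave differently, so treat the printed difference as version-dependent.

```python
import pandas as pd

# Same frames as in the report above.
household = pd.DataFrame(
    {"A": [1, 2, 3], "B": [0, 1, 0], "C": [19.3, 31.7, 29]}
).set_index("A")
portfolio = pd.DataFrame(
    {"A": [1, 2, 2, 3, 3, 3, 4],
     "d": ["nl0", "nl3", "gb0", "gb0", "lu4", "nl5", "EMPTY"],
     "e": ["ABN", "Robeco", "Royal", "Royal", "AAB", "Postbank", "EMPTY"],
     "f": [1.0, 0.4, 0.6, 0.15, 0.6, 0.25, 1.0]}
).set_index(["A", "d"])
result = household.join(portfolio, how="inner")

# Values stored in the 'd' level (may include values no row refers to).
defined = set(result.index.levels[1])
# Values actually referenced by the rows of the joined frame.
used = set(result.index.get_level_values("d"))

# Level values that no remaining row uses.
print(sorted(defined - used))
```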

Expected Output

  MultiIndex(levels=[[1, 2, 3], [u'gb0', u'lu4', u'nl0', u'nl3', u'nl5']],
             labels=[[0, 1, 1, 2, 2, 2], [2, 3, 0, 0, 1, 4]],
             names=[u'A', u'd'])

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@TomAugspurger
Contributor

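See #2770. This is a detail of how MultiIndexes currently work: a MultiIndex can keep values in its levels that no row refers to any more, and operations such as joins and slicing do not prune them.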
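A small illustration of this behaviour, using a hand-built index rather than the join (the tuples are only an example): slicing a MultiIndex keeps its full level arrays, so a value can stay in levels after every row that used it is gone.

```python
import pandas as pd

# from_tuples builds each level from the values present in the tuples.
mi = pd.MultiIndex.from_tuples(
    [(1, "nl0"), (2, "nl3"), (4, "EMPTY")], names=["A", "d"]
)

# Slicing off the (4, "EMPTY") row keeps the original level arrays.
sub = mi[:2]
print(list(sub.levels[1]))              # level still contains 'EMPTY'
print(list(sub.get_level_values("d")))  # only the values the rows use
```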

You can call MultiIndex.remove_unused_levels on result.index afterwards to drop the unused level values: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.MultiIndex.remove_unused_levels.html?highlight=remove_unused#pandas.MultiIndex.remove_unused_levels
