Single-level tuple index to multi-level index on pd.concat #24783

Open
harisbal opened this issue Jan 15, 2019 · 4 comments
Labels
Bug, Nested Data (data where the values are collections: lists, sets, dicts, objects, etc.), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@harisbal
Contributor

harisbal commented Jan 15, 2019

import pandas as pd
import numpy as np

s1 = pd.Series(np.random.randn(2), index=[('a', 'b'), ('x', 'y', 'z')])
s1.index.name = 'Idx'
s1.name = 's1'

s2 = pd.Series(np.random.randn(2), index=[('a', 'b'), ('j', 'k', 'l')])
s2.index.name = 'Idx'
s2.name = 's2'

# Result: the tuple labels come back split into a MultiIndex
pd.concat([s1, s2], axis=1, sort=False)

Problem description

Concatenating two Series whose indexes consist of tuples produces a DataFrame with a MultiIndex, rather than one with a single-level index of tuples.
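The `tupleize_cols` argument of the `pd.Index` constructor shows how the same list of tuples can be read either as single-level labels or as a MultiIndex. This is an illustrative sketch only: it assumes, without confirmation in this thread, that the index union inside `pd.concat` rebuilds the result from a plain list of tuples with the default `tupleize_cols=True`. The values are fixed placeholders rather than the report's random data.

```python
import pandas as pd

labels = [('a', 'b'), ('x', 'y', 'z')]

# As in the report: a Series built from a list of tuples keeps a
# single-level index whose labels are the tuples themselves.
s = pd.Series([0.5, 1.5], index=labels)

# Building an Index from the same list with tupleize_cols=False also keeps
# the tuples as opaque labels (one level).
flat = pd.Index(labels, tupleize_cols=False)

# With the default tupleize_cols=True, a list made only of tuples becomes a
# MultiIndex; the shorter tuple is padded with a missing value to 3 levels.
nested = pd.Index(labels)

print(s.index.nlevels, flat.nlevels, nested.nlevels)
```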

Expected Output

s1.to_frame().join(s2)
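A minimal sketch of one way to get the expected single-level result, assuming the goal is simply to sidestep the index union that `pd.concat` performs: build the outer union of labels yourself as a flat `Index` (`tupleize_cols=False`) and pass it explicitly to the `DataFrame` constructor, which reindexes each Series onto it. The names `labels`, `idx` and `result` and the fixed values are illustrative, not from the report.

```python
import pandas as pd

# Fixed values stand in for the np.random.randn data in the report.
s1 = pd.Series([0.5, 1.5], index=[('a', 'b'), ('x', 'y', 'z')], name='s1')
s2 = pd.Series([2.5, 3.5], index=[('a', 'b'), ('j', 'k', 'l')], name='s2')

# Outer union of the labels in first-seen order, built as a flat Index so the
# tuples are not split into levels.
labels = list(dict.fromkeys(list(s1.index) + list(s2.index)))
idx = pd.Index(labels, name='Idx', tupleize_cols=False)

# With an explicit index, each Series is reindexed onto idx; no index union
# is computed, so the tuple labels stay intact.
result = pd.DataFrame({'s1': s1, 's2': s2}, index=idx)
print(result)
```

Rows come out in first-seen label order, which is closer in spirit to `sort=False` than to the sorted output of `join`.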

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+1028.gdb2066b7d
pytest: 3.8.1
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: 0.9.2
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@simonjayhawkins
Member

I guess the expected output should be

s1.to_frame().join(s2, how='outer')

which gives

                 s1        s2
Idx
(a, b)    -0.635822  0.036792
(j, k, l)       NaN  2.380095
(x, y, z)  1.149677       NaN

So pd.concat([s1, s2], axis=1, sort=False) also appears to be losing values: the (a, b) row is NaN in both columns.

               s1        s2
a b NaN       NaN       NaN
x y z    1.149677       NaN
j k l         NaN  2.380095
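One plausible reading of the lost (a, b) values, assuming the concat result's labels are the padded tuples shown in the output above: once ('a', 'b') is padded to ('a', 'b', nan), it no longer equals the original label, so aligning s1 on the padded labels leaves that row empty. The sketch uses fixed placeholder values, and plain `reindex` is only a stand-in for whatever alignment `pd.concat` does internally.

```python
import pandas as pd

s1 = pd.Series([0.5, 1.5], index=[('a', 'b'), ('x', 'y', 'z')], name='s1')

# Tupleizing the labels pads ('a', 'b') to ('a', 'b', nan), like the
# "a b NaN" row in the concat output.
padded_labels = list(pd.Index(list(s1.index)))

# Align s1 on the padded labels, kept as flat tuple labels:
# ('x', 'y', 'z') still matches, but ('a', 'b', nan) != ('a', 'b'),
# so that row comes back as NaN.
target = pd.Index(padded_labels, tupleize_cols=False)
realigned = s1.reindex(target)
print(realigned)
```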

@simonjayhawkins
Member

@WillAyd #24687 (comment). Is this issue a bug?

@simonjayhawkins
Member

xref #24688

@summonholmes

@WillAyd #24687 (comment). is this issue a bug?

Just closed it. This issue was more of an esoteric demonstration, where too many nested tuples result in some breakages (perhaps due to underlying bugs). There is no support for what I was doing, but how was I supposed to know? I could get away with it without an error being raised until lib.clean_index_list() started returning strangely shaped ndarrays.

mroeschke added the Bug and Reshaping labels May 27, 2019
jbrockmendel added the Nested Data label Sep 22, 2020

5 participants