MultiIndex level dropped when multiplying two series with a single entry #25891

Open
mellesies opened this issue Mar 27, 2019 · 3 comments
Labels
Bug MultiIndex Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@mellesies

When multiplying two Series with overlapping MultiIndices and a single entry, pandas drops levels from the MultiIndex. This doesn't happen when either Series contains multiple entries.

Using pandas 0.24.1 and NumPy 1.16.2.

This works fine

import pandas as pd

# Create Series s1, with a scope over {T, N} and set two values
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5
s1['T.1B', 'N.0'] = 0.5

# Create Series s2 with a scope over {N, M} and set a single value
index2 = pd.MultiIndex.from_product([[], []], names=['N', 'M'])
s2 = pd.Series(index=index2)
s2['N.0', 'M.0'] = 0.5

# When multiplying s1 and s2 pandas will align the Series using the index.
# Note that the index has 3 levels: {N, T, M}
print(s1 * s2)

# Prints:
# N    T     M  
# N.0  T.1A  M.0    0.25
#      T.1B  M.0    0.25
# dtype: float64

This doesn't return the expected index levels

# Create Series s1, with a scope over {T, N} and set a single value
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5

# Create Series s2 with a scope over {N, M} and set a single value
index2 = pd.MultiIndex.from_product([[], []], names=['N', 'M'])
s2 = pd.Series(index=index2)
s2['N.0', 'M.0'] = 0.5

# Multiply s1 and s2. Correctly yields 0.25. But where did index level 'M' go!?
print(s1 * s2)

# Prints:
# T     N  
# T.1A  N.0    0.25
# dtype: float64

Problem description

Index level 'M' is dropped from the result. It appears that the index of the first Series, s1, is carried over unchanged.

Expected Output

I'd expect the result to have an index that is essentially the union of the indices from s1 and s2.

# N    T     M  
# N.0  T.1A  M.0    0.25
# dtype: float64
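Until the alignment behaviour is settled, the expected result can be produced explicitly. The following is only a sketch of a workaround, not what pandas does internally: `multiply_on_shared_levels` is a hypothetical helper name, and it assumes all index levels are named and that neither Series has a level called `_a` or `_b`. It joins the two Series on the level names they share via `DataFrame.merge`:

```python
import pandas as pd


def multiply_on_shared_levels(a, b):
    """Multiply two Series, joining on the index level names they share.

    Hypothetical helper for illustration; assumes every level is named.
    """
    shared = [name for name in a.index.names if name in b.index.names]
    if not shared:
        raise ValueError("no shared index level names")

    # Turn the index levels into columns so an ordinary merge can be used.
    left = a.rename("_a").reset_index()
    right = b.rename("_b").reset_index()
    merged = left.merge(right, on=shared, how="outer")

    # Every column except the two value columns becomes an index level again.
    level_names = [c for c in merged.columns if c not in ("_a", "_b")]
    merged = merged.set_index(level_names)
    return merged["_a"] * merged["_b"]


s1 = pd.Series(
    [0.5],
    index=pd.MultiIndex.from_tuples([("T.1A", "N.0")], names=["T", "N"]),
)
s2 = pd.Series(
    [0.5],
    index=pd.MultiIndex.from_tuples([("N.0", "M.0")], names=["N", "M"]),
)

result = multiply_on_shared_levels(s1, s2)
print(result)
```

The result keeps all three levels even when both inputs have a single entry. The level order here is T, N, M rather than the N, T, M shown above; `result.reorder_levels(["N", "T", "M"])` would change that if the order matters.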

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.utf-8
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.6.2
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.3.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@WillAyd
Member

WillAyd commented Mar 28, 2019

@toobaz, thoughts here? I didn't realize we would infer alignment here at all, so I'm surprised that either of these examples works, though you would certainly have better insight.

@toobaz
Member

toobaz commented Mar 28, 2019

@WillAyd I'm surprised too. The operations shown should all return Series of NaNs.

@mellesies you say

I'd expect the result to have an index that is essentially the union of the indices from s1 and s2.

... but although this might look intuitive in the example proposed, I think it is very difficult to generalize cleanly. What if, for instance, two Series have MultiIndexes with the same level names but no overlapping values? The truth is that in current pandas we typically don't attribute much relevance to level names in alignment (as opposed to their position).

This said, in case you were "inspired" by any reference in the docs to such nested alignment, could you share a pointer?
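For reference, the case raised above (same level names, no overlapping values) can be checked directly. This is a small sketch with made-up labels, not part of the original report; it builds the indexes with `MultiIndex.from_tuples` rather than by enlargement:

```python
import pandas as pd

names = ["T", "N"]

# Two Series whose MultiIndexes share level names but have no tuple in common.
s1 = pd.Series(
    [0.5],
    index=pd.MultiIndex.from_tuples([("T.1A", "N.0")], names=names),
)
s2 = pd.Series(
    [0.5],
    index=pd.MultiIndex.from_tuples([("T.1B", "N.1")], names=names),
)

# Alignment takes the union of the index tuples; no tuple appears in both
# operands, so every entry of the product is NaN.
result = s1 * s2
print(result)
```

When the level names are identical, the indexes are aligned as whole tuples, so the result has one row per distinct tuple and contains only NaN.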

@mellesies
Author

This wasn't inspired by the docs. It "just worked", also on other levels (and I was very happy it did ;-). For example, where the indices don't align, NaNs are returned:

# Create Series s1, with a scope over {T, N} and set two values
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5
s1['T.1A', 'N.1'] = 0.5

# s1:
# T     N  
# T.1A  N.0    0.5
#       N.1    0.5
# dtype: float64

# Create Series s2 with a scope over {N, M} and set a single value. 
# Note that 'N.1' is missing in the index.
index2 = pd.MultiIndex.from_product([[], []], names=['N', 'M'])
s2 = pd.Series(index=index2)
s2['N.0', 'M.0'] = 0.5

# s2:
# N    M  
# N.0  M.0    0.5
# dtype: float64

# When multiplying s1 and s2 pandas will align the Series using the index.
# Note that the index has 3 levels: {N, T, M}
print(s1 * s2)

# N    T     M  
# N.0  T.1A  M.0    0.25
# N.1  T.1A  NaN     NaN

When the indices share no level names, pandas raises a ValueError stating that it cannot join without overlapping index names. I guess the expected/desired behaviour for non-overlapping indices depends on the use case, so I've implemented some code that computes the outer product for this situation (I'm working with probability tables/factors). But doesn't the error message suggest that the name-based alignment was implemented intentionally?

# Create Series s1, with a scope over {T, N}.
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5
s1['T.1A', 'N.1'] = 0.5

# Create Series s2 with a scope over {X, Y}.
index2 = pd.MultiIndex.from_product([[], []], names=['X', 'Y'])
s2 = pd.Series(index=index2)
s2['x0', 'y0'] = 0.5
s2['x1', 'y0'] = 0.5

# This won't work ...
try:
    s1 * s2
except ValueError as e:
    print(e)

# cannot join with no overlapping index names
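The outer-product code mentioned above is not shown in this thread. As a rough sketch of what such a helper could look like (the name `outer_product` and its approach are assumptions for illustration, not the author's implementation), one can combine `numpy.outer` with an index built from every pair of index tuples:

```python
import numpy as np
import pandas as pd


def outer_product(a, b):
    """Outer product of two MultiIndexed Series with disjoint level names.

    Hypothetical helper: the result has one row per (row of a, row of b) pair.
    """
    # np.outer gives a[i] * b[j]; ravel() flattens it row by row, which
    # matches the order of the nested loop used for the tuples below.
    values = np.outer(a.to_numpy(), b.to_numpy()).ravel()
    tuples = [ta + tb for ta in a.index for tb in b.index]
    names = list(a.index.names) + list(b.index.names)
    return pd.Series(values, index=pd.MultiIndex.from_tuples(tuples, names=names))


s1 = pd.Series(
    [0.5, 0.5],
    index=pd.MultiIndex.from_tuples(
        [("T.1A", "N.0"), ("T.1A", "N.1")], names=["T", "N"]
    ),
)
s2 = pd.Series(
    [0.5, 0.5],
    index=pd.MultiIndex.from_tuples(
        [("x0", "y0"), ("x1", "y0")], names=["X", "Y"]
    ),
)

result = outer_product(s1, s2)
print(result)
```

This materialises every combination, so the result has `len(a) * len(b)` rows; for large factors that may be too much, and a sparse representation could be a better fit.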

@mroeschke mroeschke added Bug Numeric Operations Arithmetic, Comparison, and Logical operations labels Jun 28, 2021