MultiIndex level dropped when multiplying two series with a single entry #25891

mellesies · 2019-03-27T10:22:49Z

When multiplying two Series with partially overlapping MultiIndexes that each contain a single entry, pandas drops a level from the resulting MultiIndex. This doesn't happen when either Series contains multiple entries.

Using pandas 0.24.1 and NumPy 1.16.2.

This works fine

import pandas as pd

# Create Series s1, with a scope over {T, N} and set two values
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5
s1['T.1B', 'N.0'] = 0.5

# Create Series s2 with a scope over {N, M} and set a single value
index2 = pd.MultiIndex.from_product([[], []], names=['N', 'M'])
s2 = pd.Series(index=index2)
s2['N.0', 'M.0'] = 0.5

# When multiplying s1 and s2 pandas will align the Series using the index.
# Note that the index has 3 levels: {N, T, M}
print(s1 * s2)

# Prints:
# N    T     M  
# N.0  T.1A  M.0    0.25
#      T.1B  M.0    0.25
# dtype: float64

This doesn't return the expected index levels

# Create Series s1, with a scope over {T, N} and set a single value
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5

# Create Series s2 with a scope over {N, M} and set a single value
index2 = pd.MultiIndex.from_product([[], []], names=['N', 'M'])
s2 = pd.Series(index=index2)
s2['N.0', 'M.0'] = 0.5

# Multiply s1 and s2. Correctly yields 0.25. But where did index level 'M' go!?
print(s1 * s2)

# Prints:
# T     N  
# T.1A  N.0    0.25
# dtype: float64

Problem description

Index level 'M' is dropped in the result. It appears that the index of the first Series, s1, is carried over.

Expected Output

I'd expect the result to have an index that is essentially the union of the indices from s1 and s2.

# N    T     M  
# N.0  T.1A  M.0    0.25
# dtype: float64
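Until the implicit alignment behaves consistently, the expected result can be obtained by making the join explicit. The following is a minimal sketch, not a pandas API: the helper name `multiply_on_shared_levels` and the temporary column names `_a`, `_b`, and `_product` are hypothetical, and it assumes an inner join on the shared level names is what is wanted. Note that the level order it produces (T, N, M) differs from the (N, T, M) order pandas printed above.

```python
import pandas as pd


def multiply_on_shared_levels(a, b, how="inner"):
    """Multiply two MultiIndexed Series, joining on the level names they share.

    Hypothetical helper for illustration only; not part of pandas.
    """
    shared = [name for name in a.index.names if name in b.index.names]
    if not shared:
        raise ValueError("no overlapping index names")
    # Flatten both Series so the join becomes an ordinary column merge.
    left = a.rename("_a").reset_index()
    right = b.rename("_b").reset_index()
    merged = left.merge(right, on=shared, how=how)
    merged["_product"] = merged["_a"] * merged["_b"]
    # Levels of the first Series, followed by the levels only the second has.
    levels = list(a.index.names) + [n for n in b.index.names if n not in shared]
    result = merged.set_index(levels)["_product"]
    result.name = None
    return result


# Single-entry Series, built directly instead of by assignment into an empty index.
s1 = pd.Series(
    [0.5], index=pd.MultiIndex.from_tuples([("T.1A", "N.0")], names=["T", "N"])
)
s2 = pd.Series(
    [0.5], index=pd.MultiIndex.from_tuples([("N.0", "M.0")], names=["N", "M"])
)

result = multiply_on_shared_levels(s1, s2)
print(result)  # one row holding 0.25, with index levels T, N and M
```

Because the join happens on columns rather than through index alignment, the result does not depend on how many entries each Series holds.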

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.utf-8
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.6.2
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.3.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


WillAyd · 2019-03-28T02:10:08Z

@toobaz thoughts here? I didn't realize we would infer alignment here at all, so I'm surprised either of these examples works, though you would certainly have better insight.

toobaz · 2019-03-28T06:45:34Z

@WillAyd I'm surprised too. The operations shown should all return Series of NaNs.

@mellesies you say:

> I'd expect the result to have an index that is essentially the union of the indices from s1 and s2.

... but although this might look intuitive in the example proposed, I think it is very difficult to generalize cleanly. What if, for instance, two Series have MultiIndexes with the same level names but no overlapping values? The truth is that in current pandas we typically don't attach much relevance to level names in alignment (as opposed to their position).

This said, in case you were "inspired" by any reference in the docs to such nested alignment, could you share a pointer?
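The case raised above, identical level names but no overlapping values, can be checked directly. This is a small sketch with made-up labels (`a`, `b`, `x`, `y` are illustrative only): when both MultiIndexes have the same names, pandas aligns on the full index tuples, so disjoint tuples produce the union of both indices filled with NaN.

```python
import pandas as pd

# Two Series with the same level names (T, N) but disjoint index tuples.
left = pd.Series(
    [0.5], index=pd.MultiIndex.from_tuples([("a", "x")], names=["T", "N"])
)
right = pd.Series(
    [0.5], index=pd.MultiIndex.from_tuples([("b", "y")], names=["T", "N"])
)

# Alignment is an outer join on the full tuples; nothing matches,
# so every entry of the product is NaN.
product = left * right
print(product)
```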

mellesies · 2019-03-28T09:18:11Z

This wasn't inspired by the docs. It "just worked", also on other levels (and I was very happy it did ;-). For example, where the indices don't align, NaNs are returned:

# Create Series s1, with a scope over {T, N} and set two values
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5
s1['T.1A', 'N.1'] = 0.5

# s1:
# T     N  
# T.1A  N.0    0.5
#       N.1    0.5
# dtype: float64

# Create Series s2 with a scope over {N, M} and set a single value. 
# Note that 'N.1' is missing in the index.
index2 = pd.MultiIndex.from_product([[], []], names=['N', 'M'])
s2 = pd.Series(index=index2)
s2['N.0', 'M.0'] = 0.5

# s2:
# N    M  
# N.0  M.0    0.5
# dtype: float64

# When multiplying s1 and s2 pandas will align the Series using the index.
# Note that the index has 3 levels: {N, T, M}
print(s1 * s2)

# N    T     M  
# N.0  T.1A  M.0    0.25
# N.1  T.1A  NaN     NaN

When there are no overlapping index names, pandas raises a ValueError, stating that it cannot join with no overlapping index names. I guess the expected/desired behaviour for non-overlapping indices depends on the use case, so I've implemented some code that computes the outer product for this situation (I'm working with probability tables/factors). But the error message suggests this was implemented intentionally?

# Create Series s1, with a scope over {T, N}.
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1)
s1['T.1A', 'N.0'] = 0.5
s1['T.1A', 'N.1'] = 0.5

# Create Series s2 with a scope over {X, Y}.
index2 = pd.MultiIndex.from_product([[], []], names=['X', 'Y'])
s2 = pd.Series(index=index2)
s2['x0', 'y0'] = 0.5
s2['x1', 'y0'] = 0.5

# This won't work ...
try:
    s1 * s2
except ValueError as e:
    print(e)

# cannot join with no overlapping index names
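For readers who need the outer product in the non-overlapping case, one possible approach is a cross join on a constant key. This is a sketch under assumptions, not necessarily how the code mentioned above does it: the helper name `outer_product` and the temporary columns `_a`, `_b`, `_key`, and `_product` are hypothetical.

```python
import pandas as pd


def outer_product(a, b):
    """Outer product of two Series whose MultiIndexes share no level names.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # A constant key pairs every row of `a` with every row of `b`.
    left = a.rename("_a").reset_index().assign(_key=0)
    right = b.rename("_b").reset_index().assign(_key=0)
    merged = left.merge(right, on="_key").drop(columns="_key")
    merged["_product"] = merged["_a"] * merged["_b"]
    levels = list(a.index.names) + list(b.index.names)
    result = merged.set_index(levels)["_product"]
    result.name = None
    return result


s1 = pd.Series(
    [0.5, 0.5],
    index=pd.MultiIndex.from_tuples(
        [("T.1A", "N.0"), ("T.1A", "N.1")], names=["T", "N"]
    ),
)
s2 = pd.Series(
    [0.5, 0.5],
    index=pd.MultiIndex.from_tuples([("x0", "y0"), ("x1", "y0")], names=["X", "Y"]),
)

joint = outer_product(s1, s2)
print(joint)  # four rows, one per combination, each holding 0.25
```

The result has one row per combination of entries and carries all four levels (T, N, X, Y), which is the shape a product of independent factors would need.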

WillAyd added the MultiIndex label Mar 28, 2019

mroeschke added the Bug and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Jun 28, 2021
