MultiIndex level dropped when multiplying two series with a single entry #25891
Comments
@toobaz thoughts here? I didn't realize we would infer alignment here at all, so I'm surprised either of these examples works, though you would certainly have better insight.
@WillAyd I'm surprised too. The operations shown should all return Series of NaNs. @mellesies you say
... but although this might look intuitive in the example proposed, I think it is very difficult to generalize cleanly. What if for instance two This said, in case you were "inspired" by any reference in the docs to such nested alignment, could you share a pointer?
This wasn't inspired by the docs. It "just worked", also on other levels (and I was very happy it did ;-). For example, where the indices don't align, NaNs are returned:
import pandas as pd
# Create Series s1, with a scope over {T, N}, and set two values.
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1, dtype=float)
s1['T.1A', 'N.0'] = 0.5
s1['T.1A', 'N.1'] = 0.5
# s1:
# T     N
# T.1A  N.0    0.5
#       N.1    0.5
# dtype: float64
# Create Series s2, with a scope over {N, M}, and set a single value.
# Note that 'N.1' is missing in the index.
index2 = pd.MultiIndex.from_product([[], []], names=['N', 'M'])
s2 = pd.Series(index=index2, dtype=float)
s2['N.0', 'M.0'] = 0.5
# s2:
# N    M
# N.0  M.0    0.5
# dtype: float64
# When multiplying s1 and s2, pandas will align the Series using the index.
# Note that the resulting index has 3 levels: {N, T, M}.
print(s1 * s2)
# N    T     M
# N.0  T.1A  M.0    0.25
# N.1  T.1A  NaN     NaN
When the Series share no index names, pandas raises a ValueError stating that it cannot join with no overlapping index names. I guess the expected/desired behaviour for non-overlapping indices depends on the use case, so I've implemented some code that computes the outer product for this situation (I'm working with probability tables/factors). But the error message suggests this was intentionally implemented?
# Create Series s1, with a scope over {T, N}.
index1 = pd.MultiIndex.from_product([[], []], names=['T', 'N'])
s1 = pd.Series(index=index1, dtype=float)
s1['T.1A', 'N.0'] = 0.5
s1['T.1A', 'N.1'] = 0.5
# Create Series s2, with a scope over {X, Y}.
index2 = pd.MultiIndex.from_product([[], []], names=['X', 'Y'])
s2 = pd.Series(index=index2, dtype=float)
s2['x0', 'y0'] = 0.5
s2['x1', 'y0'] = 0.5
# This won't work ...
try:
    s1 * s2
except ValueError as e:
    print(e)
# cannot join with no overlapping index names
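The outer product mentioned above could look roughly like the following. This is only a minimal sketch, not the commenter's actual code: `outer_product` is a hypothetical helper, it assumes both inputs carry a MultiIndex, and it assumes the two sets of level names are disjoint.

```python
import numpy as np
import pandas as pd


def outer_product(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: multiply every entry of `left` with every entry
    of `right`. Assumes both have a MultiIndex with disjoint level names."""
    # Iterating a MultiIndex yields tuples, so `a + b` concatenates the labels.
    tuples = [a + b for a in left.index for b in right.index]
    names = list(left.index.names) + list(right.index.names)
    index = pd.MultiIndex.from_tuples(tuples, names=names)
    # np.outer(...).ravel() is row-major, matching the loop order above.
    values = np.outer(left.to_numpy(), right.to_numpy()).ravel()
    return pd.Series(values, index=index)


# Same data as in the example above, built directly instead of by enlargement.
s1 = pd.Series(
    [0.5, 0.5],
    index=pd.MultiIndex.from_tuples(
        [("T.1A", "N.0"), ("T.1A", "N.1")], names=["T", "N"]
    ),
)
s2 = pd.Series(
    [0.5, 0.5],
    index=pd.MultiIndex.from_tuples(
        [("x0", "y0"), ("x1", "y0")], names=["X", "Y"]
    ),
)

result = outer_product(s1, s2)
print(result)
```

The result has one row per pair of entries and carries all four levels {T, N, X, Y}, which is the shape one would want when treating the Series as independent probability factors.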
When multiplying two Series with overlapping MultiIndices and a single entry, pandas drops levels from the MultiIndex. This doesn't happen when either Series contains multiple entries.
Using pandas 0.24.1 and NumPy 1.16.2.
This works fine
This doesn't return the expected index levels
Problem description
Index level 'M' is dropped in the result. It appears the index from the first Series, s1, is carried over?
Expected Output
I'd expect the result to have an index that is essentially the union of the indices from s1 and s2.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.utf-8
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.6.2
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.3.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
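For anyone who needs the union of levels described under Expected Output regardless of how the `*` operator aligns, one option is to make the join explicit. The following is a hedged sketch rather than pandas' own alignment logic: `multiply_on_shared_levels` is a hypothetical helper, the data mirrors the {T, N} and {N, M} example from the comments, and it assumes the two Series share at least one level name.

```python
import pandas as pd


def multiply_on_shared_levels(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: join two Series on their shared index level names
    and multiply. Assumes at least one level name is common to both."""
    shared = [n for n in left.index.names if n in right.index.names]
    extra = [n for n in right.index.names if n not in shared]
    # Turn the index levels into columns so that merge can do the join.
    lhs = left.rename("_left").reset_index()
    rhs = right.rename("_right").reset_index()
    merged = lhs.merge(rhs, on=shared, how="outer")
    merged["product"] = merged["_left"] * merged["_right"]
    # The result keeps every level of `left`, plus the levels only in `right`.
    return merged.set_index(list(left.index.names) + extra)["product"]


s1 = pd.Series(
    [0.5, 0.5],
    index=pd.MultiIndex.from_tuples(
        [("T.1A", "N.0"), ("T.1A", "N.1")], names=["T", "N"]
    ),
)
# A single entry, the case in which a level is reported as dropped.
s2 = pd.Series(
    [0.5],
    index=pd.MultiIndex.from_tuples([("N.0", "M.0")], names=["N", "M"]),
)

result = multiply_on_shared_levels(s1, s2)
print(result)
```

With this approach the result always carries the levels {T, N, M}, and rows without a partner in the other Series come out as NaN, independent of how many entries either Series has.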