Joining MultiIndexes with NaNs treats NaN as a match-any (unlike regular index joins) #25138
Labels: Bug, Missing-data, MultiIndex, Reshaping
Thank you so much for your work on pandas 🤗
I don't sufficiently understand pd.concat and MultiIndex (yet!) to explain what is going on here, so I hope someone more knowledgeable can shed light on these curious symptoms:
Code Sample
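As a minimal sketch of the comparison this report describes (not the author's original code; the names left_flat, right_flat, left_mi, right_mi and all values are hypothetical), the following concatenates two Series along axis=1, once with a regular Index that contains NaN and once with a MultiIndex that contains NaN. What gets printed depends on the pandas version, so no output is shown.

```python
import numpy as np
import pandas as pd

# Case 1: regular (flat) Index with a NaN label on both sides.
left_flat = pd.Series([1, 2, 3], index=pd.Index([1.0, 2.0, np.nan]), name="left")
right_flat = pd.Series([10, 20], index=pd.Index([1.0, np.nan]), name="right")
flat = pd.concat([left_flat, right_flat], axis=1)

# Case 2: MultiIndex whose second level contains a NaN on both sides.
left_mi = pd.Series(
    [1, 2, 3],
    index=pd.MultiIndex.from_tuples(
        [("a", 1.0), ("b", 2.0), ("c", np.nan)], names=["k1", "k2"]
    ),
    name="left",
)
right_mi = pd.Series(
    [10, 20],
    index=pd.MultiIndex.from_tuples(
        [("a", 1.0), ("c", np.nan)], names=["k1", "k2"]
    ),
    name="right",
)
multi = pd.concat([left_mi, right_mi], axis=1)

# Compare how the NaN-labelled rows were aligned in each case.
print(flat)
print(multi)
```

If NaN is handled the same way in both cases, the rows labelled with NaN should line up by the same rule in flat and in multi.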
The same results can be achieved with DataFrames in (at least) these other ways, so in that way it is very consistent 😀
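The report does not show which DataFrame-based routes were tried. As an assumption, two plausible candidates are DataFrame.join and pd.merge on the index, sketched below with hypothetical names and data. No output is shown because it depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical MultiIndexes; the second level contains a NaN on both sides.
idx_left = pd.MultiIndex.from_tuples(
    [("a", 1.0), ("b", 2.0), ("c", np.nan)], names=["k1", "k2"]
)
idx_right = pd.MultiIndex.from_tuples(
    [("a", 1.0), ("c", np.nan)], names=["k1", "k2"]
)
left_df = pd.DataFrame({"left": [1, 2, 3]}, index=idx_left)
right_df = pd.DataFrame({"right": [10, 20]}, index=idx_right)

# Route 1: index-on-index outer join.
joined = left_df.join(right_df, how="outer")

# Route 2: merge using both indexes as the join keys.
merged = pd.merge(
    left_df, right_df, left_index=True, right_index=True, how="outer"
)

print(joined)
print(merged)
```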
Problem description
I am surprised to see NaNs being filled in for left when the index is a MultiIndex. I would expect joining on a MultiIndex to follow the same underlying logic as joining on a regular Index: either NaN counts as a match-all value (MultiIndex behaviour), or NaN matches only the other's NaN, like a regular value (regular Index behaviour).
I failed to find docs describing concatenation of either a MultiIndex with NaNs or a regular Index with NaNs, so I am unsure whether this is intended behaviour.
Expected Output
I expect the two approaches to have the same output, be it NaN as a match-all or NaN as a regular value.
Output of pd.show_versions()
Tested in two pandas versions.
pandas 0.24.1:
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-13-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
pandas 0.23.4:
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None