Should DataFrame.merge match NaN with NaN? #22491
Comments
I was able to solve my problem with
Nevertheless, I still feel like this shouldn't be the default behavior for DataFrame.merge. What do you think?
Hmm, yeah, I don't think the NA values should be producing a match here. @TomAugspurger, any thoughts?
Agreed, I also would not expect NAs to match here.
I ran into this today with a dataset. In my case, I wanted a merge with an outer join, but I saw the same My workaround was merging data frames like this (adapted to match the example above):
    data = pd.merge(df1[df1['b'].notnull()],
                    df2[df2['d'].notnull()], how='outer',
                    left_on='b', right_on='d')
    data = pd.concat([data, df1[df1['b'].isnull()],
                      df2[df2['d'].isnull()]],
                     ignore_index=True, sort=False)
I didn't know that merge would match Anyway, +1 on not matching
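For anyone who wants to try the workaround above end to end, here is a minimal self-contained sketch. The sample frames and the helper name `outer_merge_without_nan_matches` are made up for illustration (the issue's original data is not shown in this thread); only the column names `b` and `d` come from the discussion.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the key column names 'b' and 'd'
# are taken from the thread.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [1.0, 2.0, np.nan]})
df2 = pd.DataFrame({"c": [10, 20, 30], "d": [1.0, np.nan, np.nan]})


def outer_merge_without_nan_matches(left, right, left_on, right_on):
    """Outer-merge two frames without pairing NaN keys with each other.

    Hypothetical helper: merges only the rows whose key is present,
    then appends the NaN-key rows from both sides unmatched.
    """
    left_has_key = left[left_on].notna()
    right_has_key = right[right_on].notna()
    matched = pd.merge(
        left[left_has_key],
        right[right_has_key],
        how="outer",
        left_on=left_on,
        right_on=right_on,
    )
    return pd.concat(
        [matched, left[~left_has_key], right[~right_has_key]],
        ignore_index=True,
        sort=False,
    )


result = outer_merge_without_nan_matches(df1, df2, "b", "d")

# For comparison: a plain outer merge pairs the NaN keys with each other.
naive = pd.merge(df1, df2, how="outer", left_on="b", right_on="d")
print(result)
print(naive)
```

With this sample data the plain outer merge pairs the one NaN row in `df1` with both NaN rows in `df2`, while the helper keeps all three of those rows separate and unmatched.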
I kind of feel like
I agree that this behavior is unexpected. Since
Closing as duplicate of #32306 with a more recent discussion on the future policy we want.
Code Sample, a copy-pastable example if possible
Problem description
df1:
df2:
Current output:
Expected Output
What's happening is that the NaN in df1.b is matching the NaNs in df2.d. I don't see a situation in which this would be desirable behavior, but if such a situation exists, surely the opposite is also conceivable, and so there should be some documented option in DataFrame.merge which accomplishes this.
What do you think?
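Since the original code sample and outputs are not shown above, here is a small sketch that reproduces the behavior described. The frames are hypothetical stand-ins (only the column names `b` and `d` come from this report), and dropping the NaN keys before merging is just one possible way to avoid the matches for an inner join, not necessarily the fix referred to elsewhere in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2; the report's actual data is not shown.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [1.0, 2.0, np.nan]})
df2 = pd.DataFrame({"c": [10, 20, 30], "d": [1.0, np.nan, np.nan]})

# Default behavior: NaN keys on the left are paired with NaN keys on the right.
merged = df1.merge(df2, left_on="b", right_on="d", how="inner")
nan_matches = int(merged["b"].isna().sum())

# One way to avoid this for an inner join: drop NaN keys on both sides first.
merged_no_nan = df1.dropna(subset=["b"]).merge(
    df2.dropna(subset=["d"]), left_on="b", right_on="d", how="inner"
)

print(merged)
print(merged_no_nan)
```

Here the single NaN in `df1.b` is paired with each of the two NaNs in `df2.d`, so the default merge returns two NaN-keyed rows on top of the one genuine match on `1.0`.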
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 39.0.1
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None