DataFrame.update silently does nothing when indices are of differing type #19905

Open
birdcolour opened this issue Feb 26, 2018 · 3 comments
@birdcolour

Code to reproduce

import numpy as np
import pandas as pd

df_int = pd.DataFrame(
    {'col': ['foo', 'bar', np.nan]},
    index=[1,2,3]
)
df_obj = pd.DataFrame(
    {'col': [np.nan, np.nan, 'baz']},
    index=['1', '2', '3']
)

print(df_int)
print(df_obj)

# >>>
#    col
# 1  foo
# 2  bar
# 3  NaN
#    col
# 1  NaN
# 2  NaN
# 3  baz

# Note that the indices appear identical, but are actually different dtypes

df_int.update(df_obj)
print(df_int)

# Intended output
# >>>
#    col
# 1  foo
# 2  bar
# 3  baz

# Actual output
# >>>
#    col
# 1  foo
# 2  bar
# 3  NaN

Problem description

Because update aligns on index values, updating a DataFrame from another whose index has a different dtype can produce no matches at all, contrary to what the user expects, and nothing tells the user that this has happened. This is particularly surprising when the indices look identical when printed, as shown above. A warning should be raised that either:

  • tells the user that the indices are not the same type, which may produce some unintended results.
  • states that a type comparison is taking place that will never produce any matches.
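A minimal sketch of the kind of check described above, written as a user-side wrapper rather than a change to pandas itself. The name `update_with_check` and the warning text are hypothetical; only `DataFrame.update`, `Index.astype` and the standard `warnings` module are real APIs here.

```python
import warnings

import numpy as np
import pandas as pd


def update_with_check(target, other):
    """Hypothetical helper: warn when update() cannot match any row label."""
    # Compare labels as plain Python objects, so 1 and '1' never match.
    shared = set(target.index) & set(other.index)
    if len(other.index) > 0 and not shared:
        warnings.warn(
            "No index labels in common "
            f"(index dtypes: {target.index.dtype} vs {other.index.dtype}); "
            "update() will change nothing.",
            UserWarning,
            stacklevel=2,
        )
    target.update(other)


df_int = pd.DataFrame({'col': ['foo', 'bar', np.nan]}, index=[1, 2, 3])
df_obj = pd.DataFrame({'col': [np.nan, np.nan, 'baz']}, index=['1', '2', '3'])

# Emits a UserWarning; df_int is left unchanged.
update_with_check(df_int, df_obj)

# Possible workaround: cast the string labels to integers before updating.
df_fixed = df_obj.copy()
df_fixed.index = df_fixed.index.astype('int64')
update_with_check(df_int, df_fixed)
print(df_int)
```

The check only fires when no labels at all are shared, so a partial overlap would still pass silently. It addresses the first bullet (informing the user) and does not attempt to decide whether two index types could ever match.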
@birdcolour
Author

Related to #4094

@TomAugspurger
Contributor

states that a type comparison is taking place that will never produce any matches.

This is hard to do in general. In your case, a regular Index can contain objects with any type, including the same type as the Int64Index.
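To illustrate the point, here is a small sketch (the variable names are made up for the example) showing that an object-dtype Index can hold integers alongside strings, so comparing dtypes alone cannot tell you whether any labels will match:

```python
import pandas as pd

int_index = pd.Index([1, 2, 3])
# An object-dtype Index may mix integers and strings.
mixed = pd.Index([1, '2', 3], dtype=object)

print(int_index.dtype)  # int64
print(mixed.dtype)      # object

# The dtypes differ, yet two of the three integer labels are present.
matches = [label in mixed for label in int_index]
print(matches)  # [True, False, True]
```

So a warning based purely on "the index dtypes differ" would fire here even though some rows can legitimately align.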

@gfyoung
Member

gfyoung commented Mar 2, 2018

@TomAugspurger : That being said, we shouldn't be aligning on indices that are clearly not equal (i.e. string and numeric), so this is still a bug IMO.

@gfyoung gfyoung added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Mar 2, 2018
@mroeschke mroeschke removed the Indexing Related to indexing on series/frames, not to indexes themselves label Jun 19, 2021