BUG: update should try harder to preserve dtypes #4094
Comments
Is this marked as fixed? I'm using pandas 0.19.1 and I have the following problem:
The dtype of both a.bool_column and b.bool_column are bool, but after |
it's marked as open |
upd |
Same issue for the |
This is also a problem with the new nullable integer types.
In [1]: df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['d', 'e', 'f']}, dtype='string')
In [2]: df.dtypes
Out[2]:
A string
B string
dtype: object
In [3]: df2 = pd.DataFrame({'A': ['a2', 'b2', 'c2'], 'B': ['d', 'e', 'f']}, dtype='string')
In [4]: df2.dtypes
Out[4]:
A string
B string
dtype: object
In [5]: df.update(df2)
In [6]: df.dtypes
Out[6]:
A object
B object
dtype: object |
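One possible workaround for the behaviour shown above, as a minimal sketch: record the dtypes before calling update and cast back afterwards. The helper name update_preserving_dtypes is hypothetical (it is not part of pandas), and the sketch assumes every updated value can still be cast to the original dtype.

```python
import pandas as pd


def update_preserving_dtypes(target: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply DataFrame.update, then restore the original dtypes.

    Assumes the updated values are still castable to the original dtypes.
    """
    original_dtypes = target.dtypes.to_dict()  # remember dtypes before the update
    result = target.copy()                     # leave the caller's frame untouched
    result.update(other)                       # may upcast columns (e.g. to object)
    return result.astype(original_dtypes)      # cast each column back


df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["d", "e", "f"]}, dtype="string")
df2 = pd.DataFrame({"A": ["a2", "b2", "c2"], "B": ["d", "e", "f"]}, dtype="string")

fixed = update_preserving_dtypes(df, df2)
print(fixed.dtypes)
```

Casting back with astype hides the upcast rather than preventing it, so this only papers over the issue for frames where the round trip is lossless.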
Just confirming that this seems to be an issue with all the new-style dtypes. I wrote a quick test. Given:
import pandas as pd
new_types = {
    "Int64": pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, dtype="Int64"),
    "UInt32": pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, dtype="UInt32"),
    "boolean": pd.DataFrame({"A": [True, False], "B": [True, False]}, dtype="boolean"),
    "string": pd.DataFrame({"A": ["1", "2", "3"], "B": ["1", "2", "3"]}, dtype="string"),
}
old_types = {
    "int64": pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, dtype="int64"),
    "float": pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, dtype="float"),
    "bool": pd.DataFrame({"A": [True, False], "B": [True, False]}, dtype="bool"),
}
For each dtype/df pair, more or less:
for dtype, df in new_types.items():
    df2 = df.select_dtypes(dtype)
    df.update(df2)
And then I printed the dtypes of the original df, the intermediate/selected df2 (as a sanity check to ensure it wasn't responsible for modifying the dtypes), and then the updated df. The output was:
You can see that old-style dtypes are preserved, but
Additionally, this looks to be the cause of Lines 8213 to 8217 in 9c9789c
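For anyone who wants to rerun this kind of check on their own pandas version, here is a rough sketch that tabulates dtypes before and after update. The function name dtype_report and the report layout are my own assumptions, not something from the test above, and the result will differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

# A few frames per dtype, mirroring the new-style/old-style split used above.
frames = {
    "Int64": pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, dtype="Int64"),
    "boolean": pd.DataFrame({"A": [True, False], "B": [True, False]}, dtype="boolean"),
    "string": pd.DataFrame({"A": ["1", "2", "3"], "B": ["1", "2", "3"]}, dtype="string"),
    "int64": pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, dtype="int64"),
    "bool": pd.DataFrame({"A": [True, False], "B": [True, False]}, dtype="bool"),
}


def dtype_report(frames: dict) -> pd.DataFrame:
    """Hypothetical helper: compare column dtypes before and after DataFrame.update."""
    rows = []
    for name, df in frames.items():
        before = df.dtypes.astype(str).to_dict()
        target = df.copy()
        target.update(df.copy())  # update the frame with an identical copy of itself
        after = target.dtypes.astype(str).to_dict()
        rows.append(
            {
                "dtype": name,
                "before": before["A"],
                "after": after["A"],
                "preserved": before == after,  # True only if every column kept its dtype
            }
        )
    return pd.DataFrame(rows)


report = dtype_report(frames)
print(report)
```

Rows where preserved is False are the dtypes that update changed on the installed version.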
|
This looks to be int now, and I believe we already have a test for this, so closing.
|
The issue seems to be with passing a single-element Series, as the warning states. But slicing the only element of the Series doesn't raise any warning in pandas:
|
more examples in #13957
http://stackoverflow.com/questions/17398216/unwanted-type-conversion-in-pandas-dataframe-update