Series (DataFrame column) inplace interpolate UnboundLocalError #6281
Comments
Can you show the output of python ci/print_versions.py?
Interpolation by definition generally converts to float64. (You also generally start with float64, because you need NaNs in order to interpolate.) Dtype conversion is controlled by the downcast keyword to avoid a performance penalty.
cc @TomAugspurger can you have a look at this?
I've got a PR for the error coming, just sloppy coding on my part. For the dtypes issue,
default is like this because you can have float looking turn back into floats, e.g.
I think @abrakababra was surprised because it did convert from float to int. We have the default as
In [11]: df = pd.DataFrame([1., 2., 3., 4.])
In [12]: df.dtypes
Out[12]:
0 float64
dtype: object
In [13]: df.interpolate().dtypes
Out[13]:
0 int64
dtype: object
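To make the float-to-int surprise above concrete, here is a minimal pure-Python sketch of what an "infer"-style downcast does conceptually. The helper name downcast_infer is hypothetical and this is not pandas' actual implementation; it only assumes the rule "if every value is a finite whole number, cast to int".

```python
import math


def downcast_infer(values):
    """Hypothetical helper: cast to int only if every value is a finite whole number."""
    if values and all(math.isfinite(v) and float(v).is_integer() for v in values):
        return [int(v) for v in values]
    # At least one value is fractional, NaN or infinite: leave the floats alone.
    return list(values)


print(downcast_infer([1.0, 2.0, 3.0, 4.0]))  # [1, 2, 3, 4]
print(downcast_infer([1.0, 2.5, 3.0]))       # [1.0, 2.5, 3.0]
print(downcast_infer([0.0, 0.0, 0.0]))       # [0, 0, 0]
```

The last call mirrors the case from the report: a float column that happens to hold only zeros comes back as integers, which is what then clashes with a float column when appending.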
@TomAugspurger oh... maybe because of back compat... what happens if you turn it off?
Previously This is from 0.11:
In [18]: s
Out[18]:
0 1
1 2
2 NaN
3 4
Name: a, dtype: float64
In [19]: s.interpolate()
Out[19]:
0 1
1 2
2 3
3 4
Name: a, dtype: float64
So I don't think it was because of backwards compat. Turning it off (with
Yes, just surprised about downcast='infer' as default. Glad to have this option for whenever it may come in handy, but in my case (appending to an existing column) it was not what I wanted/expected.
@TomAugspurger I have no problem making
@abrakababra thanks for the report. @jreback agreed about the API change. I'll fix it this afternoon.
@jreback Had a problem come up with changing the
In [14]: df = DataFrame({'A': [1, 2, np.nan, 4, 5, np.nan, 7],
....: 'C': [1, 2, 3, 5, 8, 13, 21]})
In [15]: df.dtypes
Out[15]:
A float64
C int64
dtype: object
So one float, one int, with no NaNs in the int. We apply the interpolation method to each block, so it gets applied to the int block/column. The interpolation will generally return a float dtype.
In [18]: df.interpolate(method='cubic', downcast=None).dtypes
Out[18]:
A float64
C float64
dtype: object
So the int column was changed to float. With
Could 'skip' columns that don't need interpolation (e.g. don't have NaNs), sounds OK to me
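The "skip" idea can be sketched roughly as follows, in plain Python rather than pandas internals. Both function names are hypothetical, the columns are modeled as a dict of lists, and linear interpolation stands in for the cubic method used in the example above; the point is only that a column without NaNs is returned untouched and so keeps its integer values.

```python
import math


def interpolate_linear(values):
    """Linearly fill interior NaNs; the result is always a list of floats."""
    out = [float(v) for v in values]
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    # Walk over consecutive pairs of known points and fill the gap between them.
    # Leading and trailing NaNs have no neighbour on one side and stay NaN.
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


def interpolate_columns(columns):
    """Interpolate only the columns that contain NaN; copy the others unchanged."""
    result = {}
    for name, values in columns.items():
        has_nan = any(isinstance(v, float) and math.isnan(v) for v in values)
        result[name] = interpolate_linear(values) if has_nan else list(values)
    return result


nan = float("nan")
frame = {"A": [1, 2, nan, 4, 5, nan, 7], "C": [1, 2, 3, 5, 8, 13, 21]}
filled = interpolate_columns(frame)
print(filled["A"])  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(filled["C"])  # [1, 2, 3, 5, 8, 13, 21]
```

With this approach no downcast step is needed to keep column C as integers, since it is never passed through the interpolation routine in the first place.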
Hello, I recently stumbled across this:
The problem arises when inplace is set to True, regardless of 'downcast'.
But speaking of which: when running interpolate() without any options, I would expect data types to be kept by default! I often chunk through datasets and glue them together with to_hdf. It took me a while to figure out that a float column in one chunk had just zeros in it, so interpolate downcast it to int, and to_hdf then raised on appending.
Thx in advance