Series (DataFrame column) inplace interpolate UnboundLocalError #6281

Closed
abrakababra opened this issue Feb 6, 2014 · 13 comments · Fixed by #6284
Labels
API Design Bug Dtype Conversions Unexpected or buggy dtype conversions
Comments

@abrakababra

Hello, I recently stumbled across this:

import numpy as np
import pandas as pd
df1 = pd.DataFrame({'a': [1., 2., 3., 4., np.nan, 6, 7, 8]}, dtype='<f4')
df1.a.interpolate(inplace=True, downcast=None)


---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-9-bf0994f72664> in <module>()
----> 1 df1.a.interpolate(inplace=True, downcast=None)

C:\Program Files\Python27\lib\site-packages\pandas\core\generic.pyc in interpolate(self, method, axis, limit, inplace, downcast, **kwargs)
   2304         if axis == 1:
   2305             res = res.T
-> 2306         return res
   2307 
   2308     #----------------------------------------------------------------------

UnboundLocalError: local variable 'res' referenced before assignment

The problem arises with setting inplace to True regardless of 'downcast'.
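The error pattern in the traceback can be reproduced without pandas. The following is a minimal sketch, assuming a hypothetical function `interpolate_like` (it is not the pandas source) in which the result variable is bound only on the non-inplace path:

```python
def interpolate_like(values, inplace=False):
    """Hypothetical stand-in for the buggy pattern, not the pandas code."""
    if inplace:
        # Mutates the caller's list; nothing is ever bound to `res` here.
        values[:] = [0.0 if v is None else v for v in values]
    else:
        res = [0.0 if v is None else v for v in values]
    return res  # raises UnboundLocalError when inplace=True


def interpolate_like_fixed(values, inplace=False):
    """One possible fix: return None explicitly on the inplace path."""
    filled = [0.0 if v is None else v for v in values]
    if inplace:
        values[:] = filled
        return None
    return filled
```

Filling gaps with 0.0 is only a placeholder for the real interpolation; the point is the control flow around `res`.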

But speaking of which: when running interpolate() without any options, I would expect data types to be kept by default! I often chunk through datasets and glue them together with to_hdf. It took me a while to figure out that a float column in one chunk had just zeros in it, so interpolate downcast it to int, and to_hdf then raised on appending.
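One way to guard against this when gluing chunks together is to cast every chunk to a fixed set of dtypes before appending. This is a sketch only: `SCHEMA` and `enforce_schema` are hypothetical names, and `pd.concat` stands in for the to_hdf append step.

```python
import pandas as pd

# Hypothetical target dtypes for the stored table.
SCHEMA = {'a': 'float32'}


def enforce_schema(chunk, schema=SCHEMA):
    """Return a copy of `chunk` with the columns cast to the expected dtypes."""
    return chunk.astype(schema)


float_chunk = pd.DataFrame({'a': [1.5, 2.5]}, dtype='float32')
int_chunk = pd.DataFrame({'a': [0, 0]})  # what a downcast chunk of zeros looks like

# Both chunks now share one dtype, so appending them is consistent.
combined = pd.concat(
    [enforce_schema(float_chunk), enforce_schema(int_chunk)],
    ignore_index=True,
)
```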

Thx in advance

@jreback
Contributor

jreback commented Feb 6, 2014

can you show: python ci/print_versions.py

@jreback
Contributor

jreback commented Feb 6, 2014

interpolation by definition will convert in general to float64. (generally you also start with float64 because if you are trying to interpolate you need nan's to interpolate). dtype conversion is controlled by the downcast kw to avoid a performance penalty.

@jreback jreback added the Bug label Feb 6, 2014
@jreback jreback added this to the 0.14.0 milestone Feb 6, 2014
@jreback
Contributor

jreback commented Feb 6, 2014

cc @TomAugspurger can you have a look at this.

@TomAugspurger
Contributor

I've got a PR for the Error coming, just sloppy coding on my part.

For the dtypes issue, downcast=None works correctly right? It's just that the default seems weird?

@jreback
Contributor

jreback commented Feb 6, 2014

default is like this because you can have float looking turn back into floats, e.g. 1., 2., 3. would become int, which is odd as they start as float64. iow we don't upcast unless explicitly done.

@TomAugspurger
Contributor

I think @abrakababra was surprised because it did convert from float to int. We have the default as downcast='infer'

In [11]: df = pd.DataFrame([1., 2., 3., 4.])

In [12]: df.dtypes
Out[12]: 
0    float64
dtype: object

In [13]: df.interpolate().dtypes
Out[13]: 
0    int64
dtype: object

@jreback
Contributor

jreback commented Feb 6, 2014

@TomAugspurger oh..maybe because of back compat....

what happens if you turn it off?

@TomAugspurger
Contributor

Previously Series.interpolate just used numpy.interp which preserved the float dtype.

This is from 0.11:

In [18]: s
Out[18]: 
0     1
1     2
2   NaN
3     4
Name: a, dtype: float64

In [19]: s.interpolate()
Out[19]: 
0    1
1    2
2    3
3    4
Name: a, dtype: float64

So I don't think it was because of backwards compat. Turning it off (with downcast=None) preserves the float dtype. A bunch of tests fail, but just because I wrote them to expect the recast as int if possible.
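For reference, here is a sketch of filling NaN gaps with numpy.interp directly. It assumes the old code path did something roughly like this (it is not the actual 0.11 source), and it shows that the result stays float64:

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])
positions = np.arange(len(values))
known = ~np.isnan(values)

# Interpolate only at the missing positions, using the known points as anchors.
filled = values.copy()
filled[~known] = np.interp(positions[~known], positions[known], values[known])

print(filled.dtype)  # float64: np.interp returns floats, no recast to int
```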

@abrakababra
Author

Yes, just surprised about downcast='infer' as the default. Glad to have this option for whenever it may come in handy, but in my case (appending to an existing column) it was not what I wanted/expected.
@TomAugspurger Thanks for fixing the 'inplace=True' issue :-)

@jreback
Contributor

jreback commented Feb 6, 2014

@TomAugspurger I have no problem making downcast=None the default, just doc it as an API change, and see if there is any problem if you just change the tests

@TomAugspurger
Contributor

@abrakababra thanks for the report.

@jreback agreed about the API change. I'll fix it this afternoon.

@TomAugspurger
Contributor

@jreback Had a problem come up with changing the downcast to None. If you start out with a DataFrame like

In [14]: df = DataFrame({'A': [1, 2, np.nan, 4, 5, np.nan, 7],
   ....:                 'C': [1, 2, 3, 5, 8, 13, 21]})

In [15]: df.dtypes
Out[15]: 
A    float64
C      int64
dtype: object

So one float, one int, with no NaNs in the int. We apply the interpolation method to each block, so it gets applied to the int block/column. The interpolation will generally return a float dtype.

In [18]: df.interpolate(method='cubic', downcast=None).dtypes
Out[18]: 
A    float64
C    float64
dtype: object

So the int column was changed to float. With downcast='infer' we'd get 2 int columns. Thoughts? I suppose I could only apply the interpolation method to columns with at least one null value.

@jreback
Contributor

jreback commented Feb 6, 2014

@TomAugspurger

could 'skip' columns that don't need interpolation (e.g. don't have nans), sounds ok to me
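The "skip" idea can be sketched at the user level as follows. `interpolate_columns_with_nans` is a hypothetical helper, not part of pandas and not necessarily how the eventual fix works internally; it only touches columns that contain at least one null, so the int column keeps its dtype.

```python
import numpy as np
import pandas as pd


def interpolate_columns_with_nans(df, **kwargs):
    """Hypothetical helper: interpolate only columns that contain NaNs."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().any():
            out[col] = out[col].interpolate(**kwargs)
    return out


df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5, np.nan, 7],
                   'C': [1, 2, 3, 5, 8, 13, 21]})

result = interpolate_columns_with_nans(df)
print(result.dtypes)  # A stays float64, C stays int64
```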

3 participants