Changing columns with `+=` gives UnboundLocalError #4996

wcbeard · 2013-09-26T16:27:34Z

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: np.random.seed(10)
In [4]: a = np.random.randint(0, 20, (6, 5))
In [5]: df = pd.DataFrame(a, columns=list('abcde'))
In [6]: df
Out[6]:
    a   b   c   d   e
0   9   4  15   0  17
1  16  17   8   9   0
2  10   8   4  19  16
3   4  15  11  11   1
4   8   4  14  17  19
5  13   5  13  19  13

Changing columns with + is fine

In [7]: df.columns = df.columns + '_x'
In [8]: df
Out[8]:
   a_x  b_x  c_x  d_x  e_x
0    9    4   15    0   17
1   16   17    8    9    0
2   10    8    4   19   16
3    4   15   11   11    1
4    8    4   14   17   19
5   13    5   13   19   13

The += operator on the other hand correctly changes the columns,

In [9]: df.columns += '_y'
In [10]: df.columns
Out[10]: Index([u'a_x_y', u'b_x_y', u'c_x_y', u'd_x_y', u'e_x_y'], dtype=object)

but gives an error when trying to display the whole dataframe:

In [11]: df
Out[11]: ---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)

Traceback is here. Not sure if it's an ipython or pandas issue. pd.__version__ == '0.12.0-371-g1434776'

The text was updated successfully, but these errors were encountered:

jreback · 2013-09-26T17:32:45Z

this is a trivial fix, just define __iadd__ = __add__ in core/index.py

need a test, and I think it makes sense to disable methods that don't make sense on an index

mul,trudev,floordiv,pow
imul,itruediv,iflloordiv,ipow

disabling is easy (just set them like incore/base.py/FrozenNDArray )

do a PR?

wcbeard · 2013-09-26T17:43:40Z

do a PR?

I'd love to, but I'll probably have to wait till after work tomorrow if there's no rush.

jreback · 2013-09-26T17:45:08Z

no problem...

wcbeard · 2013-09-28T22:43:46Z

@jreback I think disabling __floordiv__ breaks pandas.tools.tests.test_pivot.TestPivotTable (there's a data.index // 5 line in there). How about disabling all but that one?

jreback · 2013-09-28T22:49:25Z

just make that

data.index.to_series() // 5

that is an incorrect usage (and the reason these should be disabled!)

jreback · 2013-09-28T23:12:21Z

pls also add a note in doc/source/release.rst (API section about this change, referencing the issue)..thanks

wcbeard · 2013-09-28T23:18:48Z

about the __iadd__ fix, or just the disabled methods?

jreback · 2013-09-28T23:35:00Z

you fixed iadd? I meant also isub too (__isub__ = __sub__

wcbeard · 2013-09-29T00:05:14Z

Well, fixed insofar as I aliased to add. I'll do isub too.

Sent from my iPhone

On Sep 28, 2013, at 7:35 PM, jreback notifications@github.com wrote:

you fixed iadd? I meant also isub too (isub = sub

—
Reply to this email directly or view it on GitHub.

jreback · 2013-09-29T00:07:22Z

@d10genes sorry..didn't mean the ?. I saw that you fixed iadd, wanted you to add isub (which you just said you are doing)...so good!

wcbeard · 2013-09-29T01:49:51Z

Also, just so I understand you, redefining __isub__ this way will only make a difference for set difference between the index and another iterable (since there's not really a string - analogue to string concatenation with +)?

__isub__ currently breaks for set differences like this, and your suggestion would fix it, just want to make sure this was your intention (I didn't know about the set +/- operations with Index before...)

jreback · 2013-09-29T02:19:17Z

yep....+/- so set ops on Index (with a single string being coerced to an index). Possibly an odd design, but it does make sense for this type of object.

wcbeard · 2013-09-29T02:38:56Z

Ok, I understand, but I don't think subtracting a single string coerces it to an index:

In [1]: import pandas as pd

In [2]: a = pd.Index(list('abc'))

In [3]: a
Out[3]: Index([u'a', u'b', u'c'], dtype=object)

In [4]: a - 'a'

gives a Exception: Input must be iterable! (Index.diff does a check for __iter__ attribute at the beginning). Should this behavior (not coercing a string to index for -) be filed separately? Or did you mean something else?

jreback · 2013-09-29T02:43:43Z

that's a bug.....I think __sub__,__and__,__or__ should all be similar to __add__ in that they do the op if its an index, other wise wrap and try again.

go ahead and put it in here (add some tests as well)

you can make these wrapped methods if you want to avoid some boilerplate (e.g. have a function factory)

lmk

wcbeard · 2013-09-29T03:15:48Z

Ok, but how should the wrapping go? Currently, for my example above you can't do pd.Index('a') (TypeError: Index(...) must be called with a collection of some kind, 'a' was passed). Should we do something like coercing to index with Index(list('a')) (which would work like expected)?

If so, it seems preferable to do an if isinstance(arg, Index) check, rather than a try: method(arg); except: method(Index(list(arg))). Do you agree?

jreback · 2013-09-29T03:31:41Z

no...what I mean is make all of the methods like __add__ is now, when I say wrap, something like

def _wrap_op(f):

     def _wrapped(self, other):
         if not isinstance(other, Index):
              other = Index([ other ])
         return getattr(self,f)(other)
      return _wrapped

__add__ = _wrap_op('union')
__sub__ = _wrap_op('intersection')

wcbeard · 2013-09-29T03:47:11Z

Ok, I see what you mean. For the Index conversion part, __add__ uses Index(self.view(np.ndarray) + other). Don't fully understand it, but I'll try that for the wrapper.

wcbeard · 2013-09-29T04:56:24Z

Something implicit in that wrapper method is that Int64Index (and numerical indices) will no longer do addition the same way. Currently,

b = pd.Index([1, 2, 3])
b + 1

gives Int64Index([2, 3, 4], dtype=int64). This new way would instead convert to Index([1]) and then do a set union. (you'd need an explicit .to_series() conversion first for elementwise addition).

Is that how it should work?

wcbeard · 2013-09-29T05:30:54Z

Also, this would get rid of the string concatenating behavior that inspired this fix in the first place. Index's __add__ uses numpy array addition if the second arg isn't an Index, so that's how it does both string concatenation and integer/float addition (both of which would be bypassed by our wrapper method).

jreback · 2013-09-29T16:46:30Z

numpy is doing string summation so that makes sense. I guess do this. Leave __iadd__/__add__ with the conversion to an Index. In the other methods, convert to an Index if its not one (Index([ other ])) and then perform the op. disable __isub__,__iand__,__ior__.

you can test what happens in the test suite if you make __add__ do only set addition (rather than distribution), .e.g

pd.Index([1,2,3]) +1 == pd.Index([1,2,3])
pd.Index(['foo','bar']) + 'a' == pd.Index(['foo','bar','a'])

which actually make more sense

see if lots of things break

jreback · 2013-09-29T17:52:50Z

@d10genes after some discussion off-line.....let's leave the existing behavior....its used in places...

so just make the change to iadd and that's it for now....

jtratner · 2013-09-29T19:00:29Z

@jreback the other thing we need to do in that exception is to set i=None before everything
because if enumerate fails both times in the generator, i is never bound, so it's not defined there. So instead of this:

                    if hasattr(e, 'args'):
                        k = res_index[i]
                        e.args = e.args + ('occurred at index %s' %
                                           com.pprint_thing(k),)

Should do

                    if hasattr(e, 'args'):
                        if i is not None:
                            k = res_index[i]
                            e.args = e.args + ('occurred at index %s' %
                                               com.pprint_thing(k),)

and change a few lines earlier to:

        i = None
        keys = []
        results = {}
        if ignore_failures:
            successes = []
            for i, v in enumerate(series_gen):

That way you can always use i

jtratner · 2013-09-29T19:00:55Z

(which, I now understand, is what you meant about the generator producing an error)

wcbeard · 2013-09-29T19:49:11Z

@jreback Ok, just so I'm clear

iadd = add: do
isub = sub: do
disable those other magic methods: don't do
wrap sub, or, and with Index conversion: don't do

Also, hope it's ok that I'm not sure what @jtratner is talking about...

jreback · 2013-09-29T19:55:03Z

@jtratner can do that in another PR

iadd = add
disable all other i* methods (including isub)

leave everything else

jtratner · 2013-09-29T20:00:24Z

@jreback why do you want to disable them? if they don't exist it'll just reassign the variable to the new object, right?

jreback · 2013-09-29T20:05:14Z

well, they dont' do the right thing if they are not converted to Index (as __add__ does), so either we fix that or disable (as they will raise that exception evenutally showing up in .apply). But I think they should simply be disabled unless/until we proactively decide otherwise.

wcbeard mentioned this issue Sep 30, 2013

BUG: fix Index's __iadd__ methods #5053

Merged

jreback closed this as completed in c66a5fd Oct 2, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changing columns with `+=` gives UnboundLocalError #4996

Changing columns with `+=` gives UnboundLocalError #4996

wcbeard commented Sep 26, 2013

jreback commented Sep 26, 2013

wcbeard commented Sep 26, 2013

jreback commented Sep 26, 2013

wcbeard commented Sep 28, 2013

jreback commented Sep 28, 2013

jreback commented Sep 28, 2013

wcbeard commented Sep 28, 2013

jreback commented Sep 28, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

wcbeard commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

jreback commented Sep 29, 2013

jtratner commented Sep 29, 2013

jtratner commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

jtratner commented Sep 29, 2013

jreback commented Sep 29, 2013

Changing columns with += gives UnboundLocalError #4996

Changing columns with += gives UnboundLocalError #4996

Comments

wcbeard commented Sep 26, 2013

jreback commented Sep 26, 2013

wcbeard commented Sep 26, 2013

jreback commented Sep 26, 2013

wcbeard commented Sep 28, 2013

jreback commented Sep 28, 2013

jreback commented Sep 28, 2013

wcbeard commented Sep 28, 2013

jreback commented Sep 28, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

wcbeard commented Sep 29, 2013

wcbeard commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

jreback commented Sep 29, 2013

jtratner commented Sep 29, 2013

jtratner commented Sep 29, 2013

wcbeard commented Sep 29, 2013

jreback commented Sep 29, 2013

jtratner commented Sep 29, 2013

jreback commented Sep 29, 2013

Changing columns with `+=` gives UnboundLocalError #4996

Changing columns with `+=` gives UnboundLocalError #4996