Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing columns with += gives UnboundLocalError #4996

Closed
wcbeard opened this issue Sep 26, 2013 · 27 comments · Fixed by #5053
Closed

Changing columns with += gives UnboundLocalError #4996

wcbeard opened this issue Sep 26, 2013 · 27 comments · Fixed by #5053
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@wcbeard
Copy link
Contributor

wcbeard commented Sep 26, 2013

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: np.random.seed(10)
In [4]: a = np.random.randint(0, 20, (6, 5))
In [5]: df = pd.DataFrame(a, columns=list('abcde'))
In [6]: df
Out[6]:
    a   b   c   d   e
0   9   4  15   0  17
1  16  17   8   9   0
2  10   8   4  19  16
3   4  15  11  11   1
4   8   4  14  17  19
5  13   5  13  19  13

Changing columns with + is fine

In [7]: df.columns = df.columns + '_x'
In [8]: df
Out[8]:
   a_x  b_x  c_x  d_x  e_x
0    9    4   15    0   17
1   16   17    8    9    0
2   10    8    4   19   16
3    4   15   11   11    1
4    8    4   14   17   19
5   13    5   13   19   13

The += operator on the other hand correctly changes the columns,

In [9]: df.columns += '_y'
In [10]: df.columns
Out[10]: Index([u'a_x_y', u'b_x_y', u'c_x_y', u'd_x_y', u'e_x_y'], dtype=object)

but gives an error when trying to display the whole dataframe:

In [11]: df
Out[11]: ---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)

Traceback is here. Not sure if it's an ipython or pandas issue. pd.__version__ == '0.12.0-371-g1434776'

@jreback
Copy link
Contributor

jreback commented Sep 26, 2013

this is a trivial fix, just define __iadd__ = __add__ in core/index.py

need a test, and I think it makes sense to disable methods that don't make sense on an index

mul,trudev,floordiv,pow
imul,itruediv,iflloordiv,ipow

disabling is easy (just set them like incore/base.py/FrozenNDArray )

do a PR?

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 26, 2013

do a PR?

I'd love to, but I'll probably have to wait till after work tomorrow if there's no rush.

@jreback
Copy link
Contributor

jreback commented Sep 26, 2013

no problem...

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 28, 2013

@jreback I think disabling __floordiv__ breaks pandas.tools.tests.test_pivot.TestPivotTable (there's a data.index // 5 line in there). How about disabling all but that one?

@jreback
Copy link
Contributor

jreback commented Sep 28, 2013

just make that

data.index.to_series() // 5

that is an incorrect usage (and the reason these should be disabled!)

@jreback
Copy link
Contributor

jreback commented Sep 28, 2013

pls also add a note in doc/source/release.rst (API section about this change, referencing the issue)..thanks

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 28, 2013

about the __iadd__ fix, or just the disabled methods?

@jreback
Copy link
Contributor

jreback commented Sep 28, 2013

you fixed iadd? I meant also isub too (__isub__ = __sub__

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

Well, fixed insofar as I aliased to add. I'll do isub too.

Sent from my iPhone

On Sep 28, 2013, at 7:35 PM, jreback notifications@github.com wrote:

you fixed iadd? I meant also isub too (isub = sub


Reply to this email directly or view it on GitHub.

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

@d10genes sorry..didn't mean the ?. I saw that you fixed iadd, wanted you to add isub (which you just said you are doing)...so good!

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

Also, just so I understand you, redefining __isub__ this way will only make a difference for set difference between the index and another iterable (since there's not really a string - analogue to string concatenation with +)?

__isub__ currently breaks for set differences like this, and your suggestion would fix it, just want to make sure this was your intention (I didn't know about the set +/- operations with Index before...)

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

yep....+/- so set ops on Index (with a single string being coerced to an index). Possibly an odd design, but it does make sense for this type of object.

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

Ok, I understand, but I don't think subtracting a single string coerces it to an index:

In [1]: import pandas as pd

In [2]: a = pd.Index(list('abc'))

In [3]: a
Out[3]: Index([u'a', u'b', u'c'], dtype=object)

In [4]: a - 'a'

gives a Exception: Input must be iterable! (Index.diff does a check for __iter__ attribute at the beginning). Should this behavior (not coercing a string to index for -) be filed separately? Or did you mean something else?

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

that's a bug.....I think __sub__,__and__,__or__ should all be similar to __add__ in that they do the op if its an index, other wise wrap and try again.

go ahead and put it in here (add some tests as well)

you can make these wrapped methods if you want to avoid some boilerplate (e.g. have a function factory)

lmk

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

Ok, but how should the wrapping go? Currently, for my example above you can't do pd.Index('a') (TypeError: Index(...) must be called with a collection of some kind, 'a' was passed). Should we do something like coercing to index with Index(list('a')) (which would work like expected)?

If so, it seems preferable to do an if isinstance(arg, Index) check, rather than a try: method(arg); except: method(Index(list(arg))). Do you agree?

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

no...what I mean is make all of the methods like __add__ is now, when I say wrap, something like

def _wrap_op(f):

     def _wrapped(self, other):
         if not isinstance(other, Index):
              other = Index([ other ])
         return getattr(self,f)(other)
      return _wrapped

__add__ = _wrap_op('union')
__sub__ = _wrap_op('intersection')

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

Ok, I see what you mean. For the Index conversion part, __add__ uses Index(self.view(np.ndarray) + other). Don't fully understand it, but I'll try that for the wrapper.

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

Something implicit in that wrapper method is that Int64Index (and numerical indices) will no longer do addition the same way. Currently,

b = pd.Index([1, 2, 3])
b + 1

gives Int64Index([2, 3, 4], dtype=int64). This new way would instead convert to Index([1]) and then do a set union. (you'd need an explicit .to_series() conversion first for elementwise addition).

Is that how it should work?

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

Also, this would get rid of the string concatenating behavior that inspired this fix in the first place. Index's __add__ uses numpy array addition if the second arg isn't an Index, so that's how it does both string concatenation and integer/float addition (both of which would be bypassed by our wrapper method).

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

numpy is doing string summation so that makes sense. I guess do this. Leave __iadd__/__add__ with the conversion to an Index. In the other methods, convert to an Index if its not one (Index([ other ])) and then perform the op. disable __isub__,__iand__,__ior__.

you can test what happens in the test suite if you make __add__ do only set addition (rather than distribution), .e.g

pd.Index([1,2,3]) +1 == pd.Index([1,2,3])
pd.Index(['foo','bar']) + 'a' == pd.Index(['foo','bar','a'])

which actually make more sense

see if lots of things break

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

@d10genes after some discussion off-line.....let's leave the existing behavior....its used in places...

so just make the change to iadd and that's it for now....

@jtratner
Copy link
Contributor

@jreback the other thing we need to do in that exception is to set i=None before everything
because if enumerate fails both times in the generator, i is never bound, so it's not defined there. So instead of this:

                    if hasattr(e, 'args'):
                        k = res_index[i]
                        e.args = e.args + ('occurred at index %s' %
                                           com.pprint_thing(k),)

Should do

                    if hasattr(e, 'args'):
                        if i is not None:
                            k = res_index[i]
                            e.args = e.args + ('occurred at index %s' %
                                               com.pprint_thing(k),)

and change a few lines earlier to:

        i = None
        keys = []
        results = {}
        if ignore_failures:
            successes = []
            for i, v in enumerate(series_gen):

That way you can always use i

@jtratner
Copy link
Contributor

(which, I now understand, is what you meant about the generator producing an error)

@wcbeard
Copy link
Contributor Author

wcbeard commented Sep 29, 2013

@jreback Ok, just so I'm clear

  • iadd = add: do
  • isub = sub: do
  • disable those other magic methods: don't do
  • wrap sub, or, and with Index conversion: don't do

Also, hope it's ok that I'm not sure what @jtratner is talking about...

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

@jtratner can do that in another PR

  • iadd = add
  • disable all other i* methods (including isub)

leave everything else

@jtratner
Copy link
Contributor

@jreback why do you want to disable them? if they don't exist it'll just reassign the variable to the new object, right?

@jreback
Copy link
Contributor

jreback commented Sep 29, 2013

well, they dont' do the right thing if they are not converted to Index (as __add__ does), so either we fix that or disable (as they will raise that exception evenutally showing up in .apply). But I think they should simply be disabled unless/until we proactively decide otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants