Dataframe rename issue. #4403

Closed
halleygithub opened this issue Jul 30, 2013 · 31 comments · Fixed by #4410


@halleygithub

commented Jul 30, 2013

I just upgraded from 0.11 to 0.12 and hit a DataFrame rename error after the upgrade. (It worked well in 0.11.)

>>> df4
                 TClose      RT    TExg
STK_ID RPT_Date                        
600809 20130331   22.02  0.0454  0.0422

>>> df5
                 STK_ID  RPT_Date STK_Name  TClose
STK_ID RPT_Date                                   
600809 20120930  600809  20120930     山西汾酒   38.05
       20121231  600809  20121231     山西汾酒   41.66
       20130331  600809  20130331     山西汾酒   30.01

>>> k=pd.merge(df4, df5, how='inner', left_index=True, right_index=True)
>>> k
                 TClose_x      RT    TExg  STK_ID  RPT_Date STK_Name  TClose_y
STK_ID RPT_Date                                                               
600809 20130331     22.02  0.0454  0.0422  600809  20130331     山西汾酒     30.01

>>> k.rename(columns={'TClose_x':'TClose', 'TClose_y':'QT_Close'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\Python27\lib\site-packages\pandas\core\base.py", line 40, in __repr__
    return str(self)
  File "d:\Python27\lib\site-packages\pandas\core\base.py", line 20, in __str__
    return self.__bytes__()
  File "d:\Python27\lib\site-packages\pandas\core\base.py", line 32, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 668, in __unicode__
    self.to_string(buf=buf)
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 1556, in to_string
    formatter.to_string()
  File "d:\Python27\lib\site-packages\pandas\core\format.py", line 294, in to_string
    strcols = self._to_str_columns()
  File "d:\Python27\lib\site-packages\pandas\core\format.py", line 239, in _to_str_columns
    str_columns = self._get_formatted_column_labels()
  File "d:\Python27\lib\site-packages\pandas\core\format.py", line 435, in _get_formatted_column_labels
    dtypes = self.frame.dtypes
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 1696, in dtypes
    return self.apply(lambda x: x.dtype)
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 4416, in apply
    return self._apply_standard(f, axis)
  File "d:\Python27\lib\site-packages\pandas\core\frame.py", line 4491, in _apply_standard
    raise e
TypeError: ("'NoneType' object is not iterable", u'occurred at index TExg')

>>> df4.dtypes
TClose    float64
RT        float64
TExg      float64
dtype: object

>>> df5.dtypes
STK_ID       object
RPT_Date     object
STK_Name     object
TClose      float64
dtype: object
>>> 
@jreback

Contributor

commented Jul 30, 2013

can you supply a reproducible example for these initial frames (e.g. a function that builds them exactly)?

e.g. something that can be eval'ed to create them, because we need to reproduce the unicode characters
(this is a unicode error; it just happens to show up in the dtype printing)

DataFrame([['foo',1.0....])
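
One way to produce such a pasteable constructor, sketched under assumptions: the frame below is a hypothetical stand-in for the reporter's data, and the approach relies on `DataFrame.to_dict(orient="split")` from recent pandas versions, which splits a frame into plain lists of row labels, column labels, and values. The index names are not part of that dict, so they are carried separately.

```python
import pandas as pd

# Hypothetical stand-in for a frame you want to share in a bug report.
index = pd.MultiIndex.from_tuples(
    [("600809", "20120930"), ("600809", "20121231")],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {"STK_Name": ["山西汾酒", "山西汾酒"], "TClose": [38.05, 41.66]},
    index=index,
)

# orient="split" yields {"index": [...], "columns": [...], "data": [...]}.
spec = df.to_dict(orient="split")
index_names = list(df.index.names)

# repr(spec) is the text to paste into the report (print it to copy it).
pasteable = repr(spec)

# Whoever receives the pasted values can rebuild the frame like this.
rebuilt = pd.DataFrame(
    spec["data"],
    index=pd.MultiIndex.from_tuples(spec["index"], names=index_names),
    columns=spec["columns"],
)
```

Because the values travel as literals, the non-ASCII strings survive the round trip, which is what matters for a report like this one.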

@cpcloud

Member

commented Jul 30, 2013

i think that's a possibly spurious raise there...it should probably be a bare raise since NoneType not being iterable is not informative

@cpcloud

Member

commented Jul 30, 2013

i can repro this using the above frames

@halleygithub please supply some code to create the above frames.

there's a bug in icol or BlockManager.iget

@cpcloud

Member

commented Jul 30, 2013

ahh duplicate TExg block somehow...

@cpcloud

Member

commented Jul 30, 2013

we really need to remove that raise e there; taking it out was the only way i was able to figure out this was in internals

@jreback

Contributor

commented Jul 30, 2013

no that raise is correct

just str(df)

@cpcloud

Member

commented Jul 30, 2013

huh? the raise doesn't show the correct location of the exception because it catches everything

here's part of the traceback

/home/phillip/Documents/code/py/pandas/pandas/core/frame.pyc in dtypes(self)
   1685     @property
   1686     def dtypes(self):
-> 1687         return self.apply(lambda x: x.dtype)
   1688
   1689     def convert_objects(self, convert_dates=True, convert_numeric=False, copy=True):

/home/phillip/Documents/code/py/pandas/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4397                     return self._apply_raw(f, axis)
   4398                 else:
-> 4399                     return self._apply_standard(f, axis)
   4400             else:
   4401                 return self._apply_broadcast(f, axis)

/home/phillip/Documents/code/py/pandas/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4472                     # no k defined yet
   4473                     pass
-> 4474                 raise e
   4475
   4476

TypeError: ("'NoneType' object is not iterable", u'occurred at index TExg')

this doesn't tell me anything about the location of the raise except that it was somewhere in looping thru series_gen

only when i removed e did the full traceback show up

maybe there's a way to show that without removing the e...

how would it be different anyway? would the possibly caught NameError / UnboundLocalError be raised instead?
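
A minimal sketch of the pattern under discussion, not the actual pandas source: `failing_step` and `apply_each` are hypothetical names. On Python 2 (used in this thread), `raise e` restarts the traceback at the re-raise line, while a bare `raise` keeps the original one. On Python 3 the exception object carries its own traceback, so both forms still show the failing frame.

```python
import traceback


def failing_step(label):
    # Hypothetical stand-in for the internal call that actually fails.
    raise TypeError("'NoneType' object is not iterable")


def apply_each(labels):
    # Hypothetical loop: annotate the exception with the current label,
    # then re-raise it without discarding where it came from.
    for label in labels:
        try:
            failing_step(label)
        except Exception as e:
            e.args = e.args + ("occurred at index %s" % label,)
            raise  # bare raise re-raises the active exception as-is


try:
    apply_each(["TExg"])
except TypeError as err:
    # The last traceback entry is the frame where the error originated.
    innermost = traceback.extract_tb(err.__traceback__)[-1].name
    message = err.args
```

Here `innermost` names the function that really failed, and `message` still carries the extra "occurred at index" note, so the annotation and the full traceback are not mutually exclusive.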

@cpcloud

Member

commented Jul 30, 2013

In [4]: df4 = DataFrame({'TClose': [22.02], 'RT': [0.0454], 'TExg': [0.0422]}, index=MultiIndex.from_tuples([(600809, 20130331)], names=['STK_ID', 'RPT_Date']))

In [5]: df5 = DataFrame({'STK_ID': [600809] * 3, 'RPT_Date': [20120930, 20121231, 20130331], 'STK_Name': [u'饡驦', u'饡驦', u'饡驦'], 'TClose': [38.05, 41.66, 30.01]}, index=MultiIndex.from_tuples([(600809, 20120930), (600809, 20121231), (600809, 20130331)], names=['STK_ID', 'RPT_Date']))

In [6]: k = merge(df4,df5,how='inner',left_index=True,right_index=True)

different characters but same error results.
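
The session above can be collected into one standalone script for checking a given install. This is a sketch that assumes a pandas version containing the fix; the variable names `renamed` and `rendered` are illustrative. It uses `to_string()` because the original failure surfaced while rendering the renamed frame.

```python
import pandas as pd

names = ["STK_ID", "RPT_Date"]

df4 = pd.DataFrame(
    {"TClose": [22.02], "RT": [0.0454], "TExg": [0.0422]},
    index=pd.MultiIndex.from_tuples([(600809, 20130331)], names=names),
)

df5 = pd.DataFrame(
    {
        "STK_ID": [600809] * 3,
        "RPT_Date": [20120930, 20121231, 20130331],
        "STK_Name": ["饡驦", "饡驦", "饡驦"],
        "TClose": [38.05, 41.66, 30.01],
    },
    index=pd.MultiIndex.from_tuples(
        [(600809, 20120930), (600809, 20121231), (600809, 20130331)],
        names=names,
    ),
)

# Inner join on the shared MultiIndex; the overlapping TClose column
# gets the default _x / _y suffixes.
k = pd.merge(df4, df5, how="inner", left_index=True, right_index=True)

# The rename from the original report.
renamed = k.rename(columns={"TClose_x": "TClose", "TClose_y": "QT_Close"})

# Rendering and dtypes were the steps that raised in the report.
rendered = renamed.to_string()
dtypes = renamed.dtypes
```

If the install is affected, the `to_string()` or `dtypes` line should raise the `TypeError` shown in the report instead of completing.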

@cpcloud

Member

commented Jul 30, 2013

curiously if you type store k then restart ipython, type store -r k and then

k.rename(columns={'TClose_x':'TClose'})

the error does not show up 😠

@jreback

Contributor

commented Jul 30, 2013

I think there is a pr out there to take out the e

but regardless, the apply hits the error but it's really in the construction

can u post your creation example?

@cpcloud

Member

commented Jul 30, 2013

it's there

@cpcloud

Member

commented Jul 30, 2013

this seems fishy

ipdb> self.items
Index([u'RT', u'TClose', u'TExg', u'RPT_Date', u'STK_ID', u'STK_Name', u'TClose_y'], dtype=object)
ipdb> self.blocks
[ObjectBlock: [TExg], 1 x 1, dtype object, IntBlock: [RT, TClose], 2 x 1, dtype int64, FloatBlock: [RT, TClose, TExg, TClose_y], 4 x 1, dtype float64]

where is RPT_Date in the blocks?

@jreback

Contributor

commented Jul 31, 2013

@halleygithub thanks for the report
turned out to be a very subtle issue

@halleygithub

Author

commented Jul 31, 2013

I've attached a cPickle dump of (df4, df5) here: http://ajqznkugcw.l25.yunpan.cn/lk/QnPqhJRCMdspq

You can download it to take a look if you want.

It seems the issue is solved. How can I resolve my problem? Can I get the latest daily development builds of the pandas Windows binaries from http://pandas.pydata.org/pandas-build/dev/ ?

My application hit several issues after upgrading, and I need to test them one by one. Thanks.

@halleygithub

Author

commented Jul 31, 2013

OK. I manually patched merge.py and got things running. I'm still hoping for binary builds. Thanks.

@jreback

Contributor

commented Jul 31, 2013

Great
periodically check back for the dev builds

@smcinerney


commented Aug 15, 2013

Can you please add this as a known issue in the 0.12 whatsnew, along with the DeprecationWarnings?

@cpcloud

Member

commented Aug 15, 2013

We can add it in the dev docs, but i'm pretty sure things are "frozen" for 0.12 stuff

@cpcloud

Member

commented Aug 15, 2013

Would you like to submit a pull request?

@jtratner

Contributor

commented Aug 15, 2013

@cpcloud Could pandas do a point release? (maybe before @jreback's Series refactor).

@cpcloud

Member

commented Aug 15, 2013

possibly, although i'm not sure if we ever came to a consensus there

cc @y-p since he suggested it on the dev mailing list a little before 0.12 came out

i'm 👍 on doing a point release

@wesm ?

@wesm

Member

commented Aug 15, 2013

What's the status of master? Do we need to create a maintenance branch and start backporting bug fixes?

@jreback

Contributor

commented Aug 15, 2013

this is fixed in 0.12 IIRC

@jreback

Contributor

commented Aug 15, 2013

actually master at this point is ok if u really wanted to release

@smcinerney


commented Aug 15, 2013

merge() is broken in the 0.12 macports release I got yesterday

@cpcloud

Member

commented Aug 15, 2013

Can you be a bit more specific than just "broken"? Please open an issue if you can.

@jtratner

Contributor

commented Aug 15, 2013

@jreback this is not fixed in 0.12. checkout of v0.12.0 and running this (on OSX) still causes the failure described above.

@smcinerney


commented Aug 15, 2013

I'm saying that this issue 4403 (merge breaks on indexing) is still in the 0.12 release on macports. People will hit this, and at minimum it needs to be documented as a known issue in the whatsnew, or some such. I had to manually apply the changes from pull request 4410.

@jreback

Contributor

commented Aug 16, 2013

@jtratner I stand corrected, this was fixed in early 0.13
but the fact remains that this is actually pretty hard to reproduce
you have to do very specific things to create it

IMHO this is not worth a 0.12.1 at this point
lets figure out a timeline for 0.13

@smcinerney


commented Aug 16, 2013

@jreback, does it not occur on (any?) df merge with a non-unique index?

Separate to the timeline for merging the fix, I'm suggesting this be noted in the 0.12 whatsnew.

@jreback

Contributor

commented Aug 16, 2013

@smcinerney no, this only occurs when the merge leaves you with a non-unique index and you then rename the result

I am not averse to posting something in the docs, though I have found that people usually just ask on SO or the mailing list, or post an issue

since everyone is now aware I think we can respond pretty easily

(even issues that have really big and bold warnings are often ignored in the docs :)
