Stack overflow on applying numpy functions to DataFrame with duplicated column entries. #11611

skycaptain · 2015-11-16T11:05:38Z

Applying a NumPy function such as np.round to a DataFrame with duplicated column labels can cause an unrecoverable stack overflow (Fatal Python error: Cannot recover from stack overflow.), which crashes, for example, an IPython kernel. In the following example, Python crashes at the np.round call on the last line:

import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randn(3, 3))
y = pd.DataFrame(np.random.randn(3, 3))
z = pd.concat((x, y), axis=1)  # columns are 0, 1, 2, 0, 1, 2
print(np.round(z))

However, avoiding the duplicated column labels works as expected:

...
z = pd.concat((x, y), axis=1, ignore_index=True)
print(np.round(z))
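For anyone stuck on pandas 0.17.0 who wants to keep the duplicated column labels rather than renumber them with ignore_index=True, a minimal workaround sketch is to round the underlying NumPy array and rebuild the frame around it, so no column is ever looked up by label. The helper name round_keep_labels is hypothetical, and the sketch assumes every column is numeric.

```python
import numpy as np
import pandas as pd


def round_keep_labels(df, decimals=0):
    """Hypothetical helper: round all values without label-based column lookup.

    Rounds the underlying ndarray with np.round, then rebuilds a DataFrame
    with the original index and (possibly duplicated) column labels.
    Assumes all columns are numeric.
    """
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, index=df.index, columns=df.columns)


x = pd.DataFrame(np.random.randn(3, 3))
y = pd.DataFrame(np.random.randn(3, 3))
z = pd.concat((x, y), axis=1)  # columns are 0, 1, 2, 0, 1, 2

print(z.columns.is_unique)  # False
print(round_keep_labels(z, 2))
```

Because the labels are only reattached at the end, duplicates never reach the code path that selects columns by name.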

python 3.5.0, numpy 1.10.1, pandas 0.17.0


jreback · 2015-11-16T12:55:11Z

This is specific to np.round, which ends up calling DataFrame.round, and DataFrame.round does not handle duplicate columns properly.

The iteration there needs to use .iteritems(), which correctly yields each column even when labels are duplicated, rather than selecting columns by label.

Pull requests to fix this are welcome.
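To illustrate the distinction described above: when a label is duplicated, selecting a column by that label returns a DataFrame holding every matching column rather than a single Series, whereas iterating over the columns yields one Series per column position. The sketch below shows both, plus a hypothetical round_by_items helper that rounds column by column and reassembles the frame. It is an illustration of the idea, not the patch merged in #11618. Note that .iteritems() was later deprecated and removed in pandas 2.0; in current pandas the equivalent method is .items(), which the sketch uses.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randn(3, 3))
y = pd.DataFrame(np.random.randn(3, 3))
z = pd.concat((x, y), axis=1)  # columns are 0, 1, 2, 0, 1, 2

# Label-based selection: label 0 matches two columns, so a DataFrame comes back.
print(type(z[0]).__name__)  # DataFrame
print(z[0].shape)  # (3, 2)

# Positional iteration: one (label, Series) pair per column, duplicates included.
for label, col in z.items():
    print(label, type(col).__name__)


def round_by_items(df, decimals=0):
    """Hypothetical helper: round column by column via .items().

    Each column is handled as a Series whether or not its label is duplicated;
    pd.concat then reassembles the columns in their original order, reusing
    each Series' name as its column label.
    """
    rounded = [col.round(decimals) for _, col in df.items()]
    return pd.concat(rounded, axis=1)


print(round_by_items(z, 2))
```

Working per position means each column's label is carried along as data (the Series name) instead of being used as a lookup key, which is what keeps duplicates from colliding.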

BUG: decimals must be unique indexed, #11618
BUG: Added test, added whatsnew entry, #11618
TST: move round testing to test_format.py
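The "decimals must be unique indexed" commit concerns the per-column form of DataFrame.round, where decimals is a dict or Series mapping column labels to a number of decimal places. If that Series itself repeated a label, it would be ambiguous which precision applies, so it is rejected. A minimal sketch of how this surfaces in current pandas; the exact error message is not relied on here, so the example only catches ValueError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.2345, 2.3456], "b": [3.4567, 4.5678]})

# Per-column precision: column "a" to 1 decimal, column "b" to 2 decimals.
ok = df.round(pd.Series([1, 2], index=["a", "b"]))
print(ok)

# A decimals Series whose index repeats a label is ambiguous and is rejected.
bad_decimals = pd.Series([1, 2], index=["a", "a"])
try:
    df.round(bad_decimals)
except ValueError as exc:
    print("rejected:", exc)
```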

jreback · 2015-11-20T13:55:35Z

closed by #11618

Version 0.17.1
* tag 'v0.17.1': (168 commits)
  add nbviewer link
  Revert "DOC: fix sponsor notice"
  DOC: a few touchups
  DOC: fix sponsor notice
  DOC: warnings and remove HTML
  COMPAT: compat of scalars on all platforms, xref pandas-dev#11638
  DOC: fix build errors/warnings
  DOC: whatsnew edits
  DOC: fix link syntax
  DOC: update release.rst / whatsnew edits
  BUG: fix col iteration in DataFrame.round, pandas-dev#11611
  DOC: Clarify foramtting
  BUG: pandas-dev#11638 return correct dtype for int and float
  BUG: pandas-dev#11637 fix to_csv incorrect output.
  DOC: sponsor notice
  BUG: indexing with a range , pandas-dev#11652
  Fix link to numexpr
  ENH: fixup tilde expansion, xref pandas-dev#11438
  ENH: tilde expansion for write output formatting functions, pandas-dev#11438
  DOC: fix up doc-string creations in generic.py
  ...

jreback added the Bug, Reshaping, Difficulty Novice, and Compat labels Nov 16, 2015

jreback added this to the Next Major Release milestone Nov 16, 2015

skycaptain mentioned this issue Nov 16, 2015

#11618: BUG: fix col iteration in DataFrame.round, #11611 (Closed)

jreback modified the milestones: 0.17.1, Next Major Release Nov 20, 2015

jreback pushed a commit that referenced this issue Nov 20, 2015

BUG: fix col iteration in DataFrame.round, #11611

80a2d53


jreback closed this as completed Nov 20, 2015
