Stack overflow on applying numpy functions to DataFrame with duplicated column entries. #11611

Closed
skycaptain opened this issue Nov 16, 2015 · 2 comments
Labels
Bug; Compat (pandas objects compatibility with NumPy or Python functions); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@skycaptain
Contributor

Applying a NumPy function such as np.round to a DataFrame with duplicated column labels can cause an unrecoverable stack overflow (Fatal Python error: Cannot recover from stack overflow.), which crashes, for example, an IPython kernel. Take the following example, where Python crashes on the last line:

import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randn(3, 3))
y = pd.DataFrame(np.random.randn(3, 3))
z = pd.concat((x, y), axis=1)  # column labels are now 0, 1, 2, 0, 1, 2
print(np.round(z))  # crashes here with the versions listed below

However, removing the duplicate column labels works as expected:

...
z = pd.concat((x, y), axis=1, ignore_index=True)
print(np.round(z))
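If the duplicated labels have to be kept, one possible workaround is to round the underlying NumPy array and rebuild the frame around it. This is only a sketch under that assumption, not something proposed in this thread; the names has_duplicates and rounded are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randn(3, 3))
y = pd.DataFrame(np.random.randn(3, 3))
z = pd.concat((x, y), axis=1)  # column labels: 0, 1, 2, 0, 1, 2

# Duplicate labels can be detected up front via Index.is_unique.
has_duplicates = not z.columns.is_unique

# Round the raw ndarray, which never goes through DataFrame.round,
# then reattach the original index and (duplicated) column labels.
rounded = pd.DataFrame(np.round(z.values), index=z.index, columns=z.columns)
print(rounded)
```

Unlike ignore_index=True, this keeps the original column labels, at the cost of bypassing any pandas-specific rounding logic.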

Python 3.5.0, NumPy 1.10.1, pandas 0.17.0

@jreback
Contributor

jreback commented Nov 16, 2015

This is specific to np.round, which ends up calling DataFrame.round, and DataFrame.round does not handle duplicate column labels properly.

The iteration needs to use .iteritems(), which handles duplicate labels correctly, rather than selecting columns by label.

Pull requests to fix this are welcome.
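To illustrate the difference between column selection and per-column iteration, here is a minimal sketch. It is not the patch that was merged: round_columns is a hypothetical helper, and it uses .items(), the spelling available in current pandas, where the comment refers to .iteritems().

```python
import pandas as pd

df = pd.DataFrame([[1.234, 5.678, 9.1011]], columns=["a", "b", "a"])

# Selecting a duplicated label returns a DataFrame (both "a" columns),
# not a Series, so code that expects a single column misbehaves.
selected = df["a"]
print(type(selected).__name__, selected.shape)

# items() yields one (label, Series) pair per physical column,
# including each of the duplicated ones.
pairs = list(df.items())


def round_columns(frame, decimals=0):
    """Hypothetical helper: round column by column via items()."""
    rounded = [col.round(decimals) for _, col in frame.items()]
    # Reassemble side by side; the duplicated labels are preserved.
    return pd.concat(rounded, axis=1)


result = round_columns(df, decimals=1)
print(result)
```

Iterating over the columns themselves sidesteps label lookup entirely, which is why it stays well defined when labels repeat.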

@jreback jreback added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Difficulty Novice, and Compat (pandas objects compatibility with NumPy or Python functions) labels Nov 16, 2015
@jreback jreback added this to the Next Major Release milestone Nov 16, 2015
@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 20, 2015
jreback pushed a commit that referenced this issue Nov 20, 2015
BUG: decimals must be unique indexed, #11618

BUG: Added test, added whatsnew entry, #11618

TST: move round testing to test_format.py
@jreback
Contributor

jreback commented Nov 20, 2015

closed by #11618

@jreback jreback closed this as completed Nov 20, 2015
yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 3, 2015
Version 0.17.1

* tag 'v0.17.1': (168 commits)
  add nbviewer link
  Revert "DOC: fix sponsor notice"
  DOC: a few touchups
  DOC: fix sponsor notice
  DOC: warnings and remove HTML
  COMPAT: compat of scalars on all platforms, xref pandas-dev#11638
  DOC: fix build errors/warnings
  DOC: whatsnew edits
  DOC: fix link syntax
  DOC: update release.rst / whatsnew edits
  BUG: fix col iteration in DataFrame.round, pandas-dev#11611
  DOC: Clarify foramtting
  BUG: pandas-dev#11638 return correct dtype for int and float
  BUG: pandas-dev#11637 fix to_csv incorrect output.
  DOC: sponsor notice
  BUG: indexing with a range , pandas-dev#11652
  Fix link to numexpr
  ENH: fixup tilde expansion, xref pandas-dev#11438
  ENH: tilde expansion for write output formatting functions, pandas-dev#11438
  DOC: fix up doc-string creations in generic.py
  ...