Stack overflow on applying numpy functions to DataFrame with duplicated column entries. #11611

Closed
skycaptain opened this Issue Nov 16, 2015 · 2 comments

skycaptain commented Nov 16, 2015

Applying a NumPy function such as np.round to a DataFrame with duplicated column labels can cause an unrecoverable stack overflow (Fatal Python error: Cannot recover from stack overflow.), which crashes, for example, an IPython kernel. Take the following example, where Python crashes on the last line:

import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randn(3, 3))
y = pd.DataFrame(np.random.randn(3, 3))
z = pd.concat((x, y), axis=1)  # column labels 0, 1, 2 now appear twice
print(np.round(z))  # crashes here

However, once the duplicate column labels are removed, the same call works as expected:

...
z = pd.concat((x, y), axis=1, ignore_index=True)
print(np.round(z))

Versions: Python 3.5.0, NumPy 1.10.1, pandas 0.17.0
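A minimal sketch of the workaround described above, assuming a pandas version that provides Index.duplicated(). The variable names z_dup, z_unique, has_duplicates and rounded are only illustrative. It checks for repeated column labels first and only rounds the frame whose labels are unique, so it does not trigger the crash itself.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randn(3, 3))
y = pd.DataFrame(np.random.randn(3, 3))

# Side-by-side concatenation keeps each frame's own labels,
# so the labels 0, 1, 2 each appear twice.
z_dup = pd.concat((x, y), axis=1)
has_duplicates = bool(z_dup.columns.duplicated().any())

# ignore_index=True relabels the columns 0..5, making every label unique.
z_unique = pd.concat((x, y), axis=1, ignore_index=True)

# Rounding the uniquely labelled frame is the case reported as working.
rounded = np.round(z_unique)
```

Checking has_duplicates before calling a NumPy function on the frame is one way to avoid the crash on affected versions.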

jreback commented Nov 16, 2015

This is specific to np.round, which ends up calling DataFrame.round, which does not handle duplicate column labels properly.

The iteration needs to use .iteritems(), which handles duplicates correctly, rather than column selection.

Pull requests to fix this are welcome.
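To illustrate the difference between column selection and iteration mentioned above, here is a small sketch. The frame df and its labels are made up for the example, and it assumes a recent pandas release, where the same iteration is spelled .items() rather than .iteritems(). It is not the code from the actual fix.

```python
import pandas as pd

# Hypothetical frame in which the label "a" is used for two columns.
df = pd.DataFrame([[1.25, 2.5, 3.75]], columns=["a", "a", "b"])

# Selecting by label returns every column carrying that label, so a
# duplicated label gives back a DataFrame instead of a single Series.
selected = df["a"]

# Iterating yields one (label, Series) pair per physical column,
# including both columns that share the label "a".
pairs = [(label, type(col).__name__) for label, col in df.items()]
```

Code that expects df[label] to be a single Series therefore sees a DataFrame when the label is duplicated, while iterating over the columns always hands it one Series at a time.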

@jreback jreback added this to the Next Major Release milestone Nov 16, 2015

@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 20, 2015

jreback added a commit that referenced this issue Nov 20, 2015

BUG: fix col iteration in DataFrame.round, #11611
BUG: decimals must be unique indexed, #11618

BUG: Added test, added whatsnew entry, #11618

TST: move round testing to test_format.py

jreback commented Nov 20, 2015

closed by #11618

@jreback jreback closed this Nov 20, 2015

yarikoptic added a commit to neurodebian/pandas that referenced this issue Dec 3, 2015

Merge tag 'v0.17.1' into debian
Version 0.17.1

* tag 'v0.17.1': (168 commits)
  add nbviewer link
  Revert "DOC: fix sponsor notice"
  DOC: a few touchups
  DOC: fix sponsor notice
  DOC: warnings and remove HTML
  COMPAT: compat of scalars on all platforms, xref pandas-dev#11638
  DOC: fix build errors/warnings
  DOC: whatsnew edits
  DOC: fix link syntax
  DOC: update release.rst / whatsnew edits
  BUG: fix col iteration in DataFrame.round, pandas-dev#11611
  DOC: Clarify foramtting
  BUG: pandas-dev#11638 return correct dtype for int and float
  BUG: pandas-dev#11637 fix to_csv incorrect output.
  DOC: sponsor notice
  BUG: indexing with a range , pandas-dev#11652
  Fix link to numexpr
  ENH: fixup tilde expansion, xref pandas-dev#11438
  ENH: tilde expansion for write output formatting functions, pandas-dev#11438
  DOC: fix up doc-string creations in generic.py
  ...