Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: maximum recursion error when replacing empty lists #22083

Merged
merged 22 commits into from Aug 7, 2018

Conversation

Projects
None yet
4 participants
@minggli
Copy link
Contributor

commented Jul 27, 2018

  • closes #21977
  • closes #19266
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Internal mechanism in replace method is intertwined and can cause RecursionError (not very helpful). Discussed in #21977, there doesn't appear to be first class support for ExtensionArray yet and this issue is right now treated as error handling, having investigated treating it as new use case but not easily co-exist without large change.

The RecursionError is caused by Block.replace and ObjectBlock.replace calling each other under certain circumstances such as in the linked issues.

New approach is now to prevent re-casting of blocks to object blocks if it is already an ObjectBlock and instead raise the error caught as it is, because if it's an ObjectBlock already then ValueError and TypeError raised won't go away when re-casted to object and re-try, resulting in infinite loop and RecursionError.

@minggli minggli force-pushed the minggli:bugfix/replace_recursion branch from 621485b to c97b2ee Jul 27, 2018

@jreback
Copy link
Contributor

left a comment

can you add the example as a test

@@ -1000,8 +999,8 @@ def putmask(self, mask, new, align=True, inplace=False, axis=0,
if not (mask.shape[-1] == len(new) or
mask[mask].shape[-1] == len(new) or
len(new) == 1):
raise ValueError("cannot assign mismatch "

This comment has been minimized.

Copy link
@jreback

jreback Jul 28, 2018

Contributor

does this need to change? (or is this what is actually causing the issue); this is not exposed to the user, correct?

This comment has been minimized.

Copy link
@minggli

minggli Jul 28, 2018

Author Contributor

Yes this ValueError caused Block.replace and ObjectBlock.replace to form infinite loop and ultimately raise RecursionError. It's because Block.replace catches exception include ValueError and if it fails, delegate replace to ObjectBlock. But ObjectBlock.replace also uses super(ObjectBlock, self), hence the loop. The exception handling includes TypeError and ValueError and they are both depended on (as appear to me) by other casting cases.

Open to suggestion, not sure a better way to handle, at the moment this change is needed to address the issue in question.

This can be exposed to user when:

import pandas as pd

s = pd.Series(range(10))
mask = s > 5

s[mask] = [5, 4, 3, 2, 1]

This comment has been minimized.

Copy link
@jreback

jreback Aug 2, 2018

Contributor

do you know why we are catching ValueError at a higher level, maybe can get away with only catching TypeError and let ValueError buble up?

This comment has been minimized.

Copy link
@minggli

minggli Aug 2, 2018

Author Contributor

Good point and it appears below test (only this test) fails when we only catch TypeError:

_________________ TestDataFrameReplace.test_replace_datetimetz _________________
# coerce to object
result = df.copy()
result.iloc[1, 0] = np.nan
result = result.replace(
{'A': pd.NaT}, Timestamp('20130104', tz='US/Pacific'))

pandas/tests/frame/test_replace.py:1060:


    elif isinstance(other, (np.datetime64, datetime, date)):
        other = tslibs.Timestamp(other)
        tz = getattr(other, 'tz', None)

        # test we can have an equal time zone
        if tz is None or str(tz) != str(self.values.tz):
         raise ValueError("incompatible or non tz-aware value")

E ValueError: incompatible or non tz-aware value

pandas/core/internals/blocks.py:2898: ValueError

It fails because of above ValueError which is passed on to ObjectBlock.replace as part of error handling in Block.replace. so it appears it is expected to be handled after the block is casted as object and try again.

This comment has been minimized.

Copy link
@minggli

minggli Aug 6, 2018

Author Contributor

@jreback made some changes now we don't touch above error but instead make sure we don't enter infinite loop.

@minggli minggli force-pushed the minggli:bugfix/replace_recursion branch from 60de53c to 4747c98 Jul 28, 2018

@minggli minggli force-pushed the minggli:bugfix/replace_recursion branch from 9506cc4 to 001c3a0 Jul 28, 2018

@codecov

This comment has been minimized.

Copy link

commented Jul 28, 2018

Codecov Report

Merging #22083 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22083      +/-   ##
==========================================
+ Coverage   92.06%   92.06%   +<.01%     
==========================================
  Files         169      169              
  Lines       50694    50698       +4     
==========================================
+ Hits        46671    46675       +4     
  Misses       4023     4023
Flag Coverage Δ
#multiple 90.47% <100%> (ø) ⬆️
#single 42.32% <25%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/missing.py 91.7% <100%> (+0.04%) ⬆️
pandas/core/internals/blocks.py 94.45% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2156431...2bbd097. Read the comment docs.

@minggli

This comment has been minimized.

Copy link
Contributor Author

commented Jul 28, 2018

added test cases too from two issues linked. 🚀

@jreback gentle nudge.

@jreback jreback added this to the 0.24.0 milestone Aug 1, 2018

# GH 19266 and GH 21977
# ValueError triggers try except block in Block.replace
# causing RecursionError
raise Exception("cannot assign mismatch length "

This comment has been minimized.

Copy link
@jreback

jreback Aug 1, 2018

Contributor

can you make this a bit more informative from a user POV?

This comment has been minimized.

Copy link
@minggli

minggli Aug 2, 2018

Author Contributor

sure. this change here is exposed to user so when one tries to assign an array of n-dimension to mask of m-dimension, when m != n, this error throws.

>>> import pandas as pd
>>> s = pd.Series(range(10))
>>> mask = s > 5

>>> s[mask].shape
(4,)

>>> s[mask] = [5, 4, 3, 2, 1]

  File "/Users/mingli/.virtualenvs/cust-value/lib/python3.5/site-packages/pandas/core/internals.py", line 1006, in putmask
    raise ValueError("cannot assign mismatch "
ValueError: cannot assign mismatch length to masked array

This is also used internally by replace and ValueError it raises is causing infinite loop as per linked issues. Whilst changing that to not ValueError will solve it, it also makes it less clear and potentially breaks users' codes if they expect to catch ValueError running above code snippet.

Do you think it's ok to use Exception here as it would solve the issues linked but users might have to change their code?

This comment has been minimized.

Copy link
@minggli

minggli Aug 6, 2018

Author Contributor

this is reverted.

@minggli minggli closed this Aug 2, 2018

@minggli minggli reopened this Aug 2, 2018

@minggli minggli closed this Aug 5, 2018

@minggli minggli reopened this Aug 5, 2018

@minggli minggli force-pushed the minggli:bugfix/replace_recursion branch from 53bfe6d to 9e27a38 Aug 5, 2018

@minggli minggli force-pushed the minggli:bugfix/replace_recursion branch from fa1a1e5 to 167a3b7 Aug 6, 2018

@pep8speaks

This comment has been minimized.

Copy link

commented Aug 6, 2018

Hello @minggli! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 07, 2018 at 13:23 Hours UTC

@minggli minggli force-pushed the minggli:bugfix/replace_recursion branch from 3db333f to c8f4f72 Aug 6, 2018

minggli added some commits Aug 6, 2018

Merge remote-tracking branch 'upstream/master' into bugfix/replace_re…
…cursion

* upstream/master:
  Added 0.23.4 whatsnew [ci skip] (#22214)
@minggli

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2018

@jreback @WillAyd I think PR is done, comments welcome.

@@ -603,6 +603,20 @@ def test_replace_list(self):

assert_frame_equal(res, expec)

def test_replace_with_empty_list(self):
# GH 21977
s = pd.Series([['a', 'b'], np.nan, [1]])

This comment has been minimized.

Copy link
@WillAyd

WillAyd Aug 6, 2018

Member

Did you mean to include an empty list in this test case (exists in the series module one)

This comment has been minimized.

Copy link
@minggli

minggli Aug 6, 2018

Author Contributor

good point!

@@ -667,7 +667,8 @@ Reshaping
- Bug in :meth:`DataFrame.replace` raises RecursionError when converting OutOfBounds ``datetime64[ns, tz]`` (:issue:`20380`)
- :func:`pandas.core.groupby.GroupBy.rank` now raises a ``ValueError`` when an invalid value is passed for argument ``na_option`` (:issue:`22124`)
- Bug in :func:`get_dummies` with Unicode attributes in Python 2 (:issue:`22084`)
-
- Bug in :meth:`DataFrame.replace` raises RecursionError when replace empty lists (:issue:`22083`)

This comment has been minimized.

Copy link
@WillAyd

WillAyd Aug 6, 2018

Member

Very minor nits but put RecursionError in double backticks and change replace to replacing

This comment has been minimized.

Copy link
@minggli

minggli Aug 6, 2018

Author Contributor

thanks! changed.

@WillAyd

WillAyd approved these changes Aug 6, 2018

Copy link
Member

left a comment

lgtm

@jreback
Copy link
Contributor

left a comment

minor comments, ping on green.

convert=convert)
# GH 22083, TypeError or ValueError occurred within error handling
# causes infinite loop. Cast and retry only if not objectblock.
if self.dtype == 'object':

This comment has been minimized.

Copy link
@jreback

jreback Aug 7, 2018

Contributor

can you use is_object_dtype

# causes infinite loop. Cast and retry only if not objectblock.
if self.dtype == 'object':
raise
else:

This comment has been minimized.

Copy link
@jreback

jreback Aug 7, 2018

Contributor

don't need this else here

@jreback

jreback approved these changes Aug 7, 2018

@jreback jreback merged commit 96b4873 into pandas-dev:master Aug 7, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Aug 7, 2018

thanks @minggli

@minggli minggli deleted the minggli:bugfix/replace_recursion branch Aug 7, 2018

alimcmaster1 added a commit to alimcmaster1/pandas that referenced this pull request Aug 12, 2018

Sup3rGeo added a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.