BUG: Fix error in replace with strings that are large numbers (#25616) #25644

Merged

Conversation

ArtificialQualia
Contributor

See discussion in #25616.

When .replace saw a value that looked like an int, it tried to convert it to an integer dtype even when that conversion raised an OverflowError. The problem only appears in newer versions of pandas, because of the addition of coerce_to_target_dtype in _replace_coerce. coerce_to_target_dtype is required to fix a lot of other issues, so the fix here catches the OverflowError raised by the int coercion and lets the values remain as objects.

I also tried changing how coerce_to_target_dtype is used (moving it until after the replace, only doing it when convert is True, etc.), but this caused various other coercion and replace tests to fail, so I left it untouched.

Tests have been added for both cases where I found OverflowError could occur with replace.
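
To illustrate the idea outside pandas, here is a minimal sketch of the "try the integer coercion, fall back to objects on overflow" pattern described above. It is not the code in blocks.py: `coerce_or_keep` is a hypothetical helper name, and the standard library's `array("q", ...)` (signed 64-bit storage) is only a stand-in for an int64 block.

```python
from array import array


def coerce_or_keep(values):
    """Hypothetical helper: store int-like strings as int64, else keep them.

    Assumes every value is a string of digits; other inputs are out of
    scope for this sketch.
    """
    try:
        # array("q") holds signed 64-bit integers and raises OverflowError
        # when a Python int does not fit.
        return array("q", (int(v) for v in values))
    except OverflowError:
        # Leave the values untouched as plain Python objects.
        return list(values)


small = coerce_or_keep(["1", "2"])
large = coerce_or_keep(["100000000000000000000"])
print(type(small).__name__, type(large).__name__)
```

The point of the design is that the overflow is treated as "this value does not belong in an integer container" rather than as an error the caller has to see.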

@@ -32,6 +32,7 @@ Fixed Regressions
- Fixed regression in creating a period-dtype array from a read-only NumPy array of period objects. (:issue:`25403`)
- Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
- Fixed pip installing from source into an environment without NumPy (:issue:`25193`)
- Fixed regression in :func:`replace` where large strings of numbers would be coerced into int, causing an ``OverflowError`` (:issue:`25616`)

Use DataFrame.replace, as what you have won't render. Use double backticks around int.

pandas/core/internals/blocks.py
@@ -181,6 +181,20 @@ def check_replace(to_rep, val, expected):
tr, v = [3, 4], [3.5, True]
check_replace(tr, v, e)

# GH 25616

make a new test

@jreback jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label Mar 10, 2019
@pep8speaks

pep8speaks commented Mar 10, 2019

Hello @ArtificialQualia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-12 20:45:17 UTC

@codecov

codecov bot commented Mar 10, 2019

Codecov Report

Merging #25644 into master will decrease coverage by 49.54%.
The diff coverage is 0%.


@@             Coverage Diff             @@
##           master   #25644       +/-   ##
===========================================
- Coverage   91.26%   41.71%   -49.55%     
===========================================
  Files         173      173               
  Lines       52968    52968               
===========================================
- Hits        48339    22096    -26243     
- Misses       4629    30872    +26243
Flag Coverage Δ
#multiple ?
#single 41.71% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 51.92% <0%> (-42.16%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.36%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.15%) ⬇️
... and 131 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16edaaf...4e16b9a. Read the comment docs.

@codecov

codecov bot commented Mar 10, 2019

Codecov Report

Merging #25644 into master will increase coverage by 49.55%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           master   #25644       +/-   ##
===========================================
+ Coverage   41.73%   91.29%   +49.55%     
===========================================
  Files         173      173               
  Lines       52967    52961        -6     
===========================================
+ Hits        22106    48350    +26244     
+ Misses      30861     4611    -26250
Flag Coverage Δ
#multiple 89.86% <100%> (?)
#single 41.73% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 94.08% <100%> (+42.15%) ⬆️
pandas/core/computation/pytables.py 90.54% <0%> (+0.3%) ⬆️
pandas/io/pytables.py 90.11% <0%> (+0.95%) ⬆️
pandas/util/_test_decorators.py 90.54% <0%> (+4.05%) ⬆️
pandas/compat/__init__.py 58.03% <0%> (+8.23%) ⬆️
pandas/core/config_init.py 99.24% <0%> (+9.84%) ⬆️
pandas/io/formats/terminal.py 32.53% <0%> (+10.84%) ⬆️
pandas/compat/numpy/__init__.py 93.33% <0%> (+13.33%) ⬆️
pandas/io/formats/console.py 86.27% <0%> (+13.72%) ⬆️
pandas/core/api.py 100% <0%> (+13.79%) ⬆️
... and 131 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a8fad16...f32163c. Read the comment docs.

Contributor

@TomAugspurger TomAugspurger left a comment

Fixed the merge conflict in the whatsnew if we want to do this for 0.24.2 @jorisvandenbossche (it's not tagged right now).

@ArtificialQualia
Contributor Author

Fixed merge conflicts again

@jorisvandenbossche jorisvandenbossche added this to the 0.24.2 milestone Mar 12, 2019
@jorisvandenbossche jorisvandenbossche merged commit 12fd316 into pandas-dev:master Mar 12, 2019
@lumberbot-app

lumberbot-app bot commented Mar 12, 2019

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout 0.24.x
$ git pull
  2. Cherry pick the first parent branch of this PR on top of the older branch:
$ git cherry-pick -m1 12fd316de829b994d6e3d1fc14c59d8e8bf34500
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
$ git commit -am 'Backport PR #25644: BUG: Fix error in replace with strings that are large numbers (#25616)'
  4. Push to a named branch:
$ git push YOURFORK 0.24.x:auto-backport-of-pr-25644-on-0.24.x
  5. Create a PR against branch 0.24.x. I would have named this PR:

"Backport PR #25644 on branch 0.24.x"

And apply the correct labels and milestones.

Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instructions are inaccurate, feel free to suggest an improvement.

@jorisvandenbossche
Member

@ArtificialQualia Thanks a lot !

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Mar 12, 2019
@jorisvandenbossche
Member

Manually backported in f4e1127

sighingnow added a commit to sighingnow/pandas that referenced this pull request Mar 14, 2019
* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pandas.DataFrame.replace seems taking number string as integer and run into overflow error
5 participants