Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: avoid overflow in Bday generate_range, closes #24252 #26651

Merged
merged 14 commits into from
Jun 21, 2019

Conversation

jbrockmendel
Copy link
Member

@codecov
Copy link

codecov bot commented Jun 4, 2019

Codecov Report

Merging #26651 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26651      +/-   ##
==========================================
- Coverage   91.87%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50696       +4     
==========================================
+ Hits        46575    46576       +1     
- Misses       4117     4120       +3
Flag Coverage Δ
#multiple 90.41% <100%> (ø) ⬆️
#single 41.8% <50%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 96.7% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.91% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c07d71d...e200f32. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 4, 2019

Codecov Report

Merging #26651 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26651      +/-   ##
==========================================
- Coverage   91.87%   91.86%   -0.02%     
==========================================
  Files         180      180              
  Lines       50746    50751       +5     
==========================================
- Hits        46623    46621       -2     
- Misses       4123     4130       +7
Flag Coverage Δ
#multiple 90.45% <100%> (-0.01%) ⬇️
#single 41.09% <33.33%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 96.36% <100%> (-0.33%) ⬇️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.94% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfd65e9...b06440c. Read the comment docs.

@topper-123
Copy link
Contributor

This doesn't solve the overflow type:

>>> date = pd.Timestamp.max.floor("D").to_pydatetime().date()  # datetime.date(2262, 4, 11)
>>> freq = 'B'
>>> pd.date_range(date, periods=2, freq=freq)  # see periods=2
OverflowError: int too big to convert

I would guess this should be a OutOfBoundsDatetime and not a OverflowError?

@jbrockmendel
Copy link
Member Author

Yah, it should probably be an OutOfBoundsDatetime instead of OverflowError. Will update.

@gfyoung gfyoung added Error Reporting Incorrect or improved errors from pandas Frequency DateOffsets Timezones Timezone data dtype labels Jun 4, 2019
@jreback jreback added this to the 0.25.0 milestone Jun 6, 2019
pandas/tseries/offsets.py Show resolved Hide resolved
ts = convert_to_tsobject(ts_input, tz, unit, 0, 0, nanosecond or 0)
try:
ts = convert_to_tsobject(ts_input, tz, unit, 0, 0, nanosecond or 0)
except OverflowError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the doc-string of convert_ot_tsobject. Why are we not catching the OverflowError inside?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here because I wanted to do this error-catching in python-space instead of c-space so that convert_to_tsobject could have better optimizations, but it looks like convert_to_tsobject already has some raising cases, so this can be moved there. will update.

@jreback
Copy link
Contributor

jreback commented Jun 8, 2019

lgtm. ping when green.

@jreback
Copy link
Contributor

jreback commented Jun 10, 2019

code & tests lgtm. can you add a whatsnew. ping on green.

@jbrockmendel
Copy link
Member Author

ping

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, otherwise lgtm.

@@ -24,6 +24,7 @@ from pandas._libs.tslibs.conversion cimport (
from pandas._libs.tslibs.nattype cimport NPY_NAT, c_NaT as NaT
from pandas._libs.tslibs.np_datetime cimport (
check_dts_bounds, npy_datetimestruct, dt64_to_dtstruct)
from pandas._libs.tslibs.np_datetime import OutOfBoundsDatetime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, removed.

I'm holding out hope that one day we'll be able to lint these files.

@jbrockmendel
Copy link
Member Author

In one Travis build I'm seeing an error I also get locally on many branches:

_______ TestDataFrameConstructors.test_constructor_maskedrecarray_dtype ________
[gw1] linux -- Python 3.7.3 /home/travis/miniconda3/envs/pandas-dev/bin/python
self = <pandas.tests.frame.test_constructors.TestDataFrameConstructors object at 0x7fa6ace49f98>
    def test_constructor_maskedrecarray_dtype(self):
        # Ensure constructor honors dtype
        data = np.ma.array(
            np.ma.zeros(5, dtype=[('date', '<f8'), ('price', '<f8')]),
            mask=[False] * 5)
>       data = data.view(ma.mrecords.mrecarray)
E       AttributeError: module 'numpy.ma' has no attribute 'mrecords'

Any idea what this is all about?

@TomAugspurger
Copy link
Contributor

Not sure, but numpy.ma.mrecords doesn't seem to be imported by default. Would need import numpy.ma.mrecords before using ma.mrecords.

@jbrockmendel
Copy link
Member Author

Looks like we import ma.mrecords as a runtime import in the DataFrame constructor, so this ma.mrecords will not exist if we run these tests before that runtime import gets hit. (I'm guessing we started randomizing test order recently?). Just added the import in the appropriate place in the test so it shouldn't matter anymore.

ts = <int64_t>ts
except OverflowError:
# GH#26651 re-raise as OutOfBoundsDatetime
raise OutOfBoundsDatetime
Copy link
Contributor

@TomAugspurger TomAugspurger Jun 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is including the bad ts in the exception message useful here? I imagine it's good for Timestamp(too_big_int), but not so much for other inputs (if they hit this).

@jreback jreback merged commit 388d22c into pandas-dev:master Jun 21, 2019
@jreback
Copy link
Contributor

jreback commented Jun 21, 2019

thanks @jbrockmendel nice patch!

@jbrockmendel jbrockmendel deleted the oob branch June 21, 2019 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Frequency DateOffsets Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.date_range gives wrong overflow error type when freq='B'
5 participants