[BUG] from_durations seems to overflow when converting to a string #11794

Closed
galipremsagar opened this issue Sep 27, 2022 · 5 comments
Labels: bug (Something isn't working), libcudf (Affects libcudf (C++/CUDA) code)

Comments

@galipremsagar
Contributor

galipremsagar commented Sep 27, 2022

Describe the bug
from_durations appears to overflow while translating the value to a human-readable format, i.e., when converting to a string column.

Steps/Code to reproduce bug

In [7]: s = cudf.Series([13645765432432, 134736784, 245345345, -223432411, 999992343241, 3634548734, 23234], dtype='timedelta64[ms]')

In [3]: s.astype('str')
Out[3]: 
0    -55566 days 21:10:41.277551616
1         1 days 13:25:36.784000000
2         2 days 20:09:05.345000000
3        -2 days 14:03:52.411000000
4     11573 days 23:39:03.241000000
5        42 days 01:35:48.734000000
6         0 days 00:00:23.234000000
dtype: object

In [9]: s.to_pandas()
Out[9]: 
0   -55567 days +02:49:18.722448384            # pandas also overflows but displays a different value, probably because it supports only `ns` resolution.
1            1 days 13:25:36.784000
2            2 days 20:09:05.345000
3          -3 days +09:56:07.589000
4        11573 days 23:39:03.241000
5           42 days 01:35:48.734000
6            0 days 00:00:23.234000
dtype: timedelta64[ns]

In [10]: s.to_pandas().astype('int')
Out[10]: 
0   -4800978641277551616
1        134736784000000
2        245345345000000
3       -223432411000000
4     999992343241000000
5       3634548734000000
6            23234000000
dtype: int64

In [19]: s.astype('int64')
Out[19]: 
0    13645765432432
1         134736784
2         245345345
3        -223432411
4      999992343241
5        3634548734
6             23234
dtype: int64


In [12]: import numpy as np

In [14]: def delta_format(delta: np.timedelta64) -> str:      # Helper that formats the days and hours parts of a `timedelta`.
    ...:     days = delta.astype("timedelta64[D]") / np.timedelta64(1, 'D')
    ...:     hours = int(delta.astype("timedelta64[h]") / np.timedelta64(1, 'h') % 24)
    ...: 
    ...:     if days > 0 and hours > 0:
    ...:         return f"{days:.0f} d, {hours:.0f} h"
    ...:     elif days > 0:
    ...:         return f"{days:.0f} d"
    ...:     else:
    ...:         return f"{hours:.0f} h"
    ...: 


In [20]: s[0]   # The scalar value is correctly preserved.
Out[20]: numpy.timedelta64(13645765432432,'ms')

In [15]: delta_format(s[0])
Out[15]: '157937 d, 2 h'             # Expected value when converting index 0 of `s` to a string

In [16]: delta_format(s[1])
Out[16]: '1 d, 13 h'
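
The wrapped integer in the pandas output above is consistent with plain int64 wraparound when the first value is scaled from milliseconds to nanoseconds. The following is a minimal sketch in pure Python that reproduces that arithmetic; `wrap_int64` is a hypothetical helper written for illustration, not a cudf, pandas, or NumPy function.

```python
# Sketch: reproduce the int64 wraparound for the first element of `s`.
# `wrap_int64` is a hypothetical helper, not part of any library.

INT64_MAX = 2**63 - 1
NS_PER_MS = 1_000_000
NS_PER_DAY = 86_400 * 1_000_000_000


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to the two's-complement int64 range."""
    return (value + 2**63) % 2**64 - 2**63


ms = 13645765432432            # first element of `s`, in milliseconds
exact_ns = ms * NS_PER_MS      # exact value; Python ints do not overflow
wrapped_ns = wrap_int64(exact_ns)

print(exact_ns > INT64_MAX)    # True: the exact value does not fit in int64
print(wrapped_ns)              # -4800978641277551616

# The same wrapped value yields two different day counts depending on
# whether the day count is truncated toward zero or floored.
days_truncated = -(abs(wrapped_ns) // NS_PER_DAY)   # -55566
days_floored = wrapped_ns // NS_PER_DAY             # -55567
print(days_truncated, days_floored)
```

The two day counts match the leading fields of the cudf string (`-55566 days ...`) and the pandas string (`-55567 days +...`), which suggests that both outputs may stem from the same wrapped nanosecond value rendered under different rounding conventions.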

Expected behavior

The string conversion should not overflow, i.e., index 0 of s should render as 157937 days, 2 hours.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]

Additional context
This bug surfaced in: #11749

@galipremsagar added the bug (Something isn't working), Needs Triage (Need team to review and classify), and libcudf (Affects libcudf (C++/CUDA) code) labels Sep 27, 2022
@galipremsagar
Contributor Author

cc: @davidwendt

@davidwendt
Contributor

The duration value is eventually passed to cuda::std::chrono to be dissected into components.
We generally do not do data validation in libcudf. Bad data in gives bad data out.
General discussion comes from #5505
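
To illustrate what dissecting a duration into components means, here is a minimal sketch in pure Python that works directly in milliseconds. It is not libcudf's implementation: `format_duration_ms` and the `MS_PER_*` constants are hypothetical names, and the sign-plus-magnitude convention is an assumption chosen to match the cudf strings shown in this thread.

```python
# Sketch: split a millisecond count into days, hours, minutes, seconds
# and milliseconds. All names here are hypothetical, for illustration only.

MS_PER_SECOND = 1_000
MS_PER_MINUTE = 60 * MS_PER_SECOND
MS_PER_HOUR = 60 * MS_PER_MINUTE
MS_PER_DAY = 24 * MS_PER_HOUR


def format_duration_ms(total_ms: int) -> str:
    """Format a millisecond duration as 'D days HH:MM:SS.mmm'."""
    # Assumption: format the magnitude and prefix the sign, instead of
    # flooring the day count as pandas does for negative values.
    sign = "-" if total_ms < 0 else ""
    remainder = abs(total_ms)
    days, remainder = divmod(remainder, MS_PER_DAY)
    hours, remainder = divmod(remainder, MS_PER_HOUR)
    minutes, remainder = divmod(remainder, MS_PER_MINUTE)
    seconds, millis = divmod(remainder, MS_PER_SECOND)
    return f"{sign}{days} days {hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


for value in (13645765432432, 134736784, -223432411, 23234):
    print(format_duration_ms(value))
```

Because the value is never rescaled to nanoseconds, nothing in this sketch can exceed the int64 range for the inputs in the reproducer, and the first element renders as 157937 days rather than a wrapped negative number.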

@GregoryKimball
Contributor

Is this issue still open? The repro doesn't appear to overflow on 22.12.

>>> s = cudf.Series([13645765432432, 134736784, 245345345, -223432411, 999992343241, 3634548734, 23234], dtype='timedelta64[ms]')
>>> s.astype('str')
0    157937 days 02:23:52.432
1         1 days 13:25:36.784
2         2 days 20:09:05.345
3        -2 days 14:03:52.411
4     11573 days 23:39:03.241
5        42 days 01:35:48.734
6         0 days 00:00:23.234
dtype: object
>>> 

@GregoryKimball added the 0 - Backlog (In queue waiting for assignment) label and removed the Needs Triage (Need team to review and classify) label Oct 21, 2022
@galipremsagar
Contributor Author

galipremsagar commented Oct 21, 2022

I don't know what change fixed this issue in 22.12 (it works for me too). @davidwendt, shall we close this issue, or keep it open if you think it isn't actually fixed?

@davidwendt
Contributor

My assertion was that there is nothing for us to fix, per our developer guidelines: https://github.com/rapidsai/cudf/blob/branch-22.12/cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md#libcudf-does-not-introspect-data
So I'm OK with closing this.

@galipremsagar removed the 0 - Backlog (In queue waiting for assignment) label Oct 21, 2022