[BUG] ORC read/write is wrong in day values in pre-1582 datetime values #11691

Open
ttnghia opened this issue Sep 13, 2022 · 4 comments
Labels: 0 - Backlog (in queue waiting for assignment), bug (something isn't working), cuIO (cuIO issue), libcudf (affects libcudf C++/CUDA code)

Comments

Contributor

ttnghia commented Sep 13, 2022

Similar to #11525, which has just been fixed, I discovered new failures with the ORC reader/writer.

Note that these failures are in the day values, not in the seconds as previously reported in #11525.

cpu = datetime.datetime(740, 7, 19, 10, 16, 58, 929621)
gpu = datetime.datetime(740, 7, 23, 10, 16, 58, 929621)

cpu = datetime.datetime(487, 4, 15, 11, 13, 37, 361058)
gpu = datetime.datetime(487, 4, 16, 11, 13, 37, 361058)

cpu = datetime.datetime(1132, 8, 4, 22, 50, 7, 153267)
gpu = datetime.datetime(1132, 8, 11, 22, 50, 7, 153267)

cpu = datetime.datetime(1348, 1, 31, 3, 21, 2, 422651)
gpu = datetime.datetime(1348, 2, 8, 3, 21, 2, 422651)

cpu = datetime.datetime(833, 6, 4, 10, 18, 10, 135672)
gpu = datetime.datetime(833, 6, 8, 10, 18, 10, 135672)

ttnghia added the bug and Needs Triage labels Sep 13, 2022
github-actions bot added this to Needs prioritizing in Bug Squashing Sep 13, 2022
Contributor Author

ttnghia commented Sep 13, 2022

@vuule @GregoryKimball Can you test the examples above with pandas, please? The cpu = ... values should be the ground truth.

ttnghia changed the title from "[BUG] ORC read/write is wrong in some corner case(s)" to "[BUG] ORC read/write is wrong in days values in some corner case(s)" Sep 13, 2022
ttnghia changed the title from "[BUG] ORC read/write is wrong in days values in some corner case(s)" to "[BUG] ORC read/write is wrong in day values in some corner case(s)" Sep 13, 2022
Member

jlowe commented Sep 13, 2022

Curious, does this only happen for dates before the Julian-Gregorian calendar transition in 1582?

Contributor Author

ttnghia commented Sep 13, 2022

Yeah the tests failed only for years before that year.
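
A quick way to check the calendar hypothesis against the values reported above: the sketch below reinterprets each cpu date as a Julian-calendar label and converts it to the proleptic Gregorian calendar that Python's datetime uses. The helper names are hypothetical and not part of cuDF or ORC; the Julian Day Number formula is the standard integer one. All five distinct pairs from the report line up, which is consistent with one side using Julian labels for old dates and the other using proleptic Gregorian, though it does not by itself show which component applies which calendar.

```python
from datetime import date

# date.toordinal() counts days in the proleptic Gregorian calendar, with
# 0001-01-01 == 1. Adding this constant gives the Julian Day Number (JDN).
ORDINAL_TO_JDN = 1721425


def julian_to_jdn(year, month, day):
    """JDN of a date labeled in the Julian calendar (hypothetical helper)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_label_as_gregorian(d):
    """Treat d's year/month/day as a Julian label and return the
    proleptic Gregorian date that falls on the same physical day."""
    return date.fromordinal(julian_to_jdn(d.year, d.month, d.day) - ORDINAL_TO_JDN)


# Distinct (cpu, gpu) date pairs from the report; time-of-day fields matched
# in every pair, so only the dates are compared here.
PAIRS = [
    (date(487, 4, 15), date(487, 4, 16)),
    (date(740, 7, 19), date(740, 7, 23)),
    (date(833, 6, 4), date(833, 6, 8)),
    (date(1132, 8, 4), date(1132, 8, 11)),
    (date(1348, 1, 31), date(1348, 2, 8)),
]

for cpu, gpu in PAIRS:
    converted = julian_label_as_gregorian(cpu)
    print(cpu, "->", converted, "matches gpu:", converted == gpu)
```

The day shift grows with the century (1 day in 487, 8 days in 1348), which is the pattern expected from the two calendars' different leap-year rules rather than from a fixed offset.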

vuule changed the title from "[BUG] ORC read/write is wrong in day values in some corner case(s)" to "[BUG] ORC read/write is wrong in day values in pre-1582 datetime values" Sep 13, 2022
Contributor

wence- commented Sep 14, 2022

Yeah the tests failed only for years before that year.

This is probably more complicated, because the transition took roughly 300 years, so to be precise one really needs to interpret the time relative to a location and government. A timezone is not enough if it is only a numeric offset: for example, Finland has the same timezone as Moscow, but switched calendar 200 years earlier.
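
To make the point about location concrete, here is a minimal sketch in which the same day gets a different calendar label depending on an assumed cutover date. The cutover table is illustrative (commonly cited adoption dates, simplified to one date per region) and the function names are hypothetical; nothing here reflects how ORC, Spark, or cuDF handle the transition.

```python
from datetime import date

# date.toordinal() + this constant = Julian Day Number (JDN).
ORDINAL_TO_JDN = 1721425

# Assumed first day of Gregorian use per region, written as Gregorian dates.
# Illustrative and simplified; real adoption histories are messier.
CUTOVERS = {
    "Rome": date(1582, 10, 15),
    "Sweden/Finland": date(1753, 3, 1),
    "Russia": date(1918, 2, 14),
}


def jdn_to_julian(jdn):
    """Convert a Julian Day Number to a (year, month, day) Julian-calendar label."""
    f = jdn + 1401
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


def local_label(instant, cutover):
    """Label a day (given as a proleptic Gregorian date) the way a region
    with the given cutover would have written it at the time."""
    if instant >= cutover:
        return instant.year, instant.month, instant.day
    return jdn_to_julian(instant.toordinal() + ORDINAL_TO_JDN)


# The same physical day, labeled under each assumed cutover.
day = date(1800, 1, 1)
for region, cutover in CUTOVERS.items():
    print(region, local_label(day, cutover))
```

With these assumed cutovers, Rome and Sweden/Finland both label the day 1800-01-01, while Russia labels it 1799-12-21, so a stored day count cannot be turned into a unique historical label without knowing which convention applies.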

vuule added this to Issue-Needs prioritizing in v22.12 Release via automation Sep 19, 2022
vuule self-assigned this Sep 22, 2022
GregoryKimball added the 0 - Backlog and cuIO labels and removed the Needs Triage label Oct 21, 2022
vuule removed this from Issue-Needs prioritizing in v22.12 Release Nov 8, 2022
vuule added this to Issue-Needs prioritizing in v23.02 Release via automation Nov 8, 2022
vuule moved this from Issue-Needs prioritizing to Issue-P1 in v23.02 Release Nov 16, 2022
GregoryKimball added the libcudf label Apr 2, 2023