Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: POSIXct without timezone and naive time handling #878

Merged
merged 17 commits into from
Mar 2, 2024

Conversation

eitsupi
Copy link
Collaborator

@eitsupi eitsupi commented Mar 1, 2024

Fix #875

The latest version (0.14.1):

Sys.setenv(TZ = "America/New_York")

rdf = data.frame(
  non_exsitent_time = as.POSIXct("2020-03-08 02:00:00"),
  ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
)
rdf
#>     non_exsitent_time      ambiguous_time
#> 1 2020-03-08 01:00:00 2020-11-01 01:00:00

pldf = polars::as_polars_df(rdf)
pldf
#> shape: (1, 2)
#> ┌─────────────────────┬─────────────────────┐
#> │ non_exsitent_time   ┆ ambiguous_time      │
#> │ ---                 ┆ ---                 │
#> │ datetime[ms]        ┆ datetime[ms]        │
#> ╞═════════════════════╪═════════════════════╡
#> │ 2020-03-08 06:00:00 ┆ 2020-11-01 05:00:00 │
#> └─────────────────────┴─────────────────────┘

as.data.frame(pldf)
#>     non_exsitent_time      ambiguous_time
#> 1 2020-03-08 01:00:00 2020-11-01 01:00:00

This PR:

Sys.setenv(TZ = "America/New_York")

rdf = data.frame(
  non_exsitent_time = as.POSIXct("2020-03-08 02:00:00"),
  ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
)
rdf
#>     non_exsitent_time      ambiguous_time
#> 1 2020-03-08 01:00:00 2020-11-01 01:00:00

pldf = polars::as_polars_df(rdf)
pldf
#> shape: (1, 2)
#> ┌─────────────────────┬─────────────────────┐
#> │ non_exsitent_time   ┆ ambiguous_time      │
#> │ ---                 ┆ ---                 │
#> │ datetime[ms]        ┆ datetime[ms]        │
#> ╞═════════════════════╪═════════════════════╡
#> │ 2020-03-08 01:00:00 ┆ 2020-11-01 01:00:00 │
#> └─────────────────────┴─────────────────────┘

In the above example, an error occurs when converting Polars to R.
Perhaps something similar to the Int64 type conversion option should be added to the arguments and global options.

@eitsupi eitsupi changed the title fix: naive time handling fix!: naive time handling Mar 1, 2024
@eitsupi eitsupi added this to the 0.15 milestone Mar 2, 2024
@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 2, 2024

Perhaps the POSIXct to Polars conversion also needs to be modified.
Currently (0.14.1) I am seeing some rather strange behavior when specifying a time that does not exist.

Sys.setenv(TZ = "America/New_York")
datetime = as.POSIXct("2020-03-08 02:00:00")
datetime
#> [1] "2020-03-08 01:00:00 EST"

clock::as_naive_time(datetime)
#> <naive_time<second>[1]>
#> [1] "2020-03-08T01:00:00"

s = polars::as_polars_series(datetime)
s
#> polars Series: shape: (1,)
#> Series: '' [datetime[ms]]
#> [
#>  2020-03-08 06:00:00
#> ]

as.vector(s)
#> [1] "2020-03-08 01:00:00 EST"

Created on 2024-03-02 with reprex v2.1.0

Perhaps it needs to behave like clock::as_naive_time() here.

@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 2, 2024

Perhaps it needs to behave like clock::as_naive_time() here.

I think the following process is needed.

polars::as_polars_series(datetime)$dt$replace_time_zone(
  "UTC"
)$dt$convert_time_zone(
  Sys.timezone()
)$dt$replace_time_zone(NULL)

@eitsupi eitsupi changed the title fix!: naive time handling fix!: POSIXct without timezone and naive time handling Mar 2, 2024
@eitsupi eitsupi marked this pull request as ready for review March 2, 2024 11:01
@etiennebacher
Copy link
Collaborator

Note that for the example in your first post, this is what I get with this PR:

Sys.setenv(TZ = "America/New_York")

rdf = data.frame(
  non_exsitent_time = as.POSIXct("2020-03-08 02:00:00"),
  ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
)
rdf

  non_exsitent_time      ambiguous_time
1        2020-03-08 2020-11-01 01:00:00

pldf = polars::as_polars_df(rdf)
pldf

shape: (1, 2)
┌─────────────────────┬─────────────────────┐
│ non_exsitent_timeambiguous_time      │
│ ------                 │
│ datetime[ms]        ┆ datetime[ms]        │
╞═════════════════════╪═════════════════════╡
│ 2020-03-08 00:00:002020-11-01 01:00:00 │
└─────────────────────┴─────────────────────┘

The column ambiguous_time is the same but not non_exsitent_time.

Copy link
Collaborator

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message following as.data.frame(pldf) says Please use `ambiguous` to tell how it should be localized but it's not clear to me where I should specify this arg

NEWS.md Show resolved Hide resolved
@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 2, 2024

The column ambiguous_time is the same but not non_exsitent_time.

Interesting. Perhaps the environment changes the display of non-existent time?

@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 2, 2024

The error message following as.data.frame(pldf) says Please use `ambiguous` to tell how it should be localized but it's not clear to me where I should specify this arg

As I mentioned in the PR description, I would like to do this in a follow-up PR. It is also related to updates in other parts, for example, there is no option in as.vector() to instruct how to convert Int64.

@etiennebacher
Copy link
Collaborator

Perhaps the environment changes the display of non-existent time?

How can I test that? The timezone is manually set so what else could change the display?

@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 2, 2024

How can I test that? The timezone is manually set so what else could change the display?

It may not be possible to get consistent results on different platforms as this behavior varies from platform to platform.
https://stackoverflow.com/questions/39526527/is-there-a-reliable-way-to-detect-posixlt-objects-representing-a-time-which-does

@eitsupi eitsupi marked this pull request as draft March 2, 2024 13:26
@eitsupi eitsupi marked this pull request as ready for review March 2, 2024 13:27
R/dataframe__frame.R Outdated Show resolved Hide resolved
R/dataframe__frame.R Outdated Show resolved Hide resolved
R/dataframe__frame.R Show resolved Hide resolved
R/dataframe__frame.R Outdated Show resolved Hide resolved
Copy link
Collaborator

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just one last comment to clarify why this datetime doesn't exist

eitsupi and others added 2 commits March 3, 2024 01:42
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 2, 2024

Thanks for the detailed review!

@eitsupi eitsupi merged commit 62fb7aa into main Mar 2, 2024
24 checks passed
@eitsupi eitsupi deleted the naive-time-handling branch March 2, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tests for clock conversion fail
2 participants