str.to_datetime - should support date format "%Y.%m.%d" # 2023.12.31 #16115

Closed
lmocsi opened this issue May 8, 2024 · 9 comments · Fixed by #16634
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature P-low Priority: low

Comments

@lmocsi

lmocsi commented May 8, 2024

Description

I'd like to see the "%Y.%m.%d" date format (e.g. 2023.12.31) added to polars.Expr.str.to_datetime.

This would support the date notation used in Hungary (see also the Wikipedia article).

(The time format is the same as in other countries, e.g. %H:%M:%S or %H:%M.)
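For reference, the requested pattern (with and without a time part) can be checked with Python's standard-library datetime.strptime. This is only an illustration of what the %Y, %m, %d, %H, %M and %S directives match, not Polars' own parser:

```python
from datetime import datetime

# Hungarian-style dates (year.month.day), optionally followed by a time.
samples = {
    "2023.12.31": "%Y.%m.%d",
    "2023.12.31 10:30:45": "%Y.%m.%d %H:%M:%S",
    "2023.12.31 10:30": "%Y.%m.%d %H:%M",
}

for text, fmt in samples.items():
    parsed = datetime.strptime(text, fmt)
    # e.g. "2023.12.31 10:30" -> 2023-12-31T10:30:00
    print(text, "->", parsed.isoformat())
```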

@lmocsi lmocsi added the enhancement New feature or an improvement of an existing feature label May 8, 2024
@JulianCologne
Contributor

JulianCologne commented May 8, 2024

hmmm 🤔

Not sure how to make this possible without ambiguity.

According to Wikipedia, there are also countries that use

  • YYYY.dd.mm (Kazakhstan)

A date like 2020.02.01 would be valid for both:

  • YYYY.dd.mm: 2nd Jan
  • YYYY.mm.dd: 1st Feb

How to resolve this?? ❓

Also, according to Wikipedia, the Hungarian format is

  • NOT YYYY.mm.dd
  • but instead has spaces and a trailing dot (.): YYYY. mm. (d)d.
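Both points can be seen with Python's standard-library datetime.strptime (used purely as an illustration, not Polars' parser; the %Y, %m and %d directives mean the same thing there as in the format strings in this thread):

```python
from datetime import datetime

text = "2020.02.01"

# The same string parses under both field orders, giving two different dates.
ymd = datetime.strptime(text, "%Y.%m.%d").date()  # year.month.day
ydm = datetime.strptime(text, "%Y.%d.%m").date()  # year.day.month
print(ymd)  # 2020-02-01 (1st Feb)
print(ydm)  # 2020-01-02 (2nd Jan)

# The spaced form "YYYY. mm. (d)d." needs the spaces and the trailing dot
# in the format string; strptime's %d also accepts a single-digit day.
hu = datetime.strptime("2020. 02. 1.", "%Y. %m. %d.").date()
print(hu)  # 2020-02-01
```

So a parser that infers the format cannot tell the two orders apart from this value alone; it has to pick one by convention.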

@alexander-beedie
Collaborator

alexander-beedie commented May 8, 2024

Note that str.to_datetime does support that format if you supply it explicitly; are you asking if we can add it as one of the automatically-inferred formats? 🇭🇺

import polars as pl

df = pl.DataFrame({
    "dt_string": [
        "2023.12.31 10:30:45",
        "1999.01.20 00:01:02",
        "1967.07.05 23:59:59",
    ],
})
df.with_columns(
    dt=pl.col("dt_string").str.to_datetime("%Y.%m.%d %H:%M:%S"),
)
# shape: (3, 2)
# ┌─────────────────────┬─────────────────────┐
# │ dt_string           ┆ dt                  │
# │ ---                 ┆ ---                 │
# │ str                 ┆ datetime[μs]        │
# ╞═════════════════════╪═════════════════════╡
# │ 2023.12.31 10:30:45 ┆ 2023-12-31 10:30:45 │
# │ 1999.01.20 00:01:02 ┆ 1999-01-20 00:01:02 │
# │ 1967.07.05 23:59:59 ┆ 1967-07-05 23:59:59 │
# └─────────────────────┴─────────────────────┘

@lmocsi
Author

lmocsi commented May 8, 2024

How do you currently resolve 01.02.2020?
This is valid for:

  • dd.mm.YYYY: 1st Feb
  • mm.dd.YYYY: 2nd Jan

This is exactly the same problem, but with the year at the beginning.
(Or replace the dots with slashes and you get the ambiguity between UK and US date formats: https://english.stackexchange.com/questions/68844/date-format-in-uk-vs-us)

And yes, I mean adding it to the automatically-inferred formats.

Looking at the patterns.rs file, it is clearly missing: the first block (DATE_D_M_Y) has all three separators (dot, dash, slash), but the second block (DATE_Y_M_D) has only two (slash and dash). I'm asking for the dot there as well (in the name of all Hungarians :) ).

pub(super) static DATE_D_M_Y: &[&str] = &[
    "%d.%m.%Y", // 31.12.2021
    "%d-%m-%Y", // 31-12-2021
    "%d/%m/%Y", // 31/12/2021
];

pub(super) static DATE_Y_M_D: &[&str] = &[
    "%Y/%m/%d", // 2021/12/31
    "%Y-%m-%d", // 2021-12-31
];
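As a rough illustration of why the missing entry matters, here is a hypothetical Python sketch of format inference. The lists mirror the patterns.rs excerpt above with "%Y.%m.%d" added, and parses and infer_date_format are made-up helper names. The sketch simply tries each candidate in order and keeps the first one that parses every sample; Polars' actual Rust inference logic may well differ.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical candidate lists mirroring the patterns.rs excerpt,
# with the requested "%Y.%m.%d" added to the year-first group.
DATE_D_M_Y = ["%d.%m.%Y", "%d-%m-%Y", "%d/%m/%Y"]
DATE_Y_M_D = ["%Y/%m/%d", "%Y-%m-%d", "%Y.%m.%d"]


def parses(text: str, fmt: str) -> bool:
    """Return True if the whole of `text` matches `fmt`."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


def infer_date_format(values: list[str]) -> str | None:
    """Return the first candidate pattern that parses every value, else None."""
    for fmt in DATE_D_M_Y + DATE_Y_M_D:
        if all(parses(v, fmt) for v in values):
            return fmt
    return None


print(infer_date_format(["2023.12.31", "1999.01.20"]))  # %Y.%m.%d
print(infer_date_format(["31.12.2023", "20.01.1999"]))  # %d.%m.%Y
```

Without the "%Y.%m.%d" entry in DATE_Y_M_D, the first call would return None, which is the gap described above.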

@MarcoGorelli
Collaborator

Looks fine to add; want to make a PR (with a test)?

@lmocsi
Author

lmocsi commented May 8, 2024

Not really, never done that.

@MarcoGorelli MarcoGorelli added the P-low Priority: low label May 8, 2024
@JulianCologne
Contributor

> How do you currently resolve 01.02.2020?
> This is valid for:
>
>   • dd.mm.YYYY: 1st Feb
>   • mm.dd.YYYY: 2nd Jan

I added dd.mm.YYYY a few days back. According to my research, this is unambiguous because there is no mm.dd.YYYY format in use: formats with the month first use a different separator, such as the dash.

I can add this format and a test in the coming days 😉

@alexander-beedie
Collaborator

> I can add this format and a test in the coming days 😉

Nice 😎 Looks well worth having to me - we should probably ensure that all of our common/default inference patterns work with -, /, and . separators...
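One hypothetical way to guarantee that every field order gets all three separators is to generate the pattern lists rather than writing them out by hand. The FIELD_ORDERS and SEPARATORS names below are illustrative only and are not taken from the Polars codebase:

```python
# Hypothetical: expand each field order across the three separators
# (-, /, .) so that no order is missing a separator variant.
FIELD_ORDERS = {
    "D_M_Y": ("%d", "%m", "%Y"),
    "Y_M_D": ("%Y", "%m", "%d"),
}
SEPARATORS = ("-", "/", ".")

patterns = {
    name: [sep.join(fields) for sep in SEPARATORS]
    for name, fields in FIELD_ORDERS.items()
}

print(patterns["Y_M_D"])  # ['%Y-%m-%d', '%Y/%m/%d', '%Y.%m.%d']
```

Generating the lists keeps separators consistent across field orders, at the cost of making it harder to leave out a combination on purpose (for example, a dotted month-first order would clash with %d.%m.%Y, so it would need to be excluded explicitly).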

@alexander-beedie alexander-beedie added the accepted Ready for implementation label May 9, 2024
@lmocsi
Author

lmocsi commented May 28, 2024

When is this expected to land in a release?

@JulianCologne
Contributor

Sorry for the delay, I was on vacation 😜
Should have something ready this week 😉
If/once the PR is accepted, it will be in the following release, so I'd guess ~2 weeks.

Projects
Archived in project
4 participants