Clarify what date formats are inferred for CSV (4 digits year, big and small endian only) #17426

Open
sm-Fifteen opened this issue Jul 4, 2024 · 0 comments
Labels
A-timeseries Area: date/time functionality documentation Improvements or additions to documentation

Comments

@sm-Fifteen
Description

The try_parse_dates parameter of pl.read_csv is documented by saying "Most ISO8601-like formats can be inferred, as well as a handful of others.", with no further clarification of what it may or may not try to recognize. The user guide also mentions it, but doesn't explain what is covered either.

I believe the supported formats should be listed in Polars's documentation. One of the main pitfalls of Pandas in general is that it tries to be automatically helpful, which is good for interactive use but may lead to incorrect results when you're not looking. For dates especially, a wrong guess can end up completely mangling your data. Polars instead explicitly restricts what it will automatically parse to completely unambiguous formats, which is good, but if that safety isn't communicated to users, they might shy away from using the feature.
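To illustrate how a wrong guess mangles data, here is a small standard-library sketch (not Polars or Pandas code; the sample string is made up). The same text parses without error under both a day-first and a month-first format, but yields two different dates:

```python
from datetime import date, datetime

# A date string where both the day and the month are <= 12, so nothing
# in the value itself says which field comes first.
raw_value = "03/04/2024"

# Both interpretations succeed without raising an error.
day_first = datetime.strptime(raw_value, "%d/%m/%Y").date()
month_first = datetime.strptime(raw_value, "%m/%d/%Y").date()

print(day_first)    # 2024-04-03
print(month_first)  # 2024-03-04
```

Neither call fails, so a parser that guesses silently gives no signal when it picks the wrong one.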

In an unrelated discussion on the Pandas tracker with @MarcoGorelli about Pandas' date handling, I found out that Polars only tried (at the time) a handful of big-endian, four-digit-year date formats. Since then, it has been modified to also try a number of little-endian, four-digit-year formats. This is also not currently mentioned anywhere I could find.

Link

https://docs.pola.rs/api/python/stable/reference/api/polars.read_csv.html#polars-read-csv

@sm-Fifteen sm-Fifteen added the documentation Improvements or additions to documentation label Jul 4, 2024
@MarcoGorelli MarcoGorelli added the A-timeseries Area: date/time functionality label Jul 16, 2024