SQL dataset does not parse strings as datetimes #459
I think the primary issue here is the ambiguous behavior of `parse_strings_as_datetimes` on string-typed columns. For example, with the dataset you provide, the min/max selected by the backend differs from the min/max you'd get after parsing. Further, you're definitely right that there's no negative-case datetime test for min/max, so that's an easy win to get the behavior well documented.
Unfortunately, it looks to me like the difference is definitely implementation-specific. With a Postgres backend, the current behavior is actually also as expected (i.e. the return of `min(col)` is 10/1/2015, even for a text column). I think the right option may be to disallow `parse_strings_as_datetimes` for cases where the column is not already a datetime column. Since I know you're working on Spark -- would that work in that case?
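A minimal sketch of that proposed guard (the function and type names here are illustrative, not part of great_expectations' actual API): refuse `parse_strings_as_datetimes` unless the column's SQL type is already a date/time type, so backend-specific string comparisons never come into play.

```python
# Hypothetical guard sketch; names are illustrative, not the project's real API.
DATETIME_TYPE_NAMES = {"DATE", "DATETIME", "TIMESTAMP", "TIMESTAMPTZ"}

def validate_parse_strings_as_datetimes(sql_type_name, parse_strings_as_datetimes):
    """Raise if datetime parsing is requested for a non-datetime column."""
    if parse_strings_as_datetimes and sql_type_name.upper() not in DATETIME_TYPE_NAMES:
        raise ValueError(
            f"parse_strings_as_datetimes is not supported for {sql_type_name} "
            "columns; MIN/MAX on string columns is backend-specific"
        )
```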
Ah, I see. Pretty sure that would be possible in Spark; we can discuss further on our call tomorrow. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
SQL (unlike other datasets) does not currently apply the parsing function before computing mins/maxes. That means that for a column containing the following data, the following mins are selected:
["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]
Pandas min: "10/1/2015" (chronological, after parsing)
SQL min: "1/2/2017" (lexicographic string comparison)
This case does not appear to be covered by the current tests.
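The discrepancy is easy to reproduce with the standard library alone. The sketch below uses an in-memory SQLite table as a stand-in for the SQL dataset: `MIN()` on a TEXT column compares strings lexicographically, while parsing the strings first (as `parse_strings_as_datetimes` does on other dataset types) yields the chronological minimum.

```python
import sqlite3
from datetime import datetime

values = ["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]

# SQLite compares TEXT values lexicographically, so MIN() returns the
# string that sorts first by character code, not the earliest date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
sql_min = conn.execute("SELECT MIN(col) FROM t").fetchone()[0]

# Parsing the strings first yields the chronological minimum instead.
parsed_min = min(values, key=lambda v: datetime.strptime(v, "%m/%d/%Y"))

print(sql_min)     # "1/2/2017"  (lexicographic: "/" sorts before "0")
print(parsed_min)  # "10/1/2015" (chronological)
```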