
SQL dataset does not parse strings as datetimes #459

Closed
cselig opened this issue May 2, 2019 · 4 comments
Labels
stale Stale issues and PRs

Comments

@cselig
Contributor

cselig commented May 2, 2019

SQL (unlike the other datasets) does not currently apply the parsing function before computing mins/maxes. As a result, given a column containing the following data, these mins are selected:

["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]
Pandas min: "10/1/2015"
SQL min: "1/2/2017"

This behavior does not appear to be covered by the current tests.
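The discrepancy can be reproduced with plain Python (a minimal sketch of the same data; the original report compares the pandas and SQL dataset implementations, which this stands in for):

```python
from datetime import datetime

dates = ["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]

# Comparing the raw strings is lexicographic: "1/2/2017" sorts first
# because "1/" < "10" ('/' precedes '0' in ASCII). This mirrors the SQL result.
print(min(dates))  # 1/2/2017

# Parsing to datetimes first gives the chronological min, 10/1/2015,
# which mirrors what the pandas dataset returns after parsing.
print(min(datetime.strptime(d, "%m/%d/%Y") for d in dates))
```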

@jcampbell
Member

I think the primary issue here is the ambiguous behavior of parse_strings_as_datetimes, also discussed in #422.

For example, with the dataset you provide and a datetime-typed column, the current behavior is as expected; the problem only occurs when processing a text column that you want GE to convert. Since that is the semantics of parse_strings_as_datetimes in pandas, I think it's reasonable to change, but it may not be straightforward across different SQL implementations. I'll look at that now.

Further, you're definitely right that there's no negative-case datetime test for min/max, so that's an easy win to get the behavior well documented.

@jcampbell
Member

Unfortunately, it looks to me like the difference is definitely implementation-specific. With a postgres backend, the current behavior is actually also as expected (i.e., min(col) returns 10/1/2015, even for a text column).

I think the right option may be to disallow parse_strings_as_datetimes for cases where the column is not already a datetime column. Since I know you're working on Spark -- would that work in that case?
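The backend dependence is easy to see with SQLite's in-memory engine (a hedged sketch; the thread's results came from other backends, and the ordering of text depends on each engine's collation rules):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT)")
dates = ["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]
conn.executemany("INSERT INTO t VALUES (?)", [(d,) for d in dates])

# SQLite compares TEXT with its default BINARY collation, so min() over the
# raw strings is lexicographic here ("1/2/2017"); a backend with a
# locale-aware collation may order the same strings differently.
print(conn.execute("SELECT min(d) FROM t").fetchone()[0])  # 1/2/2017
```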

@cselig
Contributor Author

cselig commented May 8, 2019

Ah, I see. I'm pretty sure that would be possible in Spark; we can discuss further on our call tomorrow. Thanks!

@stale

stale bot commented Mar 11, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix label Mar 11, 2020
@Aylr Aylr removed the wontfix label Mar 13, 2020
@Aylr Aylr added the stale Stale issues and PRs label May 16, 2020