
SQL dataset does not parse strings as datetimes #459

Closed
cselig opened this issue May 2, 2019 · 4 comments
Labels
stale Stale issues and PRs

Comments

@cselig
Contributor

cselig commented May 2, 2019

SQL (unlike the other datasets) does not currently apply the parsing function before computing mins/maxes. As a result, given a column containing the following data, these mins are selected:

["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]
Pandas min: "10/1/2015"
SQL min: "1/2/2017"

This behavior does not appear to be covered by the current tests.
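The discrepancy can be reproduced with plain Python (a minimal sketch of the same data; the original report compares the pandas and SQL dataset implementations, which this stands in for):

```python
from datetime import datetime

dates = ["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]

# Comparing the raw strings is lexicographic: "1/2/2017" sorts first
# because "1/" < "10" ('/' precedes '0' in ASCII). This mirrors the SQL result.
print(min(dates))  # 1/2/2017

# Parsing to datetimes first gives the chronological min, 10/1/2015,
# which mirrors what the pandas dataset returns after parsing.
print(min(datetime.strptime(d, "%m/%d/%Y") for d in dates))
```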

@jcampbell
Member

I think the primary issue here is the ambiguous behavior of parse_strings_as_datetimes, also discussed in #422.

For example, with the dataset you provide and a datetime-typed column, the current behavior is as expected; the problem only occurs when processing a text column that you want GE to convert. Since that is the semantics of parse_strings_as_datetimes in pandas, I think it's reasonable to change, but it may not be straightforward across different SQL implementations. I'll look at that now.

Further, you're definitely right that there's no negative-case datetime test for min/max, so that's an easy win to get the behavior well documented.

@jcampbell
Member

Unfortunately, it looks to me like the difference is definitely implementation-specific. With a postgres backend, the current behavior is actually also as expected (i.e., min(col) returns 10/1/2015, even for a text column).

I think the right option may be to disallow parse_strings_as_datetimes for cases where the column is not already a datetime column. Since I know you're working on Spark -- would that work in that case?
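The backend dependence is easy to see with SQLite's in-memory engine (a hedged sketch; the thread's results came from other backends, and the ordering of text depends on each engine's collation rules):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT)")
dates = ["2/1/2016", "2/2/2016", "2/2/2016", "10/1/2016", "1/2/2017", "10/1/2015"]
conn.executemany("INSERT INTO t VALUES (?)", [(d,) for d in dates])

# SQLite compares TEXT with its default BINARY collation, so min() over the
# raw strings is lexicographic here ("1/2/2017"); a backend with a
# locale-aware collation may order the same strings differently.
print(conn.execute("SELECT min(d) FROM t").fetchone()[0])  # 1/2/2017
```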

@cselig
Contributor Author

cselig commented May 8, 2019

Ah, I see. I'm pretty sure that would be possible in Spark; we can discuss further on our call tomorrow. Thanks!

@stale

stale bot commented Mar 11, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix label Mar 11, 2020
@Aylr Aylr removed the wontfix label Mar 13, 2020
@Aylr Aylr added the stale Stale issues and PRs label May 16, 2020