fix: converting to dataframe with out of bounds timestamps #209

Merged
merged 2 commits into from Aug 15, 2020

@plamut plamut commented Aug 1, 2020

Fixes #168.

This PR fixes the conversion of query results to a pandas DataFrame with pyarrow when the data contains timestamps that fall outside the range pyarrow can represent at nanosecond precision.

The fix requires pyarrow>=1.0.0, so it only works on Python 3.
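
To illustrate what "out of bounds" means here, below is a minimal stdlib-only sketch, not the code from this PR. It assumes timestamps are stored as signed 64-bit nanosecond offsets from the Unix epoch, which limits the representable range to roughly the years 1677 through 2262. The names `NS_MIN`, `NS_MAX` and `has_out_of_bounds` are hypothetical and exist only for this example.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest offset a signed 64-bit nanosecond counter can hold, truncated to
# microseconds because that is the finest resolution datetime supports.
_MAX_OFFSET_US = (2**63 - 1) // 1000

# Assumed bounds of a nanosecond-precision timestamp (hypothetical names).
NS_MIN = _EPOCH - timedelta(microseconds=_MAX_OFFSET_US)
NS_MAX = _EPOCH + timedelta(microseconds=_MAX_OFFSET_US)


def has_out_of_bounds(values):
    """Return True if any non-null datetime falls outside [NS_MIN, NS_MAX]."""
    for value in values:
        if value is None:
            continue
        if value.tzinfo is None:
            # Assumption for this sketch: naive datetimes are treated as UTC.
            value = value.replace(tzinfo=timezone.utc)
        if not (NS_MIN <= value <= NS_MAX):
            return True
    return False
```

A value such as `datetime(9999, 12, 31)` is valid in BigQuery and in Python but does not fit in this range, so a column containing it would have to be returned as `datetime` objects rather than as native nanosecond timestamps.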

PR checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
@plamut plamut requested a review from tswast Aug 1, 2020
@google-cla google-cla bot added the cla: yes label Aug 1, 2020

@plamut plamut commented Aug 1, 2020

@tswast There is an inconsistency here: the existing date_as_object option is exposed to users, while the timestamp_as_object option is hidden. Let me know if you want to unify these two approaches to a similar problem.

@tswast tswast left a comment

Thanks!

Regarding date_as_object, that case is a little different because dates don't throw an error. They just come back as strings if the option is not set.

If we do provide timestamp_as_object, I think it needs three states:

  • (default) the behavior in this fix
  • (explicitly false) let the error happen, since they want to use native pandas Timestamp (maybe for performance reasons)
  • (explicitly true) always convert to datetime objects
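
The three states above could be modeled with an optional boolean. This is only a sketch of the idea under discussion, not an API that exists in this PR: the helper name `resolve_timestamp_as_object` and its `has_out_of_bounds` argument are hypothetical.

```python
from typing import Optional


def resolve_timestamp_as_object(
    timestamp_as_object: Optional[bool], has_out_of_bounds: bool
) -> bool:
    """Decide whether timestamps should be converted to datetime objects.

    None  -> default: convert to objects only when some value is out of bounds.
    False -> never convert; native conversion is attempted and may raise.
    True  -> always convert to datetime objects.
    """
    if timestamp_as_object is None:
        return has_out_of_bounds
    return timestamp_as_object
```

Using `None` as the default keeps "not specified" distinguishable from an explicit `False`, which a plain boolean default could not express.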

google/cloud/bigquery/table.py (review thread outdated, resolved)
@plamut plamut marked this pull request as ready for review Aug 6, 2020
@plamut plamut requested review from tswast and shollyman Aug 6, 2020

@plamut plamut commented Aug 6, 2020

Let me know if I should also add an explicit timestamp_as_object parameter as envisioned by @tswast, or whether we should leave it out of this fix and (maybe) add it in a separate feature PR.

tswast approved these changes Aug 6, 2020

@tswast tswast left a comment

Thanks!

I think we can wait for a separate PR for the timestamp_as_object parameter feature.

@gcf-merge-on-green gcf-merge-on-green bot merged commit 8209203 into googleapis:master Aug 15, 2020
10 checks passed