New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bigqueryNullFloat64 add +Inf, -Inf and NaN support #2574
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
@googlebot I signed it! |
@googlebot I signed it! |
@shollyman I corrected the original commit not having the proper commit email and ensured the email is also registered with my GitHub account. Not sure why it's still saying CLA is not signed, perhaps needs to be force ran again? Please let me know if anything else is required that I've missed so I can correct it :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together, apologies on my slowness getting to it.
I left a discussion comment in the change itself. Assuming you agree, we could simplify your list of literals and remove some of the ambiguous matching.
bigquery/nulls.go
Outdated
@@ -181,6 +196,12 @@ func (n *NullFloat64) UnmarshalJSON(b []byte) error { | |||
n.Float64 = 0 | |||
if bytes.Equal(b, jsonNull) { | |||
return nil | |||
} else if bytes.HasPrefix(b, posInf) || bytes.HasPrefix(b, posInf2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was initially confused by the HasPrefix, but I'm guessing it's because you're trying to anticipate both the short and long forms (Inf / Infinity)?
Confusingly, BQ returns the string-quoted value "Infinity", "-Infinity", and "Nan" in the api responses for the related float values within row data, but documents the short (and uncapitalized) cases in the sql reference, which wouldn't be captured with this prefix logic: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types
In order to avoid getting into parsing confusion and anticipating all the corner cases, my thinking is that this change should only codify the values that are safe for round-tripping as both input and outputs, e.g. the ones that come out of the row data. You're defining the contract for how NullFloat64 is encoded and decoded, but users can still do their own encoding with a custom type/marshaler if they so choose (and want to rely on BigQuery converting as appropriate).
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shollyman
I agree that HasPrefix
might lead to corner cases and makes sense to change that to equality checks instead :) will do that shortly.
I was also using HasPrefix
for two other reasons:
- When Go itself prints values for
Infinity
,-Infinity
it does so using the short form+Inf
and-Inf
(which the BigQuery api accepts happily) - The source of some data I'm parsing was also using the short form that was produced by Go.
To avoid a heap of customization in my, and others, code base' I was hoping to keep the unmarshaling short form support.
What do you think about :
- Changing this to equality checks only, no
HasPrefix
to avoid any corned cases. - Change to always Marshal the long form of these values
Infinity
,-Infinity
andNaN
- Keep the Unmarshal support for the short form equality checks as well as the long form. Doing this will avoid people having to create a custom type to support the way Go itself, and other languages, output this kind of data.
I can't see any harm also supporting the short form for unmarshaling purposes only, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable. It's definitely the case that BQ is following the robustness principle in terms of parsing the variation, so it's reasonable we include that when unmarshalling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks will make the changes and ping you in the PR once done.
- Updated PR for equality checks instead of HasPefix. - Added dedicated NullFloat64 tests.
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
@shollyman I updated the PR based on our conversation and added dedicated tests for please let me know if there's anything else I can do to get this merged :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together. I'll go ahead and get this submitted.
One last item: can you edit your PR to allow edits by maintainers? Neither I nor our automation bots can update the branch to merge it currently. |
@shollyman I had to create a new PR because GitHub doesn't allow setting edits by maintainers on organization forks, only user forks. (may want to mention this in your contributing guidelines) Will close this one and the new PR is here just merged them to my user fork #2618 |
Adds an explicit mention to enable "Allow edits by maintainers" when setting up the pull request. In PR googleapis#2574 (external contribution) it was discovered this cannot be added after the fact for org-based forks, only user-based. As there's not a great reference link for this in the github docs that doesn't descend into a bunch of organization permissions management, I'm not calling this specific circumstance explicitly.
This is a new PR for closed PR #2574 because you can only allow edits for maintainers on user-forks and NOT organization forks.
@shollyman can you please explain why this gets reviewed and merged and not #2408? I first submitted this change a year ago (https://code-review.googlesource.com/c/gocloud/+/44750) and even then left with no review — and was the first to submit anything when you migrated to GitHub. Feels pretty demotivating to come up with any further changes if they just get ignored! |
Adds an explicit mention to enable "Allow edits by maintainers" when setting up the pull request. In PR #2574 (external contribution) it was discovered this cannot be added after the fact for org-based forks, only user-based. As there's not a great reference link for this in the github docs that doesn't descend into a bunch of organization permissions management, I'm not calling this specific circumstance explicitly.
@buyology My apologies for overlooking your PR (twice, sadly in this case). I recently modified how I consume github notifications, hence my attention on a newer PR. If I've overlooked other feedback/contributions from you, feel free to reach out via email (same username @google.com) to discuss. |
This is a new PR for closed PR googleapis#2574 because you can only allow edits for maintainers on user-forks and NOT organization forks.
Adds an explicit mention to enable "Allow edits by maintainers" when setting up the pull request. In PR googleapis#2574 (external contribution) it was discovered this cannot be added after the fact for org-based forks, only user-based. As there's not a great reference link for this in the github docs that doesn't descend into a bunch of organization permissions management, I'm not calling this specific circumstance explicitly.
The following is a PR to fix issue #2575 to enable support for
+Inf
,-Inf
andNan
which BigQuery supports but Go's json package does not.This is related to PR #2408 but implements the fix slightly differently supporting short form
+Inf
and-Inf
so less bytes have to go across the wire.