-
Notifications
You must be signed in to change notification settings - Fork 432
fix(bigquery): Job Library fixes for InsertJob api #12746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(bigquery): Job Library fixes for InsertJob api #12746
Conversation
Codecov ReportAll modified lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #12746 +/- ##
==========================================
- Coverage 93.60% 93.60% -0.01%
==========================================
Files 2075 2075
Lines 182377 182417 +40
==========================================
+ Hits 170715 170750 +35
- Misses 11662 11667 +5
☔ View full report in Codecov by Sentry. |
dbolduc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear to me how we plan to use the filter keys. If the filter key business was in a separate PR, I could approve the other changes. 🤷
| // Have this redundant check due to compile error that | ||
| // depth parameter is not being used. | ||
| if (depth < 0) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you say int /* depth */, ... in the signature?
If not, I'd change this line to (void)depth;
| to_json(json_payload, request.job()); | ||
|
|
||
| auto filtered_json = RemoveJsonKeysAndEmptyFields(json_payload.dump(), | ||
| request.json_filter_keys()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the record, I don't think we ever request.set_json_filter_keys(...) so we are not removing any keys in this PR.
| } | ||
|
|
||
| nlohmann::json RemoveJsonKeysAndEmptyFields(std::string const& json_payload, | ||
| std::vector<std::string> keys) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm so we are passing these keys in by value, but using them as a reference.
The time complexity is O(N*M) where N is number of keys, and M is number of fields.
I do not know how many keys you plan to skip (and whether they show up multiple times or not).
If the keys only show up once and you know their order, you could use something like a stack, where you discard the top element once you have found it. That would only be O(M).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is N like... 2 or 3?
| if (depth < 0) return false; | ||
|
|
||
| if (event == nlohmann::json::parse_event_t::key) { | ||
| auto const key_found = std::any_of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop makes me nervous. See above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I might rename key_found to something like skip_key
| if (event == nlohmann::json::parse_event_t::key) { | ||
| auto const key_found = std::any_of( | ||
| keys.begin(), keys.end(), | ||
| [&parsed](std::string const& key) { return (parsed == key); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove parentheses around the comparison
| if (event == nlohmann::json::parse_event_t::object_end) { | ||
| if (parsed == nullptr || parsed.empty()) return false; | ||
| } | ||
| if (event == nlohmann::json::parse_event_t::array_end) { | ||
| if (parsed.empty()) return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be clearer to always return from each event-type block, rather than allowing any fall-through.
So, in the object_end case, for example, ...
return parsed != nullptr && !parsed.empty();
jsrinnn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the usage is in the benchmark PR which I'll be sending next.
I was given the feedback to separate out the benchmark changes and core library changes. Hence two separate PRs.
Addressed all feedback. PTAL.
Thank you Darren and Bradley!
Reviewable status: 0 of 10 files reviewed, 4 unresolved discussions (waiting on @dbolduc and @devbww)
google/cloud/bigquery/v2/minimal/internal/job_rest_stub.cc line 68 at r1 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
for the record, I don't think we ever
request.set_json_filter_keys(...)so we are not removing any keys in this PR.
That is being done in the benchmark tests. I am separating out the benchmark and the core library changes as was requested in the previous PR. I'll send the benchmark PR after this one.
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
Is N like... 2 or 3?
N is less than 10 and is usually not repeated. I do not know their order.
I have modified the signature to not pass by value.
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 95 at r1 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
nit: Can you say
int /* depth */, ...in the signature?If not, I'd change this line to
(void)depth;
Done
Ah...nice. Thank you. I wasn't aware of the void syntax
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
Also I might rename
key_foundto something likeskip_key
The benchmark testing results were comparable to the current version (of the driver) with the same input parameters. But I'll kepp this in mind in case we need to optimize.
key_found is the output of std::any_of which according to the predicate indicates whether key was found or not.
If the key is found we want to return false (i.e skip it)
I can add a new variable here called skip_key = !key_found and return that so the semantics are clear that we are skipping the key incase key is found. The act of skipping the key involves returning false.
If you have a better alternative keeping the above semantics do let me know.
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 100 at r1 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
nit: remove parentheses around the comparison
Done
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 108 at r1 (raw file):
Previously, devbww (Bradley White) wrote…
It would be clearer to always return from each event-type block, rather than allowing any fall-through.
So, in the
object_endcase, for example, ...return parsed != nullptr && !parsed.empty();
Done.
Thank you. Made the change here and to the next branch.
ea8165e to
5581524
Compare
dbolduc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 10 files at r1, 2 of 3 files at r2, all commit messages.
Reviewable status: 9 of 11 files reviewed, 5 unresolved discussions (waiting on @devbww and @jsrinnn)
google/cloud/bigquery/v2/minimal/benchmarks/job_insert_benchmark_programs.cc line 0 at r2 (raw file):
I think we should still leave this out. I did not review it.
google/cloud/bigquery/v2/minimal/internal/job_rest_stub.cc line 68 at r1 (raw file):
Previously, jsrinnn wrote…
That is being done in the benchmark tests. I am separating out the benchmark and the core library changes as was requested in the previous PR. I'll send the benchmark PR after this one.
Ohhh... duh.... these requests come from the application. Sorry!
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):
Previously, jsrinnn wrote…
N is less than 10 and is usually not repeated. I do not know their order.
I have modified the signature to not pass by value.
Thanks, and SGTM.
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):
Previously, jsrinnn wrote…
The benchmark testing results were comparable to the current version (of the driver) with the same input parameters. But I'll kepp this in mind in case we need to optimize.
key_foundis the output ofstd::any_ofwhich according to the predicate indicates whether key was found or not.If the key is found we want to return false (i.e skip it)
I can add a new variable here called
skip_key = !key_foundand return that so the semantics are clear that we are skipping the key incase key is found. The act of skipping the key involves returning false.If you have a better alternative keeping the above semantics do let me know.
The naming is backwards now. As written, if skip_key is true, we keep the key. Let me try again with the naming....
I think we should just say:
if (event == nlohmann::json::parse_event_t::key) {
auto const discard = std::any_of(
keys.begin(), keys.end(),
[&parsed](std::string const& key) { return parsed == key; });
return !discard;
}My naming comment (nit) was about remembering what we are trying to do when a key is found. Someone reading the code will know that std::any_of returns true when a match is found.
The only context is the name of the function, which says "RemoveJsonKeys...". We can add a little bit more context in the variable name.
google/cloud/bigquery/v2/minimal/internal/json_utils_test.cc line 210 at r2 (raw file):
TEST(JsonUtilsTest, RemoveKeys) { std::vector<std::string> keys = {"start_time", "dataset_id"}; auto const* json_text =
optional style nit: I think we tend to say auto constexpr kJsonText = ... instead of using auto const*. I don't actually know the difference.
jsrinnn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed all feedback. Thank you, Darren!
Reviewable status: 9 of 11 files reviewed, 5 unresolved discussions (waiting on @dbolduc and @devbww)
google/cloud/bigquery/v2/minimal/benchmarks/job_insert_benchmark_programs.cc line at r2 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
I think we should still leave this out. I did not review it.
Done.
Whoops. That got included with the latest push by mistake, when I was testing the PR with the latest changes. Removed.
Thank you for pointing it out!!
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
Thanks, and SGTM.
Ack.
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
The naming is backwards now. As written, if
skip_keyis true, we keep the key. Let me try again with the naming....I think we should just say:
if (event == nlohmann::json::parse_event_t::key) { auto const discard = std::any_of( keys.begin(), keys.end(), [&parsed](std::string const& key) { return parsed == key; }); return !discard; }My naming comment (nit) was about remembering what we are trying to do when a key is found. Someone reading the code will know that
std::any_ofreturns true when a match is found.The only context is the name of the function, which says "RemoveJsonKeys...". We can add a little bit more context in the variable name.
Done.
Correct. if skip_key is true then we need to skip the key but the return statement becomes confusing with return !skip_key. That is what I was trying to initally point out.
Regardless, I have made the change as I understand your pov as well.
google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 108 at r1 (raw file):
Previously, jsrinnn wrote…
Done.
Thank you. Made the change here and to the next branch.
Just clarifying the comment above. I meant the next "if" condition with array_end and not any code branch.
google/cloud/bigquery/v2/minimal/internal/json_utils_test.cc line 210 at r2 (raw file):
Previously, dbolduc (Darren Bolduc) wrote…
optional style nit: I think we tend to say
auto constexpr kJsonText = ...instead of usingauto const*. I don't actually know the difference.
Done.
made changes here and everwhere else in this file.
jsrinnn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed all feedback. Thank you, Darren!
Reviewable status: 8 of 11 files reviewed, 5 unresolved discussions (waiting on @dbolduc and @devbww)
This PR includes the following fixes for the core Job library, found during benchmark testing of the InsertJob api
This is especially needed for insertjob api as it errors out if json fields with empty values are supplied.
For read-only apis we do want the json payload to return all fields.
BuildRestRequestmethod for InsertJobRquest. The assumptions before were not valid.The above has been tested end to end with a benchmark program. I'll send the benchmark program changes after the above is merged.
This change is