fix(bigquery): Job Library fixes for InsertJob api #12746

jsrinnn · 2023-09-27T21:28:36Z

This PR includes the following fixes for the core Job library, found during benchmark testing of the InsertJob api

Fixed 64-bit parsing of the "maximum_bytes_billed" field
Added support for removing keys from the Json payload that are not relevant for the api.
This is especially needed for insertjob api as it errors out if json fields with empty values are supplied.
For read-only apis we do want the json payload to return all fields.
Modified validation logic in BuildRestRequest method for InsertJobRquest. The assumptions before were not valid.
Modified InsertJob() rest stub to filter json payload to only keys that are needed for that api.
Corresponding unit test changes for all the above changes.

The above has been tested end to end with a benchmark program. I'll send the benchmark program changes after the above is merged.

This change is

codecov · 2023-09-27T21:50:20Z

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (b312cb4) 93.60% compared to head (e0550e6) 93.60%.
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #12746      +/-   ##
==========================================
- Coverage   93.60%   93.60%   -0.01%     
==========================================
  Files        2075     2075              
  Lines      182377   182417      +40     
==========================================
+ Hits       170715   170750      +35     
- Misses      11662    11667       +5

Files	Coverage Δ
...ery/v2/minimal/internal/job_configuration_query.cc	`100.00% <100.00%> (ø)`
.../cloud/bigquery/v2/minimal/internal/job_request.cc	`85.89% <100.00%> (ø)`
...e/cloud/bigquery/v2/minimal/internal/job_request.h	`98.51% <100.00%> (+0.01%)`	⬆️
...d/bigquery/v2/minimal/internal/job_request_test.cc	`99.72% <100.00%> (ø)`
.../bigquery/v2/minimal/internal/job_response_test.cc	`100.00% <ø> (ø)`
...loud/bigquery/v2/minimal/internal/job_rest_stub.cc	`100.00% <100.00%> (ø)`
...gle/cloud/bigquery/v2/minimal/internal/job_test.cc	`100.00% <ø> (ø)`
...e/cloud/bigquery/v2/minimal/internal/json_utils.cc	`100.00% <100.00%> (ø)`
...le/cloud/bigquery/v2/minimal/internal/json_utils.h	`100.00% <ø> (ø)`
...ud/bigquery/v2/minimal/internal/json_utils_test.cc	`100.00% <100.00%> (ø)`

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

dbolduc

It is unclear to me how we plan to use the filter keys. If the filter key business was in a separate PR, I could approve the other changes. 🤷

dbolduc · 2023-09-28T03:05:08Z

google/cloud/bigquery/v2/minimal/internal/json_utils.cc

+        // Have this redundant check due to compile error that
+        // depth parameter is not being used.
+        if (depth < 0) return false;


nit: Can you say int /* depth */, ... in the signature?

If not, I'd change this line to (void)depth;

dbolduc · 2023-09-28T03:18:32Z

google/cloud/bigquery/v2/minimal/internal/job_rest_stub.cc

  to_json(json_payload, request.job());

+  auto filtered_json = RemoveJsonKeysAndEmptyFields(json_payload.dump(),
+                                                    request.json_filter_keys());


for the record, I don't think we ever request.set_json_filter_keys(...) so we are not removing any keys in this PR.

dbolduc · 2023-09-28T03:22:49Z

google/cloud/bigquery/v2/minimal/internal/json_utils.cc

 }

+nlohmann::json RemoveJsonKeysAndEmptyFields(std::string const& json_payload,
+                                            std::vector<std::string> keys) {


Hm so we are passing these keys in by value, but using them as a reference.

The time complexity is O(N*M) where N is number of keys, and M is number of fields.

I do not know how many keys you plan to skip (and whether they show up multiple times or not).

If the keys only show up once and you know their order, you could use something like a stack, where you discard the top element once you have found it. That would only be O(M).

Is N like... 2 or 3?

dbolduc · 2023-09-28T03:23:48Z

google/cloud/bigquery/v2/minimal/internal/json_utils.cc

+        if (depth < 0) return false;
+
+        if (event == nlohmann::json::parse_event_t::key) {
+          auto const key_found = std::any_of(


This loop makes me nervous. See above.

Also I might rename key_found to something like skip_key

dbolduc · 2023-09-28T03:24:43Z

google/cloud/bigquery/v2/minimal/internal/json_utils.cc

+        if (event == nlohmann::json::parse_event_t::key) {
+          auto const key_found = std::any_of(
+              keys.begin(), keys.end(),
+              [&parsed](std::string const& key) { return (parsed == key); });


nit: remove parentheses around the comparison

devbww · 2023-09-28T03:59:18Z

google/cloud/bigquery/v2/minimal/internal/json_utils.cc

+        if (event == nlohmann::json::parse_event_t::object_end) {
+          if (parsed == nullptr || parsed.empty()) return false;
+        }
+        if (event == nlohmann::json::parse_event_t::array_end) {
+          if (parsed.empty()) return false;
+        }


It would be clearer to always return from each event-type block, rather than allowing any fall-through.

So, in the object_end case, for example, ...

return parsed != nullptr && !parsed.empty();

jsrinnn

Yes, the usage is in the benchmark PR which I'll be sending next.

I was given the feedback to separate out the benchmark changes and core library changes. Hence two separate PRs.

Addressed all feedback. PTAL.

Thank you Darren and Bradley!

Reviewable status: 0 of 10 files reviewed, 4 unresolved discussions (waiting on @dbolduc and @devbww)

google/cloud/bigquery/v2/minimal/internal/job_rest_stub.cc line 68 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

for the record, I don't think we ever request.set_json_filter_keys(...) so we are not removing any keys in this PR.

That is being done in the benchmark tests. I am separating out the benchmark and the core library changes as was requested in the previous PR. I'll send the benchmark PR after this one.

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Is N like... 2 or 3?

N is less than 10 and is usually not repeated. I do not know their order.

I have modified the signature to not pass by value.

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 95 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: Can you say int /* depth */, ... in the signature?

If not, I'd change this line to (void)depth;

Done

Ah...nice. Thank you. I wasn't aware of the void syntax

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Also I might rename key_found to something like skip_key

The benchmark testing results were comparable to the current version (of the driver) with the same input parameters. But I'll kepp this in mind in case we need to optimize.

key_found is the output of std::any_of which according to the predicate indicates whether key was found or not.

If the key is found we want to return false (i.e skip it)

I can add a new variable here called skip_key = !key_found and return that so the semantics are clear that we are skipping the key incase key is found. The act of skipping the key involves returning false.

If you have a better alternative keeping the above semantics do let me know.

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 100 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: remove parentheses around the comparison

Done

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 108 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

It would be clearer to always return from each event-type block, rather than allowing any fall-through.

So, in the object_end case, for example, ...
    return parsed != nullptr && !parsed.empty();

Done.

Thank you. Made the change here and to the next branch.

dbolduc

Reviewed 7 of 10 files at r1, 2 of 3 files at r2, all commit messages.
Reviewable status: 9 of 11 files reviewed, 5 unresolved discussions (waiting on @devbww and @jsrinnn)

google/cloud/bigquery/v2/minimal/benchmarks/job_insert_benchmark_programs.cc line 0 at r2 (raw file):
I think we should still leave this out. I did not review it.

google/cloud/bigquery/v2/minimal/internal/job_rest_stub.cc line 68 at r1 (raw file):

Previously, jsrinnn wrote…

That is being done in the benchmark tests. I am separating out the benchmark and the core library changes as was requested in the previous PR. I'll send the benchmark PR after this one.

Ohhh... duh.... these requests come from the application. Sorry!

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):

Previously, jsrinnn wrote…

N is less than 10 and is usually not repeated. I do not know their order.

I have modified the signature to not pass by value.

Thanks, and SGTM.

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):

Previously, jsrinnn wrote…

The benchmark testing results were comparable to the current version (of the driver) with the same input parameters. But I'll kepp this in mind in case we need to optimize.

key_found is the output of std::any_of which according to the predicate indicates whether key was found or not.

If the key is found we want to return false (i.e skip it)

I can add a new variable here called skip_key = !key_found and return that so the semantics are clear that we are skipping the key incase key is found. The act of skipping the key involves returning false.

If you have a better alternative keeping the above semantics do let me know.

The naming is backwards now. As written, if skip_key is true, we keep the key. Let me try again with the naming....

I think we should just say:

        if (event == nlohmann::json::parse_event_t::key) {
          auto const discard = std::any_of(
              keys.begin(), keys.end(),
              [&parsed](std::string const& key) { return parsed == key; });
          return !discard;
        }

My naming comment (nit) was about remembering what we are trying to do when a key is found. Someone reading the code will know that std::any_of returns true when a match is found.

The only context is the name of the function, which says "RemoveJsonKeys...". We can add a little bit more context in the variable name.

google/cloud/bigquery/v2/minimal/internal/json_utils_test.cc line 210 at r2 (raw file):

TEST(JsonUtilsTest, RemoveKeys) {
  std::vector<std::string> keys = {"start_time", "dataset_id"};
  auto const* json_text =

optional style nit: I think we tend to say auto constexpr kJsonText = ... instead of using auto const*. I don't actually know the difference.

jsrinnn

Addressed all feedback. Thank you, Darren!

Reviewable status: 9 of 11 files reviewed, 5 unresolved discussions (waiting on @dbolduc and @devbww)

google/cloud/bigquery/v2/minimal/benchmarks/job_insert_benchmark_programs.cc line at r2 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I think we should still leave this out. I did not review it.

Done.

Whoops. That got included with the latest push by mistake, when I was testing the PR with the latest changes. Removed.

Thank you for pointing it out!!

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Thanks, and SGTM.

Ack.

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

The naming is backwards now. As written, if skip_key is true, we keep the key. Let me try again with the naming....

I think we should just say:
        if (event == nlohmann::json::parse_event_t::key) {
          auto const discard = std::any_of(
              keys.begin(), keys.end(),
              [&parsed](std::string const& key) { return parsed == key; });
          return !discard;
        }
My naming comment (nit) was about remembering what we are trying to do when a key is found. Someone reading the code will know that std::any_of returns true when a match is found.

The only context is the name of the function, which says "RemoveJsonKeys...". We can add a little bit more context in the variable name.

Done.

Correct. if skip_key is true then we need to skip the key but the return statement becomes confusing with return !skip_key. That is what I was trying to initally point out.

Regardless, I have made the change as I understand your pov as well.

google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 108 at r1 (raw file):

Previously, jsrinnn wrote…

Done.

Thank you. Made the change here and to the next branch.

Just clarifying the comment above. I meant the next "if" condition with array_end and not any code branch.

google/cloud/bigquery/v2/minimal/internal/json_utils_test.cc line 210 at r2 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

optional style nit: I think we tend to say auto constexpr kJsonText = ... instead of using auto const*. I don't actually know the difference.

Done.

made changes here and everwhere else in this file.

jsrinnn

Addressed all feedback. Thank you, Darren!

Reviewable status: 8 of 11 files reviewed, 5 unresolved discussions (waiting on @dbolduc and @devbww)

jsrinnn requested a review from a team as a code owner September 27, 2023 21:28

jsrinnn temporarily deployed to internal September 27, 2023 21:28 — with GitHub Actions Inactive

product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Sep 27, 2023

dbolduc reviewed Sep 28, 2023

View reviewed changes

devbww reviewed Sep 28, 2023

View reviewed changes

jsrinnn commented Sep 28, 2023

View reviewed changes

jsrinnn temporarily deployed to internal September 28, 2023 15:07 — with GitHub Actions Inactive

jsrinnn added 2 commits September 28, 2023 15:07

fix(bigquery): Job Library fixes for InsertJob api

910526c

fix(bigquery): Addressed code review feedback

5581524

jsrinnn force-pushed the bigquery_v2_minimal_apis_benchmarks_insert_job branch from ea8165e to 5581524 Compare September 28, 2023 15:08

jsrinnn temporarily deployed to internal September 28, 2023 15:08 — with GitHub Actions Inactive

dbolduc requested changes Sep 28, 2023

View reviewed changes

fix(bigquery): Addressed more feedback

e0550e6

jsrinnn commented Sep 28, 2023

View reviewed changes

jsrinnn temporarily deployed to internal September 28, 2023 17:04 — with GitHub Actions Inactive

jsrinnn commented Sep 28, 2023

View reviewed changes

dbolduc approved these changes Sep 28, 2023

View reviewed changes

jsrinnn enabled auto-merge (squash) September 28, 2023 17:11

devbww approved these changes Sep 28, 2023

View reviewed changes

jsrinnn merged commit 6ed7976 into googleapis:main Sep 28, 2023

jsrinnn deleted the bigquery_v2_minimal_apis_benchmarks_insert_job branch September 28, 2023 20:53

fix(bigquery): Job Library fixes for InsertJob api #12746

fix(bigquery): Job Library fixes for InsertJob api #12746

Uh oh!

Conversation

jsrinnn commented Sep 27, 2023 • edited by coryan Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Sep 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dbolduc left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jsrinnn left a comment

Choose a reason for hiding this comment

Uh oh!

dbolduc left a comment

Choose a reason for hiding this comment

Uh oh!

jsrinnn left a comment

Choose a reason for hiding this comment

Uh oh!

jsrinnn left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jsrinnn commented Sep 27, 2023 •

edited by coryan

Loading

codecov bot commented Sep 27, 2023 •

edited

Loading