Skip to content

Conversation

@jsrinnn
Copy link
Contributor

@jsrinnn jsrinnn commented Sep 27, 2023

This PR includes the following fixes for the core Job library, found during benchmark testing of the InsertJob api

  • Fixed 64-bit parsing of the "maximum_bytes_billed" field
  • Added support for removing keys from the Json payload that are not relevant for the api.
    This is especially needed for insertjob api as it errors out if json fields with empty values are supplied.
    For read-only apis we do want the json payload to return all fields.
  • Modified validation logic in BuildRestRequest method for InsertJobRquest. The assumptions before were not valid.
  • Modified InsertJob() rest stub to filter json payload to only keys that are needed for that api.
  • Corresponding unit test changes for all the above changes.

The above has been tested end to end with a benchmark program. I'll send the benchmark program changes after the above is merged.


This change is Reviewable

@jsrinnn jsrinnn requested a review from a team as a code owner September 27, 2023 21:28
@jsrinnn jsrinnn temporarily deployed to internal September 27, 2023 21:28 — with GitHub Actions Inactive
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Sep 27, 2023
@codecov
Copy link

codecov bot commented Sep 27, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (b312cb4) 93.60% compared to head (e0550e6) 93.60%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12746      +/-   ##
==========================================
- Coverage   93.60%   93.60%   -0.01%     
==========================================
  Files        2075     2075              
  Lines      182377   182417      +40     
==========================================
+ Hits       170715   170750      +35     
- Misses      11662    11667       +5     
Files Coverage Δ
...ery/v2/minimal/internal/job_configuration_query.cc 100.00% <100.00%> (ø)
.../cloud/bigquery/v2/minimal/internal/job_request.cc 85.89% <100.00%> (ø)
...e/cloud/bigquery/v2/minimal/internal/job_request.h 98.51% <100.00%> (+0.01%) ⬆️
...d/bigquery/v2/minimal/internal/job_request_test.cc 99.72% <100.00%> (ø)
.../bigquery/v2/minimal/internal/job_response_test.cc 100.00% <ø> (ø)
...loud/bigquery/v2/minimal/internal/job_rest_stub.cc 100.00% <100.00%> (ø)
...gle/cloud/bigquery/v2/minimal/internal/job_test.cc 100.00% <ø> (ø)
...e/cloud/bigquery/v2/minimal/internal/json_utils.cc 100.00% <100.00%> (ø)
...le/cloud/bigquery/v2/minimal/internal/json_utils.h 100.00% <ø> (ø)
...ud/bigquery/v2/minimal/internal/json_utils_test.cc 100.00% <100.00%> (ø)

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me how we plan to use the filter keys. If the filter key business was in a separate PR, I could approve the other changes. 🤷

Comment on lines 93 to 95
// Have this redundant check due to compile error that
// depth parameter is not being used.
if (depth < 0) return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you say int /* depth */, ... in the signature?

If not, I'd change this line to (void)depth;

to_json(json_payload, request.job());

auto filtered_json = RemoveJsonKeysAndEmptyFields(json_payload.dump(),
request.json_filter_keys());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the record, I don't think we ever request.set_json_filter_keys(...) so we are not removing any keys in this PR.

}

nlohmann::json RemoveJsonKeysAndEmptyFields(std::string const& json_payload,
std::vector<std::string> keys) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm so we are passing these keys in by value, but using them as a reference.

The time complexity is O(N*M) where N is number of keys, and M is number of fields.

I do not know how many keys you plan to skip (and whether they show up multiple times or not).

If the keys only show up once and you know their order, you could use something like a stack, where you discard the top element once you have found it. That would only be O(M).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is N like... 2 or 3?

if (depth < 0) return false;

if (event == nlohmann::json::parse_event_t::key) {
auto const key_found = std::any_of(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop makes me nervous. See above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I might rename key_found to something like skip_key

if (event == nlohmann::json::parse_event_t::key) {
auto const key_found = std::any_of(
keys.begin(), keys.end(),
[&parsed](std::string const& key) { return (parsed == key); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove parentheses around the comparison

Comment on lines 103 to 107
if (event == nlohmann::json::parse_event_t::object_end) {
if (parsed == nullptr || parsed.empty()) return false;
}
if (event == nlohmann::json::parse_event_t::array_end) {
if (parsed.empty()) return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be clearer to always return from each event-type block, rather than allowing any fall-through.

So, in the object_end case, for example, ...

    return parsed != nullptr && !parsed.empty();

Copy link
Contributor Author

@jsrinnn jsrinnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the usage is in the benchmark PR which I'll be sending next.

I was given the feedback to separate out the benchmark changes and core library changes. Hence two separate PRs.

Addressed all feedback. PTAL.

Thank you Darren and Bradley!

Reviewable status: 0 of 10 files reviewed, 4 unresolved discussions (waiting on @dbolduc and @devbww)


google/cloud/bigquery/v2/minimal/internal/job_rest_stub.cc line 68 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

for the record, I don't think we ever request.set_json_filter_keys(...) so we are not removing any keys in this PR.

That is being done in the benchmark tests. I am separating out the benchmark and the core library changes as was requested in the previous PR. I'll send the benchmark PR after this one.


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Is N like... 2 or 3?

N is less than 10 and is usually not repeated. I do not know their order.

I have modified the signature to not pass by value.


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 95 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: Can you say int /* depth */, ... in the signature?

If not, I'd change this line to (void)depth;

Done

Ah...nice. Thank you. I wasn't aware of the void syntax


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Also I might rename key_found to something like skip_key

The benchmark testing results were comparable to the current version (of the driver) with the same input parameters. But I'll kepp this in mind in case we need to optimize.

key_found is the output of std::any_of which according to the predicate indicates whether key was found or not.

If the key is found we want to return false (i.e skip it)

I can add a new variable here called skip_key = !key_found and return that so the semantics are clear that we are skipping the key incase key is found. The act of skipping the key involves returning false.

If you have a better alternative keeping the above semantics do let me know.


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 100 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

nit: remove parentheses around the comparison

Done


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 108 at r1 (raw file):

Previously, devbww (Bradley White) wrote…

It would be clearer to always return from each event-type block, rather than allowing any fall-through.

So, in the object_end case, for example, ...

    return parsed != nullptr && !parsed.empty();

Done.

Thank you. Made the change here and to the next branch.

@jsrinnn jsrinnn temporarily deployed to internal September 28, 2023 15:07 — with GitHub Actions Inactive
@jsrinnn jsrinnn force-pushed the bigquery_v2_minimal_apis_benchmarks_insert_job branch from ea8165e to 5581524 Compare September 28, 2023 15:08
@jsrinnn jsrinnn temporarily deployed to internal September 28, 2023 15:08 — with GitHub Actions Inactive
Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 10 files at r1, 2 of 3 files at r2, all commit messages.
Reviewable status: 9 of 11 files reviewed, 5 unresolved discussions (waiting on @devbww and @jsrinnn)


google/cloud/bigquery/v2/minimal/benchmarks/job_insert_benchmark_programs.cc line 0 at r2 (raw file):
I think we should still leave this out. I did not review it.


google/cloud/bigquery/v2/minimal/internal/job_rest_stub.cc line 68 at r1 (raw file):

Previously, jsrinnn wrote…

That is being done in the benchmark tests. I am separating out the benchmark and the core library changes as was requested in the previous PR. I'll send the benchmark PR after this one.

Ohhh... duh.... these requests come from the application. Sorry!


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):

Previously, jsrinnn wrote…

N is less than 10 and is usually not repeated. I do not know their order.

I have modified the signature to not pass by value.

Thanks, and SGTM.


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):

Previously, jsrinnn wrote…

The benchmark testing results were comparable to the current version (of the driver) with the same input parameters. But I'll kepp this in mind in case we need to optimize.

key_found is the output of std::any_of which according to the predicate indicates whether key was found or not.

If the key is found we want to return false (i.e skip it)

I can add a new variable here called skip_key = !key_found and return that so the semantics are clear that we are skipping the key incase key is found. The act of skipping the key involves returning false.

If you have a better alternative keeping the above semantics do let me know.

The naming is backwards now. As written, if skip_key is true, we keep the key. Let me try again with the naming....

I think we should just say:

        if (event == nlohmann::json::parse_event_t::key) {
          auto const discard = std::any_of(
              keys.begin(), keys.end(),
              [&parsed](std::string const& key) { return parsed == key; });
          return !discard;
        }

My naming comment (nit) was about remembering what we are trying to do when a key is found. Someone reading the code will know that std::any_of returns true when a match is found.

The only context is the name of the function, which says "RemoveJsonKeys...". We can add a little bit more context in the variable name.


google/cloud/bigquery/v2/minimal/internal/json_utils_test.cc line 210 at r2 (raw file):

TEST(JsonUtilsTest, RemoveKeys) {
  std::vector<std::string> keys = {"start_time", "dataset_id"};
  auto const* json_text =

optional style nit: I think we tend to say auto constexpr kJsonText = ... instead of using auto const*. I don't actually know the difference.

Copy link
Contributor Author

@jsrinnn jsrinnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all feedback. Thank you, Darren!

Reviewable status: 9 of 11 files reviewed, 5 unresolved discussions (waiting on @dbolduc and @devbww)


google/cloud/bigquery/v2/minimal/benchmarks/job_insert_benchmark_programs.cc line at r2 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

I think we should still leave this out. I did not review it.

Done.

Whoops. That got included with the latest push by mistake, when I was testing the PR with the latest changes. Removed.

Thank you for pointing it out!!


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 89 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

Thanks, and SGTM.

Ack.


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 98 at r1 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

The naming is backwards now. As written, if skip_key is true, we keep the key. Let me try again with the naming....

I think we should just say:

        if (event == nlohmann::json::parse_event_t::key) {
          auto const discard = std::any_of(
              keys.begin(), keys.end(),
              [&parsed](std::string const& key) { return parsed == key; });
          return !discard;
        }

My naming comment (nit) was about remembering what we are trying to do when a key is found. Someone reading the code will know that std::any_of returns true when a match is found.

The only context is the name of the function, which says "RemoveJsonKeys...". We can add a little bit more context in the variable name.

Done.

Correct. if skip_key is true then we need to skip the key but the return statement becomes confusing with return !skip_key. That is what I was trying to initally point out.

Regardless, I have made the change as I understand your pov as well.


google/cloud/bigquery/v2/minimal/internal/json_utils.cc line 108 at r1 (raw file):

Previously, jsrinnn wrote…

Done.

Thank you. Made the change here and to the next branch.

Just clarifying the comment above. I meant the next "if" condition with array_end and not any code branch.


google/cloud/bigquery/v2/minimal/internal/json_utils_test.cc line 210 at r2 (raw file):

Previously, dbolduc (Darren Bolduc) wrote…

optional style nit: I think we tend to say auto constexpr kJsonText = ... instead of using auto const*. I don't actually know the difference.

Done.

made changes here and everwhere else in this file.

@jsrinnn jsrinnn temporarily deployed to internal September 28, 2023 17:04 — with GitHub Actions Inactive
Copy link
Contributor Author

@jsrinnn jsrinnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all feedback. Thank you, Darren!

Reviewable status: 8 of 11 files reviewed, 5 unresolved discussions (waiting on @dbolduc and @devbww)

@jsrinnn jsrinnn enabled auto-merge (squash) September 28, 2023 17:11
@jsrinnn jsrinnn merged commit 6ed7976 into googleapis:main Sep 28, 2023
@jsrinnn jsrinnn deleted the bigquery_v2_minimal_apis_benchmarks_insert_job branch September 28, 2023 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the BigQuery API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants