BQ update error: Cannot add required fields to an existing schema. (field: application), invalid #63

whd · 2019-08-16T21:44:54Z

* google_bigquery_table.telemetry_first_shutdown_v4_live: googleapi: Error 400: Provided Schema does not match Table moz-fx-data-shar-nonprod-efed:telemetry_live.first_shutdown_v4. Cannot add required fields to an existing schema. (field: application), invalid
* google_bigquery_table.telemetry_first_shutdown_v4_stable: googleapi: Error 400: Provided Schema does not match Table moz-fx-data-shar-nonprod-efed:telemetry_stable.first_shutdown_v4. Cannot add required fields to an existing schema. (field: application), invalid

This is in the process of going from an "empty" schema to one copied via #58. I haven't seen this error before and I'm not sure what's up and how to resolve it. Generally if the upgrade path from "schemaless" to "schemad" pings involves an incompatible schema change that would be no bueno, but it may be specific to this case.

The text was updated successfully, but these errors were encountered:

acmiyaguchi · 2019-08-16T21:52:49Z

This is related to the following documentation: https://cloud.google.com/bigquery/docs/managing-table-schemas#manually_adding_an_empty_column

Apparently, it's not possible to add a required column to an existing table. This actually makes sense -- fields that are currently not in the schema can't possible be added as a required field down the line because the existing pings would fail to include those required fields.

This makes migrating the first_shutdown ping from the empty schema to the full main ping schema tricky, it probably needs to be blown away unfortunately. We don't have a good story to deal with this error.

whd · 2019-08-16T21:57:51Z

This sounds awful (but does make some sense), because it means the upgrade path for going from schema-less processing to schema'd processing will usually involve an incompatible schema change. We should mention this in the Monday technical sync but perhaps the number of cases where we actually end up creating schemas for previously schema-less pings will be rather small.

The other outstanding one I believe is the sync ping.

jklukas · 2019-08-19T18:43:03Z

I filed Filed https://enterprise.google.com/supportcenter/managecases#Case/001000000040sBR/U-20348571 to inquire about any performance implications of relaxing all fields to NULLABLE.

It is worth noting that we already do not support REQUIRED columns for tables populated via queries in bigquery-etl. AFAICT, when a new table is created as a result of a query job with createDisposition CREATE_IF_NEEDED, all fields are nullable. I am not aware of any way to mark fields in a query as required.

Thus, if we decide to drop REQUIRED mode for all tables populated by the pipeline, we will be no worse off than our derived tables. Also, I believe we would be able to proceed with that change without removing any existing tables, since BQ allows relaxing of mode from REQUIRED to NULLABLE for existing fields.

whd · 2019-08-19T18:52:10Z

I filed mozilla/jsonschema-transpiler#84 for this feature.

Also, I believe we would be able to proceed with that change without removing any existing tables, since BQ allows relaxing of mode from REQUIRED to NULLABLE for existing fields.

This is true, but the implementation of mozilla/jsonschema-transpiler#84 does then block the deployment of BQ tables in the interim.

jklukas · 2019-08-20T20:04:27Z

Got a response back from the google support case, which makes it sound unlikely that we'll suffer any performance issues from making everything nullable. I'm inclined to move ahead with the change.

The Engineering team has advised me that the difference in query performance between 'REQUIRED' and 'NULLABLE' is expected to be very small.

In intensive filtering queries, the REQUIRED MODE would be slightly more optimal than NULLABLE since it contains one less state--NULL. In conclusion, it would really depend on how much weight your query is placing on the NULL state.

FYI, BigQuery permits you to relax the schema field MODE from 'REQUIRED' to 'NULLABLE' and not vice versa. Therefore initializing your schema with 'REQUIRED' would be a safe choice for your upcoming roll-out because its modifiable anytime to NULLABLE (and not vice versa.)

Lastly, you can run a benchmarking test by running two queries and then comparing the query performance information as outlined in [1].

In general, it is recommended to follow the documented best practices [2] in order to achieve optimal performance. As long as you are not using any of the documented anti-patterns, you should not have to worry about BigQuery’s query planning performance.

I hope the information I have provided you with has fulfilled your inquiries. I will be holding this case open for you for the next few days to give you time to investigate the above. If you have any additional questions or comments concerning the inquiry I will be happy to assist you further.

I hope this message finds you well, and that you have a wonderful day.

[1] https://cloud.google.com/bigquery/query-plan-explanation
[2] https://cloud.google.com/bigquery/docs/best-practices-performance-patterns

whd · 2019-08-20T23:03:01Z

We've deployed this to stage and the first shutdown ping looks plenty complicated now, so this looks to have been effective.

whd mentioned this issue Aug 19, 2019

Add option to make all fields nullable for BQ schemas mozilla/jsonschema-transpiler#84

Closed

acmiyaguchi mentioned this issue Aug 20, 2019

Bump jsonschema-transpiler to v1.4.0 and force nullable schemas #66

Merged

whd closed this as completed Aug 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BQ update error: Cannot add required fields to an existing schema. (field: application), invalid #63

BQ update error: Cannot add required fields to an existing schema. (field: application), invalid #63

whd commented Aug 16, 2019

acmiyaguchi commented Aug 16, 2019

whd commented Aug 16, 2019

jklukas commented Aug 19, 2019

whd commented Aug 19, 2019

jklukas commented Aug 20, 2019

whd commented Aug 20, 2019

BQ update error: Cannot add required fields to an existing schema. (field: application), invalid #63

BQ update error: Cannot add required fields to an existing schema. (field: application), invalid #63

Comments

whd commented Aug 16, 2019

acmiyaguchi commented Aug 16, 2019

whd commented Aug 16, 2019

jklukas commented Aug 19, 2019

whd commented Aug 19, 2019

jklukas commented Aug 20, 2019

whd commented Aug 20, 2019