-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BQ update error: Cannot add required fields to an existing schema. (field: application), invalid #63
Comments
This is related to the following documentation: https://cloud.google.com/bigquery/docs/managing-table-schemas#manually_adding_an_empty_column Apparently, it's not possible to add a required column to an existing table. This actually makes sense -- fields that are currently not in the schema can't possible be added as a required field down the line because the existing pings would fail to include those required fields. This makes migrating the first_shutdown ping from the empty schema to the full main ping schema tricky, it probably needs to be blown away unfortunately. We don't have a good story to deal with this error. |
This sounds awful (but does make some sense), because it means the upgrade path for going from schema-less processing to schema'd processing will usually involve an incompatible schema change. We should mention this in the Monday technical sync but perhaps the number of cases where we actually end up creating schemas for previously schema-less pings will be rather small. The other outstanding one I believe is the sync ping. |
I filed Filed https://enterprise.google.com/supportcenter/managecases#Case/001000000040sBR/U-20348571 to inquire about any performance implications of relaxing all fields to NULLABLE. It is worth noting that we already do not support REQUIRED columns for tables populated via queries in bigquery-etl. AFAICT, when a new table is created as a result of a query job with createDisposition CREATE_IF_NEEDED, all fields are nullable. I am not aware of any way to mark fields in a query as required. Thus, if we decide to drop REQUIRED mode for all tables populated by the pipeline, we will be no worse off than our derived tables. Also, I believe we would be able to proceed with that change without removing any existing tables, since BQ allows relaxing of mode from REQUIRED to NULLABLE for existing fields. |
I filed mozilla/jsonschema-transpiler#84 for this feature.
This is true, but the implementation of mozilla/jsonschema-transpiler#84 does then block the deployment of BQ tables in the interim. |
Got a response back from the google support case, which makes it sound unlikely that we'll suffer any performance issues from making everything nullable. I'm inclined to move ahead with the change.
[1] https://cloud.google.com/bigquery/query-plan-explanation |
We've deployed this to stage and the first shutdown ping looks plenty complicated now, so this looks to have been effective. |
This is in the process of going from an "empty" schema to one copied via #58. I haven't seen this error before and I'm not sure what's up and how to resolve it. Generally if the upgrade path from "schemaless" to "schemad" pings involves an incompatible schema change that would be no bueno, but it may be specific to this case.
The text was updated successfully, but these errors were encountered: