
BigQuery table schema update fails/tries to replace the table if a column gets removed #12441

Closed
nbali opened this issue Aug 31, 2022 · 11 comments

nbali commented Aug 31, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

$ terraform -v
Terraform v1.2.8
on windows_amd64
+ provider registry.terraform.io/hashicorp/google v4.34.0

Affected Resource(s)

  • google_bigquery_table

Terraform Configuration Files

original config

resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"
  schema = <<-EOF
  [
    {
      "name": "field1",
      "type": "STRING"
    },
    {
      "name": "field2",
      "type": "STRING"
    },
    {
      "name": "field3",
      "type": "STRING"
    }
  ]
  EOF
}

modified config

resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"
  schema = <<-EOF
  [
    {
      "name": "field1",
      "type": "STRING"
    },
    {
      "name": "field3",
      "type": "STRING"
    }
  ]
  EOF
}

Debug Output

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # *.google_bigquery_table.* must be replaced
-/+ resource "google_bigquery_table" "*" {
      ~ creation_time       = * -> (known after apply)
      ~ etag                = "*" -> (known after apply)
      ~ expiration_time     = 0 -> (known after apply)
      ~ id                  = "*" -> (known after apply)
      - labels              = {} -> null
      ~ last_modified_time  = * -> (known after apply)
      ~ location            = "EU" -> (known after apply)
      ~ num_bytes           = 0 -> (known after apply)
      ~ num_long_term_bytes = 0 -> (known after apply)
      ~ num_rows            = 0 -> (known after apply)
      ~ project             = "*" -> (known after apply)
      ~ schema              = jsonencode(
          ~ [ # forces replacement
                {
                    name = "field1"
                    type = "STRING"
                },
              - {
                  - name = "field2"
                  - type = "STRING"
                },
                {
                    name = "field3"
                    type = "STRING"
                },
            ]
        )
      ~ self_link           = "*" -> (known after apply)
      ~ type                = "TABLE" -> (known after apply)
        # (3 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.
Error: googleapi: Error 409: Already Exists: Table *:*.*, duplicate

  with module.*.google_bigquery_table.*,
  on ../../modules/*/bigquery.tf line 157, in resource "google_bigquery_table" "*":
 157: resource "google_bigquery_table" "*" {

Panic Output

Expected Behavior

There should be a configuration option (for example, ignore_unknown_columns) to control what happens to columns that have been removed from the Terraform config:

  • keep them untouched and ignore them from now on, OR
  • keep the current behaviour

At the 8th step in "Steps to Reproduce", Terraform should either fail/try to replace the table (if the new option is not set, just like now), or it should apply the changes successfully while still keeping the columns in BQ that are no longer present in the TF schema (if the new option is enabled).
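The proposed option might look like the sketch below. Note that ignore_unknown_columns is a hypothetical attribute name suggested in this issue; it is not part of the google provider:

```hcl
resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"

  # Hypothetical flag (not implemented in the provider): keep columns that
  # exist in BigQuery but are missing from this schema, instead of forcing
  # a destructive table replacement.
  ignore_unknown_columns = true

  schema = <<-EOF
  [
    { "name": "field1", "type": "STRING" },
    { "name": "field3", "type": "STRING" }
  ]
  EOF
}
```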

Actual Behavior

At the 8th step in "Steps to Reproduce", Terraform fails. See "Debug Output".

Steps to Reproduce

  1. Have an auto-generated BigQuery schema (for example from proto, MySQL CDC, whatever).
  2. Create a BigQuery table config in Terraform with that schema.
  3. terraform apply (the table gets created).
  4. Use that table and fill all the columns with data.
  5. Get a warning from developers etc. that scripts require all those fields, so they must never be removed.
  6. The source of the auto-generated schema removes a field.
  7. The auto-generated schema gets updated with one less column.
  8. terraform apply

IMO this is quite a common pattern: people mirror/ingest/capture something from another system into BQ, and just because the source of the data no longer has/fills a certain column, the users / scripts / etc. might still need that field even if it only contains values for older entries.

Important Factoids

I want to emphasize that I was torn between filing this as a bug or an enhancement. I chose bug because it prevents such a common use-case and tries to solve it in a catastrophically wrong way (table replacement).

References

@nbali nbali added the bug label Aug 31, 2022
@edwardmedia edwardmedia self-assigned this Aug 31, 2022
edwardmedia (Contributor) commented


nbali commented Aug 31, 2022

@edwardmedia I did, and I get why it is like that, BUT I don't see how that is relevant in the long term. IMO it's not a limitation that can't be worked around; it's just a limitation that hasn't been handled in a more thorough manner YET. Both values are JSON. They can be diffed, merged, etc. This option would require the current and the expected schema to be merged together (if a column is missing from the expected schema but present in the current one, add it to the expected schema) and the result applied to the table. It will still fail if the update is not doable (like changing a type from record to integer), but it wouldn't fail due to the "deprecated" columns.
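A minimal sketch of the merge nbali describes, assuming schemas are modeled as lists of {"name", "type"} dicts (a simplification of the real BigQuery schema JSON):

```python
def merge_schemas(current, expected):
    """Merge the table's current schema into the expected (config) schema:
    any column present in `current` but missing from `expected` is appended,
    so removed columns are kept instead of forcing table replacement."""
    expected_names = {field["name"] for field in expected}
    merged = list(expected)
    for field in current:
        if field["name"] not in expected_names:
            merged.append(field)  # keep the "deprecated" column
    return merged

# The schemas from this issue's repro:
current = [
    {"name": "field1", "type": "STRING"},
    {"name": "field2", "type": "STRING"},
    {"name": "field3", "type": "STRING"},
]
expected = [
    {"name": "field1", "type": "STRING"},
    {"name": "field3", "type": "STRING"},
]

# field2 is missing from the config but survives in the merged schema
print(merge_schemas(current, expected))
```

Real schema fields carry more attributes (mode, description, nested fields), but the merge logic would be the same.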

edwardmedia (Contributor) commented

@nbali are you able to create a config with plain resources (not a module) that reproduces the issue?


nbali commented Sep 5, 2022

@edwardmedia I have updated the issue with original + modified config, and the received output.


edwardmedia commented Sep 17, 2022

@nbali somehow I can't repro the issue. Are you able to go from the original config to the modified config directly without hitting the error below?
After adding deletion_protection = false to your code, it seems to be fine.

 Error: cannot destroy instance without setting deletion_protection=false and running `terraform apply`

Did I miss anything in the repro?
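For reference, the workaround described above looks like this; note that it deliberately allows Terraform to destroy and recreate the table, which is exactly what the issue author wants to avoid:

```hcl
resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"

  # Permits the destroy/recreate plan to proceed; all data in the
  # replaced table is lost.
  deletion_protection = false

  schema = <<-EOF
  [
    { "name": "field1", "type": "STRING" },
    { "name": "field3", "type": "STRING" }
  ]
  EOF
}
```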


nbali commented Sep 17, 2022

Well, I think you missed probably the most important business aspect of this issue: the table should never be dropped, and deletion protection should never be turned off. Doesn't that just drop the table and create a new one? That is an even worse scenario than being unable to update the schema.

edwardmedia (Contributor) commented

@nbali I do see that tables.patch is available now. But at the time this resource was implemented, only tables.update was available. That is why you see # forces replacement in the plan; this is by design. To request a change to use tables.patch, you may file an enhancement. Does that make sense?

The table should never be dropped, and that deletion protection should never be turned off.


nbali commented Oct 10, 2022

@edwardmedia
Would using tables.patch instead of tables.update change the "replace" behaviour? I don't see how that follows from the API at all, so why would it change anything?

"this is by design"
What part of your statement is that referring to? That it recreates the table? I highly doubt anyone would expect the table to be recreated by a simple schema modification, so what is the design rationale behind that?


edwardmedia commented Oct 11, 2022

@nbali I am quoting the content from the link I provided here

tables.patch:
Updates information in an existing table. The update method replaces the entire table resource, whereas the patch method only replaces fields that are provided in the submitted table resource.

Again, tables.patch wasn't available at the time, and recreation was by design. Do you still have questions?
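The quoted update-vs-patch semantics can be illustrated with a plain dict model. This is only an illustration of the REST semantics quoted above, not a client-library call:

```python
def update(resource, submitted):
    # tables.update semantics: the submitted table replaces the entire
    # resource, so fields omitted from `submitted` are dropped.
    return dict(submitted)

def patch(resource, submitted):
    # tables.patch semantics: only the fields provided in `submitted` are
    # replaced; everything else on the existing resource is preserved.
    merged = dict(resource)
    merged.update(submitted)
    return merged

table = {"tableId": "t", "description": "cdc mirror", "schema": ["field1", "field2"]}
change = {"schema": ["field1", "field3"]}

print(update(table, change))  # description is lost
print(patch(table, change))   # description survives, only schema changes
```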

edwardmedia (Contributor) commented

To request the change to use tables.patch, please file an enhancement. This is not a bug, so I am closing this issue accordingly.

github-actions (bot) commented

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2022