
BigQuery table schema update fails/tries to replace the table if a column gets removed #12441

Closed
nbali opened this issue Aug 31, 2022 · 11 comments

nbali commented Aug 31, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

$ terraform -v
Terraform v1.2.8
on windows_amd64
+ provider registry.terraform.io/hashicorp/google v4.34.0

Affected Resource(s)

  • google_bigquery_table

Terraform Configuration Files

original config

resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"
  schema = <<-EOF
  [
    {
      "name": "field1",
      "type": "STRING"
    },
    {
      "name": "field2",
      "type": "STRING"
    },
    {
      "name": "field3",
      "type": "STRING"
    }
  ]
  EOF
}

modified config

resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"
  schema = <<-EOF
  [
    {
      "name": "field1",
      "type": "STRING"
    },
    {
      "name": "field3",
      "type": "STRING"
    }
  ]
  EOF
}

Debug Output

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # *.google_bigquery_table.* must be replaced
-/+ resource "google_bigquery_table" "*" {
      ~ creation_time       = * -> (known after apply)
      ~ etag                = "*" -> (known after apply)
      ~ expiration_time     = 0 -> (known after apply)
      ~ id                  = "*" -> (known after apply)
      - labels              = {} -> null
      ~ last_modified_time  = * -> (known after apply)
      ~ location            = "EU" -> (known after apply)
      ~ num_bytes           = 0 -> (known after apply)
      ~ num_long_term_bytes = 0 -> (known after apply)
      ~ num_rows            = 0 -> (known after apply)
      ~ project             = "*" -> (known after apply)
      ~ schema              = jsonencode(
          ~ [ # forces replacement
                {
                    name = "field1"
                    type = "STRING"
                },
              - {
                  - name = "field2"
                  - type = "STRING"
                },
                {
                    name = "field3"
                    type = "STRING"
                },
            ]
        )
      ~ self_link           = "*" -> (known after apply)
      ~ type                = "TABLE" -> (known after apply)
        # (3 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.
Error: googleapi: Error 409: Already Exists: Table *:*.*, duplicate

  with module.*.google_bigquery_table.*,
  on ../../modules/*/bigquery.tf line 157, in resource "google_bigquery_table" "*":
 157: resource "google_bigquery_table" "*" {

Panic Output

Expected Behavior

There should be a configuration option (for example, ignore_unknown_columns) to control what happens to columns that have been removed from the Terraform config:

  • keep them untouched and ignore them from now on, OR
  • keep the current behaviour

At the 8th step in "Steps to Reproduce", Terraform should either fail/try to replace the table (if the new option is not set, just like now), or it should apply the changes successfully while still keeping the columns in BQ that are no longer present in the TF schema (if the new option is enabled).
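The proposed option might look like the sketch below. Note that ignore_unknown_columns is a hypothetical attribute name suggested in this issue; it is not part of the google provider:

```hcl
resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"

  # Hypothetical flag (not implemented in the provider): keep columns that
  # exist in BigQuery but are missing from this schema, instead of forcing
  # a destructive table replacement.
  ignore_unknown_columns = true

  schema = <<-EOF
  [
    { "name": "field1", "type": "STRING" },
    { "name": "field3", "type": "STRING" }
  ]
  EOF
}
```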

Actual Behavior

At the 8th step in "Steps to Reproduce", Terraform fails. See "Debug Output".

Steps to Reproduce

  1. Have an auto-generated BigQuery schema (for example from proto, MySQL CDC, whatever).
  2. Create a BigQuery table config in Terraform with that schema.
  3. terraform apply (the table gets created).
  4. Use that table and fill all the columns with data.
  5. Get a warning from developers etc. that scripts require all those fields, so they must never be removed.
  6. The source of the auto-generated schema removes a field.
  7. The auto-generated schema gets updated with one less column.
  8. terraform apply

IMO this is quite a common pattern: people mirror/ingest/capture something from another system into BQ, and just because the source of the data no longer has/fills a certain column, the users / scripts / etc. might still need that field even if it only contains values for older entries.

Important Factoids

I want to emphasize that I was torn between filing this as a bug or an enhancement. I chose bug because it prevents such a common use-case and tries to solve it in a catastrophically wrong way (table replacement).

References

@nbali nbali added the bug label Aug 31, 2022
@edwardmedia edwardmedia self-assigned this Aug 31, 2022
edwardmedia (Contributor) commented


nbali commented Aug 31, 2022

@edwardmedia I did, and I get why it is like that, BUT I don't see how that is relevant in the long term. IMO it's not a limitation that can't be worked around; it's just a limitation that hasn't been handled in a more thorough manner YET. Both values are JSON. They can be diffed, merged, etc. This option would require the current and the expected schema to be merged together (if a column is missing from the expected schema but present in the current one, add it to the expected schema) and the result applied to the table. It will still fail if the update is not doable (like changing a type from record to integer), but it wouldn't fail due to the "deprecated" columns.
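A minimal sketch of the merge nbali describes, assuming schemas are modeled as lists of {"name", "type"} dicts (a simplification of the real BigQuery schema JSON):

```python
def merge_schemas(current, expected):
    """Merge the table's current schema into the expected (config) schema:
    any column present in `current` but missing from `expected` is appended,
    so removed columns are kept instead of forcing table replacement."""
    expected_names = {field["name"] for field in expected}
    merged = list(expected)
    for field in current:
        if field["name"] not in expected_names:
            merged.append(field)  # keep the "deprecated" column
    return merged

# The schemas from this issue's repro:
current = [
    {"name": "field1", "type": "STRING"},
    {"name": "field2", "type": "STRING"},
    {"name": "field3", "type": "STRING"},
]
expected = [
    {"name": "field1", "type": "STRING"},
    {"name": "field3", "type": "STRING"},
]

# field2 is missing from the config but survives in the merged schema
print(merge_schemas(current, expected))
```

Real schema fields carry more attributes (mode, description, nested fields), but the merge logic would be the same.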

edwardmedia (Contributor) commented

@nbali are you able to create a config with plain resources (not a module) that reproduces the issue?


nbali commented Sep 5, 2022

@edwardmedia I have updated the issue with original + modified config, and the received output.


edwardmedia commented Sep 17, 2022

@nbali somehow I can't repro the issue. Are you able to go from the original config to the modified config directly without hitting the error below?
After adding deletion_protection = false to your code, it seems to be fine.

 Error: cannot destroy instance without setting deletion_protection=false and running `terraform apply`

Did I miss anything in the repro?
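For reference, the workaround described above looks like this; note that it deliberately allows Terraform to destroy and recreate the table, which is exactly what the issue author wants to avoid:

```hcl
resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"

  # Permits the destroy/recreate plan to proceed; all data in the
  # replaced table is lost.
  deletion_protection = false

  schema = <<-EOF
  [
    { "name": "field1", "type": "STRING" },
    { "name": "field3", "type": "STRING" }
  ]
  EOF
}
```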


nbali commented Sep 17, 2022

Well, I think you missed probably the most important business aspect of this issue: the table should never be dropped, and deletion protection should never be turned off. Doesn't that just drop the table and create a new one? That is an even worse scenario than being unable to update the schema.

edwardmedia (Contributor) commented

@nbali I do see that tables.patch is available now. But at the time this resource was implemented, only tables.update was available. That is why you see # forces replacement in the plan; this is by design. To request a change to use tables.patch, you may file an enhancement. Does that make sense?

The table should never be dropped, and that deletion protection should never be turned off.


nbali commented Oct 10, 2022

@edwardmedia
Would using tables.patch instead of tables.update change the "replace" behaviour? I don't see how that follows from the API at all, so why would it change anything?

"this is by design"
What part of your statement is that referring to? That it recreates the table? I highly doubt anyone would expect the table to be recreated by a simple schema modification, so what is the design rationale behind that?


edwardmedia commented Oct 11, 2022

@nbali I am quoting the content from the link I provided here

tables.patch:
Updates information in an existing table. The update method replaces the entire table resource, whereas the patch method only replaces fields that are provided in the submitted table resource.

Again, tables.patch wasn't available at the time, and recreation was by design. Do you still have questions?
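The quoted update-vs-patch semantics can be illustrated with a plain dict model. This is only an illustration of the REST semantics quoted above, not a client-library call:

```python
def update(resource, submitted):
    # tables.update semantics: the submitted table replaces the entire
    # resource, so fields omitted from `submitted` are dropped.
    return dict(submitted)

def patch(resource, submitted):
    # tables.patch semantics: only the fields provided in `submitted` are
    # replaced; everything else on the existing resource is preserved.
    merged = dict(resource)
    merged.update(submitted)
    return merged

table = {"tableId": "t", "description": "cdc mirror", "schema": ["field1", "field2"]}
change = {"schema": ["field1", "field3"]}

print(update(table, change))  # description is lost
print(patch(table, change))   # description survives, only schema changes
```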

edwardmedia (Contributor) commented

To request the change to use tables.patch, please file an enhancement. This is not a bug, so I am closing this issue accordingly.

github-actions (bot) commented

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2022