error updating glue crawler #22868

Open
moreandres opened this issue Feb 1, 2022 · 7 comments
Labels
service/glue Issues and PRs that pertain to the glue service.

Comments

@moreandres

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform -v
Terraform v1.1.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.71.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

Affected Resource(s)

  • aws_glue_crawler

Terraform Configuration Files

provider "aws" {
  region = "us-east-1"
}

resource "aws_glue_catalog_database" "example" {
  name = "example"
}

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = "arn:aws:iam::ACCOUNT:role/ROLE"

  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  s3_target {
    path = "s3://path"
  }
}

Debug Output

https://gist.github.com/moreandres/d72b773e03a198d541653152ff6b21c5

Panic Output

Not applicable.

Expected Behavior

The resource should have been recreated as it cannot be updated when CRAWL_NEW_FOLDERS_ONLY recrawl behavior policy is enabled.

Actual Behavior

The following error is shown while trying to modify the resource instead of recreating it.
As a workaround, the resource can be manually recreated (or tainted to be done automatically).

Error: error updating Glue crawler: InvalidInputException: Amazon S3 target is immutable when "Crawl new folders only" recrawl behavior is selected. Deselect "Crawl new folders only" to change the Amazon S3 target.

Steps to Reproduce

  1. terraform apply
  2. Rename S3 target path
  3. terraform apply

Important Factoids

None

References

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/glue_crawler#recrawl-policy
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-glue-crawler.html

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/glue Issues and PRs that pertain to the glue service. labels Feb 1, 2022
@justinretzolk justinretzolk removed the needs-triage Waiting for first response or review from a maintainer. label Feb 1, 2022
@herrsergio

I see similar behavior using:
Terraform 1.0.4
AWS Provider 3.75.1

 ~ recrawl_policy {
          ~ recrawl_behavior = "CRAWL_EVERYTHING" -> "CRAWL_NEW_FOLDERS_ONLY"
        }
Error: error updating Glue crawler: InvalidInputException: The SchemaChangePolicy for "Crawl new folders only" Amazon S3 target can have only LOG DeleteBehavior value and LOG UpdateBehavior value.

@ishukhrat2010

ishukhrat2010 commented May 4, 2022

I have the same issue with an incremental crawler even when I don't change the S3 target; I am only re-applying my module with changes unrelated to the crawler or the S3 target.

Terraform has been successfully initialized!
aws_glue_crawler.crawler: Refreshing state... [id=xxxx]
aws_glue_crawler.crawler: Modifying... [id=xxxx]
Error: error updating Glue crawler: InvalidInputException: Amazon S3 target is immutable when "Crawl new folders only" recrawl behavior is selected. Deselect "Crawl new folders only" to change the Amazon S3 target.

@mm808

mm808 commented Jun 23, 2022

When attempting to edit the IAM policy of my deployed crawler to add CloudWatch logs I received the error:
The SchemaChangePolicy for "Crawl new folders only" Amazon S3 target can have only LOG DeleteBehavior value and LOG UpdateBehavior value.

@MahsaMaslahati

The problem is that when you set recrawl_behavior to "CRAWL_NEW_FOLDERS_ONLY", schema_change_policy.delete_behavior and schema_change_policy.update_behavior can only be set to "LOG":

schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

However, their default values are:

schema_change_policy {
    delete_behavior = "DEPRECATE_IN_DATABASE"
    update_behavior = "UPDATE_IN_DATABASE"
  }

So you need to set them to their correct values when setting recrawl_behavior to "CRAWL_NEW_FOLDERS_ONLY"
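
For reference, the two blocks can be combined with the configuration from the original report. This is a minimal sketch, assuming the placeholder role ARN and S3 path from that report; it addresses only the SchemaChangePolicy error, not the "Amazon S3 target is immutable" error.

```hcl
# Sketch only: the role ARN and S3 path are placeholders taken from the original report.
resource "aws_glue_catalog_database" "example" {
  name = "example"
}

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = "arn:aws:iam::ACCOUNT:role/ROLE"

  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  # With CRAWL_NEW_FOLDERS_ONLY, both behaviors must be set explicitly to LOG;
  # the defaults (DEPRECATE_IN_DATABASE / UPDATE_IN_DATABASE) are rejected.
  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

  s3_target {
    path = "s3://path"
  }
}
```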

@iwb-vhuysmans

iwb-vhuysmans commented Jan 4, 2023

I have the same issue when trying to change the S3 target:

 s3_target {
          ~ path        = "s3://s3bucketexample/subfolderA/subfolderB" -> "s3://s3bucketexample/subfolderA"
            # (2 unchanged attributes hidden)
        }

This gives me:

Error: error updating Glue crawler: InvalidInputException: Amazon S3 target is immutable when "Crawl new folders only" recrawl behavior is selected. Deselect "Crawl new folders only" to change the Amazon S3 target.

I have both delete_behavior and update_behavior set to "LOG".

schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
 }

Re-creating the resource instead of updating it would solve the issue for my use case. As a temporary solution, I use the -replace option to recreate the crawler resource I want to update.

terraform apply -replace="aws_glue_crawler.example_crawler"

Note that this is only temporary, as it requires a manual action I would like to avoid in the future.
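
One possible way to avoid the manual step is to let Terraform force the replacement whenever the path changes. The following is a sketch, not a confirmed fix from the provider maintainers: it assumes Terraform 1.4 or later (for `terraform_data` and `replace_triggered_by`), which is newer than the versions reported in this thread, and the variable name `crawler_s3_path`, the crawler name, and the role ARN are hypothetical placeholders.

```hcl
# Hypothetical variable holding the S3 target path.
variable "crawler_s3_path" {
  type    = string
  default = "s3://s3bucketexample/subfolderA"
}

# Tracks the path; this resource changes whenever the input value changes.
resource "terraform_data" "crawler_s3_path" {
  input = var.crawler_s3_path
}

resource "aws_glue_crawler" "example_crawler" {
  database_name = "example"                        # placeholder
  name          = "example_crawler"                # placeholder
  role          = "arn:aws:iam::ACCOUNT:role/ROLE" # placeholder

  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

  s3_target {
    path = var.crawler_s3_path
  }

  lifecycle {
    # Replace the crawler instead of updating it in place when the path changes.
    replace_triggered_by = [terraform_data.crawler_s3_path]
  }
}
```

With this in place, changing `crawler_s3_path` should plan a destroy and re-create of the crawler rather than an in-place update, which is the behavior requested in this issue.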

@mfpereira

Hi guys,

The s3_target name looks wrong to me. It corresponds to the S3 source in the AWS crawler definition. Likewise, the other *_target names look wrong; they should be *_source.

@csaxton

csaxton commented Sep 21, 2023

Hi there,
Supporting only LOG for the schema change policy when using incremental recrawls appears to be a documented constraint.
It is mentioned in the AWS documentation page 'Incremental crawls in AWS Glue', under the 'Notes and Restrictions for Incremental Crawls' section.


9 participants