Don't list bootstrap servers as a diff when applying #19659

DanaLacoste · 2021-06-03T22:57:08Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

NOTE: This is more than a cosmetic issue, but less than a "it breaks things" issue. Terraform is doing the right thing on apply, but it is outputting something extraneous which has caused confusion while trying to investigate an unrelated change.

When applying on an MSK cluster which has more nodes than availability zones, the diff will show that Objects have changed outside of Terraform and report that bootstrap_brokers_sasl_scram (or _iam, or....) has changed, with a list of nodes. This is caused by the AWS SDK call returning a random list of nodes, one per AZ (so if you have one node per AZ, you will get the same list every time, but if you have multiple nodes in any AZ, then you cannot predict which nodes will return)

This request is to silence that output (somehow?). I have tried adding ignore_changes = [bootstrap_brokers_sasl_scram] to no effect: the change is already (correctly) being ignored as far as "should terraform re-apply this resource to fix the change?" after all, it is only identifying the property of the resource (as returned by the AWS API call) as changed.

NOTE 1: This is related to #17579 but is a different issue: this is not about order of the results, but content of the (ordered) results.

NOTE 2: This does not occur if you have number of nodes <= number of availability zones (i.e. it only happens if you set up a bigger cluster to handle a larger load.)

Current Output

# terraform init

Terraform v0.15.4 is already installed
Initializing modules...
- msk in msk

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)
- Installing hashicorp/dns v3.1.0...
- Installed hashicorp/dns v3.1.0 (signed by HashiCorp)
- Installing hashicorp/aws v3.42.0...
- Installed hashicorp/aws v3.42.0 (signed by HashiCorp)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

# terraform apply

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

<Refreshes deleted for clarity>

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # aws_msk_cluster.cluster has been changed
  ~ resource "aws_msk_cluster" "cluster" {
      ~ bootstrap_brokers_sasl_scram = "b-1.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-5.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-6.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096" -> "b-1.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-2.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-5.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096"
        id                           = "arn:aws:kafka:us-east-1:1234567890:cluster/cluster/<guid>"
        # (8 unchanged attributes hidden)

        # (6 unchanged blocks hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.

──────

No changes. Your infrastructure matches the configuration.

Your configuration already matches the changes detected above. If you'd like to update the Terraform state to match, create and apply a refresh-only plan:
  terraform apply -refresh-only

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Desired Output (reduced for clarity)

# terraform apply

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

<Refreshes deleted for clarity>

──────

No changes. Your infrastructure matches the configuration.

Your configuration already matches the changes detected above. If you'd like to update the Terraform state to match, create and apply a refresh-only plan:
  terraform apply -refresh-only

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

New or Affected Resource(s)

aws_msk_cluster

Potential Terraform Configuration

This is a portion of our config (the relevant part).
I was hoping the lifecycle rule would fix the issue, but it did not.

resource "aws_msk_cluster" "cluster" {
  cluster_name           = "test-cluster"
  kafka_version          = var.kafka_version
  number_of_broker_nodes = var.broker_count

  broker_node_group_info {
    instance_type   = var.instance_type
    ebs_volume_size = var.ebs_volume_size
    client_subnets  = var.subnet_ids
    security_groups = [aws_security_group.sg.id]
  }

  configuration_info {
    arn      = aws_msk_configuration.cluster_config.arn
    revision = aws_msk_configuration.cluster_config.latest_revision
  }

  encryption_info {
    encryption_in_transit {
      client_broker = "TLS"
      in_cluster    = true
    }
    encryption_at_rest_kms_key_arn = aws_kms_key.msk.arn
  }

  client_authentication {
    sasl {
      scram = true
    }
  }

  logging_info {
    broker_logs {
      s3 {
        enabled = true
        bucket  = aws_s3_bucket.log_bucket.id
      }
    }
  }

  open_monitoring {
    prometheus {
      jmx_exporter {
        enabled_in_broker = true
      }
      node_exporter {
        enabled_in_broker = true
      }
    }
  }

  lifecycle {
    # This is because https://docs.aws.amazon.com/msk/latest/developerguide/msk-get-bootstrap-brokers.html
    # IFF the number of nodes is greater than the number of AZs, then this value will be a random selection
    # of the nodes, one per AZ (so if you have 6 nodes and 3 AZs, you will get only three responses)
    ignore_changes  = [bootstrap_brokers_sasl_scram]
    prevent_destroy = true
  }
}

References

None (yet!)

The text was updated successfully, but these errors were encountered:

DanaLacoste · 2021-06-03T23:02:42Z

Added as a feature request because I totally get what's happening here: Terraform is storing the value in the state file (can be in outputs, too, after all, if you add that) and AWS returns a different value every time.

So what I am looking for is an ehancement to, well, silence that output, if possible.

james-bjss · 2021-08-16T09:03:35Z

Ran across the same issue last week. It seems AWS returns 3 brokers by random (from each AZ) when calling get-bootstrap-brokers. This is problematic when the cluster has multiple nodes/brokers in each AZs as the values change between plans/applies. In our case we have a 6 broker cluster across three AZ's so 2 brokers in each AZ.

In addition to ignoring the changes could we consider providing an attribute for the broker hostnames so we can list all node endpoints?

It looks like the CLI call that list-nodes makes achieves this. The underlying Rest endpoint is:
https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-nodes.html

In our case we want to fetch all the broker host names so we can setup Consul entries for each.

Additional Info:

# Initial check of Brokers

aws kafka get-bootstrap-brokers --cluster-arn <redacted>
{
    "BootstrapBrokerString": "b-6.<redacted>.kafka.eu-west-1.amazonaws.com:9092,b-2.<redacted>.eu-west-1.amazonaws.com:9092,b-4<redacted>eu-west-1.amazonaws.com:9092",
    "BootstrapBrokerStringTls": "b-6<redacted>.eu-west-1.amazonaws.com:9094,b-2.<redacted>.eu-west-1.amazonaws.com:9094,b-4.<redacted>.eu-west-1.amazonaws.com:9094"
}

# Second check (30 seconds later):

❯ aws kafka get-bootstrap-brokers --cluster-arn <redacted>
    "BootstrapBrokerString": "b-1.<redacted>.eu-west-1.amazonaws.com:9092,b-5.<redacted>.eu-west-1.amazonaws.com:9092,b-3.<redacted>.eu-west-1.amazonaws.com:9092",
    "BootstrapBrokerStringTls": "b-1.<redacted>.eu-west-1.amazonaws.com:9094,b-5.<redacted>.eu-west-1.amazonaws.com:9094,b-3.<redacted>.eu-west-1.amazonaws.com:9094"
}

It appears this is (partially) documented behavior. Quoting AWS Docs:
A list of brokers that a client can use to bootstrap. This list doesn't necessarily include all of the brokers in the cluster.
See: https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-bootstrap-brokers.html

ryan-dyer-sp · 2022-07-29T12:52:25Z

I cant speak to the need for the diff to not show up in an apply. However, this undeterministic nature does cause problems when this parameter(boostrap_broker_*) is being passed to another resource; eg. aws_ssm_parameter. Applies can often fail on our >3 node clusters with the following

╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for aws_ssm_parameter.bootstrap_servers_param[0] to
│ include new values learned so far during apply, provider
│ "registry.terraform.io/hashicorp/aws" produced an invalid new value for
│ .value: inconsistent values for sensitive attribute.
│
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.
╵
Releasing state lock. This may take a few moments...
ERRO[0017] 1 error occurred:
        * exit status 1

We have to hope that the api calls made to retrieve this value during the course of the plan and apply process return the same result.

lplazas · 2022-09-12T20:56:15Z

Is there a known way to fix this situation? This seems to be more of an API issue, can it be flagged somewhere?

Ideally one should be able to either get all the broker connect strings, in my set up I have 4 nodes across 2 AZ and I always get 3 results back that randomly shuffle every time there is an apply. I run multiple connectors on MSK connect and the change on the bootstrap_brokers causes a replacement of the connectors.

Jean717p · 2022-12-19T18:28:55Z

@DanaLacoste
According to AWS any broker carries the required metadata to be a bootstrap node.

Any broker can be bootstrap server because every broker receives metadata information.

Furthermore, the bootstrap string returned from the cli call invoked by the aws_msk_cluster terraform resource returns the ever-changing aws generated bootstrap string that contains 3 brokers across AZs in which the MSK cluster is deployed (unless only two brokers are available or less AZs are selected).

@lplazas
In order to avoid the ever-changing bootstrap broker string, you could generate yours from the brokers and use all the brokers (or a subset) in it through the data "aws_msk_broker_nodes" terraform resource that invokes a sorted by node id result of the list-nodes cli command, as @james-bjss was pointing out.

data "aws_msk_broker_nodes" "nodes" {
  cluster_arn = aws_msk_cluster.kafka.arn
}

output "kafka_sasl_scram_connection_string" {
  description = "Connection host:port pairs"
  value       = join(",", [for endpoint in flatten(data.aws_msk_broker_nodes.nodes.node_info_list.*.endpoints) : format("%s:9096", endpoint)])
}

Then use the obtained output.

DanaLacoste added the enhancement Requests to existing resources that expand the functionality or scope. label Jun 3, 2021

github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/kafka Issues and PRs that pertain to the kafka service. labels Jun 3, 2021

nitrocode mentioned this issue Aug 16, 2021

Correct module hostname count cloudposse/terraform-aws-msk-apache-kafka-cluster#29

Merged

This was referenced Aug 18, 2021

Add New MSK Broker Node Data Resource #20615

Merged

Create a new MSK Nodes Data Source to allow MSK Broker Node Properties to be Retrieved #20628

Closed

breathingdust removed the needs-triage Waiting for first response or review from a maintainer. label Aug 31, 2021

ewbankkit mentioned this issue Oct 20, 2021

aws_msk_cluster.bootstrap_brokers_* returns nondeterministic subset when amount of brokers is >3 #21393

Closed

tim-finnigan mentioned this issue Mar 21, 2023

kafka get-bootstrap-brokers doesn't return all brokers aws/aws-sdk#500

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't list bootstrap servers as a diff when applying #19659

Don't list bootstrap servers as a diff when applying #19659

DanaLacoste commented Jun 3, 2021 •

edited

DanaLacoste commented Jun 3, 2021

james-bjss commented Aug 16, 2021 •

edited

ryan-dyer-sp commented Jul 29, 2022

lplazas commented Sep 12, 2022 •

edited

Jean717p commented Dec 19, 2022

Don't list bootstrap servers as a diff when applying #19659

Don't list bootstrap servers as a diff when applying #19659

Comments

DanaLacoste commented Jun 3, 2021 • edited

Community Note

Description

Current Output

Desired Output (reduced for clarity)

New or Affected Resource(s)

Potential Terraform Configuration

References

DanaLacoste commented Jun 3, 2021

james-bjss commented Aug 16, 2021 • edited

ryan-dyer-sp commented Jul 29, 2022

lplazas commented Sep 12, 2022 • edited

Jean717p commented Dec 19, 2022

DanaLacoste commented Jun 3, 2021 •

edited

james-bjss commented Aug 16, 2021 •

edited

lplazas commented Sep 12, 2022 •

edited