Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't list bootstrap servers as a diff when applying #19659

Open
DanaLacoste opened this issue Jun 3, 2021 · 5 comments
Open

Don't list bootstrap servers as a diff when applying #19659

DanaLacoste opened this issue Jun 3, 2021 · 5 comments
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/kafka Issues and PRs that pertain to the kafka service.

Comments

@DanaLacoste
Copy link

DanaLacoste commented Jun 3, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

NOTE: This is more than a cosmetic issue, but less than a "it breaks things" issue. Terraform is doing the right thing on apply, but it is outputting something extraneous which has caused confusion while trying to investigate an unrelated change.

When applying on an MSK cluster which has more nodes than availability zones, the diff will show that Objects have changed outside of Terraform and report that bootstrap_brokers_sasl_scram (or _iam, or....) has changed, with a list of nodes. This is caused by the AWS SDK call returning a random list of nodes, one per AZ (so if you have one node per AZ, you will get the same list every time, but if you have multiple nodes in any AZ, then you cannot predict which nodes will return)

This request is to silence that output (somehow?). I have tried adding ignore_changes = [bootstrap_brokers_sasl_scram] to no effect: the change is already (correctly) being ignored as far as "should terraform re-apply this resource to fix the change?" after all, it is only identifying the property of the resource (as returned by the AWS API call) as changed.

NOTE 1: This is related to #17579 but is a different issue: this is not about order of the results, but content of the (ordered) results.

NOTE 2: This does not occur if you have number of nodes <= number of availability zones (i.e. it only happens if you set up a bigger cluster to handle a larger load.)

Current Output

# terraform init

Terraform v0.15.4 is already installed
Initializing modules...
- msk in msk

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)
- Installing hashicorp/dns v3.1.0...
- Installed hashicorp/dns v3.1.0 (signed by HashiCorp)
- Installing hashicorp/aws v3.42.0...
- Installed hashicorp/aws v3.42.0 (signed by HashiCorp)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

# terraform apply

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

<Refreshes deleted for clarity>

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # aws_msk_cluster.cluster has been changed
  ~ resource "aws_msk_cluster" "cluster" {
      ~ bootstrap_brokers_sasl_scram = "b-1.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-5.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-6.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096" -> "b-1.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-2.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096,b-5.cluster.123456.c01.kafka.us-east-1.amazonaws.com:9096"
        id                           = "arn:aws:kafka:us-east-1:1234567890:cluster/cluster/<guid>"
        # (8 unchanged attributes hidden)

        # (6 unchanged blocks hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.

──────

No changes. Your infrastructure matches the configuration.

Your configuration already matches the changes detected above. If you'd like to update the Terraform state to match, create and apply a refresh-only plan:
  terraform apply -refresh-only

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Desired Output (reduced for clarity)

# terraform apply

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

<Refreshes deleted for clarity>

──────

No changes. Your infrastructure matches the configuration.

Your configuration already matches the changes detected above. If you'd like to update the Terraform state to match, create and apply a refresh-only plan:
  terraform apply -refresh-only

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

New or Affected Resource(s)

  • aws_msk_cluster

Potential Terraform Configuration

This is a portion of our config (the relevant part).
I was hoping the lifecycle rule would fix the issue, but it did not.

resource "aws_msk_cluster" "cluster" {
  cluster_name           = "test-cluster"
  kafka_version          = var.kafka_version
  number_of_broker_nodes = var.broker_count

  broker_node_group_info {
    instance_type   = var.instance_type
    ebs_volume_size = var.ebs_volume_size
    client_subnets  = var.subnet_ids
    security_groups = [aws_security_group.sg.id]
  }

  configuration_info {
    arn      = aws_msk_configuration.cluster_config.arn
    revision = aws_msk_configuration.cluster_config.latest_revision
  }

  encryption_info {
    encryption_in_transit {
      client_broker = "TLS"
      in_cluster    = true
    }
    encryption_at_rest_kms_key_arn = aws_kms_key.msk.arn
  }

  client_authentication {
    sasl {
      scram = true
    }
  }

  logging_info {
    broker_logs {
      s3 {
        enabled = true
        bucket  = aws_s3_bucket.log_bucket.id
      }
    }
  }

  open_monitoring {
    prometheus {
      jmx_exporter {
        enabled_in_broker = true
      }
      node_exporter {
        enabled_in_broker = true
      }
    }
  }

  lifecycle {
    # This is because https://docs.aws.amazon.com/msk/latest/developerguide/msk-get-bootstrap-brokers.html
    # IFF the number of nodes is greater than the number of AZs, then this value will be a random selection
    # of the nodes, one per AZ (so if you have 6 nodes and 3 AZs, you will get only three responses)
    ignore_changes  = [bootstrap_brokers_sasl_scram]
    prevent_destroy = true
  }
}

References

  • None (yet!)
@DanaLacoste DanaLacoste added the enhancement Requests to existing resources that expand the functionality or scope. label Jun 3, 2021
@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/kafka Issues and PRs that pertain to the kafka service. labels Jun 3, 2021
@DanaLacoste
Copy link
Author

Added as a feature request because I totally get what's happening here: Terraform is storing the value in the state file (can be in outputs, too, after all, if you add that) and AWS returns a different value every time.

So what I am looking for is an ehancement to, well, silence that output, if possible.

@james-bjss
Copy link
Contributor

james-bjss commented Aug 16, 2021

Ran across the same issue last week. It seems AWS returns 3 brokers by random (from each AZ) when calling get-bootstrap-brokers. This is problematic when the cluster has multiple nodes/brokers in each AZs as the values change between plans/applies. In our case we have a 6 broker cluster across three AZ's so 2 brokers in each AZ.

In addition to ignoring the changes could we consider providing an attribute for the broker hostnames so we can list all node endpoints?

It looks like the CLI call that list-nodes makes achieves this. The underlying Rest endpoint is:
https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-nodes.html

In our case we want to fetch all the broker host names so we can setup Consul entries for each.

Additional Info:

# Initial check of Brokers

aws kafka get-bootstrap-brokers --cluster-arn <redacted>
{
    "BootstrapBrokerString": "b-6.<redacted>.kafka.eu-west-1.amazonaws.com:9092,b-2.<redacted>.eu-west-1.amazonaws.com:9092,b-4<redacted>eu-west-1.amazonaws.com:9092",
    "BootstrapBrokerStringTls": "b-6<redacted>.eu-west-1.amazonaws.com:9094,b-2.<redacted>.eu-west-1.amazonaws.com:9094,b-4.<redacted>.eu-west-1.amazonaws.com:9094"
}

# Second check (30 seconds later):

❯ aws kafka get-bootstrap-brokers --cluster-arn <redacted>
    "BootstrapBrokerString": "b-1.<redacted>.eu-west-1.amazonaws.com:9092,b-5.<redacted>.eu-west-1.amazonaws.com:9092,b-3.<redacted>.eu-west-1.amazonaws.com:9092",
    "BootstrapBrokerStringTls": "b-1.<redacted>.eu-west-1.amazonaws.com:9094,b-5.<redacted>.eu-west-1.amazonaws.com:9094,b-3.<redacted>.eu-west-1.amazonaws.com:9094"
}

It appears this is (partially) documented behavior. Quoting AWS Docs:
A list of brokers that a client can use to bootstrap. This list doesn't necessarily include all of the brokers in the cluster.
See: https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-bootstrap-brokers.html

@ryan-dyer-sp
Copy link
Contributor

I cant speak to the need for the diff to not show up in an apply. However, this undeterministic nature does cause problems when this parameter(boostrap_broker_*) is being passed to another resource; eg. aws_ssm_parameter. Applies can often fail on our >3 node clusters with the following

╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for aws_ssm_parameter.bootstrap_servers_param[0] to
│ include new values learned so far during apply, provider
│ "registry.terraform.io/hashicorp/aws" produced an invalid new value for
│ .value: inconsistent values for sensitive attribute.
│
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.
╵
Releasing state lock. This may take a few moments...
ERRO[0017] 1 error occurred:
        * exit status 1

We have to hope that the api calls made to retrieve this value during the course of the plan and apply process return the same result.

@lplazas
Copy link
Contributor

lplazas commented Sep 12, 2022

Is there a known way to fix this situation? This seems to be more of an API issue, can it be flagged somewhere?

Ideally one should be able to either get all the broker connect strings, in my set up I have 4 nodes across 2 AZ and I always get 3 results back that randomly shuffle every time there is an apply. I run multiple connectors on MSK connect and the change on the bootstrap_brokers causes a replacement of the connectors.

@Jean717p
Copy link

@DanaLacoste
According to AWS any broker carries the required metadata to be a bootstrap node.

Any broker can be bootstrap server because every broker receives metadata information.

Furthermore, the bootstrap string returned from the cli call invoked by the aws_msk_cluster terraform resource returns the ever-changing aws generated bootstrap string that contains 3 brokers across AZs in which the MSK cluster is deployed (unless only two brokers are available or less AZs are selected).

@lplazas
In order to avoid the ever-changing bootstrap broker string, you could generate yours from the brokers and use all the brokers (or a subset) in it through the data "aws_msk_broker_nodes" terraform resource that invokes a sorted by node id result of the list-nodes cli command, as @james-bjss was pointing out.

data "aws_msk_broker_nodes" "nodes" {
  cluster_arn = aws_msk_cluster.kafka.arn
}

output "kafka_sasl_scram_connection_string" {
  description = "Connection host:port pairs"
  value       = join(",", [for endpoint in flatten(data.aws_msk_broker_nodes.nodes.node_info_list.*.endpoints) : format("%s:9096", endpoint)])
}

Then use the obtained output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/kafka Issues and PRs that pertain to the kafka service.
Projects
None yet
Development

No branches or pull requests

6 participants