[Enhancement]: Extend 'aws_msk_cluster' by adding flag for turning on AWS PrivateLink feature (msk multi-vpc) #34419

Open
malamin opened this issue Nov 15, 2023 · 13 comments
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/kafka Issues and PRs that pertain to the kafka service.

Comments

@malamin

malamin commented Nov 15, 2023

Description

After creating an MSK provisioned cluster with the 'aws_msk_cluster' resource, it is not possible to apply a cluster policy with Terraform, because there is no option for enabling AWS PrivateLink (MSK multi-VPC) and the feature is turned off by default. It has to be turned on manually before the cluster policy can be applied, and doing so causes no drift in the 'aws_msk_cluster' resource afterwards.

image

resource "aws_security_group" "msk_sg" {
  name        = "${local.cluster_name}-sg"
  description = "MSK cluster security group"
  vpc_id      = var.vpc_id
}

resource "aws_security_group_rule" "sg_ingress" {
  for_each          = {for account in var.client_accounts : account.account_id => account}
  type              = "ingress"
  from_port         = 9094
  to_port           = 9094
  protocol          = "tcp"
  cidr_blocks       = [each.value.cidr_block]
  security_group_id = aws_security_group.msk_sg.id
}

resource "aws_security_group_rule" "sg_egress" {
  for_each          = {for account in var.client_accounts : account.account_id => account}
  type              = "egress"
  from_port         = 9094
  to_port           = 9094
  protocol          = "tcp"
  cidr_blocks       = [each.value.cidr_block]
  security_group_id = aws_security_group.msk_sg.id
}

resource "aws_cloudwatch_log_group" "log_group" {
  name              = "/aws/msk/${local.cluster_name}"
  retention_in_days = var.cloud_watch_retention_days
  tags              = {
    "Name" = local.cluster_name
  }
}

resource "aws_msk_cluster" "msk_cluster" {
  cluster_name           = local.cluster_name
  kafka_version          = var.kafka_version
  number_of_broker_nodes = local.number_of_broker_nodes

  broker_node_group_info {
    instance_type   = var.instance_type
    client_subnets  = var.private_subnet_ids
    security_groups = [aws_security_group.msk_sg.id]
    storage_info {
      ebs_storage_info {
        volume_size = var.volume_size
      }
    }
  }

  encryption_info {
    encryption_at_rest_kms_key_arn = var.kms_msk_arn
  }

  logging_info {
    broker_logs {
      cloudwatch_logs {
        enabled   = true
        log_group = aws_cloudwatch_log_group.log_group.name
      }
    }
  }

  client_authentication {
    unauthenticated = false
    sasl {
      iam   = true
      scram = false
    }
  }
}

resource "aws_msk_cluster_policy" "msk_cluster_policy" {
  count       = length(var.client_accounts) > 0 ? 1 : 0
  cluster_arn = aws_msk_cluster.msk_cluster.arn
  policy      = data.aws_iam_policy_document.cluster_iam_policy_document.json
}

data "aws_iam_policy_document" "cluster_iam_policy_document" {
  statement {
    effect = "Allow"
    principals {
      type        = "AWS"
      identifiers = [for client_account in var.client_accounts : "arn:aws:iam::${client_account.account_id}:root"]
    }
    actions = [
      "kafka:CreateVpcConnection",
      "kafka:GetBootstrapBrokers",
      "kafka:DescribeCluster",
      "kafka:DescribeClusterV2",
      "kafka-cluster:Connect",
      "kafka-cluster:DescribeTopic",
    ]
    resources = [
      aws_msk_cluster.msk_cluster.arn,
      "arn:aws:kafka:*:${var.current_account_id}:topic/${local.cluster_name}/*/*"
    ]
  }
}

Affected Resource(s) and/or Data Source(s)

aws_msk_cluster, aws_msk_cluster_policy

Potential Terraform Configuration

No response

References

No response

Would you like to implement a fix?

None

@malamin malamin added the enhancement Requests to existing resources that expand the functionality or scope. label Nov 15, 2023
@github-actions github-actions bot added service/iam Issues and PRs that pertain to the iam service. service/kafka Issues and PRs that pertain to the kafka service. service/logs Issues and PRs that pertain to the logs service. service/vpc Issues and PRs that pertain to the vpc service. labels Nov 15, 2023

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@terraform-aws-provider terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Nov 15, 2023
@justinretzolk
Member

Hey @malamin 👋 Thank you for taking the time to raise this! I found #31062, which seems to reference the relatively new aws_msk_vpc_connection resource as an answer for this. Admittedly, I'm not familiar enough to know; can you review the linked issue (and its comments) and the linked resource documentation, and let me know if that covers what you're looking for?

@justinretzolk justinretzolk added waiting-response Maintainers are waiting on response from community or contributor. and removed service/iam Issues and PRs that pertain to the iam service. needs-triage Waiting for first response or review from a maintainer. service/logs Issues and PRs that pertain to the logs service. service/vpc Issues and PRs that pertain to the vpc service. labels Nov 15, 2023
@malamin
Author

malamin commented Nov 20, 2023 via email

@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Nov 20, 2023
@naviat

naviat commented Jan 1, 2024

@justinretzolk Should we consider automating manual steps, such as enabling multi-VPC, with Terraform?
@malamin Have you found any way to handle this case with Terraform?

@cobbr2

cobbr2 commented Jan 20, 2024

I definitely want to be able to do this from Terraform myself. Any interface in AWS that involves a long UPDATING wait (which this does) really needs to be automated. Also, the msk_cluster_policy resource doesn't work on provisioned clusters until this has been done (serverless clusters seem to get this work done for free); it fails with:

 Error: setting MSK Cluster Policy (arn:aws:kafka:STUFF:cluster/common-blue/ANDNONNSENSE): operation error Kafka: PutClusterPolicy, https response error StatusCode: 400, RequestID: 7755396f-74b4-4eae-860a-9ef80efea1df, BadRequestException: The cluster must have multi-VPC private connectivity enabled for its cluster policy.

It appears the correct API is documented at https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-security.html (we want to update the VpcConnectivityInfo).

@cobbr2

cobbr2 commented Feb 8, 2024

That problem with setting the MSK Cluster Policy appears to me to be a red herring. I saw it also with MSK Serverless, which does not have the multi-VPC requirement (it does it by default). I think the problem in our case is that we were starting deployment of two MSK replicators while the policy was still being set up. If we make the replicators depend on the policies (instead of just the clusters), we reliably set the policies. (We don't reliably set up working replicators, but that'll be another issue when AWS support and I can finally figure out why.)
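
A minimal sketch of the dependency described above, where the replicator waits for both cluster policies instead of only the clusters. Every resource and variable name here (`source`, `target`, `blue_to_green`, the `var.*` inputs) is hypothetical, and the argument names follow the provider documentation for `aws_msk_replicator` as best I recall it, so check them against the docs before relying on this.

```hcl
# Hypothetical names throughout; only the depends_on idea comes from the comment above.
resource "aws_msk_replicator" "blue_to_green" {
  replicator_name            = "blue-to-green"
  service_execution_role_arn = var.replicator_role_arn

  kafka_cluster {
    amazon_msk_cluster {
      msk_cluster_arn = aws_msk_cluster.source.arn
    }
    vpc_config {
      subnet_ids          = var.source_subnet_ids
      security_groups_ids = var.source_security_group_ids
    }
  }

  kafka_cluster {
    amazon_msk_cluster {
      msk_cluster_arn = aws_msk_cluster.target.arn
    }
    vpc_config {
      subnet_ids          = var.target_subnet_ids
      security_groups_ids = var.target_security_group_ids
    }
  }

  replication_info_list {
    source_kafka_cluster_arn = aws_msk_cluster.source.arn
    target_kafka_cluster_arn = aws_msk_cluster.target.arn
    target_compression_type  = "NONE"

    topic_replication {
      topics_to_replicate = [".*"]
    }

    consumer_group_replication {
      consumer_groups_to_replicate = [".*"]
    }
  }

  # Referencing the cluster ARNs only orders the replicator after the clusters.
  # The explicit depends_on also waits for the policies to be in place.
  depends_on = [
    aws_msk_cluster_policy.source,
    aws_msk_cluster_policy.target,
  ]
}
```

Without the explicit `depends_on`, Terraform has no reason to order the replicator after the policies, because nothing in the replicator references them.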

@dabmajor

The aws_msk_vpc_connection resource appears to be for creating a Managed VPC Connection. As I understand it, a Managed VPC Connection is different than enabling multi-VPC connectivity on a single cluster's network configuration.
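
To illustrate the distinction, here is a rough sketch of what `aws_msk_vpc_connection` configures: the client side of the link, created in the consuming VPC and pointing at an existing cluster. The variable names are hypothetical. As I understand it, this resource assumes multi-VPC connectivity is already turned on for the target cluster, so it does not replace the setting requested in this issue.

```hcl
# Hypothetical inputs describing the client (consumer) VPC.
variable "target_cluster_arn" {
  type = string
}

variable "client_vpc_id" {
  type = string
}

variable "client_subnet_ids" {
  type = list(string)
}

variable "client_security_group_ids" {
  type = list(string)
}

# Managed VPC connection from the client VPC to the MSK cluster.
resource "aws_msk_vpc_connection" "client" {
  authentication     = "SASL_IAM"
  target_cluster_arn = var.target_cluster_arn
  vpc_id             = var.client_vpc_id
  client_subnets     = var.client_subnet_ids
  security_groups    = var.client_security_group_ids
}
```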

@fedeostrit

It is terrible that this cannot be enabled with Terraform. It is exactly as described: there is no option to enable it. The "aws_msk_cluster" resource should have something like "multi-vpc = true or false" or "multi-vpc = enabled or disabled".

@nairb

nairb commented Apr 18, 2024

I found this thread while attempting to use MSK as a source for OpenSearch Ingestion, which requires multi-VPC to be enabled. You can turn it on with Terraform like so:

resource "aws_msk_cluster" "msk_cluster" {
  # ...
  broker_node_group_info {
    # Enables multi-VPC private connectivity with IAM authentication.
    connectivity_info {
      vpc_connectivity {
        client_authentication {
          sasl {
            iam = true
          }
        }
      }
    }
  }
  # ...
}
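
As a fuller sketch, the snippet above can be gated behind a boolean so that it behaves like the flag requested in this issue. The variable name `enable_multi_vpc` is hypothetical and is not a provider argument; the other names are reused from the configuration in the original post. Only the `connectivity_info` / `vpc_connectivity` nesting comes from this thread. Two things are untested here: whether the setting can be applied at cluster creation or only as an update to an already active cluster, and whether removing the block later actually turns the feature off.

```hcl
# Hypothetical toggle; not part of the AWS provider.
variable "enable_multi_vpc" {
  type    = bool
  default = false
}

resource "aws_msk_cluster" "msk_cluster" {
  cluster_name           = local.cluster_name
  kafka_version          = var.kafka_version
  number_of_broker_nodes = local.number_of_broker_nodes

  broker_node_group_info {
    instance_type   = var.instance_type
    client_subnets  = var.private_subnet_ids
    security_groups = [aws_security_group.msk_sg.id]

    # Emit the connectivity_info block only when the toggle is on.
    dynamic "connectivity_info" {
      for_each = var.enable_multi_vpc ? [1] : []
      content {
        vpc_connectivity {
          client_authentication {
            sasl {
              iam = true
            }
          }
        }
      }
    }
  }

  # Cluster-level IAM auth, matching the original post.
  client_authentication {
    unauthenticated = false
    sasl {
      iam = true
    }
  }
}
```

The authentication type under `vpc_connectivity` presumably needs to be one that is also enabled at the cluster level, which is why both blocks set `iam = true`.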

@dabmajor

Regarding the vpc_connectivity configuration @nairb posted above: based on the testing I have seen, this is not a complete solution and still depends on manual configuration in the AWS console.

@nairb

nairb commented Apr 19, 2024

In reply to @dabmajor's point that this still depends on manual configuration in the AWS console: all of my resources were created via Terraform with no manual intervention, and OSIS was able to create the VPC connection to the MSK cluster.

@sagar89jadhav

I tried the vpc_connectivity configuration @nairb posted above. It worked, but Terraform took almost 1 hr 15 min to update the PrivateLink / multi-VPC settings.
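
Given an update of that length, it may be worth setting the update timeout on the cluster explicitly. This is a sketch only: the `3h` value is an arbitrary assumption, and I believe the provider's default update timeout is already around two hours, so this just adds headroom. It does not make the update itself any faster.

```hcl
resource "aws_msk_cluster" "msk_cluster" {
  # ... existing arguments unchanged ...

  # Give long-running connectivity updates extra headroom (value is an assumption).
  timeouts {
    update = "3h"
  }
}
```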

@dabmajor

dabmajor commented Jun 14, 2024 via email

8 participants