Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aws_security_group_rules resource #9032

Closed

Conversation

jakauppila
Copy link
Contributor

Takes the changes in #1824 and applies them on top of master with a couple required tweaks.

I do not have an account with EC2 Classic to test with, thus the single failed test.

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" comments, they generate extra noise for pull request followers and do not help prioritize the request

Release note for CHANGELOG:

New Resource: aws_security_group_rules

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccAWSSecurityGroupRules_'
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./... -v -parallel 20 -run=TestAccAWSSecurityGroupRules_ -timeout 120m
?       github.com/terraform-providers/terraform-provider-aws   [no test files]
=== RUN   TestAccAWSSecurityGroupRules_basic
--- PASS: TestAccAWSSecurityGroupRules_basic (30.53s)
=== RUN   TestAccAWSSecurityGroupRules_ipv6
--- PASS: TestAccAWSSecurityGroupRules_ipv6 (30.36s)
=== RUN   TestAccAWSSecurityGroupRules_self
--- PASS: TestAccAWSSecurityGroupRules_self (29.77s)
=== RUN   TestAccAWSSecurityGroupRules_vpc
--- PASS: TestAccAWSSecurityGroupRules_vpc (29.80s)
=== RUN   TestAccAWSSecurityGroupRules_vpcNegOneIngress
--- PASS: TestAccAWSSecurityGroupRules_vpcNegOneIngress (29.92s)
=== RUN   TestAccAWSSecurityGroupRules_vpcProtoNumIngress
--- PASS: TestAccAWSSecurityGroupRules_vpcProtoNumIngress (29.28s)
=== RUN   TestAccAWSSecurityGroupRules_MultiIngress
--- PASS: TestAccAWSSecurityGroupRules_MultiIngress (31.17s)
=== RUN   TestAccAWSSecurityGroupRules_Change
--- PASS: TestAccAWSSecurityGroupRules_Change (48.06s)
=== RUN   TestAccAWSSecurityGroupRules_drift
--- PASS: TestAccAWSSecurityGroupRules_drift (18.04s)
=== RUN   TestAccAWSSecurityGroupRules_drift_complex
--- PASS: TestAccAWSSecurityGroupRules_drift_complex (31.16s)
=== RUN   TestAccAWSSecurityGroupRules_invalidCIDRBlock
--- PASS: TestAccAWSSecurityGroupRules_invalidCIDRBlock (1.46s)
=== RUN   TestAccAWSSecurityGroupRules_CIDRandGroups
--- PASS: TestAccAWSSecurityGroupRules_CIDRandGroups (30.70s)
=== RUN   TestAccAWSSecurityGroupRules_ingressWithCidrAndSGs
--- PASS: TestAccAWSSecurityGroupRules_ingressWithCidrAndSGs (31.10s)
=== RUN   TestAccAWSSecurityGroupRules_ingressWithCidrAndSGs_classic
--- FAIL: TestAccAWSSecurityGroupRules_ingressWithCidrAndSGs_classic (10.97s)
    testing.go:568: Step 0 error: errors during apply:

        Error: Error authorizing security group ingress rules: InvalidGroupId.Malformed: Invalid id: "tf_other_acc_tests" (expecting "sg-...")
                status code: 400, request id: 36169e42-833b-4cbf-9c57-16ef4017f629

          on /tmp/tf-test016411106/main.tf line 24:
          (source code not available)


=== RUN   TestAccAWSSecurityGroupRules_egressWithPrefixList
--- PASS: TestAccAWSSecurityGroupRules_egressWithPrefixList (42.42s)
=== RUN   TestAccAWSSecurityGroupRules_failWithDiffMismatch
--- PASS: TestAccAWSSecurityGroupRules_failWithDiffMismatch (30.60s)
=== RUN   TestAccAWSSecurityGroupRules_Destroy
--- PASS: TestAccAWSSecurityGroupRules_Destroy (45.37s)
=== RUN   TestAccAWSSecurityGroupRules_Empty
--- PASS: TestAccAWSSecurityGroupRules_Empty (46.93s)
=== RUN   TestAccAWSSecurityGroupRules_ConflictEmpty
--- PASS: TestAccAWSSecurityGroupRules_ConflictEmpty (46.61s)
FAIL
FAIL    github.com/terraform-providers/terraform-provider-aws/aws       594.295s
GNUmakefile:20: recipe for target 'testacc' failed
make: *** [testacc] Error 1

...

@ghost ghost added size/XXL Managed by automation to categorize the size of a PR. documentation Introduces or discusses updates to documentation. provider Pertains to the provider itself, rather than any interaction with AWS. service/ec2 Issues and PRs that pertain to the ec2 service. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. labels Jun 18, 2019
@bflad bflad added new-resource Introduces a new resource. proposal Proposes new design or functionality. labels Jun 22, 2019
@bflad bflad requested a review from a team June 22, 2019 01:04
@jakauppila
Copy link
Contributor Author

@bflad Just curious if there's any sort of ETA for review?

@igozali
Copy link

igozali commented Jul 16, 2019

Hi @jakauppila, I came across this PR while looking for a solution to the issue where Terraform couldn't detect diffs on the aws_security_group resource (say, if changes are manually made on the AWS console). I'm not familiar enough with the code base to understand the scope of what's worked on in this PR, but do you think it could help fix that issue I linked?

@jakauppila
Copy link
Contributor Author

jakauppila commented Jul 16, 2019

HI @igozali, in my own testing of inline rules of aws_security_group it was able to detect drift and remediate accordingly to those defined outside of the definition.

This PR adds an additional aws_security_group_rules resource so when you have complex situations like security groups that reference each other (which cannot be accomplished via inline rules due to cyclical dependencies) it has a single source of truth to correct drift that is not defined within your TF configuration. Currently, the only option in these complex scenarios is to define explicit aws_security_group_rule resources that only care about that particular rule and thus cannot correct drift outside of those definitions.

Are you defining your rules inline within the aws_security_group resource or with separate aws_security_group_rule resources?

@igozali
Copy link

igozali commented Jul 16, 2019

Thanks for responding! After reading your description, I think your PR solves a different issue. My issue is the following: if I have a security group as follows (which, to answer your question, security groups are defined inline):

provider "aws" {
  version = "~> 2.19"
  region  = "us-west-1"
}

terraform {
  required_version = "~> 0.12.0"
}

resource aws_security_group "test" {
  name   = "please_delete_me"
  vpc_id = "vpc-xxxxxxxx"

  ingress {
    from_port   = 2222
    to_port     = 2222
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 2222
    to_port     = 2222
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }  
}

And then in AWS console, I manually add an ingress rule, Terraform seems to detect the drift properly.

However, if I remove the last ingress rule in Terraform aws_security_group resource as follows:

provider "aws" {
  version = "~> 2.19"
  region  = "us-west-1"
}

terraform {
  required_version = "~> 0.12.0"
}

resource aws_security_group "test" {
  name   = "please_delete_me"
  vpc_id = "vpc-xxxxxxxx"

  egress {
    from_port   = 2222
    to_port     = 2222
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Terraform won't be able to detect changes on the ingress rules on the terraform plan invocation.

Sorry for the noise due to an unrelated issue!

@mattkovacik
Copy link

@bflad - I am also eagerly awaiting this new feature for the AWS provider plugin, do you think this will get included with the 2.23.0 release?

@tiagom62
Copy link

What remains to get this merged in? The PR has been sitting for 2 months.

@binarymist
Copy link

Progress update please? Desperately in need of this.

@aeschright
Copy link
Contributor

Hi everyone, thanks for your patience on this. As you probably noticed, we have a large backlog of PRs to look at right now. In addition, most of the team will be at HashiConf this week. Given the nature of this resource, I think we'll need to discuss it internally before continuing -- it may overlap with other product plans. We really appreciate the work you've put in, and I hope we can give you a more detailed answer soon.

@tmccombs
Copy link
Contributor

Should cidr_blocks and ipv6_cidr_blocks be schema.TypeSet rather than schema.TypeList?

@jakauppila
Copy link
Contributor Author

@aeschright Just curious if there's any timeline that could be provided to set expectations?

@jakauppila
Copy link
Contributor Author

@aeschright Any chance we could get an update on this?

@Jimmyscene
Copy link

Progress update please?

@iTaybb
Copy link

iTaybb commented Mar 30, 2020

Waiting eagerly...

@rdelcampog
Copy link
Contributor

@aeschright @bflad Maybe we can have a progress update? This is the second most 👍 PR and is opened for almost one year :(

@jakauppila
Copy link
Contributor Author

@bflad During the AWS Provider Office Hours, you mentioned that more thought needs to go into the use-cases before this PR could possibly be merged.

Could you express your concerns concretely here so we can address them?

@bflad
Copy link
Member

bflad commented Jun 5, 2020

Hi @jakauppila 👋 Thank you for bringing this up during our office hours and we apologize that this continues to be a frustrating experience with the Terraform AWS Provider.

To be fully transparent upfront: A lot of the silence on this manner stems from the fact that we (the HashiCorp maintainers of the project), do not have what we feel is an acceptable path forward for this particular issue that does not potentially increase confusion or potentially burden operators with existing configurations. Most importantly, this has nothing to do with the quality or proposal of this particular contribution. We should certainly be more upfront about that and I apologize since that may come across as not appreciating contributions or potentially raises other negative feelings. That is certainly not something we want to foster. We want to open a larger dialogue.

Going forward with this particular issue, it would be great if we can come together as a community to discuss these points, but likely in a different forum than comments on a proposed change request since this warrants a higher bandwidth discussion and likely a lengthy write up about the Terraform design decisions impacting this area. Below I will briefly try to summarize the problem and the recommended guidance on the problem, outline some of the historical and internal Terraform context to set the stage, present some of conflicting and confusing design decisions with any proposed change to this area, and finally offer some potential paths forward on the manner.


To begin, here is a summary this issue in a Terraform configuration from my understanding. Please let me know if this is incorrect. While the below only shows ingress for brevity, egress also has the same issue.

resource "aws_security_group" "a" {
  name = "a"

  ingress {
    from_port                = 22
    to_port                  = 22
    protocol                 = "tcp"
    source_security_group_id = aws_security_group.b.id
  }
}

resource "aws_security_group" "b" {
  name = "b"

  ingress {
    from_port                = 22
    to_port                  = 22
    protocol                 = "tcp"
    source_security_group_id = aws_security_group.a.id
  }
}

Effectively, the desire is to allow each of the EC2 Security Groups to cross-communicate. However, when this configuration is applied, Terraform will return a cycle error since both resources reference each other.

The current recommended guidance on this situation is to switch from using ingress/egress configuration blocks in the aws_security_group resource as shown above, to the below usage of only defining ingress and/or egress rules via the aws_security_group_rule resource (no ingress/egress configuration blocks in the aws_security_group resource):

resource "aws_security_group" "a" {
  name = "a"
}

resource "aws_security_group_rule" "a_from_b" {
  security_group_id        = aws_security_group.a.id
  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.b.id
}

resource "aws_security_group" "b" {
  name = "b"
}

resource "aws_security_group_rule" "b_from_a" {
  security_group_id        = aws_security_group.b.id
  type                     = "ingress"
  from_port                = 22
  to_port                  = 22
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.a.id
}

We will discuss why this configuration change may not be desirable later on.


Terraform's core logic is based on a directed acyclic graph. All operations that are to be implemented in Terraform must abide to being composed into well defined nodes and edges. Each Terraform resource is a self-contained node and is generally wholly separate from other resources (nodes) except for connecting edges created by configuration references. (Aside: There is a special exception of provisioners, which have completely separate rules and handling with their own class of bugs and potentially confusing behaviors in comparison to resources). That graph determines operation ordering and concurrency, built from practitioner configurations. For more information about this particular design, see also the Terraform Internals section of the documentation. This implementation detail does not typically matter to most practitioners or contributions, however it is relevant to this discussion so it feels worth mentioning for anyone without that context.

When Terraform was initially being designed a few years ago, the schema that defines a Terraform resource was written around two conceptual implementation details. The first being standard read-write/read-only attributes and the second being "sub" resources. This can be seen with the Go types used to represent the resource schema, schema.Schema and schema.Resource (importantly, the latter are generally referred to as "configuration blocks" today).

Example of this implementation in the aws_security_group resource today:

// ... other schema omitted for brevity ...
"ingress": {
	Type:       schema.TypeSet,
	Optional:   true,
	Computed:   true,
	ConfigMode: schema.SchemaConfigModeAttr, // Please note: this in of itself is a special implementation detail for this attribute, but is unrelated to this discussion
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			"from_port": {
				Type:     schema.TypeInt,
				Required: true,
			},
			// ... other schema omitted for brevity ...

With the equivalent Terraform configuration:

resource "aws_security_group" "example" {
  # ... other configuration omitted for brevity ...
  ingress {
    from_port = 22
    # ... other configuration omitted for brevity ...
  }
}

Back then, this concept of "sub" resources was presented as a potential future enhancement to Terraform to allow these inner pieces of schema to be handled separately in configurations (e.g. separate referencing) and the directed acyclic graph logic (e.g. ordering / cycle detection). Implementing these "sub" resources as separate graph nodes and edges which would help prevent cycle errors like this situation proved to be an exhaustive challenge however and has never come to fruition. Even through today (Terraform 0.12, 0.13, and the foreseeable pre-1.0 future), this internal feature of Terraform has become less and less likely to be implemented. While there have been glimmers of potential hope with updates to some of the major underlying logic such as dynamic for configuration blocks in Terraform 0.12, the maintainers across both teams are less confident this will occur anytime in the next year or two. Anecdotally, even the "sub" resource terminology has increasingly been taking a backseat in Terraform design discussions.

Another important Terraform concept within this discussion is Terraform's drift detection abilities. Given a previous state of a resource and its underlying schema attributes, Terraform and the Terraform Plugin SDK build a difference between that state and a desired configuration. A nice writeup of this concept and its internal implementation details within Terraform can be found in the Resource Instance Change Lifecycle document within the Terraform CLI repository. Terraform resources cannot support a partial configuration of an individual attribute as this is core to how Terraform represents a difference between the configuration and state.

The exceptions to enabling drift detection are encoded by provider developers adding Computed: true in the resource schema to disallow that behavior on attributes that conceptually have problems being modeled with drift detection for the majority use cases. These include situations where an API has multiple defaults for a value depending on other configuration, where a value may be implicitly created by certain actions in the API, or when multiple Terraform resources exist to manage the same infrastructure.

From a practitioner perspective, this drift detection means that any Terraform configuration will report a difference via terraform plan / terraform apply should the actual component change, when any portion of that attribute is configured. Many practitioners have come to expect that a majority of Terraform's attributes have this functionality enabled to ensure unexpected changes are caught, recreate environments that will exactly match the existing ones, and enforce compliance needs.

Given the above constraints and when an API has a parent-child relationship between components, Terraform resources must be designed in one of two manners. The first being all children components being exclusively managed by a parent resource and the second being all child components individually managed by a child resource without the ability to detect when extraneous children components exist. It is possible to allow both of these resources to exist for different use cases (and we have plenty of cases of this in the Terraform AWS Provider), but they cannot coexist in the configuration of a single parent, otherwise a perpetual difference will occur between the two. This behavior cannot be avoided by provider developers coding the resources differently or by practitioner configurations without an ignore_changes workaround and complicated update workflow. This is often found to be counter-intuitive to newer practitioners as their configuration seems valid, however these reports often necessitate the creation of additional highly visible documentation since providers cannot signal this problem to the Terraform user interface.

Separate children resources allow for greater composition (separate configurations can worry only about the components they need) while solely having a parent resource with exclusive management provides greater drift detection (forcing configurations to worry about all components). Across the Terraform ecosystem, there is no definitive methodology applied to this situation as its ambiguous whether one implementation may be better than the other.

The aws_security_group resource, which has been in existence for greater than five years and is integral to a large portion of the community's networking configurations, has ingress and egress configuration blocks that implement exclusive management of all EC2 Security Group ingress/egress rules for the EC2 Security Group. When any ingress or egress configuration is provided, Terraform will perform drift detection by proposing the addition of missing rules and removal of undeclared rules across all ingress/egress rules in the configuration and EC2.

Wholly separate, the aws_security_group_rule resource configures only an individual EC2 Security Group Rule. It is only concerned with that rule and no others that may or may not be present. To support this separate resource, the Computed: true property in the aws_security_group resource schema for ingress and egress does allow Terraform to ignore any differences when there is no configuration for the parent resource, to allow the child resource to manage the partial configuration.

There are two unique differences for the aws_security_group situation in comparison to other dual parent-child resources though. One is that the children references can be cyclic, which is atypical of almost all others. The other is that EC2 Security Groups are generally more sensitive pieces of configuration for the importance of drift detection since this is often a place where compliance audits are performed regularly.


Given that background, we can hopefully lay out some of the design decisions we need to consider:

  • Terraform and its provider ecosystem have been generally designed with the goal of usable and reliable infrastructure provisioning being top priority. Drift detection and a subset of this problem being exclusive management of child resources is a secondary priority to the first. The pragmatic decision to previously introduce a separate aws_security_group_rule resource satisfies the "usable" goal in this situation.
  • There is no real precedent for how to handle this particular situation in the Terraform ecosystem, given the equally frustrating combination of atypical cyclic references and increased desire for drift detection of this particular configuration.
  • The confusing behavior when attempting to manage child components between multiple of these parent-child resources is a constant source of bug reports and practitioner confusion. Even with documentation warnings, it is not a good user experience that the provider developers here have much control over.
  • Adding a second parent resource to the mix here, while not existing anywhere else in the Terraform ecosystem (that we are familiar with), could further increase this practitioner confusion. In particular, this new resource would not provide warnings/errors if attempting to use multiple of this resource to manage the same EC2 Security Group even ignoring the original cycle problem:
# This example would introduce perpetual differences
# without Terraform providing any user interface warnings.
# Practitioners would be required to do one of the following to learn its not supported:
#  * (Re-)Read resource documentation
#  * Ask colleagues or in a forum
#  * Report a GitHub issue
resource "aws_security_group" "a" {
  name = "a"
}

resource "aws_security_group_rules" "a-ingress-ssh" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  # ... potentially others ...
}

# Potentially in another Terraform configuration, managed by some other team
resource "aws_security_group_rules" "a-ingress-https" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}
# This example would introduce perpetual differences
# without Terraform providing any user interface warnings.
# The ingress/egress attributes do not have Computed: true
resource "aws_security_group" "a" {
  name = "a"
}

resource "aws_security_group_rules" "a-ingress" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}

# Potentially in another Terraform configuration, managed by some other team
# aws_security_group_rules.a-ingress will remove egress
# aws_security_group_rules.a-egress will try to re-add
resource "aws_security_group_rules" "a-egress" {
  security_group_id = aws_security_group.a.id

  egress {
    from_port   = 0
    to_port     = 65536
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}
  • Some additional questions may also arise: How do we tell the community about this new resource? Why is there a new resource? Which resource is correct or better? Do I have to migrate? Should I migrate? Can this be combined with existing resources? e.g.
# This example would introduce perpetual differences
# without Terraform providing any user interface warnings.
# The ingress/egress attributes do not have Computed: true
resource "aws_security_group" "a" {
  name = "a"
}

resource "aws_security_group_rules" "a-ingress" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}

# Potentially in another Terraform configuration, managed by some other team
# aws_security_group_rules.a-ingress will always try to remove this rule, while this tries to add it
resource "aws_security_group_rule" "a-egress" {
  security_group_id = aws_security_group.a.id
  type              = "egress"
  from_port         = 0
  to_port           = 65536
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
}
# This example would introduce perpetual differences
# without Terraform providing any user interface warnings.
# The ingress/egress attributes do not have Computed: true
# aws_security_group_rules.a-ingress will always try to remove this rule, while this tries to add it
resource "aws_security_group" "a" {
  name = "a"

  egress {
    from_port   = 0
    to_port     = 65536
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group_rules" "a-ingress" {
  security_group_id = aws_security_group.a.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # ... potentially others ...
}
  • The existing aws_security_group resource has a large usage footprint. We would be very hesitant to make breaking changes to that resource, including the deprecation/removal of its ingress and egress attributes so there remains one canonical parent resource, unless there is no other option since it would be an equally large burden on the community to change configurations.

All these put us in a rough position with the current proposal, since there is additional burden somewhere. We would prefer to not have a single resource that operates differently than the majority of other resources. While the above configurations may seem obvious when the resources are declared next to each other, varying team structures lead to varying configuration layouts and ownership.

By example: an application team, separate from an operations team which handles core networking management, may reach first for the plural rules resource since they wish to add multiple rules to a security group. They don't want to manage the security group itself (or at least if they tried, it'd error and require a manual import step, which is our warning mechanism in that scenario), so they avoid the aws_security_group resource. The aws_security_group_rule resource only does one rule when they want to manage multiple, so it seems best to use the plural. Terraform will not provide any warning about re-managing this infrastructure on resource creation and propose removing the operations team rules. Combine this with potentially missing the problem during human review processes and its likely to cause headaches for everyone involved.

Ideally this issue would be fixed upstream to either support these sub resources so we do not need a separate resource or we would be provided a mechanism for providers to provide warnings/errors if there are conflicting resources managing the same infrastructure to reduce confusion and reliance on purely documentation warnings. It is worth noting though that any of these warning enhancements would not work across separate Terraform states (usually split by team function), which is often the cause of these sorts of problems, so they are really only helpful in the more "obvious" same configuration scenario.


All of the above gives us pause to change the current situation, given that Terraform is still able to correctly provision this infrastructure, just without drift detection capabilities. To reiterate, this has nothing to do with this particular code submission, but rather an overall Terraform design problem. Since this design problem only exists in very few real world cases (in the amount of affected infrastructure sense), the broader Terraform teams with their limited resources have opted to spend time working on larger initiatives in the Terraform ecosystem such as configuration language improvements that positively impact more of the community.

We would encourage a few actions to move forward here, if this issue is drastically affecting your environment:

  • Ensuring there is an overall tracking issue in this project that succinctly captures this situation. Since this is a design problem, the maintainers would prefer discussions and decision making to occur via that type of forum first versus discussing a particular change request. If there are currently disparate issues outlining the same issue, we all can redirect others to this new centralized issue and better use it for accurate prioritization. (If you would like me to go through that issue creation effort, potentially using some of this background, please let me know.)
  • Ensuring there is an overall tracking issue in the Terraform core repository that succinctly captures this situation. Same ideas apply about centralizing discussions and any prioritization.
  • Reaching out to our product managers (for example, @maryelizbeth has contact information available in her GitHub profile) or your technical account managers (if you have one) outlining how this issue is affecting your environment.
  • We could potentially setup a call with you to discuss further in person.
  • We could potentially setup a community call on this manner to discuss further in person.

Please reach out with any questions and thank your for your time and consideration.

@jakauppila
Copy link
Contributor Author

jakauppila commented Jun 5, 2020

@bflad I appreciate the breadth and depth of your response, it really helps illustrate the problem and issues that could arise if merged as-is. Thank you.

I have created a post over on the forums to continue the discussion further: https://discuss.hashicorp.com/t/discussion-of-aws-security-group-rules-for-absolute-management-while-avoiding-cyclical-dependencies/9647

As included at the bottom of that post, I boiled the ask down to 2 user stories which most succinctly capture what we're trying to accomplish.

User Story # 1

As a Terraform Practitioner
I want to holistically manage two security groups that reference each other on either their ingress or egress rules
So that any rules introduced outside of the Infrastructure as Code definition are removed upon execution

User Story # 2

As a Governance, Risk Management, and Compliance Auditor
I want to know that Infrastructure as Code configurations contain and enforce the desired state of security group definitions
So that compliance is maintained

@bflad If you wouldn't mind, feel free to create the issue to track this, I don't think there was one that clearly captured it.

@tmccombs
Copy link
Contributor

tmccombs commented Jun 5, 2020

Another limitation of using aws_security_group_rule that I didn't see mentioned in the response, is that, iirc, any change to the rule results in the rule getting destroyed and recreated, which can result in a service interruption during the apply. This limitation does not apply to using ingress/egress blocks since the entire security group ruleset is updated at once.

@bflad
Copy link
Member

bflad commented Jun 6, 2020

@tmccombs you can use create_before_destroy to potentially handle that situation better (or create a second rule)

@tmccombs
Copy link
Contributor

tmccombs commented Jun 6, 2020

@bflad it's been a while since I had to deal with this, but I think that only works if the changes result in a rule that doesn't conflict with the original. So, for example, if you change the port range, or add or remove a cidr block from cidr_blocks, then AWS won't allow you to create the new resource before the old one has been destroyed, because it overlaps with an existing rule.

@binarymist
Copy link

Progress update please, can this be merged?

@jf
Copy link

jf commented Aug 14, 2020

Progress update please, can this be merged?

I think it looks like this was the intention, but #9032 (comment) seems to indicate that discussion is being moved to https://discuss.hashicorp.com/t/discussion-of-aws-security-group-rules-for-absolute-management-while-avoiding-cyclical-dependencies/9647

Base automatically changed from master to main January 23, 2021 00:56
@breathingdust breathingdust requested a review from a team as a code owner January 23, 2021 00:56
@zhelding
Copy link
Contributor

Pull request #21306 has significantly refactored the AWS Provider codebase. As a result, most PRs opened prior to the refactor now have merge conflicts that must be resolved before proceeding.

Specifically, PR #21306 relocated the code for all AWS resources and data sources from a single aws directory to a large number of separate directories in internal/service, each corresponding to a particular AWS service. This separation of code has also allowed for us to simplify the names of underlying functions -- while still avoiding namespace collisions.

We recognize that many pull requests have been open for some time without yet being addressed by our maintainers. Therefore, we want to make it clear that resolving these conflicts in no way affects the prioritization of a particular pull request. Once a pull request has been prioritized for review, the necessary changes will be made by a maintainer -- either directly or in collaboration with the pull request author.

For a more complete description of this refactor, including examples of how old filepaths and function names correspond to their new counterparts: please refer to issue #20000.

For a quick guide on how to amend your pull request to resolve the merge conflicts resulting from this refactor and bring it in line with our new code patterns: please refer to our Service Package Refactor Pull Request Guide.

@GoodMirek
Copy link

The existing aws_security_group resource has a large usage footprint. We would be very hesitant to make breaking changes to that resource, including the deprecation/removal of its ingress and egress attributes so there remains one canonical parent resource, unless there is no other option since it would be an equally large burden on the community to change configurations.

Sorry if my question was naive, but could it solve the compatibility concerns if there was a new boolean argument rules_managed_by_aws_security_group_rules for the resource aws_security_group ?
Of course, its default value should be false.
If value of that argument was true, then it would exclude possibility for using ingress and egress configuration blocks.

@m00lecule
Copy link

Keeping fingers crossed for this PR!

@GoodMirek
Copy link

@m00lecule Given this PR is three years old, I doubt it is going to be merged.

@m00lecule
Copy link

m00lecule commented May 19, 2022

@m00lecule Given this PR is three years old, I doubt it is going to be merged.

Yup it has been slept on @GoodMirek for too long. This resource might enhance ec2 network security by drift detection without cilcular dependency issue. I am also willing to help to resolve PR conflicts.

@jakauppila
Copy link
Contributor Author

@m00lecule Given this PR is three years old, I doubt it is going to be merged.

Yup it has been slept on @GoodMirek for too long. This resource might enhance ec2 network security by drift detection without cilcular dependency issue. I am also willing to help to resolve PR conflicts.

I'm happy to rebase this PR if/when the team commits to merging; but their reluctance is detailed in #9032 (comment). As far as I am aware, this stance has not changed.

@YakDriver
Copy link
Member

Problem to Be Solved

  1. A set of SG rules that
  2. Reference each other
  3. Have drift detection

Problem Now

aws_security_group allows 1) a set of SG rules. However, it does not allow 3) drift detection because Computed: True. If I'm understanding correctly, having 3) allows 2) reference each other to work.

aws_security_group_rule can be used multiple times to allow 1) a "set" of SG rules. These rules have 3) drift detection (no Computed: True) but cannot be 2) cross referential because of the basic way Terraform works (cycle detection).

Going forward

The problem I see here is that providing what is requested is not technically possible while letting Terraform itself manage the diff/drift of the rule set. Doing so would lead to cyclic errors and/or perpetual diffs. Within the resource / provider we would need to manage changes to provide 2) rule cross referencing and 3) drift detection. That breaks significantly with the provider model.

Thus, I suggest that we close this issue by Aug 16, 2022 unless:

  1. significant interest is reasserted and/or
  2. a feasible, design consistent path forward is discovered.

@YakDriver YakDriver added the waiting-response Maintainers are waiting on response from community or contributor. label Aug 2, 2022
@GoodMirek
Copy link

For me, the problem to be solved is a valid and common use case.
I understand there are technical difficulties and Terraform is currently not well suited to support this use case. At the same time, from my point of view this use case is so important that this issue should not be closed, although it might require a major improvement of some parts of Terraform.

Problem to Be Solved

A set of SG rules that
Reference each other
Have drift detection

@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Aug 2, 2022
@tmccombs
Copy link
Contributor

tmccombs commented Aug 2, 2022

@YakDriver first of all, I think your description is backwards. aws_security_group provides drift detection, but not ability to have cyclical dependencies between security groups (which I think is what you were getting at with 2). Whereas aws_security_group_rule allows having cyclical dependencies between security groups (because the rules are created after the security groups are created), but does not fully support drift detection, because security rules that are not in the terraform state, are not detected by terraform.

Secondly, from what I understand, it is technically possible, with the proposed new aws_security_group_rules resource. However, such a resource could potentially be confusing, and is a rather unusual pattern for a terraform provider.

@jakauppila
Copy link
Contributor Author

@YakDriver I would re-iterate the user stories I had posted in #9032 (comment) as to why this change is still needed.

User Story # 1

As a Terraform Practitioner
I want to holistically manage two security groups that reference each other on either their ingress or egress rules
So that any rules introduced outside of the Infrastructure as Code definition are removed upon execution

User Story # 2

As a Governance, Risk Management, and Compliance Auditor
I want to know that Infrastructure as Code configurations contain and enforce the desired state of security group definitions
So that compliance is maintained

The justification of not merging this change from @bflad in #9032 (comment) seems to focus on the user experience, and that introducing another method to manage security group rules will cause confusion.

Might I suggest that the ingress and egress blocks within aws_security_group be deprecated much like the blocks within aws_s3_bucket were when split out into their own resources? Obviously this would have a huge impact to users, but the S3 changes did as well.

@jar-b
Copy link
Member

jar-b commented Jan 30, 2023

Hello everyone - We want to thank you all for continued patience regarding this feature request. At this point the considerations outlined in our previous response have not changed, and we are choosing to close this pull request. We want to reiterate that this decision is in no way a reflection on the quality of the proposed resource itself, but rather to avoid placing an additional burden on practitioners using security group resources.

At times we have to make difficult judgment calls as maintainers, and always attempt to weigh impact to the broader community with support for specific cases. The fact that Terraform is still able to correctly provision this infrastructure (albeit without drift detection capabilities) heavily factored into the decision to avoid impacting all users for this specific case. Adding a third method for managing security group rules without an obvious deprecation path for the existing two patterns also factored in.

With all of this said, we recognize the importance of detecting drift on such a critical resource, and acknowledge we are leaving a gap for this particular case with this trade-off. As AWS now supports resource identifiers and tags for security group rules, we have begun exploring how this could improve the provider implementation. Effort thus far has focused on the unregistered (ie. not externally visible or documented) aws_vpc_security_group_[ingress|egress]_rule resources. These are in the very early design stages, and we have no deprecation plans or changes to recommended patterns at this moment. While we can make no guarantees, we will continue to keep this use case (circular rule dependencies with drift detection) in consideration as we weigh potential changes to security group rule patterns and to these resources under active development.

We understand that this will be a frustrating outcome for those who have invested time and effort into developing this feature or advocating for its inclusion. Thank you again for your effort, participation and patience.

@jar-b jar-b closed this Jan 30, 2023
@jakauppila
Copy link
Contributor Author

jakauppila commented Jan 30, 2023

@jar-b The addition of the aws_vpc_security_group_[ingress|egress]_rule resources is yet another method for (individually) managing rules and a direct counter to the previous response by maintainers further increasing the burden on practitioners rather than addressing the valid concerns as laid out in #9032 (comment).

@jar-b
Copy link
Member

jar-b commented Jan 31, 2023

Hi @jakauppila, thanks for your response.

I should have provided some additional context around the aws_vpc_security_group_[ingress|egress]_rule resources. This issue comment provides some background on why changes to the underlying AWS security group rule structures necessitate a new resource:

#20104 (comment)

While the existing aws_security_group_rule resource has been modified to include a security_group_rule_id attribute under some conditions, it is not always possible to make a 1:1 mapping. If ID-based security group rules were to be required for some future functionality, the value of a new resource that always generates a ID-based rule is more explicit.

#20104 (comment)

I also want to reiterate that these resources are not yet enabled, and that appropriate documentation and deprecation plans (if applicable) would be part of that effort.

Apologies for any confusion references to these new resources may have caused in my previous comment. By including them I meant to convey that we are continuing to consider how to improve the workflow around security groups, and not abandoning improvements to this resource group as closure of this pull request might imply.

@jakauppila
Copy link
Contributor Author

Hi @jar-b, appreciate the additional context.

So the creation of the new ingress|egress_rule resources makes sense due to the need to generate the IDs, but I would still challenge the team on addressing the concerns in #9032 as it pertains to cyclical dependencies and drift detection when it comes to rollout/deprecation plans.

Adding additional plural aws_vpc_security_group_[ingress|egress]_rules resources that holistically manage the rules for a particular security group would solve both problems.

@github-actions
Copy link

github-actions bot commented Mar 3, 2023

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Introduces or discusses updates to documentation. new-resource Introduces a new resource. proposal Proposes new design or functionality. provider Pertains to the provider itself, rather than any interaction with AWS. service/ec2 Issues and PRs that pertain to the ec2 service. size/XXL Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet