
MachinePool (AWS): AdditionalSecurityGroupIDs #2113

Conversation


@2uasimojo 2uasimojo commented Sep 13, 2023

Mirror the installer's new AdditionalSecurityGroupIDs field (see openshift/installer#7151) into hive's MachinePool CRD. Plumb it into the installer's provider code so the additional SGIDs are included in generated MachineSets.

Note:
There's a back door, added via #1743 / a028ecd and refined via #1767 / cdf3601, whereby a (single) additional security group can be added by name. It is mutually exclusive with this feature: if you try to use both, your MachinePool will get the UnsupportedConfiguration condition. (We probably didn't need to make the two things mutually exclusive, but we did anyway.)

HIVE-2306
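
For orientation, a minimal sketch of the plumbing described above (not the PR's exact diff), assuming the openshift/api machine v1beta1 types; the helper name and the example tag value are illustrative:

package main

import (
	"fmt"

	machineapi "github.com/openshift/api/machine/v1beta1"
)

// appendAdditionalSecurityGroupIDs appends each additional SG ID to the provider
// config's SecurityGroups list as an ID-based AWSResourceReference, leaving the
// installer-generated tag:Name filter entry in place.
func appendAdditionalSecurityGroupIDs(providerConfig *machineapi.AWSMachineProviderConfig, sgIDs []string) {
	for i := range sgIDs {
		providerConfig.SecurityGroups = append(providerConfig.SecurityGroups,
			machineapi.AWSResourceReference{ID: &sgIDs[i]})
	}
}

func main() {
	// Illustrative provider config, shaped like what the installer's provider code generates.
	pc := &machineapi.AWSMachineProviderConfig{
		SecurityGroups: []machineapi.AWSResourceReference{{
			Filters: []machineapi.Filter{{Name: "tag:Name", Values: []string{"mycluster-abcde-worker-sg"}}},
		}},
	}
	appendAdditionalSecurityGroupIDs(pc, []string{"sg-02943a2e5ed3d9fb5"})
	fmt.Printf("%d security group references\n", len(pc.SecurityGroups)) // 2
}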

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2023
@2uasimojo 2uasimojo left a comment


/assign @abutcher

One thing to notice, related to #1767: There's no VPC ID in play for the additional SGs in this PR because they're SG IDs (not names) and are therefore globally unique. I think I understood that correctly, but please check me.

@@ -317,13 +335,6 @@ func (a *AWSActuator) updateProviderConfig(machineSet *machineapi.MachineSet, in
}
}

providerConfig.SecurityGroups = []machineapi.AWSResourceReference{{
Member Author

NTR: This was overwriting the filter generated by the installer's provider code. AFAICS, until this feature, it was overwriting it with the exact same value, generated the exact same way. Please scrutinize for any discrepancy I may have missed that would make removing this Bad™.

}

for _, test := range tests {
validateAWSMachineSets := func(t *testing.T, mSets []*machineapi.MachineSet, test ttype) {
Member Author

NTR: This chunk is mostly refactor. I was going to have to add another param to the func, which already had a param list a mile long. All but the first two were test.something, so I wanted to just be able to pass in test. In order to do so, I had to de-anonymize the test struct and then scope this func so I could ref it. (I could have pulled the ttype def out of the test function, but that seemed more wronger.)
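
For illustration, a minimal sketch of the refactor pattern being described; the test function name is made up, fields beyond name and expectedSGIDs are elided, and the bodies are stubbed:

package machinepool

import (
	"testing"

	machineapi "github.com/openshift/api/machine/v1beta1"
)

// ttype is the de-anonymized test case struct, named so helpers can accept it whole.
type ttype struct {
	name          string
	expectedSGIDs []string
	// ...remaining fields elided
}

func TestUpdateAWSMachineSets(t *testing.T) {
	tests := []ttype{ /* test cases elided */ }

	// Scoped inside the test function so it can reference ttype and read
	// test.<field> directly instead of taking a long positional parameter list.
	validateAWSMachineSets := func(t *testing.T, mSets []*machineapi.MachineSet, test ttype) {
		// assertions over mSets using test.expectedSGIDs, etc.
	}

	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			var mSets []*machineapi.MachineSet // generated MachineSets would go here
			validateAWSMachineSets(t, mSets, test)
		})
	}
}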

More below...

Comment on lines +669 to +675
if test.expectedSGIDs != nil {
if assert.Equal(t, len(test.expectedSGIDs), len(awsProvider.SecurityGroups)-1, "expected n-1 SecurityGroups holding IDs") {
for i := 0; i < len(test.expectedSGIDs); i++ {
assert.Equal(t, test.expectedSGIDs[i], *awsProvider.SecurityGroups[i+1].ID, "mismatched security group ID")
}
}
}
Member Author

...here. Other than renaming vars, this is the only delta to the func: new logic to validate security group IDs.

@codecov

codecov bot commented Sep 13, 2023

Codecov Report

Merging #2113 (c544d43) into master (28a3122) will increase coverage by 0.00%.
Report is 2 commits behind head on master.
The diff coverage is 100.00%.

Additional details and impacted files


@@           Coverage Diff           @@
##           master    #2113   +/-   ##
=======================================
  Coverage   57.55%   57.55%           
=======================================
  Files         187      187           
  Lines       25810    25815    +5     
=======================================
+ Hits        14854    14859    +5     
  Misses       9711     9711           
  Partials     1245     1245           
Files Changed Coverage Δ
pkg/controller/machinepool/awsactuator.go 78.55% <100.00%> (+0.29%) ⬆️

@2uasimojo

/test e2e

infra flake

@2uasimojo

Also @abutcher: we don't need to do anything for the day 0 path here, do we? The IDs for the default pool would come in via the install-config, so anything in the MP would be irrelevant. It would end up being one of the many potential discrepancies between the MP and the pool. Does that cause problems when machines are scaled/deleted on the spoke?

@2uasimojo

/test e2e

infra flake

@2uasimojo

Tested this live, at least to the point where I proved that the additionalSecurityGroupIDs make it into the MachineSets.

@abutcher

> Also @abutcher: we don't need to do anything for the day 0 path here, do we? The IDs for the default pool would come in via the install-config, so anything in the MP would be irrelevant. It would end up being one of the many potential discrepancies between the MP and the pool. Does that cause problems when machines are scaled/deleted on the spoke?

I could see discrepancies causing issues in future scaling, but it's no different from the current situation. In my testing below, I made the worker MP and the install-config sync up wrt security groups, subnets, etc.


Created a cluster that utilized additionalSecurityGroupIDs in the install-config, which required providing pre-existing subnets (and a pre-existing VPC, by virtue of needing pre-existing subnets).

Note that the installer requires subnets to be listed in the install-config platform when additionalSecurityGroupIDs are added to the worker machinepool. This is because security groups are associated with VPCs, so the subnets must be pre-created in a BYO VPC. For this reason, I think we could require via webhook that hive MachinePools specify subnets when using additionalSecurityGroupIDs. It's basically implied anyway, since the machines won't get the new SG unless it's associated with the same VPC as the cluster's worker subnets, but it's a minor rough edge.
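
A hypothetical sketch of that webhook check, assuming hive's v1 aws MachinePoolPlatform type with the Subnets and AdditionalSecurityGroupIDs fields shown in the YAML below; the function name and error wording are illustrative:

package validation

import (
	hivev1aws "github.com/openshift/hive/apis/hive/v1/aws"
)

// validateAdditionalSecurityGroupIDs is a hypothetical admission check: reject a
// MachinePool that sets additionalSecurityGroupIDs without also pinning subnets,
// since the extra SGs must live in the same (BYO) VPC as the pool's subnets.
func validateAdditionalSecurityGroupIDs(platform *hivev1aws.MachinePoolPlatform) []string {
	var errs []string
	if platform != nil && len(platform.AdditionalSecurityGroupIDs) > 0 && len(platform.Subnets) == 0 {
		errs = append(errs, "spec.platform.aws.subnets must be set when additionalSecurityGroupIDs is specified")
	}
	return errs
}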

install-config secret looks like this:

- apiVersion: v1
  kind: Secret
  metadata:
    creationTimestamp: null
    name: abutchertest-install-config
  stringData:
    install-config.yaml: |
      apiVersion: v1
      baseDomain: hive.openshift.com
      compute:
      - name: worker
        platform:
          aws:
            rootVolume:
              iops: 0
              size: 120
              type: gp3
            type: m6i.xlarge
            additionalSecurityGroupIDs:
            - sg-02943a2e5ed3d9fb5
            zones:
            - us-east-1a
            - us-east-1c
            - us-east-1d
        replicas: 3
      controlPlane:
        name: master
        platform:
          aws:
            rootVolume:
              iops: 0
              size: 120
              type: gp3
            type: m6i.xlarge
            zones:
            - us-east-1a
            - us-east-1c
            - us-east-1d
        replicas: 3
      metadata:
        creationTimestamp: null
        name: abutchertest
      networking:
        clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
        machineNetwork:
        - cidr: 10.0.0.0/16
        serviceNetwork:
        - 172.30.0.0/16
      platform:
        aws:
          region: us-east-1
          subnets:
          - subnet-0cbc43a00e711d0a4
          - subnet-0b36a9848eced2140
          - subnet-05a6dc61cc42ddc89
          - subnet-0ca3b67d4dfda4c82
          - subnet-051a1b3295812ec53
          - subnet-0fb2d5778908e19c8

The day 0 worker MachineSet as adopted by Hive continues to list both the default SG and the additional SG I've added by ID.

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  name: abutchertest-worker
  namespace: default
spec:
  clusterDeploymentRef:
    name: abutchertest
  name: worker
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-02943a2e5ed3d9fb5
      rootVolume:
        size: 120
        type: gp3
      subnets:
      - subnet-0cbc43a00e711d0a4
      - subnet-0b36a9848eced2140
      - subnet-05a6dc61cc42ddc89
      - subnet-0ca3b67d4dfda4c82
      - subnet-051a1b3295812ec53
      - subnet-0fb2d5778908e19c8
      type: m6i.xlarge
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
  replicas: 3
The securityGroups from the resulting MachineSet providerSpec:

[
  {
    "filters": [
      {
        "name": "tag:Name",
        "values": [
          "abutchertest-w798b-worker-sg"
        ]
      }
    ]
  },
  {
    "id": "sg-02943a2e5ed3d9fb5"
  }
]

An additional infra MachinePool also picked up the additional security group.

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  name: abutchertest-infra
  namespace: default
spec:
  clusterDeploymentRef:
    name: abutchertest
  name: infra
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-02943a2e5ed3d9fb5
      rootVolume:
        size: 120
        type: gp3
      subnets:
      - subnet-0cbc43a00e711d0a4
      - subnet-0b36a9848eced2140
      - subnet-05a6dc61cc42ddc89
      - subnet-0ca3b67d4dfda4c82
      - subnet-051a1b3295812ec53
      - subnet-0fb2d5778908e19c8
      type: m6i.xlarge
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
  replicas: 3
The securityGroups from the resulting infra MachineSet providerSpec:

[
  {
    "filters": [
      {
        "name": "tag:Name",
        "values": [
          "abutchertest-w798b-worker-sg"
        ]
      }
    ]
  },
  {
    "id": "sg-02943a2e5ed3d9fb5"
  }
]

Machines were created, were configured with the extra SG and successfully became nodes.

Tested the mutual exclusivity of specifying additionalSecurityGroupIDs along with the "hive.openshift.io/extra-worker-security-group" annotation.

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  annotations:
    "hive.openshift.io/extra-worker-security-group": abutchertestsg2
  name: abutchertest-infra
  namespace: default
spec:
  clusterDeploymentRef:
    name: abutchertest
  name: infra
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-02943a2e5ed3d9fb5
      rootVolume:
        size: 120
        type: gp3
      subnets:
      - subnet-0cbc43a00e711d0a4
      - subnet-0b36a9848eced2140
      - subnet-05a6dc61cc42ddc89
      - subnet-0ca3b67d4dfda4c82
      - subnet-051a1b3295812ec53
      - subnet-0fb2d5778908e19c8
      type: m6i.xlarge
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
  replicas: 3

As expected, this was an unsupported configuration.

  - lastProbeTime: "2023-09-15T17:06:07Z"
    lastTransitionTime: "2023-09-15T17:06:07Z"
    message: The hive.openshift.io/extra-worker-security-group annotation cannot be
      used with AdditionalSecurityGroupIDs
    reason: SecurityGroupOptionConflict
    status: "True"
    type: UnsupportedConfiguration
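
For reference, a sketch of the conflict check behind that condition; the function name is illustrative (not the PR's exact code), while the annotation key and message come from the output above:

package machinepool

import (
	hivev1 "github.com/openshift/hive/apis/hive/v1"
)

// extraWorkerSecurityGroupAnnotation is the pre-existing "back door" annotation.
const extraWorkerSecurityGroupAnnotation = "hive.openshift.io/extra-worker-security-group"

// securityGroupConflict reports whether a MachinePool sets both the annotation and
// the new AdditionalSecurityGroupIDs field, which hive surfaces as an
// UnsupportedConfiguration condition with reason SecurityGroupOptionConflict.
func securityGroupConflict(pool *hivev1.MachinePool) (bool, string) {
	_, hasAnnotation := pool.Annotations[extraWorkerSecurityGroupAnnotation]
	if hasAnnotation && pool.Spec.Platform.AWS != nil && len(pool.Spec.Platform.AWS.AdditionalSecurityGroupIDs) > 0 {
		return true, "The hive.openshift.io/extra-worker-security-group annotation cannot be used with AdditionalSecurityGroupIDs"
	}
	return false, ""
}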

Also checked that the annotation continues to work independently of the new field. This worked too, resulting in machines configured with the extra worker SG, identified by name in the configuration as it is when using the annotation.

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  annotations:
    "hive.openshift.io/extra-worker-security-group": abutchertestsg2
  name: abutchertest-infra
  namespace: default
spec:
  clusterDeploymentRef:
    name: abutchertest
  name: infra
  platform:
    aws:
      rootVolume:
        size: 120
        type: gp3
      subnets:
      - subnet-0cbc43a00e711d0a4
      - subnet-0b36a9848eced2140
      - subnet-05a6dc61cc42ddc89
      - subnet-0ca3b67d4dfda4c82
      - subnet-051a1b3295812ec53
      - subnet-0fb2d5778908e19c8
      type: m6i.xlarge
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
  replicas: 3
The securityGroups from the resulting MachineSet providerSpec:

[
  {
    "filters": [
      {
        "name": "tag:Name",
        "values": [
          "abutchertest-w798b-worker-sg",
          "abutchertestsg2"
        ]
      },
      {
        "name": "vpc-id",
        "values": [
          "vpc-0714e8a04b6477a54"
        ]
      }
    ]
  }
]

Holding in case you wanna do the webhook validation, but I don't think it's necessarily needed on a first pass. @2uasimojo
/hold

/lgtm

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Sep 15, 2023
@openshift-ci

openshift-ci bot commented Sep 15, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, abutcher

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@2uasimojo

Thanks for the deep review @abutcher!

I'm going to leave webhook validation on the backlog for now. There are so many holes in our MachinePool/install-config relationship already, though I understand this one would be intra-MP only.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 15, 2023
@openshift-ci

openshift-ci bot commented Sep 15, 2023

@2uasimojo: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit bb2b8bb into openshift:master Sep 15, 2023
8 checks passed
@2uasimojo 2uasimojo deleted the HIVE-2306/aws-mp-additional-security-groups branch September 15, 2023 21:01