Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support TargetGroupBinding on targets outside the cluster's VPC #3479

Merged
merged 8 commits into from
Mar 15, 2024

Conversation

ikosenn
Copy link
Contributor

@ikosenn ikosenn commented Nov 12, 2023

Issue

fixes #3084

Description

This PR adds VPC ID to the TargetGroupBinding Spec so as to allow registration in target groups that are created with in a VPC that is different from that which the controller is running in.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

Copy link

linux-foundation-easycla bot commented Nov 12, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Copy link
Contributor

Welcome @ikosenn!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 12, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @ikosenn. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 12, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 12, 2023
@nathanpreen
Copy link

Hey @ikosenn how can we get this over the line?

@ikosenn
Copy link
Contributor Author

ikosenn commented Nov 27, 2023

@johngmyers whenever you have a minute kindly take a look at this PR. Follow up from #3085 (comment)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 28, 2023
@nathanpreen
Copy link

Hey @johngmyers / @M00nF1sh any chance we could get this looked at?

@ikosenn
Copy link
Contributor Author

ikosenn commented Jan 19, 2024

@oliviassss kindly assist in reviewing this PR.

@@ -41,7 +41,7 @@ func Test_targetGroupBindingMutator_MutateCreate(t *testing.T) {
wantErr error
}{
{
name: "targetGroupBinding with TargetType and ipAddressType already set",
name: "targetGroupBinding with TargetType and ipAddressType and vpcId already set",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a separate unit test where the vpcId is not set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment addressed


!!!tip ""
If the VpcId is not explicitly specified, a mutating webhook will automatically call AWS API to find the VpcId for your TargetGroup and set it to correct value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add a sample yaml file here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment addressed

vpcId := m.vpcID
// Target group is in a different VPC from the cluster's VPC
if tgVpcId != "" && tgVpcId != m.vpcID {
vpcId = tgVpcId
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to call out via the logger that the specified VPC in TGB is different from cluster vpc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment addressed

@ikosenn ikosenn force-pushed the tg-different-vpc branch 2 times, most recently from abdb293 to 7c60930 Compare January 29, 2024 01:34
@oliviassss
Copy link
Collaborator

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 29, 2024
@oliviassss oliviassss added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 29, 2024
@ikosenn
Copy link
Contributor Author

ikosenn commented Jan 30, 2024

/retest

@oliviassss
Copy link
Collaborator

@ikosenn, please check the error here, looks introduced by this PR
{"level":"error","ts":"2024-02-05T03:26:44Z","msg":"Reconciler error","controller":"ingress","object":{"name":"e2e-group.aa96c17f"},"namespace":"","name":"e2e-group.aa96c17f","reconcileID":"9b7875ec-ad84-47e1-b09e-af5d2f140af8","error":"admission webhook \"vtargetgroupbinding.elbv2.k8s.aws\" denied the request: TargetGroupBinding update may not change these fields: spec.vpcId"}

Maybe try to only run this failed test suite in your local to debug
Summarizing 1 Failure: [TIMEDOUT] test ingresses with multiple path and backends with podReadinessGate enabled [It] IngressGroup across namespaces should behaves correctly /home/prow/go/src/github.com/kubernetes-sigs/aws-load-balancer-controller/test/e2e/ingress/multi_path_backend_test.go:101
you can run this cmd with a focus on the specific test suite

ginkgo -timeout 2h -v -r test/e2e -- \
  --kubeconfig=${CLUSTER_KUBECONFIG} \
  --cluster-name=${CLUSTER_NAME} \
  --aws-region=${AWS_REGION} \
  --aws-vpc-id=${cluster_vpc_id} \

@ikosenn
Copy link
Contributor Author

ikosenn commented Feb 5, 2024

@ikosenn, please check the error here, looks introduced by this PR {"level":"error","ts":"2024-02-05T03:26:44Z","msg":"Reconciler error","controller":"ingress","object":{"name":"e2e-group.aa96c17f"},"namespace":"","name":"e2e-group.aa96c17f","reconcileID":"9b7875ec-ad84-47e1-b09e-af5d2f140af8","error":"admission webhook \"vtargetgroupbinding.elbv2.k8s.aws\" denied the request: TargetGroupBinding update may not change these fields: spec.vpcId"}

Maybe try to only run this failed test suite in your local to debug Summarizing 1 Failure: [TIMEDOUT] test ingresses with multiple path and backends with podReadinessGate enabled [It] IngressGroup across namespaces should behaves correctly /home/prow/go/src/github.com/kubernetes-sigs/aws-load-balancer-controller/test/e2e/ingress/multi_path_backend_test.go:101 you can run this cmd with a focus on the specific test suite

ginkgo -timeout 2h -v -r test/e2e -- \
  --kubeconfig=${CLUSTER_KUBECONFIG} \
  --cluster-name=${CLUSTER_NAME} \
  --aws-region=${AWS_REGION} \
  --aws-vpc-id=${cluster_vpc_id} \

I figured out why the tests were timing out. Working on a fix. Thanks for the tip. I had to wait for 2hrs for all e2e tests to run before it got to the problematic one. This will definitely save me sometime. Thanks.

@ikosenn ikosenn force-pushed the tg-different-vpc branch 2 times, most recently from 85c80f2 to 00dcc1a Compare February 12, 2024 04:36
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 28, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ikosenn, M00nF1sh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2024
@codecov-commenter
Copy link

codecov-commenter commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 78.04878% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 17.92%. Comparing base (b63a294) to head (e65c8cd).

Files Patch % Lines
pkg/targetgroupbinding/resource_manager.go 0.00% 8 Missing ⚠️
pkg/deploy/elbv2/target_group_binding_manager.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3479      +/-   ##
==========================================
+ Coverage   17.83%   17.92%   +0.09%     
==========================================
  Files         175      175              
  Lines       29046    29085      +39     
==========================================
+ Hits         5180     5214      +34     
- Misses      23527    23533       +6     
+ Partials      339      338       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2024
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 1, 2024
@ikosenn
Copy link
Contributor Author

ikosenn commented Mar 1, 2024

Hey @oliviassss and @M00nF1sh. I pushed some tests to improve coverage in the latest commit.
Also if you have a minute kindly help me understand why this failed https://github.com/kubernetes-sigs/aws-load-balancer-controller/actions/runs/8000114447/job/22100889749?pr=3479. Running make crds doesn't introduces any new changes

@oliviassss
Copy link
Collaborator

@ikosenn, thank you for adding the tests. I have this commit to fix the failure in ci unit-test, maybe a rebase?
#3580

@ikosenn
Copy link
Contributor Author

ikosenn commented Mar 2, 2024

@ikosenn, thank you for adding the tests. I have this commit to fix the failure in ci unit-test, maybe a rebase?
#3580

@oliviassss Thanks for the tip. I had already rebased. Looks like I need to update some packages. Was missing --- on the yaml file e65c8cd. Hopefully that fixes it.

@ikosenn
Copy link
Contributor Author

ikosenn commented Mar 8, 2024

Hi @oliviassss and @M00nF1sh. Should I go ahead and update the packages that failed the vulnerability checks? Looks like minor/patch version upgrades

@oliviassss
Copy link
Collaborator

@ikosenn, I think we need to upgrade controller-runtime, but it has some breaking changes that will break the LBC codes. I'll investigate later, please check my comment here: #3598 (comment)

@oliviassss oliviassss merged commit ac21af6 into kubernetes-sigs:main Mar 15, 2024
7 of 9 checks passed
@shraddhabang
Copy link
Collaborator

@ikosenn, please check the error here, looks introduced by this PR {"level":"error","ts":"2024-02-05T03:26:44Z","msg":"Reconciler error","controller":"ingress","object":{"name":"e2e-group.aa96c17f"},"namespace":"","name":"e2e-group.aa96c17f","reconcileID":"9b7875ec-ad84-47e1-b09e-af5d2f140af8","error":"admission webhook \"vtargetgroupbinding.elbv2.k8s.aws\" denied the request: TargetGroupBinding update may not change these fields: spec.vpcId"}

I figured out why the tests were timing out. Working on a fix. Thanks for the tip. I had to wait for 2hrs for all e2e tests to run before it got to the problematic one. This will definitely save me sometime. Thanks.

Hey @ikosenn , I am getting the same reconcile error on my setup when I am trying your feature for testing? It happens for the existing tgbs in ingresses. Any idea?

@ikosenn
Copy link
Contributor Author

ikosenn commented Mar 26, 2024

@ikosenn, please check the error here, looks introduced by this PR {"level":"error","ts":"2024-02-05T03:26:44Z","msg":"Reconciler error","controller":"ingress","object":{"name":"e2e-group.aa96c17f"},"namespace":"","name":"e2e-group.aa96c17f","reconcileID":"9b7875ec-ad84-47e1-b09e-af5d2f140af8","error":"admission webhook \"vtargetgroupbinding.elbv2.k8s.aws\" denied the request: TargetGroupBinding update may not change these fields: spec.vpcId"}

I figured out why the tests were timing out. Working on a fix. Thanks for the tip. I had to wait for 2hrs for all e2e tests to run before it got to the problematic one. This will definitely save me sometime. Thanks.

Hey @ikosenn , I am getting the same reconcile error on my setup when I am trying your feature for testing? It happens for the existing tgbs in ingresses. Any idea?

@shraddhabang I think I have a hunch. It must be because of the fix I did for the e2e to pass I had to add VpcId to the elbv2model.TargetGroupBindingSpec which works well for new ingress TGBs but if you had one already it would add the VpcId to the spec and that would fail the check here. Will get a solution ASAP.

@shraddhabang
Copy link
Collaborator

@ikosenn Hey, Do you have any update on the fix?

@ikosenn
Copy link
Contributor Author

ikosenn commented Mar 29, 2024

@ikosenn Hey, Do you have any update on the fix?

@shraddhabang working on it right now. Was struggling to get a solution that I liked but I have one now. Going to run upgrade tests on the three components (TG, ingress, service) to ensure backwards compatibility isn’t broken. Should have this up for review by Monday morning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot register nodes with target group outside host VPC
8 participants