latest capa-controller is not working properly with our cluster-eks and spawns infinite number of VPC due some error #3048

AndiDog · 2023-12-14T10:36:45Z

During network creation, reconciliation fails, which causes the process to restart and create a VPC again.

Probably related:

E1213 13:17:04.251075       1 controller.go:324] "Reconciler error" err="failed to reconcile network for AWSManagedControlPlane org-giantswarm/vac0eks: failed to patch conditions: AWSManagedControlPlane.controlplane.cluster.x-k8s.io \"vac0eks\" is invalid: spec.network.subnets[7]: Duplicate value: map[string]interface {}{\"id\":\"\"}" controller="awsmanagedcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="AWSManagedControlPlane" AWSManagedControlPlane="org-giantswarm/vac0eks" namespace="org-giantswarm" name="vac0eks" reconcileID="12841c2f-7435-4f3b-9106-782a05e97702"


AndiDog · 2023-12-14T15:46:25Z

The mess on the test AWS account is cleaned up. I don't have an effort estimate yet for CAPA, given how its AWSManagedControlPlane reconciler has zero unit tests covering the Reconcile or reconcileNormal functions and testability (e.g. dependency injection for mockability) must first be added.

AndiDog · 2024-01-04T12:18:20Z

The bug is very basic: CAPA creates a VPC but then fails to store its ID for whatever reason. On the next reconciliation, CAPA pretends it doesn't know anything about the VPC (which it really doesn't without making AWS requests) and happily creates a new one. In our case of repeated errors, this happens again and again. I implemented a basic unit test for EKS and made VPC creation idempotent (this applies to EC2-based and EKS-based clusters alike). Upstream PR coming up soon.

AndiDog · 2024-01-05T09:24:56Z

Unfortunately, my pending PR kubernetes-sigs/cluster-api-provider-aws#4637 blocks opening the follow-up which adds the EKS unit test.

However, I managed to extract the fix for the blatant bug into a small, separate PR, kubernetes-sigs/cluster-api-provider-aws#4723, so we can proceed nevertheless and fix the terrifying issue.

AndiDog · 2024-01-10T13:14:00Z

Image pull errors fixed via giantswarm/cluster-api-provider-aws-app#211, so now this should be done (except for developer-reserved MCs where Flux is paused, currently golem).

AndiDog · 2024-01-11T08:19:02Z

The storage error will be solved via #2870.

AndiDog self-assigned this Dec 14, 2023

architectbot added the team/phoenix Team Phoenix label Dec 14, 2023

AndiDog added area/kaas Mission: Cloud Native Platform - Self-driving Kubernetes as a Service kind/bug provider/cluster-api-aws Cluster API based running on AWS topic/capi labels Dec 14, 2023

AndiDog mentioned this issue Dec 14, 2023

Revert CRDs upgrade since CAPA creates unlimited VPCs for EKS clusters because the subnet id field cannot be set giantswarm/cluster-api-provider-aws-app#200

Merged

1 task

AndiDog mentioned this issue Dec 20, 2023

Backported fixes and features for CAPA v2.3.x giantswarm/cluster-api-provider-aws-app#204

Merged

1 task

This was referenced Jan 8, 2024

Backports of several fixes giantswarm/cluster-api-provider-aws#580

Merged

More backported fixes and features for CAPA v2.3.x giantswarm/cluster-api-provider-aws-app#208

Merged

Release v2.10.0 giantswarm/cluster-api-provider-aws-app#209

Merged

AndiDog closed this as completed Jan 10, 2024

This was referenced Jan 11, 2024

cluster-aws: Adapt to CAPA subnet spec changes #2870

Closed

Update CRDs to make the subnet id field required again giantswarm/cluster-api-provider-aws-app#212

Merged

AndiDog mentioned this issue Jan 19, 2024

cluster-aws/cluster-eks: Duplicate subnet map keys should be an error #3153

Closed
