Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ServiceRouter constantly being synced when using enterprise Consul #3235

Closed
komapa opened this issue Nov 21, 2023 · 2 comments · Fixed by #3284
Closed

ServiceRouter constantly being synced when using enterprise Consul #3235

komapa opened this issue Nov 21, 2023 · 2 comments · Fixed by #3284
Assignees
Labels
type/bug Something isn't working

Comments

@komapa
Copy link

komapa commented Nov 21, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

So a couple of problem with ServiceRouter CRDs in consul-k8s, I will try to summarize them all in this ticket because they are somewhat related and probably fixed with the same effort on your side.

  1. When specifying routes[].destination.requestTimeout, if I specify it as 180s, consul-k8s will actually make that 3m0s so when we are using tools like ArgoCD to make sure manifests are not changes, that causes diff between the manifest we submitted as 180s and what is persistent in the CRD: 3m0s. That is kind of easy to work around by changing our manifests to use 180s but this leads to number 2)
  2. It seems like we always have to specify idleTimeout: 0s because if we do not specify it, the CRD defaults to it anyway and also causes diffs in ArgoCD. Again, we have to work around it by asking engineers to add that value even though it does not make sense for their destination logic.

Which leads us to the main problem which is the fact that when we are running in WAN federation, there is only ever one admin partition and that is called "default". consul-k8s has a problem where it defaults the partition to empty value but Consul sever 1.5.x sets it as the value default so your configEntry.MatchesConsul(entry) check fails even though nothing is different between Consul and Kubernetes.

Running a diff instead of cmp shows the problem:

&api.ServiceRouterConfigEntry{
      ... // 2 ignored and 2 identical fields
      Routes: []api.ServiceRoute{
          {
              Match: &{HTTP: &{PathPrefix: "/"}},
              Destination: &api.ServiceRouteDestination{
                  Service:        "",
                  ServiceSubset:  "",
                  Namespace:      "application-platform-feature-flags",
-                 Partition:      "",
+                 Partition:      "default",
                  PrefixRewrite:  "",
                  RequestTimeout: s"3m0s",
                  ... // 7 identical fields
              },
          },
      },
      ... // 3 ignored fields
  }

Reproduction Steps

❯ consul config read -kind service-router -name feature-flags -namespace application-platform-feature-flags
{
    "Kind": "service-router",
    "Name": "feature-flags",
    "Partition": "default",
    "Namespace": "application-platform-feature-flags",
    "Routes": [
        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/"
                }
            },
            "Destination": {
                "RequestTimeout": "3m0s",
                "Namespace": "application-platform-feature-flags",
                "Partition": "default",
                "NumRetries": 5,
                "RetryOnConnectFailure": true
            }
        }
    ],
    "Meta": {
        "consul.hashicorp.com/source-datacenter": "us-135-stag-default",
        "external-source": "kubernetes"
    },
    "CreateIndex": 24426017,
    "ModifyIndex": 24810004
}
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceRouter
metadata:
  creationTimestamp: "2023-11-18T01:33:36Z"
  finalizers:
  - finalizers.consul.hashicorp.com
  generation: 15
  labels:
    argocd.argoproj.io/instance: application-platform-feature-flags-stag-default-us-east-1
  name: feature-flags
  namespace: application-platform-feature-flags
  resourceVersion: "186291980"
  uid: cbbff1db-86e0-4d0c-8857-56a327bfbb34
spec:
  routes:
  - destination:
      idleTimeout: 0s
      namespace: application-platform-feature-flags
      numRetries: 5
      requestTimeout: 3m0s
      retryOnConnectFailure: true
    match:
      http:
        pathPrefix: /
status:
  conditions:
  - lastTransitionTime: "2023-11-21T04:41:24Z"
    status: "True"
    type: Synced
  lastSyncedTime: "2023-11-21T04:41:24Z"

Expected behavior

Expected was to not have an infinity sync loop in the consul-connect-injector pods.

Environment details

consul-k8s version 1.2.3

Additional Context

This is the diff that I was able to at least stop it from syncing all the time:

Index: control-plane/build-support/functions/20-build.sh
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/control-plane/build-support/functions/20-build.sh b/control-plane/build-support/functions/20-build.sh
--- a/control-plane/build-support/functions/20-build.sh	(revision 96ae13d32f01d0d3e888d3ff2240e48b4da8ae9e)
+++ b/control-plane/build-support/functions/20-build.sh	(date 1700537403021)
@@ -188,7 +188,7 @@
    #   build with go install.
    #   The GOXPARALLEL environment variable is used if set
 
-   if [ $GOTAGS == "fips" ]; then
+   if [ "$GOTAGS" == "fips" ]; then
      CGO_ENABLED=1
    else
      CGO_ENABLED=0
Index: control-plane/controllers/configentry_controller.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/control-plane/controllers/configentry_controller.go b/control-plane/controllers/configentry_controller.go
--- a/control-plane/controllers/configentry_controller.go	(revision 96ae13d32f01d0d3e888d3ff2240e48b4da8ae9e)
+++ b/control-plane/controllers/configentry_controller.go	(date 1700542012198)
@@ -223,7 +223,7 @@
 		// This functionality exists to help folks who are upgrading from older helm
 		// chart versions where they had previously created config entries themselves but
 		// now want to manage them through custom resources.
-		if configEntry.GetObjectMeta().Annotations[common.MigrateEntryKey] != common.MigrateEntryTrue {
+		if !configEntry.MatchesConsul(entry) && configEntry.GetObjectMeta().Annotations[common.MigrateEntryKey] != common.MigrateEntryTrue {
 			return r.syncFailed(ctx, logger, crdCtrl, configEntry, ExternallyManagedConfigError,
 				sourceDatacenterMismatchErr(sourceDatacenter))
 		}
Index: control-plane/api/v1alpha1/servicerouter_types.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/control-plane/api/v1alpha1/servicerouter_types.go b/control-plane/api/v1alpha1/servicerouter_types.go
--- a/control-plane/api/v1alpha1/servicerouter_types.go	(revision 96ae13d32f01d0d3e888d3ff2240e48b4da8ae9e)
+++ b/control-plane/api/v1alpha1/servicerouter_types.go	(date 1700543968093)
@@ -288,6 +288,9 @@
 				if r.Destination.Namespace == "" {
 					in.Spec.Routes[i].Destination.Namespace = namespace
 				}
+				if r.Destination.Partition == "" {
+					in.Spec.Routes[i].Destination.Partition = common.DefaultConsulPartition
+				}
 			}
 		}
 	}
@@ -365,7 +368,7 @@
 	if !consulMeta.PartitionsEnabled {
 		for i, r := range in.Spec.Routes {
 			if r.Destination != nil {
-				if r.Destination.Partition != "" {
+				if r.Destination.Partition != "" && r.Destination.Partition != common.DefaultConsulPartition {
 					errs = append(errs, field.Invalid(path.Child("routes").Index(i).Child("destination").Child("partition"), r.Destination.Partition, `Consul Enterprise partitions must be enabled to set destination.partition`))
 				}
 			}
Index: Makefile
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/Makefile b/Makefile
--- a/Makefile	(revision 96ae13d32f01d0d3e888d3ff2240e48b4da8ae9e)
+++ b/Makefile	(date 1700536688019)
@@ -30,6 +30,7 @@
 control-plane-dev-docker: ## Build consul-k8s-control-plane dev Docker image.
 	@$(SHELL) $(CURDIR)/control-plane/build-support/scripts/build-local.sh -o linux -a $(GOARCH)
 	@docker build -t '$(DEV_IMAGE)' \
+       --platform linux/$(GOARCH) \
        --target=dev \
        --build-arg 'TARGETARCH=$(GOARCH)' \
        --build-arg 'GIT_COMMIT=$(GIT_COMMIT)' \
@@ -120,7 +121,7 @@
 acceptance-lint: ## Run linter in the control-plane directory.
 	cd acceptance; golangci-lint run -c ../.golangci.yml
 
-# For CNI acceptance tests, the calico CNI pluging needs to be installed on Kind. Our consul-cni plugin will not work 
+# For CNI acceptance tests, the calico CNI pluging needs to be installed on Kind. Our consul-cni plugin will not work
 # without another plugin installed first
 kind-cni-calico:
 	kubectl create namespace calico-system ||true
@@ -196,7 +197,7 @@
 	@echo "Installing copywrite"
 	@go install github.com/hashicorp/copywrite@latest
 endif
-	@copywrite headers --spdx "MPL-2.0" 
+	@copywrite headers --spdx "MPL-2.0"
 
 # ===========> CI Targets
 
@komapa komapa added the type/bug Something isn't working label Nov 21, 2023
@komapa
Copy link
Author

komapa commented Nov 21, 2023

If we just decide to set partition: default in our CRDs without any of the above patches, we get this error:

error when replacing "/dev/shm/2022498401": admission webhook "mutate-servicerouter.consul.hashicorp.com" denied the request: servicerouter.consul.hashicorp.com "feature-flags-http-with-migrations" is invalid: spec.routes[0].destination.partition: Invalid value: "default": Consul Enterprise partitions must be enabled to set destination.partition

So, we cannot use admin partitions because we use -WAN federation but Consul is setting the partition to default but consul-k8s does not think partitions are enabled, so it is kind of a vicious cycle.

@dhiaayachi dhiaayachi self-assigned this Nov 22, 2023
@komapa
Copy link
Author

komapa commented Nov 29, 2023

PR: #3284

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants