Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalog: Use EndpointSlice and propagate Kubernetes Topology information to synced consul service #3693

Merged
merged 18 commits into from Mar 19, 2024

Conversation

jukie
Copy link
Contributor

@jukie jukie commented Feb 28, 2024

Changes proposed in this PR

  • Switch to EndpointSlice vs Endpoints
  • Adds topology zone from Endpoint to Cosul service meta
  • Adds topology region from Endpoint's associated Node to Consul service meta
  • Adds get, list, and watch permissions for EndpointSlice to sync catalog's ClusterRole
  • Removes get, list, and watch permissions for Endpoints from sync catalog's ClusterRole

How I've tested this PR

  • Existing and new tests
  • Built and deployed catalog service to verify

How I expect reviewers to test this PR

  • Enable service sync and validate the registered instances have a listed meta entry for external-k8s-topology-zone and external-k8s-topology-region

Checklist

@jukie jukie force-pushed the feat/zone-topology-meta branch 2 times, most recently from 5951693 to a31b3b7 Compare February 28, 2024 09:21
@jukie jukie changed the title Draft: Use EndpointSlice and propagate zone metadata to consul service Draft: Use EndpointSlice and propagate zone topology into consul service Feb 28, 2024
@jukie jukie changed the title Draft: Use EndpointSlice and propagate zone topology into consul service Draft: Use EndpointSlice and propagate topology zone information into consul service Feb 28, 2024
@jukie jukie changed the title Draft: Use EndpointSlice and propagate topology zone information into consul service Catalog: Use EndpointSlice and propagate kubernertes topology zone information into synced consul service Feb 28, 2024
@david-yu
Copy link
Contributor

@jukie Are there similarities between this PR and #3522? Not sure if you're also able to incorporate some of those changes into this PR, if not that's ok.

@jukie
Copy link
Contributor Author

jukie commented Feb 28, 2024

@david-yu Similar end result yes, but that PR specifically targets NodePort services and uses underlying node labels. My change doesn't currently include pulling in region information as that's not directly available on EndpointSlice.

@jukie
Copy link
Contributor Author

jukie commented Feb 28, 2024

My PR now includes region info for NodePort services as well so encompasses everything from the mentioned PR plus the addition of zone info for all other service types

@david-yu
Copy link
Contributor

Thank you, I will try to have someone review sometime later next week.

@jukie
Copy link
Contributor Author

jukie commented Feb 28, 2024

@david-yu took it further to include all service types now

@jukie jukie changed the title Catalog: Use EndpointSlice and propagate kubernertes topology zone information into synced consul service Catalog: Use EndpointSlice and propagate kubernertes topology zone and region to synced consul service Feb 28, 2024
@jukie jukie changed the title Catalog: Use EndpointSlice and propagate kubernertes topology zone and region to synced consul service Catalog: Use EndpointSlice and propagate Kubernetes Topology Zone and region to synced consul service Feb 28, 2024
@jukie jukie changed the title Catalog: Use EndpointSlice and propagate Kubernetes Topology Zone and region to synced consul service Catalog: Use EndpointSlice and propagate Kubernetes Topology information to synced consul service Feb 28, 2024
@jmurret
Copy link
Contributor

jmurret commented Mar 6, 2024

👋 @jukie thank you so much for this work! It looks great. I can see that the some of the unit tests in catalog package in control-plane are failing locally and in CI (you may not have access to see CI runs). Do these pass for you? If you have any time to resolve them, that would be a great help. Otherwise, I can try to take a look at the end of the week. 🙏
Screenshot 2024-03-06 at 9 57 54 AM

@jmurret jmurret self-assigned this Mar 6, 2024
@jukie
Copy link
Contributor Author

jukie commented Mar 6, 2024

Yes these succeeded for me but I haven't re-tested after pulling in latest commits of main. Will doublecheck that in a few hours, thanks for taking a look!

@jukie
Copy link
Contributor Author

jukie commented Mar 7, 2024

Found the problem. The nodes being referenced by the endpoints in the test also need to exist so we can grab region info.

@jmurret
Copy link
Contributor

jmurret commented Mar 7, 2024

Updated somebroken helm chart tests andalso fixed logic in sync-catalog-clusterrole. TestSyncCatalogNamespaces and TestSyncCatalogNamespaces acceptance tests are currently failing. will try to take a look tomorrow.

@jmurret
Copy link
Contributor

jmurret commented Mar 19, 2024

It passed. It was called with this below. basically packages are split into groupings to go on runners and then are used to set the PACKAGES envvar here. So, all packages will have their tests run this way.

PACKAGES="api-gateway ingress-gateway sync example consul-dns"
  echo "Running packages: ${PACKAGES}"
  for PKG in ${PACKAGES}
  do
    FULLPKG="github.com/hashicorp/consul-k8s/acceptance/tests/${PKG}"
    echo "Testing package: ${FULLPKG}"
    go run gotest.tools/gotestsum@v1.11.0 \
      --format=github-actions \
      --jsonfile="jsonfile-${PKG////-}" \
      --packages="${PKG}" \
      --junitfile="/tmp/test-results/acceptance/${PKG}-gotestsum-report.xml" \
      -- "${FULLPKG}" -p 1 -timeout 2h -failfast \
      -use-kind \
      -kube-contexts="kind-dc1,kind-dc2,kind-dc[3](https://github.com/hashicorp/consul-k8s-workflows/actions/runs/8327109167/job/22784276823#step:32:3),kind-dc[4](https://github.com/hashicorp/consul-k8s-workflows/actions/runs/8327109167/job/22784276823#step:32:4)" \
      -enable-enterprise \
      -enable-multi-cluster \
      -debug-directory=/tmp/test-results/debug \
      -consul-image=docker.mirror.hashicorp.services/hashicorppreview/consul-enterprise:1.19-dev \
      -consul-k8s-image=docker.io/hashicorppreview/consul-k8s-control-plane:1.[5](https://github.com/hashicorp/consul-k8s-workflows/actions/runs/8327109167/job/22784276823#step:32:5).0-dev-pr-5606e615d94262ac878a0b3be374b7f19565256a \
      -consul-dataplane-image=docker.mirror.hashicorp.services/hashicorppreview/consul-dataplane:1.5-dev \
      -enable-enterprise -enable-multi-cluster -enable-transparent-proxy
  done

@jukie
Copy link
Contributor Author

jukie commented Mar 19, 2024

@kolorful did you have a use case for region or would topology zone be good enough?

@kolorful
Copy link
Contributor

kolorful commented Mar 19, 2024

@jukie does that make any difference? I don't have usecase region yet, but figured why not include region when adding az cause we might need it later on.

@jukie
Copy link
Contributor Author

jukie commented Mar 19, 2024

Main difference is the zone information being available on the endpoints directly. It's probably not a big deal but was curious if there was any concern over the amount of extra api server calls to get each node. That already happens for NodePort type services so again, probably not a big deal to expand that to all service types.

@jmurret
Copy link
Contributor

jmurret commented Mar 19, 2024

@jukie thanks for raising your scalability concerns about also grabbing node information while processing endpoint slices. I think we would want a way to turn that off via configuration/helm chart since we don't have load tests asserting its performance. I propose we remove that from this PR since there is no current requirement for Region and when it is needed we can add the appropriate helm configuration. SOund ok?

This PR looks great. I think the two things needed to approve and merge are:

  • remove the Region/node lookups
  • make the endpointslice pointer change that you mentioned.

Thanks again for your efforts on this!

@jukie
Copy link
Contributor Author

jukie commented Mar 19, 2024

Done. I've also taken another stab at dropping endpoints permission from the catalog cluster role because I really don't think we need it (all tests are passing for me without) but you can add that back in again if preferred.

@jukie
Copy link
Contributor Author

jukie commented Mar 19, 2024

@jmurret could please kick off the acceptance tests again?

@kolorful
Copy link
Contributor

@jukie I see, thanks for explaining. I think not having region info is fine, because usually it's part of the zone info anyway.

Copy link
Contributor

@jmurret jmurret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🙏 thank you @jukie for sticking with this and getting it done. We appreciate it!

@jukie
Copy link
Contributor Author

jukie commented Mar 19, 2024

Awesome! Will this automatically land on the -dev release for the image and helm charts?

@jukie jukie deleted the feat/zone-topology-meta branch March 19, 2024 21:03
jmurret added a commit that referenced this pull request Mar 19, 2024
…ion to synced consul service (#3693)

* Use EndpointSlice and propagate zone metadata to consul service

* Fix tests

* Add test for zone metadata

* Cleanup and changelog entry

* Fix clusterrole permissions and type on Informer

* Include region info for NodePort services

* Include topology region for all service types

* Update release note

* Fix tests

* fix sync-catalog-clusterrole and tests

* fix stash conflict

* adding endpoints permission back to sync catalog since it still uses it.

* Fix endpointslice map

* Fix topology region

* Remove region lookups, remove endpoints permissions, use pointers for endpointslice map

* Drop region test

---------

Co-authored-by: John Murret <john.murret@hashicorp.com>
jmurret added a commit that referenced this pull request Mar 19, 2024
…ion to synced consul service (#3693)

* Use EndpointSlice and propagate zone metadata to consul service

* Fix tests

* Add test for zone metadata

* Cleanup and changelog entry

* Fix clusterrole permissions and type on Informer

* Include region info for NodePort services

* Include topology region for all service types

* Update release note

* Fix tests

* fix sync-catalog-clusterrole and tests

* fix stash conflict

* adding endpoints permission back to sync catalog since it still uses it.

* Fix endpointslice map

* Fix topology region

* Remove region lookups, remove endpoints permissions, use pointers for endpointslice map

* Drop region test

---------

Co-authored-by: John Murret <john.murret@hashicorp.com>
jmurret added a commit that referenced this pull request Mar 19, 2024
…ion to synced consul service (#3693)

* Use EndpointSlice and propagate zone metadata to consul service

* Fix tests

* Add test for zone metadata

* Cleanup and changelog entry

* Fix clusterrole permissions and type on Informer

* Include region info for NodePort services

* Include topology region for all service types

* Update release note

* Fix tests

* fix sync-catalog-clusterrole and tests

* fix stash conflict

* adding endpoints permission back to sync catalog since it still uses it.

* Fix endpointslice map

* Fix topology region

* Remove region lookups, remove endpoints permissions, use pointers for endpointslice map

* Drop region test

---------

Co-authored-by: John Murret <john.murret@hashicorp.com>
@jmurret
Copy link
Contributor

jmurret commented Mar 19, 2024

@jukie there is no -dev channel/artifact for dev releases. you'll have to use the helm chart on main until the next 1.4 patch release. the values.yaml has a link to an internal dev preview that I think you should have access to:

image: docker.mirror.hashicorp.services/hashicorppreview/consul:1.19-dev

jmurret added a commit that referenced this pull request Mar 19, 2024
…ogy information to synced consul service into release/1.4.x (#3782)

* backport of commit b8e5ece

* backport of commit ee26768

* backport of commit 4a3c209

* backport of commit c11b034

* backport of commit 093826a

* backport of commit 75810ab

* backport of commit f0f1762

* backport of commit 395c4fe

* backport of commit 73ffbcb

* backport of commit cc5d60c

* backport of commit 0d8b7ae

* backport of commit 5dcb453

* backport of commit 09506ba

* Catalog: Use EndpointSlice and propagate Kubernetes Topology information to synced consul service (#3693)

* Use EndpointSlice and propagate zone metadata to consul service

* Fix tests

* Add test for zone metadata

* Cleanup and changelog entry

* Fix clusterrole permissions and type on Informer

* Include region info for NodePort services

* Include topology region for all service types

* Update release note

* Fix tests

* fix sync-catalog-clusterrole and tests

* fix stash conflict

* adding endpoints permission back to sync catalog since it still uses it.

* Fix endpointslice map

* Fix topology region

* Remove region lookups, remove endpoints permissions, use pointers for endpointslice map

* Drop region test

---------

Co-authored-by: John Murret <john.murret@hashicorp.com>

---------

Co-authored-by: jukie <10012479+Jukie@users.noreply.github.com>
Co-authored-by: John Murret <john.murret@hashicorp.com>
jmurret added a commit that referenced this pull request Mar 19, 2024
…ogy information to synced consul service into release/1.2.x (#3780)

* backport of commit b8e5ece

* backport of commit ee26768

* backport of commit 4a3c209

* backport of commit c11b034

* backport of commit 093826a

* backport of commit 75810ab

* backport of commit f0f1762

* backport of commit 395c4fe

* backport of commit 73ffbcb

* backport of commit cc5d60c

* backport of commit 0d8b7ae

* backport of commit 5dcb453

* backport of commit 09506ba

* Catalog: Use EndpointSlice and propagate Kubernetes Topology information to synced consul service (#3693)

* Use EndpointSlice and propagate zone metadata to consul service

* Fix tests

* Add test for zone metadata

* Cleanup and changelog entry

* Fix clusterrole permissions and type on Informer

* Include region info for NodePort services

* Include topology region for all service types

* Update release note

* Fix tests

* fix sync-catalog-clusterrole and tests

* fix stash conflict

* adding endpoints permission back to sync catalog since it still uses it.

* Fix endpointslice map

* Fix topology region

* Remove region lookups, remove endpoints permissions, use pointers for endpointslice map

* Drop region test

---------

Co-authored-by: John Murret <john.murret@hashicorp.com>

---------

Co-authored-by: jukie <10012479+Jukie@users.noreply.github.com>
Co-authored-by: John Murret <john.murret@hashicorp.com>
jmurret added a commit that referenced this pull request Mar 19, 2024
…ogy information to synced consul service into release/1.3.x (#3781)

* backport of commit b8e5ece

* backport of commit ee26768

* backport of commit 4a3c209

* backport of commit c11b034

* backport of commit 093826a

* backport of commit 75810ab

* backport of commit f0f1762

* backport of commit 395c4fe

* backport of commit 73ffbcb

* backport of commit cc5d60c

* backport of commit 0d8b7ae

* backport of commit 5dcb453

* backport of commit 09506ba

* Catalog: Use EndpointSlice and propagate Kubernetes Topology information to synced consul service (#3693)

* Use EndpointSlice and propagate zone metadata to consul service

* Fix tests

* Add test for zone metadata

* Cleanup and changelog entry

* Fix clusterrole permissions and type on Informer

* Include region info for NodePort services

* Include topology region for all service types

* Update release note

* Fix tests

* fix sync-catalog-clusterrole and tests

* fix stash conflict

* adding endpoints permission back to sync catalog since it still uses it.

* Fix endpointslice map

* Fix topology region

* Remove region lookups, remove endpoints permissions, use pointers for endpointslice map

* Drop region test

---------

Co-authored-by: John Murret <john.murret@hashicorp.com>

---------

Co-authored-by: jukie <10012479+Jukie@users.noreply.github.com>
Co-authored-by: John Murret <john.murret@hashicorp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants