Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ambient: service entry initial impl #45621

Merged
merged 27 commits into from Jul 7, 2023

Conversation

GregHanson
Copy link
Member

@GregHanson GregHanson commented Jun 23, 2023

Adds support for a subset of the service entry api to ambient mesh. The api supported is as follows:

  • addresses (the VIPS)
  • auto VIP address allocation (addresses is optional)
  • endpoints (static only)
  • location (prefer mesh external for now for DIRECT passthrough in ztunnel)
  • workloadSelector (selects pods and workload entries)

notable exclusions:

To help us figure out who should review this PR, please put an X in all the areas that this PR affects.

  • [ X ] Ambient
  • Configuration Infrastructure
  • Docs
  • Installation
  • Networking
  • Performance and Scalability
  • Policies and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure

Adds support for a subset of the service entry api to ambient mesh.
The api supported is as follows:
- addresses (the VIPS)
- auto VIP address allocation (addresses is optional)
- endpoints (static only)
- location (prefer mesh external for now for DIRECT passthrough in ztunnel)
- workloadSelector (selects pods and workload entries)

notable exclusions:
- hosts (coming per istio/ztunnel#536)
- exportTo
- resolution (everything is static)
- subjectAltNames (pending ztunnel support)

Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
@GregHanson GregHanson requested review from a team as code owners June 23, 2023 14:49
@istio-policy-bot istio-policy-bot added the area/ambient Issues related to ambient mesh label Jun 23, 2023
@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 23, 2023
c.ambientIndex.byPod[networkAddr] = newWl
}
c.ambientIndex.byUID[c.generatePodUID(pod)] = newWl
updates[model.ConfigKey{Kind: kind.Address, Name: newWl.ResourceName()}] = struct{}{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert should be used here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and L480 L492

Copy link
Contributor

@kdorosh kdorosh Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 946171f

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 25, 2023
Comment on lines 259 to 261
if m.serviceEntryController != nil {
m.serviceEntryController.AppendServiceHandler(kubeRegistry.ServiceEntryHandler)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need to append this service entry handler regardless of features.EnableK8SServiceSelectWorkloadEntries (see line 247)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 7055f90

framework.NewTest(t).
Features("traffic.ambient").
Run(func(t framework.TestContext) {
t.Skip("this will work if we set DNS_AUTO_ALLOCATE=true and once we have https://github.com/istio/ztunnel/pull/536")
Copy link
Contributor

@kdorosh kdorosh Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that this PR has merged we should be able to get the hostname resolved trivially (maybe a small CNI change to ensure DNS udp+tcp traffic is captured by ztunnel?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested this locally. it worked with some minor changes to ztunnel, is sort of blocked by istio/ztunnel#582 or other ztunnel work for now

// RequestHeader=Accept:*/*
// RequestHeader=User-Agent:curl/7.81.0
// Hostname=uncaptured-v1-868c9b59b5-rxvfq
Check: check.BodyContains(`Hostname=uncaptured-v`),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; stronger assertion to have both v1 +v2 checks serially to ensure we are load balancing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

// RequestHeader=Accept:*/*
// RequestHeader=User-Agent:curl/7.81.0
// Hostname=uncaptured-v1-868c9b59b5-rxvfq
Check: check.BodyContains(`Hostname=uncaptured-v`),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; stronger assertion to have both v1 +v2 checks serially to ensure we are load balancing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Comment on lines 626 to 627
// The service entry spec (if any) this service was derived from.
ServiceEntry *networking.ServiceEntry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part in particular is the most awkward part of this PR; open to other ideas on how to integrate with existing VIP auto-allocation and also allow access to the service entry fields we need in the ambient index. @howardjohn for any ideas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another idea im having. create a new handler interface and pass the original SE along

currentServiceEntry := curr.Spec.(*networking.ServiceEntry)
cs := convertServices(curr)
with the converted service (unchanged in shape) cc @stevenctl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah unfortunately this only works in one of the two places we need this.. the other one is during VIP re-auto allocation where we don't have the parent SEs. this is why I added this to the Service itself

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will hopefully come back to this later with a more productive comment but my initial reaction is pretty negative to this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking through things I don't have any obvious suggestions but I feel highly uncomfortable with this.

We were already maxing out our tech debt in the ambient controller v1. Then we added WE support and added a lot more (basically 2x). Now SE doubles that again. And all of that is mostly fine since its all isolated to the mess of ambientindex.go -- but I feel very uncomfortable having this leak out into our fundamental core types that risks permeating through the rest of Istio

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i also was not a fan, was hoping you'd have a concrete idea or suggestion. I suppose I could just switch this to use a k8s controller and mirror the way we do it for WEs; and we can punt for now on auto allocation in ambient. my biggest concern there being, if and when we do want to support it, we may need to refactor a fair bit. in fact, i already feel that way; it feels more natural to reuse model.Service in ambient index if we can rather than treat SE and k8s services differently

how do you feel about the mergeability of this PR without the vip auto allocation in mind? do you think we will definitely want to support that at some point in the future? (I know personally I don't use it often but wanted to plan to support for backwards compat)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think the whole thing needs a rewrite -- which I have something in mind for (#45426 for ambient only, https://docs.google.com/document/d/1-ywpCnOfubqg7WAXSPf4YgbaFDBEU9HIqMWcxZLhzwE/edit for all of Istio) but I don't want to block on that. I think its plausible we could get something reasonable in the next 3 months or so - I don't want to block anything on it but I don't mind a niche feature like auto allocation waiting

it would probably be good long term to support auto_allocation but it is not GA I think so its not necessarily strictly required. I personally think its a bad idea and we should kill it (#43952 (comment)).

I also wouldn't mind if it was just inefficient for now.

I guess in order of preference:

  1. Remove the need for this and have no auto alloc support for now
  2. Remove the need for this and have inefficient auto alloc support for now
  3. Just do it anyways and feel bad about it

I can live with any of them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I was thinking is to make the handler pass Service+ServiceEntry but its kind of hairy to pipe that through the whole serviceentry controller

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decided to just remove auto vip allocation per discussion above in 37e605d

Comment on lines 417 to 418
idx.mu.Lock()
defer idx.mu.Unlock()
Copy link
Contributor

@kdorosh kdorosh Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; move this lock below the nil check on line 420

Copy link
Contributor

@kdorosh kdorosh Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 2473dc2

Comment on lines 850 to 877
for _, svc := range c.ambientIndex.servicesMap {

if svc.Attributes.ServiceEntry == nil {
// if we are here then this is dev error
log.Warn("dev error: service entry spec is nil; it should have been populated by the service entry handler")
continue
}

if svc.Attributes.ServiceEntry.WorkloadSelector == nil {
// nothing to do. we construct the ztunnel config if `endpoints` are provided in the service entry handler
continue
}

if svc.Attributes.ServiceEntry.Endpoints != nil {
// it is an error to provide both `endpoints` and `workloadSelector` in a service entry
continue
}

sel := svc.Attributes.ServiceEntry.WorkloadSelector.Labels
if !labels.Instance(sel).SubsetOf(pod.Labels) {
continue
}

vipsToPorts := getVIPsFromServiceEntry(svc, nil)
for vip, ports := range vipsToPorts {
vips[vip] = ports
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be DRY'ed with the other implementation (which is more correct, since it has the necessary vip := c.network.String() + "/" + vip // services must be on our network, don't inherit from workload lines)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 5a0ee02

vipsToPorts := getVIPsFromServiceEntry(oldServiceEntry, nil)
for vip := range vipsToPorts {
// we also need to remove the VIP from any pod that has it (e.g. from service entry `workloadSelector`)
for nwAddr := range a.byPod {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need to add this logic for standalone WEs that are just selected by SEs. we should update the unit tests to cover this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the unit tests actually already covered this scenario, and this code can be removed since pod cleanup is handled during re-construction of the pod workloads on SE update. removed dead code in 0eb683e

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 30, 2023
Kevin Dorosh added 2 commits June 30, 2023 17:43
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 30, 2023
Kevin Dorosh added 2 commits June 30, 2023 17:47
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jul 3, 2023
@@ -617,7 +627,9 @@ func (s *Controller) Cluster() cluster.ID {
}

// AppendServiceHandler adds service resource event handler. Service Entries does not use these handlers.
func (s *Controller) AppendServiceHandler(_ model.ServiceHandler) {}
func (s *Controller) AppendServiceHandler(h model.ServiceHandler) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type ControllerHandlers struct {

We have a util class to dedupe most of the handler stuff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was necessary so the service entry controller could fire events on new auto vip allocation

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jul 5, 2023
}
}

workloadEntries := c.getAllControllerWorkloadEntries()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this logic. Shouldn't it be more like getWorkloadEntriesInService, possibly making that func reusable with labels.Selector?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with pods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in eb05c10

@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jul 6, 2023
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Kevin Dorosh added 4 commits July 6, 2023 11:45
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
@kdorosh kdorosh requested a review from howardjohn July 6, 2023 17:42
@kdorosh
Copy link
Contributor

kdorosh commented Jul 6, 2023

@howardjohn PTAL

Comment on lines 626 to 627
// The service entry spec (if any) this service was derived from.
ServiceEntry *networking.ServiceEntry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will hopefully come back to this later with a more productive comment but my initial reaction is pretty negative to this

@@ -340,7 +354,7 @@ func (a *AmbientIndexImpl) extractWorkload(p *v1.Pod, c *Controller) *model.Work

policies := c.selectorAuthorizationPolicies(p.Namespace, p.Labels)
policies = append(policies, c.convertedSelectorPeerAuthentications(p.Namespace, p.Labels)...)
wl := c.constructWorkload(p, waypoint, policies)
wl := c.constructWorkload(p, waypoint, policies, a.servicesMap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could just move constructWorkload to ambientIndex and we don't need to pass a.servicesMap since it can access it directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 99b26b2

}
}

func getWorkloadServices(serviceEntries map[types.NamespacedName]*model.Service, workloadEntry *v1alpha3.WorkloadEntry,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: getWorkloadServiceEntries may be more clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 04b7435

a.servicesMap[serviceEntryNamespacedName] = svc
}

sel := klabels.Set(svc.Attributes.ServiceEntry.WorkloadSelector.GetLabels()).AsSelectorPreValidated()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: us ValidatedSetSelector instead, its (much) faster

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in b632775

Comment on lines 626 to 627
// The service entry spec (if any) this service was derived from.
ServiceEntry *networking.ServiceEntry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking through things I don't have any obvious suggestions but I feel highly uncomfortable with this.

We were already maxing out our tech debt in the ambient controller v1. Then we added WE support and added a lot more (basically 2x). Now SE doubles that again. And all of that is mostly fine since its all isolated to the mess of ambientindex.go -- but I feel very uncomfortable having this leak out into our fundamental core types that risks permeating through the rest of Istio

Kevin Dorosh added 4 commits July 6, 2023 18:01
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
Signed-off-by: Kevin Dorosh <kevin.dorosh@solo.io>
@kdorosh
Copy link
Contributor

kdorosh commented Jul 6, 2023

addressed all feedback again; most importantly, I removed the auto vip allocation support since it was the root cause of tech debt escaping into core istio types (model.Service)

PTAL @howardjohn

@kdorosh
Copy link
Contributor

kdorosh commented Jul 6, 2023

/test unit-tests

@kdorosh kdorosh mentioned this pull request Jul 7, 2023
@istio-testing istio-testing merged commit 7d6139b into istio:master Jul 7, 2023
27 checks passed
@kdorosh kdorosh deleted the ambient-service-entries branch July 7, 2023 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ambient Issues related to ambient mesh size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants