Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Service and WorkloadEntry updates #27184

Merged
merged 12 commits into from
Sep 15, 2020

Conversation

howardjohn
Copy link
Member

@howardjohn howardjohn commented Sep 9, 2020

PR is split into many commits

  1. does some testing refactoring to support this, and adds 2 failing tests
  2. Fixes bug: workloadentry + service stay on incorrect port #27151, enables the test
  3. Fixes Update workload entry causes duplicate address #27183, enables the test
  4. Fixes pod updates for SE selects pod
  5. Fixes SE updates for SE selects pod
  6. Fixes WorkloadEntry mis attributed to kubernetes service #27185

Somehow we managed to have 4 different bugs covering all 4 update scenarios. They should be all fixed and tested now.

cc @kyessenov @JimmyCYJ

@howardjohn howardjohn requested review from a team as code owners September 9, 2020 22:23
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Sep 9, 2020
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 9, 2020
@howardjohn howardjohn requested review from a team as code owners September 9, 2020 22:24
@@ -206,10 +206,12 @@ type Controller struct {
// we run through the label selectors here to pick only ones that we need.
// Only nodes with ExternalIP addresses are included in this map !
nodeInfoMap map[string]kubernetesNode
// externalNameSvcInstanceMap stores hostname ==> instance, is used to store instances for ExternalName k8s services
// externalNameSvcInstanceMap stores hostname ==2> instance, is used to store instances for ExternalName k8s services
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

if err := cg.ServiceEntryRegistry.AppendWorkloadHandler(k8s.WorkloadInstanceHandler); err != nil {
t.Fatal(err)
}
if err := k8s.AppendWorkloadHandler(cg.ServiceEntryRegistry.WorkloadInstanceHandler); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will may need to call SyncAll again after this. We call k8s.Run() inside of its constructor, so the WorkloadHandler could miss the initial SyncAll from inside the k8s controller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ended up causing a race so I think Ill pass it the handler in the constructor

Copy link
Contributor

@stevenctl stevenctl Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either that, or fake kube controller should have Run called outside the constructor and share the same stop chan as the other controllers.

@iandyh
Copy link
Contributor

iandyh commented Sep 10, 2020

Thanks for the fix! Will also take a look later.

Hopefully it can address the issues described in #25110

@howardjohn
Copy link
Member Author

/retest

too see if its a flake

@howardjohn
Copy link
Member Author

/retest

1 similar comment
@howardjohn
Copy link
Member Author

/retest

Copy link
Contributor

@ramaraochavali ramaraochavali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some minor comments. @hzxuzhonghu Can you also PTAL?

@@ -708,8 +712,9 @@ func (c *Controller) serviceInstancesFromWorkloadInstances(svc *model.Service, r
workloadInstancesExist = len(c.workloadInstancesByIP) > 0
c.RUnlock()

// Only select internal Kubernetes services with selectors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This comment and code below is little confusing. It took a while for me to understand we are doing an exclusion here rather than selecting

@@ -904,7 +909,14 @@ func (c *Controller) WorkloadInstanceHandler(si *model.WorkloadInstance, event m
case model.EventDelete:
delete(c.workloadInstancesByIP, si.Endpoint.Address)
default: // add or update
// Check to see if the workload entry changed. If it did, clear the old entry
k := si.Name + "~" + si.Namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this but - We are using "~" separator at multiple places in the code, May be we should move that to some constant like ISTIO_SEPARATOR :-) or some thing and use it every where so that in future for some odd reason if we have to change, we can change easilt

@@ -210,6 +210,8 @@ type Controller struct {
externalNameSvcInstanceMap map[host.Name][]*model.ServiceInstance
// workload instances from workload entries - map of ip -> workload instance
workloadInstancesByIP map[string]*model.WorkloadInstance
// Stores a map of workload instance name/namespace to address
workloadInstancesIPsByName map[string]string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: workloadInstancesIPsByName-> workloadInstanceIPByName /workloadInstanceAddressByName?

// Check to see if the workload entry changed. If it did, clear the old entry
k := si.Name + "~" + si.Namespace
existing := c.workloadInstancesIPsByName[k]
if existing != si.Endpoint.Address {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think in add case, delete will always be executed which is redundant may be only do this for update case?.

pilot/pkg/serviceregistry/serviceentry/servicediscovery.go Outdated Show resolved Hide resolved
@howardjohn
Copy link
Member Author

/retest

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Sep 14, 2020
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Sep 14, 2020
@istio-testing istio-testing merged commit 195a47c into istio:master Sep 15, 2020
@istio-testing
Copy link
Collaborator

In response to a cherrypick label: #27184 failed to apply on top of branch "release-1.7":

Applying: Updates to the testing infra to support testing this change
Using index info to reconstruct a base tree...
A	pilot/pkg/networking/core/v1alpha3/fake.go
M	pilot/pkg/serviceregistry/kube/controller/controller.go
M	pilot/pkg/serviceregistry/kube/controller/fake.go
M	pilot/pkg/serviceregistry/serviceregistry_test.go
M	pilot/pkg/xds/ads_test.go
M	pilot/pkg/xds/fake.go
M	pilot/pkg/xds/xds_test.go
A	pilot/test/xdstest/extract.go
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): pilot/test/xdstest/extract.go deleted in HEAD and modified in Updates to the testing infra to support testing this change. Version Updates to the testing infra to support testing this change of pilot/test/xdstest/extract.go left in tree.
Auto-merging pilot/pkg/xds/xds_test.go
CONFLICT (content): Merge conflict in pilot/pkg/xds/xds_test.go
Auto-merging pilot/pkg/xds/fake.go
CONFLICT (content): Merge conflict in pilot/pkg/xds/fake.go
Auto-merging pilot/pkg/xds/ads_test.go
CONFLICT (content): Merge conflict in pilot/pkg/xds/ads_test.go
Auto-merging pilot/pkg/serviceregistry/serviceregistry_test.go
CONFLICT (content): Merge conflict in pilot/pkg/serviceregistry/serviceregistry_test.go
Auto-merging pilot/pkg/serviceregistry/kube/controller/fake.go
CONFLICT (content): Merge conflict in pilot/pkg/serviceregistry/kube/controller/fake.go
Auto-merging pilot/pkg/serviceregistry/kube/controller/controller.go
CONFLICT (modify/delete): pilot/pkg/networking/core/v1alpha3/fake.go deleted in HEAD and modified in Updates to the testing infra to support testing this change. Version Updates to the testing infra to support testing this change of pilot/pkg/networking/core/v1alpha3/fake.go left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Updates to the testing infra to support testing this change
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@istio-testing
Copy link
Collaborator

In response to a cherrypick label: new issue created for failed cherrypick: #27304

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
9 participants