[e2e failure] [sig-network] Networking Granular Checks: Services [Slow] should function for client IP based session affinity: udp #54524

Closed
spiffxp opened this issue Oct 24, 2017 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@spiffxp
Member

spiffxp commented Oct 24, 2017

/priority test-failure
/priority critical-urgent
/sig network

This test case has been failing for a while and affects a number of jobs: triage report

This is affecting multiple jobs on the release-master-blocking dashboard, and prevents us from cutting 1.9.0-alpha.2 (kubernetes/sig-release#22). Is there work ongoing to bring this job back to green?

Possibly related:

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Oct 24, 2017
@spiffxp
Member Author

spiffxp commented Oct 25, 2017

@kubernetes/sig-network-test-failures

@spiffxp
Member Author

spiffxp commented Oct 25, 2017

triage cluster 53cdeea7fa10ed3a866c

```
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/networking.go:249
Oct 18 09:37:05.359: test session affinity, cost time: 1m45.52479004s
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/networking.go:248
```

triage cluster 87b694524eb439d53a6f

```
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/networking.go:227
Oct 19 19:52:48.646: Expect endpoints: map[netserver-1:{}], got: map[]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/networking.go:245
```

@spiffxp
Member Author

spiffxp commented Oct 25, 2017

/priority failing-test

@k8s-ci-robot k8s-ci-robot added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Oct 25, 2017
@cmluciano

cc @kubernetes/sig-network-bugs

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 25, 2017
@MrHohn
Member

MrHohn commented Oct 25, 2017

A fix, #53760, is open but waiting for approval.

@spiffxp
Member Author

spiffxp commented Oct 26, 2017

Of the two triage clusters I linked above, the second one is still an issue. Maybe kubernetes/test-infra#5095 didn't catch everything IPv6-related?

@m1093782566
Contributor

/assign

k8s-github-robot pushed a commit that referenced this issue Oct 31, 2017
Automatic merge from submit-queue (batch tested with PRs 54572, 54686). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix service session affinity e2e failure cases

**What this PR does / why we need it**:

Fix service session affinity e2e failure cases - debugging...

**Which issue this PR fixes**:

xref #54571 #54524

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```

/sig network
@spiffxp
Member Author

spiffxp commented Oct 31, 2017

/priority important-soon
ci-kubernetes-e2e-gci-gke-alpha-features is the only job still failing, and it is not on release-master-blocking.

It seems kubernetes/test-infra#5095 didn't catch this job, so a follow-up PR would be appropriate.

/assign @danehans @MrHohn
since they handled this failure for the GCE job.

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 31, 2017
@spiffxp
Member Author

spiffxp commented Oct 31, 2017

/remove-priority critical-urgent

@k8s-ci-robot k8s-ci-robot removed the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Oct 31, 2017
@MrHohn
Member

MrHohn commented Nov 5, 2017

Sent out another fix #55122.

@spiffxp
Member Author

spiffxp commented Nov 7, 2017

/status approved-for-milestone
tracking this for kubernetes/sig-release#27

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue Current

@MrHohn @danehans @m1093782566 @spiffxp

Issue Labels
  • sig/network: Issue will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.

k8s-github-robot pushed a commit that referenced this issue Nov 7, 2017
Automatic merge from submit-queue (batch tested with PRs 55114, 52976, 54871, 55122, 55140). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Don't share nodePort service in session affinity tests

**What this PR does / why we need it**:
From #54524, #54571.

Spent some time digging into this today and found that the test is flaky mostly because it sends service requests before kube-proxy has reacted to the service's session affinity update, so multiple endpoints respond instead of one. It is flakier in the alpha CI jobs, probably because they run the tests in a different order.

This PR creates a separate service with `sessionAffinity=ClientIP`, so there is no race between the test starting and kube-proxy reacting to the affinity change. In addition, it seems inappropriate to modify `config.NodePortService`, which is shared by other networking tests.
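The race described above can be illustrated with a minimal, stdlib-only Go sketch. It is not code from this PR or from the e2e framework: `checkAffinity` is a hypothetical helper, and the endpoint names are illustrative (borrowed from the `netserver-N` pod names in the triage log). It mirrors the property a session affinity test asserts: every request from one client should be answered by the same endpoint, so responses collected before kube-proxy has applied `ClientIP` affinity can span several endpoints and fail the check.

```go
package main

import "fmt"

// checkAffinity reports whether every response came from the same backend
// endpoint, which is what ClientIP session affinity should guarantee for a
// single client. It returns that endpoint's name when affinity holds.
// Hypothetical helper for illustration only; not the real e2e framework code.
func checkAffinity(responses []string) (string, bool) {
	if len(responses) == 0 {
		return "", false
	}
	first := responses[0]
	for _, r := range responses[1:] {
		if r != first {
			return "", false
		}
	}
	return first, true
}

func main() {
	// Requests sent before kube-proxy has programmed ClientIP affinity may be
	// load-balanced across several endpoints.
	racy := []string{"netserver-0", "netserver-1", "netserver-0"}
	// Requests to a service whose affinity is already in effect should all
	// land on one endpoint.
	sticky := []string{"netserver-1", "netserver-1", "netserver-1"}

	_, ok := checkAffinity(racy)
	fmt.Println("racy affinity holds:", ok)
	host, ok := checkAffinity(sticky)
	fmt.Println("sticky affinity holds:", ok, "endpoint:", host)
}
```

Running it prints `racy affinity holds: false` followed by `sticky affinity holds: true endpoint: netserver-1`, which is the difference between a request burst that races kube-proxy and one that does not.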

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes # (will mark them fixed later).

**Special notes for your reviewer**:
/assign @m1093782566 @bowei 
cc @spiffxp

**Release note**:

```release-note
NONE

```
@MrHohn
Member

MrHohn commented Nov 7, 2017

Tests are passing now:

Closing this issue.
/close
