New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move equivalence class based scheduling to beta #58222

Open
misterikkit opened this Issue Jan 12, 2018 · 18 comments

Comments

Projects
None yet
9 participants
@misterikkit
Contributor

misterikkit commented Jan 12, 2018

Feature Description

This feature was already implemented by @resouer and is currently in alpha. There is some remaining work to get the feature ready for beta & ga.

Original issue:

Remaining issues (will be periodically updated):

  • Full integration test is required: #56104
  • Amend and enable e2e test: #56577 (comment)
  • Fix potential data race: TODO
  • Evaluate and fix potential improvements:
    • speed up invalidate operation (or not)TODO
    • minimize invalidation scope, TODO-1 TODO-2
  • Compute pod equivalence based on pod properties and not OwnerReference.

/sig scheduling
/kind feature

@k8s-ci-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-ci-robot

k8s-ci-robot Jan 12, 2018

Contributor

@misterikkit: Reiterating the mentions to trigger a notification:
@kubernetes/sig-scheduling-feature-requests

In response to this:

Feature Description

This feature was already implemented by @resouer and is currently in alpha. There is some remaining work to get the feature ready for beta & ga.

Original issue:

Remaining issues (will be periodically updated):

  • Full integration test is required: #56104
  • Amend and enable e2e test: #56577 (comment)
  • Fix potential data race: TODO
  • Evaluate and fix potential improvements:
    • speed up invalidate operation (or not)TODO
    • minimize invalidation scope, TODO-1 TODO-2
  • Compute pod equivalence based on pod properties and not OwnerReference.

/sig scheduling
/kind feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

k8s-ci-robot commented Jan 12, 2018

@misterikkit: Reiterating the mentions to trigger a notification:
@kubernetes/sig-scheduling-feature-requests

In response to this:

Feature Description

This feature was already implemented by @resouer and is currently in alpha. There is some remaining work to get the feature ready for beta & ga.

Original issue:

Remaining issues (will be periodically updated):

  • Full integration test is required: #56104
  • Amend and enable e2e test: #56577 (comment)
  • Fix potential data race: TODO
  • Evaluate and fix potential improvements:
    • speed up invalidate operation (or not)TODO
    • minimize invalidation scope, TODO-1 TODO-2
  • Compute pod equivalence based on pod properties and not OwnerReference.

/sig scheduling
/kind feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@resouer

This comment has been minimized.

Show comment
Hide comment
@resouer

resouer Jan 12, 2018

Member

Thanks @misterikkit

We may need to start with

Compute pod equivalence based on pod properties and not OwnerReference.

which looks like a fundamental issue?

(I'm currently OOO and will back next week, sorry for response delay)

Member

resouer commented Jan 12, 2018

Thanks @misterikkit

We may need to start with

Compute pod equivalence based on pod properties and not OwnerReference.

which looks like a fundamental issue?

(I'm currently OOO and will back next week, sorry for response delay)

@resouer resouer added this to the v1.11 milestone Jan 12, 2018

@misterikkit

This comment has been minimized.

Show comment
Hide comment
@misterikkit

misterikkit Jan 12, 2018

Contributor

@resouer Thanks. For changing the equivalence computation, the existing logic is pretty well isolated. I should only need to change EquivalencePodGenerator.getEquivalencePod in pkg/scheduler/algorithm/predicates/utils.go.

However, I propose to refactor a lot of that logic to simplify it and make code updates easier. I'll try to send a PR for that soon.

Contributor

misterikkit commented Jan 12, 2018

@resouer Thanks. For changing the equivalence computation, the existing logic is pretty well isolated. I should only need to change EquivalencePodGenerator.getEquivalencePod in pkg/scheduler/algorithm/predicates/utils.go.

However, I propose to refactor a lot of that logic to simplify it and make code updates easier. I'll try to send a PR for that soon.

@wackxu

This comment has been minimized.

Show comment
Hide comment
@wackxu

wackxu Jan 13, 2018

Contributor

/cc

Contributor

wackxu commented Jan 13, 2018

/cc

bart0sh pushed a commit to bart0sh/kubernetes that referenced this issue Jan 25, 2018

Merge pull request kubernetes#58780 from misterikkit/invalCache
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix equivalence cache invalidation of Node condition.

Equivalence cache for CheckNodeConditionPred becomes invalid when
Node.Spec.Unschedulable changes. This can happen even if
Node.Status.Conditions does not change, so move the logic around.

This logic is covered by integration test
"test/integration/scheduler".TestUnschedulableNodes but equivalence
cache is currently skipped when test pods have no OwnerReference.

The test failure is exposed by kubernetes#58555 



**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

/ref kubernetes#58222


**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
/sig scheduling
@kubernetes/sig-scheduling-pr-reviews

k8s-merge-robot added a commit that referenced this issue Jan 25, 2018

Merge pull request #58555 from misterikkit/equivHash
Automatic merge from submit-queue (batch tested with PRs 58302, 58782, 58555, 58741). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Change equivalence class hashing function

**What this PR does / why we need it**:
This updates the Pod equivalence class hashing function to hash pod fields which are read by scheduler predicates. Until now, we used a pod's OwnerReference as a shorthand for equivalence, but not all controllers will create homogeneous sets of pods.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
/ref #58222 

**Performance impact**:
Hashing is not expensive enough to impact scheduling performance.

|Test|Result|
|---|---|
| Before | `BenchmarkEquivalenceHash-40       200000              7722 ns/op` |
| After | `BenchmarkEquivalenceHash-40        10000            114184 ns/op` |

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
/sig scheduling

k8s-merge-robot added a commit that referenced this issue Feb 15, 2018

Merge pull request #59335 from resouer/equiv-vol
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fixes volume predicate handler for equiv class

**What this PR does / why we need it**:

Per discussion in #58797 , we are missing some predicate handler in factory.go.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58797

Ref #58222

**Special notes for your reviewer**:

Kindly ping @msau42

**Release note**:

```release-note
Fixes volume predicate handler for equiv class
```

k8s-merge-robot added a commit that referenced this issue Mar 27, 2018

Merge pull request #61644 from resouer/fix-deadlock
Automatic merge from submit-queue (batch tested with PRs 61644, 61624, 61743, 61019, 61287). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use inline func to ensure unlock is executed

**What this PR does / why we need it**:

Per discussion: #61621 (comment)

**Special notes for your reviewer**:

Ref: #58222

**Release note**:

```release-note
Use inline func to ensure unlock is executed
```

k8s-merge-robot added a commit that referenced this issue Apr 26, 2018

Merge pull request #63174 from misterikkit/equivHash
Automatic merge from submit-queue (batch tested with PRs 62937, 63105, 63031, 63174). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert "Revert "Revert revert of equivalence class hash calculation i…

…n scheduler""

This reverts commit 4386751.



**What this PR does / why we need it**:
This re-introduces the change from #58555 which changes how the scheduler computes equivalence classes of pods. I believe we have fixed the flakiness observed previously (#61512, #62921). I have run the test in question a few dozen times without a failure.

```bash
make test-integration WHAT="./test/integration/scheduler" KUBE_TEST_ARGS="-run TestPreemptionStarvation" GOFLAGS="-v"
```

/ref #58222

**Special notes for your reviewer**:
I had to resolve several merge conflicts. I think I resolved them correctly, but keep an eye out for anything silly.

**Release note**:

```release-note
NONE
```
/sig scheduling
@k82cn

This comment has been minimized.

Show comment
Hide comment
@k82cn

k82cn May 11, 2018

Member

are we still target to 1.11 for this feature?

Member

k82cn commented May 11, 2018

are we still target to 1.11 for this feature?

@misterikkit

This comment has been minimized.

Show comment
Hide comment
@misterikkit

misterikkit May 11, 2018

Contributor

The goal is yes. Although I haven't created a PR yet to move it over.
I think we probably want some more tests in place to make sure that we're not missing weird edge cases. Also, we should collect metrics from e2e tests demonstrating the performance benefit.

Contributor

misterikkit commented May 11, 2018

The goal is yes. Although I haven't created a PR yet to move it over.
I think we probably want some more tests in place to make sure that we're not missing weird edge cases. Also, we should collect metrics from e2e tests demonstrating the performance benefit.

@wojtek-t

This comment has been minimized.

Show comment
Hide comment
@wojtek-t

wojtek-t May 30, 2018

Member

I vote for not doing it in 1.11 - this isn't urgent and we don't have full confidence in it.
Instead, we should enable it at the beginning of 1.12 and have it soaked for the whole release cycle.

Member

wojtek-t commented May 30, 2018

I vote for not doing it in 1.11 - this isn't urgent and we don't have full confidence in it.
Instead, we should enable it at the beginning of 1.12 and have it soaked for the whole release cycle.

@k82cn

This comment has been minimized.

Show comment
Hide comment
@k82cn

k82cn May 30, 2018

Member

/milestone 1.12

In last sig-scheduling meeting, we decide to move this to next release.

Member

k82cn commented May 30, 2018

/milestone 1.12

In last sig-scheduling meeting, we decide to move this to next release.

@k8s-ci-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-ci-robot

k8s-ci-robot May 30, 2018

Contributor

@k82cn: The provided milestone is not valid for this repository. Milestones in this repository: [next-candidate, v1.10, v1.11, v1.12, v1.13, v1.4, v1.5, v1.6, v1.7, v1.8, v1.9]

Use /milestone clear to clear the milestone.

In response to this:

/milestone 1.12

In last sig-scheduling meeting, we decide to move this to next release.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

k8s-ci-robot commented May 30, 2018

@k82cn: The provided milestone is not valid for this repository. Milestones in this repository: [next-candidate, v1.10, v1.11, v1.12, v1.13, v1.4, v1.5, v1.6, v1.7, v1.8, v1.9]

Use /milestone clear to clear the milestone.

In response to this:

/milestone 1.12

In last sig-scheduling meeting, we decide to move this to next release.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k82cn

This comment has been minimized.

Show comment
Hide comment
@k82cn

k82cn May 30, 2018

Member

/milestone clear

Member

k82cn commented May 30, 2018

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.11 milestone May 30, 2018

@misterikkit

This comment has been minimized.

Show comment
Hide comment
@misterikkit

misterikkit Jun 4, 2018

Contributor

The current plan is to do this in 1.12. Can someone update the milestone?

Contributor

misterikkit commented Jun 4, 2018

The current plan is to do this in 1.12. Can someone update the milestone?

@resouer

This comment has been minimized.

Show comment
Hide comment
@resouer

resouer Jun 4, 2018

Member

Milestone already cleared

Member

resouer commented Jun 4, 2018

Milestone already cleared

@k82cn

This comment has been minimized.

Show comment
Hide comment
@k82cn

k82cn Jun 5, 2018

Member

/milestone v1.12

Member

k82cn commented Jun 5, 2018

/milestone v1.12

@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Jun 5, 2018

@k8s-merge-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-merge-robot

k8s-merge-robot Jun 20, 2018

Contributor

[MILESTONENOTIFIER] Milestone Issue: Up-to-date for process

@misterikkit @resouer

Issue Labels
  • sig/scheduling: Issue will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/feature: New functionality.
Help
Contributor

k8s-merge-robot commented Jun 20, 2018

[MILESTONENOTIFIER] Milestone Issue: Up-to-date for process

@misterikkit @resouer

Issue Labels
  • sig/scheduling: Issue will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/feature: New functionality.
Help
@resouer

This comment has been minimized.

Show comment
Hide comment
@resouer

resouer Jul 25, 2018

Member

[Quick Update]

The issue is up-to-date after #65714

The on going tasks are #62226 and #66148 to add more integration tests for volume related predicates.

Member

resouer commented Jul 25, 2018

[Quick Update]

The issue is up-to-date after #65714

The on going tasks are #62226 and #66148 to add more integration tests for volume related predicates.

@dims

This comment has been minimized.

Show comment
Hide comment
@dims

dims Aug 24, 2018

Member

@misterikkit @resouer Does this still need the v1.12 milestone?

Member

dims commented Aug 24, 2018

@misterikkit @resouer Does this still need the v1.12 milestone?

@resouer

This comment has been minimized.

Show comment
Hide comment
@resouer

resouer Aug 25, 2018

Member

Though the blocker issue in 1.11 has been fixed, I don't think we can catch 1.12 code freeze since we are discussing re-org ecache to predicates pkg in #67241, which still need some agreement.

Member

resouer commented Aug 25, 2018

Though the blocker issue in 1.11 has been fixed, I don't think we can catch 1.12 code freeze since we are discussing re-org ecache to predicates pkg in #67241, which still need some agreement.

@dims

This comment has been minimized.

Show comment
Hide comment
@dims

dims Aug 31, 2018

Member

ack. let me clear the milestone then @resouer

/milestone clear

Member

dims commented Aug 31, 2018

ack. let me clear the milestone then @resouer

/milestone clear

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment