Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: Transaction-based speculative reserve and logic unification #936

Merged
merged 2 commits into from
Jun 16, 2024

Conversation

sharnoff
Copy link
Member

@sharnoff sharnoff commented May 20, 2024

Trying out this approach as a potential way to make implementing overcommit (#517) easier, to avoid dealing with rounding errors when responding to the autoscaler-agents.

I think this "speculative reserve" may be a reasonable change in isolation, and reducing the number of separate implementations of the same logic should also help with overcommit (because of how it affects every single resource calculation).

Not 100% sure we want to go this route, but it's my best guess so far.


Tested this locally and it seemed to be ok; already fixed a couple bugs based on suspicious logs. There might be more.

Copy link
Contributor

@Omrigan Omrigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

pkg/plugin/xact/xact.go Outdated Show resolved Hide resolved
pkg/plugin/state.go Outdated Show resolved Hide resolved
@sharnoff sharnoff force-pushed the sharnoff/plugin-replace-readClusterState branch from c8e250e to ceceb07 Compare May 22, 2024 16:20
Base automatically changed from sharnoff/plugin-replace-readClusterState to main May 22, 2024 17:13
@sharnoff sharnoff force-pushed the sharnoff/plugin-transactions branch 3 times, most recently from 921c0d4 to 51f1d82 Compare May 24, 2024 21:18
)
}

overbudget = r.node.Reserved > r.node.Total
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be r.node.Reserved + r.node.Buffer > r.node.Total?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer is included in Reserved, so it should be ok as is

// New returns a new transaction object (called Xact) operating on the given pointer
func New[T any](ptr *T) *Xact[T] {
return &Xact[T]{
tmp: *ptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I am a little bit afraid of is T might have pointers, which will not be part of the transaction state.

We may consider pointer members not proper members, therefore, not part of a transaction, and maybe clarify this thing in comments.
Or, we can change this to reflect.DeepCopy.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. Added a note about this in 0972c4c -- the types I intended to use for this don't have pointers (and at least IMO, that's a reasonable restriction for now).


r.node.Reserved += r.pod.Reserved
r.node.Buffer += r.pod.Buffer
callback := func(oldState, newState resourceState[T]) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, perhaps all verdict formatting should be gathered at a single place, like a structure verdictGenerator, which will be used by handleRequestedGeneric.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea -- I think it may end up somewhat complex; wdyt about leaving it as a follow-up?

@@ -150,26 +134,130 @@ func (r resourceTransitioner[T]) handleReserve() (overbudget bool, verdict strin
// Any permitted increases are required to be a multiple of factor.
//
// A pretty-formatted summary of the outcome is returned as the verdict, for logging.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to understand, if those comments would more clearly reflect the difference between handleRequested and handleReserve.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough - added some comments in bbfca61, although not sure if there's more I should elaborate on.

@sharnoff
Copy link
Member Author

Updates:

  • Rebased onto main
  • Add note about limitations w.r.t. pointers: 0972c4c
  • Added comments about the difference between handleRequested vs handleReserve: bbfca61

This commit adds 'pkg/util/xact' with usage in scheduler plugin such
that all places that previously implemented checks for "can this pod fit
on that node" are now done by speculatively reserving the pod, checking
if it left the node over-full, and then deciding whether to accept it.

This is unified through a new method speculativeReserve(). Internally,
speculativeReserve() calls a new method on resourceTransitioner, which
makes follow-up work easier.
This introduces a new method on resourceTransitioner[T],
handleRequestedGeneric(), which implements a superset of the
functionality needed by both handleReserve and handleRequested.

The two latter methods now internally "just" call the new method, albeit
with different parameterizations.

---

Other changes:

- The new method has a notion of "forced approval minimum", which sets
  the minimum value it must approve, even if the node would be
  over-budget.
  - This is used to retain the infallible behavior of handleReserve.
  - This is *also* used to move the lastPermit logic into
    handleRequested.
- Because of the new forced approval minimum, handleLastPermit is no
  longer needed.

---

In general, the motivation behind this change is to reduce the number of
distinct implementations of resource logic, so there's fewer changes
involved in implementing overcommit.
@sharnoff sharnoff force-pushed the sharnoff/plugin-transactions branch from bbfca61 to 148a85d Compare June 16, 2024 22:27
@sharnoff sharnoff enabled auto-merge (rebase) June 16, 2024 22:28
@sharnoff sharnoff merged commit 47880eb into main Jun 16, 2024
15 checks passed
@sharnoff sharnoff deleted the sharnoff/plugin-transactions branch June 16, 2024 22:39
sharnoff added a commit that referenced this pull request Jun 16, 2024
This commit adds 'pkg/util/xact' with usage in scheduler plugin such
that all places that previously implemented checks for "can this pod fit
on that node" are now done by speculatively reserving the pod, checking
if it left the node over-full, and then deciding whether to accept it.

This is unified through a new method speculativeReserve(). Internally,
speculativeReserve() calls a new method on resourceTransitioner, which
makes follow-up work easier.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants