Proposal: More reliable resource quota enforcement #20113

Closed
228 changes: 228 additions & 0 deletions docs/proposals/resource-quota-reservations.md
@@ -0,0 +1,228 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Resource Quota - reservations

## Problem Description

When a request for creating a resource is received,
Member

Do updates affect quota?

(sorry for the questions, I do realize this section is about the way it works right now)

Member Author

for object count quotas, updates are irrelevant.
for compute resource updates (if / when we actually support them), quota handles them.

the associated resource quota is checked by the
server in admission control. If the request would
not violate the quota, the quota usage is incremented
and the requesting object is allowed to be created.
Quota usage is incremented using compare-and-swap
to allow for optimistic locking. If the quota usage
was stale (the write fails with a resource version conflict),
the admission control logic runs again
Member

I assume "stale" means version number conflict? You should probably mention that explicitly.

to confirm that the request is still valid against the latest
quota document.
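
The check-then-increment loop described above can be sketched roughly as follows. This is a minimal illustration and not the actual admission plugin: the `store`, `quota`, `admit`, and `maxRetries` names are hypothetical, a single integer stands in for a `ResourceList`, and an in-memory version counter stands in for the resource version kept by the backing store.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// quota is a simplified stand-in for a quota document: one counted
// resource plus a version that changes on every successful write.
type quota struct {
	Hard, Used int64
	Version    int64
}

// errConflict signals that a write was based on a stale version.
var errConflict = errors.New("version conflict")

// store is a hypothetical in-memory replacement for the backing store.
type store struct {
	mu sync.Mutex
	q  quota
}

func (s *store) get() quota {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.q
}

// compareAndSwap persists next only if it was derived from the current version.
func (s *store) compareAndSwap(next quota) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if next.Version != s.q.Version {
		return errConflict
	}
	next.Version++
	s.q = next
	return nil
}

// admit re-runs the quota check against the latest document until the
// write succeeds, the request is rejected, or maxRetries is exhausted.
func admit(s *store, request int64, maxRetries int) error {
	for i := 0; i <= maxRetries; i++ {
		q := s.get()
		if q.Used+request > q.Hard {
			return fmt.Errorf("exceeds quota: used %d, requested %d, hard %d", q.Used, request, q.Hard)
		}
		q.Used += request
		if err := s.compareAndSwap(q); err == nil {
			return nil
		}
	}
	return errConflict
}

func main() {
	s := &store{q: quota{Hard: 3}}
	for i := 0; i < 4; i++ {
		fmt.Println(admit(s, 1, 3))
	}
}
```

Note that the sketch only covers the quota write. It does nothing to undo the increment if the object creation that follows fails, which is the gap the reservation model later in this proposal is meant to address.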
Member

What happens if the creation fails for other reasons afterwards?


A quota controller regularly sweeps the system to
Member

Deletions don't return quota?

Member Author

@lavalamp - deletions return quota for pods today immediately. I have code locally that I am opening shortly to get the same behavior for the other resource types.

Member Author

Deletions queue a quota update for all kinds in current code base.

recalculate observed usage. This is primarily useful
for monitoring deleted resources and replenishing
available quota. If the quota controller sees a delta
in its recalculated usage relative to the currently
reported status, it will update the usage. If its update
is stale, it will recalculate usage again.

If the quota controller has an up-to-date quota,
Member

some extra or missing words here

but interacts with a latent API server, the quota controller
could under-report usage. This is possible in a single
server environment (though the race window is small), and
the risk grows with the number of masters in an
HA configuration and their relative latency.

In pseudo-code:

**API_SERVER**

```
Admission control (quota)
T1. QUOTA := ...
T2. IF QUOTA.USED + REQUEST > QUOTA.HARD
T3.     REJECT
T4. UPDATE QUOTA.STATUS(QUOTA.USED + REQUEST) [compare-and-swap; on conflict, retry from T1]
T5. ADMIT

RESTStorage
T6. Create Object
```

**QUOTA_CONTROLLER**

```
T1. QUOTA := ...
T2. OBSERVED_USAGE := ... [live queries based on latest data]
T3. UPDATE QUOTA.STATUS.USED IFF OBSERVED_USAGE != QUOTA.STATUS.USED
```

If `QUOTA_CONTROLLER.T1` happens after `API_SERVER.T4`
and `QUOTA_CONTROLLER.T3` happens before `API_SERVER.T6` completes,
the quota controller calculates usage without observing the outcome
of the create and can write back a usage value that under-reports it.
Member

There must be something about ResourceVersion I don't understand. If the quota controller writes quota.status in between API_SERVER.T4 and API_SERVER.T6, won't API_SERVER.T6 fail due to resource version conflict? These are all writing the same object, right?


In practice, this scenario is *extremely rare*, but it could happen if
a network hiccup of some kind blocked creation of
the object in etcd at time `API_SERVER.T6` without causing the request
to time out, while the API server was still responding and reading
from that same etcd to support a full quota recalculation and status
update at time `QUOTA_CONTROLLER.T3`.

The next controller pass would recalculate usage correctly.
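
To make the interleaving concrete, the following sketch replays it deterministically. It is a toy model under stated assumptions: the `cluster` type and its methods are hypothetical, a single counter stands in for quota usage, and the hard-limit check is omitted because it plays no part in the race.

```go
package main

import "fmt"

// cluster is a hypothetical single-resource model: statusUsed plays the
// role of QUOTA.STATUS.USED and stored counts objects actually persisted.
type cluster struct {
	statusUsed int64
	stored     int64
}

// admit models API_SERVER.T4: usage is incremented before the object exists.
func (c *cluster) admit(request int64) { c.statusUsed += request }

// create models API_SERVER.T6: the object is finally persisted.
func (c *cluster) create(request int64) { c.stored += request }

// controllerSync models QUOTA_CONTROLLER.T1-T3: usage is recalculated from
// what is observable right now and written back if it differs.
func (c *cluster) controllerSync() {
	if observed := c.stored; observed != c.statusUsed {
		c.statusUsed = observed
	}
}

func main() {
	c := &cluster{statusUsed: 2, stored: 2}

	c.admit(1)         // API_SERVER.T4
	c.controllerSync() // QUOTA_CONTROLLER.T1-T3 lands inside the window
	c.create(1)        // API_SERVER.T6 completes late
	fmt.Printf("after race: status=%d stored=%d\n", c.statusUsed, c.stored)

	c.controllerSync() // the next pass observes the object
	fmt.Printf("after next pass: status=%d stored=%d\n", c.statusUsed, c.stored)
}
```

Between the two synchronization passes the reported usage is one lower than the real usage, so a second request could be admitted in that window even if it pushes the namespace past its hard limit.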

A solution should be put in place to mitigate this risk, and a long-term
plan to handle it should be agreed upon.

## Use Cases

* As a `cluster-admin`, I want more reliable quota enforcement.

### ResourceQuota reservations

The recommended solution is for the quota status to track
observed usage and reserved usage as separate values.

The quota admission controller allows a request to be admitted
if `status.used + status.reserved + request <= status.hard`.
If it chooses to admit the object, it will add a new reservation
to the quota status that captures the `APIVersion`, `Kind`,
and `UID` of the object that made the reservation. It will also
add an `ExpirationTime` that dictates how long that reservation
is valid.

The quota controller must be updated to become reservation aware
in its synchronization loop. It would calculate observed usage
as before, but it would compare the set of observed resources
against the set of current reservations. If an object is observed,
any corresponding reservation made by that object will be removed
since its consumption will properly be tracked in usage. If a
reservation has expired, it will be removed from the quota document.
A default `ExpirationTime` could be applied to sufficiently mitigate
Contributor

Will apiserever.T6 fail if the corresponding quota expires?

risk based on the anticipated quorum window across masters.

In order to implement this solution, an admission controller must
have access to the UID that will be assigned to an object at creation
time. This is currently not the case, since UID assignment happens
after admission control and any UID that is assigned by admission
control is overwritten. The API server code should be updated
to populate `ObjectMetaSystemFields` prior to invoking admission control.
For this proposal, UID is required, since Name would be insufficient
to track against. For a number of other use cases, code is simplified
if Name is set as well (error messages come to mind).

## Data Model Impact

```
// ResourceQuotaReservation records usage reserved at admission time
// that has not yet been observed in the namespace
type ResourceQuotaReservation struct {
	// ReservedBy is a reference to the object that took the
	// reservation. It must have an APIVersion, Kind, and UID.
	ReservedBy ObjectReference `json:"reservedBy"`
	// Reserved is the incremental set of resources reserved
	Reserved ResourceList `json:"reserved,omitempty"`
	// ExpirationTime defines when the reservation, if not realized, expires
	ExpirationTime unversioned.Time `json:"expirationTime,omitempty"`
}

// ResourceQuotaStatus defines the enforced hard limits and observed use
type ResourceQuotaStatus struct {
	// Hard is the set of enforced hard limits for each named resource
	Hard ResourceList `json:"hard,omitempty"`
	// Used is the current observed total usage of the resource in the namespace
	Used ResourceList `json:"used,omitempty"`
	// Reservations is the reserved usage not yet observed in the namespace
	Reservations []ResourceQuotaReservation `json:"reservations,omitempty"`
}
```

Member

We need to document hidden state in etcd somewhere.

Member Author

@bgrant0607 - i know its been awhile, but what did you mean by hidden state? is the transient reservation concept hidden state in your view?

## REST API Impact

None.

## Security Impact

None.

## End User Impact

The `kubectl` commands that render quota should display
reserved usage separately from observed usage.

## Performance Impact

In theory, this feature would allow the quota controller
to loop across its data set more frequently to find
reservations and calculate usage specific to a kind.

It would also result in additional writes to the API server as
the quota controller expires reservations.

## Developer Impact

None.

## Alternatives

The ability to request a quorum read could be added to the
API server. The quota controller could request quorum reads
to get a more accurate usage count, but this does not eliminate
the race condition between the quorum read and the quota status
update.

Using a reservation model eliminates the race window without
requiring quorum reads.

## Implementation

### Assignee

@derekwaynecarr

### Work Items

* Add support for UID assignment in admission control
* Add support for reservation in admission control
* Add support for expiring reservations in quota controller

## Dependencies

None

## Testing

Appropriate unit and e2e testing will be authored.

## Documentation Impact

Existing resource quota documentation and examples will be updated.



<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/resource-quota-reservations.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->