
[Garbage Collector] Umbrella: known issues of garbage collector #26120

Closed
caesarxuchao opened this issue May 23, 2016 · 12 comments

caesarxuchao commented May 23, 2016

This is an umbrella issue for TODOs and known problems of the garbage collector that we have to solve before graduating GC to beta.

_Functionalities_

  1. [optional]
    This race can cause DeleteOptions.OrphanDependents=true to not orphan all dependents. We expect this race to be rare. Details: there is no guarantee on the ordering of events across different resources, so the GC may observe the orphan request (which is an Update event of the owner resource) before observing the creation/update of the dependents. Consequently, those dependents will not be orphaned. [GarbageCollector] monitor the watch latency #30483 adds a monitor for the watch latency; because the latency is small, this race is rare.
    Considered solutions:
    a. Let the GC wait for a short period of time (e.g., 1 min) before carrying out the orphaning procedure. This won't thoroughly solve the problem, and it will slow down deletion.
    b. Let the user supply a resourceVersion, and have the GC wait until it has observed events of the dependent resource with a larger resourceVersion before carrying out the orphaning procedure. The problem is that the GC is a client, and a client should treat resourceVersion as opaque.
  2. [optional] Need a discovery mechanism to determine what resources the GC should manage. For example, the GC needs to determine whether it should watch extensions/v1beta1/job or batch/v1/job. (Edit: this use case doesn't matter, because the ownerRef will only point to one version of the object; the other version is just a shadow.)
  3. [DONE] According to the controllerRef proposal Proposal for ControllerReference #25256, "GarbageCollector will remove ControllerRef from objects that no longer point to existing controllers". (A minimal ownerReference sketch follows this list.)
  4. [DONE] Update at least one controller to use GC. The replicaset controller and the replication controller manager now use GC ([GarbageCollector] Let the RC manager set/remove ControllerRef #27600).
  5. [DONE] [Update: we have foreground GC now.] Expose the progress of garbage collection. See [RFC][GarbageCollector] expose the progress of garbage collection #29891.
  6. [Fixing, see GC: Fix re-adoption race when orphaning dependents. #42938] The design doc says that before orphaning dependents, the GC should wait for the owner's controller to bump the owner's ObservedGeneration, which indicates that the owner's controller has observed the deletion of the owner and will stop adopting. Otherwise, the GC's orphaning process races with the owner controller's adoption process, resulting in the deletion of the dependents. We haven't implemented this yet. We expect this race to be rare, because currently only the replicaset and replication controllers do adoption, and it's triggered by updates to the RC or the 10-minute resync. We have an e2e test that orphans 100 pods controlled by an RC; it has never hit this race.
  7. [Done for new resources. For old resources, 200 is returned for compatibility] The API server should return 202 if a deletion request is not completed synchronously. (API server should return 202 if the operation is asynchronous #33196)
  8. [Tracked in Garbage collector should support non-core APIs #44507] Supporting non-core APIs, registered either via ThirdPartyResource or kube-aggregator.
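
To make item 3 above concrete, here is a minimal sketch of the ownerReference/ControllerRef shape the GC operates on. It uses current apimachinery field names (metav1.OwnerReference with Controller); the pod, RC name, and UID are made up for illustration, and this is not code from the garbage collector itself.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	isController := true

	// A dependent pod pointing at its owning ReplicationController. The GC
	// builds its dependency graph from ownerReferences like this one; the
	// ControllerRef proposal additionally requires Controller=true on the
	// reference held by the managing controller.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-pod",
			Namespace: "default",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "ReplicationController",
				Name:       "example-rc",
				UID:        "11111111-2222-3333-4444-555555555555", // made-up UID
				Controller: &isController,
			}},
		},
	}

	ref := pod.OwnerReferences[0]
	fmt.Printf("pod %s is controlled by %s %s (uid %s)\n", pod.Name, ref.Kind, ref.Name, ref.UID)
}
```

The GC watches dependents like this pod and resolves the referenced owner by UID; when the owner no longer exists, the reference is removed (orphaning) or the dependent is deleted, depending on the deletion options.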

_Performance_

  1. [Mechanism is there, need numbers] Benchmark the average queuing time (eventQueue and dirtyQueue) ([GarbageCollector] measure latency #28387).
  2. [Done] Improvement: update the List and Watch to only store the TypeMeta and ObjectMeta ([GarbageCollector] only store typeMeta and objectMeta in the gc store #28480).
  3. [Done] [GarbageCollector] add absent owner cache #31167: cache the UIDs of owners known to be deleted. The GC contacts the API server to check whether an owner exists when processing its dependents. If the owner doesn't exist according to the API server, it won't exist in the future either, because it's impossible for a user to predict the UID of a future object and put it in an ownerRef. Such a cache is very useful in the RC-Pods case, because the GC checks for the existence of the RC once for each pod it owns. The cache helps the API server as well, because a GET request with JSON media type is expensive for the API server. (A minimal cache sketch follows this list.)
  4. [RFC] In the API server, support resourceVersion as a delete precondition. This would make the Get() in processItem() unnecessary.
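
To illustrate item 3 above, below is a minimal sketch of the absent-owner-cache idea, assuming a simple set keyed by owner UID. The real cache added in #31167 lives in the garbage collector package and differs in implementation detail; the names here (absentOwnerCache, ownerExists) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// UID stands in for k8s.io/apimachinery/pkg/types.UID in this sketch.
type UID string

// absentOwnerCache remembers owner UIDs that the API server has already
// reported as not found, so the GC can skip repeated GETs for them.
type absentOwnerCache struct {
	mu     sync.Mutex
	absent map[UID]struct{}
}

func newAbsentOwnerCache() *absentOwnerCache {
	return &absentOwnerCache{absent: make(map[UID]struct{})}
}

func (c *absentOwnerCache) Has(uid UID) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.absent[uid]
	return ok
}

func (c *absentOwnerCache) Add(uid UID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.absent[uid] = struct{}{}
}

// ownerExists consults the cache first and only falls back to a live lookup
// (e.g. a GET against the API server) on a miss; a negative result is cached
// forever because the UID of a deleted object is never reused.
func ownerExists(cache *absentOwnerCache, uid UID, liveLookup func(UID) bool) bool {
	if cache.Has(uid) {
		return false
	}
	if liveLookup(uid) {
		return true
	}
	cache.Add(uid)
	return false
}

func main() {
	cache := newAbsentOwnerCache()
	lookups := 0
	liveLookup := func(UID) bool { lookups++; return false } // pretend the owner RC is gone

	// 1000 pods owned by the same deleted RC trigger only one live lookup.
	for i := 0; i < 1000; i++ {
		ownerExists(cache, "rc-uid", liveLookup)
	}
	fmt.Println("live lookups:", lookups) // prints: live lookups: 1
}
```

Because a user cannot predict the UID of a future object, a UID the API server has reported as absent can be treated as absent forever, so negative results never need to be invalidated.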

References:

@lavalamp @gmarek @derekwaynecarr @kubernetes/sig-api-machinery

@caesarxuchao caesarxuchao added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 23, 2016
@caesarxuchao caesarxuchao self-assigned this May 23, 2016

hongchaodeng commented May 23, 2016

GC observes the orphan request (which is an Update event of the owner resource), before observing the creation/update of some dependents

Can you give an example for "Update event of the owner resource", "creation/update of some dependents"?

Where does the event come from? Why aren't they ordered?

caesarxuchao commented

@hongchaodeng, let's use a replication controller and its pods as an example.
In real time, things happen in this order:

  • the replication controller R is created;
  • the replication controller manager creates several pods P1-Pn, which are the dependents of R;
  • user deletes R with DeleteOptions.OrphanDependents=true and expects P1-Pn to be orphaned (i.e., to survive the deletion of R).

But from the GC's perspective, it may observe things in this order:

  • the replication controller R is created;
  • user deletes R with DeleteOptions.OrphanDependents=true
  • the replication controller manager creates several pods P1-Pn, which are the dependents of R;

The GC therefore doesn't know P1-Pn are dependents of R, so it fails to orphan them.
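
For concreteness, this is roughly what the user's delete call in the last step looks like, as a hedged sketch using current client-go signatures. The legacy DeleteOptions.OrphanDependents field has since been deprecated in favor of PropagationPolicy=Orphan; the namespace, RC name "R", and kubeconfig path are illustrative.

```go
package main

import (
	"context"
	"log"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	orphan := true
	// Delete the owner R; because OrphanDependents=true, the GC is expected to
	// strip the ownerReferences from P1-Pn instead of deleting them.
	err = client.CoreV1().ReplicationControllers("default").Delete(
		context.TODO(), "R", metav1.DeleteOptions{OrphanDependents: &orphan})
	if err != nil {
		log.Fatal(err)
	}
}
```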

caesarxuchao commented

@deads2k FYI

@caesarxuchao caesarxuchao changed the title Known issues of the garbage collector [Garbage Collector] Unbrella: TODOs of the garbage collector for beta Jul 20, 2016
@caesarxuchao caesarxuchao changed the title [Garbage Collector] Unbrella: TODOs of the garbage collector for beta [Garbage Collector] Umbrella: TODOs of the garbage collector for beta Jul 20, 2016
smarterclayton commented

It's reasonable for certain clients to potentially treat resource version as comparable - GC is certainly one of them. We do want to have UID tombstones eventually, but it's certainly a lot of work to get there. What other options do we have?

smarterclayton commented

Need a discovery mechanism to determine what resources GC should manage.

As discussed in today's SIG meeting, we need this for the "migrate client" to touch every object at least once as well.


caesarxuchao commented Jul 25, 2016

It's reasonable for certain clients to potentially treat resource version as comparable - GC is certainly one of them. We do want to have UID tombstones eventually, but it's certainly a lot of work to get there. What other options do we have?

@smarterclayton By UID tombstones do you mean etcd3's ability to store versions of deleted objects or constructing our own API?


smarterclayton commented Jul 26, 2016 via email

@caesarxuchao caesarxuchao changed the title [Garbage Collector] Umbrella: TODOs of the garbage collector for beta [Garbage Collector] Umbrella: known issues of garbage collector Aug 19, 2016
k8s-github-robot pushed a commit that referenced this issue Aug 25, 2016
Automatic merge from submit-queue

[GarbageCollector] add absent owner cache


**What this PR does / why we need it**:
Reduce the number of requests the garbage collector sends to the API server to check whether an owner exists.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

#26120


Currently, when processing an item in the dirtyQueue, the garbage collector issues a GET to check whether any of its owners exist. If the owner is a replication controller with 1000 pods, the garbage collector sends a GET for the RC 1000 times. This PR caches the owner's UID if it does not exist according to the API server. This cuts one third off the garbage collection time of the density test in gce-500 and gce-scale, where QPS is the bottleneck.
0xmichalis commented

The design doc said before orphaning dependents, GC should wait for the owner's controller to bump up owner's ObservedGeneration, which means the owner controller has acknowledged that it has observed the deletion of the owner and will stop adoption.

Is deletion incrementing metadata.Generation?

caesarxuchao commented

Is deletion incrementing metadata.Generation?

Yes.

k8s-github-robot pushed a commit that referenced this issue Mar 15, 2017
Automatic merge from submit-queue (batch tested with PRs 43106, 43110)

Wait for garbagecollector to be synced in test

Fix #42952

Without the `cache.WaitForCacheSync` in the test, it's possible for the GC to receive a merged event of the RC's creation and its update (the update that sets deletionTimestamp != 0) before it receives the creation event of the pod, so the GC may handle the foreground deletion of the RC before it adds the Pod to the dependency graph, hence the race.

With `cache.WaitForCacheSync` in the test, and because the GC runs a single thread to process graph changes, the Pod is guaranteed to be added to the dependency graph before the GC handles the foreground deletion of the RC.

Note that this pull fixes the race in the test. The race described in the first point of #26120 still exists.
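
As a rough sketch of the pattern this fix relies on (not the actual test code from the PR), a test can start its informers and block on cache.WaitForCacheSync before triggering the foreground deletion. The fake clientset and informer factory below are stand-ins for whatever the real test wires up.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/tools/cache"
)

func main() {
	client := fake.NewSimpleClientset()
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	rcInformer := factory.Core().V1().ReplicationControllers().Informer()

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)

	// Block until both caches have synced; after this point a single-threaded
	// graph builder would already have processed the existing pods before any
	// subsequent foreground deletion of the RC.
	if !cache.WaitForCacheSync(stopCh, podInformer.HasSynced, rcInformer.HasSynced) {
		fmt.Println("timed out waiting for caches to sync")
		return
	}
	fmt.Println("caches synced")
}
```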
fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2017
fejta-bot commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 30, 2018
fejta-bot commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
