queueworker: prevent stop event on WorkerSleepException (PROJQUAY-1857) #737
Merged: kleesc merged 1 commit into quay:master from kleesc:prevent-stop-event-on-lock-exception on Apr 12, 2021
alecmerdler
previously approved these changes
Apr 10, 2021
b7e134e
to
6368ee0
Compare
@alecmerdler Will need reapproval.
Prevents the queueworker from setting the event that stops the poll_queue job when a WorkerSleepException is raised. On WorkerSleepException, the worker should instead skip the current iteration (go to sleep), e.g. when the NamespaceGCWorker can't acquire a lock because it is already held by another worker.

Reverts the gcworkers job timeout from 24h to 3h. In case of a deadlock between processes (for example, redeploying the app will not clear the existing Redis keys), 24h is too long to wait for the locks to expire so that the workers can resume work.

Adds the missing Counter increment on row deletion in the Manifest table.
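The intended control flow can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `poll_once`, `lock_unavailable`, and `broken` are hypothetical names, `WorkerSleepException` is redefined locally as a stand-in, and treating every other exception as fatal is an assumption made only to show the contrast.

```python
import threading


class WorkerSleepException(Exception):
    """Local stand-in: raised when the worker should skip this iteration."""


def poll_once(process_item, stop_event):
    """Run one polling iteration and return a short status string."""
    try:
        process_item()
    except WorkerSleepException:
        # Skip this iteration and leave the stop event untouched,
        # so the polling job keeps being scheduled.
        return "skipped"
    except Exception:
        # Assumption for this sketch: any other failure stops the job.
        stop_event.set()
        return "stopped"
    return "processed"


def lock_unavailable():
    # Hypothetical work item: the lock is held by another worker.
    raise WorkerSleepException("lock held by another worker")


def broken():
    # Hypothetical work item that fails unexpectedly.
    raise RuntimeError("unexpected failure")


stop_event = threading.Event()
sleep_status = poll_once(lock_unavailable, stop_event)
still_running = not stop_event.is_set()
fatal_status = poll_once(broken, stop_event)
```

In this sketch the stop event stays clear after the `WorkerSleepException` iteration, so a later poll can retry once the lock is free.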
6368ee0
to
829cc56
Compare
alecmerdler
approved these changes
Apr 12, 2021
kleesc
added a commit
to kleesc/quay
that referenced
this pull request
Apr 12, 2021
…7) (quay#737)
kleesc
added a commit
that referenced
this pull request
Apr 12, 2021
* gc: fix GlobalLock ttl unit and increase gc workers lock timeout (#712)

  Correctly converts the given ttl from seconds to milliseconds when passed to Redis (redlock uses 'px', not 'ex').

  Also increases the lock timeout of gc workers to 1 day. Some iterations, for repos with large numbers of tags (1000s), will take more than 15 minutes to complete. This change will prevent multiple workers from GCing the same repo, with one possibly preempting another. GlobalLock's ttl will make the lock available again when it expires, but will not actually stop execution of the current GC iteration until the GlobalLock context is done. Having a 1 day timeout should be enough.

  NOTE: The correct solution would be for GlobalLock to either renew the lock until the caller is done, or signal to the caller that the lock is no longer valid.

* gc: add metrics for deleted resources (#711)

  Add counters for the number of resources deleted by the gc worker, the repository gc worker and the namespace gc worker.

* queueworker: prevent stop event on WorkerSleepException (PROJQUAY-1857) (#737)

  Prevents the queueworker from setting the event that stops the poll_queue job when a WorkerSleepException is raised. On WorkerSleepException, the worker should instead skip the current iteration (go to sleep), e.g. when the NamespaceGCWorker can't acquire a lock because it is already held by another worker.

  Reverts the gcworkers job timeout from 24h to 3h. In case of a deadlock between processes (for example, redeploying the app will not clear the existing Redis keys), 24h is too long to wait for the locks to expire so that the workers can resume work.

  Adds the missing Counter increment on row deletion in the Manifest table.
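The unit conversion described for #712 can be sketched as follows. This is an illustration under stated assumptions, not the GlobalLock implementation: `lock_set_kwargs` is a hypothetical helper that only builds the options a client would pass to the Redis SET command, where `px` is an expiry in milliseconds and `ex` is an expiry in seconds.

```python
def lock_set_kwargs(ttl_seconds):
    """Build SET options for a lock key from a ttl given in seconds.

    Hypothetical helper: 'px' expects milliseconds, so the ttl is
    multiplied by 1000. Passing the raw seconds value as 'px' would
    make the lock expire 1000 times sooner than intended.
    """
    return {
        "nx": True,                # only set the key if it does not exist
        "px": ttl_seconds * 1000,  # expiry in milliseconds
    }


three_hours = 3 * 60 * 60  # the 3h timeout from this PR, in seconds
kwargs = lock_set_kwargs(three_hours)
```

With the 3h timeout, the `px` value is 10,800,000 ms; without the conversion the same lock would expire after about 10.8 seconds.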