The Dark Reaper should run continuously instead of exiting early when there’s nothing to do #4377

Closed
dchristidis opened this issue Mar 4, 2021 · 2 comments · Fixed by #4379

Comments

@dchristidis

Motivation

The Dark Reaper has its own implementation of list_rses() which only returns RSEs that have at least one quarantined replica. This is not optimal for two reasons:

  1. If there are no quarantined replicas, then the Reaper exits early. Daemons running on Kubernetes are expected to run continuously.
  2. If quarantined replicas are added for an RSE that didn’t have any when the daemon started, then they won’t be processed until the next restart.

Modification

Replace the custom list_rses() implementation with the one from the core.
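
Both problems come from filtering the RSE list once at startup. The sketch below illustrates this with an in-memory dict standing in for the quarantined replicas table. Every name in it (`quarantine`, `rses_with_quarantined_replicas`, `all_rses`, `run_cycles`) is hypothetical and chosen for illustration; none of this is Rucio's actual API, and the loop is bounded only so the example terminates.

```python
# Hypothetical stand-in for the quarantined replicas table: RSE name -> paths.
quarantine = {"RSE_A": [], "RSE_B": []}


def rses_with_quarantined_replicas():
    """Custom-style listing: only RSEs that currently have quarantined replicas."""
    return [rse for rse, paths in quarantine.items() if paths]


def all_rses():
    """Generic-style listing: every RSE, whether or not it has work."""
    return list(quarantine)


def run_cycles(rses, cycles, arrivals=None):
    """Process `rses` for a bounded number of cycles (a real daemon loops forever).

    `arrivals` maps a cycle number to an (rse, path) pair quarantined just
    before that cycle. Returns the deleted paths, or None on early exit.
    """
    if not rses:
        return None  # empty list at startup: the daemon exits early
    arrivals = arrivals or {}
    deleted = []
    for cycle in range(cycles):
        if cycle in arrivals:
            rse, path = arrivals[cycle]
            quarantine[rse].append(path)
        # The RSE list is fixed at startup; only the per-RSE work is re-read.
        for rse in rses:
            while quarantine[rse]:
                deleted.append(quarantine[rse].pop(0))
    return deleted
```

With an empty table, `run_cycles(rses_with_quarantined_replicas(), 2, {1: ("RSE_A", "/f1")})` returns `None` (problem 1), while the same call with `all_rses()` returns `["/f1"]`. If only `RSE_B` has work at startup, the filtered list never includes `RSE_A`, so replicas added there later stay unprocessed (problem 2).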

@dchristidis dchristidis self-assigned this Mar 4, 2021
dchristidis added a commit to dchristidis/rucio that referenced this issue Mar 4, 2021
This is a quick and dirty solution to running the Dark Reaper on
Kubernetes. We can improve it with some refactoring in the future.
bari12 added a commit that referenced this issue Mar 16, 2021
…ow_the_Dark_Reaper_to_run_continuously

Consistency: Allow the Dark Reaper to run continuously #4377
bari12 pushed a commit that referenced this issue Mar 16, 2021
This is a quick and dirty solution to running the Dark Reaper on
Kubernetes. We can improve it with some refactoring in the future.
@bari12 bari12 added this to the 1.25.1 milestone Mar 16, 2021
dchristidis added a commit to dchristidis/rucio that referenced this issue Apr 7, 2021
Otherwise, the original list is lost and the Dark Reaper will not pick
up newly-added quarantined replicas.
bari12 pushed a commit that referenced this issue Apr 12, 2021
* Consistency: Do not overwrite the list of RSEs #4377

Otherwise, the original list is lost and the Dark Reaper will not pick
up newly-added quarantined replicas.

* Consistency: Correct function definition and documentation #4377
@ericvaandering

@dchristidis We still see issues with this in 1.25.2. The DR runs continuously, it just refuses to pick up new work from the database.

More specifically, we add a file to quarantined_replicas for Purdue and start DR. It deletes the file. We add another file to QR and DR says it has nothing to do. Then we restart DR and it picks up the work.

Thoughts?

@dchristidis

Rucio 1.25.1 introduced a partial fix (#4379): the Dark Reaper could now start even if the quarantined replicas table was empty. However, due to an oversight, it still wouldn’t process newly-added replicas.
Rucio 1.25.3 contains an additional fix (#4524) which should correctly address the issue.
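
The commit message for the additional fix ("Do not overwrite the list of RSEs") points at a specific class of bug. The sketch below shows one way it can happen and how keeping the original list avoids it. It is an illustration under assumptions, not the actual Rucio code: `run_buggy`, `run_fixed`, `pending`, and `arrivals` are hypothetical names, and the loop is bounded so the example terminates.

```python
def run_buggy(all_rses, pending, cycles, arrivals):
    """Overwrites the RSE list with the filtered result on every cycle."""
    rses = list(all_rses)
    deleted = []
    for cycle in range(cycles):
        for rse, path in arrivals.get(cycle, []):
            pending[rse].append(path)
        # Bug: once an RSE has no work, it is dropped from `rses` for good.
        rses = [rse for rse in rses if pending[rse]]
        for rse in rses:
            while pending[rse]:
                deleted.append(pending[rse].pop(0))
    return deleted


def run_fixed(all_rses, pending, cycles, arrivals):
    """Keeps the full RSE list and filters into a separate variable."""
    rses = list(all_rses)
    deleted = []
    for cycle in range(cycles):
        for rse, path in arrivals.get(cycle, []):
            pending[rse].append(path)
        # The original list survives, so new work is seen on later cycles.
        rses_to_process = [rse for rse in rses if pending[rse]]
        for rse in rses_to_process:
            while pending[rse]:
                deleted.append(pending[rse].pop(0))
    return deleted
```

Starting with one quarantined file and adding a second after an idle cycle reproduces the reported behaviour: the buggy loop deletes the first file and then finds nothing to do, whereas the fixed loop deletes both.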
