[FEAT] Add yaml argument to merlin stop-workers #98
Comments
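This is a good idea (and not too tough to implement). I think instead of targeting the original spec my_ensemble.yaml, we'd build this to target the provenance spec my_ensemble_20191217-133515/merlin_info/my_ensemble.yaml (similar to merlin restart).
My only question is whether this would scan the spec for worker names, queue names, or both.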
Ben:
When I run a series, the yaml file in each sub-directory constructs a queue name based on the name of the sub-directory. I want to get rid of all celery tasks associated with the queues named in the yaml file. In all the ensembles I have run so far, I keep the queue name the same if I submit multiple jobs to the batch queue (because I couldn’t finish the ensemble in a single batch time slot). At this stage I want to get rid of all tasks associated with the queue, not just the ones associated with a single batch job, so putting the “cancel tasks” in the directory for the ensemble (not the directory for a single batch run) would be more convenient for me. Your approach would be better if only one batch submission was bad and needed to be cleaned out. Either approach is acceptable to me.
I am not currently using worker names. What function do they serve?
The last time I checked, the tasks associated with a yaml file were processed in a first in, first out manner. I currently use a single queue per yaml file because multiple queues would not let me set “priorities”. If we change that, I might switch to multiple queues per ensemble so that post-processing can happen as soon as a run of the code completes. In that case, I would want to get rid of tasks associated with all queues in the yaml file. I think it would be simplest to scan the yaml file and remove the tasks in all mentioned queues rather than trying to add constructs allowing me finer grained control.
Steve
On Thursday, January 23, 2020 at 12:35 PM, Benjamin Bay wrote:
This is a good idea (and not too tough to implement). I think instead of targeting the original spec my_ensemble.yaml, we'd build this to target the provenance spec my_ensemble_20191217-133515/merlin_info/my_ensemble.yaml (similar to merlin restart).
My only question is whether this would scan the spec for worker names, queue names, or both.
@ben-bay If a user gives the yaml file, then I would expect that they want to stop the workers listed in that spec. The queues would be associated with those workers. Maybe there could be an option to use the queues instead. The user can give the workers a name, though, so the code will need to scan the worker_args before using the constructed name.
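As a rough illustration of the "scan the worker args first" point, here is a minimal sketch in Python. It assumes the worker args are a single string of Celery worker options, where `-n` / `--hostname` sets the node name. The function name `worker_name` and the fallback pattern `<spec worker>@%h` are hypothetical stand-ins, not Merlin's actual naming scheme.

```python
import shlex


def worker_name(spec_worker: str, args: str) -> str:
    """Return the name a worker would run under.

    Prefers a user-supplied Celery node name (-n / --hostname) found in
    the args string; otherwise falls back to a constructed name.
    The fallback pattern is an assumption made for illustration only.
    """
    tokens = shlex.split(args)
    for i, tok in enumerate(tokens):
        # Form: "-n NAME" or "--hostname NAME"
        if tok in ("-n", "--hostname") and i + 1 < len(tokens):
            return tokens[i + 1]
        # Form: "--hostname=NAME"
        if tok.startswith("--hostname="):
            return tok.split("=", 1)[1]
    # No explicit name given: use the (hypothetical) constructed name.
    return f"{spec_worker}@%h"


if __name__ == "__main__":
    print(worker_name("simworkers", "-l INFO --concurrency 4"))
    print(worker_name("simworkers", "-l INFO -n custom@%h"))
```

With something like this, a stop command could match on the explicit name when one is present and only fall back to the constructed name otherwise.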
This was added in #118
🚀 Feature Request
What problem is this feature looking to solve?
Merlin runs sometimes fail and leave celery workers running. I want a convenient way to get rid of the workers from the ensemble that failed.
Describe the solution you'd like
I could use "merlin -f stop-workers", but I only want to get rid of workers associated with the queues from the ensemble that failed. I can supply the --queues argument, but that hard codes the queue name. I want something like "merlin -f stop-workers my_ensemble.yaml".
I organize my merlin runs in series with a sub-directory for each ensemble. There is a yaml file with the same name (but different parameters) in each directory. I can put the above command in a shell script and use it with any ensemble in the series (the yaml file knows how to compute the queue name).
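To make the request concrete, here is a minimal sketch of how queue names might be collected from a spec so they do not have to be hard-coded. It assumes the yaml has already been loaded into a dict (for example with a YAML parser), that each step under `study` may set `run: task_queue`, and that steps without one use a default queue. The function name `queues_in_spec` and the default value `"merlin"` are assumptions made for illustration, not the actual implementation.

```python
def queues_in_spec(spec: dict, default_queue: str = "merlin") -> list:
    """Collect the distinct queue names used by the steps of a spec.

    `spec` is the parsed yaml as a plain dict. Steps that do not set
    `task_queue` are assumed to use `default_queue`.
    """
    queues = []
    for step in spec.get("study", []):
        queue = step.get("run", {}).get("task_queue", default_queue)
        if queue not in queues:  # keep first-seen order, drop repeats
            queues.append(queue)
    return queues


# Hypothetical spec contents, standing in for a loaded my_ensemble.yaml.
example_spec = {
    "study": [
        {"name": "simulate", "run": {"cmd": "echo sim", "task_queue": "ensemble_a"}},
        {"name": "post", "run": {"cmd": "echo post", "task_queue": "ensemble_a_post"}},
        {"name": "collect", "run": {"cmd": "echo done"}},
    ]
}

if __name__ == "__main__":
    print(queues_in_spec(example_spec))
```

The resulting list is what would otherwise be typed by hand after --queues, so the same shell script could be reused in every ensemble directory.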