It would be useful to have a batching parameter for grouping fast-running processes within the same HPC scheduler job. My idea would be to run a bunch of instances of the same process sequentially within a single scheduler job, so as to avoid overwhelming the scheduler with many jobs.
I sometimes find myself with what logically should be a single process that executes too quickly to merit allocating a job on the cluster, and if I do allocate one per task I end up with more than one million jobs, hitting the limit on concurrent jobs that the cluster admin allows. I could batch within the process execution in some custom way, but this requires ad-hoc code refactoring and is not very elegant (i.e. a for loop running what should be a single process over a batch of inputs).
I don't know how easy this would be to implement, but I was wondering whether anyone has a similar need or an idea of whether this is feasible.
This is a sensible request. It sounds very similar to the task batching being worked on in PR #3909. That work has not been finalized or merged yet, but would the features described there solve your problem?
This will be addressed by #3909. Until then, you can use this pattern, which is essentially the custom solution you described. I think we will try to merge the task batching PR sometime this year.
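For reference, the workaround pattern can be sketched in Nextflow using the `collate` channel operator, which groups emitted items into fixed-size lists so each scheduler job iterates over a batch instead of a single input. The process name, `my_tool` command, and `params` names below are placeholders, not part of any existing pipeline:

```nextflow
// Hypothetical sketch: batch many small inputs into one scheduler job each.
// Assumes params.inputs is a glob of input files and params.batch_size is tunable.

process PROCESS_BATCH {
    input:
    path(samples)        // each task receives a batch of files, not one file

    output:
    path("*.out")

    script:
    """
    # Run the fast step sequentially over the whole batch inside one job.
    for sample in ${samples}; do
        my_tool "\$sample" > "\$sample".out    # my_tool is a placeholder command
    done
    """
}

workflow {
    Channel.fromPath(params.inputs)
        | collate(params.batch_size)   // emit lists of up to batch_size files
        | PROCESS_BATCH
}
```

With, say, one million inputs and `batch_size = 100`, the scheduler sees roughly 10,000 jobs instead of a million, at the cost of coarser retry granularity (a failure reruns the whole batch).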