Consider the following scenario. A simulation is split into 10 independent (parallel) tasks and has a number of parameters. 4 engines are up and connected to a single controller. Several clients submit the same simulation, each with different simulation parameters, to the controller via load-balanced views.
What seems to happen now is that the controller schedules the 10 tasks of the first client on its engines. The 10 tasks of the second client only start after all tasks of the first client have finished. Let's assume there are two clients and all tasks take exactly the same time T to compute. During the first 8 tasks of the first client, all 4 engines are at full capacity; during the remaining 2 tasks, 2 of the engines sit idle. That leads to an overall computation time of 6T (3T per client). If the tasks of all connected clients shared a single task pool, the computation time would have been 5T (20 tasks on 4 engines).
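The arithmetic above can be checked with a small sketch (the helper name `makespan` is just illustrative): each batch of tasks only starts once the previous batch has drained, so the per-client makespan is the ceiling of tasks over engines.

```python
def makespan(batches, n_engines):
    """Total time (in units of T) to run task batches on n_engines,
    where each task costs 1 T and a batch is only scheduled once the
    previous batch has completely finished (no shared task pool)."""
    total = 0
    for n_tasks in batches:
        # ceiling division: full waves of n_engines plus a partial wave
        total += -(-n_tasks // n_engines)
    return total

# Two clients with 10 tasks each on 4 engines:
sequential = makespan([10, 10], 4)  # batches kept separate -> 3T + 3T = 6T
shared = makespan([20], 4)          # one shared pool       -> 20/4  = 5T
print(sequential, shared)           # 6 5
```

The one idle-capacity wave per client (2 engines idle for 1T, twice) is exactly the extra T that the shared pool recovers.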
I think this is dealt with by setting the high-water mark, TaskScheduler.hwm, in the ipcontroller_config.py of your profile. I encountered this issue with compute-intensive tasks of varying length, and wound up setting it to 2, instead of the default 0, so there would be no latency. (My numbers are 1728 tasks and 96 engines.) However, your case would indicate 1.
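As a sketch, the setting would go in the profile's controller config file (the path below assumes the default profile; adjust for yours):

```python
# ~/.ipython/profile_default/ipcontroller_config.py
c = get_config()

# hwm = 1: buffer at most 1 unassigned task per engine, so remaining
# tasks stay in the scheduler's pool and any idle engine can take them,
# regardless of which client submitted them.
c.TaskScheduler.hwm = 1
```

With hwm = 0 the scheduler greedily assigns all pending tasks to engines up front, which is what pins one client's whole batch to the engines before the next client's tasks are considered.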
It seems odd that the default for load balancing effectively disables it, but Min explained that he was concerned with latency.
I was not aware of the HWM setting, but HWM=1 would definitely be what I would expect as default behavior. So with HWM set to 1, does it make any difference whether tasks are submitted from a single client or several?
No, the scheduler does not make any decisions based on who submitted each task. All tasks are equal, and ZeroMQ fair-queues requests on the incoming socket, so if two clients submit a bunch of tasks very quickly at the same time, they will be interleaved. But this requires that they really are submitted at the same time.
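The fair-queuing behavior can be illustrated without ZeroMQ itself: when several senders all have messages pending, the receiving socket takes one message from each in round-robin order. A minimal stand-in (the function name is hypothetical):

```python
from itertools import chain, zip_longest

def fair_queue(*senders):
    """Round-robin interleave pending messages from several senders,
    analogous to how a ZeroMQ socket fair-queues across its peers
    when all of them have messages waiting."""
    sentinel = object()
    interleaved = chain.from_iterable(zip_longest(*senders, fillvalue=sentinel))
    return [msg for msg in interleaved if msg is not sentinel]

client_a = ["a1", "a2", "a3"]
client_b = ["b1", "b2", "b3"]
print(fair_queue(client_a, client_b))  # ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
```

The caveat in the comment above applies here too: this interleaving only happens if both clients' messages are actually queued at the same time; a client that submits first simply gets all of its tasks in first.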