g4rg changed the title from "Dramatically reduced performance when using more than 2 GPUs (Ray)" to "Reduced performance due to Ray process core pinning" on Nov 21, 2023
This is due to an imported package pinning the processor affinity, which Ray then inherits when spawning processes.
One solution would be to use psutil to clear the processor affinity on the main thread after import but before spawning the Ray worker processes.
Another would be to assign each worker its own processor affinity after worker creation.
The former would leave workers hopping between cores, while the latter is not easily solved in the general case, on platforms with unknown physical-vs-virtual core counts. The solution I eventually used was to give each worker a job that pinned it to specific cores, chosen from deployment details.
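Both options above can be sketched in a few lines of Python. This is only an illustration, not the code used in the deployment described: it uses the standard library's `os.sched_setaffinity` (Linux only) rather than psutil, whose `psutil.Process().cpu_affinity(...)` serves the same purpose. The helper names `clear_affinity` and `split_cores` are hypothetical, and the contiguous split assumes that neighbouring core numbers are a sensible grouping on the target machine, which is exactly the physical-vs-virtual detail that varies between platforms.

```python
import os


def clear_affinity(pid: int = 0) -> set[int]:
    """Let `pid` (0 = the calling process) run on every CPU the OS reports.

    Intended to be called after the imports that pin the affinity,
    but before any worker processes are spawned.
    """
    all_cpus = set(range(os.cpu_count() or 1))
    os.sched_setaffinity(pid, all_cpus)  # Linux-only call
    return os.sched_getaffinity(pid)


def split_cores(cores: list[int], n_workers: int) -> list[list[int]]:
    """Deal `cores` out to `n_workers` in contiguous, near-equal chunks."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(len(cores), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` workers receive one additional core.
        size = base + (1 if i < extra else 0)
        chunks.append(cores[start:start + size])
        start += size
    return chunks
```

With the second approach, each worker would then run something like `os.sched_setaffinity(0, chunk)` on its own chunk as its first job. How that job is dispatched to a given worker depends on the deployment and is not shown here.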
When using Ray, one worker process is spawned per logical CPU core; however, all of the processes end up bound to the same two cores.
This can be observed with `taskset` after launching the engine. The example below is on a 16T machine.

As a workaround, core affinity can be manually changed after launch, e.g. using `taskset -cp <core> <pid>` on each worker process.

Example assigning one core per process uniquely:
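A Python equivalent of running `taskset -cp <core> <pid>` once per worker might look like the sketch below. It assumes Linux, assumes the worker PIDs have already been collected by some other means (for example from a process listing), and uses hypothetical helper names (`plan_unique_cores`, `apply_plan`, `show_affinity`) that do not come from Ray or aphrodite.

```python
import os


def plan_unique_cores(pids: list[int], cores: list[int]) -> dict[int, int]:
    """Pair each PID with its own core; refuse if there are too few cores."""
    if len(pids) > len(cores):
        raise ValueError("more processes than available cores")
    return dict(zip(pids, cores))


def apply_plan(plan: dict[int, int]) -> None:
    """Pin every process in the plan to its single assigned core."""
    for pid, core in plan.items():
        # Equivalent in effect to: taskset -cp <core> <pid>
        os.sched_setaffinity(pid, {core})


def show_affinity(pids: list[int]) -> dict[int, list[int]]:
    """Report the current affinity of each PID."""
    return {pid: sorted(os.sched_getaffinity(pid)) for pid in pids}
```

Calling `show_affinity(pids)` before and after `apply_plan(plan_unique_cores(pids, cores))` gives a quick check that the workers no longer share the same cores. Changing the affinity of another process requires suitable permissions, typically the same user or root.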
The effect of this on performance is significant.
Benchmark environment:
4xA100 SXM NVL
aphrodite 0.4.2 openai endpoint
llama2 13b
llmperf default settings