Reduced performance due to Ray process core pinning #115

g4rg · 2023-11-17T17:55:17Z

When using Ray, one worker process is spawned per logical CPU core; however, all of these processes end up bound to the same two cores.

This can be observed with taskset after launching the engine. The example below is from a 16-thread machine.

$ ps xo '%p %c' | grep ray:: | awk '{print $1;}' | xargs -L1 taskset -cp
pid 23937's current affinity list: 0,8
pid 23938's current affinity list: 0,8
pid 23939's current affinity list: 0,8
pid 23940's current affinity list: 0,8
pid 23941's current affinity list: 0,8
pid 23942's current affinity list: 0,8
pid 23943's current affinity list: 0,8
pid 23944's current affinity list: 0,8
pid 23945's current affinity list: 0,8
pid 23946's current affinity list: 0,8
pid 23947's current affinity list: 0,8
pid 23948's current affinity list: 0,8
pid 23949's current affinity list: 0,8
pid 23951's current affinity list: 0,8
pid 23952's current affinity list: 0,8
pid 24923's current affinity list: 0,8
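On a machine with many workers, the same information can be read from /proc and collapsed into a count per distinct affinity list. This is a minimal sketch assuming Linux (where /proc/<pid>/status has a Cpus_allowed_list field) and POSIX sh. `affinity_summary` is a hypothetical helper name, not part of Ray or Aphrodite. Applied to the processes above, it would report all 16 PIDs under the single list 0,8.

```sh
#!/bin/sh
# Hypothetical helper: read PIDs on stdin, look up each one's affinity list
# in /proc/<pid>/status (Linux only), and print how many PIDs share each list.
affinity_summary() {
    while read -r pid; do
        awk '/^Cpus_allowed_list/ { print $2 }' "/proc/$pid/status"
    done | sort | uniq -c
}

# Usage, selecting the workers the same way as the taskset loop above:
# ps xo '%p %c' | grep ray:: | awk '{print $1;}' | affinity_summary
```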

As a workaround, the core affinity can be changed manually after launch, e.g. by running taskset -cp <core> <pid> on each worker process.

Example assigning each process its own core:

cpuid=0 ; for pid in $(ps xo '%p %c' | grep ray:: | awk '{print $1;}') ; do taskset -cp $cpuid $pid ; cpuid=$(($cpuid + 1)) ; done
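The one-liner assumes there are at most as many ray:: processes as logical CPUs, and that CPUs are numbered 0 to N-1. A slightly more defensive variant, sketched under the same numbering assumption, wraps the CPU index around with a modulo and can print the commands instead of running them. `assign_round_robin` and `DRY_RUN` are hypothetical names introduced here, not something Ray or the issue provides.

```sh
#!/bin/sh
# Hypothetical helper: pin each PID passed after the CPU count to its own
# logical CPU, cycling back to CPU 0 once every CPU has been used.
# Assumes logical CPUs are numbered contiguously from 0 to NCPU-1.
# With DRY_RUN=1 it prints the taskset commands instead of running them.
assign_round_robin() {
    ncpu=$1
    shift
    cpuid=0
    for pid in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "taskset -cp $cpuid $pid"
        else
            taskset -cp "$cpuid" "$pid"
        fi
        cpuid=$(( (cpuid + 1) % ncpu ))
    done
}

# Usage, selecting workers the same way as the one-liner above:
# assign_round_robin "$(getconf _NPROCESSORS_ONLN)" $(ps xo '%p %c' | grep ray:: | awk '{print $1;}')
```

Wrapping means two workers may share a core once they outnumber the CPUs, which is still far less contention than every worker sharing the same two.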

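The effect on performance is significant: with each worker pinned to its own core, average throughput is roughly 2.4x to 2.7x higher at every concurrency level tested.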

| Concurrent requests | Avg tokens/s | Avg tokens/s with fix |
|---|---|---|
| 1 | 12.45 | 33.00 |
| 4 | 11.92 | 28.78 |
| 8 | 11.19 | 27.85 |
| 16 | 10.13 | 25.31 |
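The speedup for each row is the ratio of the two throughput columns. As a quick arithmetic check, the awk sketch below divides the with-fix column by the baseline column using the values from the benchmark table. `speedups` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: read "concurrency baseline_tps fixed_tps" rows on stdin
# and print fixed_tps / baseline_tps for each row, to two decimal places.
speedups() {
    awk '{ printf "%s concurrent: %.2fx\n", $1, $3 / $2 }'
}

# Rows copied from the benchmark table.
printf '%s\n' '1 12.45 33.00' '4 11.92 28.78' '8 11.19 27.85' '16 10.13 25.31' | speedups
# 1 concurrent: 2.65x
# 4 concurrent: 2.41x
# 8 concurrent: 2.49x
# 16 concurrent: 2.50x
```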

Benchmark environment:
- 4x A100 SXM NVL
- Aphrodite 0.4.2, OpenAI-compatible endpoint
- Llama 2 13B
- llmperf, default settings


50h100a · 2023-11-22T03:10:58Z

This is caused by an imported package that pins the process's CPU affinity; Ray's worker processes then inherit that affinity when they are spawned.

One solution would be to use psutil to reset the CPU affinity of the main process after the import but before Ray spawns its worker processes.
Another would be to give each worker its own CPU affinity after the worker is created.

The former would leave processes hopping between cores, while the latter is hard to solve in general on platforms where the split between physical and virtual (SMT) cores is unknown. The solution I eventually used was to give each worker a job that pinned it to specific cores based on the details of the deployment.
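The inheritance described above is standard Linux behavior: a child process starts with a copy of its parent's CPU affinity mask. The sketch below illustrates both halves of that in plain shell rather than the psutil route, assuming Linux with util-linux taskset and /proc mounted. A shell pinned to one CPU (standing in for the import-time pin) passes that pin to what it runs, while a shell that widens its own mask back before running anything does not. The variable names are illustrative only.

```sh
#!/bin/sh
# Sketch assuming Linux with util-linux taskset and /proc mounted.

# Shell code that prints the affinity list of the process it runs in.
# It is always run via `sh -c`, so it reports a freshly spawned process.
SHOW='awk "/^Cpus_allowed_list/ { print \$2 }" /proc/self/status'

orig=$(sh -c "$SHOW")   # starting mask, e.g. "0-15" on a 16-thread machine
first=${orig%%[-,]*}    # first CPU in that list, e.g. "0"

# A shell started pinned to one CPU (standing in for the import-time pin):
# whatever it runs inherits that mask.
pinned=$(taskset -c "$first" sh -c "$SHOW")

# The same pinned shell, but it first widens its own mask ($$ is its PID)
# back to the original list: the shell analogue of resetting the affinity
# after import but before starting Ray.
restored=$(taskset -c "$first" sh -c "taskset -cp '$orig' \$\$ >/dev/null; $SHOW")

echo "original: $orig"
echo "pinned:   $pinned"
echo "restored: $restored"
```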

50h100a@e47ce80
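For the physical-versus-virtual problem, one possible approach on Linux is to read the CPU-to-core mapping from lscpu and keep only the first logical CPU of each physical core, so that no two workers land on SMT siblings of the same core. This is a sketch, not the deployment-specific job described in the comment above (whose code isn't shown here); `one_cpu_per_core` and `pin_workers_per_core` are hypothetical helpers, and the result is only as accurate as the topology lscpu reports.

```sh
#!/bin/sh
# Hypothetical helper: read "CPU,CORE,SOCKET" rows, as printed by
# `lscpu -p=CPU,CORE,SOCKET`, on stdin and print the first logical CPU
# listed for each (socket, core) pair. Lines starting with '#' are skipped.
one_cpu_per_core() {
    awk -F, '!/^#/ && !seen[$3 "," $2]++ { print $1 }'
}

# Hypothetical helper: pin each ray:: worker (selected as in the issue) to a
# different physical core; stops once every physical core has been used.
pin_workers_per_core() {
    set -- $(lscpu -p=CPU,CORE,SOCKET | one_cpu_per_core)
    for pid in $(ps xo '%p %c' | grep ray:: | awk '{print $1;}'); do
        [ "$#" -gt 0 ] || break
        taskset -cp "$1" "$pid"
        shift
    done
}
```

Calling `pin_workers_per_core` after the engine is up would take the place of the one-liner workaround; any workers beyond the number of physical cores keep the affinity they already had.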

AlpinDale · 2023-12-30T09:47:56Z

Fixed with #187

We now properly set the affinities at launch.

AlpinDale self-assigned this Nov 17, 2023

g4rg changed the title from "Dramatically reduced performance when using more than 2 GPUs (Ray)" to "Reduced performance due to Ray process core pinning" Nov 21, 2023

KaraKaraWitch mentioned this issue Dec 27, 2023: Set scheduler affinity before initializing ray clusters #186 (closed)

AlpinDale linked a pull request Dec 27, 2023 that will close this issue: Set scheduler affinity before initializing ray clusters #186 (closed)

KaraKaraWitch mentioned this issue Dec 27, 2023: Set CPU Affinity: Electric Boogaloo V2 #187 (merged)

AlpinDale closed this as completed Dec 30, 2023
