
Reduced performance due to Ray process core pinning #115

Closed
g4rg opened this issue Nov 17, 2023 · 2 comments
g4rg (Contributor) commented Nov 17, 2023

When using Ray, one worker process is spawned per logical CPU core; however, all of these processes end up bound to the same two cores.

This can be observed with taskset after launching the engine. The example below is from a 16-thread machine.

$ ps xo '%p %c' | grep ray:: | awk '{print $1;}' | xargs -L1 taskset -cp
pid 23937's current affinity list: 0,8
pid 23938's current affinity list: 0,8
pid 23939's current affinity list: 0,8
pid 23940's current affinity list: 0,8
pid 23941's current affinity list: 0,8
pid 23942's current affinity list: 0,8
pid 23943's current affinity list: 0,8
pid 23944's current affinity list: 0,8
pid 23945's current affinity list: 0,8
pid 23946's current affinity list: 0,8
pid 23947's current affinity list: 0,8
pid 23948's current affinity list: 0,8
pid 23949's current affinity list: 0,8
pid 23951's current affinity list: 0,8
pid 23952's current affinity list: 0,8
pid 24923's current affinity list: 0,8

As a workaround, core affinity can be changed manually after launch, e.g. by running taskset -cp <core> <pid> on each worker process.

Example assigning one unique core per process:

cpuid=0
for pid in $(ps xo '%p %c' | grep ray:: | awk '{print $1;}'); do
  taskset -cp "$cpuid" "$pid"
  cpuid=$((cpuid + 1))
done
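The same round-robin pinning can be sketched in Python with psutil. This is purely illustrative, not project code: the helper names are made up, and it assumes psutil is installed and you have permission over the target PIDs.

```python
# Illustrative sketch: assign each worker PID its own core, round-robin,
# equivalent to the taskset loop above.

def plan_affinity(pids, num_cpus):
    """Map each PID to a single core index, cycling through available cores."""
    return {pid: i % num_cpus for i, pid in enumerate(pids)}

def apply_affinity(mapping):
    """Apply the plan; needs psutil and permission over the target PIDs."""
    import psutil  # imported here so the pure planning step has no dependency
    for pid, core in mapping.items():
        psutil.Process(pid).cpu_affinity([core])  # like `taskset -cp <core> <pid>`
```

For example, `plan_affinity([23937, 23938, 23939], 16)` yields `{23937: 0, 23938: 1, 23939: 2}`; splitting planning from application keeps the mapping logic testable without touching live processes.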

The effect of this on performance is significant.

| Concurrent requests | Avg T/s | Avg T/s w/ fix |
|---------------------|---------|----------------|
| 1                   | 12.45   | 33.00          |
| 4                   | 11.92   | 28.78          |
| 8                   | 11.19   | 27.85          |
| 16                  | 10.13   | 25.31          |

Benchmark environment:
- 4x A100 SXM NVL
- aphrodite 0.4.2 OpenAI endpoint
- Llama 2 13B
- llmperf default settings

@AlpinDale AlpinDale self-assigned this Nov 17, 2023
@g4rg g4rg changed the title Dramatically reduced performance when using more than 2 GPUs (Ray) Reduced performance due to Ray process core pinning Nov 21, 2023
50h100a (Collaborator) commented Nov 22, 2023

This is caused by an imported package pinning the processor affinity of the main process, which Ray then inherits when spawning its worker processes.

One solution would be to use psutil to clear the processor affinity in the main process after the import but before spawning the Ray workers.
Another would be to assign each worker its own processor affinity after worker creation.

The former still leaves workers free to hop between cores, while the latter is hard to solve in the general case, since the split between physical and virtual (SMT) cores varies across platforms. The solution I eventually used was to give each worker a job that assigned it specific cores, based on deployment details.
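The first option described above (clearing the inherited mask on the main process before Ray spawns workers) could look roughly like this on Linux. This is a sketch of the idea only, not the fix that was merged; `clear_affinity` is an illustrative name.

```python
# Sketch only: reset this process's CPU affinity to every core the scheduler
# allows, so child processes (e.g. Ray workers) don't inherit a narrow mask.
import os

def clear_affinity(pid=0):  # pid 0 means "the calling process"
    if not hasattr(os, "sched_setaffinity"):  # Linux-only API
        return None
    try:
        os.sched_setaffinity(pid, range(os.cpu_count()))
    except OSError:
        pass  # a cgroup/cpuset may forbid some CPUs; keep whatever is allowed
    return os.sched_getaffinity(pid)  # the mask actually in effect now
```

As noted, this only stops workers from all inheriting the same narrow mask; it does not pin each worker to a distinct core, so they remain free to migrate.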

50h100a@e47ce80

AlpinDale (Member) commented

Fixed with #187

We now properly set the affinities at launch.
