
Using multiple zAIU and multiple CPU threads for operation acceleration #2497

Open
AlexandreEichenberger opened this issue Sep 12, 2023 · 3 comments

@AlexandreEichenberger
Collaborator

Operations on the CPU are ultimately accelerated using the omp dialect in MLIR. We exploit it via krnl.parallel, which in turn activates affine.parallel and scf.parallel or scf.forall. As of now, this scheme is not yet operational, but support for OpenMP is making good progress for the LinuxOnZ platform.
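As a rough sketch of the intended lowering path (function and value names below are illustrative, not taken from the actual onnx-mlir lowering):

```mlir
// Hypothetical sketch: a krnl.parallel-annotated loop lowered to
// scf.parallel. Iterations of scf.parallel may execute concurrently;
// a further lowering step maps the loop onto omp.parallel / omp.wsloop
// in the MLIR omp dialect.
func.func @add_1d(%a: memref<1024xf32>, %b: memref<1024xf32>,
                  %c: memref<1024xf32>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %n = arith.constant 1024 : index
  scf.parallel (%i) = (%c0) to (%n) step (%c1) {
    %x = memref.load %a[%i] : memref<1024xf32>
    %y = memref.load %b[%i] : memref<1024xf32>
    %z = arith.addf %x, %y : f32
    memref.store %z, %c[%i] : memref<1024xf32>
  }
  return
}
```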

Right now, multiple zAIU operations are launched in parallel using the async dialect. This usage pattern is fairly close to OpenMP's parallel sections scheme.
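The async-dialect pattern looks roughly like the following sketch (the accelerator operations themselves are elided; only the launch/await structure is shown):

```mlir
// Hypothetical sketch: two independent zAIU operations launched
// concurrently with the async dialect, then joined with async.await.
func.func @two_zaiu_ops() {
  %t0 = async.execute {
    // ... first zAIU operation ...
    async.yield
  }
  %t1 = async.execute {
    // ... second zAIU operation ...
    async.yield
  }
  async.await %t0 : !async.token
  async.await %t1 : !async.token
  return
}
```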

Right now, the affine/scf dialect parallel constructs don't have a concept of thread affinity, which we need for our purposes: threads driving zAIUs need to be assigned to cores that access different zAIUs, whereas threads for CPU operations that tightly share data would benefit from being assigned to cores that share the same cache hierarchy. I have opened a dialog with the MLIR community about including thread affinity in the high-level dialects' parallel constructs: https://discourse.llvm.org/t/thread-affinity-in-affine-parallel/73386
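For comparison, the MLIR omp dialect already carries OpenMP's proc_bind affinity hint, which affine.parallel / scf.parallel currently have no counterpart for (this is a sketch; the exact attribute syntax may differ across MLIR versions):

```mlir
// Sketch: omp.parallel with a proc_bind clause. "spread" asks the
// runtime to spread threads across places (e.g., distinct cores),
// "close" keeps them near the parent thread (e.g., shared caches).
func.func @affinity_example() {
  omp.parallel proc_bind(spread) {
    // ... parallel region body ...
    omp.terminator
  }
  return
}
```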

Ultimately, the use of multiple threads for zAIU parallelism and for CPU thread-level parallelism should rely on the same underlying mechanism; otherwise we end up with two software components, each requesting threads, that don't know about each other.

If we continue to use async for the coarse-grain parallelism, then we should ensure that async tasks get mapped to OpenMP threads. If that is not possible, we should eventually migrate the coarse-grain zAIU parallelism to OpenMP constructs.
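If that migration happens, the coarse-grain pattern could map onto OpenMP sections; in the MLIR omp dialect that is roughly (a sketch, with clause details elided):

```mlir
// Sketch: coarse-grain zAIU parallelism expressed with omp.sections
// instead of async.execute. Each omp.section may run on its own
// thread of the enclosing parallel region.
func.func @sections_example() {
  omp.parallel {
    omp.sections {
      omp.section {
        // ... first zAIU operation ...
        omp.terminator
      }
      omp.section {
        // ... second zAIU operation ...
        omp.terminator
      }
      omp.terminator
    }
    omp.terminator
  }
  return
}
```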

The goal of this issue is to raise awareness of the need for a common framework for all our parallel-thread needs.

@chenqiny

@AlexandreEichenberger @tungld this feature will be very important for LLMs.
With llama.cpp, parallel execution accelerates tokens generation speed significantly.

@AlexandreEichenberger
Collaborator Author

Yes, this feature has become a high priority for us.

@robben225

robben225 commented Nov 28, 2023

@AlexandreEichenberger How do you specify the number of threads when using omp in onnx-mlir (--parallel)?
