
Using multiple zAIU and multiple CPU threads for operation acceleration #2497

Open
AlexandreEichenberger opened this issue Sep 12, 2023 · 3 comments

@AlexandreEichenberger
Collaborator

Operations on the CPU are ultimately accelerated using the omp dialect in MLIR. We exploit it via krnl.parallel, which in turn activates affine.parallel and scf.parallel or scf.forall. As of now, this scheme is not yet operational, but support for OpenMP is making good progress for the LinuxOnZ platform.
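As a rough sketch of the intended lowering path (function and value names below are illustrative, not taken from the actual onnx-mlir lowering):

```mlir
// Hypothetical sketch: a krnl.parallel-annotated loop lowered to
// scf.parallel. Iterations of scf.parallel may execute concurrently;
// a further lowering step maps the loop onto omp.parallel / omp.wsloop
// in the MLIR omp dialect.
func.func @add_1d(%a: memref<1024xf32>, %b: memref<1024xf32>,
                  %c: memref<1024xf32>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %n = arith.constant 1024 : index
  scf.parallel (%i) = (%c0) to (%n) step (%c1) {
    %x = memref.load %a[%i] : memref<1024xf32>
    %y = memref.load %b[%i] : memref<1024xf32>
    %z = arith.addf %x, %y : f32
    memref.store %z, %c[%i] : memref<1024xf32>
  }
  return
}
```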

Right now, multiple zAIU operations are launched in parallel using the async dialect. This usage pattern is fairly close to OpenMP's parallel sections scheme.
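The async-dialect pattern looks roughly like the following sketch (the accelerator operations themselves are elided; only the launch/await structure is shown):

```mlir
// Hypothetical sketch: two independent zAIU operations launched
// concurrently with the async dialect, then joined with async.await.
func.func @two_zaiu_ops() {
  %t0 = async.execute {
    // ... first zAIU operation ...
    async.yield
  }
  %t1 = async.execute {
    // ... second zAIU operation ...
    async.yield
  }
  async.await %t0 : !async.token
  async.await %t1 : !async.token
  return
}
```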

Right now, the affine/scf dialect parallel constructs don't have a concept of thread affinity, which we need for our purposes: threads driving zAIUs need to be assigned to cores that access different zAIUs, whereas threads for CPU operations that tightly share data would benefit from being assigned to cores that share the same cache hierarchy. I have opened a dialog with the MLIR community about including thread affinity in the high-level dialects' parallel constructs: https://discourse.llvm.org/t/thread-affinity-in-affine-parallel/73386
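For comparison, the MLIR omp dialect already carries OpenMP's proc_bind affinity hint, which affine.parallel / scf.parallel currently have no counterpart for (this is a sketch; the exact attribute syntax may differ across MLIR versions):

```mlir
// Sketch: omp.parallel with a proc_bind clause. "spread" asks the
// runtime to spread threads across places (e.g., distinct cores),
// "close" keeps them near the parent thread (e.g., shared caches).
func.func @affinity_example() {
  omp.parallel proc_bind(spread) {
    // ... parallel region body ...
    omp.terminator
  }
  return
}
```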

Ultimately, the use of multiple threads for zAIU parallelism and for CPU thread-level parallelism should rely on the same underlying mechanism; otherwise we end up with two software components, each requesting threads, that don't know about each other.

If we continue to use async for the coarse-grain parallelism, then we should ensure that async tasks get mapped to OpenMP threads. If that is not possible, we should eventually migrate the coarse-grain zAIU parallelism to OpenMP constructs.
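If that migration happens, the coarse-grain pattern could map onto OpenMP sections; in the MLIR omp dialect that is roughly (a sketch, with clause details elided):

```mlir
// Sketch: coarse-grain zAIU parallelism expressed with omp.sections
// instead of async.execute. Each omp.section may run on its own
// thread of the enclosing parallel region.
func.func @sections_example() {
  omp.parallel {
    omp.sections {
      omp.section {
        // ... first zAIU operation ...
        omp.terminator
      }
      omp.section {
        // ... second zAIU operation ...
        omp.terminator
      }
      omp.terminator
    }
    omp.terminator
  }
  return
}
```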

The goal of this issue is to raise awareness of the need for a common framework for all our parallel-thread needs.

@chenqiny

@AlexandreEichenberger @tungld this feature will be very important for LLMs.
With llama.cpp, parallel execution accelerates tokens generation speed significantly.

@AlexandreEichenberger
Collaborator Author

Yes, this feature has become a high priority for us.

@robben225

robben225 commented Nov 28, 2023

@AlexandreEichenberger How do you specify the number of threads when using omp in onnx-mlir (--parallel)?
