Closed
Labels
enhancement (New feature or request)
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Hi all! Amazing work on llama.cpp!
I am an engineer from NVIDIA working on NVPL BLAS (BLAS library designed for NVIDIA Grace CPU).
I would like to add NVPL BLAS as a build option in the Makefile and ggml-blas.cpp.
I have found it to outperform GGML at 32 threads or fewer on the llama-bench prompt test (pp512) at version b3322; at 64 and 72 threads it is slower.
My changes can be found here https://github.com/nicholaiTukanov/llama.cpp/tree/ntukanov/add-nvpl.
Please let me know if there is anything else I need to do to get this approved. Thank you!
| CPU | Model | Model Size [GiB] | Threads | Test | t/s master | t/s nt/nvpl-blas | Speedup |
|---|---|---|---|---|---|---|---|
| Grace C2 | llama 7B Q8_0 | 6.67 | 1 | pp512 | 5.93 | 7.03 | 1.19 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 2 | pp512 | 12.12 | 13.97 | 1.15 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 4 | pp512 | 24.55 | 27.81 | 1.13 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 8 | pp512 | 50.19 | 55.49 | 1.11 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 16 | pp512 | 100.34 | 107.07 | 1.07 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 32 | pp512 | 197.88 | 205.09 | 1.04 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 64 | pp512 | 371.18 | 355.62 | 0.96 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 72 | pp512 | 398.27 | 364.31 | 0.91 |
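The Speedup column above appears to be the branch throughput divided by the master throughput, rounded to two decimals. A minimal sketch of that arithmetic follows; the helper names `speedup` and `round2` are hypothetical and are not part of llama.cpp or llama-bench.

```cpp
#include <cmath>

// Hypothetical helper: ratio of branch tokens/s to master tokens/s.
// Values above 1.0 mean the NVPL BLAS branch is faster.
inline double speedup(double master_tps, double branch_tps) {
    return branch_tps / master_tps;
}

// Hypothetical helper: round to two decimals, as the table does.
inline double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For the 1-thread row, `round2(speedup(5.93, 7.03))` gives 1.19; for the 72-thread row, `round2(speedup(398.27, 364.31))` gives 0.91, which is where the branch falls behind master.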
Motivation
This will provide better prompt performance for aarch64 users. See table in issue.
Possible Implementation
- Add `GGML_NVPL` build option into `Makefile`
- Add `NVPL_ENABLE_CBLAS` code path into `ggml-blas.cpp`
  - Includes `nvpl_blas.h` and sets the number of threads for NVPL BLAS using `nvpl_blas_set_num_threads()`
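The code path described above could look roughly like the sketch below. It assumes the build defines `NVPL_ENABLE_CBLAS` when `GGML_NVPL` is enabled and that `nvpl_blas_set_num_threads()` takes a single `int`. The wrapper `configure_blas_threads` is a hypothetical name used only for illustration and is not taken from the linked branch.

```cpp
#include <algorithm>

// Assumption: the build system defines NVPL_ENABLE_CBLAS when GGML_NVPL is set.
#if defined(NVPL_ENABLE_CBLAS)
#include <nvpl_blas.h>
#endif

// Hypothetical wrapper: forwards the requested thread count to NVPL BLAS
// when that backend is compiled in, and is a no-op otherwise.
// Returns the thread count that was actually requested (at least 1).
inline int configure_blas_threads(int n_threads) {
    const int n = std::max(1, n_threads);  // guard against zero or negative counts
#if defined(NVPL_ENABLE_CBLAS)
    nvpl_blas_set_num_threads(n);
#endif
    return n;
}
```

Keeping the NVPL call behind the preprocessor guard means the same source still compiles on machines without NVPL installed, which is the usual pattern for optional BLAS backends.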