
Adding NVPL BLAS support #8329

@nicholaiTukanov

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Hi all! Amazing work on llama.cpp!

I am an engineer at NVIDIA working on NVPL BLAS (a BLAS library designed for the NVIDIA Grace CPU).

I would like to add NVPL BLAS as a build option in the Makefile and ggml-blas.cpp.
In my measurements it outperforms the default GGML path at up to 32 threads (1.04x to 1.19x) on the pp512 prompt test from llama-bench at version b3322. At 64 and 72 threads it is slower than master (0.96x and 0.91x).

My changes can be found here: https://github.com/nicholaiTukanov/llama.cpp/tree/ntukanov/add-nvpl
Please let me know if there is anything else I need to do to get this approved. Thank you!

| CPU | Model | Model Size [GiB] | Threads | Test | t/s master | t/s nt/nvpl-blas | Speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Grace C2 | llama 7B Q8_0 | 6.67 | 1 | pp512 | 5.93 | 7.03 | 1.19 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 2 | pp512 | 12.12 | 13.97 | 1.15 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 4 | pp512 | 24.55 | 27.81 | 1.13 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 8 | pp512 | 50.19 | 55.49 | 1.11 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 16 | pp512 | 100.34 | 107.07 | 1.07 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 32 | pp512 | 197.88 | 205.09 | 1.04 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 64 | pp512 | 371.18 | 355.62 | 0.96 |
| Grace C2 | llama 7B Q8_0 | 6.67 | 72 | pp512 | 398.27 | 364.31 | 0.91 |

Motivation

This would improve prompt-processing performance for aarch64 users on NVIDIA Grace CPUs when running with up to 32 threads. See the benchmark table in the Feature Description.

Possible Implementation

  • Add GGML_NVPL build option into Makefile
  • Add NVPL_ENABLE_CBLAS code path into ggml-blas.cpp
    • Includes nvpl_blas.h and sets the number of threads for NVPL BLAS using nvpl_blas_set_num_threads()
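
The code path described in the bullets above could look roughly like the following. This is a minimal sketch, not the code from the branch: the helper name `blas_configure_threads` is hypothetical, and it assumes `nvpl_blas_set_num_threads()` takes a single `int` thread count and that `NVPL_ENABLE_CBLAS` is defined at compile time when the NVPL build option is enabled.

```cpp
#include <cassert>

#if defined(NVPL_ENABLE_CBLAS)
#include <nvpl_blas.h> // NVPL BLAS header named in the proposal
#endif

// Hypothetical helper, not taken from the branch: forwards the thread count
// to NVPL BLAS when the NVPL code path is compiled in.
// Returns true if NVPL BLAS was configured, false if the path is disabled.
static bool blas_configure_threads(int n_threads) {
    assert(n_threads > 0);
#if defined(NVPL_ENABLE_CBLAS)
    // Assumed signature: void nvpl_blas_set_num_threads(int)
    nvpl_blas_set_num_threads(n_threads);
    return true;
#else
    (void) n_threads; // nothing to configure without NVPL BLAS
    return false;
#endif
}
```

Keeping the call behind a preprocessor guard means builds without NVPL BLAS are unaffected. How the `GGML_NVPL` Makefile option defines the macro and links the library is left to the branch and is not shown here.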

Labels: enhancement (New feature or request)
