
cuBLAS error 15 at ggml-cuda.cu:7548: the requested functionality is not supported #1587

Closed
themanyone opened this issue Dec 3, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@themanyone

The cuBLAS build compiles but does not work at runtime.

It seems related to issue #1447, but when I run the executable I get a different error.

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

cuBLAS error 15 at ggml-cuda.cu:7548: the requested functionality is not supported
current device: 0

I traced it to the following CUBLAS_CHECK. I can probably just comment it out in the code; I will try that later, since I have other stuff to do.

    if (r2 == 1 && r3 == 1 && src0->nb[2]*src0->ne[2] == src0->nb[3] && src1->nb[2]*src1->ne[2] == src1->nb[3]) {
        // there is no broadcast and src0, src1 are contiguous across dims 2, 3
        // use cublasGemmStridedBatchedEx
        CUBLAS_CHECK(
        cublasGemmStridedBatchedEx(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N,
                ne01, ne11, ne10,
                &alpha_f16, (const char *) src0_as_f16, CUDA_R_16F, nb01/sizeof(half),  src0->nb[2]/sizeof(half),  // strideA
                            (const char *) src1_as_f16, CUDA_R_16F, nb11/sizeof(float), src1->nb[2]/sizeof(float), // strideB
                &beta_f16,  (      char *)     dst_f16, CUDA_R_16F, ne01,               dst->nb[2]/sizeof(float),  // strideC
                ne12*ne13,
                CUBLAS_COMPUTE_16F,
                CUBLAS_GEMM_DEFAULT_TENSOR_OP));
    } else {
        // use cublasGemmBatchedEx
        const int ne23 = ne12*ne13;

Supplementary system info.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0

nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M3000M                  Off | 00000000:01:00.0  On |                  N/A |
| N/A   66C    P0              31W /  75W |   1356MiB /  4096MiB |     95%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1241      G   /usr/libexec/Xorg                           152MiB |
|    0   N/A  N/A      2206      G   /usr/lib64/firefox/firefox                  209MiB |
|    0   N/A  N/A     57120      C   python                                      985MiB |
+---------------------------------------------------------------------------------------+
@bobqianic
Collaborator

cublasGemmStridedBatchedEx requires a GPU architecture with compute capability 5.0 or higher. It's strange, though, because the Quadro M3000M has a compute capability of 5.0, so this error shouldn't be occurring.


@bobqianic bobqianic added the bug Something isn't working label Dec 3, 2023
@themanyone
Author

YouChat has this to say about cuBLAS error code 15:

The error code 15 in cuBLAS is associated with the CUBLAS_STATUS_NOT_INITIALIZED error. This error typically occurs when attempting to use cuBLAS without initializing it properly. Here are some snippets from the search results that provide information about this error:

From Source [1] (https://github.com/PaddlePaddle/PaddleOCR/issues/9084), an OSError: (External) CUBLAS error(15) is mentioned, with a hint to search for error code (15) on the NVIDIA cuBLAS documentation website.
The source also mentions the CUBLAS_STATUS_NOT_INITIALIZED error in a comment about a failure to create a cublas handle.

Based on the provided search snippets, it is clear that the error code 15 in cuBLAS is related to the CUBLAS_STATUS_NOT_INITIALIZED error, indicating a failure to initialize cuBLAS properly.

@themanyone
Author

themanyone commented Dec 3, 2023

If I run with the ./main -ng flag, it works. And, strangely, it is much faster and does use some of the GPU: about 132 MiB, as shown by nvidia-smi.

Verified again: running the cuBLAS-enabled version with the -ng flag is indeed over 5x faster than the one compiled without cuBLAS support.

@themanyone
Author

themanyone commented Dec 5, 2023

I neglected to mention that I had to modify the Makefile:

NVCCFLAGS = -allow-unsupported-compiler ...

because of this error:

/usr/local/cuda/include/crt/host_config.h:143:2: error: #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  143 | #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

I was also using CUDA_ARCH_FLAG=compute_50 because arch=native did not work in that build. After a fresh git pull, however, this change is no longer required.
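Putting the build tweaks in one place (a sketch against the whisper.cpp Makefile of that era; WHISPER_CUBLAS is the cuBLAS build switch as I recall it, so treat the exact variable names as assumptions for newer trees):

```makefile
# Host gcc newer than 12: bypass nvcc's compiler version check.
NVCCFLAGS += -allow-unsupported-compiler

# On older checkouts where arch=native failed, the arch had to be pinned:
#   make WHISPER_CUBLAS=1 CUDA_ARCH_FLAG=compute_50
```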

The bug is still present in the latest pull.
cuBLAS error 15 at ggml-cuda.cu:7548: the requested functionality is not supported
current device: 0

The -ng flag still works around the bug like before.

@pjuhasz

pjuhasz commented Dec 27, 2023

I have the exact same issue: the ./main program crashes with the same error message (except now it refers to ggml-cuda.cu:8456, with git version 37a70).

I have a Quadro M2000M in a Thinkpad P50.

I can also confirm that the same cuBLAS-enabled executable does not crash with the -ng switch, and that it uses the GPU to some extent and it is faster than the regular CPU-only binary (except on my machine the speedup is only 2-3x).

@pjuhasz

pjuhasz commented Dec 27, 2023

Possibly related: ggerganov/llama.cpp#4395

@bobqianic
Collaborator

> I can also confirm that the same cuBLAS-enabled executable does not crash with the -ng switch, and that it uses the GPU to some extent and it is faster than the regular CPU-only binary (except on my machine the speedup is only 2-3x).

See #1688 (comment)

@themanyone
Author

The error, as well as the need for the -ng workaround, is fixed as of release v1.5.3. Closing.

3 participants