Fix mul-mat error for older GPUs #669

bssrdf · 2023-12-28T04:53:18Z

This PR fixes issue #668. With this PR, the test cases test-conv1d and test-conv2d pass on a GTX 1070 with CUDA 12.1. It also fixes problems in downstream projects that use ggml.
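The branch name (fix_mul_mat_cublas_for_cp_lt_70) and the review below suggest that the problem is in the fp32 cublasSgemm path used on GPUs below compute capability 7.0. On that path src1 has to be in fp32, so an f16 src1 must be converted first, and the converted buffer (not the raw one) must be passed to the GEMM. Below is a minimal host-side C++ sketch of that pattern. `TensorType`, `half_bits_to_float` and `src1_as_f32` are hypothetical names for illustration; this is not the ggml-cuda.cu implementation, which does the conversion on the GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ggml's tensor type tag (not the real ggml enum).
enum class TensorType { F32, F16 };

// Decode one IEEE 754 binary16 value (given as raw bits) into a float.
float half_bits_to_float(uint16_t h) {
    const uint32_t sign = (h >> 15) & 0x1;
    const uint32_t exp  = (h >> 10) & 0x1f;
    const uint32_t mant = h & 0x3ff;
    float value;
    if (exp == 0) {
        // Zero or subnormal: mant * 2^-24
        value = std::ldexp(static_cast<float>(mant), -24);
    } else if (exp == 0x1f) {
        // Infinity (mant == 0) or NaN
        value = mant == 0 ? INFINITY : NAN;
    } else {
        // Normal: (1 + mant/1024) * 2^(exp-15) == (1024 + mant) * 2^(exp-25)
        value = std::ldexp(static_cast<float>(mant + 1024), static_cast<int>(exp) - 25);
    }
    return sign ? -value : value;
}

// Return an fp32 pointer to src1 that is safe to hand to an fp32 GEMM.
// F32 data is used directly; F16 data is converted into `scratch` and the
// scratch buffer is returned instead. Handing the raw F16 bytes to an fp32
// GEMM would reinterpret pairs of halves as meaningless floats.
const float * src1_as_f32(TensorType type, const void * data, size_t n,
                          std::vector<float> & scratch) {
    if (type == TensorType::F32) {
        return static_cast<const float *>(data);
    }
    const uint16_t * h = static_cast<const uint16_t *>(data);
    scratch.resize(n);
    for (size_t i = 0; i < n; ++i) {
        scratch[i] = half_bits_to_float(h[i]);
    }
    return scratch.data();
}
```

In this sketch the converted data lives in the caller-owned `scratch` vector, so the returned pointer is valid only while that vector is alive and not resized.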

Green-Sky · 2023-12-28T10:34:46Z

src/ggml-cuda.cu

```diff
@@ -7615,16 +7615,25 @@ static void ggml_cuda_op_mul_mat_cublas(
         const to_fp32_cuda_t to_fp32_cuda = ggml_get_to_fp32_cuda(GGML_TYPE_F16);
         to_fp32_cuda(dst_f16.get(), dst_dd_i, row_diff*src1_ncols, stream);
     }
-    else {
+    else {        
```


Please fix this trailing whitespace.

Cyberhan123 · 2023-12-28T11:22:09Z

src/ggml-cuda.cu

```diff
             cublasSgemm(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N,
                     row_diff, src1_ncols, ne10,
                     &alpha, src0_ddf_i, ne00,
-                            src1_ddf_i, ne10,
+                            src1_ddf1_i, ne10,
                     &beta,  dst_dd_i,   ldc));
```


Suggested change

```diff
-            cublasSgemm(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N,
-                    row_diff, src1_ncols, ne10,
-                    &alpha, src0_ddf_i, ne00,
-                            src1_ddf_i, ne10,
-                            src1_ddf1_i, ne10,
-                    &beta,  dst_dd_i,   ldc));
+        CUBLAS_CHECK(
+            cublasGemmEx(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N,
+                    row_diff, src1_ncols, ne10,
+                    &alpha, src0_ddf_i,  CUDA_R_32F, ne00,
+                            src1_ddf1_i, CUDA_R_32F, ne10,
+                    &beta,  dst_ddf1,    CUDA_R_32F, ldc,
+                    CUBLAS_COMPUTE_32F,
+                    CUBLAS_GEMM_DEFAULT_TENSOR_OP));
```

@Green-Sky, can you test the change I suggested above? I am not a professional CUDA developer, but in my testing this change greatly reduces the probability of bad images. However, I am not sure whether cublasGemmEx has compatibility problems.
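Both the original cublasSgemm call and the suggested cublasGemmEx call (all operands CUDA_R_32F, compute type CUBLAS_COMPUTE_32F) are specified to compute the same product. With CUBLAS_OP_T on the first operand and cuBLAS's column-major layout, that product is C = alpha * A^T * B + beta * C. The two calls can differ only in which kernel cuBLAS selects, and therefore in rounding. The C++ CPU reference below is a sketch that a GPU result could be checked against. `gemm_tn_reference` is a hypothetical helper, not part of ggml or cuBLAS. The argument mapping (m = row_diff, n = src1_ncols, k = ne10, lda = ne00, ldb = ne10) is read off the calls above.

```cpp
#include <cassert>
#include <vector>

// CPU reference for cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n, k,
// &alpha, A, lda, B, ldb, &beta, C, ldc) in cuBLAS's column-major layout:
//   A is k x m (so A^T is m x k), element (r, c) at A[r + c*lda]
//   B is k x n,                    element (r, c) at B[r + c*ldb]
//   C is m x n,                    element (r, c) at C[r + c*ldc]
// Computes C = alpha * A^T * B + beta * C.
void gemm_tn_reference(int m, int n, int k, float alpha,
                       const float * A, int lda,
                       const float * B, int ldb,
                       float beta, float * C, int ldc) {
    for (int j = 0; j < n; ++j) {        // column of C (one src1 column)
        for (int i = 0; i < m; ++i) {    // row of C (one src0 row)
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                // (A^T)(i, p) == A(p, i): column i of A holds src0 row i
                acc += A[p + i * lda] * B[p + j * ldb];
            }
            // As in cuBLAS, C is not read when beta == 0, so an
            // uninitialized (even NaN) C does not leak into the result.
            float & c = C[i + j * ldc];
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```

Feeding the same fp32 inputs to this reference and to either cuBLAS call helps tell a wrong-pointer bug (large, systematic error) apart from a kernel-choice difference (error at float rounding level).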

Where does dst_ddf1 come from?

/build/xxx-source/ggml/src/ggml-cuda.cu(7456): error: identifier "dst_ddf1" is undefined

Sorry, it's dst_dd_i.

I don't think this is necessary.

I can't really see any significant correlation with the issues I observe. Since they look like synchronization issues, a slightly different invocation could accidentally make them go away or become less likely.

You can use stable-diffusion.cpp. Does save_tensor_to_file in sample export the tensor? With the same seed, a run sometimes fails and sometimes succeeds. If yes, export the tensors from a failed run and a successful run at the decode step.

When I tested, the tensors exported in the sample method from the successful run and from the failed run were equal.

At the decode step, however, they deviate substantially. After I applied the cublasGemmEx change, the deviation became smaller, but I am even more confused about why such a deviation occurs at all.
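One way to quantify "a large deviation" between the decode-stage tensors of a failed run and a successful run is an element-wise comparison. Below is a minimal C++ sketch. It assumes both tensors have already been loaded into flat float arrays of equal length; how they are exported and read back is outside the sketch. `TensorDiff` and `compare_tensors` are hypothetical names. It reports the largest absolute difference and a normalized mean squared error (NMSE).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of how far two exported tensors diverge.
struct TensorDiff {
    double max_abs;  // largest |a[i] - b[i]| (NaN differences are not counted here)
    size_t argmax;   // index where that largest difference occurs
    double nmse;     // sum (a-b)^2 / sum a^2; a NaN anywhere makes this NaN
};

// Compare a reference tensor `a` (e.g. from a good run) against `b`
// (e.g. from a bad run). Both must have the same number of elements.
TensorDiff compare_tensors(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size());
    TensorDiff d{0.0, 0, 0.0};
    double err = 0.0;  // accumulated squared difference
    double ref = 0.0;  // accumulated squared magnitude of the reference
    for (size_t i = 0; i < a.size(); ++i) {
        const double diff = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        if (std::fabs(diff) > d.max_abs) {
            d.max_abs = std::fabs(diff);
            d.argmax = i;
        }
        err += diff * diff;
        ref += static_cast<double>(a[i]) * a[i];
    }
    // Guard against an all-zero reference tensor.
    d.nmse = ref > 0.0 ? err / ref : (err > 0.0 ? INFINITY : 0.0);
    return d;
}
```

An NMSE close to zero with a small max_abs suggests only rounding-level differences; a large value, like the decode-stage deviation described above, points to genuinely different computations, and argmax shows where to start looking.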

Thanks, @slaren, for the style fixes and other updates.
@Cyberhan123, I'll leave it to @slaren and others to decide whether replacing cublasSgemm with cublasGemmEx is a good idea. I am not an expert on CUDA/cuBLAS.

ggml-ci

FSSRepo · 2023-12-28T16:32:35Z

@bssrdf, I am going to review and test this fix on my GPU, even though it already works fine for me, just to make sure it has no impact on current performance.

bssrdf · 2023-12-28T17:08:53Z

> @bssrdf i am going to review and test this fix on my GPU, even though it works fine for me, just to ensure there is no impact on the current performance.

@FSSRepo, sounds good. Thanks for the test.

ggerganov

Thank you for digging into this and resolving the issues. I should have used more GGML_ASSERTs to catch this kind of issue; I will try to do so in the future.

bssrdf added 2 commits December 27, 2023 23:31

fixed mul-mat error for old GPUs

076d2b1

merge with master

a1ceca4

bssrdf changed the title ~~Fix mul-mat error for old GPUs~~ Fix mul-mat error for older GPUs Dec 28, 2023

Green-Sky suggested changes Dec 28, 2023

Cyberhan123 reviewed Dec 28, 2023

style fixes

b59656f

slaren requested a review from ggerganov December 28, 2023 12:28

add mul mat src1 f16 test cases, fix more cases

8f137dd

ggml-ci

slaren force-pushed the fix_mul_mat_cublas_for_cp_lt_70 branch from 55ba78b to 8f137dd December 28, 2023 12:29

slaren mentioned this pull request Dec 28, 2023

Testing : Compare CPU backend with GPU backend ggerganov/whisper.cpp#1692 (closed)

ggerganov approved these changes Dec 29, 2023

ggerganov merged commit dbd0295 into ggerganov:master Dec 29, 2023
4 checks passed

bssrdf deleted the fix_mul_mat_cublas_for_cp_lt_70 branch December 29, 2023 13:36
