[cuBLAS] Add cublas_gemm_batched and use cublasSetStream to set stream to the current stream in all cublas API calls #423

yudi0201 · 2024-01-31T18:47:45Z

No description provided.

yaoyaoding

It looks good to me.

yaoyaoding · 2024-02-01T03:30:07Z

src/hidet/runtime/cuda/cuda.cpp

@@ -79,3 +88,18 @@ DLL void hidet_cuda_set_device(int device) {
    lazy_load_cuda_runtime();
    CHECK_CUDA(cudaSetDevice(device));
 }
+
+DLL void hidet_cuda_malloc(void **devPtr, size_t size) {


Consider returning the allocated memory address directly, like

DLL void* hidet_cuda_malloc(size_t size) { ... }

similar to hidet_cuda_get_device(...).

yaoyaoding

Thanks @yudi0201!

yaoyaoding · 2024-02-08T16:01:25Z

src/hidet/runtime/cuda/cublas.cpp

+    // Allocate device memory
+    // first use synchronous versions of malloc and memcpy, later switch to async versions
+    if (cur_device_ptr_size != 0 && b > cur_device_ptr_size) {
+        hidet_cuda_free((void *)ptr_a_device);


Why not hidet_cuda_free_async?

The following structure reads more clearly to me; just for reference:

if (b > cur_device_ptr_size) {
    if (cur_device_ptr_size > 0) {
        free the three ptrs
    }
    alloc three ptrs
}

Thanks for the suggestions! I'll modify these in the next revision.
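The suggested control flow can be sketched as below. This is only an illustration, not the code that was merged: `fake_malloc_async` and `fake_free_async` are hypothetical host-side stand-ins for `hidet_cuda_malloc_async` and `hidet_cuda_free_async` so the sketch runs without a GPU, and the `PtrBuffers` struct and its `b`/`c` members are assumed names (only `ptr_a_device` and `cur_device_ptr_size` appear in the diff).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for the async device allocator, backed by host
// malloc/free. The counters exist only so the behavior can be observed.
static int g_allocs = 0;
static int g_frees = 0;
static void *fake_malloc_async(std::size_t size) { ++g_allocs; return std::malloc(size); }
static void fake_free_async(void *p) { ++g_frees; std::free(p); }

// Cached pointer arrays for the batched GEMM operands (assumed layout).
struct PtrBuffers {
    void **a = nullptr;
    void **b = nullptr;
    void **c = nullptr;
    std::size_t capacity = 0;  // plays the role of cur_device_ptr_size
};

// Grow the three buffers only when the requested batch exceeds the capacity.
static void ensure_capacity(PtrBuffers &buf, std::size_t batch) {
    assert(batch > 0);
    if (batch > buf.capacity) {
        if (buf.capacity > 0) {
            // Release the old, too-small buffers first.
            fake_free_async(buf.a);
            fake_free_async(buf.b);
            fake_free_async(buf.c);
        }
        buf.a = static_cast<void **>(fake_malloc_async(batch * sizeof(void *)));
        buf.b = static_cast<void **>(fake_malloc_async(batch * sizeof(void *)));
        buf.c = static_cast<void **>(fake_malloc_async(batch * sizeof(void *)));
        buf.capacity = batch;
    }
}
```

With this shape, the "do we need to grow?" decision is made once, and the free is visibly tied to the case where something was allocated before.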

yaoyaoding · 2024-02-08T16:09:12Z

src/hidet/runtime/cuda/cuda.cpp

+
+DLL void* hidet_cuda_malloc(size_t size) {
+    lazy_load_cuda_runtime();
+    void* devPtr = malloc(sizeof(void*));


Suggested change

- void* devPtr = malloc(sizeof(void*));
+ void* devPtr;

We do not need to allocate a memory region.

Maybe I'm missing something, but wouldn't doing this result in devPtr being created on the stack, and then we'd be returning a local stack variable?

When we call

cudaMallocAsync(&devPtr, ...)

we are passing the address of the pointer to the CUDA API function, which updates the pointer's value. The pointer is a stack variable and remains valid for the duration of the CUDA API call. We return the "value" of the pointer, not the "address" of the pointer, to the caller of hidet_cuda_malloc, so this is fine.

Got it. Thanks!
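For readers following this thread, the pattern can be sketched without a GPU. `fake_device_malloc` below is a hypothetical stand-in for a cudaMalloc-style function that writes the new address through an out-parameter; it is backed by host `malloc` and is not part of hidet or CUDA. `alloc_returning_value` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a cudaMalloc-style API: it stores the address of
// the new allocation in *out and returns 0 on success.
static int fake_device_malloc(void **out, std::size_t size) {
    *out = std::malloc(size);
    return *out != nullptr ? 0 : 1;
}

static void *alloc_returning_value(std::size_t size) {
    void *dev_ptr = nullptr;  // local pointer variable, lives on the stack
    // Pass the address of the local pointer; the callee overwrites its value.
    int status = fake_device_malloc(&dev_ptr, size);
    assert(status == 0);
    (void)status;  // silence unused-variable warnings when NDEBUG is defined
    // Returning dev_ptr copies the stored address out of the function. The
    // stack slot disappears afterwards, but the allocation it named does not.
    return dev_ptr;
}
```

If `dev_ptr` were instead initialized with `malloc(sizeof(void*))` and its address then passed to the allocator, the allocator would overwrite that value and the small host block would no longer be reachable, which is why the extra allocation is unnecessary.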

yaoyaoding · 2024-02-08T16:09:30Z

src/hidet/runtime/cuda/cuda.cpp

+
+DLL void* hidet_cuda_malloc_async(size_t size, cudaStream_t stream) {
+    lazy_load_cuda_runtime();
+    void* devPtr = malloc(sizeof(void*));


Same as the previous one.

yaoyaoding · 2024-02-12T04:16:45Z

Thanks @yudi0201 !

yudi0201 force-pushed the cublas_gemm_batched branch from e9d69bb to 4a0007f Compare January 31, 2024 18:54

[cublas] Add cublas_gemm_batched

3d58fbd

yudi0201 force-pushed the cublas_gemm_batched branch from 4a0007f to 3d58fbd Compare January 31, 2024 19:00

yaoyaoding approved these changes Feb 1, 2024

yudi0201 force-pushed the cublas_gemm_batched branch from 69abe16 to ecd1461 Compare February 6, 2024 23:31

Use cudaMallocAsync and cudaMemcpyAsync for cublas_batched_gemm

a446270

yudi0201 force-pushed the cublas_gemm_batched branch from ecd1461 to a446270 Compare February 6, 2024 23:32

Use cublasSetStream to set the stream to the current stream in all cublas API calls

e4ec611

yudi0201 changed the title ~~[cublas] Add cublas_gemm_batched~~ [cublas] Add cublas_gemm_batched and use cublasSetStream to set stream to the current stream in all cublas API calls Feb 7, 2024

yaoyaoding changed the title ~~[cublas] Add cublas_gemm_batched and use cublasSetStream to set stream to the current stream in all cublas API calls~~ [cuBLAS] Add cublas_gemm_batched and use cublasSetStream to set stream to the current stream in all cublas API calls Feb 8, 2024

yaoyaoding reviewed Feb 8, 2024

[cublas] Use cuda_free_async

87bf01d

yaoyaoding merged commit 5f76caf into main Feb 12, 2024
2 checks passed
