[5.3.X][CUDA >= 11.0] hipblasGemmEx doesn't fully match cublasGemmEx #529

Closed
emankov opened this issue Sep 24, 2022 · 2 comments · Fixed by ROCm/HIPIFY#1066

emankov commented Sep 24, 2022

The problem is with the penultimate argument, hipblasDatatype_t computeType, which does not match cublasComputeType_t computeType in cuBLAS. cublasComputeType_t was introduced in CUDA 11.0. From CUDA 8.0 until CUDA 11.0, cublasGemmEx took cudaDataType rather than cublasComputeType_t as its penultimate argument.

```c
HIPBLAS_EXPORT hipblasStatus_t hipblasGemmEx(hipblasHandle_t    handle,
                                             hipblasOperation_t transA,
                                             hipblasOperation_t transB,
                                             int                m,
                                             int                n,
                                             int                k,
                                             const void*        alpha,
                                             const void*        A,
                                             hipblasDatatype_t  aType,
                                             int                lda,
                                             const void*        B,
                                             hipblasDatatype_t  bType,
                                             int                ldb,
                                             const void*        beta,
                                             void*              C,
                                             hipblasDatatype_t  cType,
                                             int                ldc,
                                             hipblasDatatype_t  computeType,
                                             hipblasGemmAlgo_t  algo);
```
```c
CUBLASAPI cublasStatus_t CUBLASWINAPI cublasGemmEx(cublasHandle_t handle,
                                                   cublasOperation_t transa,
                                                   cublasOperation_t transb,
                                                   int m,
                                                   int n,
                                                   int k,
                                                   const void* alpha, /* host or device pointer */
                                                   const void* A,
                                                   cudaDataType Atype,
                                                   int lda,
                                                   const void* B,
                                                   cudaDataType Btype,
                                                   int ldb,
                                                   const void* beta, /* host or device pointer */
                                                   void* C,
                                                   cudaDataType Ctype,
                                                   int ldc,
                                                   cublasComputeType_t computeType,
                                                   cublasGemmAlgo_t algo);
```
```c
typedef enum
{
    HIPBLAS_R_16F = 150, /**< 16 bit floating point, real */
    HIPBLAS_R_32F = 151, /**< 32 bit floating point, real */
    HIPBLAS_R_64F = 152, /**< 64 bit floating point, real */
    HIPBLAS_C_16F = 153, /**< 16 bit floating point, complex */
    HIPBLAS_C_32F = 154, /**< 32 bit floating point, complex */
    HIPBLAS_C_64F = 155, /**< 64 bit floating point, complex */
    HIPBLAS_R_8I  = 160, /**<  8 bit signed integer, real */
    HIPBLAS_R_8U  = 161, /**<  8 bit unsigned integer, real */
    HIPBLAS_R_32I = 162, /**< 32 bit signed integer, real */
    HIPBLAS_R_32U = 163, /**< 32 bit unsigned integer, real */
    HIPBLAS_C_8I  = 164, /**<  8 bit signed integer, complex */
    HIPBLAS_C_8U  = 165, /**<  8 bit unsigned integer, complex */
    HIPBLAS_C_32I = 166, /**< 32 bit signed integer, complex */
    HIPBLAS_C_32U = 167, /**< 32 bit unsigned integer, complex */
    HIPBLAS_R_16B = 168, /**< 16 bit bfloat, real */
    HIPBLAS_C_16B = 169, /**< 16 bit bfloat, complex */
} hipblasDatatype_t;
```
```c
typedef enum {
  CUBLAS_COMPUTE_16F = 64,           /* half - default */
  CUBLAS_COMPUTE_16F_PEDANTIC = 65,  /* half - pedantic */
  CUBLAS_COMPUTE_32F = 68,           /* float - default */
  CUBLAS_COMPUTE_32F_PEDANTIC = 69,  /* float - pedantic */
  CUBLAS_COMPUTE_32F_FAST_16F = 74,  /* float - fast, allows down-converting inputs to half or TF32 */
  CUBLAS_COMPUTE_32F_FAST_16BF = 75, /* float - fast, allows down-converting inputs to bfloat16 or TF32 */
  CUBLAS_COMPUTE_32F_FAST_TF32 = 77, /* float - fast, allows down-converting inputs to TF32 */
  CUBLAS_COMPUTE_64F = 70,           /* double - default */
  CUBLAS_COMPUTE_64F_PEDANTIC = 71,  /* double - pedantic */
  CUBLAS_COMPUTE_32I = 72,           /* signed 32-bit int - default */
  CUBLAS_COMPUTE_32I_PEDANTIC = 73,  /* signed 32-bit int - pedantic */
} cublasComputeType_t;
```
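
To make the mismatch concrete, here is a minimal sketch of what collapsing a cublasComputeType_t value onto the nearest hipblasDatatype_t looks like. The enum values are copied from the declarations quoted in this issue (only a subset of hipblasDatatype_t is repeated); the helper `compute_type_to_datatype` is hypothetical and is not part of hipBLAS, cuBLAS, or HIPIFY. Sending the `_FAST_*` variants to HIPBLAS_R_32F is an assumption based on their "float - fast" comments.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of hipblasDatatype_t, values as quoted in this issue. */
typedef enum {
    HIPBLAS_R_16F = 150,
    HIPBLAS_R_32F = 151,
    HIPBLAS_R_64F = 152,
    HIPBLAS_R_32I = 162
} hipblasDatatype_t;

/* cublasComputeType_t, values as quoted in this issue. */
typedef enum {
    CUBLAS_COMPUTE_16F = 64,
    CUBLAS_COMPUTE_16F_PEDANTIC = 65,
    CUBLAS_COMPUTE_32F = 68,
    CUBLAS_COMPUTE_32F_PEDANTIC = 69,
    CUBLAS_COMPUTE_64F = 70,
    CUBLAS_COMPUTE_64F_PEDANTIC = 71,
    CUBLAS_COMPUTE_32I = 72,
    CUBLAS_COMPUTE_32I_PEDANTIC = 73,
    CUBLAS_COMPUTE_32F_FAST_16F = 74,
    CUBLAS_COMPUTE_32F_FAST_16BF = 75,
    CUBLAS_COMPUTE_32F_FAST_TF32 = 77
} cublasComputeType_t;

/* Hypothetical helper: writes the base precision of `ct` to `*out` and
   returns 1, or returns 0 if `ct` is not a recognized compute type.
   The pedantic/fast distinction cannot be expressed in hipblasDatatype_t
   and is dropped here. */
static int compute_type_to_datatype(cublasComputeType_t ct, hipblasDatatype_t *out)
{
    assert(out != NULL);
    switch (ct) {
    case CUBLAS_COMPUTE_16F:
    case CUBLAS_COMPUTE_16F_PEDANTIC:
        *out = HIPBLAS_R_16F;
        return 1;
    case CUBLAS_COMPUTE_32F:
    case CUBLAS_COMPUTE_32F_PEDANTIC:
    case CUBLAS_COMPUTE_32F_FAST_16F:   /* assumption: accumulate in float */
    case CUBLAS_COMPUTE_32F_FAST_16BF:  /* assumption: accumulate in float */
    case CUBLAS_COMPUTE_32F_FAST_TF32:  /* assumption: accumulate in float */
        *out = HIPBLAS_R_32F;
        return 1;
    case CUBLAS_COMPUTE_64F:
    case CUBLAS_COMPUTE_64F_PEDANTIC:
        *out = HIPBLAS_R_64F;
        return 1;
    case CUBLAS_COMPUTE_32I:
    case CUBLAS_COMPUTE_32I_PEDANTIC:
        *out = HIPBLAS_R_32I;
        return 1;
    default:
        return 0;
    }
}
```

Eleven compute types fold into four data types, so the mapping is not reversible: the pedantic and fast modes have no counterpart in hipblasDatatype_t.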
@emankov emankov changed the title [5.2.X] hipblasGemmEx doesn't fully match cublasGemmEx [5.2.X][CUDA >= 11.0] hipblasGemmEx doesn't fully match cublasGemmEx Sep 24, 2022

emankov commented Sep 24, 2022

The same applies to:
cublasGemmBatchedEx -> hipblasGemmBatchedEx
cublasGemmStridedBatchedEx -> hipblasGemmStridedBatchedEx

emankov added a commit to emankov/HIPIFY that referenced this issue Sep 24, 2022
+ Added tests for the following BLAS functions: DGMM, GemmEx, GemmBatchedEx, GemmStridedBatchedEx
+ Gemm(Batched|StridedBatched)Ex have two different signatures (one before CUDA 11.0 and one from CUDA 11.0 onward)
+ [Workaround][ROCm/hipBLAS#529]:
  `cublasComputeType_t` -> `hipblasDatatype_t` (instead of yet unsupported `hipblasComputeType_t`)
+ Regenerate and update hipify-perl and docs
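
As a rough illustration of the workaround described in this commit message, the sketch below models which type name appears as the penultimate argument for a given CUDA version and what it is translated to. Both helper functions and the table are hypothetical and do not reflect how HIPIFY is actually implemented; the version integers follow the `CUDA_VERSION` encoding (11000 for CUDA 11.0, 10020 for CUDA 10.2).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per type name that can appear as the penultimate argument. */
struct type_mapping {
    const char *cuda_name;
    const char *hip_name;
};

/* Under the workaround, both CUDA spellings land on hipblasDatatype_t. */
static const struct type_mapping kComputeArgMap[] = {
    { "cudaDataType",        "hipblasDatatype_t" }, /* CUDA 8.0 up to, not including, 11.0 */
    { "cublasComputeType_t", "hipblasDatatype_t" }  /* CUDA 11.0 and later (workaround) */
};

/* Hypothetical helper: type name of the penultimate cublasGemmEx argument
   for a CUDA_VERSION-style integer. Returns NULL before CUDA 8.0, where
   cublasGemmEx is not described by this issue. */
static const char *cuda_compute_arg_type(int cuda_version)
{
    if (cuda_version >= 11000) return "cublasComputeType_t";
    if (cuda_version >= 8000)  return "cudaDataType";
    return NULL;
}

/* Hypothetical helper: looks up the hipBLAS type name used in place of the
   given CUDA type name, or NULL if the name is not in the table. */
static const char *hip_compute_arg_type(const char *cuda_name)
{
    size_t i;
    assert(cuda_name != NULL);
    for (i = 0; i < sizeof kComputeArgMap / sizeof kComputeArgMap[0]; ++i) {
        if (strcmp(kComputeArgMap[i].cuda_name, cuda_name) == 0)
            return kComputeArgMap[i].hip_name;
    }
    return NULL;
}
```

The point of the table is that the two CUDA signatures become indistinguishable after translation, which is why the commit calls this a workaround rather than a fix.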
@daineAMD
Contributor

Hi @emankov,

Thanks for bringing this up. We will likely make a change to include a similar compute_type in the hipblasGemm*Ex interfaces in a future release. I will keep you updated with any progress.

Thanks,
Daine

@daineAMD daineAMD self-assigned this Sep 28, 2022
@emankov emankov changed the title [5.2.X][CUDA >= 11.0] hipblasGemmEx doesn't fully match cublasGemmEx [5.3.X][CUDA >= 11.0] hipblasGemmEx doesn't fully match cublasGemmEx Nov 7, 2022
emankov added a commit to emankov/HIPIFY that referenced this issue Oct 14, 2023
… - `hipblasDatatype_t` -> `hipblasComputeType_t`

[IMP]
+ In hipBLAS 6.0.0, ROCm/hipBLAS#529 is finally fixed, thus HIPIFY can use `hipblasComputeType_t` instead of `hipblasDatatype_t`, where `cublasComputeType_t` is implied

[TODO]
+ Revise all the hipBLAS functions which use `hipblasDatatype_t` instead of `hipblasComputeType_t`
+ Close ROCm/hipBLAS/issues/529 as implemented with the release of hipBLAS 6.0.0