
Add missing replacement for rocm_hipblaslt. #4406

Closed · wants to merge 1 commit

Conversation

copybara-service[bot]

@copybara-service copybara-service bot commented Jul 21, 2023

Add missing replacement for rocm_hipblaslt.

Also add a missing dependency.
Remove a duplicated dependency.
Move a dependency back into if_rocm_is_configured block.

@akuegel
Member

akuegel commented Jul 21, 2023

@i-chaochen I give up now, I will land the fixes that are obvious (BUILD file fixes + fix for rocm_configure.bzl.oss). I hope you can fix the other part. We don't have presubmits for ROCM, so this is how this could slip through. But I was expecting that you did the testing :)

Also add a missing dependency.
Remove a duplicated dependency.
Move a dependency back into if_rocm_is_configured block.

PiperOrigin-RevId: 549859378
@i-chaochen
Contributor

i-chaochen commented Jul 21, 2023

> @i-chaochen I give up now, I will land the fixes that are obvious (BUILD file fixes + fix for rocm_configure.bzl.oss). I hope you can fix the other part. We don't have presubmits for ROCM, so this is how this could slip through. But I was expecting that you did the testing :)

Thanks a lot @akuegel for pointing it out. Yes, I will work on it and make sure it doesn't happen again.

copybara-service bot pushed a commit that referenced this pull request Oct 6, 2023
Imported from GitHub PR #5911

This is a follow-up PR for these two issues:

#4406, #3953

We unified hip/cuda blas-lt APIs by providing a common virtual interface defined in
xla/stream_executor/gpu/gpu_blas_lt.h/.cc with implementations in
xla/stream_executor/cuda/cuda_blas_lt.h/.cc and xla/stream_executor/rocm/hip_blas_lt.h/.cc, respectively.

The main design decision was to make the class MatmulPlan (originally defined in xla/service/gpu/matmul_utils.h/.cc) **polymorphic** and move its interface declaration to gpu_blas_lt.h.
There are two reasons for this:

1. MatmulPlan provided a public function **ExecuteOnStream** which was implemented via conditional compilation
with the macros '#if GOOGLE_CUDA' or '#if TF_HIPBLASLT' in order to integrate library-specific data types. This function is now part of the gpu_blas_lt interface.

2. MatmulPlan contained a library-specific member variable 'plan_' of type 'se::gpu::BlasLt::MatmulPlan' which is basically a plain container of MatmulDesc and several MatrixLayouts. These underlying types are again BLASLT library-specific and are **never** used directly, hence there is no need to expose BlasLt::MatmulDesc and BlasLt::MatrixLayout in the public interface.
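The two points above can be sketched roughly as follows (all names here are simplified, illustrative stand-ins, not the real declarations in gpu_blas_lt.h):

```cpp
// Simplified sketch of the polymorphic design: ExecuteOnStream is a virtual
// function instead of one body compiled under #if GOOGLE_CUDA / #if
// TF_HIPBLASLT, and each backend keeps its library-specific plan state
// private behind the common interface.
#include <memory>
#include <string>

namespace sketch {

struct Stream {};  // stand-in for se::Stream

class MatmulPlan {
 public:
  virtual ~MatmulPlan() = default;
  // Formerly conditionally compiled; now part of the common interface.
  virtual std::string ExecuteOnStream(Stream* stream) = 0;

 protected:
  // Library-specific argument checks extracted from the templated DoMatmul
  // helpers (MatmulPlan::ValidateInputs in the description above).
  virtual bool ValidateInputs() const = 0;
};

class CudaMatmulPlan : public MatmulPlan {
 public:
  std::string ExecuteOnStream(Stream*) override {
    return ValidateInputs() ? "cublaslt matmul" : "invalid inputs";
  }

 protected:
  bool ValidateInputs() const override { return true; }
  // cuBLASLt-specific descriptor/layout members would live here, hidden.
};

class HipMatmulPlan : public MatmulPlan {
 public:
  std::string ExecuteOnStream(Stream*) override {
    return ValidateInputs() ? "hipblaslt matmul" : "invalid inputs";
  }

 protected:
  bool ValidateInputs() const override { return true; }
  // hipBLASLt-specific plan state would likewise stay private.
};

}  // namespace sketch
```

Callers hold only a MatmulPlan pointer, so neither library's MatmulDesc/MatrixLayout types leak into the public interface.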

Besides ExecuteOnStream, the class MatmulPlan also provides a number of overloaded 'DoMatmul' member functions (some of them template functions) which were extracted as a common part of the original BlasLt implementations. These DoMatmul functions are also required for the upcoming integration of the blas-lt interface into TensorFlow: see tensorflow/core/kernels/matmul_util.h/.cc.

We also extracted the library-specific argument type-checks from templated DoMatmul functions and moved them into a virtual function MatmulPlan::ValidateInputs().

The polymorphic class gpu::BlasLt (defined in gpu_blas_lt.h) is responsible for constructing objects of type MatmulPlan; the rest of the blas-lt functionality is handled solely through the MatmulPlan interface.
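A minimal sketch of that factory role, again with hypothetical names standing in for the real declarations:

```cpp
// Sketch: the polymorphic BlasLt class's only job regarding plans is to
// construct them; every subsequent blas-lt call goes through the returned
// MatmulPlan interface. Names are illustrative, not the actual XLA API.
#include <memory>
#include <string>

namespace sketch {

class MatmulPlan {
 public:
  virtual ~MatmulPlan() = default;
  virtual std::string Backend() const = 0;
};

class HipMatmulPlan : public MatmulPlan {
 public:
  std::string Backend() const override { return "hipblaslt"; }
};

// Polymorphic factory, instantiated once per backend (the real ones live in
// cuda_blas.h and rocm_blas.h, per the description above).
class BlasLt {
 public:
  virtual ~BlasLt() = default;
  virtual std::unique_ptr<MatmulPlan> GetMatmulPlan() const = 0;
};

class RocmBlasLt : public BlasLt {
 public:
  std::unique_ptr<MatmulPlan> GetMatmulPlan() const override {
    return std::make_unique<HipMatmulPlan>();
  }
};

}  // namespace sketch
```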

The instantiations of gpu::BlasLt interface, as before, are defined in xla/stream_executor/cuda/cuda_blas.h and xla/stream_executor/rocm/rocm_blas.h, respectively.

We have also compiled the code with TF_HIPBLASLT=0 to make sure it works when no hipblas-lt is available.
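The build-without-hipblas-lt case can be illustrated with a small, self-contained sketch (in the real build, TF_HIPBLASLT is set by the build system; here it defaults to 0, and the function name is made up for illustration):

```cpp
// Sketch of code guarded by the TF_HIPBLASLT macro: when the library is not
// available (TF_HIPBLASLT=0), the hipblas-lt path is compiled out entirely.
#include <string>

#ifndef TF_HIPBLASLT
#define TF_HIPBLASLT 0  // default for this self-contained sketch
#endif

// Reports which blas-lt backend this translation unit was built with.
std::string BlasLtBackend() {
#if TF_HIPBLASLT
  return "hipblaslt";
#else
  return "unavailable";
#endif
}
```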

@akuegel: can you perhaps have a look at our implementation?
Copybara import of the project:

--
db303e003b79c81b9ff21955ea3f7cd8277ca8bd by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

Unifying hip/cuda blas-lt APIs

work in progress

ongoing work

make sure the code runs with TF_HIPBLASLT=0

adaptions for CUDA compile

moving BlasLt and related stuff to se::gpu namespace

hipblas_lt interface cleanup

adapted the last blas-lt interface changes for CUDA

--
c4a37302b8448492939f1cb61722f60eb68ad9d1 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

protected code by TF_HIPBLASLT macro to make sure code builds without hipblas-lt too

--
d6638c233cac12a800b34d913dd63def54550612 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

resolving conflicts

--
d20c7298516a72d7efd83158799eb7428e44d394 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

applying reviewer changes

Merging this change closes #5911

FUTURE_COPYBARA_INTEGRATE_REVIEW=#5911 from ROCmSoftwarePlatform:unify_blaslt_APIs_v2 d20c7298516a72d7efd83158799eb7428e44d394
PiperOrigin-RevId: 571312652
copybara-service bot pushed a commit that referenced this pull request Oct 13, 2023
Imported from GitHub PR #5911

Copybara import of the project:

--
daea33c by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

Unifying hip/cuda blas-lt APIs

work in progress

ongoing work

make sure the code runs with TF_HIPBLASLT=0

adaptions for CUDA compile

moving BlasLt and related stuff to se::gpu namespace

hipblas_lt interface cleanup

adapted the last blas-lt interface changes for CUDA

--
b4ff019 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

protected code by TF_HIPBLASLT macro to make sure code builds without hipblas-lt too

--
7248f69 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

resolving conflicts

--
d48e6ee by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

applying reviewer changes

--
1d7cc54 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>:

rebased and adapted API for TF blas-lt part

Merging this change closes #5911

COPYBARA_INTEGRATE_REVIEW=#5911 from ROCmSoftwarePlatform:unify_blaslt_APIs_v2 1d7cc54
PiperOrigin-RevId: 573136621
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Oct 13, 2023
Imported from GitHub PR openxla/xla#5911


Merging this change closes #5911

PiperOrigin-RevId: 573136621
@copybara-service copybara-service bot closed this Oct 16, 2023
@copybara-service copybara-service bot deleted the test_549859378 branch October 16, 2023 12:35