Add missing replacement for rocm_hipblaslt. #4406
Conversation
Force-pushed from c68d777 to b98fc1d
@i-chaochen I give up now; I will land the fixes that are obvious (BUILD file fixes + the fix for rocm_configure.bzl.oss). I hope you can fix the other part. We don't have presubmits for ROCm, so this is how this could slip through. But I was expecting that you did the testing :)
Also add a missing dependency.
Remove a duplicated dependency.
Move a dependency back into the if_rocm_is_configured block.

PiperOrigin-RevId: 549859378
Force-pushed from b98fc1d to 8758dd1
Thanks a lot @akuegel for pointing it out. Yes, I will work on it and make sure it won't happen again.
Imported from GitHub PR #5911

This is a follow-up PR for these two issues: #4406, #3953

We unified the hip/cuda blas-lt APIs by providing a common virtual interface, defined in xla/stream_executor/gpu/gpu_blas_lt.h/.cc, with implementations in xla/stream_executor/cuda/cuda_blas_lt.h/.cc and xla/stream_executor/rocm/hip_blas_lt.h/.cc, respectively.

The main design decision was to make the class MatmulPlan (originally defined in xla/service/gpu/matmul_utils.h/.cc) **polymorphic** and move its interface declaration to gpu_blas_lt.h. There are two reasons for that:

1. MatmulPlan provided a public function **ExecuteOnStream** which was implemented via conditional compilation with the macros '#if GOOGLE_CUDA' or '#if TF_HIPBLASLT' in order to integrate library-specific data types. This function now becomes part of the gpu_blas_lt interface.
2. MatmulPlan contained a library-specific member variable 'plan_' of type 'se::gpu::BlasLt::MatmulPlan', which is basically a plain container of a MatmulDesc and several MatrixLayouts. These underlying types are again blas-lt library-specific and are **never** used directly, hence there is no need to expose BlasLt::MatmulDesc and BlasLt::MatrixLayout in the public interface.

Besides ExecuteOnStream, the class MatmulPlan also provides a number of overloaded 'DoMatmul' member functions (some of them template functions) which were extracted as the common part of the original BlasLt implementations. These DoMatmul functions are also required for the upcoming integration of the blas-lt interface into Tensorflow: see tensorflow/core/kernels/matmul_util.h/.cc. We also extracted the library-specific argument type checks from the templated DoMatmul functions and moved them into a virtual function MatmulPlan::ValidateInputs().

The polymorphic class gpu::BlasLt (defined in gpu_blas_lt.h) is responsible for constructing objects of type MatmulPlan; the rest of the blas-lt functionality is handled solely through the MatmulPlan interface.
The instantiations of the gpu::BlasLt interface are, as before, defined in xla/stream_executor/cuda/cuda_blas.h and xla/stream_executor/rocm/rocm_blas.h, respectively. We have also tried to compile the code with TF_HIPBLASLT=0 to make sure it also works fine when no hipblas-lt is available.

@akuegel: can you perhaps have a look at our implementation?

Copybara import of the project:

-- db303e003b79c81b9ff21955ea3f7cd8277ca8bd by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: Unifying hip/cuda blas-lt APIs: work in progress; make sure the code runs with TF_HIPBLASLT=0; adaptions for CUDA compile; moving BlasLt and related stuff to the se::gpu namespace; hipblas_lt interface cleanup; adapted the last blas-lt interface changes for CUDA
-- c4a37302b8448492939f1cb61722f60eb68ad9d1 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: protected code by the TF_HIPBLASLT macro to make sure the code builds without hipblas-lt too
-- d6638c233cac12a800b34d913dd63def54550612 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: resolving conflicts
-- d20c7298516a72d7efd83158799eb7428e44d394 by Pavel Emeliyanenko <pavel.emeliyanenko@amd.com>: applying reviewer changes

Merging this change closes #5911

FUTURE_COPYBARA_INTEGRATE_REVIEW=#5911 from ROCmSoftwarePlatform:unify_blaslt_APIs_v2 d20c7298516a72d7efd83158799eb7428e44d394
PiperOrigin-RevId: 571312652
Add missing replacement for rocm_hipblaslt.
Also add a missing dependency.
Remove a duplicated dependency.
Move a dependency back into the if_rocm_is_configured block.