New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ROCm] Fix link failure for rocm targets #11205
Conversation
Hi @xla-rotation please check it. |
@@ -146,6 +146,7 @@ cc_library( | |||
"@com_google_absl//absl/functional:any_invocable", | |||
"@com_google_absl//absl/strings", | |||
"//xla/stream_executor", | |||
"//xla/stream_executor:kernel", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can depend on //xla/stream_executor:stream_executor_impl
instead. Otherwise, I think we should still keep private visibility for //xla/stream_executor:kernel
and add an exception for rocm_executor
target.
@ezhulenev Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually going in other direction, trying to make SE a "normal" library: (1) delete _impl
(2) export targets for different functionalities, kernel, stream, device_memory, etc. And regular stream_executor
will bring all implementations with it
@@ -193,6 +194,7 @@ cc_library( | |||
"@com_google_absl//absl/base", | |||
"@com_google_absl//absl/memory", | |||
"//xla/stream_executor", # buildcleaner: keep | |||
"//xla/stream_executor:executor_cache", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. //xla/stream_executor:stream_executor_impl
covers executor_cache
.
cc: @ezhulenev
@@ -3196,7 +3196,7 @@ class RocmConvRunner : public dnn::ConvRunner { | |||
return {{algo_id_, false, workspace_size_}}; | |||
} | |||
|
|||
absl::Status operator()(Stream* stream, dnn::ProfileResult* profile_result, | |||
absl::Status operator()(Stream* stream, dnn::ProfileResult* output_profile_result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. Could you please explain what failure renaming this variable fixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduces a bug with using undeclared variable output_profile_result. Since other part of the code uses this variable name, I have renamed in this function to be consistent throughout.
@@ -349,7 +351,7 @@ cc_library( | |||
"@com_google_absl//absl/strings", | |||
"@com_google_absl//absl/types:span", | |||
"@local_config_rocm//rocm:rocm_headers", | |||
"@tsl//tsl/platform:env", | |||
"@tsl//tsl/platform:env_impl", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddunl do you know why it might be needed here? I suspect "something" might be missing in one of the build files to add it automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This strikes me as odd - I'd expect this to be taken care of by https://github.com/openxla/xla/blob/main/xla/xla.bzl#L52 for all test/binary targets. @hsharsha if you have a build failure log for what happens when you switch this back to :env
I'd be very interested to see it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddunl Running local_client_aot_test after switching back to :env
with command line
bazel --bazelrc=/usertools/rocm.bazelrc test --test_output all --config=rocm --test_env=TF_TESTS_PER_GPU=1 --run_under=//tools/ci_build/gpu_build:parallel_gpu_execute -- //xla/tests:local_client_aot_test
Results in this error
INFO: Analyzed target //xla/tests:local_client_aot_test (158 packages loaded, 12668 targets configured).
INFO: Found 1 test target...
ERROR: /home/hahavanu/xla_upstream/upstream/xla/xla/tests/BUILD:2441:12: Linking xla/tests/local_client_aot_test failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tests:local_client_aot_test) external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tests/local_client_aot_test-2.params
bazel-out/k8-opt/bin/external/tsl/tsl/profiler/backends/cpu/_objs/traceme_recorder_impl/traceme_recorder.o:traceme_recorder.cc:function tsl::profiler::TraceMeRecorder::Record(tsl::profiler::TraceMeRecorder::Event&&): error: undefined reference to 'tsl::Env::Default()'
bazel-out/k8-opt/bin/xla/stream_executor/rocm/_objs/miopen_plugin/rocm_dnn.o:rocm_dnn.cc:function stream_executor::gpu::wrap::DynLoadShim__miopenDestroy::LoadOrDie(): error: undefined reference to 'tsl::Env::Default()'
bazel-out/k8-opt/bin/xla/stream_executor/rocm/_objs/miopen_plugin/rocm_dnn.o:rocm_dnn.cc:function stream_executor::gpu::MIOpenAccess::GetHandle(stream_executor::gpu::GpuExecutor*, stream_executor::Stream*): error: undefined reference to 'tsl::Env::Default()'
bazel-out/k8-opt/bin/xla/stream_executor/rocm/_objs/miopen_plugin/rocm_dnn.o:rocm_dnn.cc:function stream_executor::gpu::MIOpenCTCLossDescriptor::MIOpenCTCLossDescriptor(miopenDataType_t): error: undefined reference to 'tsl::Env::Default()'
collect2: error: ld returned 1 exit status
Target //xla/tests:local_client_aot_test failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 582.787s, Critical Path: 232.26s
INFO: 4791 processes: 263 internal, 4528 local.
FAILED: Build did NOT complete successfully
//xla/tests:local_client_aot_test FAILED TO BUILD
Co-authored-by: Penporn Koanantakool <38085909+penpornk@users.noreply.github.com>
Hi @xla-rotation @ddunl @ezhulenev @penpornk Can we please resolve this? It breaks build with our local CI. |
Imported from GitHub PR openxla/xla#11205 Copybara import of the project: -- 91df7b60a4aac6da70165948a39805d0d3921067 by Harsha HS <harsha.havanurshamsundara@amd.com>: [ROCm] Fix build break due to 4c8a74bb5 -- 85cd084e52340cc2069ab5c4c57c210e466846a1 by Harsha HS <harsha.havanurshamsundara@amd.com>: [ROCm] Fix link failure for rocm targets -- 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd by Harsha H S <hsharsha@users.noreply.github.com>: Update xla/stream_executor/gpu/gpu_timer.cc with review comment Co-authored-by: Penporn Koanantakool <38085909+penpornk@users.noreply.github.com> Merging this change closes #11205 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd PiperOrigin-RevId: 623141901
…ever destroyed, and the destructor is never run. FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd PiperOrigin-RevId: 622926049
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd PiperOrigin-RevId: 622145436
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd PiperOrigin-RevId: 619370979
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd PiperOrigin-RevId: 617804064
PjRtFuture<void> (or PjRtFuture<>) is an implicit wrapper for absl::Status that will become available at some point in the future and will signal completion of an async event. FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd PiperOrigin-RevId: 622941633
Imported from GitHub PR openxla/xla#11205 Copybara import of the project: -- 91df7b60a4aac6da70165948a39805d0d3921067 by Harsha HS <harsha.havanurshamsundara@amd.com>: [ROCm] Fix build break due to 4c8a74bb5 -- 85cd084e52340cc2069ab5c4c57c210e466846a1 by Harsha HS <harsha.havanurshamsundara@amd.com>: [ROCm] Fix link failure for rocm targets -- 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd by Harsha H S <hsharsha@users.noreply.github.com>: Update xla/stream_executor/gpu/gpu_timer.cc with review comment Co-authored-by: Penporn Koanantakool <38085909+penpornk@users.noreply.github.com> Merging this change closes #11205 PiperOrigin-RevId: 623171848
This feature is under development and will likely remain unstable until we finish work on SymbolicTileAnalysis. FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd PiperOrigin-RevId: 590554483
No description provided.