Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ROCm] Fix link failure for rocm targets #11205

Closed
wants to merge 3 commits into from

Conversation

hsharsha
Copy link
Contributor

@hsharsha hsharsha commented Apr 4, 2024

No description provided.

@i-chaochen
Copy link
Contributor

Hi @xla-rotation please check it.

@penpornk penpornk requested a review from ezhulenev April 5, 2024 10:44
xla/stream_executor/gpu/gpu_timer.cc Outdated Show resolved Hide resolved
@@ -146,6 +146,7 @@ cc_library(
"@com_google_absl//absl/functional:any_invocable",
"@com_google_absl//absl/strings",
"//xla/stream_executor",
"//xla/stream_executor:kernel",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can depend on //xla/stream_executor:stream_executor_impl instead. Otherwise, I think we should still keep private visibility for //xla/stream_executor:kernel and add an exception for rocm_executor target.

@ezhulenev Thoughts?

Copy link
Member

@ezhulenev ezhulenev Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually going in other direction, trying to make SE a "normal" library: (1) delete _impl (2) export targets for different functionalities, kernel, stream, device_memory, etc. And regular stream_executor will bring all implementations with it

@@ -193,6 +194,7 @@ cc_library(
"@com_google_absl//absl/base",
"@com_google_absl//absl/memory",
"//xla/stream_executor", # buildcleaner: keep
"//xla/stream_executor:executor_cache",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. //xla/stream_executor:stream_executor_impl covers executor_cache.

cc: @ezhulenev

@@ -3196,7 +3196,7 @@ class RocmConvRunner : public dnn::ConvRunner {
return {{algo_id_, false, workspace_size_}};
}

absl::Status operator()(Stream* stream, dnn::ProfileResult* profile_result,
absl::Status operator()(Stream* stream, dnn::ProfileResult* output_profile_result,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. Could you please explain what failure renaming this variable fixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/openxla/xla/pull/11172/files#diff-c40dd5f4ceb7cef0a055862235ddd9f12d0058bb2a09aca3cbb69b31c232c509R3212-R3216

Introduces a bug with using undeclared variable output_profile_result. Since other part of the code uses this variable name, I have renamed in this function to be consistent throughout.

@@ -349,7 +351,7 @@ cc_library(
"@com_google_absl//absl/strings",
"@com_google_absl//absl/types:span",
"@local_config_rocm//rocm:rocm_headers",
"@tsl//tsl/platform:env",
"@tsl//tsl/platform:env_impl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddunl do you know why it might be needed here? I suspect "something" might be missing in one of the build files to add it automatically?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as odd - I'd expect this to be taken care of by https://github.com/openxla/xla/blob/main/xla/xla.bzl#L52 for all test/binary targets. @hsharsha if you have a build failure log for what happens when you switch this back to :env I'd be very interested to see it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddunl Running local_client_aot_test after switching back to :env with command line

bazel  --bazelrc=/usertools/rocm.bazelrc test  --test_output all --config=rocm --test_env=TF_TESTS_PER_GPU=1  --run_under=//tools/ci_build/gpu_build:parallel_gpu_execute -- //xla/tests:local_client_aot_test

Results in this error

INFO: Analyzed target //xla/tests:local_client_aot_test (158 packages loaded, 12668 targets configured).
INFO: Found 1 test target...
ERROR: /home/hahavanu/xla_upstream/upstream/xla/xla/tests/BUILD:2441:12: Linking xla/tests/local_client_aot_test failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tests:local_client_aot_test) external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tests/local_client_aot_test-2.params
bazel-out/k8-opt/bin/external/tsl/tsl/profiler/backends/cpu/_objs/traceme_recorder_impl/traceme_recorder.o:traceme_recorder.cc:function tsl::profiler::TraceMeRecorder::Record(tsl::profiler::TraceMeRecorder::Event&&): error: undefined reference to 'tsl::Env::Default()'
bazel-out/k8-opt/bin/xla/stream_executor/rocm/_objs/miopen_plugin/rocm_dnn.o:rocm_dnn.cc:function stream_executor::gpu::wrap::DynLoadShim__miopenDestroy::LoadOrDie(): error: undefined reference to 'tsl::Env::Default()'
bazel-out/k8-opt/bin/xla/stream_executor/rocm/_objs/miopen_plugin/rocm_dnn.o:rocm_dnn.cc:function stream_executor::gpu::MIOpenAccess::GetHandle(stream_executor::gpu::GpuExecutor*, stream_executor::Stream*): error: undefined reference to 'tsl::Env::Default()'
bazel-out/k8-opt/bin/xla/stream_executor/rocm/_objs/miopen_plugin/rocm_dnn.o:rocm_dnn.cc:function stream_executor::gpu::MIOpenCTCLossDescriptor::MIOpenCTCLossDescriptor(miopenDataType_t): error: undefined reference to 'tsl::Env::Default()'
collect2: error: ld returned 1 exit status
Target //xla/tests:local_client_aot_test failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 582.787s, Critical Path: 232.26s
INFO: 4791 processes: 263 internal, 4528 local.
FAILED: Build did NOT complete successfully
//xla/tests:local_client_aot_test                               FAILED TO BUILD

Co-authored-by: Penporn Koanantakool <38085909+penpornk@users.noreply.github.com>
@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Apr 8, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Apr 8, 2024
@hsharsha
Copy link
Contributor Author

hsharsha commented Apr 9, 2024

Hi @xla-rotation @ddunl @ezhulenev @penpornk Can we please resolve this? It breaks build with our local CI.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
Imported from GitHub PR openxla/xla#11205

Copybara import of the project:

--
91df7b60a4aac6da70165948a39805d0d3921067 by Harsha HS <harsha.havanurshamsundara@amd.com>:

[ROCm] Fix build break due to 4c8a74bb5

--
85cd084e52340cc2069ab5c4c57c210e466846a1 by Harsha HS <harsha.havanurshamsundara@amd.com>:

[ROCm] Fix link failure for rocm targets

--
8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd by Harsha H S <hsharsha@users.noreply.github.com>:

Update xla/stream_executor/gpu/gpu_timer.cc with review comment

Co-authored-by: Penporn Koanantakool <38085909+penpornk@users.noreply.github.com>

Merging this change closes #11205

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd
PiperOrigin-RevId: 623141901
@copybara-service copybara-service bot closed this in d6c50b4 Apr 9, 2024
copybara-service bot pushed a commit that referenced this pull request Apr 9, 2024
…ever destroyed, and the destructor is never run.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#11205 from ROCm:ci_link_break_20240404 8ee9b5e
PiperOrigin-RevId: 622926049
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
…ever destroyed, and the destructor is never run.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd
PiperOrigin-RevId: 622926049
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd
PiperOrigin-RevId: 622145436
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd
PiperOrigin-RevId: 619370979
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd
PiperOrigin-RevId: 617804064
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
PjRtFuture<void> (or PjRtFuture<>) is an implicit wrapper for absl::Status that will become available at some point in the future and will signal completion of an async event.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd
PiperOrigin-RevId: 622941633
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
Imported from GitHub PR openxla/xla#11205

Copybara import of the project:

--
91df7b60a4aac6da70165948a39805d0d3921067 by Harsha HS <harsha.havanurshamsundara@amd.com>:

[ROCm] Fix build break due to 4c8a74bb5

--
85cd084e52340cc2069ab5c4c57c210e466846a1 by Harsha HS <harsha.havanurshamsundara@amd.com>:

[ROCm] Fix link failure for rocm targets

--
8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd by Harsha H S <hsharsha@users.noreply.github.com>:

Update xla/stream_executor/gpu/gpu_timer.cc with review comment

Co-authored-by: Penporn Koanantakool <38085909+penpornk@users.noreply.github.com>

Merging this change closes #11205

PiperOrigin-RevId: 623171848
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 9, 2024
This feature is under development and will likely remain unstable until we finish work on SymbolicTileAnalysis.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11205 from ROCm:ci_link_break_20240404 8ee9b5e90d4e1a3ca7734bbe1ef73140e4d879dd
PiperOrigin-RevId: 590554483
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants