Build fleetbench failed as no permission with Bazel 5.4.0 #6

Closed
esharkwang opened this issue Jan 9, 2023 · 8 comments

Comments

@esharkwang

Hi,

I want to build an aarch64 version of fleetbench. However, the build failed with a permission error.

Here is the build log. I had already set the permissions of the fleetbench folder to 777.
bazel run -c opt fleetbench/swissmap:hot_swissmap_benchmark --verbose_failures
2023/01/09 03:40:53 Downloading https://releases.bazel.build/5.4.0/release/bazel-5.4.0-linux-arm64...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Analyzed target //fleetbench/swissmap:hot_swissmap_benchmark (65 packages loaded, 836 targets configured).
INFO: Found 1 target...
ERROR: /home/nvidia/walter/fleetbench/fleetbench/BUILD:15:11: Compiling fleetbench/benchmark_main.cc failed: (Exit 1): gcc failed: error executing command
(cd /root/.cache/bazel/_bazel_root/0bce1989468318c371f4348e6ac4d902/sandbox/linux-sandbox/15/execroot/com_google_fleetbench &&
exec env -
PATH=/root/.cache/bazelisk/downloads/bazelbuild/bazel-5.4.0-linux-arm64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
PWD=/proc/self/cwd
/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.d '-frandom-seed=bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.o' -DBENCHMARK_STATIC_DEFINE -iquote . -iquote bazel-out/aarch64-opt/bin -iquote external/com_google_benchmark -iquote bazel-out/aarch64-opt/bin/external/com_google_benchmark -Ibazel-out/aarch64-opt/bin/external/com_google_benchmark/_virtual_includes/benchmark '-std=c++17' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c fleetbench/benchmark_main.cc -o bazel-out/aarch64-opt/bin/fleetbench/_objs/benchmark_main/benchmark_main.o)

Configuration: a0b0f0a2e12d5d8ebd5c1e57a8b5134db01aaef167d6db5c638a140b29cfa08a

Execution platform: @local_config_platform//:host

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
gcc: error: fleetbench/benchmark_main.cc: Permission denied
gcc: fatal error: no input files
compilation terminated.
Target //fleetbench/swissmap:hot_swissmap_benchmark failed to build
INFO: Elapsed time: 17.432s, Critical Path: 1.09s
INFO: 170 processes: 166 internal, 4 linux-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
root@nvidia:/home/nvidia/walter/fleetbench# bazel version
Bazelisk version: v1.13.2
Build label: 5.4.0
Build target: bazel-out/aarch64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 15 16:14:11 2022 (1671120851)
Build timestamp: 1671120851
Build timestamp as int: 1671120851

I did some research and found that the failure was caused by a looping symbolic link: inside the sandbox, the link did not point to the correct source file but to itself. Did I miss some build options or configuration?
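
For anyone who wants to check for the same symptom, here is a minimal sketch of how such links could be listed. The function name find_unresolvable_links is made up for illustration, and the directory to scan is an assumption: it would be the sandbox root that Bazel keeps when --sandbox_debug is passed.

```shell
#!/bin/sh
# Hypothetical helper: list symlinks under a directory that cannot be
# resolved, either because the target is missing or because the link loops.
find_unresolvable_links() {
    dir=$1
    find "$dir" -type l | while IFS= read -r link; do
        # [ -e ] follows the link, so it fails for dangling targets and loops.
        if [ ! -e "$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$link")"
        fi
    done
}
```

A link that prints with its own name as the target, such as "benchmark_main.cc -> benchmark_main.cc", points to itself.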

@esharkwang
Author

After some searching, I added --spawn_strategy=local so that the build uses the local source code directly. The permission error no longer occurs, but the build now fails while compiling the dependency code for ARMv8 (aarch64).

bazel build -c opt fleetbench/tcmalloc:all --spawn_strategy=local --sandbox_debug
INFO: Analyzed 8 targets (0 packages loaded, 0 targets configured).
INFO: Found 8 targets...
ERROR: /root/.cache/bazel/_bazel_root/0bce1989468318c371f4348e6ac4d902/external/com_google_tcmalloc/tcmalloc/BUILD:297:11: Compiling tcmalloc/global_stats.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 32 arguments skipped)
In file included from external/com_google_tcmalloc/tcmalloc/cpu_cache.h:39,
from external/com_google_tcmalloc/tcmalloc/global_stats.cc:21:
external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h: In function 'int tcmalloc::tcmalloc_internal::subtle::percpu::TcmallocSlab_Internal_Push(typename tcmalloc::tcmalloc_internal::subtle::percpu::TcmallocSlab::Slabs*, size_t, void*, tcmalloc::tcmalloc_internal::subtle::percpu::Shift, tcmalloc::tcmalloc_internal::subtle::percpu::OverflowHandler, void*, size_t)':
external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h:667:9: error: expected ':' or '::' before '[' token
667 | : [end_ptr] "=&r"(end_ptr), [cpu_id] "=&r"(cpu_id),
| ^
INFO: Elapsed time: 2.486s, Critical Path: 1.92s
INFO: 39 processes: 37 internal, 2 local.

I checked external/com_google_tcmalloc/tcmalloc/internal/percpu_tcmalloc.h:667, which is inside the inline asm block. I am not sure why it fails, since this code should have been verified before. Are there any special options for aarch64 compilation?
#if TCMALLOC_INTERNAL_PERCPU_USE_RSEQ_ASM_GOTO
"b.le %l[overflow_label]\n"
#else
"b.le 5f\n"
// Important! code below this must not affect any flags (i.e.: ccle)
// If so, the above code needs to explicitly set a ccle return value.
#endif
"str %[item], [%[region_start], %[current], LSL #3]\n"
"add %w[current], %w[current], #1\n"
"strh %w[current], [%[region_start], %[size_class_lsl3]]\n"
// Commit
"5:\n"
: [end_ptr] "=&r"(end_ptr), [cpu_id] "=&r"(cpu_id),
[current] "=&r"(current), [end] "=&r"(end),
[region_start] "=&r"(region_start)

@liyuying0000
Contributor

liyuying0000 commented Jan 12, 2023

Hi, @esharkwang,

I'm able to reproduce the same error on an Nvidia Jetson Xavier AGX machine. It turns out this is likely a dependency issue and unrelated to the Fleetbench code itself. There are some incompatibilities between the internal and external versions. I am actively looking into it and talking with the TCMalloc team as well.

In the meantime, could you try building with a different compiler or compiler version? For example, CC=clang bazel run -c opt fleetbench/swissmap:hot_swissmap_benchmark.

I will keep you posted once I have any update.

@liyuying0000
Contributor

liyuying0000 commented Jan 13, 2023

Hi, @esharkwang,

Unfortunately, this is a long-standing issue when building with Bazel 5.4.0 on aarch64 with GCC versions < 10, and that configuration is unsupported at the moment.
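
A quick way to tell whether a machine falls into the affected range is to look at the GCC major version. The sketch below assumes the threshold of 10 stated above. The function name gcc_is_supported is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given GCC version string has a
# major version of at least 10 (the threshold mentioned in this thread).
gcc_is_supported() {
    major=${1%%.*}   # keep everything before the first dot, e.g. 9.4.0 -> 9
    [ "$major" -ge 10 ]
}

# Example usage (requires gcc on PATH):
#   gcc_is_supported "$(gcc -dumpversion)" || echo "GCC is older than 10"
```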

@esharkwang
Author

Hi @liyuying0000,
Thanks for the comments. Does the fleetbench code support GCC 11? If so, I think I could upgrade the GCC version used with Bazel 5.4.0 as a workaround. Is that possible?

@esharkwang
Author

@liyuying0000
I tried Bazel 6.0.0 with a workaround for the dependency issue, and I also upgraded GCC to version 11. Now I can build the binary for aarch64. I will post a summary of how to work around the issue later.

@liyuying0000
Contributor

Hi, @esharkwang
Thanks for the updates. I'm so glad it worked out!
It would be appreciated if you could share the workaround.

@esharkwang
Author

esharkwang commented Feb 1, 2023

Hi, @liyuying0000 ,

Here are the steps I used to work around the issue.

  1. Update the bazel_skylib reference, as described in the Bazel discussion group.
    Here is a sample:
    http_archive(
        name = "bazel_skylib",
        sha256 = "74d544d96f4a5bb630d465ca8bbcfe231e3594e5aae57e1edbf17a6eb3ca2506",
        urls = [
            "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
            "https://github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
        ],
    )
  2. Tell Bazelisk to use Bazel 6.0.0 through an environment variable:
    export USE_BAZEL_VERSION=6.0.0
  3. Install GCC 11 and make it the default compiler:
    apt install -y build-essential
    apt install -y gcc-11 g++-11 cpp-11
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100 --slave /usr/bin/g++ g++ /usr/bin/g++-11 --slave /usr/bin/gcov gcov /usr/bin/gcov-11
  4. Then use Bazel to compile and run the application. This works on aarch64 Ubuntu 20.04.
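
The export in step 2 only lasts for the current shell session. Bazelisk also reads a .bazelversion file in the workspace root, so the pin could be stored with the checkout instead. A minimal sketch, where the helper name pin_bazel_version is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: write a .bazelversion file so Bazelisk picks the
# given Bazel release for this workspace without an environment variable.
pin_bazel_version() {
    workspace=$1
    version=$2
    printf '%s\n' "$version" > "$workspace/.bazelversion"
}

# Example usage from the fleetbench checkout:
#   pin_bazel_version . 6.0.0
```

If USE_BAZEL_VERSION is also set, Bazelisk gives the environment variable priority over the file.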

@liyuying0000
Contributor

Thanks so much for your workaround! @esharkwang
