jaxlib failure on PPC-CUDA #1095

Open
cdeepali opened this issue Apr 22, 2024 · 5 comments

@cdeepali
Contributor

cdeepali commented Apr 22, 2024

  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE')
# Configuration: a56082c93dd48aa014ecd841b6baa57bfb599ec2a3bd73f574476a78d24931ba
# Execution platform: @local_execution_config_platform//:platform
/bin/bash: line 1: NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE: command not found
ERROR: <mydir>.cache/bazel/_bazel_builder/53454098b61397863f9da84cfbf7cbae/external/tsl/tsl/cuda/BUILD.bazel:222:10: Executing genrule @tsl//tsl/cuda:cusolver_stub_gen failed: (Exit 127): bash failed: error executing command (from target @tsl//tsl/cuda:cusolver_stub_gen)
  (cd <mydir>.cache/bazel/_bazel_builder/53454098b61397863f9da84cfbf7cbae/execroot/__main__ && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=<mycondabld>jaxlib_1713725734023/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh \
    GCC_HOST_COMPILER_PATH=<mycondabld>jaxlib_1713725734023/_build_env/bin/powerpc64le-conda-linux-gnu-gcc \
    GCC_HOST_COMPILER_PREFIX=<mycondabld>jaxlib_1713725734023/_build_env/bin \
    PATH=<mycondabld>jaxlib_1713725734023/_build_env/bin:<mycondabld>jaxlib_1713725734023/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/bin:/opt/conda/condabin:<mycondabld>jaxlib_1713725734023/_build_env:<mycondabld>jaxlib_1713725734023/_build_env/bin:<mycondabld>jaxlib_1713725734023/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh:<mycondabld>jaxlib_1713725734023/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/bin:/opt/conda/envs/dccuda-21apr/bin:/opt/conda/condabin:<mydir>.local/bin:<mydir>bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    TF_CUDA_COMPUTE_CAPABILITIES=sm_60,sm_70,sm_75,sm_80,sm_86,sm_90,compute_80,compute_86,compute_90 \
    TF_CUDA_PATHS=/usr/local/cuda,<mycondabld>jaxlib_1713725734023/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh \
    TF_CUDA_VERSION=12.2 \
    TF_CUDNN_VERSION=8.9.6 \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE')
# Configuration: a56082c93dd48aa014ecd841b6baa57bfb599ec2a3bd73f574476a78d24931ba
# Execution platform: @local_execution_config_platform//:platform
/bin/bash: line 1: NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE: command not found
[252 / 5,551] Executing genrule @local_config_cuda//cuda:cuda-lib; 2s local
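
Reading the log above: exit status 127 is what bash returns when a command cannot be found, and the "command" here is the literal string NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE. That suggests the cusolver_stub_gen genrule had no real command selected for ppc64le and fell back to a placeholder. This is an inference from the log, not something confirmed in this thread. A minimal sketch that reproduces the same symptom outside Bazel, assuming bash is on PATH (the helper name run_placeholder_command is hypothetical):

```python
import subprocess

# Placeholder copied from the log above; it is not a real program.
PLACEHOLDER = "NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE"


def run_placeholder_command() -> subprocess.CompletedProcess:
    """Run the placeholder through bash, roughly as the genrule did."""
    return subprocess.run(
        ["bash", "-c", PLACEHOLDER],
        capture_output=True,
        text=True,
    )


result = run_placeholder_command()
# bash reports "command not found" and exits with status 127,
# matching the "(Exit 127)" in the Bazel error.
print(result.returncode)
print(result.stderr.strip())
```

If the return code is 127 here as well, the failure is independent of CUDA or the compiler: the rule simply has nothing to run on this architecture.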

@cdeepali cdeepali added the bug Something isn't working label Apr 22, 2024
@cdeepali cdeepali self-assigned this Apr 22, 2024
@cdeepali
Contributor Author

jaxlib/mosaic/dialect/tpu/transforms/apply_vector_layout.cc: In function 'mlir::LogicalResult mlir::tpu::{anonymous}::vector_multi_reduction_rule(mlir::tpu::RewriteContext&, mlir::Operation&, llvm::ArrayRef<std::optional<mlir::tpu::VectorLayout> >, llvm::ArrayRef<std::optional<mlir::tpu::VectorLayout> >)':
jaxlib/mosaic/dialect/tpu/transforms/apply_vector_layout.cc:2287:33: error: 'MAXF' is not a member of 'mlir::vector::CombiningKind'
 2287 |     case vector::CombiningKind::MAXF: {
      |                                 ^~~~
jaxlib/mosaic/dialect/tpu/transforms/apply_vector_layout.cc:2379:33: error: 'MAXF' is not a member of 'mlir::vector::CombiningKind'
 2379 |     case vector::CombiningKind::MAXF:
      |                                 ^~~~
At global scope:

@cdeepali
Contributor Author

[14,354 / 14,492] Compiling xla/service/gpu/kernels/topk_kernel_bfloat16.cu.cc; 741s local ... (69 actions, 68 running)
ERROR: /<mycondabld>/jaxlib_1713768707967/work/jaxlib/mlir/_mlir_libs/BUILD.bazel:197:13: Compiling stablehlo/integrations/python/StablehloModule.cpp failed: undeclared inclusion(s) in rule '//jaxlib/mlir/_mlir_libs:_stablehlo.so':
this rule is missing dependency declarations for the following files included by 'stablehlo/integrations/python/StablehloModule.cpp':
  'external/stablehlo/stablehlo/reference/Api.h'
<mydir>/.cache/bazel/_bazel_builder/d282d8634b5ff875395f6f6d5c45c1bf/execroot/__main__/bazel_toolchain/crosstool_wrapper_driver_is_not_gcc:40: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
  import pipes
Target //jaxlib/tools:build_wheel failed to build
INFO: Elapsed time: 843.861s, Critical Path: 764.45s
INFO: 14424 processes: 6205 internal, 8219 local.
FAILED: Build did NOT complete successfully
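
The "undeclared inclusion(s)" error above names both the rule and the header Bazel could not account for. In a build log of this size those pairs are easy to miss, so a small helper can pull them out. This is a rough sketch, assuming the message layout shown in the log above; the function name missing_headers is hypothetical, and the thread does not say how the dependency was eventually fixed.

```python
import re

RULE_RE = re.compile(r"undeclared inclusion\(s\) in rule '([^']+)'")
HEADER_RE = re.compile(r"^\s+'([^']+)'\s*$")


def missing_headers(log_text: str) -> dict[str, list[str]]:
    """Map each Bazel rule to the headers reported as undeclared inclusions."""
    found: dict[str, list[str]] = {}
    current = None
    for line in log_text.splitlines():
        rule = RULE_RE.search(line)
        if rule:
            current = rule.group(1)
            found.setdefault(current, [])
            continue
        if current is None:
            continue
        header = HEADER_RE.match(line)
        if header:
            found[current].append(header.group(1))
        elif not line.startswith("this rule is missing"):
            current = None  # the indented header list has ended
    return found


# Lines copied from the log above.
SAMPLE_LOG = "\n".join([
    "ERROR: /<mycondabld>/jaxlib_1713768707967/work/jaxlib/mlir/_mlir_libs/BUILD.bazel:197:13: "
    "Compiling stablehlo/integrations/python/StablehloModule.cpp failed: "
    "undeclared inclusion(s) in rule '//jaxlib/mlir/_mlir_libs:_stablehlo.so':",
    "this rule is missing dependency declarations for the following files "
    "included by 'stablehlo/integrations/python/StablehloModule.cpp':",
    "  'external/stablehlo/stablehlo/reference/Api.h'",
    "Target //jaxlib/tools:build_wheel failed to build",
])

for rule, headers in missing_headers(SAMPLE_LOG).items():
    print(rule, headers)
```

For the log above this reports that //jaxlib/mlir/_mlir_libs:_stablehlo.so includes external/stablehlo/stablehlo/reference/Api.h without a declared dependency that provides it.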

@cdeepali
Contributor Author

These issues are fixed, but import jaxlib still fails with a segfault, so we have decided to disable jax-env for 1.11.

@cdeepali
Contributor Author

#1099

@cdeepali cdeepali added this to the OpenCE 1.11 milestone Apr 23, 2024
@cdeepali cdeepali modified the milestones: OpenCE 1.11, OpenCE 1.11.1 May 3, 2024
@cdeepali
Contributor Author

Moving to 1.12.

@cdeepali cdeepali modified the milestones: OpenCE 1.11.1, OpenCE 1.12 May 10, 2024