Mix cannot continue when building EXLA #845

Closed
jnnks opened this issue Jul 19, 2022 · 20 comments

@jnnks

jnnks commented Jul 19, 2022

The EXLA build fails with:

Unchecked dependencies for environment prod:
* xla (Hex package)
  could not find an app file at "_build/prod/lib/xla/ebin/xla.app". This may happen if the dependency was not yet compiled or the dependency indeed has no app file (then you can pass app: false as option)
** (Mix) Can't continue due to errors on dependencies
Full Log
$ XLA_BUILD=true  MIX_ENV=prod mix compile

==> xla
Compiling 2 files (.ex)
Generated xla app
rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
        ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
        cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \
        bazel build --define "framework_shared_object=false" -c opt    //tensorflow/compiler/xla/extension:xla_extension && \
        mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \
        cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/c3e082762b7664bbc7ffd2c39e86464928e27c0c.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured)
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1596824487 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
  /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/WORKSPACE:23:14: in <toplevel>
  /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/workspace0.bzl:108:34: in workspace
  /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_toolchains/repositories/repositories.bzl:35:23: in repositories
Repository rule git_repository defined at:
  /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (16 packages loaded, 14 targets configured)
Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (187 packages loaded, 16013 targets configured)
INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (188 packages loaded, 16972 targets configured).
INFO: Found 1 target...
[0 / 10] [Prepa] BazelWorkspaceStatusAction stable-status.txt ... (4 actions, 0 running)
[128 / 234] Compiling src/google/protobuf/compiler/objectivec/objectivec_field.cc; 1s local ... (16 actions, 15 running)
[327 / 518] Compiling llvm/lib/TableGen/Record.cpp; 2s local ... (16 actions, 15 running)
[574 / 1,340] Compiling llvm/lib/Support/CommandLine.cpp; 1s local ... (16 actions, 15 running)
[1,151 / 1,481] Compiling llvm/utils/TableGen/InstrInfoEmitter.cpp; 3s local ... (16 actions, 15 running)
[1,911 / 7,107] Compiling mlir/lib/IR/Dominance.cpp; 6s local ... (16 actions, 15 running)
[2,247 / 7,107] Compiling llvm/lib/CodeGen/MachineSink.cpp; 7s local ... (16 actions, 15 running)
[2,473 / 7,107] Compiling llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp; 4s local ... (16 actions, 15 running)
[2,646 / 7,107] Compiling llvm/lib/Transforms/Coroutines/CoroFrame.cpp; 8s local ... (16 actions, 15 running)
[2,787 / 7,107] Compiling llvm/lib/Target/X86/X86ISelLowering.cpp; 6s local ... (16 actions, 15 running)
[2,931 / 7,107] Compiling llvm/lib/Target/X86/X86ISelLowering.cpp; 45s local ... (16 actions, 15 running)
[3,090 / 7,107] Compiling mlir/lib/Dialect/Linalg/IR/LinalgDialect.cpp; 17s local ... (16 actions, 15 running)
[3,224 / 7,107] Compiling src/cpu/x64/jit_uni_dw_convolution.cpp; 8s local ... (16 actions, 15 running)
[3,404 / 7,107] Compiling src/cpu/rnn/ref_rnn.cpp; 36s local ... (16 actions running)
[3,808 / 7,107] Compiling llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp; 14s local ... (16 actions running)
[4,132 / 7,107] Compiling tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/transforms/legalize_to_linalg.cc; 10s local ... (16 actions running)
[4,580 / 7,107] Compiling src/cpu/cpu_convolution_list.cpp; 7s local ... (16 actions, 15 running)
[5,414 / 7,107] Compiling tensorflow/compiler/xla/service/cpu/runtime_single_threaded_matmul.cc; 24s local ... (16 actions, 15 running)
[6,024 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_a_m.cc; 65s local ... (16 actions running)
[6,298 / 7,107] Compiling tensorflow/core/util/batch_util.cc; 62s local ... (16 actions, 15 running)
[6,603 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc; 208s local ... (16 actions, 15 running)
[6,821 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/mark_ops_for_outside_compilation.cc; 84s local ... (16 actions, 15 running)
[6,963 / 7,107] Compiling tensorflow/core/kernels/resource_variable_ops.cc; 133s local ... (16 actions, 15 running)
Target //tensorflow/compiler/xla/extension:xla_extension up-to-date:
  bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz
INFO: Elapsed time: 1779.200s, Critical Path: 274.62s
INFO: 7107 processes: 574 internal, 6533 local.
INFO: Build completed successfully, 7107 total actions
INFO: Build completed successfully, 7107 total actions
==> complex
Compiling 2 files (.ex)
Generated complex app
==> nx
Compiling 24 files (.ex)
Generated nx app
==> exla
Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache
g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
Compiling 21 files (.ex)
Generated exla app
==> exla_compile_test
Unchecked dependencies for environment prod:
* xla (Hex package)
  could not find an app file at "_build/prod/lib/xla/ebin/xla.app". This may happen if the dependency was not yet compiled or the dependency indeed has no app file (then you can pass app: false as option)
** (Mix) Can't continue due to errors on dependencies

Happens with {:exla, "~> 0.2"} on a new project. The compilation seems to work fine though. XLA service is initialized and StreamExecutor can find a device.

No error is raised for subsequent compiles.
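
Since the error does not recur on subsequent compiles, one possible stopgap (an assumption on my part, not a fix and not something confirmed by the maintainers) is to run the compile step a second time when the first attempt fails. A minimal sketch; `retry_once` is a hypothetical helper name:

```shell
# Run a command; if it fails, run it exactly one more time.
# The function's exit status is that of the last attempt.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying once: $*" >&2
  "$@"
}

# Example (commented out because a from-scratch XLA build is very slow):
# retry_once env XLA_BUILD=true MIX_ENV=prod mix compile
```

This only hides the symptom in scripts or CI; it does not explain why the first run cannot find xla.app.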

@josevalim
Collaborator

This is weird because the log even says at the beginning that the xla app was generated. What happens if you do mix deps.compile xla? What is in "_build/prod/lib/xla"?

@jnnks
Author

jnnks commented Jul 19, 2022

> What happens if you do mix deps.compile xla?

Nothing, no output.

> What is in "_build/prod/lib/xla"?

See below.

@jnnks
Author

jnnks commented Jul 19, 2022

> What is in "_build/prod/lib/xla"?

Nothing after the first compilation. Contents, including _build/prod/lib/xla/ebin/xla.app, appear only after the second run:

iex -S mix
$ XLA_BUILD=true  MIX_ENV=prod iex -S mix

Erlang/OTP 24 [erts-12.3.1] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]

==> xla
Compiling 2 files (.ex)
Generated xla app
make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date.
==> exla_compile_test
Compiling 1 file (.ex)
Generated exla_compile_test app
Interactive Elixir (1.13.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> ExlaCompileTest.hello

13:11:07.188 [info]  XLA service 0x7f6a4c0394e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
 
13:11:07.201 [info]    StreamExecutor device (0): Host, Default Version
#Nx.Tensor<
  s64
  EXLA.Backend<host:0, 0.2267202891.2216558624.17390> 
  3
>

Similar situation with mix deps.compile xla: no error, and _build/prod/lib/xla/ebin/xla.app exists afterwards.

@josevalim
Collaborator

@jnnks can you please try this:

rm -rf _build
rm -rf deps
mix deps.get
XLA_BUILD=true  MIX_ENV=prod mix deps.compile xla
tree _build/prod/lib/xla
XLA_BUILD=true  MIX_ENV=prod mix deps.compile exla
tree _build/prod/lib/xla

I suspect the exla compilation is what is erasing it somehow.
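
To check for the file after each step without reading tree output, a small helper can report whether the .app file exists. This is a sketch that assumes the build layout seen in the error message (`_build/<env>/lib/<app>/ebin/<app>.app`); `app_file_status` is a hypothetical name:

```shell
# Print whether the .app file for a dependency exists in the build directory.
# Usage: app_file_status APP [MIX_ENV] [BUILD_ROOT]
# Returns 0 if the file exists, 1 otherwise.
app_file_status() {
  local app="$1" mix_env="${2:-prod}" root="${3:-_build}"
  local file="$root/$mix_env/lib/$app/ebin/$app.app"
  if [ -f "$file" ]; then
    echo "present: $file"
  else
    echo "MISSING: $file"
    return 1
  fi
}
```

Running `app_file_status xla prod` after each `mix deps.compile` step would show at which step the file goes missing, if it does.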

@jnnks
Author

jnnks commented Jul 20, 2022

For some reason the first mix deps.compile xla does not complete, but the second does (Mix 1.13.4).

logs
$ rm -rf _build
$ rm -rf deps
$ mix deps.get
    Resolving Hex dependencies...
    Dependency resolution completed:
    Unchanged:
    complex 0.4.1
    elixir_make 0.6.3
    exla 0.2.3
    nx 0.2.1
    xla 0.3.0
    * Getting exla (Hex package)
    * Getting elixir_make (Hex package)
    * Getting nx (Hex package)
    * Getting xla (Hex package)
    * Getting complex (Hex package)


$ XLA_BUILD=true  MIX_ENV=prod mix deps.compile xla
    ==> xla
    Compiling 2 files (.ex)
    Generated xla app
    ==> elixir_make
    Compiling 1 file (.ex)
    Generated elixir_make app
    ==> xla
    Unchecked dependencies for environment prod:
    * elixir_make (Hex package)
    the dependency build is outdated, please run "MIX_ENV=prod mix deps.compile"
    could not compile dependency :xla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile xla", update it with "mix deps.update xla" or clean it with "mix deps.clean xla"
    ==> exla_compile_test
    ** (Mix) Can't continue due to errors on dependencies


$ tree _build/prod/lib/xla
    _build/prod/lib/xla
    └── ebin
        ├── Elixir.Mix.Tasks.Xla.Info.beam
        ├── Elixir.XLA.beam
        └── xla.app

    1 directory, 3 files


$ XLA_BUILD=true  MIX_ENV=prod mix deps.compile xla
    ==> xla
    Compiling 2 files (.ex)
    Generated xla app
    rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
            ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
            cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \
            bazel build --define "framework_shared_object=false" -c opt    //tensorflow/compiler/xla/extension:xla_extension && \
            mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \
            cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz
    INFO: Options provided by the client:
    Inherited 'common' options: --isatty=0 --terminal_columns=80
    INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
    Inherited 'common' options: --experimental_repo_remote_exec
    INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
    'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
    INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
    INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
    INFO: Found applicable config definition build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
    INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
    Loading: 
    Loading: 0 packages loaded
    Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured)
    INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 6 targets configured).
    INFO: Found 1 target...
    [0 / 3] [Prepa] BazelWorkspaceStatusAction stable-status.txt
    Target //tensorflow/compiler/xla/extension:xla_extension up-to-date:
    bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz
    INFO: Elapsed time: 0.333s, Critical Path: 0.02s
    INFO: 1 process: 1 internal.
    INFO: Build completed successfully, 1 total action
    INFO: Build completed successfully, 1 total action


$ tree _build/prod/lib/xla
    _build/prod/lib/xla
    └── ebin
        ├── Elixir.Mix.Tasks.Xla.Info.beam
        ├── Elixir.XLA.beam
        └── xla.app

    1 directory, 3 files


$ XLA_BUILD=true  MIX_ENV=prod mix deps.compile exla
    ==> xla
    make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date.
    ==> exla
    Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache
    g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
    Compiling 21 files (.ex)
    warning: @behaviour Nx.Defn.Compiler does not exist (in module EXLA)
    lib/exla.ex:1: EXLA (module)

    warning: got "@impl true" for function __jit__/5 but no behaviour specifies such callback. There are no known callbacks, please specify the proper @behaviour and make sure it defines callbacks
    lib/exla.ex:369: EXLA (module)

    warning: got "@impl true" for function __stream__/7 but no behaviour specifies such callback. There are no known callbacks, please specify the proper @behaviour and make sure it defines callbacks
    lib/exla.ex:372: EXLA (module)


    == Compilation error in file lib/exla/defn/stream.ex ==
    ** (ArgumentError) could not load module Nx.Stream due to reason :unavailable
        (elixir 1.13.4) lib/protocol.ex:315: Protocol.assert_protocol!/2
        lib/exla/defn/stream.ex:58: (module)
    could not compile dependency :exla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile exla", update it with "mix deps.update exla" or clean it with "mix deps.clean exla"


$ tree _build/prod/lib/xla
    _build/prod/lib/xla
    └── ebin
        ├── Elixir.Mix.Tasks.Xla.Info.beam
        ├── Elixir.XLA.beam
        └── xla.app

    1 directory, 3 files

@josevalim
Collaborator

josevalim commented Jul 20, 2022

Ok, I missed some deps, sorry! It should have been this instead:

rm -rf _build
rm -rf deps
mix deps.get
XLA_BUILD=true  MIX_ENV=prod mix deps.compile elixir_make xla
tree _build/prod/lib/xla
XLA_BUILD=true  MIX_ENV=prod mix deps.compile complex nx exla
tree _build/prod/lib/xla

Maybe complex is not required… but I think xla.app will be there on both runs.

@jnnks
Author

jnnks commented Jul 20, 2022

_build/prod/lib/xla/ebin/xla.app is present both times

more logs
$ rm -rf _build
$ rm -rf deps
$ mix deps.get
    Resolving Hex dependencies...
    Dependency resolution completed:
    Unchanged:
    complex 0.4.1
    elixir_make 0.6.3
    exla 0.2.3
    nx 0.2.1
    xla 0.3.0
    * Getting exla (Hex package)
    * Getting elixir_make (Hex package)
    * Getting nx (Hex package)
    * Getting xla (Hex package)
    * Getting complex (Hex package)


$ XLA_BUILD=true  MIX_ENV=prod mix deps.compile elixir_make xla
    ==> elixir_make
    Compiling 1 file (.ex)
    Generated elixir_make app
    ==> xla
    Compiling 2 files (.ex)
    Generated xla app
    rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
            ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
            cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \
            bazel build --define "framework_shared_object=false" -c opt    //tensorflow/compiler/xla/extension:xla_extension && \
            mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \
            cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz
    INFO: Options provided by the client:
    Inherited 'common' options: --isatty=0 --terminal_columns=80
    INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
    Inherited 'common' options: --experimental_repo_remote_exec
    INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
    'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
    INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
    INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
    INFO: Found applicable config definition build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
    INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
    Loading: 
    Loading: 0 packages loaded
    Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured)
    INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 6 targets configured).
    INFO: Found 1 target...
    [0 / 3] [Prepa] BazelWorkspaceStatusAction stable-status.txt
    Target //tensorflow/compiler/xla/extension:xla_extension up-to-date:
    bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz
    INFO: Elapsed time: 0.300s, Critical Path: 0.02s
    INFO: 1 process: 1 internal.
    INFO: Build completed successfully, 1 total action
    INFO: Build completed successfully, 1 total action


$ tree _build/prod/lib/xla
    _build/prod/lib/xla
    └── ebin
        ├── Elixir.Mix.Tasks.Xla.Info.beam
        ├── Elixir.XLA.beam
        └── xla.app

    1 directory, 3 files


$ XLA_BUILD=true  MIX_ENV=prod mix deps.compile complex nx exla
    ==> complex
    Compiling 2 files (.ex)
    Generated complex app
    ==> nx
    Compiling 24 files (.ex)
    Generated nx app
    ==> xla
    make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date.
    ==> exla
    Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache
    g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
    Compiling 21 files (.ex)
    Generated exla app


$ tree _build/prod/lib/xla
    _build/prod/lib/xla
    └── ebin
        ├── Elixir.Mix.Tasks.Xla.Info.beam
        ├── Elixir.XLA.beam
        └── xla.app

    1 directory, 3 files

@josevalim
Collaborator

So when does it disappear?!?! Only on “mix compile”?

@jnnks
Author

jnnks commented Jul 20, 2022

Seems like the problem only appears when building XLA from scratch.
All the other times a cached archive has been used. Could that play a role?

@josevalim
Collaborator

Sounds like it, but I was hoping the instructions above could reproduce it. If you run mix compile at the end of the last set of instructions, is that when xla.app finally disappears?

@jnnks
Author

jnnks commented Jul 21, 2022

Nope, still there :)
I'll run the full build later with a directory watcher to see whether the file ever existed.
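
One simple way to build such a watcher is to poll the path and log every change of state. The sketch below is an assumption about how this could be done, not the watcher actually used in this thread; `watch_file` is a hypothetical name, and polling can miss a file that appears and vanishes between two polls:

```shell
# Poll a path and print a timestamped line whenever it appears or disappears.
# Usage: watch_file PATH [INTERVAL_SECONDS] [MAX_POLLS]
# MAX_POLLS=0 (the default) means poll until interrupted.
watch_file() {
  local path="$1" interval="${2:-1}" max_polls="${3:-0}"
  local prev="" state="" polls=0
  while :; do
    if [ -e "$path" ]; then state="present"; else state="absent"; fi
    # Only log transitions (the first poll always logs the initial state).
    if [ "$state" != "$prev" ]; then
      printf '%s %s %s\n' "$(date +%H:%M:%S)" "$state" "$path"
      prev="$state"
    fi
    polls=$((polls + 1))
    if [ "$max_polls" -gt 0 ] && [ "$polls" -ge "$max_polls" ]; then
      break
    fi
    sleep "$interval"
  done
}
```

Started in the background before the build, for example `watch_file _build/prod/lib/xla/ebin/xla.app 1 > watch.log &`, it leaves a log of when the file was present and when it was gone. An event-based tool such as inotifywait would avoid the polling gap.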

@josevalim
Collaborator

Schrodinger's xla.app. 😄

Thank you for digging deeper!

@jnnks
Author

jnnks commented Jul 24, 2022

Looks like it was in fact deleted during the build process.

First Run (fails)

XLA_BUILD=true MIX_ENV=prod mix compile
$ XLA_BUILD=true  MIX_ENV=prod mix compile
==> elixir_make
Compiling 1 file (.ex)
Generated elixir_make app
==> xla
Compiling 2 files (.ex)
Generated xla app
mkdir -p /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \
        cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \
        git init && \
        git remote add origin https://github.com/tensorflow/tensorflow.git && \
        git fetch --depth 1 origin 3f878cff5b698b82eea85db2b60d65a2e320850e && \
        git checkout FETCH_HEAD && \
        rm /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelversion
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint:   git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint:   git branch -m <name>
Initialized empty Git repository in /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.git/
From https://github.com/tensorflow/tensorflow
 * branch              3f878cff5b698b82eea85db2b60d65a2e320850e -> FETCH_HEAD
Note: switching to 'FETCH_HEAD'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 3f878cff Merge pull request #54226 from tensorflow-jenkins/version-numbers-2.8.0-22199
rm -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
        ln -s "/workspaces/exla_compile_test/deps/xla/extension" /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/compiler/xla/extension && \
        cd /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e && \
        bazel build --define "framework_shared_object=false" -c opt    //tensorflow/compiler/xla/extension:xla_extension && \
        mkdir -p /root/.cache/xla/0.3.0/cache/build/ && \
        cp -f /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/c3e082762b7664bbc7ffd2c39e86464928e27c0c.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured)
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1596824487 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
  /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/WORKSPACE:23:14: in <toplevel>
  /root/.cache/xla_extension/tf-3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/workspace0.bzl:108:34: in workspace
  /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_toolchains/repositories/repositories.bzl:35:23: in repositories
Repository rule git_repository defined at:
  /root/.cache/bazel/_bazel_root/2be90fa55f2d4383134ffe4aafd91de4/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (146 packages loaded, 4023 targets configured)
INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (188 packages loaded, 16972 targets configured).
INFO: Found 1 target...
[0 / 11] [Prepa] Writing file tensorflow/compiler/xla/extension/xla_extension.args
[74 / 219] Compiling src/google/protobuf/compiler/java/java_message.cc; 4s local ... (8 actions running)
[134 / 219] Compiling src/google/protobuf/descriptor.cc; 13s local ... (8 actions running)
[268 / 670] Compiling mlir/tools/mlir-tblgen/AttrOrTypeDefGen.cpp; 5s local ... (8 actions running)
[383 / 670] Compiling llvm/lib/Support/ItaniumManglingCanonicalizer.cpp; 7s local ... (8 actions running)
[532 / 999] Compiling llvm/lib/Support/SourceMgr.cpp; 2s local ... (8 actions running)
[749 / 999] Compiling mlir/lib/IR/MLIRContext.cpp; 7s local ... (8 actions running)
[1,084 / 1,366] Compiling llvm/lib/Support/Signals.cpp; 1s local ... (8 actions running)
[1,240 / 1,488] Compiling llvm/utils/TableGen/GlobalISelEmitter.cpp; 28s local ... (8 actions running)
[2,217 / 7,107] Compiling tensorflow/core/util/test_log.pb.cc; 7s local ... (8 actions running)
[2,365 / 7,107] Compiling tensorflow/core/framework/variant_op_registry.cc; 8s local ... (8 actions running)
[2,438 / 7,107] Compiling tensorflow/core/util/batch_util.cc; 55s local ... (8 actions running)
[2,578 / 7,107] Compiling tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc; 22s local ... (8 actions, 7 running)
[2,906 / 7,107] Compiling tensorflow/compiler/mlir/xla/transforms/xla_legalize_tf.cc; 37s local ... (7 actions running)
[3,137 / 7,107] Compiling tensorflow/compiler/mlir/xla/transforms/legalize_tf.cc; 81s local ... (8 actions running)
[3,258 / 7,107] Compiling tensorflow/compiler/mlir/xla/transforms/legalize_tf.cc; 218s local ... (8 actions running)
[3,414 / 7,107] Compiling tensorflow/compiler/tf2xla/kernels/categorical_op.cc; 20s local ... (8 actions running)
[3,519 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/tpu_dynamic_layout_pass.cc; 152s local ... (8 actions running)
[3,632 / 7,107] Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp; 113s local ... (8 actions running)
[3,786 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/gpu_fusion.cc; 22s local ... (8 actions running)
[3,903 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/transforms/convert_launch_func_to_tf_call.cc; 36s local ... (8 actions running)
[4,103 / 7,107] Compiling tensorflow/core/kernels/transpose_functor_cpu.cc; 31s local ... (8 actions running)
[4,306 / 7,107] Compiling tensorflow/compiler/xla/service/cpu/runtime_matmul.cc; 42s local ... (8 actions running)
[4,469 / 7,107] Compiling tensorflow/compiler/xla/service/cpu/runtime_matmul.cc; 462s local ... (8 actions running)
[4,685 / 7,107] Compiling tensorflow/core/kernels/resource_variable_ops.cc; 157s local ... (8 actions, 7 running)
[5,185 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc; 213s local ... (8 actions running)
[5,845 / 7,107] Compiling src/cpu/rnn/ref_rnn.cpp; 79s local ... (8 actions running)
[6,502 / 7,107] Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc; 486s local ... (8 actions running)
Target //tensorflow/compiler/xla/extension:xla_extension up-to-date:
  bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz
INFO: Elapsed time: 6320.181s, Critical Path: 784.24s
INFO: 7107 processes: 574 internal, 6533 local.
INFO: Build completed successfully, 7107 total actions
INFO: Build completed successfully, 7107 total actions
==> complex
Compiling 2 files (.ex)
Generated complex app
==> nx
Compiling 24 files (.ex)
Compiling lib/nx/binary_backend.ex (it's taking more than 10s)
Generated nx app
==> exla
Unpacking /root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz into /workspaces/exla_compile_test/deps/exla/cache
g++ -fPIC -I/usr/local/lib/erlang/erts-12.3.2.2/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
Compiling 21 files (.ex)
Generated exla app
==> exla_compile_test
Unchecked dependencies for environment prod:
* xla (Hex package)
  could not find an app file at "_build/prod/lib/xla/ebin/xla.app". This may happen if the dependency was not yet compiled or the dependency indeed has no app file (then you can pass app: false as option)
** (Mix) Can't continue due to errors on dependencies
inotifywait -m -r .
Watches established.
...
./_build/prod/lib/xla/ ACCESS,ISDIR ebin
./_build/prod/lib/xla/ebin/ ACCESS,ISDIR 
./_build/prod/lib/xla/ CLOSE_NOWRITE,CLOSE,ISDIR ebin
./_build/prod/lib/xla/ebin/ CLOSE_NOWRITE,CLOSE,ISDIR 
./_build/prod/lib/xla/ebin/ DELETE Elixir.Mix.Tasks.Xla.Info.beam

./_build/prod/lib/xla/ebin/ DELETE xla.app    <---- HERE

./_build/prod/lib/xla/ebin/ DELETE Elixir.XLA.beam
./_build/prod/lib/xla/ebin/ DELETE_SELF 
./_build/prod/lib/xla/ DELETE,ISDIR ebin
./_build/prod/lib/xla/ OPEN,ISDIR .mix
./_build/prod/lib/xla/.mix/ OPEN,ISDIR 
...
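
A long inotify log like this one can be narrowed down to the relevant event with a filter. The following is a minimal sketch, assuming the output of `inotifywait -m -r .` was saved to a file; the helper name `find_app_deletes` and the temporary log file are hypothetical, and the sample lines are copied from the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every DELETE event
# that targets a .app file in a saved inotifywait log.
find_app_deletes() {
  grep -n -E ' DELETE [^ ]+\.app$' "$1"
}

# Sample log built from lines in the excerpt above (assumed file location).
log=$(mktemp)
cat > "$log" <<'EOF'
./_build/prod/lib/xla/ebin/ DELETE Elixir.Mix.Tasks.Xla.Info.beam
./_build/prod/lib/xla/ebin/ DELETE xla.app
./_build/prod/lib/xla/ebin/ DELETE Elixir.XLA.beam
./_build/prod/lib/xla/ebin/ DELETE_SELF
EOF

find_app_deletes "$log"
# prints: 2:./_build/prod/lib/xla/ebin/ DELETE xla.app
```

The line number in the output makes it easy to pull the surrounding entries out of the full log afterwards.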

Second Run (success)

$ XLA_BUILD=true  MIX_ENV=prod mix compile
==> xla
Compiling 2 files (.ex)
Generated xla app
make: '/root/.cache/xla/0.3.0/cache/build/xla_extension-x86_64-linux-cpu.tar.gz' is up to date.
==> exla_compile_test
Compiling 1 file (.ex)
Generated exla_compile_test app

The inotify logs are very long, so I am not posting them here, but I can attach them somewhere if necessary.
See below.

@josevalim
Collaborator

Awesome @jnnks! Can you please post the 100 entries before and after the DELETE?

@jnnks
Author

jnnks commented Jul 24, 2022

Here are the entire logs :D

1st Run: https://gist.github.com/jnnks/88f2cda21064d0bb109a42ec4b701cb2
DELETE is at line 797

2nd Run: https://gist.github.com/jnnks/ad8a25419b3d84a6cef83b9892a926e3

@josevalim
Collaborator

@jonatanklosko so this is caused by the explicit deps.compile xla alias inside EXLA. Do you remember why it is needed?

@jonatanklosko
Member

jonatanklosko commented Jul 25, 2022

nx/exla/mix.exs

Lines 26 to 29 in 2769f4a

# We want to always trigger XLA compilation when XLA_BUILD is set,
# otherwise its Makefile will run only upon the initial compilation
compilers:
  if(xla_build?(), do: [:xla], else: []) ++ [:exla, :elixir_make] ++ Mix.compilers(),

Without that, xla is compiled once and changing XLA_TARGET has no effect, because the Makefile doesn't run again.

@josevalim
Collaborator

I think we will have to remove the xla_build? check and tell users that setting XLA_BUILD to true requires an explicit call to mix deps.compile xla. Another option is to move to using config :xla, :force_build, true | false, because we can at least encode that with compile_env, which can warn/raise if you change it and you don't recompile. But for now I would go with docs only. WDYT?
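
The explicit workflow proposed here could look roughly like the sketch below. It assumes the same `XLA_BUILD=true MIX_ENV=prod` environment used earlier in this issue; the function name `rebuild_xla` and the overridable `MIX` variable are hypothetical and only serve illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: compile the xla dependency explicitly, then the project.
# MIX defaults to the real mix executable; set MIX=echo for a dry run that
# prints the arguments instead of running a build.
MIX="${MIX:-mix}"

rebuild_xla() {
  # Step 1: explicitly (re)compile the xla dependency so its Makefile runs.
  XLA_BUILD=true MIX_ENV=prod $MIX deps.compile xla &&
    # Step 2: compile the rest of the project as usual.
    XLA_BUILD=true MIX_ENV=prod $MIX compile
}

# Usage (not executed here, since a full XLA build can take hours):
#   rebuild_xla
```

Keeping the two steps separate means the dependency build is triggered on purpose rather than as a side effect of `mix compile`.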

@jonatanklosko
Member

jonatanklosko commented Jul 25, 2022

The config would only handle XLA_BUILD changing, but what if XLA_TARGET changes?

Updating the docs sounds good, though this change may cause some confusion for people relying on XLA_BUILD already.

@josevalim
Collaborator

The issue is only with mix deps.compile xla, and we only call it when XLA_BUILD is set. I will send a PR to make sure we are on the same page. :)
