Compilation of custom operations failing on TF 2.15/CUDA 12 #1523

Open
Icemole opened this issue May 31, 2024 · 5 comments

Icemole (Collaborator) commented May 31, 2024

Hi, the compilation of NativeLstm2.cc is failing with TF 2.15/CUDA 12, while it did not fail with TF 2.13/CUDA 11. A colleague of mine is having similar issues when compiling GetCtcFsaFastBwOp.cc.

The compiler throws many errors, but most of them are rather "silly", for example:

  1. error: expected a ";"
  2. error: function "Ndarray_get_n_total_elements" has already been defined
  3. error: name followed by "::" must be a class or namespace name

This leads me to think that nvcc might be doing something odd here, and as a consequence that the operations don't work with CUDA 12 as they are. I was also told that TF might play a role, so I also posted the TF versions. Could there be a redundant file? Maybe incompatible CUDA versions?

nvcc version where the compilation works:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

nvcc version where the compilation doesn't work:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Let me know if I can provide any further details. Thanks in advance!

albertz (Member) commented May 31, 2024

Also related is #1513.

Can you post the full output?

albertz (Member) commented May 31, 2024

  1. Can you try to run on CPU only (export DISABLE_CUDA=1)?
  2. Can you try to run test_TFNativeOp.py?

Icemole (Collaborator, Author) commented Jun 4, 2024

Please find here the full output of the compilation on CUDA.

Answering your questions:

  1. A non-CUDA, CPU-only environment works!
  2. python3 -m pytest test_TFNativeOp.py also works, but I'm not sure whether I'm running the test with CUDA enabled (if that makes any difference for the test). There are also some skipped tests as well as some warnings. I'm running these on a machine that has GPUs available. Please see the results below.
test_TFNativeOp.py ......................................................sssssss [100%]

============================= warnings summary =============================
../../../../../../../../../../../.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/plugins/manager.py:418
  /home/nbeneitez/.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/plugins/manager.py:418: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../../../../../../../../../../../.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/importer.py:12
  /home/nbeneitez/.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/importer.py:12: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
    from imp import find_module, load_module, acquire_lock, release_lock

../../../../../../../../../../../.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/numpy/__config__.py:155
  /home/nbeneitez/.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
    warnings.warn("Install `pyyaml` for better output", stacklevel=1)

tests/test_TFNativeOp.py::test_py_viterbi
  /home/nbeneitez/work/returnn/native_op_issue/work/i6_core/tools/git/CloneGitRepositoryJob.nH5B7CKRCU89/output/repository/tests/test_TFNativeOp.py:2224: RuntimeWarning: divide by zero encountered in log
    am_scores = numpy.log(am_scores)  # in +log space

tests/test_TFNativeOp.py::test_fast_viterbi
  /home/nbeneitez/work/returnn/native_op_issue/work/i6_core/tools/git/CloneGitRepositoryJob.nH5B7CKRCU89/output/repository/tests/test_TFNativeOp.py:2277: RuntimeWarning: divide by zero encountered in log
    am_scores = numpy.log(am_scores)  # in +log space

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================== 54 passed, 7 skipped, 5 warnings in 193.13s (0:03:13) ===================

Should test_TFNativeOp.py fail for me? As I said, I might be doing something wrong.

albertz (Member) commented Jun 4, 2024

> python3 -m pytest test_TFNativeOp.py also works

I assume you tested that with export DISABLE_CUDA=1, i.e. only for CPU? Can you also try with CUDA?

albertz (Member) commented Jun 4, 2024

Note: the main error is "name followed by "::" must be a class or namespace name", on perftools:

/home/nbeneitez/work/returnn/native_op_issue/work/i6_core/tools/git/CloneGitRepositoryJob.nH5B7CKRCU89/output/repository/returnn/native_op.cpp(240): error: name followed by "::" must be a class or namespace name
  perftools::gputools::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory) {
  ^

I guess they moved/renamed that. I see in other TF code that it is se::DeviceMemory<T> (or maybe tensorflow::se::DeviceMemory<T> or stream_executor::DeviceMemory<T> or so) now.
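
For concreteness, here is a minimal sketch of what the updated helper could look like, modeled on the AsDeviceMemory pattern found elsewhere in TF code; the exact namespace spelling (se vs. stream_executor) and header layout in TF 2.15 is an assumption to be verified against the installed headers:

// Untested sketch: the same helper spelled with the current namespace.
// Assumption: TF 2.15 exposes stream_executor::DeviceMemory.
namespace se = stream_executor;

template <typename T>
se::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory) {
  // Wrap the raw device pointer in an untyped DeviceMemoryBase first,
  // then construct the typed DeviceMemory<T> view from it.
  se::DeviceMemoryBase wrapped(const_cast<T*>(cuda_memory));
  se::DeviceMemory<T> typed(wrapped);
  return typed;
}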

Similarly, for our static perftools::gputools::blas::Transpose get_transpose, I think it is stream_executor::blas::Transpose or so now.
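
An alternative that avoids touching every call site would be a compatibility alias that maps the old namespace onto the new one; this is only a sketch, assuming the types themselves are still source-compatible and only the namespace moved:

// Compatibility alias: make the old perftools::gputools spelling resolve
// to the current stream_executor namespace.
// Assumption: TF 2.15 still exposes the same types under stream_executor.
namespace perftools {
namespace gputools = ::stream_executor;
}  // namespace perftools

With that in place, both perftools::gputools::DeviceMemory<T> and perftools::gputools::blas::Transpose would resolve again without rewriting the rest of the file.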
