Compile issues #3

Closed
adbeggs opened this issue May 20, 2022 · 39 comments
@adbeggs

adbeggs commented May 20, 2022

I'm trying to compile on the P24 (I have also tried our A100 HPC) and am getting the following:

-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libsz.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found suitable version "1.10.4", minimum required is "1.8.16")
CMake Error at /usr/local/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find zstd (missing: ZSTD_LIBRARY ZSTD_INCLUDE_DIR) (Required is
  at least version "1.3.1")
Call Stack (most recent call first):
  /usr/local/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  dorado/3rdparty/hdf_plugins/cmake/Findzstd.cmake:40 (find_package_handle_standard_args)
  dorado/3rdparty/hdf_plugins/CMakeLists.txt:146 (find_package)

Any ideas? Initially it was complaining that it couldn't find HDF5 at all, so I ran sudo apt install libhdf5-dev.

Thanks

Andrew

@adbeggs
Author

adbeggs commented May 20, 2022

Actually, scratch that: I solved it with sudo apt install libzstd-dev. There are some undocumented dependencies that need adding to this repo's documentation:
cmake
libhdf5-dev
libzstd-dev

I'll add more as I get through the process
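For anyone wanting to check these dependencies up front, here is a minimal sketch, assuming a dpkg-based system such as Ubuntu. The function name `missing_packages` is hypothetical and not part of dorado. A cmake installed outside apt (for example under /usr/local) would be reported as missing even though it works.

```shell
# Print each package from the argument list that dpkg does not report as
# fully installed. Assumes a Debian/Ubuntu system; if dpkg-query is not
# available, every package is reported as missing.
missing_packages() {
    for pkg in "$@"; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
        case "$status" in
            *"install ok installed"*) ;;
            *) printf '%s\n' "$pkg" ;;
        esac
    done
}

# Usage (package names taken from the list above):
#   missing_packages cmake libhdf5-dev libzstd-dev
```

Anything it prints can then be passed to sudo apt install.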

@mcrone

mcrone commented May 20, 2022

I get this when running this command: cmake --build cmake-build --config Release -- -j

/home/xxx/Documents/dorado/dorado/decode/GPUDecoder.cpp:7:14: fatal error: lib.h: No such file or directory
    7 |     #include "lib.h"

lib.h doesn't seem to exist anywhere in the repository or code, other than in GPUDecoder.cpp.

@vellamike
Collaborator

@adbeggs Thanks for flagging this - I documented the dependencies now.

@mcrone we are looking into your issue.

vellamike self-assigned this May 20, 2022

@mcrone

mcrone commented May 20, 2022

@vellamike I should provide more details. I’m compiling on a new Nvidia Jetson Orin development kit.

@MarkBicknellONT
Collaborator

> I get this when running this command: cmake --build cmake-build --config Release -- -j
>
> /home/xxx/Documents/dorado/dorado/decode/GPUDecoder.cpp:7:14: fatal error: lib.h: No such file or directory
>     7 |     #include "lib.h"
>
> lib.h doesn't seem to exist anywhere in the repository or code, other than in GPUDecoder.cpp.

Hi - this header is part of the koi pre-built library which should be downloaded as part of the cmake configuration process. Please could you attach the output from the cmake -S . -B cmake-build step? Thanks

@MarkBicknellONT
Collaborator

> @vellamike I should provide more details. I’m compiling on a new Nvidia Jetson Orin development kit.

Ah, this will explain the problem. Arm Linux platforms are not currently supported.
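A build script could fail early on such machines rather than partway through compilation. The following is only a sketch: the function name `is_supported_arch` is hypothetical, and treating x86_64 as the only accepted value is an assumption based on this thread, where Arm Linux (reported by uname -m as aarch64) was unsupported at the time.

```shell
# Succeed only when the given machine name is x86_64.
# Assumption: this reflects the state of Linux support described in this
# thread; it is not an official support matrix.
is_supported_arch() {
    case "$1" in
        x86_64) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   is_supported_arch "$(uname -m)" || echo "unsupported architecture: $(uname -m)" >&2
```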

@adbeggs
Author

adbeggs commented May 20, 2022

Okay next issue fighting with CUDA (that's fixed now), but now I am getting:

-- Extracting pod5-0.0.14-Linux - done
CMake Error at CMakeLists.txt:79 (add_subdirectory):
  The source directory

    /home/beggsa/dorado/dorado/3rdparty/hdf_plugins

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:183 (add_hdf_vbz_plugin)

@adbeggs
Author

adbeggs commented May 20, 2022

To clarify: another dependency is nvcc 11. If you install CUDA from the Ubuntu repository, you get v10.
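The installed nvcc version can be checked before configuring. This is a minimal sketch that parses the "release X.Y" text printed by nvcc --version. The function names are hypothetical, and the threshold of 11 is taken from this comment rather than from any official requirement.

```shell
# Read `nvcc --version` output on stdin and print the CUDA major version,
# taken from the "release X.Y" text in that output.
cuda_major() {
    sed -n 's/.*release \([0-9][0-9]*\)\.[0-9].*/\1/p'
}

# Succeed if the major version read from stdin is at least 11.
cuda_is_new_enough() {
    major=$(cuda_major)
    [ -n "$major" ] && [ "$major" -ge 11 ]
}

# Usage:
#   nvcc --version | cuda_is_new_enough || echo "nvcc 11 or newer needed" >&2
```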

@MarkBicknellONT
Collaborator

> Okay next issue fighting with CUDA (that's fixed now), but now I am getting:
>
> -- Extracting pod5-0.0.14-Linux - done
> CMake Error at CMakeLists.txt:79 (add_subdirectory):
>   The source directory
>
>     /home/beggsa/dorado/dorado/3rdparty/hdf_plugins
>
>   does not contain a CMakeLists.txt file.
> Call Stack (most recent call first):
>   CMakeLists.txt:183 (add_hdf_vbz_plugin)

The dorado/3rdparty/hdf_plugins folder is a submodule which should have been populated during the cmake -S . -B cmake-build step. Can you send us the output from that step please? Is there currently anything in that folder?
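One quick way to answer that question for several folders at once is sketched below. The function name `dirs_missing_cmakelists` is hypothetical; it only tests for the CMakeLists.txt file that the add_subdirectory error complains about.

```shell
# Print every directory argument that does not contain a CMakeLists.txt,
# which is the condition the add_subdirectory error above reports.
dirs_missing_cmakelists() {
    for dir in "$@"; do
        [ -f "$dir/CMakeLists.txt" ] || printf '%s\n' "$dir"
    done
}

# Usage, from the repository root (path taken from the error above):
#   dirs_missing_cmakelists dorado/3rdparty/hdf_plugins
```

No output means every listed folder has been populated.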

@adbeggs
Author

adbeggs commented May 20, 2022

No, there's nothing in that folder:

(base) beggsa@PC24B069:~/dorado$ cmake -S . -B cmake-build
-- Submodule update
-- CUDA toolkit dir is /usr/local/cuda
-- Found pod5-0.0.14-Linux
CMake Error at CMakeLists.txt:79 (add_subdirectory):
  The source directory

    /home/beggsa/dorado/dorado/3rdparty/hdf_plugins

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:183 (add_hdf_vbz_plugin)


-- Found torch-1.10.2-Linux
-- Caffe2: CUDA detected: 11.7
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 11.7
-- Found cuDNN: v8..  (include: /home/beggsa/dorado/dorado/3rdparty/fake_cudnn, library: /home/beggsa/dorado/dorado/3rdparty/fake_cudnn)
-- /usr/local/cuda/lib64/libnvrtc.so shorthash is d833c4f3
-- Autodetected CUDA architecture(s):  7.0 7.0
-- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70
CMake Error at CMakeLists.txt:242 (add_subdirectory):
  The source directory

    /home/beggsa/dorado/dorado/3rdparty/elzip

  does not contain a CMakeLists.txt file.


-- Found koi_lib
Building Koi from prebuilt /home/beggsa/dorado/dorado/3rdparty/koi_lib
-- Configuring incomplete, errors occurred!
See also "/home/beggsa/dorado/cmake-build/CMakeFiles/CMakeOutput.log".
See also "/home/beggsa/dorado/cmake-build/CMakeFiles/CMakeError.log".
[cmakeerror.txt](https://github.com/nanoporetech/dorado/files/8738754/cmakeerror.txt)
[cmakelog.txt](https://github.com/nanoporetech/dorado/files/8738755/cmakelog.txt)

@MarkBicknellONT
Collaborator

It looks like none of the submodules are populated for some reason. It also looks like this is reconfiguring into an existing cmake-build folder. Could you try removing the build artefact folder and then re-running:

rm -rf cmake-build
cmake -S . -B cmake-build

@adbeggs
Author

adbeggs commented May 20, 2022

Nope, same problem occurs...

@MarkBicknellONT
Collaborator

MarkBicknellONT commented May 20, 2022

How about if you try to update the submodules manually:
git submodule update --init --recursive
What output do you get?
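To see which submodules are still unpopulated before or after that command, the output of git submodule status can be filtered. git prefixes the line of a submodule that has not been initialised with "-". The sketch below relies on that; the function name `uninitialised_submodules` is hypothetical, and paths containing spaces are not handled.

```shell
# Read `git submodule status --recursive` output on stdin and print the path
# of every submodule whose line starts with "-", git's marker for a
# submodule that has not been initialised.
uninitialised_submodules() {
    awk '/^-/ { print $2 }'
}

# Usage, from the repository root:
#   git submodule status --recursive | uninitialised_submodules
```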

@adbeggs
Author

adbeggs commented May 20, 2022

Before I got your comment I rm -rf'd the directory and started again and it compiled no probs. Will try running it now! Thanks for your help.

@adbeggs
Author

adbeggs commented May 20, 2022

One final question: are there R10.4.1 and E8.2 models available for Dorado? Or do I just use the R10.4 / E8.1 models?

@MarkBicknellONT
Collaborator

There are no E8.2 models available at the moment, but we will be adding more run conditions soon. You can use dorado download --list to see the currently-available models.

@adbeggs
Author

adbeggs commented May 20, 2022

Success - it's basecalling off the R10.4.1 data we have produced... I will report back

@adbeggs
Author

adbeggs commented May 20, 2022

First observation - blimey it's fast! Even in SAC mode

@iiSeymour
Member

@adbeggs Great to hear! I will add the kit 14 models for you tomorrow.

@mcrone

mcrone commented May 20, 2022

@MarkBicknellONT any plans for the Linux Arm libraries? 🤞🏻Happy to work with you to get it up and running.

@iiSeymour
Member

iiSeymour commented May 20, 2022

@mcrone we have some Orin devkits and will take a look for you next week.

@iiSeymour
Member

@adbeggs I've added the latest kit 14 models in 193b134.

@vdejager

I'm getting some errors while compiling (solved a lot of others btw, by loading the required packages on our HPC)

Building Koi from prebuilt /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/koi_lib
-- Configuring done
WARNING: Target "dorado_lib" requests linking to directory "/projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/fake_cudnn". Targets may link only to libraries. CMake is dropping the item.
WARNING: Target "dorado" requests linking to directory "/projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/fake_cudnn". Targets may link only to libraries. CMake is dropping the item.
CMake Warning at CMakeLists.txt:327 (add_executable):
Cannot generate a safe runtime search path for target dorado because files
in some directories may conflict with libraries in implicit directories:

runtime library [libcufft.so.10] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib64 may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs
runtime library [libcurand.so.10] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib64 may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs
runtime library [libcublas.so.11] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib64 may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs
runtime library [libnvrtc.so.11.2] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs

Some of these libraries may not be found correctly.

WARNING: Target "dorado_tests" requests linking to directory "/projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/fake_cudnn". Targets may link only to libraries. CMake is dropping the item.
CMake Warning at tests/CMakeLists.txt:7 (add_executable):
Cannot generate a safe runtime search path for target dorado_tests because
files in some directories may conflict with libraries in implicit
directories:

runtime library [libcufft.so.10] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib64 may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs
runtime library [libcurand.so.10] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib64 may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs
runtime library [libcublas.so.11] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib64 may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs
runtime library [libnvrtc.so.11.2] in /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib may be hidden by files in:
  /sw/arch/Centos8/EB_production/2021/software/CUDA/11.3.1/lib/stubs

Some of these libraries may not be found correctly.

-- Generating done
-- Build files have been written to: /projects/0/lwc2020006/software/nanoporetech/dorado/cmake-build

Any idea whether this affects the final build?

@vellamike
Collaborator

@vdejager Can you give details of the system (OS & architecture) you are building on?

It looks like you are getting warnings but no errors? I believe your final build will be OK, though we do need to resolve these warnings.
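One way to check whether the warnings matter for a finished build is to look at what the dynamic loader actually resolves. The sketch below filters ldd output for libraries that are missing or that resolve into a stubs directory like the ones named in the warnings. The function name `suspect_libs` and the binary path in the usage line are assumptions, not taken from the dorado build.

```shell
# Read `ldd` output on stdin and print each library that is either reported
# as "not found" or resolved from a path containing a /stubs/ directory.
suspect_libs() {
    awk '
        /=> not found/    { print $1 ": not found"; next }
        $3 ~ /\/stubs\//  { print $1 ": " $3 }
    '
}

# Usage (binary location assumed):
#   ldd cmake-build/bin/dorado | suspect_libs
```

No output suggests the runtime libraries are being picked up from their real locations.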

@vdejager

vdejager commented May 23, 2022

Hi Mike, I'm building on our HPC system with A100s available for GPU, which we also use with Bonito.
Red Hat Enterprise Linux release 8.4 (Ootpa) on AMD EPYCs
(although the modules say it's CentOS 8)

vendor_id	: AuthenticAMD
cpu family	: 23
model		: 49
model name	: AMD EPYC 7F32 8-Core Processor
stepping	: 0
microcode	: 0x830104d
cpu MHz		: 1795.399
cache size	: 512 KB
physical id	: 1
siblings	: 16
core id		: 28
cpu cores	: 8
apicid		: 185
initial apicid	: 185
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 7352.32
TLB size	: 3072 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

Currently I have to load the following modules (as expected) and run the following commands to get a binary in the cmake-build/bin directory:

module load 2021
module load CMake/3.20.1-GCCcore-10.3.0
module load zstd/1.4.9-GCCcore-10.3.0
module load HDF5/1.10.7-iimpi-2021a
module load CUDA/11.3.1
module load OpenSSL/1.1

This does not work (paths taken from module show OpenSSL/1.1):

#export OPENSSL_CRYPTO_LIBRARY=/sw/arch/Centos8/EB_production/2021/software/OpenSSL/1.1/lib64
#export OPENSSL_ROOT_DIR=/sw/arch/Centos8/EB_production/2021/software/OpenSSL/1.1

The original configure command:

#cmake -S . -B cmake-build

Setting the libraries on the command line works:

cmake -S . -B cmake-build -DOPENSSL_INCLUDE_DIR=/sw/arch/Centos8/EB_production/2021/software/OpenSSL/1.1/include/openssl -DOPENSSL_SSL_LIBRARY=/sw/arch/Centos8/EB_production/2021/software/OpenSSL/1.1/lib64/libssl.so -DOPENSSL_CRYPTO_LIBRARY=/sw/arch/Centos8/EB_production/2021/software/OpenSSL/1.1/lib64/libcrypto.so

cmake --build cmake-build --config Release -- -j
ctest --test-dir cmake-build

cmake --build produces a binary, but ctest just returns the prompt
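For other module-based systems, the three OpenSSL hints can be generated from a single install prefix. This is only a sketch: the function name `openssl_cmake_flags` is hypothetical, and the include/openssl and lib64 layout is copied from the command reported as working above, so it may not match other OpenSSL installations.

```shell
# Print the three -D options used above, built from one OpenSSL prefix.
# The sub-paths mirror the working command in this comment and are an
# assumption about the install layout.
openssl_cmake_flags() {
    prefix=$1
    printf '%s\n' \
        "-DOPENSSL_INCLUDE_DIR=$prefix/include/openssl" \
        "-DOPENSSL_SSL_LIBRARY=$prefix/lib64/libssl.so" \
        "-DOPENSSL_CRYPTO_LIBRARY=$prefix/lib64/libcrypto.so"
}

# Usage (the unquoted substitution breaks if the prefix contains spaces):
#   cmake -S . -B cmake-build $(openssl_cmake_flags /path/to/OpenSSL/1.1)
```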

@gringer

gringer commented May 23, 2022

If CUDA is strictly needed, it should be mentioned in the dependencies in README.md. If not, then there's another compile issue:

CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) 
CMake Warning at dorado/3rdparty/torch-1.10.2-Linux/libtorch/share/cmake/Caffe2/public/cuda.cmake:31 (message):
  Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
  or a Caffe2 dependent library, the next warning / error will give you more
  info.
Call Stack (most recent call first):
  dorado/3rdparty/torch-1.10.2-Linux/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
  dorado/3rdparty/torch-1.10.2-Linux/libtorch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:195 (find_package)

@vellamike
Collaborator

@vdejager we haven't yet tested on CentOS - we are going to do this and update the README.

Does cmake --build produce the dorado binary? Does it work?

It is odd that ctest just returns the prompt - does it return nothing at all or report that there are no tests to run?
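A way to tell those two cases apart is to ask ctest to list tests without running them (the -N option) and read the "Total Tests: N" summary line. A minimal sketch follows; the function name `total_tests` is hypothetical.

```shell
# Read `ctest -N` output on stdin and print the number from its
# "Total Tests: N" summary line. Prints nothing if no such line exists.
total_tests() {
    sed -n 's/^Total Tests: \([0-9][0-9]*\).*/\1/p'
}

# Usage (--test-dir needs CMake 3.20 or newer):
#   ctest --test-dir cmake-build -N | total_tests
```

A result of 0 would mean no tests are registered in that build directory, while empty output would mean ctest printed no summary at all.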

@vellamike
Collaborator

@gringer the CUDA dependency is already noted in the DEV.md file, but we need to make this clearer in the README.md.

@gringer

gringer commented Jun 2, 2022

This is compiling for me now, so some of the changes over the past week must have fixed whatever the problem was.

@iiSeymour
Member

I think we've sorted all of the issues here now 🎉

Kirk3gaard mentioned this issue May 23, 2023
tijyojwad pushed a commit that referenced this issue Nov 14, 2023
The following crash was seen on CI when compiling with GCC 8.4.0 and
with ASAN enabled:
  ==11805==ERROR: AddressSanitizer: SEGV on unknown address 0x001200000000 (pc 0xffff68c2296c bp 0xffffc1c19400 sp 0xffffc1c19400 T0)
  ==11805==The signal is caused by a WRITE memory access.
    #0 0xffff68c2296b  (/lib/aarch64-linux-gnu/libc.so.6+0x8396b)
    #1 0xffff876bedc3  (/usr/lib/aarch64-linux-gnu/libasan.so.5+0x36dc3)
    #2 0xaaaad510039b in _GLOBAL__sub_I_00099_1__ZN6dorado8splitter7subreadERKNS_11SimplexReadESt8optionalISt4pairImmEES6_ (dorado_tests+0x17639b)
    #3 0xaaaad5c9ea1f in __libc_csu_init (dorado_tests+0xd14a1f)

The static initialiser that calls into libasan is one of ASANs inserted
init functions (__asan_init, __asan_version_mismatch_check_v8, or
__asan_register_globals) though gdb refuses to ptrace so I can't tell
which. None of them should crash so this feels like a compiler bug,
though I'm unable to determine exactly what's triggering it since a
disassembly of the static initialiser doesn't change between the broken
and "fixed" versions of the code (plt addresses notwithstanding).
krobik26 mentioned this issue Jan 19, 2024
8 participants