Umpire Segfaults during initialization of DEVICE_CONST #42

Closed
robinson96 opened this issue Jan 16, 2019 · 2 comments

@robinson96
Collaborator

Describe the bug

Umpire segfaults while creating the DEVICE_CONST allocator.

To Reproduce

I am using CHAI + Umpire in a large multiphysics code. I have not yet attempted to reproduce this in a smaller executable. The problem occurs during initialization of Umpire.

This is on a P8+ P100 system.

Expected behavior

Don't segfault.

Compilers & Libraries (please complete the following information):

  • Compiler & version:
    rzmanta23{probinso}95: /usr/tce/packages/spectrum-mpi/spectrum-mpi-2018.05.18-clang-coral-2018.04.17/bin/mpiclang++ --version
    clang version 3.8.0 (ibmgithub:/CORAL-LLVM-Compilers/clang.git c4747093b1b58b63a096b78ddcd716c7bd7e9c2c) (ibmgithub:/CORAL-LLVM-Compilers/llvm.git aa08e5a3c3670cd86fb4bee034a7626bb26ad57e)
    Target: powerpc64le-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /usr/tce/packages/clang/clang-coral-2018.04.17/ibm/bin

  • CUDA version (if applicable): 9.2.88

Additional context
Umpire version:
f92f367 Merge pull request #39 from LLNL/feature/coalesce-only-when-coalesceable

Stack Trace:

#0 std::operator<< <char, std::char_traits<char>, std::allocator<char> > (__os=..., __str=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/basic_string.h:2777
#1 umpire::util::Logger::logMessage (this=<optimized out>, level=<optimized out>, message=..., fileName=..., line=38) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/util/Logger.cpp:61
#2 0x000000001560bea8 in umpire::resource::CudaConstantMemoryResource::CudaConstantMemoryResource (this=0x4aa1ddd0, name=..., id=<optimized out>, traits=...) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/CudaConstantMemoryResource.cu:38
#3 0x000000001560bb6c in __gnu_cxx::new_allocator<umpire::resource::CudaConstantMemoryResource>::construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=<optimized out>, __p=0x4aa1ddd0, __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/ext/new_allocator.h:120
#4 0x000000001560b8cc in std::allocator_traits<std::allocator<umpire::resource::CudaConstantMemoryResource> >::_S_construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__p=<optimized out>, __args=..., __args=..., __args=..., __a=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:253
#5 std::allocator_traits<std::allocator<umpire::resource::CudaConstantMemoryResource> >::construct<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__p=<optimized out>, __a=..., __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:399
#6 std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=<optimized out>, __a=..., __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:515
#7 __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2> >::construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>, std::allocator<umpire::resource::CudaConstantMemoryResource> const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&>(std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>*, std::allocator<umpire::resource::CudaConstantMemoryResource> const&&, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&) (this=<optimized out>, __p=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/ext/new_allocator.h:120
#8 std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2> > >::_S_construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>, std::allocator<umpire::resource::CudaConstantMemoryResource> const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __p=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:253
#9 std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2> > >::construct<std::_Sp_counted_ptr_inplace<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, (__gnu_cxx::_Lock_policy)2>, std::allocator<umpire::resource::CudaConstantMemoryResource> const, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __p=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/alloc_traits.h:399
#10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (this=0x3fffffffb6e8, __a=..., __args=..., __args=..., __args=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:619
#11 0x000000001560b690 in std::__shared_ptr<umpire::resource::CudaConstantMemoryResource, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __args=..., __args=..., __args=..., this=<optimized out>, __tag=...) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr_base.h:1089
#12 std::shared_ptr<umpire::resource::CudaConstantMemoryResource>::shared_ptr<std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __args=<optimized out>, __args=<optimized out>, this=<optimized out>, __tag=..., __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:316
#13 std::allocate_shared<umpire::resource::CudaConstantMemoryResource, std::allocator<umpire::resource::CudaConstantMemoryResource>, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__a=..., __args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:587
#14 std::make_shared<umpire::resource::CudaConstantMemoryResource, char const (&) [13], int&, umpire::resource::MemoryResourceTraits&> (__args=<optimized out>, __args=<optimized out>, __args=<optimized out>) at /usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3/../../../../include/c++/4.9.3/bits/shared_ptr.h:603
#15 umpire::resource::CudaConstantMemoryResourceFactory::create (this=<optimized out>, id=4) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/CudaConstantMemoryResourceFactory.cpp:48
#16 0x000000001560acb0 in umpire::resource::MemoryResourceRegistry::makeMemoryResource (this=0x4a9f5da0, name=..., id=<optimized out>) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/resource/MemoryResourceRegistry.cpp:50
#17 0x00000000155f44dc in umpire::ResourceManager::initialize (this=0x4a9e5ab0) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:122
#18 0x00000000155f2364 in umpire::ResourceManager::ResourceManager (this=0x4a9e5ab0) at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:96
#19 0x00000000155f0fe4 in umpire::ResourceManager::getInstance () at /g/g18/probinso/ale3d/bugfixday/imports/umpire/src/umpire/ResourceManager.cpp:50
#20 0x00000000155ebf10 in chai::ArrayManager::ArrayManager (this=0x4a9e59e0) at /g/g18/probinso/ale3d/bugfixday/imports/chai/src/chai/ArrayManager.cpp:66
#21 0x00000000155ebe0c in chai::ArrayManager::getInstance () at /g/g18/probinso/ale3d/bugfixday/imports/chai/src/chai/ArrayManager.cpp:58
#22 0x0000000010787840 in chai::ManagedArray<globalID>::ManagedArray (this=0x46fbff50 <nodemap>)

davidbeckingsale self-assigned this Jan 23, 2019
@davidbeckingsale
Member

This was due to running on a node without a GPU.

We will add a better error message to catch and prevent this in the future.
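
For readers hitting the same crash, here is a minimal sketch of what such a guard might look like. It is not Umpire's actual code: `DeviceCountProbe`, `gpuUnavailableReason`, and `requireGpu` are hypothetical names invented for illustration. The device query is passed in as a callable so the sketch compiles without CUDA; in a CUDA build the probe would presumably wrap `cudaGetDeviceCount` and return a negative value if that call reports an error.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical probe type: returns the number of visible GPUs, or a negative
// value if the query itself failed. In a CUDA build this would wrap
// cudaGetDeviceCount(); it is injected here so the sketch needs no CUDA headers.
using DeviceCountProbe = std::function<int()>;

// Hypothetical helper (not part of Umpire): returns an empty string when at
// least one GPU is visible, otherwise a human-readable reason.
inline std::string gpuUnavailableReason(const DeviceCountProbe& probe,
                                        const std::string& resource_name)
{
  const int count = probe();
  if (count < 0) {
    return "Cannot create " + resource_name + ": querying GPU devices failed";
  }
  if (count == 0) {
    return "Cannot create " + resource_name + ": no GPU detected on this node";
  }
  return std::string();
}

// Hypothetical guard: call before constructing a device-backed resource so the
// failure surfaces as a clear exception rather than a crash later on.
inline void requireGpu(const DeviceCountProbe& probe,
                       const std::string& resource_name)
{
  const std::string reason = gpuUnavailableReason(probe, resource_name);
  if (!reason.empty()) {
    throw std::runtime_error(reason);
  }
}
```

Under these assumptions, a factory for a device resource could call `requireGpu(probe, "DEVICE_CONST")` as its first step, so a run on a GPU-less node stops with a readable message.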

@davidbeckingsale
Member

Improved error message added in #44
