Kokkos Threads Backend impl_shared_alloc Broken on Intel 16.1 (Shepard Haswell) #186

Closed
nmhamster opened this issue Feb 9, 2016 · 6 comments
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)

Comments

@nmhamster
Contributor

Running the standard unit tests on the develop branch on Shepard with the Intel 16.1 compiler.

$ ./KokkosCore_UnitTest_Threads
[==========] Running 44 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 44 tests from threads
Kokkos::Threads KOKKOS_HAVE_PTHREAD threads[2] threads_per_numa[2] threads_per_core[1] ReduceScratch[0] SharedScratch[0]
 Thread[ 1 : 0.0 ] Fan{ [ 0 : 0.0 ] } is_process
 Thread[ 0 : 0.0 ] Fan{ }
[ RUN      ] threads.init
[       OK ] threads.init (0 ms)
[ RUN      ] threads.dispatch
[       OK ] threads.dispatch (23 ms)
[ RUN      ] threads.impl_shared_alloc
Segmentation fault

Backtrace Information:

Program received signal SIGSEGV, Segmentation fault.
0x00000000005f469f in Kokkos::Experimental::Impl::SharedAllocationRecord<void, void>::is_sane(Kokkos::Experimental::Impl::SharedAllocationRecord<void, void>*) ()
(gdb) bt
#0  0x00000000005f469f in Kokkos::Experimental::Impl::SharedAllocationRecord<void, void>::is_sane(Kokkos::Experimental::Impl::SharedAllocationRecord<void, void>*) ()
#1  0x0000000000483e64 in void Test::test_shared_alloc<Kokkos::HostSpace, Kokkos::Threads>() ()
#2  0x00000000005dd77d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#3  0x0000000000578899 in testing::Test::Run() ()
#4  0x0000000000576e16 in testing::TestInfo::Run() ()
#5  0x0000000000575db8 in testing::TestCase::Run() ()
#6  0x000000000057484a in testing::internal::UnitTestImpl::RunAllTests() ()
#7  0x00000000005694b6 in testing::UnitTest::Run() ()
#8  0x000000000055eeba in main ()
@crtrott
Member

crtrott commented Feb 9, 2016

Looking into it

crtrott added a commit that referenced this issue Feb 10, 2016
This is the first step in addressing issue #186 by fixing a segfault
error in the error reporting mechanism. I believe that the test itself
is not supposed to enter the error reporting mechanism, so something else
probably went wrong before.
@crtrott
Member

crtrott commented Feb 11, 2016

I fixed the actual crash, but the underlying reason for the crash is unexpected (incorrect) behaviour. So the test still fails.

@crtrott crtrott added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Mar 15, 2016
@crtrott
Member

crtrott commented Mar 15, 2016

Tracked this down to the static initialization of the root SharedAllocationRecord.
After this constructor runs, m_root, m_prev and m_next do not all end up holding the same value as this.

  constexpr SharedAllocationRecord()
    : m_alloc_ptr( 0 )
    , m_alloc_size( 0 )
    , m_dealloc( 0 )
    , m_root( this )
    , m_prev( this )
    , m_next( this )
    , m_count( 0 )
    {}

Possible workaround: do a dynamic initialization in the HostSpace constructors, guarded by a static variable so that it is only done once. For example, in the HostSpace default constructor:

  static int check = 0;
  Experimental::Impl::SharedAllocationRecord< void , void >* r = &Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void >::s_root_record;
  if(check==0) {
    r->m_root=r;
    r->m_next=r;
    r->m_prev=r;
    check=1;
  }
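
The one-time repair described above can be sketched in isolation. This is only an illustration under stated assumptions, not the Kokkos code: `Record`, `s_root`, `ensure_root_linked` and `Space` are hypothetical stand-ins, and a function-local static (initialized once, thread-safely, since C++11) is swapped in for the `static int check` guard.

```cpp
// Hypothetical stand-in for SharedAllocationRecord<void, void>; not the Kokkos type.
struct Record {
  Record* m_root = nullptr;
  Record* m_prev = nullptr;
  Record* m_next = nullptr;
};

// Hypothetical stand-in for the static root record (s_root_record in the thread).
static Record s_root;

// Link the root record to itself exactly once. The lambda runs the first time
// this function is called; later calls skip it.
inline Record* ensure_root_linked() {
  static const bool linked = [] {
    s_root.m_root = &s_root;
    s_root.m_prev = &s_root;
    s_root.m_next = &s_root;
    return true;
  }();
  (void)linked;  // silence unused-variable warnings
  return &s_root;
}

// Hypothetical stand-in for the HostSpace default constructor.
struct Space {
  Space() { ensure_root_linked(); }
};
```

Constructing any number of `Space` objects leaves `s_root` linked to itself, regardless of what the compiler did with the record's own constructor.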

@crtrott
Member

crtrott commented Mar 15, 2016

For the record: this is a SHA with the issue: 261a546

hcedwar added a commit that referenced this issue Mar 15, 2016
…cord

confused the Intel/16 optimizer, leading to incorrect initialization.
Removed the qualifier to fix issue #186.
@hcedwar
Contributor

hcedwar commented Mar 15, 2016

Using 'this' in a function, other than as a postfix-expression, is not permitted within a constant expression (C++11 spec, section 5.19). The result is not guaranteed by the standard (in practice it depends on the implementation) and is thus non-portable. Removing the 'constexpr' qualifier on the constructor.
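
As a rough sketch of what the change amounts to, the record below stores `this` from an ordinary (non-constexpr) constructor, so the self-links are written when the constructor actually runs. `SelfLinkedRecord`, `is_self_linked` and `g_root` are hypothetical names for illustration. The member names follow the constructor quoted in this thread, but the types are assumptions and this is not the Kokkos class.

```cpp
#include <cstddef>

// Hypothetical simplified record; member types are assumed for illustration.
struct SelfLinkedRecord {
  void*             m_alloc_ptr;
  std::size_t       m_alloc_size;
  SelfLinkedRecord* m_root;
  SelfLinkedRecord* m_prev;
  SelfLinkedRecord* m_next;
  int               m_count;

  // No constexpr qualifier: 'this' is stored when the constructor runs,
  // rather than being evaluated as part of a constant expression.
  SelfLinkedRecord()
    : m_alloc_ptr(nullptr)
    , m_alloc_size(0)
    , m_root(this)
    , m_prev(this)
    , m_next(this)
    , m_count(0)
    {}
};

// Hypothetical check that a record's root/prev/next all point back at itself.
inline bool is_self_linked(const SelfLinkedRecord& r) {
  return r.m_root == &r && r.m_prev == &r && r.m_next == &r;
}

// A namespace-scope root record; its constructor runs before main().
static SelfLinkedRecord g_root;
```

One trade-off worth keeping in mind: without constant initialization, a namespace-scope record is only valid once its constructor has run, so static objects in other translation units should not rely on it during their own initialization.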

@crtrott
Member

crtrott commented Mar 16, 2016

Pushed to master and Trilinos.

@crtrott crtrott closed this as completed Mar 16, 2016
maartenarnst added a commit to uliegecsm/kokkos that referenced this issue Sep 10, 2023
dalg24 added a commit that referenced this issue Sep 11, 2023
(fix for issue #6254) Avoid `#186-D pointless comparison warning`.