Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] memory pool. #786

Closed
kyungjoo-kim opened this issue May 10, 2017 · 14 comments
Closed

[bug] memory pool. #786

kyungjoo-kim opened this issue May 10, 2017 · 14 comments
Assignees
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Milestone

Comments

@kyungjoo-kim
Copy link
Contributor

This simple code

  typedef Kokkos::Serial HostSpaceType;
  typedef Kokkos::MemoryPool<HostSpaceType> memory_pool_type;
  typedef HostSpaceType::memory_space memory_space;

 enum { MemoryCapacity = 16000 };
  enum { MinBlockSize   =   64 };
  enum { MaxBlockSize   = 1024 };
  enum { SuperBlockSize = 1u << 12 };
  memory_pool_type pool(memory_space()
                    , MemoryCapacity
                    , MinBlockSize
                    , MaxBlockSize
                    , SuperBlockSize );

  // const ordinal_type bufsize = 10*10*sizeof(double);
  // auto f_alloc    = Kokkos::host_spawn(Kokkos::TaskSingle(sched), functor_allocate(pool, bufsize));
  // auto f_view     = Kokkos::host_spawn(Kokkos::TaskSingle(f_alloc), functor_test_view(pool, f_alloc, 10, 10));
  // auto f_view_see = Kokkos::host_spawn(Kokkos::TaskSingle(f_view), functor_test_view_see(pool, f_view));
  // Kokkos::host_spawn( Kokkos::TaskSingle(f_view_see), functor_deallocate(pool, f_alloc, bufsize) );

  // Kokkos::wait( sched );
  {
    void *a = NULL;
    printf("initially %lu\n", a);
    a = pool.allocate(100*sizeof(double));
    printf("allocated in %lu\n", a);
    pool.deallocate(a, 100);
    printf("sucess deallocation\n");
  }

returns

initially 0
allocated in 27677760
Kokkos MemoryPool deallocate(0x1a65440) contains(0) block_aligned(0) dealloc_once(0)
Kokkos MemoryPool::deallocate given erroneous pointerAborted (core dumped)
@ibaned
Copy link
Contributor

ibaned commented May 10, 2017

if you pass a size to deallocate, it has to be the same you passed to allocate. Right now they don't match, you have 100*sizeof(double) and 100. You can also just not pass a the second argument to deallocate.

@kyungjoo-kim
Copy link
Contributor Author

kyungjoo-kim commented May 10, 2017

That is typo when I extract the pseudo code.

As you mentioned, the current kokkos memory pool deallocation does not require the second argument so that it does not matter even if the bufsize does not match. The following code still generates the error.

  const auto bufsize = 100*sizeof(double);                                                                            
  auto a  = pool.allocate(bufsize);                                                                                   
  pool.deallocate(a, bufsize);  

// and it returns this error
Kokkos MemoryPool deallocate(0x2744440) contains(0) block_aligned(0) dealloc_once(0)
Kokkos MemoryPool::deallocate given erroneous pointer                                                                                                                       

FYI: the current version of memory pool::deallocate cannot ignore the second argument (it generates compiler error).

@hcedwar
Copy link
Contributor

hcedwar commented May 10, 2017

Error message indicates that the pointer is not contained within the pool.
Please print_state
https://github.com/kokkos/kokkos/blob/develop/core/src/Kokkos_MemoryPool.hpp#L189
immediately after your allocate statement

@kyungjoo-kim
Copy link
Contributor Author

What does this mean ?

[kyukim @bread] host > ./test
pool_size(16384) superblock_size(4096)
Superblock[ 0 / 4 ] { block_size(64) block_count( 0 / 64 )
Superblock[ 1 / 4 ] { block_size(128) block_count( 0 / 32 )
Superblock[ 2 / 4 ] { block_size(256) block_count( 0 / 16 )
Superblock[ 3 / 4 ] { block_size(512) block_count( 0 / 8 )
Kokkos MemoryPool deallocate(0x2405340) contains(0) block_aligned(0) dealloc_once(0)
Kokkos MemoryPool::deallocate given erroneous pointerAborted (core dumped)

@kyungjoo-kim
Copy link
Contributor Author

This is my test code

#include "Kokkos_Core.hpp"

int main(int argc, char *argv[]) {

  Kokkos::initialize(argc, argv);

  typedef Kokkos::Serial SpaceType;
  typedef Kokkos::MemoryPool<SpaceType> memory_pool_type;
  typedef typename SpaceType::memory_space memory_space;

  enum { MemoryCapacity = 16000 };
  enum { MinBlockSize   =   64 };
  enum { MaxBlockSize   = 1024 };
  enum { SuperBlockSize = 1u << 12 };

  memory_pool_type pool(memory_space()
                        , MemoryCapacity
                        , MinBlockSize
                        , MaxBlockSize
                        , SuperBlockSize );


  const auto bufsize = 100*sizeof(double);
  auto a  = pool.allocate(bufsize);
  pool.print_state(std::cout);
  pool.deallocate(a, bufsize);

  Kokkos::finalize();

  return 0;
}

@ibaned
Copy link
Contributor

ibaned commented May 10, 2017

so, 100*sizeof(double) should be 800 (I recommend you hardcode 800 for the purpose of this debugging). The largest power of 2 containing 800 is 1024, but there are only 4 superblocks and none of them have block size 1024, even though you did specify MaxBlockSize = 1024. So there are several things that seem like bugs here:

  1. We didn't create enough superblocks to reach the user's MaxBlockSize request. What should we do in the case that this is bigger than MinBlockSize*(2^(MemoryCapacity/SuperBlockSize)) ? Question for @hcedwar and I.
  2. allocate returned non-zero, but none of the superblocks have any blocks.
  3. deallocate failed despite seemingly correct allocate

I suspect these are all related, in particular the MaxBlockSize thing...
Could you rerun with MemoryCapacity = 32000 ?

@kyungjoo-kim
Copy link
Contributor Author

Still wrong.

[kyukim @bread] host > ./test
pool_size(32768) superblock_size(4096)
Superblock[ 0 / 8 ] { block_size(64) block_count( 0 / 64 )
Superblock[ 1 / 8 ] { block_size(64) block_count( 0 / 64 )
Superblock[ 2 / 8 ] { block_size(128) block_count( 0 / 32 )
Superblock[ 3 / 8 ] { block_size(128) block_count( 0 / 32 )
Superblock[ 4 / 8 ] { block_size(256) block_count( 0 / 16 )
Superblock[ 5 / 8 ] { block_size(256) block_count( 0 / 16 )
Superblock[ 6 / 8 ] { block_size(512) block_count( 0 / 8 )
Superblock[ 7 / 8 ] { block_size(512) block_count( 0 / 8 )
Kokkos MemoryPool deallocate(0x1de03c0) contains(0) block_aligned(0) dealloc_once(0)
Kokkos MemoryPool::deallocate given erroneous pointerAborted (core dumped)

@ibaned
Copy link
Contributor

ibaned commented May 10, 2017

These are definitely bug cases that we'll fix, but please try one more: SuperBlockSize = 1 << 13, MemoryCapacity = 32000. The reason for this one is I recall some logic related to MaxBlockSize/SuperBlockSize < 5, which is the case in your original test.

@kyungjoo-kim
Copy link
Contributor Author

kyungjoo-kim commented May 10, 2017

That case does not work either.

[kyukim @bread] host > ./test
pool_size(32768) superblock_size(8192)
Superblock[ 0 / 4 ] { block_size(64) block_count( 0 / 128 )
Superblock[ 1 / 4 ] { block_size(128) block_count( 0 / 64 )
Superblock[ 2 / 4 ] { block_size(256) block_count( 0 / 32 )
Superblock[ 3 / 4 ] { block_size(512) block_count( 0 / 16 )
Kokkos MemoryPool deallocate(0x1c93f40) contains(0) block_aligned(0) dealloc_once(0)
Kokkos MemoryPool::deallocate given erroneous pointerAborted (core dumped)

Now if I set bufsize = 64 * sizeof(double) = 512, then

[kyukim @bread] host > ./test
pool_size(32768) superblock_size(8192)
Superblock[ 0 / 4 ] { block_size(64) block_count( 0 / 128 )
Superblock[ 1 / 4 ] { block_size(128) block_count( 0 / 64 )
Superblock[ 2 / 4 ] { block_size(256) block_count( 0 / 32 )
Superblock[ 3 / 4 ] { block_size(512) block_count( 1 / 16 )

However, I still fail if I set bufsize = 65 * sizeof(double) = 520. You probably need to add another unit test that randomly allocate and deallocate within a constraint of max capacity. I saw the unit test with parallel reduce but apparently that test is not enough.

@ibaned
Copy link
Contributor

ibaned commented May 11, 2017

Thanks @kyungjoo-kim . We'll get to the bottom of this and add a test to prevent it happening again.

@crtrott crtrott added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label May 11, 2017
@hcedwar hcedwar added this to Backlog in On-node Task DAG May 12, 2017
hcedwar added a commit that referenced this issue May 12, 2017
…ize,

and superblock size resulted in erroneously "allocating" after the end
of the pool.
@hcedwar hcedwar self-assigned this May 12, 2017
@hcedwar hcedwar added this to the 2017-June-end milestone May 12, 2017
@hcedwar hcedwar moved this from Backlog to In Progress in On-node Task DAG May 12, 2017
@hcedwar hcedwar moved this from In Progress to In Develop in On-node Task DAG May 12, 2017
hcedwar added a commit that referenced this issue May 15, 2017
… block size,"

Introduced to-be-resolved race condition

This reverts commit 4e3c666.
@hcedwar
Copy link
Contributor

hcedwar commented May 15, 2017

Reverted due to introduction of a memory race condition to be resolved.

@hcedwar hcedwar moved this from In Develop to In Progress in On-node Task DAG May 15, 2017
hcedwar added a commit that referenced this issue May 16, 2017
…ize,

and superblock size resulted in erroneously "allocating" after the end
of the memory pool.

Fix DynamicView resize_parallel to abort if the memory pool allocation fails.

Fix memory pool race condition when more than one thread attempts to
claim an empty superblock for the same block size then the losing thread
could mistakenly report that no superblock was available.
@hcedwar hcedwar moved this from In Progress to In Develop in On-node Task DAG May 16, 2017
@crtrott crtrott closed this as completed May 27, 2017
@hcedwar hcedwar moved this from In Develop to Done in On-node Task DAG Jun 1, 2017
@kyungjoo-kim
Copy link
Contributor Author

there is a bug in memory pool. The following is the error message. The code requested 9480 and the pool clearly has enough space but it returns a NULL pointer.

requested = 9480
pool_size(4194304) superblock_size(1048576)
Superblock[ 0 / 4 ] { block_size(65536) block_count( 3 / 16 )
Superblock[ 1 / 4 ] { block_size(131072) block_count( 1 / 8 )
Superblock[ 2 / 4 ] { block_size(32768) block_count( 1 / 32 )
Superblock[ 3 / 4 ] { block_size(262144) block_count( 2 / 4 )
>> Error in file /home/kyukim/Work/lib/trilinos/temp/packages/shylu/tacho/refactor/src/TachoExp_TaskFunctor_CholSupernodes.hpp, line 84, error 1 
   bufmemory pool allocation fails

@kyungjoo-kim
Copy link
Contributor Author

Another bug... When it has 0 allocation, it aborts (it should return a NULL).

before allocate 0
Kokkos MemoryPool allocation request exceeded specified maximum allocation size
Program received signal SIGABRT, Aborted.

@ibaned
Copy link
Contributor

ibaned commented Jul 10, 2017

@kyungjoo-kim this is a closed issue, you need to open a new issue for each new bug. I've moved your comments to #939 and answered them there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Projects
No open projects
Development

No branches or pull requests

4 participants