Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread safety #8

Open
guinn8 opened this issue Feb 5, 2022 · 2 comments
Open

Thread safety #8

guinn8 opened this issue Feb 5, 2022 · 2 comments

Comments

@guinn8
Copy link

guinn8 commented Feb 5, 2022

Posting for the sake of others information. I attempted to use an array of mutexes to guard segments of a PackedArray to reduce the delay caused by threads waiting single-file to write to the PackedArray. This approach works fine with normal arrays. However with the PackedArray I was getting inconstant results.

This suggests that any invocation of the PackedArray_get / PackedArray_set should be guarded by a mutex, even if the individual index is guarded.

@gpakosz
Copy link
Owner

gpakosz commented Feb 5, 2022

Hello @guinn8 indeed PackedArray is not thread safe and you need to protect access from the outside.

I attempted to use an array of mutexes to guard segments of a PackedArray
While I find that overkill at first sight, if you really want to go that route you need

  • to allocate an array of mutexes of length given by PackedArray_bufferSize()
  • replicate the logic in PackedArray_set to figure out which mutexes to lock/unlock
    • you will have to lock 1 or 2 mutexes depending on whether the logical element spans 2 buffer cells or not
    • or you can pessimistically always lock 2 mutexes (but you don't need a if / else which can be mispredicted)

@guinn8
Copy link
Author

guinn8 commented Feb 11, 2022

Thanks @gpakosz! I got the array of mutexes working and it has massively increased performance.

I did this by allocating an arbitrarily (!= buffer_size) sized array of mutexes in the create function.

  // this will result some locks hanging off the end of the array, not really a big
  // deal that it doesn't perfectly match up as all elements are covered
  a->lock_info.lock_interval = ceil((float)bufferSize / (float)num_locks);
  a->lock_info.locks = calloc(num_locks, sizeof(omp_lock_t));
  for (size_t i = 0; i < num_locks; i++) {
    omp_init_lock(&a->lock_info.locks[i]);
  }

And then defined some functions to lock the mutex corresponding to a index in the underlying buffer (not the PackedArray index).

void PackedArray_lock_offset(PackedArray* a, const uint64_t offset) {
  size_t bufind = ((uint64_t)offset * (uint64_t)a->bitsPerItem) / 32;
  size_t lockind = bufind / a->lock_info.lock_interval;  // using implicit floor
  omp_set_lock(&(a->lock_info.locks[lockind]));
}

void PackedArray_unlock_offset(PackedArray* a, const uint64_t offset) {
  size_t bufind = ((uint64_t)offset * (uint64_t)a->bitsPerItem) / 32;
  size_t lockind = bufind / a->lock_info.lock_interval;  // using implicit floor
  omp_unset_lock(&(a->lock_info.locks[lockind]));
}

To make my life easier I decided to only support 32 divisible by bitsPerItem.

if (bitsPerItem <= bitsAvailable)
  {
    out[0] = (out[0] & ~(mask << startBit)) | (in << startBit);
  }
  else
  {
    PACKEDARRAY_ASSERT(0); // not supporting num_bits that doesnt evenly divide 32
   ...
  }

There is not reason this couldn't work for other bit sizes, I just don't need them so I didn't implement.

I'm happy to push this code somewhere if there is interest :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants