
Multithreaded Migration and the implicit lock ? #4

Closed
mulle-nat opened this issue Mar 5, 2016 · 4 comments

@mulle-nat

I am asking this based on the article http://preshing.com/20160222/a-resizable-concurrent-map/ not based on the code. If you coordinate multithreaded migration with a lock, the whole thing can't be fully concurrent anymore (by definition ?).

Also you write: "In the current version of Linear, even get calls don’t read from the new table until the migration is complete." That sounds like a (b)lock to me, if your get API does not want to return the redirect marker.

My current belief is that you have to do things optimistically; otherwise, you lose concurrency.

@preshing
Owner

preshing commented Mar 7, 2016

Hi Nat,

The lock that's located in the Table is only used for DCLI (double-checked lazy initialization). It's taken when the TableMigration object has not been created or is not visible yet. We can't directly control when the TableMigration object becomes visible. It's up to the hardware. But once the TableMigration object is visible to all threads (via the jobCoordinator), threads can participate in the migration without taking that lock.

That said, I also use locks elsewhere. There's a (striped) lock in SimpleJobCoordinator::participate, for example. SimpleJobCoordinator is a general-purpose class that allows multiple threads to wait until work becomes available. The lock is used so that the thread can wait on a condition variable. I experimented with alternatives like Windows events and Linux futexes, but ended up settling on condition variables to keep it simple. Perhaps more scalability could be achieved if one of those alternatives is used.

Notice that I don't use the word "lock-free" on my blog much these days. I've been sticking with "concurrent". "Concurrent" is a different word from "lock-free" or "non-blocking". The way I think of it is that every operation (or class method) has a beginning and an end. If two threads are inside two method calls at the same time, they're concurrent. Simple as that. They might block internally, or they might not. Some people consider it a big priority to remain non-blocking, for interrupt handlers and stuff. I come from game development, so I don't care so much about the blocking property, especially if threads block only rarely. I just want my concurrent data structures to be scalable and fast. And Junction has performed decently in my limited benchmarking so far (but could likely be improved).

You're probably right that you have to do things optimistically if you want to remain non-blocking. I think that's why Cliff Click does so many things optimistically.

In the current version of Linear (and all Junction maps), yeah, get calls participate in the migration if they discover one in progress. Since I published Junction, I've started to think it would be better to let gets (not sets) pass through, without participating in the migration. It's kind of funny because I made it safe for this, but didn't actually try it yet.

@mulle-nat
Author

Yes. In my head, up to an hour ago, "concurrent" was synonymous with "lock-free". I also didn't understand from the article that get participates in the migration, which confused me. I thought it a bit strange to do so much work to make it almost lock-free, but then not go the extra mile.

Since I needed a concurrent map like this just this weekend (but, as you know, in C), I stole the ideas and wrote it in C. Since I wrote it optimistically, I didn't need a lock. (My get also migrates, BTW.)

In the end the hashtable turns out to be wait-free. On the negative side, there can be some superfluous but ephemeral mallocs. (Actually, come to think of it, I could probably mitigate this by looking ahead to see whether the CAS is known to fail.)

The structure is similar to yours: there is an active "storage" and one waiting to be built.

struct mulle_concurrent_hashmap
{
   mulle_atomic_pointer_t   storage;
   mulle_atomic_pointer_t   next_storage;
   struct mulle_allocator   *allocator;
};

Here is the migration code. The actual copying of the hash table entries in _mulle_concurrent_hashmapstorage_copy is probably exactly the same as yours.

static int  _mulle_concurrent_hashmap_migrate_storage( struct mulle_concurrent_hashmap *map,
                                                       struct mulle_concurrent_hashmapstorage *p)
{
   struct mulle_concurrent_hashmapstorage   *q;
   struct mulle_concurrent_hashmapstorage   *alloced;
   struct mulle_concurrent_hashmapstorage   *previous;

   assert( p);

   // acquire new storage
   alloced = _mulle_concurrent_alloc_hashmapstorage( (p->mask + 1) * 2, map->allocator);
   if( ! alloced)
      return( -1);

   // make this the next world, assume that's still set to 'p' (SIC)
   q = __mulle_atomic_pointer_compare_and_swap( &map->next_storage, alloced, p);
   if( q != p)
   {
      // someone else produced a next world, use that and get rid of 'alloced'
      _mulle_allocator_free( map->allocator, alloced);  // ABA!!
      alloced = NULL;
   }
   else
      q = alloced;

   // this thread can partake in copying
   _mulle_concurrent_hashmapstorage_copy( q, p);

   // now update world, giving it the same value as 'next_world'
   previous = __mulle_atomic_pointer_compare_and_swap( &map->storage, q, p);

   // ok, if we succeed free old, if we fail alloced is
   // already gone. this must be an ABA free (use mulle_aba as allocator)
   if( previous == p)
      _mulle_allocator_free( map->allocator, previous); // ABA!!

   return( 0);
}

The two "tricks" here are:

  • at one point in time you acquire 'p'. Then you find out that you need to migrate. You keep 'p' around until you have migrated. If it's still valid in the end, then the migration is complete. Otherwise you abort the migration and try anew.
  • the next_world can only be changed, if it's the old world so to speak.

@preshing
Owner

preshing commented Mar 7, 2016

That's pretty cool Nat. Glad Junction could serve as inspiration. It looks like you always double the size of the destination table. In doing so, I think you sidestep a couple of issues. One is the possibility of overflowing the destination table during migration. Junction deals with that by supporting multiple source tables and adding the old source and old destination as new sources.

You planning to publish it?

@mulle-nat
Author

If you want, you can have the code (and possibly put it into junction). It will be part of my Objective-C runtime, which will get released eventually (BSD license), but when, I don't really know.

@preshing preshing closed this as completed Mar 8, 2016