In this tutorial we will go through the QD locking library with some examples. All functions in the lock library are prefixed with LL_ to avoid interference with other libraries. The text below is written to be read from top to bottom. Full code for the SharedInt example presented below can be found here. Full code for the concurrent queue example presented below can be found here.

Creating, Freeing and Initializing Locks

Here is the simplest way to create the lock data structure:

OOLock * lock = LL_create(MRQD_LOCK);

This will create an MRQD lock. An MRQD lock is a QD lock with support for multiple parallel readers. See the paper Queue Delegation Locking for more information about MRQD locks and QD locks. You can change the lock type by simply exchanging MRQD_LOCK for something else in the code fragment above. See the enum LL_lock_type_name for the available lock types. LL_lock_type_name also contains lock types prefixed with PLAIN_. At this point, the ones prefixed with PLAIN_ can only be used with the clang compiler, since gcc does not yet have a required feature.

The PLAIN_ prefix causes the plain lock structure to be returned without the OOLock wrapper. An OOLock contains a pointer to a function table with all functions available in the lock API, which makes it possible to use the API in a generic way. However, this causes a small overhead compared to calling the lock-specific function directly, since the function has to be looked up in the function table. This overhead can be avoided with the C11 _Generic construct, which unfortunately is not yet available in gcc. When you use the plain variants, the lock structure that you pass to the functions in the API needs to be of the correct type so that the compiler can replace the API function with the lock-specific function. Documentation about which lock names correspond to which types can be found here. Here is how to create the plain version of the MRQD lock:

MRQDLock * lock = LL_create(PLAIN_MRQD_LOCK);

The LL_free function is used to free a lock that was created with LL_create. You can also allocate a plain lock statically and initialize it with the LL_initialize function:

MRQDLock lock;
void initLock() {
  LL_initialize(&lock);
}
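
Putting the pieces above together, a minimal lock lifecycle might look like the sketch below. It assumes, as described earlier, that the generic API functions such as LL_lock and LL_unlock accept the OOLock pointer returned by LL_create; the function name lock_lifecycle_example is just for illustration.

void lock_lifecycle_example() {
  OOLock* lock = LL_create(MRQD_LOCK); /* allocate and initialize an MRQD lock */
  LL_lock(lock);                       /* enter a traditional critical section */
  /* ... access data protected by the lock here ... */
  LL_unlock(lock);
  LL_free(lock);                       /* free a lock created with LL_create */
}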

Traditional Lock and Unlock Critical Sections

We will present all basic lock functionality with an example that implements a shared integer (SharedInt) that can be accessed concurrently by several threads. All operations on SharedInts are supposed to be atomic. A SharedInt can be represented with the following C structure, which employs an MRQD lock to coordinate concurrent accesses to it.

typedef struct {
  MRQDLock lock;
  int value;
} SharedInt;

Now consider a mult operation that multiplies a SharedInt by an integer value and stores the result back in the SharedInt. Since MRQD locks support the traditional locking operations lock and unlock, the mult function can be implemented like this:

void mult1(SharedInt* v1, int v2) {
  LL_lock(&v1->lock);
  v1->value = v1->value * v2;
  LL_unlock(&v1->lock); 
}

This implementation is easy to understand, but it may not be very efficient when a SharedInt is contended on a multicore machine. One reason is that every thread has to wait for the thread currently holding the lock to release it before its own execution can proceed. With thread preemption, the OS can force the lock-holding thread to be suspended for an arbitrary amount of time. Most critical sections can be implemented more efficiently with the delegated critical sections that are described in the next sections. However, the traditional lock and unlock calls can be useful when porting applications to the queue delegation locking library.

Delegated Critical Sections

Since mult1 does not return any value, we could instead delegate the responsibility of executing the critical section to another thread that has already acquired the lock. This is easy to do using the LL_delegate function from the QD library:

typedef struct {
  SharedInt* v1;
  int v2;
} MultMsg;
 
void mult_cs(unsigned int sz, void* msgP) {
  MultMsg* msg = (MultMsg*)msgP;
  msg->v1->value = msg->v1->value * msg->v2;
}
 
void mult2(SharedInt* v1, int v2) {
  MultMsg msg = {.v1 = v1, .v2 = v2};
  LL_delegate(&v1->lock, mult_cs, sizeof(msg), &msg);
}

The LL_delegate call either executes the mult_cs function in the current thread while holding the lock or delegates the responsibility to execute it to the current lock holder. As long as all accesses to the SharedInt are done by threads that have acquired the lock, mult1 and mult2 are equivalent in the sense that it is impossible to detect which one is used. The parameters to LL_delegate are the lock, the function with the code of the critical section, the message size and a pointer to the message data. The message data and the message size will be passed to the delegated function when it is executed. The programmer cannot rely on the mult_cs function being executed by the current thread, or on the message data not being copied to another location. Many threads can delegate critical sections to the same helper, which has the performance advantage that the manipulated data can stay in the same private cache while it is being manipulated. Note that critical sections can only be delegated if the underlying lock type supports it. Currently, the lock types QDLock, MRQDLock and CCSynch support delegated critical sections.
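
To make the delegation more concrete, here is a small sketch that lets several POSIX threads delegate mult operations to the same lock. The pthread code is ordinary and not part of the QD locking library, the SharedInt initialization via LL_initialize follows the pattern shown earlier, and the names worker and run_workers are only for illustration.

#include <pthread.h>

SharedInt shared = {.value = 1};

void* worker(void* arg) {
  (void)arg;
  /* Each call may run in this thread or be delegated to the current lock holder. */
  for (int i = 0; i < 4; i++) {
    mult2(&shared, 2);
  }
  return NULL;
}

void run_workers() {
  pthread_t threads[4];
  LL_initialize(&shared.lock); /* initialize the embedded plain MRQD lock */
  for (int i = 0; i < 4; i++)
    pthread_create(&threads[i], NULL, worker, NULL);
  for (int i = 0; i < 4; i++)
    pthread_join(&threads[i], NULL);
}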

Delegate and Wait

Extending the example from the previous section, suppose we now also want the result of the multiplication as a return value. We can create the function mult_res for that purpose:

typedef struct { 
  SharedInt* v1;
  int v2;
  int* retval;
} MultResMsg;
 
void mult_res_cs(unsigned int sz, void* msgP) {
  MultResMsg* msg = (MultResMsg*)msgP;
  msg->v1->value = msg->v1->value * msg->v2;
  *msg->retval = msg->v1->value;
}
 
int mult_res(SharedInt* v1, int v2) {
  int res;
  MultResMsg msg = {.v1 = v1, .v2 = v2, .retval = &res};
  LL_delegate_wait(&v1->lock,
                   mult_res_cs,
                   sizeof(msg),
                   &msg);
  return res;
}

The delegated function writes back the result to the location pointed to by retval in MultResMsg. Here we are using a variant of delegate called LL_delegate_wait, which waits for the delegated function's execution before it returns. Unsurprisingly, LL_delegate_wait also guarantees that the lock is held by the thread that executes the delegated function. Instead of returning the actual value from mult_res, one could at this point return a future that will contain the result when it is ready. However, in a low-level language such as C, constructing this future is something that needs to be done by the program itself. Note that delegate-and-wait critical sections can only be used if the underlying lock type supports delegation. Currently, the lock types QDLock, MRQDLock and CCSynch support delegated critical sections.
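
As a rough illustration of the future idea mentioned above, LL_delegate can be combined with a small hand-built future: the delegated function writes the result and then sets a flag that the caller can check later. The IntFuture type and the use of C11 atomics below are not part of the QD locking library; they are just one possible way to construct such a future.

#include <stdatomic.h>

typedef struct {
  atomic_int ready; /* set to 1 once the result has been written */
  int value;
} IntFuture;

typedef struct {
  SharedInt* v1;
  int v2;
  IntFuture* future;
} MultFutureMsg;

void mult_future_cs(unsigned int sz, void* msgP) {
  MultFutureMsg* msg = (MultFutureMsg*)msgP;
  msg->v1->value = msg->v1->value * msg->v2;
  msg->future->value = msg->v1->value;
  atomic_store(&msg->future->ready, 1); /* publish the result */
}

/* The future must stay alive until future_get has returned. */
void mult_future(SharedInt* v1, int v2, IntFuture* future) {
  atomic_store(&future->ready, 0);
  MultFutureMsg msg = {.v1 = v1, .v2 = v2, .future = future};
  LL_delegate(&v1->lock, mult_future_cs, sizeof(msg), &msg);
}

int future_get(IntFuture* future) {
  while (!atomic_load(&future->ready))
    ; /* wait until the delegated critical section has executed */
  return future->value;
}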

Read-only Critical Sections

The QD locking library also supports read-only critical sections:

int read(SharedInt* v) {
  int res;
  LL_rlock(&v->lock);
  res = v->value;
  LL_runlock(&v->lock);
  return res;
}

Using read-only critical sections is a powerful way to support multiple parallel read operations. Mixing read-only critical sections with delegated critical sections can give excellent performance. Since threads can delegate critical sections without waiting for their actual execution, they can continue and issue a read-only critical section. Read-only critical sections are thus more likely to batch up so that more of them can execute in parallel than if a readers-writer lock without delegation were used. Note that read-only critical sections can only be run in parallel if the underlying lock type supports it. Currently, the lock types MRQDLock and DRMCS support parallel read-only critical sections.
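
For example, a thread can issue a delegated update with mult2 and then continue directly into a read-only critical section, instead of blocking until the update has been executed. The sketch below simply combines mult2 and read from above; the name mult_then_read is only for illustration.

int mult_then_read(SharedInt* v, int factor) {
  mult2(v, factor); /* delegated update, no waiting */
  return read(v);   /* read-only critical section, can run in parallel with other readers */
}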

Delegate and Copy Directly Into the Queue Delegation Queue

We have now almost gone through all functions in the locking library. The only ones left are the LL_delegate_or_lock family of functions. These provide the same functionality as LL_delegate but avoid the overhead that LL_delegate has in creating a separate buffer for copying the message for the delegated critical section to the current lock holder. Creating this buffer can be particularly expensive when its size is not known at compile time, so that it needs to be allocated dynamically. Dynamic memory allocation is expensive and can be a scalability problem on multicore machines. The following code shows how this family of functions is used:

void enqueue_cs(unsigned int sz, void* m) {
  /* The message starts with a Queue pointer, followed by the data to enqueue. */
  enq(*((Queue**)m), sz - sizeof(Queue*), &((char*)m)[sizeof(Queue*)]);
}
 
void enqueue(Queue* q, int dSize, void* d) {
  void* buff = LL_delegate_or_lock(q->lock, dSize + sizeof(Queue*));
  if (buff == NULL) {
    /* No delegation buffer was acquired: the lock is now held by this thread. */
    enq(q, dSize, d);
    LL_delegate_unlock(q->lock);
  } else {
    /* Write the message directly into the delegation queue buffer. */
    memcpy(buff, &q, sizeof(Queue*));
    memcpy(&((char*)buff)[sizeof(Queue*)], d, dSize);
    LL_close_delegate_buffer(q->lock, buff, enqueue_cs);
  }
}

The enqueue function is for a concurrent queue. The function enq is not shown here, but we can assume that it is a non-thread-safe enqueue function which copies the data into the queue data structure. The LL_delegate_or_lock function attempts to get a message buffer of the specified size for a critical section. If LL_delegate_or_lock succeeds, a buffer address is returned; otherwise NULL is returned and the lock is acquired. The function LL_delegate_unlock is used to unlock the lock when no message buffer was acquired. LL_close_delegate_buffer indicates to the lock holder that the message buffer is fully written. After the LL_close_delegate_buffer call, the delegated function (provided as a parameter) can be executed with the message buffer as its parameter. The only locks that currently have efficient support for LL_delegate_or_lock are QDLock and MRQDLock.
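
For completeness, the example above could be backed by a queue structure along the following lines. The fields of Queue, the create_queue helper and the enq signature are assumptions made for this sketch; only the LL_create call comes from the QD locking library.

#include <stdlib.h>

typedef struct {
  OOLock* lock; /* created with LL_create, so the generic API can be used */
  /* ... queue storage, e.g. a buffer and head/tail indices ... */
} Queue;

/* Non-thread-safe enqueue that copies dSize bytes into the queue (not shown). */
void enq(Queue* q, int dSize, void* d);

Queue* create_queue() {
  Queue* q = malloc(sizeof(Queue));
  q->lock = LL_create(MRQD_LOCK); /* QDLock and MRQDLock support LL_delegate_or_lock efficiently */
  return q;
}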