Tutorial
In this tutorial we will go through the QD locking library with some examples.
All functions in the lock library are prefixed with LL_
to avoid name clashes with other libraries.
The text below is written to be read from top to bottom.
Full code for the SharedInt example presented below can be found here.
Full code for the concurrent queue example presented below can be found here.
Here is the most simple way to create the lock data structure:
OOLock * lock = LL_create(MRQD_LOCK);
This will create an MRQD lock.
An MRQD lock is a QD lock with support for multiple parallel readers.
See the paper Queue Delegation Locking for more information about MRQD locks and QD locks.
You can change the lock type by simply exchanging MRQD_LOCK
with something else in the code fragment above.
See the enum LL_lock_type_name for the available lock types.
LL_lock_type_name also contains lock types prefixed with PLAIN_.
At this point, the PLAIN_ variants can only be used with the clang compiler, since gcc does not yet have a required feature.
The PLAIN_ prefix causes the plain lock structure to be returned without the OOLock wrapper.
OOLock
contains a pointer to a function table with all functions available in the lock API.
This makes it possible to use the API in a generic way.
However, this causes a small overhead compared to calling the function directly, since the function has to be looked up in the function table.
This overhead can be avoided with the C11 _Generic construct, which is unfortunately not yet available in gcc.
When you use the plain variants the lock structure that you pass to the functions in the API needs to be of the correct type so the compiler can replace the API function with the lock specific function.
Documentation about which lock names correspond to which types can be found here.
Here is how to create the plain version of the MRQD lock:
MRQDLock * lock = LL_create(PLAIN_MRQD_LOCK);
The LL_free function is used to free a lock that was created with LL_create.
You can also allocate a plain lock statically and initialize it with the LL_initialize
function:
MRQDLock lock;
void initLock(){
LL_initialize(&lock);
}
We will present all basic lock functionality with an example that implements a shared integer (SharedInt) that can be accessed concurrently by several threads. All operations on SharedInts are supposed to be atomic. A SharedInt can be represented with the following C structure, which employs an MRQD lock to coordinate concurrent accesses to it.
typedef struct {
MRQDLock lock;
int value;
} SharedInt;
Now consider a mult
operation that multiplies a SharedInt by an
integer value and stores the result back in the SharedInt.
Since MRQD locks support the traditional locking operations lock
and unlock
, the mult
function could be implemented like this:
void mult1(SharedInt* v1, int v2) {
LL_lock(&v1->lock);
v1->value = v1->value * v2;
LL_unlock(&v1->lock);
}
This implementation is easy to understand, but it may not be very efficient when a SharedInt is contended on a multicore machine. One reason is that every thread has to wait for the thread currently holding the lock to release it before its own execution can proceed. With thread preemption, the OS can suspend the lock-holding thread for an arbitrary amount of time. Most critical sections can be implemented more efficiently with the delegated critical sections described in the next sections. However, the traditional lock and unlock calls can be useful when porting applications to the queue delegation locking library.
Since mult1
does not return any value, we could instead delegate the responsibility of executing the critical section to another thread that has already acquired the lock.
This is easy to do using the LL_delegate
function from the QD library:
typedef struct {
SharedInt* v1;
int v2;
} MultMsg;
void mult_cs(unsigned int sz, void* msgP) {
MultMsg* msg = (MultMsg*)msgP;
msg->v1->value = msg->v1->value * msg->v2;
}
void mult2(SharedInt* v1, int v2) {
MultMsg msg = {.v1 = v1, .v2 = v2};
LL_delegate(&v1->lock, mult_cs, sizeof(msg), &msg);
}
The LL_delegate
call either executes the mult_cs
function in the current thread while holding the lock or delegates the responsibility to execute it to the current lock holder.
As long as all accesses to the SharedInt are done by threads that have acquired the lock, mult1
and mult2
are equivalent in the sense that it is impossible to detect which one is used.
The parameters to LL_delegate
are the lock, the function with the code of the critical section, the message size and a pointer to the message data.
The message data and the message size will be passed to the delegated function when it is executed.
The programmer cannot rely on the mult_cs
function being executed by the current thread, or on the message data not being copied to another location.
Many threads can delegate critical sections to the same helper, which has the performance advantage that the manipulated data can stay in the same private cache while it is being manipulated.
Note that critical sections can actually only be delegated if the underlying lock type supports it.
Currently, the lock types QDLock, MRQDLock and CCSynch support delegated critical sections.
Extending the example from the previous section, suppose we now also want the result of the multiplication as a return value.
We can create the function mult_res
for that purpose:
typedef struct {
SharedInt* v1;
int v2;
int* retval;
} MultResMsg;
void mult_res_cs(unsigned int sz, void* msgP) {
MultResMsg* msg = (MultResMsg*)msgP;
msg->v1->value = msg->v1->value * msg->v2;
*msg->retval = msg->v1->value;
}
int mult_res(SharedInt* v1, int v2) {
int res;
MultResMsg msg = {.v1 = v1, .v2 = v2, .retval = &res};
LL_delegate_wait(&v1->lock,
mult_res_cs,
sizeof(msg),
&msg);
return res;
}
The delegated function writes the result back to the location pointed to by retval in MultResMsg.
Here we are using a variant of LL_delegate called LL_delegate_wait,
which waits for the delegated function to finish executing before it returns.
Unsurprisingly, LL_delegate_wait
also guarantees that the lock is held by the thread that executes the delegated function.
Instead of returning the actual value from mult_res, one could at this point return a future that will contain the result when it is ready.
However, in a low-level language such as C, constructing this future is something that needs to be done by the program itself.
The QD locking library also supports read-only critical sections:
int read(SharedInt* v) {
int res;
LL_rlock(&v->lock);
res = v->value;
LL_runlock(&v->lock);
return res;
}
Using read-only critical sections is a powerful way to support multiple parallel read operations. Mixing read-only critical sections with delegated critical sections can give excellent performance: since threads can delegate critical sections without waiting for their actual execution, they can continue and immediately issue a read-only critical section. Read-only critical sections are thus more likely to bulk up, so that more of them can execute in parallel than if a readers-writer lock without delegation was used. Note that read-only critical sections can actually only be run in parallel if the underlying lock type supports it. Currently, the lock types MRQDLock and DRMCS support parallel read-only critical sections.
We have almost gone through all functions in the locking library.
The only ones left are the LL_delegate_or_lock
family of functions.
These provide the same functionality as LL_delegate
but avoid the overhead that LL_delegate
incurs in creating a separate buffer from which the message for the delegated critical section is copied to the current lock holder.
Creating this buffer can be particularly expensive when the size of the buffer is not known at compile time so that it needs to be allocated dynamically.
Dynamic memory allocation is expensive and can be a scalability problem on multicore machines.
The following code shows how this family of functions is used.
void enqueue_cs(unsigned int sz, void* m) {
enq(*((Queue**)m), sz - sizeof(Queue*), &((char*)m)[sizeof(Queue*)]);
}
void enqueue(Queue* q, int dSize, void* d) {
void* buff = LL_delegate_or_lock(q->lock, dSize + sizeof(Queue*));
if (buff == NULL) {
enq(q, dSize, d);
LL_delegate_unlock(q->lock);
} else {
memcpy(buff, &q, sizeof(Queue*));
memcpy(&((char*)buff)[sizeof(Queue*)], d, dSize);
LL_close_delegate_buffer(q->lock, buff, enqueue_cs);
}
}
The enqueue
function is for a concurrent queue.
The function enq
is not shown here, but we can assume that it is a non-thread-safe enqueue function which copies the data into the queue data structure.
The LL_delegate_or_lock
function attempts to get a message buffer of the specified size for a critical section.
If LL_delegate_or_lock
succeeds, a buffer address is returned;
otherwise NULL is returned and the lock is acquired.
The function LL_delegate_unlock
is used to unlock the lock when no message buffer was acquired.
LL_close_delegate_buffer
indicates to the lock holder that the message buffer is fully written.
After the LL_close_delegate_buffer
call,
the delegated function (provided as a parameter) can be executed with the message buffer as its argument.
The only locks that have efficient support for LL_delegate_or_lock
are currently QDLock and MRQDLock.