Implementation of condition variables for Clozure CL is not correct #29

arbv · 2017-03-23T11:18:57Z

The current implementation of the condition variables for Clozure CL is straight forward but seems to be incorrect although might work as expected most of the time. Correct implementation of condition variables on top of semaphores is tricky.

The correct semantic of condition variables implies that CONDITION-WAIT should atomically release the lock and enqueue thread on it. Unfortunately, this implementation does not seem to guarantee that.

Condition variables on top of semaphores implementation strategies (both correct and incorrect ones) are discussed in great detail in the following paper:

Implementing Condition Variables with Semaphores, Andrew Birrell (Microsoft Research)
https://www.microsoft.com/en-us/research/publication/implementing-condition-variables-with-semaphores/

The paper is short (8 pages) and easy to read and understand.

The current strategy is discussed in Getting Started section. A correct one which could be used for Clozure CL as part of the library is discussed in Fixing things up section.

lmj · 2017-04-25T19:32:57Z

@arbv The semantics of condition variables in that paper are different. On page 3 it says,

if a thread calls c.Signal() when no thread is inside c.Wait(), then s.count will be left at 1. This mean that the next thread to call c.Wait() will just decrement s.count and drop through, which isn’t really what the semantics said

condition-notify does not specify what happens when there are no threads waiting. You can do it, but it's implementation-dependent what happens. The natural advice is to not do it, which is easy enough. Bordeaux-threads could handle that case at the expense of being heavier, but the aim of bordeaux-threads (as I see it) is to be a lightweight wrapper. The paper goes on to present a fix for that case (page 4), but I would argue that it doesn't (or shouldn't) apply to bordeaux-threads.

The purpose of the second fix (page 5) is to enforce the semantics of Broadcast as described on page 2,

the threads to be awoken are exactly those that had called c.Wait() before this call of c.Broadcast()

But there is no broadcast functionality in bordeaux-threads, so there's nothing to do here either. The closest thing in bordeaux-threads is to call condition-notify N times. If thread A releases the lock in condition-wait, and then thread B calls ccl:signal-semaphore before thread A can call ccl:wait-on-semaphore, that's fine. Which threads are awoken and which threads go to sleep is indeterminate. There are no guarantees like in the above quote from the paper.

mdbergmann · 2020-06-06T12:20:27Z

In my locking code on top of cl-speedy-queue I have to add this in order to make it work on CCL.

(defmethod popq ((self queue-bounded))
  (with-slots (queue lock cvar) self
    (bt:with-lock-held (lock)
      (log:debug "Lock aquired...")
      #-ccl (if (> (get-queue-count queue) 0)
                (dequeue/no-wait queue)
                (dequeue/wait queue cvar lock))
      #+ccl (dequeue/wait queue cvar lock)
      )))

(defun dequeue/no-wait (queue)
  (log:debug "Dequeue without wait...")
  (speedq:dequeue queue))

(defun dequeue/wait (queue cvar lock)
  (log:debug "Going to sleep...")
  (bt:condition-wait cvar lock)
  (log:debug "Awoken, processing queue...")
  (speedq:dequeue queue))

In CCL I cannot dequeue even if the count is > 0.
In CCL it's only possible to dequeue after 'condition-wait' returned.

Is this related to this ticket here?

mdbergmann · 2020-07-27T07:44:41Z

Will this be fixed as part of APIv2?

sionescu · 2020-07-27T08:24:08Z

I believe so. You should try switching to APIv2 and see if the issue was fixed.

avodonosov · 2021-07-18T12:18:47Z

@mdbergmann , What you observe, is called "spurious wakeup". It is always necessary to use a loop when waiting for a condition (for example, an element present in a queue). See https://en.wikipedia.org/wiki/Spurious_wakeup

(defmethod popq ((self queue-bounded))
  (with-slots (queue lock cvar) self
    (bt:with-lock-held (lock)
      (loop
         while (< (get-queue-count queue) 1)
         do (bt:condition-wait cvar lock))
      (speedq:dequeue queue))))

That's said, I give no warranties about the bordeaux-threads implementation being correct in all aspects. Actually, I found this issue because I felt suspicious when noticed how condition-wait / condition-notify are implemented.

The spurious wake up is not a violation of traditional conditional variables semantics. Although it's a little strange to think of the specific behaviour in bordeaux-threads -
that if notifications come more often than waiting happens, the waiting thread should actively burn down all the accumulated notifications in a loop of unlock ; decrement semaphore ; lock.

BTW, @arbv , the implementation for CCL (the API version 1) is not exactly as in the Getting Started in the paper. The paper uses binary semaphore (limit = 1), while the bordeaux-threads CCL uses general (unlimited) semaphore.

mdbergmann · 2021-07-18T15:11:25Z

@avodonosov when using the loop I'm running into a stack-underflow error.

AFAIR I tried APIv2 and it worked there.

avodonosov · 2021-10-25T15:59:10Z

@mdbergmann , strange, I use loop without problem.

Note, that something worked fine (without spurious wake up) once does not mean it will work that way always.

@sionescu , I think that's a documentation issue in the first place. If spurious wake ups are guaranteed to never happen it is important to document. The posix condition wariables which are explicitly mentioned by bordeaux threads docs as the model, as well as other popular implementations (Java's wait/notify) have spurious wake ups and should be checked from a loop.

sionescu · 2021-12-03T20:29:14Z

I might have mixed things up. Here's what the Linux manpage says:

Features of Mutexes and Condition Variables

It had been suggested that the mutex acquisition and release be decoupled from condition wait. This was rejected because it is the combined nature of the operation that, in fact, facilitates realtime implementations. Those implementations can atomically move a high-priority thread between the condition variable and the mutex in a manner that is transparent to the caller. This can prevent extra context switches and provide more deterministic acquisition of a mutex when the waiting thread is signaled. Thus, fairness and priority issues can be dealt with directly by the scheduling discipline. Furthermore, the current condition wait operation matches existing practice.

In any case, I believe the current APIv2 implementation does respect POSIX semantics so I'm satisfied with that.

avodonosov · 2021-12-04T02:08:11Z

@sionescu, so what are you saying, sporious wakeups are possible or not?

The POSIX man page you refer says:

Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur.

sionescu · 2021-12-04T03:52:59Z

Yes, spurious wakeups are always possible.

sionescu · 2021-12-04T18:06:57Z

I'm closing this because I believe that APIv2 solves the issue.

arbv mentioned this issue Mar 23, 2017

New CONDITION-VARIABLE implementation for Clozure CL (fix #29) #30

Closed

sionescu added the apiv2 label Jun 17, 2020

sionescu closed this as completed Dec 4, 2021

hxzrx mentioned this issue Feb 25, 2022

The condition variables in APIv2 still have spurious wakeups when there are many threads #109

Closed

hxzrx mentioned this issue Apr 24, 2022

consider wrap bt:condition-wait in a loop mdbergmann/cl-gserver#52

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementation of condition variables for Clozure CL is not correct #29

Implementation of condition variables for Clozure CL is not correct #29

arbv commented Mar 23, 2017

lmj commented Apr 25, 2017 •

edited

mdbergmann commented Jun 6, 2020

mdbergmann commented Jul 27, 2020

sionescu commented Jul 27, 2020

avodonosov commented Jul 18, 2021 •

edited

mdbergmann commented Jul 18, 2021

avodonosov commented Oct 25, 2021

sionescu commented Dec 3, 2021

avodonosov commented Dec 4, 2021 •

edited

sionescu commented Dec 4, 2021

sionescu commented Dec 4, 2021

Implementation of condition variables for Clozure CL is not correct #29

Implementation of condition variables for Clozure CL is not correct #29

Comments

arbv commented Mar 23, 2017

lmj commented Apr 25, 2017 • edited

mdbergmann commented Jun 6, 2020

mdbergmann commented Jul 27, 2020

sionescu commented Jul 27, 2020

avodonosov commented Jul 18, 2021 • edited

mdbergmann commented Jul 18, 2021

avodonosov commented Oct 25, 2021

sionescu commented Dec 3, 2021

avodonosov commented Dec 4, 2021 • edited

sionescu commented Dec 4, 2021

sionescu commented Dec 4, 2021

lmj commented Apr 25, 2017 •

edited

avodonosov commented Jul 18, 2021 •

edited

avodonosov commented Dec 4, 2021 •

edited