Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of condition variables for Clozure CL is not correct #29

Closed
arbv opened this issue Mar 23, 2017 · 11 comments
Closed

Implementation of condition variables for Clozure CL is not correct #29

arbv opened this issue Mar 23, 2017 · 11 comments
Labels

Comments

@arbv
Copy link

arbv commented Mar 23, 2017

The current implementation of the condition variables for Clozure CL is straight forward but seems to be incorrect although might work as expected most of the time. Correct implementation of condition variables on top of semaphores is tricky.

The correct semantic of condition variables implies that CONDITION-WAIT should atomically release the lock and enqueue thread on it. Unfortunately, this implementation does not seem to guarantee that.

Condition variables on top of semaphores implementation strategies (both correct and incorrect ones) are discussed in great detail in the following paper:

Implementing Condition Variables with Semaphores, Andrew Birrell (Microsoft Research)
https://www.microsoft.com/en-us/research/publication/implementing-condition-variables-with-semaphores/

The paper is short (8 pages) and easy to read and understand.

The current strategy is discussed in Getting Started section. A correct one which could be used for Clozure CL as part of the library is discussed in Fixing things up section.

@lmj
Copy link
Contributor

lmj commented Apr 25, 2017

@arbv The semantics of condition variables in that paper are different. On page 3 it says,

if a thread calls c.Signal() when no thread is inside c.Wait(), then s.count will be left at 1. This mean that the next thread to call c.Wait() will just decrement s.count and drop through, which isn’t really what the semantics said

condition-notify does not specify what happens when there are no threads waiting. You can do it, but it's implementation-dependent what happens. The natural advice is to not do it, which is easy enough. Bordeaux-threads could handle that case at the expense of being heavier, but the aim of bordeaux-threads (as I see it) is to be a lightweight wrapper. The paper goes on to present a fix for that case (page 4), but I would argue that it doesn't (or shouldn't) apply to bordeaux-threads.

The purpose of the second fix (page 5) is to enforce the semantics of Broadcast as described on page 2,

the threads to be awoken are exactly those that had called c.Wait() before this call of c.Broadcast()

But there is no broadcast functionality in bordeaux-threads, so there's nothing to do here either. The closest thing in bordeaux-threads is to call condition-notify N times. If thread A releases the lock in condition-wait, and then thread B calls ccl:signal-semaphore before thread A can call ccl:wait-on-semaphore, that's fine. Which threads are awoken and which threads go to sleep is indeterminate. There are no guarantees like in the above quote from the paper.

@mdbergmann
Copy link

In my locking code on top of cl-speedy-queue I have to add this in order to make it work on CCL.

(defmethod popq ((self queue-bounded))
  (with-slots (queue lock cvar) self
    (bt:with-lock-held (lock)
      (log:debug "Lock aquired...")
      #-ccl (if (> (get-queue-count queue) 0)
                (dequeue/no-wait queue)
                (dequeue/wait queue cvar lock))
      #+ccl (dequeue/wait queue cvar lock)
      )))

(defun dequeue/no-wait (queue)
  (log:debug "Dequeue without wait...")
  (speedq:dequeue queue))

(defun dequeue/wait (queue cvar lock)
  (log:debug "Going to sleep...")
  (bt:condition-wait cvar lock)
  (log:debug "Awoken, processing queue...")
  (speedq:dequeue queue))

In CCL I cannot dequeue even if the count is > 0.
In CCL it's only possible to dequeue after 'condition-wait' returned.

Is this related to this ticket here?

@sionescu sionescu added the apiv2 label Jun 17, 2020
@mdbergmann
Copy link

Will this be fixed as part of APIv2?

@sionescu
Copy link
Owner

I believe so. You should try switching to APIv2 and see if the issue was fixed.

@avodonosov
Copy link

avodonosov commented Jul 18, 2021

@mdbergmann , What you observe, is called "spurious wakeup". It is always necessary to use a loop when waiting for a condition (for example, an element present in a queue). See https://en.wikipedia.org/wiki/Spurious_wakeup

(defmethod popq ((self queue-bounded))
  (with-slots (queue lock cvar) self
    (bt:with-lock-held (lock)
      (loop
         while (< (get-queue-count queue) 1)
         do (bt:condition-wait cvar lock))
      (speedq:dequeue queue))))

That's said, I give no warranties about the bordeaux-threads implementation being correct in all aspects. Actually, I found this issue because I felt suspicious when noticed how condition-wait / condition-notify are implemented.

The spurious wake up is not a violation of traditional conditional variables semantics. Although it's a little strange to think of the specific behaviour in bordeaux-threads -
that if notifications come more often than waiting happens, the waiting thread should actively burn down all the accumulated notifications in a loop of unlock ; decrement semaphore ; lock.

BTW, @arbv , the implementation for CCL (the API version 1) is not exactly as in the Getting Started in the paper. The paper uses binary semaphore (limit = 1), while the bordeaux-threads CCL uses general (unlimited) semaphore.

@mdbergmann
Copy link

@avodonosov when using the loop I'm running into a stack-underflow error.

AFAIR I tried APIv2 and it worked there.

@avodonosov
Copy link

@mdbergmann , strange, I use loop without problem.

Note, that something worked fine (without spurious wake up) once does not mean it will work that way always.

@sionescu , I think that's a documentation issue in the first place. If spurious wake ups are guaranteed to never happen it is important to document. The posix condition wariables which are explicitly mentioned by bordeaux threads docs as the model, as well as other popular implementations (Java's wait/notify) have spurious wake ups and should be checked from a loop.

@sionescu
Copy link
Owner

sionescu commented Dec 3, 2021

I might have mixed things up. Here's what the Linux manpage says:

Features of Mutexes and Condition Variables

It had been suggested that the mutex acquisition and release be decoupled from condition wait. This was rejected because it is the combined nature of the operation that, in fact, facilitates realtime implementations. Those implementations can atomically move a high-priority thread between the condition variable and the mutex in a manner that is transparent to the caller. This can prevent extra context switches and provide more deterministic acquisition of a mutex when the waiting thread is signaled. Thus, fairness and priority issues can be dealt with directly by the scheduling discipline. Furthermore, the current condition wait operation matches existing practice.

In any case, I believe the current APIv2 implementation does respect POSIX semantics so I'm satisfied with that.

@avodonosov
Copy link

avodonosov commented Dec 4, 2021

@sionescu, so what are you saying, sporious wakeups are possible or not?

The POSIX man page you refer says:

Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur.

@sionescu
Copy link
Owner

sionescu commented Dec 4, 2021

Yes, spurious wakeups are always possible.

@sionescu
Copy link
Owner

sionescu commented Dec 4, 2021

I'm closing this because I believe that APIv2 solves the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants