### Barrier Synchronization with Semaphores (Python) [with Solution]

Here is a solution to barrier synchronization that is symmetric in how each worker synchronizes; each worker first notifies the other worker of their arrival at the barrier and then waits for the other worker to arrive:

```
var barrier1, barrier2: semaphore = 0, 0
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    while true do
        task 1
        V(barrier1)
        P(barrier2)
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    while true do
        task 2
        V(barrier2)
        P(barrier1)
```
  </div>
</div>
<br>

To argue for correctness, two ghost variables, `c1` and `c2`, are introduced for the number of times each worker has arrived at their barrier. The barrier property states that `c1` and `c2` cannot differ by more than `1`:

```
BR: |c1 – c2| ≤ 1
```

The global invariant, `P`, includes the conditions that hold after each of the two `V` operations:

```algorithm
var barrier1, barrier2: semaphore = 0, 0
var c1, c2: integer = 0, 0
{P: BR ∧ (barrier1 > 0 ⇒ c1 = c2 ∨ c1 = c2 + 1) ∧ (barrier2 > 0 ⇒ c1 = c2 ∨ c1 + 1 = c2)}
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    {P ∧ (c1 = c2 ∨ c1 + 1 = c2)}
    while true do
        task 1
        ⟨barrier1, c1 := barrier1 + 1, c1 + 1⟩ 
        {P ∧ (c1 = c2 ∨ c1 = c2 + 1)}
        ⟨await barrier2 > 0 then
            barrier2 := barrier2 – 1⟩
        {P ∧ (c1 = c2 ∨ c1 + 1 = c2)}
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    {P ∧ (c1 = c2 ∨ c1 = c2 + 1)}
    while true do
        task 2
        ⟨barrier2, c2 := barrier2 + 1, c2 + 1⟩ 
        {P ∧ (c1 = c2 ∨ c1 + 1 = c2)}
        ⟨await barrier1 > 0 then
            barrier1 := barrier1 – 1⟩
        {P ∧ (c1 = c2 ∨ c1 = c2 + 1)}
```
  </div>
</div>
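
The annotated program can be exercised at run time. The following is a minimal sketch, not part of the original exercise: it assumes a lock named `ghost` to make each `V` operation and its ghost update one atomic step, a hypothetical `ROUNDS` constant, and a helper `arrive` that records the largest value of `|c1 – c2|` seen after any update.

```python
from threading import Thread, Semaphore, Lock

ROUNDS = 1000                          # assumed number of iterations per worker
barrier1, barrier2 = Semaphore(0), Semaphore(0)
ghost = Lock()                         # assumption: makes V plus ghost update atomic
c = {1: 0, 2: 0}                       # ghost counters c1 and c2
max_gap = 0                            # largest |c1 - c2| observed so far

def arrive(me, mine):
    """Perform V(mine) and increment this worker's ghost counter as one step."""
    global max_gap
    with ghost:
        mine.release()
        c[me] += 1
        max_gap = max(max_gap, abs(c[1] - c[2]))

def worker(me, mine, other):
    for _ in range(ROUNDS):
        # the task itself is omitted
        arrive(me, mine)               # V on own semaphore: announce arrival
        other.acquire()                # P on the other semaphore: wait

threads = [Thread(target=worker, args=(1, barrier1, barrier2)),
           Thread(target=worker, args=(2, barrier2, barrier1))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c, max_gap)
```

If `BR` holds, `max_gap` never exceeds `1` and both counters end at `ROUNDS`. A run like this can only fail to find a violation; it does not replace the invariant argument.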

Here is an asymmetric scheme for barrier synchronization with semaphores for two processes, with locations marked by `🄰` and `🄱`:

```
var barrier1, barrier2: semaphore = 0, 0
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    while true do
        task 1
        P(barrier2) 🄰
        V(barrier1) 🄱
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    while true do
        task 2
        V(barrier2) 🄰
        P(barrier1) 🄱
```
  </div>
</div>
<br>

For `worker1` to be at `🄰`, `worker2` must have passed `🄰`. For `worker2` to be at `🄱`, `worker1` must have passed `🄱`. Hence, for a process to leave its barrier, the other must have also arrived at its barrier. To show this formally, two ghost variables, `c1` and `c2`, are introduced with global invariant `P`:

```algorithm
var barrier1, barrier2: semaphore = 0, 0
var c1, c2: integer = 0, 0
{P: BR ∧ (barrier1 > 0 ⇒ c1 = c2) ∧ (barrier2 > 0 ⇒ c1 + 1 = c2)}
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    {P ∧ (c1 = c2 ∨ c1 + 1 = c2)}
    while true do
        task 1
        ⟨await barrier2 > 0 then
            barrier2, c1 := barrier2 – 1, c1 + 1⟩
        {P ∧ c1 = c2}
        ⟨barrier1 := barrier1 + 1⟩
        {P ∧ (c1 = c2 ∨ c1 + 1 = c2)}
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    {P ∧ c1 = c2}
    while true do
        task 2
        {P ∧ c1 = c2}
        ⟨barrier2, c2 := barrier2 + 1, c2 + 1⟩
        {P ∧ (c1 = c2 ∨ c1 + 1 = c2)}
        ⟨await barrier1 > 0 then barrier1 := barrier1 – 1⟩
        {P ∧ c1 = c2}
```
  </div>
</div>
<br>

Note that `BR` can, in the asymmetric version, be strengthened to `0 ≤ c2 – c1 ≤ 1`.
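
The strengthened property can be observed in a small experiment. This is a sketch under assumptions: it counts each arrival once (`c1` right after `P(barrier2)`, `c2` together with `V(barrier2)`), uses a hypothetical lock `ghost` to protect the ghost counters, and collects every value of `c2 – c1` seen after an update in a set named `diffs`.

```python
from threading import Thread, Semaphore, Lock

ROUNDS = 1000                          # assumed number of iterations per worker
barrier1, barrier2 = Semaphore(0), Semaphore(0)
ghost = Lock()                         # protects the ghost counters
c1 = c2 = 0
diffs = set()                          # every value of c2 - c1 seen after an update

def worker1():
    global c1
    for _ in range(ROUNDS):
        barrier2.acquire()             # P(barrier2): wait for worker2
        with ghost:                    # ghost update right after the P
            c1 += 1
            diffs.add(c2 - c1)
        barrier1.release()             # V(barrier1): let worker2 continue

def worker2():
    global c2
    for _ in range(ROUNDS):
        with ghost:                    # V(barrier2) and ghost update as one step
            barrier2.release()
            c2 += 1
            diffs.add(c2 - c1)
        barrier1.acquire()             # P(barrier1): wait for worker1

threads = [Thread(target=worker1), Thread(target=worker2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(diffs), c1, c2)
```

Python cannot acquire a semaphore and update a variable in one atomic step, so `c1` is incremented just after the `P`. That only delays the update, which keeps `c1` from running ahead of `c2`.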

Generalize barrier synchronization to three processes!

```algorithm
YOUR ANSWER HERE
```

_Solution 1:_  
This solution uses one semaphore per worker. It is a generalization of the symmetric solution: each worker first notifies all other workers of its arrival at the barrier and then waits for all other workers to arrive at their barrier.

```
var barrier1, barrier2, barrier3: semaphore = 0, 0, 0
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    while true do
        task 1
        V(barrier2) 🄰
        V(barrier3) 🄱
        P(barrier1) 🄲
        P(barrier1) 🄳
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    while true do
        task 2
        V(barrier1) 🄰
        V(barrier3) 🄱
        P(barrier2) 🄲
        P(barrier2) 🄳
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker3
    while true do
        task 3
        V(barrier1) 🄰
        V(barrier2) 🄱
        P(barrier3) 🄲
        P(barrier3) 🄳
```
  </div>
</div>
<br>

For `worker1` to be at `🄳`, `worker2` must have passed `🄰` and `worker3` must have passed `🄰`; similar reasoning holds for `worker2` to be at `🄳` and `worker3` to be at `🄳`. Once `worker1` is at `🄳`, it can continue with its task while `worker2` is still at `🄰` and `worker3` is still at `🄰`, but it will then get blocked at `🄲` and `🄳` until `worker2` and `worker3` complete their tasks. No overtaking is possible; similar reasoning holds for `worker2` and `worker3`. When generalizing to `N` workers, each worker has to execute `N – 1` `V` operations followed by `N – 1` `P` operations.
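
The generalization to `N` workers can be sketched in Python as follows. The names `N`, `ROUNDS`, `barrier`, and `log` are assumptions made for this illustration, workers are numbered from `0`, and the task is replaced by appending a `(round, worker)` pair to a shared log.

```python
from threading import Thread, Semaphore, Lock

N, ROUNDS = 4, 50                      # assumed number of workers and iterations
barrier = [Semaphore(0) for _ in range(N)]   # barrier[i] belongs to worker i
log, log_lock = [], Lock()             # (round, worker) in the order tasks ran

def worker(i):
    for r in range(ROUNDS):
        with log_lock:
            log.append((r, i))         # stands in for the task of worker i
        for j in range(N):             # N - 1 V operations: notify all others
            if j != i:
                barrier[j].release()
        for _ in range(N - 1):         # N - 1 P operations: wait for all others
            barrier[i].acquire()

threads = [Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
rounds_in_order = [r for r, _ in log]
print(rounds_in_order == sorted(rounds_in_order))
```

If no worker overtakes, the round numbers in the log never decrease: every task of one round is logged before any task of the next round.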

_Solution 2:_  
This solution uses one semaphore per worker. It is a generalization of the asymmetric solution above in that one worker executes `P` and then `V` and another executes `V` and then `P`. With three (or more) workers, each `P` operation is on the worker's own semaphore and each `V` operation is on the semaphore of an adjacent worker, causing a chain of notifications. Here is an incomplete solution:

```
var barrier1, barrier2, barrier3: semaphore = 0, 0, 0
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    while true do
        task 1
        P(barrier1) 🄰
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    while true do
        task 2
        P(barrier2) 🄰
        V(barrier1) 🄱
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker3
    while true do
        task 3
        V(barrier2) 🄰
```
  </div>
</div>

For `worker1` to be at `🄰`, `worker2` must have been at `🄱` and therefore at `🄰`, for which `worker3` must have been at `🄰`. Thus, the notifications work like a chain that can be extended to more than three workers. However, `worker3` could run away and repeatedly pass its barrier; since `barrier2` is incremented each time, `worker2` also could repeatedly pass its barrier while `worker1` has not reached it. To prevent that, we add another notification chain in the other direction: 

```
var barrier1, barrier2, barrier3: semaphore = 0, 0, 0
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    while true do
        task 1
        P(barrier1) 🄰
        V(barrier2) 🄱
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    while true do
        task 2
        P(barrier2) 🄰
        V(barrier1) 🄱
        P(barrier2) 🄲
        V(barrier3) 🄳
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker3
    while true do
        task 3
        V(barrier2) 🄰
        P(barrier3) 🄱
```
  </div>
</div>
<br>

When generalizing to `N` workers, `N` semaphores are needed, as in the previous solution. The first and last workers execute two `P` or `V` operations, while all others execute four `P` or `V` operations. Compared to the previous solution, the number of `P` and `V` operations is reduced.
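
One way the two notification chains might be written for `N` workers is sketched below. This generalization is an assumption of the sketch, not taken from the original: workers are numbered from `0`, the last worker starts the arrival chain, the first worker turns it around, and `N`, `ROUNDS`, `barrier`, and `log` are hypothetical names.

```python
from threading import Thread, Semaphore, Lock

N, ROUNDS = 4, 50                      # assumed sizes; N must be at least 2
barrier = [Semaphore(0) for _ in range(N)]   # barrier[i] belongs to worker i
log, log_lock = [], Lock()             # (round, worker) in the order tasks ran

def worker(i):
    for r in range(ROUNDS):
        with log_lock:
            log.append((r, i))         # stands in for the task of worker i
        if i == N - 1:                 # last worker starts the arrival chain
            barrier[i - 1].release()
            barrier[i].acquire()
        elif i == 0:                   # first worker turns the chain around
            barrier[0].acquire()
            barrier[1].release()
        else:                          # workers in between pass both chains on
            barrier[i].acquire()       # workers i+1 .. N-1 have arrived
            barrier[i - 1].release()
            barrier[i].acquire()       # all workers have arrived
            barrier[i + 1].release()

threads = [Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
rounds_in_order = [r for r, _ in log]
print(rounds_in_order == sorted(rounds_in_order))
```

With `N = 3` this reduces to the three-worker program above, and with `N = 2` to the asymmetric two-worker scheme.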

_Solution 3:_  
This solution reduces the number of `P` and `V` operations by introducing an integer counter for the number of workers that arrived at the barrier. There is still one semaphore per worker:

```algorithm
var mutex, barrier1, barrier2, barrier3: semaphore = 1, 0, 0, 0
var done: integer = 0
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    while true do
        task 1
        P(mutex)
        done := done + 1
        if done = 3 then
            done := 0; V(mutex)
            V(barrier2); V(barrier3) 🄰
        else
            V(mutex); P(barrier1) 🄱
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    while true do
        task 2
        P(mutex)
        done := done + 1
        if done = 3 then
            done := 0; V(mutex)
            V(barrier1); V(barrier3) 🄰
        else
            V(mutex); P(barrier2) 🄱
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker3
    while true do
        task 3
        P(mutex)
        done := done + 1
        if done = 3 then
            done := 0; V(mutex)
            V(barrier1); V(barrier2) 🄰
        else
            V(mutex); P(barrier3) 🄱
```
  </div>
</div>
<br>

All workers run the same synchronization code, apart from the semaphores they use. The first two workers that arrive at the barrier get blocked at `🄱`. The third one notifies them and continues at `🄰`. Whichever worker completes its next task first gets blocked again at `🄱`. No overtaking is possible.
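
Because the synchronization code has the same shape for every worker, it can be factored into one function. The following is a minimal sketch for `N` workers, assuming a hypothetical helper `wait_at_barrier(i)`, workers numbered from `0`, and a shared log in place of the tasks.

```python
from threading import Thread, Semaphore, Lock

N, ROUNDS = 4, 50                      # assumed number of workers and iterations
mutex = Semaphore(1)
barrier = [Semaphore(0) for _ in range(N)]   # barrier[i] belongs to worker i
done = 0                               # workers that have arrived in this round
log, log_lock = [], Lock()             # (round, worker) in the order tasks ran

def wait_at_barrier(i):
    global done
    mutex.acquire()
    done += 1
    if done == N:                      # last to arrive: reset, then wake the others
        done = 0
        mutex.release()
        for j in range(N):
            if j != i:
                barrier[j].release()
    else:                              # not last: wait on own semaphore
        mutex.release()
        barrier[i].acquire()

def worker(i):
    for r in range(ROUNDS):
        with log_lock:
            log.append((r, i))         # stands in for the task of worker i
        wait_at_barrier(i)

threads = [Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
rounds_in_order = [r for r, _ in log]
print(rounds_in_order == sorted(rounds_in_order))
```

Each round costs two operations on `mutex` per worker plus `N – 1` `V` operations by the last worker and one `P` by each of the others.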

_Solution 4:_  
This solution uses an integer counter for the number of workers that have arrived at the barrier and a fixed number of semaphores. Here is an incomplete solution:

```algorithm
var mutex, barrier: semaphore = 1, 0
var done: integer = 0

process worker1
    while true do
        task 1
        P(mutex)
            done := done + 1
            if done = 3 then
                V(barrier); V(barrier); V(barrier); done := 0
        V(mutex)
        P(barrier)
```
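
What can go wrong with this incomplete solution can be replayed deterministically without threads. The sketch below is an illustration under assumptions: `enter` and `try_pass` are hypothetical helpers, `try_pass` uses a non-blocking `acquire` as a stand-in for `P(barrier)`, and the schedule is just one that a scheduler may produce, in which workers 1 and 2 have left the critical section but have not yet completed `P(barrier)`.

```python
from threading import Semaphore

barrier = Semaphore(0)
done = 0
trace = []                             # (worker, passed?) for each attempted P

def enter(worker):
    """Critical section of the incomplete solution, without the final P."""
    global done
    done += 1
    if done == 3:
        for _ in range(3):
            barrier.release()          # the three V(barrier) operations
        done = 0

def try_pass(worker):
    """Non-blocking stand-in for P(barrier); records whether it succeeded."""
    trace.append((worker, barrier.acquire(blocking=False)))

enter(1)                               # worker1 arrives, P(barrier) still pending
enter(2)                               # worker2 arrives, P(barrier) still pending
enter(3)                               # worker3 arrives last and signals three times
try_pass(3)                            # worker3 passes the barrier
enter(3)                               # worker3 has already finished its next task
try_pass(3)                            # and passes again on a leftover signal
try_pass(1)                            # one signal is left for worker1
try_pass(2)                            # worker2 finds none and would block
print(trace)                           # [(3, True), (3, True), (1, True), (2, False)]
```

In this schedule worker 3 completes two rounds while worker 2 is still waiting to leave the first one.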

All other workers have the same synchronization code. The first two workers that arrive at the barrier get blocked at `P(barrier)`. The third one allows those and itself to continue. The problem is that the three `V(barrier)` operations may end up signalling the same worker rather than three different ones: a fast worker can pass `P(barrier)`, finish its next task, and pass `P(barrier)` again before a slower worker has consumed its signal. Hence, one worker may overtake.

The solution is to have two synchronization points. Below, all workers must pass `🄰` before any of them can pass `🄱`. This technique is called a *two-phase barrier*. Only one integer counter is needed, but each phase needs its own barrier semaphore, called `arrive` and `leave` below:
```algorithm
var mutex, arrive, leave: semaphore = 1, 0, 0
var done: integer = 0
```
<div style="display:table">
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker1
    while true do
        task 1
        P(mutex)
            done := done + 1
            if done = 3 then
                V(arrive)
                V(arrive)
                V(arrive)
        V(mutex)
        P(arrive)
        🄰
        P(mutex)
            done := done – 1
            if done = 0 then
                V(leave)
                V(leave)
                V(leave)
        V(mutex)
        P(leave)
        🄱
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker2
    while true do
        task 2
        P(mutex)
            done := done + 1
            if done = 3 then
                V(arrive)
                V(arrive)
                V(arrive)
        V(mutex)
        P(arrive)
        🄰
        P(mutex)
            done := done – 1
            if done = 0 then
                V(leave)
                V(leave)
                V(leave)
        V(mutex)
        P(leave)
        🄱
```
  </div>
  <div style = "display:table-cell; border-left:2em solid transparent">

```algorithm
process worker3
    while true do
        task 3
        P(mutex)
            done := done + 1
            if done = 3 then
                V(arrive)
                V(arrive)
                V(arrive)
        V(mutex)
        P(arrive)
        🄰
        P(mutex)
            done := done – 1
            if done = 0 then
                V(leave)
                V(leave)
                V(leave)
        V(mutex)
        P(leave)
        🄱
```
  </div>
</div>
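
Since all workers run identical code, the two-phase barrier lends itself to a small reusable class. The following is a sketch for `n` workers; the class name `TwoPhaseBarrier`, its `wait` method, and the logging harness are assumptions made for this illustration.

```python
from threading import Thread, Semaphore, Lock

class TwoPhaseBarrier:
    """Reusable barrier for n workers with an arrive phase and a leave phase."""
    def __init__(self, n):
        self.n, self.done = n, 0
        self.mutex = Semaphore(1)
        self.arrive, self.leave = Semaphore(0), Semaphore(0)

    def wait(self):
        self.mutex.acquire()
        self.done += 1
        if self.done == self.n:        # last to arrive opens the first phase
            for _ in range(self.n):
                self.arrive.release()
        self.mutex.release()
        self.arrive.acquire()          # first synchronization point
        self.mutex.acquire()
        self.done -= 1
        if self.done == 0:             # last through opens the second phase
            for _ in range(self.n):
                self.leave.release()
        self.mutex.release()
        self.leave.acquire()           # second synchronization point

N, ROUNDS = 4, 50                      # assumed number of workers and iterations
barrier = TwoPhaseBarrier(N)
log, log_lock = [], Lock()             # (round, worker) in the order tasks ran

def worker(i):
    for r in range(ROUNDS):
        with log_lock:
            log.append((r, i))         # stands in for the task of worker i
        barrier.wait()

threads = [Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
rounds_in_order = [r for r, _ in log]
print(rounds_in_order == sorted(rounds_in_order))
```

The number of semaphores stays at three regardless of `n`; only the loops that issue the `V` operations depend on the number of workers.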

Here is the asymmetric two-worker scheme implemented in Python. Modify it to work with three workers as well!

In [None]:
# With 2 workers
from threading import Thread, Semaphore
from time import sleep
from sys import stdout

class Ping(Thread):
    def run(self):
        while True:
            stdout.write('ping\n'); sleep(2)    # task
            barrier2.acquire()                  # wait
            barrier1.release()                  # signal


class Pong(Thread):
    def run(self):
        while True:
            stdout.write('pong\n'); sleep(4)    # task
            barrier2.release()                  # signal
            barrier1.acquire()                  # wait

barrier1, barrier2 = Semaphore(0), Semaphore(0) # create semaphores
Ping().start(); Pong().start()                  # create and run threads

In [None]:
YOUR ANSWER HERE

In [None]:
# With 3 workers, Solution 1. 
from threading import Thread, Semaphore
from time import sleep
from sys import stdout

class Ping1(Thread):
    def run(self):
        for _ in range(10):
            stdout.write('ping 1\n'); stdout.flush(); sleep(1) # task
            barrier2.release()                                 # signal
            barrier3.release()                                 # signal
            barrier1.acquire()                                 # wait
            barrier1.acquire()                                 # wait

class Ping2(Thread):
    def run(self):
        for _ in range(10):
            stdout.write('ping 2\n'); stdout.flush(); sleep(2) # task
            barrier1.release()                                 # signal
            barrier3.release()                                 # signal
            barrier2.acquire()                                 # wait
            barrier2.acquire()                                 # wait

class Ping3(Thread):
    def run(self):
        for _ in range(10):
            stdout.write('ping 3\n'); stdout.flush(); sleep(3) # task
            barrier1.release()                                 # signal
            barrier2.release()                                 # signal
            barrier3.acquire()                                 # wait
            barrier3.acquire()                                 # wait

barrier1, barrier2, barrier3 = Semaphore(0), Semaphore(0), Semaphore(0)
Ping1().start(); Ping2().start(); Ping3().start()              # create and run

In [None]:
# With 3 workers, Solution 2.
from threading import Thread, Semaphore
from time import sleep
from sys import stdout

class Ping1(Thread):
    def run(self):
        for _ in range(10):
            stdout.write('ping 1\n'); stdout.flush(); sleep(1) # task
            barrier1.acquire()                                 # wait
            barrier2.release()                                 # signal

class Ping2(Thread):
    def run(self):
        for _ in range(10):
            stdout.write('ping 2\n'); stdout.flush(); sleep(2) # task
            barrier2.acquire()                                 # wait
            barrier1.release()                                 # signal
            barrier2.acquire()                                 # wait
            barrier3.release()                                 # signal

class Ping3(Thread):
    def run(self):
        for _ in range(10):
            stdout.write('ping 3\n'); stdout.flush(); sleep(3) # task
            barrier2.release()                                 # signal
            barrier3.acquire()                                 # wait

barrier1, barrier2, barrier3 = Semaphore(0), Semaphore(0), Semaphore(0)
Ping1().start(); Ping2().start(); Ping3().start()              # create and run

In [None]:
# With 3 workers, Solution 3. 
from threading import Thread, Semaphore
from time import sleep
from sys import stdout

class Ping1(Thread):
    def run(self):
        global done
        for _ in range(10):
            stdout.write('ping 1\n'); stdout.flush(); sleep(1) # task
            mutex.acquire()
            done += 1
            if done == 3:                                      # last to arrive
                done = 0; mutex.release()
                barrier2.release(); barrier3.release()         # signal the others
            else:                                              # not last
                mutex.release(); barrier1.acquire()            # wait

class Ping2(Thread):
    def run(self):
        global done
        for _ in range(10):
            stdout.write('ping 2\n'); stdout.flush(); sleep(2) # task
            mutex.acquire()
            done += 1
            if done == 3:                                      # last to arrive
                done = 0; mutex.release()
                barrier1.release(); barrier3.release()         # signal the others
            else:                                              # not last
                mutex.release(); barrier2.acquire()            # wait

class Ping3(Thread):
    def run(self):
        global done
        for _ in range(10):
            stdout.write('ping 3\n'); stdout.flush(); sleep(3) # task
            mutex.acquire()
            done += 1
            if done == 3:                                      # last to arrive
                done = 0; mutex.release()
                barrier1.release(); barrier2.release()         # signal the others
            else:                                              # not last
                mutex.release(); barrier3.acquire()            # wait
            

done, mutex = 0, Semaphore(1)
barrier1, barrier2, barrier3 = Semaphore(0), Semaphore(0), Semaphore(0)
Ping1().start(); Ping2().start(); Ping3().start()              # create and run

In [None]:
# With 3 workers, Solution 4. 
from threading import Thread, Semaphore
from time import sleep
from sys import stdout

class Ping1(Thread):
    def run(self):
        global done
        for _ in range(10):
            stdout.write('ping 1\n'); stdout.flush(); sleep(1) # task
            mutex.acquire()
            done += 1
            if done == 3: arrive.release(); arrive.release(); arrive.release() 
            mutex.release()
            arrive.acquire()
            mutex.acquire()
            done -= 1
            if done == 0: leave.release(); leave.release(); leave.release() 
            mutex.release()
            leave.acquire()

class Ping2(Thread):
    def run(self):
        global done
        for _ in range(10):
            stdout.write('ping 2\n'); stdout.flush(); sleep(2) # task
            mutex.acquire()
            done += 1
            if done == 3: arrive.release(); arrive.release(); arrive.release() 
            mutex.release()
            arrive.acquire()
            mutex.acquire()
            done -= 1
            if done == 0: leave.release(); leave.release(); leave.release() 
            mutex.release()
            leave.acquire()

class Ping3(Thread):
    def run(self):
        global done
        for _ in range(10):
            stdout.write('ping 3\n'); stdout.flush(); sleep(3) # task
            mutex.acquire()
            done += 1
            if done == 3: arrive.release(); arrive.release(); arrive.release() 
            mutex.release()
            arrive.acquire()
            mutex.acquire()
            done -= 1
            if done == 0: leave.release(); leave.release(); leave.release() 
            mutex.release()
            leave.acquire()

done = 0
mutex, arrive, leave = Semaphore(1), Semaphore(0), Semaphore(0)
Ping1().start(); Ping2().start(); Ping3().start()              # create and run

Output of `ping` and `pong` may not appear in the order in which they were written due to buffering. Use `stdout.flush()` to force buffers to be flushed. Will that always guarantee output in the same order in which `stdout.write()` occurred? Type `help(type(stdout))` in a code cell to see the interval in which buffers are flushed in the underlying implementation (you are not supposed to make use of that interval, as it may change).

YOUR ANSWER HERE

*Answer:*  
No, `stdout.write()` and `stdout.flush()` are separate operations; another thread may flush in between. Using `flush()` only makes it more likely that the output will be in the order of occurrences of `write()`.
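
One way to rule out interleaving between the two calls is to guard them with a lock. This goes beyond the original answer and is only a sketch: `say`, `out_lock`, and `order` are hypothetical names, and an `io.StringIO` object stands in for `stdout` so that the result can be inspected.

```python
import io
from threading import Thread, Lock

out = io.StringIO()                    # stands in for stdout in this sketch
out_lock = Lock()
order = []                             # texts in the order the writes happened

def say(text):
    with out_lock:                     # write, flush, and bookkeeping as one step
        out.write(text + '\n')
        out.flush()
        order.append(text)

def worker(name):
    for k in range(100):
        say(f'{name} {k}')

threads = [Thread(target=worker, args=(name,)) for name in ('ping', 'pong')]
for t in threads:
    t.start()
for t in threads:
    t.join()
lines = out.getvalue().splitlines()
print(lines == order)
```

The guarantee only covers threads that go through `say`; any code that writes to the stream directly can still interleave with it.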