## Critical Sections: The Synchronization Problem

We now take a deeper look at the synchronization of threads/processes and the principles needed to share data safely. There are two typical concerns:
* Contention: 
  * How to resolve the conflicts that result from multiple processes trying to access shared resources?
* Cooperation:
  * An action by one process may enable another action by another process
  * In such cases, processes should coordinate their actions
  
Review section 1.2 in Herlihy that describes a protocol based on   The key comment here is:

> This kind of argument by contradiction shows up over and over again, and it is worthwhile spending some time convincing ourselves why this claim is true. It is important to note that we never assumed that “raising my flag” or the “looking at your flag” happens instantaneously, nor did we make any assumptions about how long such activities take. All we care about is when these activities start or end.

So, the protocol is asynchronous and not time dependent. This is an important feature.

Synchronization issues are the major source of bugs in parallel programs:
* Deadlock: when a program cannot progress because all of its threads/processes are waiting on resources.
* Incorrect results: from uncontrolled sharing (race conditions) on data.
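
A lost update is the simplest example of an incorrect result. The sketch below does not use real threads. It replays one bad interleaving of two increments by hand so the outcome is deterministic. The function name `interleaved_increment` and the local variable names are illustrative only.

```python
def interleaved_increment():
    """Replay one unlucky interleaving of two unsynchronized `counter += 1` steps."""
    counter = 0

    # Each increment is really a read followed by a write.
    a_local = counter        # thread A reads 0
    b_local = counter        # thread B reads 0, before A has written

    counter = a_local + 1    # thread A writes 1
    counter = b_local + 1    # thread B also writes 1, overwriting A's update

    return counter           # two increments ran, but one was lost


if __name__ == "__main__":
    print(interleaved_increment())
```

Wrapping the read and the write in a critical section rules out this interleaving, because B could not read until A had written.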

The constructs/algorithms underlying critical sections (locks, atomic variables) are complex.
Understanding them will help you use them well.

  
### Mutual Exclusion

Mutual exclusion between processes is the core problem in synchronization.
A solution guarantees:
  * Exclusive access to a shared resource among competing processes
  * No deadlock
  * Starvation resistance (must make progress eventually)

__Peterson's Algorithm__ gives a solution for two processes.

```python
# process 0
b[0] = True
turn = 0
await(b[1]==False or turn==1)
# critical section
...
b[0] = False
```

```python
# process 1
b[1] = True
turn = 1
await(b[0]==False or turn==0)
# critical section
...
b[1] = False
```

* `b[i]` indicates process `i`'s desire to enter the critical section
* The write to `turn` resolves who got there first
  * concurrent writes to `turn` can race against each other; the variable ends up holding the value of the later write
  * the last writer gives the mutex to the other party
  * `turn` is an example of a variable that must be `volatile`, so that each process sees the other's write
* `await` ensures that either
  * the other party is not contending
  * or the other party wrote `turn` to give precedence
* Peterson's algorithm has the following properties
  * mutual exclusion (of the critical section code)
  * _starvation resistant_: no process can go twice in a row while the other process is waiting
  * _contention free overhead_ of 4 memory accesses.  For process 0
    1. write `b[0]`
    2. write `turn`
    3. read `b[1]`
    4. write `b[0]`
  * Uses three shared _atomic_ registers (`b[0]`, `b[1]`, and `turn`)
      * atomic here means atomic read and atomic write
      * In Java, declaring a variable `volatile` makes its reads and writes atomic and visible to other threads
    
This algorithm meets the desiderata for mutual exclusion:
  * correct
  * asynchronous (doesn't depend on timing)
  * symmetric (parties have equal chances)
  
But it only works for two parties.
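
The two pseudocode fragments above can be folded into one runnable sketch. It assumes CPython, whose global interpreter lock makes reads and writes of these variables behave like atomic, sequentially consistent registers; on other runtimes that assumption may not hold. The names `peterson_lock`, `peterson_unlock`, `peterson_worker`, and `run_peterson` are illustrative and not part of the original notes. The `time.sleep(0)` calls are an addition that yields the interpreter so the other thread gets a chance to run.

```python
import threading
import time

b = [False, False]  # b[i] is True while process i wants the critical section
turn = 0            # the last process to write turn defers to the other
counter = 0         # shared data protected by the critical section


def peterson_lock(i):
    global turn
    other = 1 - i
    b[i] = True
    turn = i
    # Equivalent to: await(b[other] == False or turn == other)
    while b[other] and turn == i:
        time.sleep(0)  # spin, yielding the interpreter (assumption: CPython)


def peterson_unlock(i):
    b[i] = False


def peterson_worker(i, iterations):
    global counter
    for _ in range(iterations):
        peterson_lock(i)
        tmp = counter          # critical section: read ...
        time.sleep(0)          # ... invite a thread switch ...
        counter = tmp + 1      # ... then write
        peterson_unlock(i)


def run_peterson(iterations=500):
    global counter
    counter = 0
    threads = [threading.Thread(target=peterson_worker, args=(i, iterations))
               for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_peterson())  # expected: 2 * iterations
```

Removing the `peterson_lock` and `peterson_unlock` calls lets the read and write of `counter` interleave across threads, so updates can be lost.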
  
__busy waiting__: Looping in the `await` is called __busy waiting__ or __spinning__. An active processor keeps checking/polling the value of a variable.  This is good when:
* there is low contention for resources
* wait periods are very short

However, it is bad when:
* wait periods are long
* there is high contention (many processes waiting)

With long waits and high contention, spinning wastes an unbounded number of cycles and __power__.  Spinning is a major cause of power consumption in poorly written mobile apps.

The alternative to spinning is to put waiting processes to sleep and wake them when the resource becomes available.  This has more overhead to stop and restart a process, but frees processors to do other tasks.  We will look at this in more depth in the future.
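
As a small illustration of sleeping instead of spinning, the sketch below uses Python's `threading.Event`. The waiting thread blocks inside `wait()` and consumes no cycles until another thread calls `set()`. The names `ready`, `log`, `waiter`, and `run_event_demo` are illustrative only, and this is one possible mechanism rather than the one these notes cover later.

```python
import threading

ready = threading.Event()  # starts unset
log = []


def waiter():
    ready.wait()               # sleeps here; no polling loop
    log.append("proceeded")


def run_event_demo():
    log.clear()
    ready.clear()
    t = threading.Thread(target=waiter)
    t.start()
    log.append("signalling")   # recorded before the waiter is released
    ready.set()                # wake the sleeping thread
    t.join()
    return list(log)


if __name__ == "__main__":
    print(run_event_demo())
```

The waiter cannot append before `set()` is called, so `"signalling"` always comes first in the log.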



### Fast Mutual Exclusion

Scaling mutual exclusion to $n$ processes requires either a lot of overhead or sacrificing some guarantees.  [Leslie Lamport](https://en.wikipedia.org/wiki/Leslie_Lamport) solved this problem beautifully on his way to winning a Turing Award.  Here is his algorithm, written with `goto` jumps. Process ids run from 1 to $n$, so that `y = 0` can mean that no process has claimed the second barrier.

```
1. start:  b[i] := true;
2.         x := i;
3.         if y != 0 then b[i] := false;
4.                        await y = 0;
5.                        goto start fi;
6.         y := i;
7.         if x != i then b[i] := false;
8.                        for j := 1 to n do await not b[j] od;
9.                        if y != i then await y = 0;
10.                           goto start fi fi;
11.        ...critical section...
12.        y := 0
13.        b[i] := False
```

This algorithm is complex enough that it is helpful to refer to the flowchart describing its state:

<img src="images/fmex.png" width=384 title="Fast Mutual Exclusion State (citation needed)" />

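Breaking down the algorithm by line number:

* Line 1: process $i$ indicates that it is contending.
* Line 2: cross the first barrier: set $x$ to record that process $i$ has gone through this step.
* Line 3: if $y \neq 0$, someone else has already claimed the second barrier, so back off (lines 4 and 5).
* Line 6: otherwise, cross the second barrier by setting $y$.
* Line 7: if no one else crossed the first barrier in the meantime ($x$ is still $i$), the critical section is yours. Otherwise take the slow path (lines 8 to 10): withdraw, wait for the other contenders to withdraw, and enter only if $y$ is still $i$; if it is not, back off and restart.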

So, if there is contention, two things can happen:
  1. a process gets through both barriers without anyone else crossing the first
     * all others back off at line 3
  2. you are the last process through the first barrier and then win at the second barrier
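
The jumps can be hard to follow, so here is a structured sketch of the same steps in Python, with the pseudocode line numbers in the comments. As with any spin-based code in these notes, it assumes CPython, where the global interpreter lock makes these reads and writes behave like atomic registers. The names `fast_lock`, `fast_unlock`, `fast_worker`, and `run_fast_mutex` are illustrative, `N = 3` is an arbitrary choice, and the `time.sleep(0)` calls are an addition that yields the interpreter while waiting.

```python
import threading
import time

N = 3                      # number of processes, ids 1..N (arbitrary for this sketch)
b = [False] * (N + 1)      # b[0] is unused
x = 0
y = 0                      # 0 means no process has claimed the second barrier
counter = 0                # shared data protected by the critical section


def fast_lock(i):
    global x, y
    while True:                            # start:
        b[i] = True                        # line 1
        x = i                              # line 2
        if y != 0:                         # line 3
            b[i] = False
            while y != 0:                  # line 4: await y = 0
                time.sleep(0)
            continue                       # line 5: goto start
        y = i                              # line 6
        if x != i:                         # line 7
            b[i] = False
            for j in range(1, N + 1):      # line 8: await not b[j]
                while b[j]:
                    time.sleep(0)
            if y != i:                     # line 9
                while y != 0:
                    time.sleep(0)
                continue                   # line 10: goto start
        return                             # line 11: enter critical section


def fast_unlock(i):
    global y
    y = 0                                  # line 12
    b[i] = False                           # line 13


def fast_worker(i, iterations):
    global counter
    for _ in range(iterations):
        fast_lock(i)
        tmp = counter
        time.sleep(0)                      # invite a thread switch mid-update
        counter = tmp + 1
        fast_unlock(i)


def run_fast_mutex(iterations=200):
    global counter
    counter = 0
    threads = [threading.Thread(target=fast_worker, args=(i, iterations))
               for i in range(1, N + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_fast_mutex())  # expected: N * iterations
```

When only one thread runs, `fast_lock` falls straight through lines 1, 2, 3, 6, and 7 without entering any loop, which is the fast path the algorithm is named for.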
 
Fast mutual exclusion has the following properties:
  * Mutual exclusion
  * Deadlock freedom
  * Contention-free overhead = 7 memory accesses
    * W(1), W(2), R(3), W(6), R(7), W(12), W(13)
    * An `if` whose condition evaluates to `false` costs only one read
  * Starvation is possible! (of any process)
     * Any process can witness unbounded wait times.  They all race once y is set to 0.  
     * This is particularly problematic when some processes are slower than others, e.g. in a NUMA system, cores with slower access to memory might never win.
    

### Conclusions

* These algorithms are not used in practice.
  * There is hardware support for low-level synchronization.
* But they demonstrate fundamental tradeoffs
  * Space (# shared registers), speed, fairness (bounded waiting)
* All of these algorithms rely on atomic registers
  * Not available in distributed memory machines, which leads to whole new families of protocols
