<p style="font-size: 30px; font-weight: bold;">Lock-Free Concurrent Data Structures</p>

# Lock-based concurrent data structures

<img src="img/lock_based_approach.png" alt="Lock-based concurrent data structures" width="80%" style="margin: 0 auto;">

## Blocking vs non-blocking

* Programs that use mutexes, condition variables, and futures are called **blocking**
* They call library functions that will suspend the execution of a thread until another thread performs an action
* Thread can’t progress past this point until block is removed
* Typically, OS will suspend a blocked thread completely
* Programs that don’t use blocking library calls are non-blocking
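As a rough illustration of the distinction, the sketch below puts a mutex-protected counter next to an atomic one. The class names and the `count_with_threads` helper are made up for this example, and it assumes `std::atomic<int>` is lock-free on the target platform, which is typical but not guaranteed by the standard.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Blocking: a thread that cannot take the mutex is suspended until the
// current owner unlocks it.
class BlockingCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> guard(mutex_);
        ++value_;
    }
    int get() {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

private:
    std::mutex mutex_;
    int value_ = 0;
};

// Non-blocking: no call here can suspend the calling thread
// (assuming std::atomic<int> is lock-free on this platform).
class NonBlockingCounter {
public:
    void increment() { value_.fetch_add(1, std::memory_order_relaxed); }
    int get() const { return value_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> value_{0};
};

// Hypothetical helper: runs `thread_count` threads that each increment the
// counter `increments_per_thread` times, then returns the final value.
template <typename Counter>
int count_with_threads(int thread_count, int increments_per_thread) {
    Counter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.increment();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.get();
}
```

Both counters give the same total; the difference is only in what happens to a thread that arrives while another thread is in the middle of an update.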

# Lock-free structures

* More than one thread can access the data structure concurrently
* If one thread is suspended by the scheduler midway through its operation, other threads must still be able to complete their operation
    * Caveat: compare/exchange loops
* A thread can still be subject to starvation
    * Consequences of “wrong” timing
    * One thread makes progress while another one continuously retries its operation
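The compare/exchange caveat can be seen in a small sketch. The function `atomic_store_max` below is a made-up example, not part of the standard library: it raises an atomic integer to at least `candidate` and reports how many times it had to retry.

```cpp
#include <atomic>

// Hypothetical helper: atomically replaces `target` with
// max(target, candidate) and returns the number of retries that were needed.
int atomic_store_max(std::atomic<int>& target, int candidate) {
    int retries = 0;
    int observed = target.load();
    // On failure, compare_exchange_weak writes the value it actually found
    // into `observed`, so each retry works with fresh data.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        ++retries;  // another thread changed `target`, or the failure was spurious
    }
    return retries;
}
```

Apart from spurious failures, the exchange fails only because some other thread changed `target`, so the system as a whole keeps making progress. Nothing bounds the number of retries for one particular thread, though, which is the starvation scenario described above.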

<img src="img/lock_based_vs_lock_free.png" alt="Lock-based vs lock-free data structures" width="80%" style="margin: 0 auto;">

## Wait-free and Lock-free definitions

**Wait-free:** A method is wait-free if it guarantees that every call to it finishes its execution in a finite number of steps. It is **bounded wait-free** if there is a bound on the number of steps a method call can take.

**Lock-free:** A method is lock-free if it guarantees that infinitely often some method call finishes in a finite number of steps. Clearly, any wait-free method is also lock-free, but not vice versa. Lock-free algorithms admit the possibility that some threads could starve.

## Advantages of lock-free data structures

* Performance – some thread makes progress with every step
    * In a wait-free DS every thread can make forward progress
* Robustness – if a thread dies during an operation, only its data is lost
    * If a thread dies while holding a lock, DS is broken forever
* Deadlocks impossible – although livelocks may occur

## Livelock

* Two threads concurrently try to change the DS
* Actions performed by one cause other to fail and vice versa
* Each thread has to continuously restart its operation
* Analogy – two people trying to go simultaneously through a narrow gap
    * They keep retrying until they agree on an order
* Livelocks are typically short-lived, like the scheduling conditions that cause them
    * Decreases performance rather than preventing progress entirely

## Disadvantages of lock-free data structures

* Hard to uphold invariants in the absence of mutexes
    * Avoiding data races requires atomic operations
    * Important to ensure that updates become visible in the correct order
* May improve concurrency but decrease overall performance
    * Atomic operations can be slower than non-atomic ones
    * Hardware must synchronize data between threads
    * Performance not necessarily portable

# Lock-free Stack

## Push Method

* The `compare_exchange_weak` loop ensures that the new node always links to the current value of `head`, so no node is lost when several threads push onto the stack at the same time
* Once a thread returns from `push`, its value has been added to the stack

<img src="img/thread_safe_lock_free_push.png" alt="Thread Safe Lock-Free Push" width="80%" style="margin: 0 auto;">
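For readers without access to the diagram, a push along these lines could look like the sketch below. It is not necessarily the code in the image: the class name `PushOnlyStack`, the `unsafe_size` method, and the `push_from_threads` helper are assumptions added so the example can be run on its own.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class PushOnlyStack {
    struct Node {
        T data;
        Node* next = nullptr;
        explicit Node(T value) : data(std::move(value)) {}
    };
    std::atomic<Node*> head_{nullptr};

public:
    PushOnlyStack() = default;
    PushOnlyStack(const PushOnlyStack&) = delete;
    PushOnlyStack& operator=(const PushOnlyStack&) = delete;

    // Assumes no other thread is using the stack any more.
    ~PushOnlyStack() {
        Node* node = head_.load();
        while (node != nullptr) {
            Node* next = node->next;
            delete node;
            node = next;
        }
    }

    void push(T value) {
        Node* new_node = new Node(std::move(value));
        new_node->next = head_.load();
        // If head_ changed in the meantime, compare_exchange_weak stores the
        // current head into new_node->next and the loop tries again.
        while (!head_.compare_exchange_weak(new_node->next, new_node)) {
        }
    }

    // Only meaningful while no thread is pushing.
    std::size_t unsafe_size() const {
        std::size_t count = 0;
        for (Node* node = head_.load(); node != nullptr; node = node->next) {
            ++count;
        }
        return count;
    }
};

// Hypothetical helper: several threads push concurrently, then the nodes
// are counted once all threads have finished.
std::size_t push_from_threads(int thread_count, int pushes_per_thread) {
    PushOnlyStack<int> stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&stack, t, pushes_per_thread] {
            for (int i = 0; i < pushes_per_thread; ++i) {
                stack.push(t * pushes_per_thread + i);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return stack.unsafe_size();
}
```

The loop body is empty on purpose: all the work happens inside `compare_exchange_weak`, which refreshes `new_node->next` whenever the exchange fails.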

## Pop Method

### Remarks on the code in the diagram `C04_stack_pop`

* The hard part of this code is to **avoid memory leaks**. C++ is not a garbage-collected language like C# or Java, so the stack itself must reclaim the nodes that are no longer needed after the pop.
* The pop itself happens only in the first lines, where the `while` loop retries until it removes the current `head`
* The remaining lines of this approach try to delete the leftover nodes once they are no longer in use. However, as mentioned in the diagram, this code shows a *wrong behavior* when a thread reads the head and resumes only after another thread has completed its whole operation.
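To separate the pop logic from the reclamation problem, the sketch below sidesteps reclamation entirely. This is a deliberately simplified technique and not the diagram's code: popped nodes are parked on a second list and freed only in the destructor, so no thread can ever touch a deleted node. The class name `DeferredReclaimStack` and the `retired_next` field are assumptions made for this example.

```cpp
#include <atomic>
#include <memory>
#include <utility>

template <typename T>
class DeferredReclaimStack {
    struct Node {
        std::shared_ptr<T> data;
        Node* next = nullptr;          // link inside the live stack
        Node* retired_next = nullptr;  // link inside the retired list
        explicit Node(T value) : data(std::make_shared<T>(std::move(value))) {}
    };
    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

    // Parks a popped node until the destructor runs.
    void retire(Node* node) {
        node->retired_next = retired_.load();
        while (!retired_.compare_exchange_weak(node->retired_next, node)) {
        }
    }

public:
    DeferredReclaimStack() = default;
    DeferredReclaimStack(const DeferredReclaimStack&) = delete;
    DeferredReclaimStack& operator=(const DeferredReclaimStack&) = delete;

    // Assumes no other thread is using the stack any more.
    ~DeferredReclaimStack() {
        for (Node* node = head_.load(); node != nullptr;) {
            Node* next = node->next;
            delete node;
            node = next;
        }
        for (Node* node = retired_.load(); node != nullptr;) {
            Node* next = node->retired_next;
            delete node;
            node = next;
        }
    }

    void push(T value) {
        Node* new_node = new Node(std::move(value));
        new_node->next = head_.load();
        while (!head_.compare_exchange_weak(new_node->next, new_node)) {
        }
    }

    // Returns an empty pointer when the stack is empty.
    std::shared_ptr<T> pop() {
        Node* old_head = head_.load();
        // The actual pop: retry until head_ is swung from old_head to its successor.
        while (old_head != nullptr &&
               !head_.compare_exchange_weak(old_head, old_head->next)) {
        }
        if (old_head == nullptr) {
            return nullptr;
        }
        std::shared_ptr<T> result;
        result.swap(old_head->data);
        // Deleting old_head here would be unsafe: another thread may have
        // loaded the same pointer and be about to read old_head->next.
        retire(old_head);
        return result;
    }
};
```

The price of this shortcut is that memory use grows with every pop for as long as the stack lives, which is why the approaches discussed next try to delete nodes as soon as no thread can still be using them.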

### Discussion of the `pop()` with hazard pointers

* This version changes one thing compared with the previous one: it uses hazard pointers to verify that no other thread is still using a node before that node is deleted.
* Hazard pointers store a list of the nodes in use

Although this simple implementation does indeed safely reclaim the deleted nodes, it adds quite a bit of overhead to the process. Scanning the hazard pointer array requires checking `max_hazard_pointers` atomic variables, and this is done for every `pop()` call. Atomic operations are inherently slow (often 100 times slower than an equivalent nonatomic operation on desktop CPUs), so this makes `pop()` an expensive operation. Not only do you scan the hazard pointer list for the node you’re about to remove, but you also scan it for each node in the waiting list. Clearly this is a bad idea. There may well be `max_hazard_pointers` nodes in the list, and you’re checking all of them against `max_hazard_pointers` stored hazard pointers. Ouch! There has to be a better way.
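The cost described here comes from the linear scan over the hazard pointer table. A minimal sketch of such a table is shown below; only the name `max_hazard_pointers` comes from the text, while the value 100, the `HazardPointerSlot` and `HazardPointerOwner` types, and the function `outstanding_hazard_pointers_for` are assumptions for illustration.

```cpp
#include <atomic>
#include <stdexcept>

constexpr unsigned max_hazard_pointers = 100;  // assumed table size

struct HazardPointerSlot {
    std::atomic<bool> claimed{false};     // is some thread using this slot?
    std::atomic<void*> pointer{nullptr};  // node that thread is currently accessing
};

HazardPointerSlot hazard_pointers[max_hazard_pointers];

// Claims one slot on construction and gives it back on destruction.
class HazardPointerOwner {
    HazardPointerSlot* slot_ = nullptr;

public:
    HazardPointerOwner() {
        for (auto& slot : hazard_pointers) {
            bool expected = false;
            if (slot.claimed.compare_exchange_strong(expected, true)) {
                slot_ = &slot;
                return;
            }
        }
        throw std::runtime_error("no hazard pointer slot available");
    }
    HazardPointerOwner(const HazardPointerOwner&) = delete;
    HazardPointerOwner& operator=(const HazardPointerOwner&) = delete;

    ~HazardPointerOwner() {
        slot_->pointer.store(nullptr);
        slot_->claimed.store(false);
    }

    std::atomic<void*>& pointer() { return slot_->pointer; }
};

// The scan the text refers to: one atomic load per slot, for every query.
bool outstanding_hazard_pointers_for(void* node) {
    for (auto& slot : hazard_pointers) {
        if (slot.pointer.load() == node) {
            return true;
        }
    }
    return false;
}
```

In a `pop()` built on this table, a thread would store the head it is about to dereference in its own slot, and a thread that wants to delete a node would call `outstanding_hazard_pointers_for` first and postpone the deletion when it returns `true`.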

### Reference Counting

* Hazard pointers store a list of the nodes in use
* Reference counters store a count of the number of threads accessing each node
    * Idea similar to `std::shared_ptr<>`
* Why not just use `std::shared_ptr<>`?
    * Not guaranteed to be lock free
    * A lock-free implementation would impose overheads in many use case scenarios for which lock freedom is not needed
    * If `std::shared_ptr<>` was lock-free, problem would be easily solved
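Whether atomic operations on a `std::shared_ptr<>` are lock-free can be queried at run time. The sketch below uses the free function `std::atomic_is_lock_free` overload for `std::shared_ptr` from `<memory>`, which is available since C++11 and deprecated since C++20; the wrapper name `shared_ptr_atomics_are_lock_free` is made up for this example.

```cpp
#include <atomic>
#include <memory>

// Hypothetical wrapper: reports whether atomic access to this shared_ptr
// is implemented without locks by the standard library in use.
bool shared_ptr_atomics_are_lock_free() {
    std::shared_ptr<int> pointer = std::make_shared<int>(1);
    return std::atomic_is_lock_free(&pointer);
}
```

The result depends on the standard library implementation, so portable code cannot rely on it being `true`.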


### Reference Counting with split counters

* External count kept alongside the pointer
    * Increased every time the pointer is read
* Internal count kept alongside the node
    * Decreased when reader is finished with the node
* Sum equal to total number of references
* A simple operation that reads the pointer and then finishes with the node leaves the
    * External counter increased by one
    * Internal counter decreased by one
* When the external count/pointer pairing is no longer needed (i.e., node no longer accessible from a location accessible to multiple threads)
    * The value of the external count is added to the internal count
    * The internal count is decreased by one
    * The external count is discarded
* Once the internal count is zero, there are no outstanding references and the node can be deleted
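The bookkeeping above can be followed in a simplified, single-threaded model. This is only a sketch of the arithmetic, not a lock-free stack: in a real implementation the external count and the pointer would be updated together with a compare/exchange. The names `CountedNodePtr`, `acquire`, `release`, and `retire` are assumptions, and the external count is assumed to start at 1 for the reference held by the shared location itself.

```cpp
#include <atomic>

struct Node {
    int value;
    std::atomic<int> internal_count{0};  // kept alongside the node
    explicit Node(int v) : value(v) {}
};

struct CountedNodePtr {
    int external_count;  // kept alongside the pointer
    Node* ptr;
};

// A reader reads the pointer: the external count goes up by one.
void acquire(CountedNodePtr& link) { ++link.external_count; }

// A reader is finished with the node: the internal count goes down by one.
// Returns true when this was the last outstanding reference.
bool release(Node* node) { return node->internal_count.fetch_sub(1) == 1; }

// The pointer/count pairing is no longer needed: fold the external count,
// minus the pairing's own reference, into the internal count.
// Returns true when no reader is left and the node can be deleted.
bool retire(CountedNodePtr link) {
    const int transfer = link.external_count - 1;
    return link.ptr->internal_count.fetch_add(transfer) == -transfer;
}
```

For example, with two readers where one finishes before the pairing is retired, the counts move as follows: the external count goes 1, 2, 3; the first `release` leaves the internal count at -1; `retire` adds 2 and leaves it at 1; the second `release` brings it to 0, and that call is the one that reports the node as deletable.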