<div style="float:right;border-left:1em solid transparent">
    <i>Notebooks on Programming</i>
</div>

---
# Preface
### [Emil Sekerinski](http://www.cas.mcmaster.ca/~emil), McMaster University, November 2024

---

<figure style="float:right" >
    <img style="width:9em" src="./img/by-nc-nd.png"/>
    <figcaption style="width:13em;font-size:x-small"><a href="https://creativecommons.org/licenses/by-nc-nd/4.0/" style="font-size:x-small">Licensed under Creative Commons CC BY-NC-ND</a>
    </figcaption>
</figure>

## 1. Examples of Concurrency


<figure style="width:18em;float:right;border-left:1em solid transparent">

Table of Contents of Operating System Concepts by J. L. Peterson, A. Silberschatz, Addison-Wesley, 1983:
  1. Introduction
  2. Operating System Services
  3. File Systems
  4. CPU Scheduling
  5. Memory Management
  6. Virtual Memory
  7. Disk and Drum Scheduling
  8. Deadlocks
  9. Concurrent Processes
  10. Concurrent Programming
  11. Protections
  12. Design Principles
  13. Distributed Systems
  14. Historical Perspective

</figure>

### Example 1: Operating Systems

<figure style="width:14em;float:right;border-left:1em solid transparent">
    <img alt="Cover of above book illustrated with dinosaurs" src="./img/OperatingSystemConcepts.jpg"/>
    <figcaption style="font-size:x-small">Credit: <a href="https://codex.cs.yale.edu/avi/os-book/OS10/covers-dir/index.html" style="font-size:x-small">https://codex.cs.yale.edu/avi/os-book/OS10/covers-dir/index.html</a>
    </figcaption>
</figure>

Historically, operating systems supported *multiprogramming* with a single *processor*:

- The processor switches among programs (*processes*) after a short *time slice*. 
- When one program is *blocked* on I/O, the processor also switches to another program.

This provides the illusion that programs run in parallel. Programs may not interact directly, but they need to _synchronize_ access to common resources like file storage, printers, or the network. Thus, concurrency and synchronization became part of books on operating systems, and they still are.



<figure style="width:40%;float:right;border-left:10px solid transparent">
    <img alt = "Main Thread containing Looper and Message Queue. Looper connected to Local Service Call, Broadcast Receiver, and Activity. Message Queue connected to UI Events and System Events" src="./img/android-threads.png"/>
    <figcaption style="font-size:x-small">Credit: <a href="https://github.com/codepath/android_guides/wiki/Managing-Threads-and-Custom-Services" style="font-size:x-small">Android Guides</a>
    </figcaption>
</figure>

### Example 2: Interactive Programs

From [Android Guides](https://github.com/codepath/android_guides/wiki/Managing-Threads-and-Custom-Services):
> The main thread ... is in charge of dispatching events and rendering the user interface and is usually called the _UI thread_. All components (activities, services, etc) ... run in the same process and are instantiated by default in the UI thread.
>
> ... performing long operations such as network access or database queries in the UI thread will block the entire app UI from responding. When the UI thread is blocked, no events can be dispatched, including drawing events. From the user's perspective, the application will appear to freeze. 
>
> Additionally, ... the Android UI toolkit is not thread-safe and as such you must not manipulate your UI from a background thread ... two rules:
>
> - Do not run long tasks on the main thread (to avoid blocking the UI)
> - Do not change the UI at all from a background thread (only the main thread)

Programs under [Windows](https://msdn.microsoft.com/en-us/library/ff649143.aspx) have a similar structure.
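
The two rules can be sketched in Python, independently of Android. In this minimal sketch, the main thread drains a message queue (a stand-in for the looper), while a background thread performs the long task and posts the UI update back to the main thread instead of performing it itself. The names `messages`, `ui_log`, `long_task`, and `main_loop` are hypothetical and not part of any Android or Windows API.

```python
import queue
import threading

messages = queue.Queue()   # message queue of the main ("UI") thread
ui_log = []                # stands in for the UI; only the main thread touches it

def long_task():
    # runs on a background thread, so the main loop stays responsive
    result = sum(range(1_000_000))
    # rule 2: do not change the UI here; post the update to the main thread
    messages.put(lambda: ui_log.append(f"result: {result}"))
    messages.put(None)     # sentinel: ask the main loop to stop

def main_loop():
    # rule 1: the main thread only dispatches short events
    while True:
        event = messages.get()
        if event is None:
            break
        event()

worker = threading.Thread(target=long_task)
worker.start()
main_loop()
worker.join()
print(ui_log)
```

The only object shared between the two threads is the queue, which is thread-safe; the list `ui_log` is modified exclusively by the main thread.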

<figure style="width:65%;float:right;border-left:1em solid transparent">
    <img alt = "Rectangle for Pizza Customer and Pizza Vendor processes. Pizza Customer has states Hungry for pizza, Select a pizza, Order a pizza, after 60 min ask for pizza, pizza received and pay the pizza, eat the pizza, and hunger satisfied. Pizza Vendor has a lane for delivery boy with states Deliver the pizza, Receive payment, a lane for pizza chef with state Bake the pizza, and a lane for clerk with states Order received, call 'where is the pizza' and Calm Customers. States are connected by synchronization arrows" src="./img/bpmn-pizza.png"/>
    <figcaption style="font-size:x-small">Credit: <a href="https://github.com/bpmn-io/bpmn-js-examples/tree/master/colors" style="font-size:x-small">bpmn.io</a>
    </figcaption>
</figure>

### Example 3: Requirements Analysis

The [Business Process Modelling Notation](http://www.bpmn.org/) (BPMN) serves to describe the interactions between agents in business processes. BPMN diagrams are similar to flowcharts, but they are not meant to be executed; they only serve to document the setting for which software is to be developed. (BPMN elements are also supported in [draw.io](draw.io) and Visio.)

Other requirements analysis techniques are textual. Being able to express concurrency is essential as the surrounding world is concurrent.

<figure style="width:40%;float:right;border-left:3em solid transparent">
    <img alt= "Camera with view rays going through a raster image and some reflecting on a scene object as shadow rays to a light source" src="./img/Ray_trace_diagram.png"/>
    <figcaption style="font-size:x-small">Credit:
        <a href="https://en.wikipedia.org/wiki/Ray_tracing_(graphics)" style="font-size:x-small">Wikipedia</a>
    </figcaption>
</figure>

### Example 4: Parallel Computing

Ray tracing is a technique to render raster images of a three-dimensional scene, e.g. in animations. For all image pixels, rays are traced "backward" from the camera (eye) until they hit an object in the scene. From there, the reflecting and refracting rays are traced until they hit a light source. This process is recursively repeated as objects may be in the shadow of other objects. The colour and intensity of a pixel are then calculated based on the reflecting and refracting properties of the traced objects and the light source's colour. The run-time for each pixel is proportional to the number of light sources, proportional to the number of rays spawned by every object hit, and exponential in the depth of the recursion.

As light rays do not influence each other, all pixels of a frame and all frames of an animation can be computed in parallel. With 4K resolution (3840 × 2160 pixels) and 24 frames per second, a 110 min animation needs 1.3 × 10¹² pixels to be computed. For all rays needed for one pixel, the intersection of the ray with any object needs to be determined.
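
The pixel count above can be checked with a few lines of Python. This is only a restatement of the arithmetic in the text; the variable names are chosen for illustration.

```python
width, height = 3840, 2160      # 4K resolution
frames_per_second = 24
minutes = 110

pixels_per_frame = width * height
frames = frames_per_second * 60 * minutes
total_pixels = pixels_per_frame * frames
print(f"{total_pixels:.2e}")    # about 1.3 × 10¹² pixels
```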

<figure style="width:35%;float:right;border-left:1em solid transparent">
    <img alt="bar diagram with number of processors: 1000 for Bugs, 2500 for Incredibles, 3000 for Cars, 5000 for Rat, 3000 for Wall-E, 4000 for Up, 4500 for TS3, 12000 for Cars, 10000 for Brave, 20000 for Monsters University, 20000 for Inside Out" src="./img/pixar-processors.jpg"/>
    <figcaption style="font-size:x-small">Credit:
        <a href="http://fortune.com/2015/09/14/pixar-brings-movies-life/" style="font-size:x-small">Fortune, Sept 2015</a>
    </figcaption>
</figure>

<a href="http://fortune.com/2015/09/14/pixar-brings-movies-life/">How Pixar brings its animated movies to life, Fortune, Sept 2015</a>:

> It took around 3,000 processors to render the movies The Incredibles and Cars, two films from the mid 2000s. For more recent films like Monsters University and Inside Out, that number has soared to around 20,000 processors.
>
> All that extra processing power is noticeable when you study the hairs of the animated creature, Sulley, ... . When the original Monsters movie first appeared in 2001, Sulley had 1.1 million hairs covering his body. By Monsters University, released in 2013, Sulley had 5.5 million individual hairs.

<figure style="width:30%;float:right;border-left:1em solid transparent">
    <img alt="Post Office connected to 4 Counters and 1 Door" src="./img/PostOffice.jpg"/>
    <img alt="Counter connected to Queue and Clerk" src="./img/Counter.jpg"/>
    <img alt="3 Customers connect by next pointers, Queue pointing to first and last Customer" src="./img/Queue.jpg"/>
    <figcaption style="font-size:x-small">
        Credit: Birtwhistle, Dahl, Myhrhaug, Nygaard. <i style="font-size:x-small">Simula Begin</i>, 1979.
    </figcaption>
</figure>

### Example 5: Software Design

Object-oriented design allows the structure of the program, the class hierarchy, to reflect the structure of the problem domain: _a description of the problem becomes part of the solution_.

Simula-67, the first object-oriented language, supported _coroutines_ that allowed objects to be concurrent, particularly for simulations. Coroutines are scheduled *cooperatively*, meaning that transfer of control is explicit, unlike preemptively scheduled threads. (Coroutines are called [fibres](https://msdn.microsoft.com/en-us/library/windows/desktop/ms682661.aspx) in Windows and are related to goroutines in Go; subsequent object-oriented languages, notably Smalltalk-80 and later C++, did not follow Simula in that respect.)
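
Cooperative scheduling can be sketched with Python generators, which suspend themselves at `yield`. This is an illustration of the idea, not of Simula's actual mechanism: in Simula, coroutines transfer control to each other directly, whereas here a small round-robin scheduler (the hypothetical function `run`) resumes them in turn. The names `customer` and `log` are likewise assumptions.

```python
from collections import deque

def customer(name, steps, log):
    # a coroutine: runs until it explicitly gives up control at `yield`
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield            # explicit transfer of control back to the scheduler

def run(coroutines):
    # round-robin scheduler: resumes each coroutine in turn until all finish
    ready = deque(coroutines)
    while ready:
        c = ready.popleft()
        try:
            next(c)
            ready.append(c)
        except StopIteration:
            pass         # coroutine terminated

log = []
run([customer("A", 2, log), customer("B", 3, log)])
print(log)
```

Because control changes hands only at `yield`, the order of the log entries is the same on every run, unlike with preemptively scheduled threads.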

The overview to the right is for a simulation program for a post office: in principle, customers and clerks are all concurrent.

<figure style="width:60%;float:left;border-right:1em solid transparent">
    <img alt="Citizen quartz multi-alarm with state dead and large concurrent state with lanes for main, alarm1-status, alarm2-status, chime-status, light, power, each with substates and transitions between all states." src="./img/statechart-alarm.png"/>
    <figcaption style="font-size:x-small">
        Credit: Harel, <a href="https://doi.org/10.1016/0167-6423(87)90035-9" style="font-size:x-small">Statecharts: a visual formalism for complex systems.</a> <i style="font-size:x-small">Science of Computer Programming</i>, June 1987.
    </figcaption>
</figure>

Embedded systems have to react to environmental events. As these can be independent, a natural structure is to have concurrent processes react to different kinds of events. To the left is a *statechart* for an alarm clock (statecharts are related to UML state machines). For example, light and alarm status are independent and expressed as *concurrent states*, visually separated by dashed lines.

### Example 6: Server Architecture

<figure style="width:60%;float:right;border-left:2em solid transparent;">
    <img alt="4 Presentation/Web Servers connect to Switch, Switch connected to 3 Application Servers, those connected to 2nd Switch, that connected to 2 Database Servers, those connected to 3 Storage units" src="./img/ThreeTierServerArchitecture.jpg"/>
    <figcaption style="font-size:x-small">
        Credit: <a href="https://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back-issues/table-contents-46/124-cloud2.html" style="font-size:x-small">Cloud Computing - A Primer, Cisco.</a>
    </figcaption>
</figure>

An early server architecture is the *three-tier architecture*: presentation, application, and database servers are all concurrent. They may run on the same computer or on different computers.

The architecture does not scale well beyond a certain size; why? Data centers for cloud computing connect servers differently, with a "fat tree."

<figure style="width:50%;float:right;border-left:1em solid transparent">
    <a href="https://commons.wikimedia.org/w/index.php?curid=2666326"><img alt="overlapping piconet domains containing each one master device (1 piconet), possibly a master-slave device (1-N piconets), and several slave devices (N piconets) with Bluetooth links among some within a domain and connecting each domain to a neighbouring domain" src="./img/Bluetooth_network_topology.png"/></a>
    <figcaption style="font-size:x-small">
        Credit: <a href="https://commons.wikimedia.org/wiki/User:Rob_Blanco" style="font-size:x-small">Rob Blanco</a>, <a href="https://commons.wikimedia.org/wiki/File:Bluetooth network topology.png" style="font-size:x-small">Bluetooth network topology</a>, <a href="https://creativecommons.org/licenses/by-sa/2.5/es/deed.en" rel="license" style="font-size:x-small">CC BY-SA 2.5 ES</a>
    </figcaption>
</figure>

### Example 7: Protocols

Bluetooth is a protocol for personal area networks (PAN) with a *mesh topology*: devices form *piconets*; devices have different roles, which change as devices join and leave the network. All devices are concurrent and must manage communication, including forwarding messages to the right recipient.

### Common Themes
- Competition for shared resources, e.g. database, counter
- Communication between processes, e.g. between network devices
- Synchronization of processes, e.g. between OS services
- Fairness among processes, e.g. among client requests, network packets
- Hierarchy of processes, e.g. in UIs

## 2. Why is Concurrent Programming Hard?

Processes execute in a _sequence of steps_. Concurrent execution leads to _interleaving_ of steps. For example, the _parallel (concurrent) composition_ <span style="color:darkgreen">A</span>&nbsp;‖&nbsp;<span style="color:darkorange">B</span> of processes <span style="color:darkgreen">A</span> and <span style="color:darkorange">B</span> may result in:

|        |               |
| :----- | :------------ |
| <span style="color:darkgreen">A</span>      | <span style="color:darkgreen;font-size:120%"> ➀ ➁ ➂ ➃ ➄ ➅</span><br> |
| <span style="color:darkorange;font-size:120%">B</span>      | <span style="color:darkorange;font-size:120%"> ➀ ➁ ➂ ➃ ➄ ➅</span>  |
| <span style="color:darkgreen;font-size:120%">A</span> ‖ <span style="color:darkorange;font-size:120%">B</span>  | <span style="color:darkgreen;font-size:120%"> ➀ </span><span style="color:darkorange;font-size:120%"> ➀ ➁ ➂</span><span style="color:darkgreen;font-size:120%"> ➁</span><span style="color:darkorange;font-size:120%"> ➃</span><span style="color:darkgreen;font-size:120%"> ➂ ➃ ➄</span><span style="color:darkorange;font-size:120%"> ➄</span><span style="color:darkgreen;font-size:120%"> ➅</span><span style="color:darkorange;font-size:120%"> ➅</span>|

Interleaving implies *nondeterminism*: different executions may lead to different interleavings. Interleavings may cause *data races*.

Suppose processes `inc1` and `inc2` intend to increment `t`, a variable stored in memory, by `1` and `2`, respectively. For this, a processor first has to load the variable into a register. With `ri` being a register, the steps become:
```
    inc1:                    inc2:
        r1 := t                  r2 := t
        r1 := r1 + 1             r2 := r2 + 2
        t := r1                  t := r2
```
For example, `t` could represent the number of sold tickets.

**Question.** Which amount does `inc1 ‖ inc2` add to `t` when run concurrently?

_Answer._ Either `1`, `2`, or `3`. A possible interleaving for `t` to be incremented by only `1` is:

```
r1 := t ; r2 := t ; r2 := r2 + 2 ; t := r2 ; r1 := r1 + 1 ; t := r1
```
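
The answer can be confirmed by enumerating all interleavings. The following sketch simulates the three steps of `inc1` and `inc2` for every possible schedule and records the amount added to `t`. The simulation (function `run`, the dictionaries `mem`, `reg`, `pc`) is an assumption made for illustration; it treats each of the three steps as atomic, as the text does.

```python
from itertools import combinations

def run(schedule, t0=0):
    # schedule: sequence of process ids (1 or 2), one entry per step
    mem = {"t": t0}
    reg = {1: 0, 2: 0}         # registers r1, r2
    inc = {1: 1, 2: 2}         # amount each process intends to add
    pc = {1: 0, 2: 0}          # index of the next step of each process
    for p in schedule:
        if pc[p] == 0:
            reg[p] = mem["t"]          # r := t
        elif pc[p] == 1:
            reg[p] = reg[p] + inc[p]   # r := r + inc
        else:
            mem["t"] = reg[p]          # t := r
        pc[p] += 1
    return mem["t"] - t0

results = {}
for pos in combinations(range(6), 3):  # positions of inc1's three steps
    schedule = [1 if i in pos else 2 for i in range(6)]
    results.setdefault(run(schedule), []).append(schedule)

for amount in sorted(results):
    print(amount, len(results[amount]))
```

Of the 20 interleavings, only the two in which one process finishes before the other starts add `3`; in all others, one update is lost.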

_Locking_ a variable (or any resource) gives exclusive access to that variable:

<div style="float:left;border-left:2em solid transparent">

```
P:
    lock x and y  
    x := x + 1  
    y := y - 1
    unlock x and y
```
</div>
<div style="border-left:2em solid transparent">

```
Q:
    lock x and y  
    x := x + 2  
    y := y - 2
    unlock x and y  
```
</div><br>

Variables `x` and `y` could be two bank accounts or the number of sold and available concert tickets.

**Question.** What could happen in `P ‖ Q` if `P` locks `x`, `y` and `Q` locks `y`, `x` in that order?

_Answer._  
If `P` locks `x` and then `Q` locks `y`, a _deadlock_ occurs: neither can proceed.
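
One common remedy, which goes beyond what is stated above, is to have all processes acquire locks in the same global order. The sketch below assumes that order to be `x` before `y` and uses Python's `threading.Lock`; the names `lock_x`, `lock_y`, and `transfer` are hypothetical.

```python
import threading

lock_x, lock_y = threading.Lock(), threading.Lock()
x, y = 0, 0

def transfer(amount, repetitions):
    global x, y
    for _ in range(repetitions):
        # both P and Q lock x first and y second, so no cycle of waiting can form
        with lock_x:
            with lock_y:
                x = x + amount
                y = y - amount

p = threading.Thread(target=transfer, args=(1, 10_000))   # plays the role of P
q = threading.Thread(target=transfer, args=(2, 10_000))   # plays the role of Q
p.start(); q.start()
p.join(); q.join()
print(x, y)
```

If `Q` instead acquired `lock_y` before `lock_x`, the program could hang in exactly the way described in the answer.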

- Programs with data races or incorrect synchronization may compute wrong results, deadlock, livelock (infinite loop), or abort.
- Because of inherent nondeterminism, concurrent programs cannot be tested effectively.

<figure style="width:30%;float:right;border-left:2em solid transparent">
    <img alt="6-wheeled Mars Pathfinder on red soil" src="./img/pathfinder-concept.jpg"/>
    <figcaption style="font-size:x-small">
        Credit: <a href="https://www.nasa.gov/mission_pages/pathfinder/overview" style="font-size:x-small">NASA</a>
    </figcaption>
</figure>

### Example 1: NASA Mars Pathfinder

In July 1997, Pathfinder landed on Mars.

After a while Pathfinder stopped sending data and reset itself continuously.

After 18 hours the failure was reproduced in a lab replica: *priority inversion*, a form of *starvation*.

The system had a "watchdog" that discovered the situation and did a reset, and a reset, and a reset, …

The engineers managed to transmit code to Mars and execute it, to update the software.  Testing during development did not reveal the error.

[Authoritative account](http://web.archive.org/web/20161230103247/http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html "research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html"), [Wikipedia on Priority Inversion](https://en.wikipedia.org/wiki/Priority_inversion "https://en.wikipedia.org/wiki/Priority_inversion")

<figure style="width:40%;float:right;border-left:2em solid transparent;">
    <img alt="bar diagram for number of errors with 4 groups of bars, 'other' and 'arch/i386' with almost no errors, 'net' with some Block errors, 'fs' with some Null errors, 'drivers' with large number of Block and Null errors" src="./img/LinuxErrors.png"/>
    <figcaption style="font-size:x-small">
        Credit: Chou et al., <a href="https://pdos.csail.mit.edu/archive/6.097/readings/osbugs.pdf" style="font-size:x-small">An Empirical Study of Operating Systems Errors</a>, 18th ACM Symposium on Operating systems principles, Oct 2001
    </figcaption>
</figure>

### Example 2: Device Driver

Device drivers typically run in the operating system kernel and interface to mice, keyboards, drives, and other devices. As they run in the kernel, a faulty device driver can cause the whole operating system to crash.

- Around 2000, Windows shipped with 500 device drivers, most of them provided by device vendors. Reportedly, 80% of Windows crashes were traced back to faulty device drivers; concurrency errors (incorrect locking and releasing of resources etc.) were the most frequent.
- In the Linux 2.4.1 distribution, according to a study from Stanford University, device drivers have 7 times more errors than the rest of the operating system. Among those, concurrency errors are the most frequent: "Block" and "Lock" in the figure to the right.

### Example 3: Northeast American Power Blackout, 14 August 2003

Wikipedia: [World's second most widespread blackout in history:](https://en.wikipedia.org/wiki/Northeast_blackout_of_2003)
- 12:15 p.m. Incorrect power flow telemetry in Ohio detected, but not properly corrected.
- 1:31 p.m. Eastlake, Ohio generating plant shuts down.
- 2:02 p.m. First 345 kV line in Ohio fails due to contact with a tree.
- _2:14 p.m. An alarm system fails at FirstEnergy's control room._
- 2:27 p.m. A second 345 kV line fails due to a tree.
- …
- 4:10 p.m. Ontario separates from the western New York grid.
- 4:11 p.m. The Keith-Waterman, Bunce Creek-Scott 230 kV lines and the St. Clair–Lambton \#1 230 kV line and \#2 345 kV line between Michigan and Ontario fail.
- 4:12 p.m. Windsor, Ontario, and surrounding areas drop off the grid.
- 4:12 p.m. Northern New Jersey separates its power-grids from New York and the Philadelphia area, causing a cascade of failing secondary generator plants along the New Jersey coast and throughout the inland regions west.
- 4:13 p.m. End of cascading failure. 256 power plants are off-line, 85% of which went offline after the grid separations occurred, most due to the action of automatic protective controls.

In total, 10 million people in Ontario and 45 million people in eight U.S. states were left without power.

Task Force Report:
>	… a software bug in General Electric Energy's Unix-based XA/21 energy management system that prevented alarms from showing on their control system. _This alarm system stalled because of a race condition_. After the alarm system failed silently without being noticed by the operators, unprocessed events (that had to be checked for an alarm) started to queue up and the primary server failed within 30 minutes.

J.R. Minkel, [The 2003 Northeast Blackout--Five Years Later](https://www.scientificamerican.com/article/2003-blackout-five-years-later/), Scientific American, August 2008:
> The event contributed to at least 11 deaths and cost an estimated $6 billion.

### Example 4: Therac-25

Nancy Leveson and Clark Turner, [An Investigation of the Therac-25 Accidents](https://doi.org/10.1109/MC.1993.274940), IEEE Computer, July 1993:

> Some of the most widely cited software-related accidents in safety-critical systems involved a computerized radiation therapy machine called the Therac-25. Between June 1985 and January 1987, six known accidents involved massive overdoses by the Therac-25 with resultant deaths and serious injuries. They have been described as the worst series of radiation accidents in the 35-year history of medical accelerators.
>
> ...
>
> It is clear from the AECL [Atomic Energy of Canada Limited] documentation on the modifications that the software allows concurrent access to shared memory, that there is no real synchronization aside from data stored in shared variables, and that the "test" and "set" for such variables are not indivisible operations. _Race conditions resulting from this implementation of multitasking played an important part in the accidents._
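
What "indivisible" means for "test" and "set" can be sketched as follows. The class `Flag` and the function `contend` are hypothetical and unrelated to the actual Therac-25 software; the sketch only shows a lock turning the read of the old value and the write of the new value into a single atomic step.

```python
import threading

class Flag:
    def __init__(self):
        self._value = False
        self._lock = threading.Lock()

    def test_and_set(self):
        # the lock makes reading the old value and writing True one indivisible step
        with self._lock:
            old = self._value
            self._value = True
            return old

flag = Flag()
winners = []

def contend(i):
    if not flag.test_and_set():   # only the first caller sees False
        winners.append(i)

threads = [threading.Thread(target=contend, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners))
```

Without the lock, two threads could both read `False` before either writes `True`, and both would believe they had acquired the flag.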

## 3. Why is Concurrent Programming Getting More Prevalent?

<figure style="width:60%;float:right;border-left:2em solid transparent;">
    <img alt="bar graph for number of devices from 2018 to 2023 showing linear increase from 17 to 29 billion devices" src="./img/DeviceConnectionGrowth.webp"/>
    <figcaption style="font-size:x-small">
        Credit: <a href="https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html" style="font-size:x-small">Cisco Global Mobile Data Traffic Forecast</a>, March 2020
    </figcaption>
</figure>

### 1. Increase in Internet Traffic (if it needs to be said)

[Cisco Global Mobile Data Traffic Forecast](https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html), March 2020:

The compound annual growth rate (CAGR) of devices and connections is predicted to be 10%, with machine-to-machine connections (IoT devices) growing most.

### 2. Increase in Number of Processor Cores
<figure style="width:50%;float:right;border-left:2em solid transparent;">
    <img alt="chart with processor characteristics from 1970 to 2020, showing exponential increase of transistors from 10 to 10^7, flattening increase of single-thread performance, flattening and slightly decreasing frequency at 10^3.5, increasing and then constant power consumption at 10^2, constant 1 number of cores and increasing exponentially since 2005" src="./img/42-years-processor-trend.png"/>
    <figcaption style="font-size:x-small">
        Credit: Karl Rupp, <a href="https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/" style="font-size:x-small">www.karlrupp.net</a>, Feb 2018
    </figcaption>
</figure>

_Processor frequency_ is no longer increasing due to the power wall. For CMOS circuits:

```
Power = Capacitive load × Voltage² × Frequency
```
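
As a worked example of the formula, suppose voltage and frequency are both lowered by 15%. The 15% figure and the unit baseline values are assumptions chosen only for illustration.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    # Power = Capacitive load × Voltage² × Frequency
    return capacitive_load * voltage ** 2 * frequency

# hypothetical baseline values in arbitrary units
base = dynamic_power(1.0, 1.0, 1.0)
scaled = dynamic_power(1.0, 0.85, 0.85)   # voltage and frequency both reduced by 15%
ratio = scaled / base
print(round(ratio, 3))
```

Under these assumptions, power drops to about 61% of the baseline. Read in the other direction, the same relation suggests why pushing frequency further up quickly becomes too costly in power.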

_Single-thread performance_ is no longer increasing: the benefits of caching, pipelining, etc. are maxed out.

_Number of transistors_ per processor is still increasing: linearly on a logarithmic scale, i.e. exponentially, doubling roughly every two years as predicted by Gordon Moore.

_This allows for more cores_.

Further reading: Peter Denning and Ted Lewis, [_Exponential Laws of Computing Growth_](https://cacm.acm.org/magazines/2017/1/211094-exponential-laws-of-computing-growth/fulltext), Communications of the ACM, Jan 2017

## 4. What Can We Do About It?

### Libraries

Libraries hide some of the complexity for specific applications, e.g.
- efficient implementation of *data structures*: [Java](http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
- *parallel computing*: [MPI (Message Passing Interface)](https://en.wikipedia.org/wiki/Message_Passing_Interface), [OpenMP](http://www.openmp.org/)
- *distributed computing*: [Akka](http://akka.io/) for reliability and load balancing

These are useful in practice, but limited to the intended applications.

### Verification Tools

Microsoft's [VCC](https://github.com/Microsoft/vcc) is one such tool:

> VCC is a mechanical verifier for concurrent C programs. VCC takes a C program, *annotated with function specifications, data invariants, loop invariants, and ghost code*, and tries to prove these annotations correct. If it succeeds, VCC promises that your program actually meets its specifications.

Its main use is for Microsoft's [Hyper-V](https://www.microsoft.com/en-us/research/project/vcc-a-verifier-for-concurrent-c/) hypervisor.

Similar tools for Java and other languages exist. They are mainly used for highly critical software, even though these tools dramatically reduce the time for testing and can even reduce development and maintenance effort.

### Static Analysis Tools

Unlike verification tools, static analysis tools find errors with no or little annotation. However, they can't find all errors (*incomplete*) and may produce false warnings (*unsound*).

For detecting concurrency errors in Windows:
- Device drivers have to pass the [Static Driver Verifier](https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/introducing-static-driver-verifier) (called so because there is also a dynamic verifier that detects driver errors at run-time), a static analysis tool that emerged from the [SLAM](https://www.microsoft.com/en-us/research/project/slam/) research project.
- Visual Studio for C/C++ includes a tool for [Code Analysis](https://docs.microsoft.com/en-us/visualstudio/code-quality/analyzing-c-cpp-code-quality-by-using-code-analysis), which reports [numerous errors](https://docs.microsoft.com/en-us/visualstudio/code-quality/mixed-recommended-rules-rule-set): the concurrency errors start at C26100.

For detecting concurrency errors in Java:
- [NASA Java PathFinder](https://en.wikipedia.org/wiki/Java_Pathfinder): free, large, extensive checking
- [ThreadSafe](https://en.wikipedia.org/wiki/ThreadSafe): newer, commercial, specifically for concurrency errors
- [FindBugs](http://findbugs.sourceforge.net/): open source, several categories of errors: [results from some applications](http://findbugs.sourceforge.net/demo.html)
- [IBM Concurrency Benchmark](http://researcher.watson.ibm.com/researcher/view_person_subpage.php?id=5722): a set of programs with concurrency bugs, to evaluate tools

Despite their drawbacks, static analysis tools (e.g. https://scan.coverity.com/) have become popular in practice, as they still reduce testing time significantly, don't require training, and fit into existing development processes.

### Dynamic Analysis Tools

[ThreadSanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual) is one such tool, developed by Google, and included in [clang](https://clang.llvm.org/docs/ThreadSanitizer.html):
> ThreadSanitizer is a tool that detects data races. It consists of a compiler instrumentation module and a run-time library.

From a Google report "[How Developers Use Data Race Detection Tools](http://dl.acm.org/citation.cfm?id=2688205)", 2014:
> [ThreadSanitizer] regularly finds critical bugs, and is in wide use across Google ... . One interesting incident occurred in the open source Chrome browser. Up to 15% of known crashes were attributed to just one bug ..., which proved difficult to understand–the Chrome engineers spent over 6 months tracking this bug without success. On the other hand, the [ThreadSanitizer] team found the reason for this bug in a 30 minute run, without even knowing about these crashes. The crashes were caused by data races on a couple of reference counters.

ThreadSanitizer is included in [Xcode](https://developer.apple.com/videos/play/wwdc2016/412/). From Apple's [developer documentation](https://developer.apple.com/documentation/code_diagnostics/thread_sanitizer):
> Running your code with Thread Sanitizer checks enabled can result in CPU slowdown of 2⨉ to 20⨉, and an increase in memory usage by 5⨉ to 10⨉.

### New Programming Languages

- [Go](https://golang.org/): message passing over _synchronous_ and *asynchronous channels*, dynamic race detection for shared variables.<br> Developed by Google; used by [Netflix](https://github.com/Netflix?language=go), [many others](https://github.com/golang/go/wiki/GoUsers)
- [Erlang](https://www.erlang.org/): functional concurrent language, _actors_ with message passing over asynchronous channels. <br>
Developed by Ericsson; used for [WhatsApp](https://www.fastcompany.com/3026758/inside-erlang-the-rare-programming-language-behind-whatsapps-success) (also [here](https://www.wired.com/2015/09/whatsapp-serves-900-million-users-50-engineers/)), Facebook Chat, [Amazon SimpleDB](https://en.wikipedia.org/wiki/Amazon_SimpleDB), [Cisco Network Configuration](http://www.cse.chalmers.se/edu/year/2016/course/TDA383_LP3/files/lectures/ConsTAhs-20170301.pdf)
- [Scala](http://scala-lang.org/): functional object-oriented language, _futures_ for background computation, actors, data-parallel operations on collections. <br>
Developed by EPFL; used for [Apache Spark](https://spark.apache.org/), [Twitter](https://github.com/search?q=org%3Atwitter&type=Repositories&utf8=%E2%9C%93), [Duolingo](http://making.duolingo.com/rewriting-duolingos-engine-in-scala)
- [Clojure](https://clojure.org/): functional concurrent language, message passing over synchronous and asynchronous channels, _transactions_ for shared state.<br> Used at [Walmart](http://blog.cognitect.com/blog/2015/6/30/walmart-runs-clojure-at-scale)
- [Rust](https://www.rust-lang.org/): has a type system with _ownership_ to make both message passing and shared variables safer.<br>
Developed by Mozilla for [Firefox](http://www.infoworld.com/article/3165424/web-browsers/mozilla-binds-firefoxs-fate-to-the-rust-language.html); used by [Coursera, Dropbox, Samsung, ...](https://www.rust-lang.org/en-US/friends.html)
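
Two of the concepts named in the list, an asynchronous channel and a future, can be approximated with Python's standard library. This is a rough sketch and not how any of the languages above implement them: `queue.Queue` stands in for a buffered channel, and `concurrent.futures` provides the future. The names `channel`, `receiver`, and `received` are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# asynchronous channel: the sender does not wait for the receiver
channel = queue.Queue()
received = []

def receiver():
    while True:
        msg = channel.get()
        if msg is None:        # sentinel marks the end of the stream
            break
        received.append(msg)

r = threading.Thread(target=receiver)
r.start()
for i in range(3):
    channel.put(i)             # returns immediately; the message is buffered
channel.put(None)
r.join()

# future: a placeholder for the result of a background computation
with ThreadPoolExecutor() as pool:
    future = pool.submit(sum, range(101))
    total = future.result()    # blocks until the result is available

print(received, total)
```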

Further reading: Edward Lee, [_The Problem with Threads_](http://dl.acm.org/citation.cfm?id=1137289), 2006.

## 5. Dimensions of Concurrency

* In _multiprogramming_ several concurrent processes may be executed by multiplexing processors. 
* In _multiprocessing_ several processors are sharing memory. 
* In _distributed processing_ there are several processors without shared memory.

_Granularity_ of atomic operations can range from nanoseconds (for arithmetic operations) to days. For very fine-grained concurrency, the overhead of starting processes outweighs the benefit, for example, when evaluating the arguments of the call `p(x + y, x - y)` in parallel.

_Coupling_ can be _loose_ or _tight_. For very tightly coupled programs, the overhead of communication and synchronization outweighs the benefit: for example, sorting an array by having each process compare and swap two adjacent elements.

Coupling between processes can be *independent*, *regular*, or *general*.
- Independent concurrency: For arrays `a`, `b`, `c`, the vector addition
```algorithm
  c := a + b
```
- Regular concurrency: For array `a` replacing each element with the average of its neighbours
```algorithm
  a[i] := (a[i - 1] + a[i + 1]) / 2
```

Some compilers, notably Fortran compilers, can recognize independent or regular concurrency and automatically generate code for vector processors that simultaneously perform the same operation on several elements, leading to _data parallelism_.
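
The difference between independent and regular concurrency can be sketched in Python with a thread pool. The sketch makes two assumptions not stated above: in the averaging step, all reads use the old values of `a`, and the two boundary elements are left unchanged. It shows only the structure of the computation; in CPython, threads typically do not speed up such arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

a = [1.0, 4.0, 3.0, 8.0]
b = [10.0, 20.0, 30.0, 40.0]

with ThreadPoolExecutor() as pool:
    # independent: c[i] depends only on a[i] and b[i]
    c = list(pool.map(lambda i: a[i] + b[i], range(len(a))))

    # regular: element i depends on its neighbours; all reads use the old array
    old = list(a)
    interior = list(pool.map(lambda i: (old[i - 1] + old[i + 1]) / 2,
                             range(1, len(old) - 1)))

a = [old[0]] + interior + [old[-1]]   # assumption: boundary elements stay unchanged
print(c, a)
```

In the independent case, the tasks share nothing. In the regular case, each task reads what its neighbours own, which is why the old values are copied before any element is overwritten.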

The main concern for independent and regular concurrency is *performance*; for general concurrency, it is *correctness*.

## 6. What This Course is About

The support for concurrency in programming languages is evolving. The emphasis is on
- *general concurrency* (independent and regular concurrency are covered in courses on parallel and distributed programming),
- the *fundamentals of concurrency* and seeing how they *apply to current languages*, and
- *correctness* of concurrent programs.

These notes use _algorithmic notation_ for brevity and clarity, with implementations in Python, Java, and Go. For example, setting `x` to `1` and *in parallel* `y` to `2` is expressed as:

```algorithm
    x := 1 ‖ y := 2
```
The assignment `x := e`, also written as `x ← e`, is read "`x` becomes `e`" or "`x` gets `e`".

To illustrate the verbosity of programming languages, here is the same in Python, with a print statement added. For both assignments, classes need to be declared, objects created, threads started, and their termination awaited. (Select the cell and run the program with control-return.)

In [None]:
from threading import Thread

class SetX(Thread):
    def run(self):
        global x; x = 1

class SetY(Thread):
    def run(self):
        global y; y = 2

setX = SetX(); setY = SetY() # create new threads
setX.start(); setY.start()   # run threads
setX.join(); setY.join()     # wait for threads to finish
print(x, y)

In Java, classes also need to be declared and exceptions need to be caught (select the cell and save the file with control-return; select the next cell and run the shell commands with control-return).

In [None]:
%%writefile SetXY.java
public class SetXY {
    static int x, y;
    public static void main(String args[]) {
        class SetX extends Thread {
            public void run() {
                x = 1;
            }
        }
        class SetY extends Thread {
            public void run() {
                y = 2;
            }
        }
        Thread setX = new SetX(), setY = new SetY();
        setX.start(); setY.start();
        try {setX.join(); setY.join();}
        catch (InterruptedException e) {} // join() may be interrupted while waiting
        System.out.println(x + " " + y);
    }
}

In [None]:
!javac SetXY.java
!java SetXY

In C, a function that takes a `void *` argument and returns a `void *` result has to be declared for each assignment statement. Threads are created and started by passing a pointer to such a function to a library function from the Pthreads library.

In [None]:
%%writefile SetXY.c
#include <pthread.h>
#include <stdio.h>

int x, y;

void *SetX(void *arg) {
    x = 1; return NULL;
}

void *SetY(void *arg) {
    y = 2; return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t setX, setY;
    pthread_create(&setX, NULL, SetX, NULL);
    pthread_create(&setY, NULL, SetY, NULL);
    pthread_join(setX, NULL);
    pthread_join(setY, NULL);
    printf("%d %d\n", x, y);
}

In [None]:
!gcc SetXY.c -lpthread -o SetXY
!./SetXY

**Question.** If you comment out the two lines containing the join statements, what can you observe?

_Answer:_  
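Since the global variables `x` and `y` of the C program are initialized to `0`, the output could be `0 0`, `1 0`, `0 2`, or `1 2`, depending on how far the two threads have progressed when `printf` is executed.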
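
The same experiment can be sketched in Python. The variables are explicitly initialized to `0`, and the names `set_x`, `set_y`, and `observed` are chosen for illustration. The variables are read right after the threads are started, without waiting, and the threads are joined only afterwards. Which of the four combinations is observed depends on the scheduling, so no particular output is claimed for the first value.

```python
from threading import Thread

x, y = 0, 0   # explicit initialization

def set_x():
    global x
    x = 1

def set_y():
    global y
    y = 2

tx, ty = Thread(target=set_x), Thread(target=set_y)
tx.start(); ty.start()   # run threads
observed = (x, y)        # read without waiting for the threads
tx.join(); ty.join()     # wait for threads to finish
print("without join:", observed, "after join:", (x, y))
```

After both joins, the values are always `1` and `2`; before, any of `(0, 0)`, `(1, 0)`, `(0, 2)`, `(1, 2)` is possible.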

In Go, for each assignment, a function needs to be declared. Below, these functions are anonymous. The `go` keyword starts a goroutine, a kind of thread. Goroutines are not named, so awaiting the termination of a specific goroutine cannot be expressed directly. Instead, a channel is introduced. Below, a dummy value, `true`, is sent on completion in each goroutine; the main program waits for these values and discards them:

In [None]:
%%writefile SetXY.go
package main

import "fmt"

func main() {
    var x, y int
    done := make(chan bool)
    go func() {x = 1; done <- true} ()
    go func() {y = 2; done <- true} ()
    <- done; <- done
    fmt.Println(x, y)
}

In [None]:
!go run SetXY.go
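
For comparison, signalling completion over a channel can be approximated in Python with `queue.Queue` from the standard library. This is a rough analogue rather than an equivalent: the Go channel above is unbuffered, so a send waits for the receiver, whereas a `Queue` created without a size limit never blocks the sender. The function names `set_x` and `set_y` are chosen for illustration.

```python
from queue import Queue
from threading import Thread

x, y = 0, 0
done = Queue()   # plays the role of the channel `done`

def set_x():
    global x
    x = 1
    done.put(True)   # signal completion with a dummy value

def set_y():
    global y
    y = 2
    done.put(True)   # signal completion with a dummy value

Thread(target=set_x).start()
Thread(target=set_y).start()
done.get(); done.get()   # wait for two values and discard them
print(x, y)              # 1 2
```

As in the Go program, the main thread does not wait for a specific thread but for two completion signals, in whichever order they arrive.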

We cover the main concurrency concepts:

- Nature of concurrency
- Mutual exclusion and condition synchronization
- Atomicity
- Safety, liveness, termination, deadlock, livelock, fairness
- Computer architecture and memory models
- Processes vs threads
- Critical sections
- Barrier synchronization
- Producers and consumers
- Readers and writers
- Bounded buffers
- Semaphores
- Monitors
- Message passing over synchronous and asynchronous channels
- Remote procedure call and rendezvous

The course also includes a review of the correctness of sequential and object-oriented programs, as the correctness of concurrent programs emerges as an extension.

Python and Java are used for semaphores and monitors; Go is used for message passing.

## 7. Recommended Reading

### Discrete Math

- David Gries, Fred B. Schneider, [_A Logical Approach to Discrete Math_](http://doi.org/10.1007/978-1-4757-3837-7), Springer, 1993.


### Concurrency

- Maurice Herlihy, Nir Shavit, Victor Luchangco, Michael Spear, [_The Art of Multiprocessor Programming_](https://www.sciencedirect.com/book/9780124159501/the-art-of-multiprocessor-programming), 2nd Edition, Morgan Kaufmann, 2020: includes theory and more recent topics like transactional programming
- Allen Downey, [_The Little Book of Semaphores_](http://greenteapress.com/wp/semaphores/), 2016: free book with a trove of examples
- Gregory Andrews, [_Foundations of Multithreaded, Parallel, and Distributed Programming_](https://archive.org/details/foundationsofmul0000andr), 2000: misses some of the newer topics, but all it covers is still valuable; [homepage](https://www2.cs.arizona.edu/~greg/mpdbook)

### Operating Systems
- Remzi Arpaci-Dusseau and Andrea Arpaci-Dusseau, [_Operating Systems: Three Easy Pieces_](http://ostep.org/), 2018: concise, free book; has one piece on concurrency with one chapter on concurrency bugs; uses C with Pthreads

### Python
- John Guttag, [_Introduction to Computation and Programming Using Python, Third Edition_](https://mitpress.mit.edu/9780262542364/introduction-to-computation-and-programming-using-python), 2021: focuses on methodological aspects rather than on the language; complements http://python.org

### Java
- Barbara Liskov and John Guttag, [_Program Development in Java: Abstraction, Specification, and Object-Oriented Design_](https://archive.org/details/programdevelopme0000lisk), 2001: also focuses on methodological aspects rather than on the language

### Go
- Alan Donovan and Brian Kernighan, [_The Go Programming Language_](http://www.gopl.io/), 2015: authoritative book on Go
- https://golang.org/doc/: official Go documentation
- https://github.com/golang/go/wiki/Learn: further online learning tools