# Processes, Threads And Containers

Processes, threads and containers are abstractions that allow multiple tasks to be carried out at the same time. This enables concurrency. Parallelism means making use of multiple CPU cores at the same time.

It is possible to have a concurrent system on a single CPU core. Threads that are waiting for data from memory or I/O can be put into a blocked state. Blocked threads are rescheduled when their data is available.

## Lambda Functions And Closures

Lambda functions are anonymous functions, written as a pair of vertical bars (`||`) followed by the function body. The vertical bars let you define arguments; the curly brackets around the body are optional when it is a single expression. Lambda functions that read variables from their enclosing scope are called closures. Unlike regular functions, lambda functions cannot be defined in global scope.

In [5]:
fn add(a: i32, b: i32) -> i32 {
  a + b
}

fn demo_main() {
  let lambda_add = |a, b| { a + b };

  assert_eq!(add(4,5), lambda_add(4,5));
}

demo_main();
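
The `lambda_add` example above does not capture anything, so it is not yet a closure in the sense described earlier. The following is a minimal sketch of capturing; the names `make_adder`, `offset` and `base` are hypothetical and chosen only for illustration.

```rust
// Sketch: closures that capture variables from their enclosing scope.
// `make_adder` is a hypothetical helper, not part of any library.
fn make_adder(offset: i32) -> impl Fn(i32) -> i32 {
    // `move` transfers ownership of `offset` into the returned closure,
    // so the closure stays valid after `make_adder` returns.
    move |x| x + offset
}

fn main() {
    let base = 10;
    // This closure borrows `base` from the surrounding scope.
    let add_base = |x: i32| x + base;
    assert_eq!(add_base(5), 15);

    let add_three = make_adder(3);
    assert_eq!(add_three(4), 7);
}
```

Returning the closure from a function is what forces the `move` here: without it the closure would hold a reference to a local that no longer exists.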

## Threads In Rust

To spawn a thread in Rust, we pass an anonymous function to `std::thread::spawn()`. When the spawned thread accesses variables from the parent scope (called captures), Rust requires that those captures be moved into the closure, so the closure takes ownership of them via the `move` keyword. This is because closures spawned in subthreads can potentially outlive their calling scope; to guarantee that the data is still valid when it is accessed, ownership has to be moved into the closure itself.

In [18]:
use std::{thread, time};

let start = time::Instant::now();

let handler = thread::spawn(|| {
  let pause = time::Duration::from_millis(300);
  thread::sleep(pause); // `Duration` is `Copy`, so no explicit clone is needed
});

handler.join().unwrap();

let finish = time::Instant::now();

println!("{:02?}", finish.duration_since(start));

300.311701ms


The `join()` method is used to wait for the other thread to finish. It blocks the calling thread, so the OS defers scheduling it until the other thread finishes.
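
As a small sketch of how `move` and `join()` fit together, the example below hands a vector to a spawned thread and reads the thread's return value back through its handle. The helper name `sum_in_thread` is hypothetical.

```rust
use std::thread;

// Hypothetical helper: sums a vector on a spawned thread.
fn sum_in_thread(values: Vec<u64>) -> u64 {
    // `move` gives the closure ownership of `values`, so the data stays
    // valid even if the spawning scope were to end first.
    let handle = thread::spawn(move || values.iter().sum::<u64>());
    // `join` blocks until the thread finishes and yields its return value.
    handle.join().expect("worker thread panicked")
}

fn main() {
    assert_eq!(sum_in_thread(vec![1, 2, 3, 4]), 10);
}
```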

## Effect Of Spawning Thread

1. Every thread requires its own memory and if we keep creating threads we can exhaust the system's memory. 

2. As the number of threads to schedule increases, the OS scheduler's work increases. Deciding which thread to schedule takes more time.

3. Switching between threads also invalidates caches.

> CPU-intensive multithreading doesn't scale well past the number of physical cores.
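
One way to act on this is to cap the number of worker threads at what the machine can actually run. The sketch below assumes the standard library's `std::thread::available_parallelism()`, which reports the parallelism available to the program; that figure may count logical rather than physical cores. The helper name `worker_count` is hypothetical.

```rust
use std::thread;

// Hypothetical helper: how many worker threads to use for CPU-bound work.
fn worker_count(requested: usize) -> usize {
    // `available_parallelism` can fail on some platforms; fall back to 1.
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Never use fewer than one thread or more than the machine offers.
    requested.clamp(1, available)
}

fn main() {
    let n = worker_count(500);
    println!("using {} worker threads", n);
}
```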

## Plotting The Cost Of Threads

1. To get an idea of the CPU load associated with using threads, let's measure the time taken to start and finish between 1 and 500 threads, each of which is put to sleep for 20 ms.

In [4]:
use std::{thread, time};

for n in 1..501 {
  let mut handlers: Vec<thread::JoinHandle<()>> = Vec::with_capacity(n);

  let start = time::Instant::now();

  for _m in 0..n {
    let handle = thread::spawn(|| {
      let pause = time::Duration::from_millis(20);
      thread::sleep(pause);
    });
    handlers.push(handle);
  }

  while let Some(handle) = handlers.pop() {
    handle.join();
  }

  let finish = time::Instant::now();
  println!("{}\t{:02?}", n, finish.duration_since(start));
}; // to suppress the output

1	20.310102ms
2	20.394302ms
3	20.548802ms
4	20.547402ms
5	20.646703ms
6	20.729402ms
7	20.864102ms
8	20.911802ms
9	20.617102ms
10	20.759902ms
11	20.793902ms
12	20.785003ms
13	20.903302ms
14	20.809102ms
15	20.845702ms
16	21.376402ms
17	21.387303ms
18	21.142202ms
19	20.655202ms
20	20.742802ms
21	20.698502ms
22	20.799102ms
23	21.674603ms
24	21.040502ms
25	21.366502ms
26	21.226402ms
27	21.850602ms
28	21.182903ms
29	21.579402ms
30	21.733402ms
31	21.030902ms
32	21.763202ms
33	22.044903ms
34	21.389502ms
35	21.100002ms
36	21.212503ms
37	21.304702ms
38	22.316603ms
39	21.710102ms
40	21.536003ms
41	21.422202ms
42	22.928003ms
43	21.784802ms
44	21.572903ms
45	23.348102ms
46	21.830903ms
47	21.578202ms
48	22.821203ms
49	22.059602ms
50	21.746803ms
51	23.418202ms
52	21.778903ms
53	22.037102ms
54	22.989503ms
55	22.015603ms
56	23.535102ms
57	21.982403ms
58	23.173402ms
59	22.167603ms
60	23.007703ms
61	22.441602ms
62	22.041703ms
63	23.412602ms
64	22.162803ms
65	23.512902ms
66	24.174803ms
67	22.520803ms
68	2

2. Now we try the same thing but instead of sleep we use spin loops.

In [5]:
use std::{thread, time};

for n in 1..501 {
  let mut handlers: Vec<thread::JoinHandle<()>> = Vec::with_capacity(n);

  let start = time::Instant::now();

  for _m in 0..n {
    let handle = thread::spawn(|| {
      let start = time::Instant::now();
      let pause = time::Duration::from_millis(20);
      while start.elapsed() < pause {
        thread::yield_now();
      }
    });
    handlers.push(handle);
  }

  while let Some(handle) = handlers.pop() {
    handle.join();
  }

  let finish = time::Instant::now();
  println!("{}\t{:02?}", n, finish.duration_since(start));
}; // to suppress the output

1	20.188502ms
2	20.199303ms
3	20.374102ms
4	20.728302ms
5	20.525103ms
6	20.703102ms
7	20.634302ms
8	20.663102ms
9	20.764703ms
10	21.537502ms
11	20.635902ms
12	22.343103ms
13	22.964703ms
14	21.308202ms
15	25.630503ms
16	21.209002ms
17	40.607205ms
18	37.829904ms
19	37.875804ms
20	37.670204ms
21	37.694304ms
22	37.305605ms
23	40.428604ms
24	31.846904ms
25	39.016604ms
26	37.749904ms
27	29.851104ms
28	29.807303ms
29	32.194003ms
30	37.817604ms
31	37.862204ms
32	37.922004ms
33	37.919204ms
34	37.989205ms
35	31.618903ms
36	38.198504ms
37	37.935705ms
38	38.032704ms
39	38.435304ms
40	32.620604ms
41	30.787503ms
42	41.690305ms
43	38.285904ms
44	40.699505ms
45	40.862304ms
46	41.092205ms
47	41.929904ms
48	41.319505ms
49	41.553405ms
50	28.246203ms
51	32.821004ms
52	40.929904ms
53	41.004805ms
54	41.004104ms
55	41.255805ms
56	60.351407ms
57	43.659304ms
58	41.339605ms
59	41.935805ms
60	41.353704ms
61	41.306205ms
62	37.549904ms
63	41.576105ms
64	41.323604ms
65	34.378804ms
66	41.468305ms
67	41.726504ms
68	4

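We can see that the total time increases significantly once more threads are spinning than the CPU can apparently run at once (note the jump at around 17 threads in the output above): the threads now compete for cores and the scheduler has more work to do.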

**Note**: We use a `while` loop to drain the handlers because `join()` consumes the handle: once a thread is joined, its handle no longer exists, and Rust doesn't allow holding a reference to something that doesn't exist. Therefore, to call `join()` on a handle, it must first be removed from `handlers`. A `for` loop that iterates over `handlers` by reference doesn't allow modifying the vector while it is being iterated over. Hence we use a `while` loop that repeatedly gains mutable access by calling `handlers.pop()`.
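
An alternative to popping handles one at a time is to iterate over the vector by value, which moves each handle out as it is visited. This is a sketch only; the helper name `spawn_and_join` is hypothetical.

```rust
use std::thread;

// Hypothetical helper: spawns `n` threads and collects their results.
fn spawn_and_join(n: usize) -> Vec<usize> {
    let handles: Vec<thread::JoinHandle<usize>> = (0..n)
        .map(|i| thread::spawn(move || i * 2))
        .collect();

    // `into_iter` consumes the vector and yields each handle by value,
    // which is what `join(self)` needs.
    handles
        .into_iter()
        .map(|h| h.join().expect("thread panicked"))
        .collect()
}

fn main() {
    assert_eq!(spawn_and_join(3), vec![0, 2, 4]);
}
```

Because the handles are joined in the order they were created, the results come back in that order even if the threads finish in a different one.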

### Yielding Control With `thread::yield_now()`

`std::thread::yield_now()` is a signal to the OS that the current thread should be unscheduled. This frees processing time for other threads while the current thread is still waiting for its 20 ms to elapse. The downside of yielding is that we don't know whether we'll be able to resume at exactly 20 ms.
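
The overshoot can be observed directly. The sketch below wraps the spin loop in a hypothetical helper, `spin_for`, that returns how long it actually waited; the exact figure varies from run to run and machine to machine.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical helper: busy-waits for at least `pause`, yielding on each pass,
// and returns the time that actually elapsed.
fn spin_for(pause: Duration) -> Duration {
    let start = Instant::now();
    while start.elapsed() < pause {
        // Offer the rest of this time slice to other threads.
        thread::yield_now();
    }
    start.elapsed()
}

fn main() {
    let pause = Duration::from_millis(20);
    let actual = spin_for(pause);
    // `actual` is never shorter than `pause`, but may be noticeably longer.
    println!("requested {:?}, waited {:?}", pause, actual);
}
```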

## Sharing Variables In Threads

Notice that in the above examples we created a separate `pause` variable inside each thread. When we try to share a single variable between threads, we get the following error.

In [6]:
use std::{thread, time};

let pause = time::Duration::from_millis(20);

let handle1 = thread::spawn(|| {
  thread::sleep(pause);
});

let handle2 = thread::spawn(|| {
  thread::sleep(pause);
});

handle1.join();
handle2.join();

Error: closure may outlive the current function, but it borrows `pause`, which is owned by the current function

Error: closure may outlive the current function, but it borrows `pause`, which is owned by the current function

To fix this we use the `move` keyword when the closures are created. Because `Duration` implements `Copy`, each closure receives its own copy of `pause`.

In [7]:
use std::{thread, time};

let pause = time::Duration::from_millis(20);

let handle1 = thread::spawn(move || {
  thread::sleep(pause);
});

let handle2 = thread::spawn(move || {
  thread::sleep(pause);
});

handle1.join();
handle2.join();
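
Moving works for both closures above because `Duration` can be copied. For a type that cannot be copied, such as a `Vec`, the first `move` closure would take the value and leave nothing for the second. One common option, sketched here under that assumption, is to wrap the data in `std::sync::Arc` and give each thread its own handle to it. The helper name `total_len_across_threads` is hypothetical.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper: every thread reads the same heap-allocated vector.
fn total_len_across_threads(words: Vec<String>, n_threads: usize) -> usize {
    let shared = Arc::new(words);
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            // Cloning the `Arc` copies a pointer and bumps a reference count;
            // the vector itself is not duplicated.
            let words = Arc::clone(&shared);
            thread::spawn(move || words.iter().map(|w| w.len()).sum::<usize>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let words = vec!["ab".to_string(), "cde".to_string()];
    // Each of the 3 threads counts 5 bytes.
    assert_eq!(total_len_across_threads(words, 3), 15);
}
```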

## Differences Between Closures And Functions

- Closures and functions have different internal representations. Closures are anonymous structs that implement the `std::ops::FnOnce` trait and potentially `std::ops::FnMut` and `std::ops::Fn`. These structs contain any variables from the closure's environment that are used inside it.

- Functions are implemented as function pointers. A **function pointer** points to code, not data. Here code means computer memory that has been marked as executable.

- Closures that do not capture any variables from their environment can also be coerced to function pointers.
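
The difference shows up in function signatures. In the sketch below, `apply_fn_ptr` and `apply_generic` are hypothetical helpers: the first accepts only function pointers, the second anything that implements `Fn`.

```rust
fn double(x: i32) -> i32 {
    x * 2
}

// Accepts only function pointers: plain functions and non-capturing closures.
fn apply_fn_ptr(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

// Accepts anything implementing `Fn`, including capturing closures.
fn apply_generic<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

fn main() {
    assert_eq!(apply_fn_ptr(double, 4), 8);
    // A non-capturing closure coerces to a function pointer.
    assert_eq!(apply_fn_ptr(|x| x + 1, 4), 5);

    let offset = 10;
    // This closure captures `offset`, so it can only be passed generically;
    // handing it to `apply_fn_ptr` would not compile.
    assert_eq!(apply_generic(|x| x + offset, 4), 14);
    assert_eq!(apply_generic(double, 4), 8);
}
```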

## Functional Programming Style: The `map` Function

Consider these two functions that do the same job of parsing an input `str` and returning a vector.

In [10]:
use crate::Operation::{Forward, Home, Noop, TurnLeft, TurnRight};

enum Operation {
  Forward(isize),
  TurnLeft,
  TurnRight,
  Home,
  Noop(u8),
}

const HEIGHT: isize = 200;

In [11]:
fn parse(input: &str) -> Vec<Operation> {
  let mut steps = Vec::<Operation>::new(); // temporary variable
  for byte in input.bytes() {
    let step = match byte {
      b'0' => Home,
      b'1'..=b'9' => {
        let distance = (byte - 0x30) as isize;
        Forward(distance * (HEIGHT / 10))
      }
      b'a' | b'b' | b'c' => TurnLeft,
      b'd' | b'e' | b'f' => TurnRight,
      _ => Noop(byte),
    };
    steps.push(step);
  }
  steps
}

In [12]:
fn parse(input: &str) -> Vec<Operation> {
  input.bytes().map(|byte| {
    match byte {
      b'0' => Home,
      b'1'..=b'9' => {
        let distance = (byte - 0x30) as isize;
        Forward(distance * (HEIGHT / 10))
      }
      b'a' | b'b' | b'c' => TurnLeft,
      b'd' | b'e' | b'f' => TurnRight,
      _ => Noop(byte),
    }
  }).collect()
}

The second function is shorter, more declarative, and closer to idiomatic Rust. Using `map` and `collect` has the following benefits:

1. No need to create the temporary variable `steps`.

2. It provides more opportunities for the Rust compiler to optimize your code’s execution. Iterators are an efficient abstraction. Working with their methods directly allows the Rust compiler to create optimal code that takes up minimal memory.

3. `map()` also returns an iterator. This allows many transformations to be chained together. Significantly, although `map()` may appear in multiple places in your source code, Rust often optimizes those function calls away in the compiled binary.

4. When your code uses `for` loops, you restrict the number of places where the compiler can make decisions. Iterators provide an opportunity for you to delegate more work to the compiler. This ability to delegate is what unlocks parallelism.
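
To illustrate the chaining mentioned in point 3, here is a small sketch that combines `filter`, `map` and `sum` without any temporary collection. The helper name `digit_sum` is hypothetical.

```rust
// Hypothetical helper: sums the decimal digits found in `input`.
fn digit_sum(input: &str) -> u32 {
    input
        .bytes()
        // Keep only the bytes '0' through '9'.
        .filter(|b| b.is_ascii_digit())
        // Convert each ASCII digit to its numeric value.
        .map(|b| (b - b'0') as u32)
        // Iterators are lazy: nothing runs until `sum` consumes the chain.
        .sum()
}

fn main() {
    assert_eq!(digit_sum("a1b2c3"), 6);
}
```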

## Using Channels As A Task Queue

Channels can be used as a task queue because multiple items can be sent, even if a receiver is not ready to receive any messages. They have two ends: sending and receiving.

> By a convention inherited from radio and telegraph operators, the sender is called `tx` (shorthand for transmission) and the receiver is called `rx`.

### Bounded And Unbounded Queues From `crossbeam` Crate

The standard library provides a channel implementation, but the `crossbeam` crate provides slightly more features. It includes both bounded queues and unbounded queues. A bounded queue applies **back pressure** under contention, preventing the consumer from becoming overloaded. Bounded queues (of fixed-width types) have deterministic maximum memory usage. They do have one negative characteristic, though: they force queue producers to wait until space is available. This can make bounded queues unsuitable for asynchronous messages, which cannot tolerate waiting.

In [4]:
use std::thread;
use std::sync::mpsc::channel;

let (tx, rx) = channel();
thread::spawn(move|| {
  tx.send(42).unwrap();
});

assert_eq!(rx.recv().unwrap(), 42);
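
Back pressure can be sketched without an external crate. The example below uses the standard library's bounded `std::sync::mpsc::sync_channel` rather than `crossbeam`, purely to stay self-contained. The helper name `fill_until_full` is hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Hypothetical helper: counts how many messages fit before the queue is full.
fn fill_until_full(capacity: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected; nothing is received.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut accepted = 0;
    loop {
        // `try_send` never blocks; it reports a full queue as an error.
        match tx.try_send(accepted as u32) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) => return accepted,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is alive"),
        }
    }
}

fn main() {
    assert_eq!(fill_until_full(2), 2);
}
```

A blocking `send` on the same channel would instead wait at the point where `try_send` reports `Full`, which is the waiting behavior described above.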

Channels can be compared to how network protocols work. Over the wire, however, only the type `[u8]` is available: the byte stream needs to be parsed and validated before its contents can be interpreted.

Channels are richer than simply streaming bytes. A byte stream is opaque and requires parsing to have structure extracted out of it. Channels offer the full power of Rust’s type system. Using an `enum` for messages offers exhaustiveness checking for robustness and has a compact internal representation.
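
As a sketch of a typed task queue, the example below sends an `enum` whose variants carry data, and the worker matches on every variant. The names `Task` and `run_worker` are hypothetical.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical message type for a small task queue.
enum Task {
    Square(u64),
    Shutdown,
}

// Hypothetical helper: queues one `Square` task per input, then shuts down.
fn run_worker(inputs: Vec<u64>) -> Vec<u64> {
    let (task_tx, task_rx) = channel::<Task>();

    let worker = thread::spawn(move || {
        let mut results = Vec::new();
        // Iterating over the receiver blocks until a message arrives.
        for task in task_rx {
            // The compiler rejects this `match` if a variant is left out.
            match task {
                Task::Square(n) => results.push(n * n),
                Task::Shutdown => break,
            }
        }
        results
    });

    // Tasks can be queued before the worker is ready to receive them.
    for n in inputs {
        task_tx.send(Task::Square(n)).unwrap();
    }
    task_tx.send(Task::Shutdown).unwrap();

    worker.join().unwrap()
}

fn main() {
    assert_eq!(run_worker(vec![1, 2, 3]), vec![1, 4, 9]);
}
```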

## Two Way Communication Using Channels

Bi-directional (duplex) communication is awkward to model with a single channel. An approach that’s simpler to work with is to create two sets of senders and receivers, one for each direction.

In [18]:
#[derive(Debug)]
enum ConnectivityCheck { // Defining a message type to pass msgs through channels
  Ping,
  Pong,
  Pang
}

In [22]:
use std::thread;
use std::sync::mpsc::{channel, Sender, Receiver};
use crate::ConnectivityCheck::{Ping, Pong, Pang};

let n_msgs = 3;
let (requests_tx, requests_rx): (Sender<ConnectivityCheck>, Receiver<ConnectivityCheck>) = channel();
let (responses_tx, responses_rx): (Sender<ConnectivityCheck>, Receiver<ConnectivityCheck>) = channel();

thread::spawn(move || loop { // Because all control flow is an expression, Rust allows the loop keyword here.
  match requests_rx.recv().unwrap() {
    Pong => eprintln!("unexpected pong response"),
    Ping => responses_tx.send(Pong).unwrap(),
    Pang => return,
  }
});

for _ in 0..n_msgs {
  requests_tx.send(Ping).unwrap();
}

requests_tx.send(Pang).unwrap();

for _ in 0..n_msgs {
  println!("{:?}", responses_rx.recv().unwrap());
}; // ; to suppress output


Pong
Pong
Pong


## Terms Related To Concurrency

- Program - A program, or application, is a name that we use to refer to a software package. When we execute a program, the OS creates a process.

- Executable - A file that can be loaded into memory and then run. Running an executable means creating a process and a thread for it, then changing the CPU’s instruction pointer to the first instruction of the executable.

- Task - The term task can be used for a number of things. When discussing processes, a task is one of the process’s threads. When referring to a thread, a task might be a function call. When referring to an OS, a task might be a running program, which might be comprised of multiple processes.

- Process - Running programs execute as processes. A process has its own virtual address space, at least one thread, and lots of bookkeeping managed by the OS. File descriptors, environment variables, and scheduling priorities are managed per process. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads. Running programs begin their life as a single process, but it isn’t uncommon to spawn subprocesses to do the work.

- Thread of execution - A sequence of CPU instructions that appear in serial. Multiple threads can run concurrently, but instructions within the sequence are intended to be executed one after another.

- Coroutine - Also known as fiber, green thread, and lightweight thread, a coroutine indicates tasks that switch within a thread. Switching between tasks becomes the responsibility of the program itself, rather than the OS.

- Non-blocking I/O - Normally a thread is unscheduled when it asks for data from I/O devices like the network. The thread is marked as blocked, while it waits for data to arrive. When programming with non-blocking I/O, the thread can continue executing even while it waits for data. 

- Asynchronous programming - Asynchronous programming describes programming for cases where the control flow is not predetermined. Instead, events outside the control of the program itself impact the sequence of what is executed. Those events are typically related to I/O, such as a device driver signalling that it is ready, or are related to functions returning in another thread. The asynchronous programming model is typically more complicated for the developer, but results in a faster runtime for I/O-heavy workloads. Speed increases because there are fewer system calls. This implies fewer context switches between the user space and the kernel space.
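
The non-blocking I/O entry above can be sketched with the standard library's `TcpListener`. In non-blocking mode, `accept()` returns an error of kind `WouldBlock` instead of unscheduling the thread when no connection is pending. The helper name `accept_would_block` is hypothetical, and the sketch assumes the loopback interface is available.

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

// Hypothetical helper: returns true if `accept` would have blocked.
fn accept_would_block() -> std::io::Result<bool> {
    // Port 0 asks the OS for any free port on the loopback interface.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    listener.set_nonblocking(true)?;
    match listener.accept() {
        // A client happened to connect already.
        Ok(_) => Ok(false),
        // No client is waiting; the thread is free to do other work.
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(true),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    println!("accept would block: {}", accept_would_block()?);
    Ok(())
}
```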

## Isolation

Isolated tasks cannot interfere with each other, for example by corrupting memory, saturating the network, or causing congestion when saving to disk. Isolated tasks cannot access each other’s data without permission. Independent threads in the same process share a memory address space, and all threads have equal access to data within that space. Processes, however, are prohibited from inspecting each other’s memory.

Isolated tasks cannot cause another task to crash. A failure in one task should not cascade into other systems. If a process induces a kernel panic, all processes are shut down. By conducting work in virtual machines, tasks can proceed even when other tasks are unstable.
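
Rust threads offer a limited form of this crash isolation, even though they still share memory. With the default unwinding panic strategy (an assumption; a build configured to abort on panic behaves differently), a panic in a spawned thread is reported to the parent through `join()` rather than taking the whole process down. The helper name `checked_divide_in_thread` is hypothetical.

```rust
use std::thread;

// Hypothetical helper: divides on a separate thread and reports failure as `None`.
fn checked_divide_in_thread(a: i32, b: i32) -> Option<i32> {
    // Integer division by zero panics; the panic unwinds only the spawned thread.
    // `join` returns `Err` in that case, which `ok()` turns into `None`.
    thread::spawn(move || a / b).join().ok()
}

fn main() {
    assert_eq!(checked_divide_in_thread(10, 2), Some(5));
    // The spawned thread panics (a message is printed to stderr),
    // but the main thread keeps running.
    assert_eq!(checked_divide_in_thread(1, 0), None);
}
```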

## Threads

A thread is the lowest level of isolation that an OS understands. The OS can schedule threads. For things like coroutines, fibers, and green threads, switching between tasks is managed by the process itself. The OS is ignorant of the fact that a program is processing multiple tasks. For threads and other forms of concurrency, context switching is required.

## Context Switch

Switching between tasks at the same level of virtualization is known as a context switch. For threads to switch, CPU registers need to be cleared, CPU caches might need to be flushed, and variables within the OS need to be reset. As isolation increases, so does the cost of the context switch.

CPUs can only execute instructions in serial. To do more than one task, a computer needs to be able to, in effect, press the Save Game button, switch to a new task, and resume at that task’s saved spot. The CPU is save scum.

Why is the CPU constantly switching tasks? Because it has so much time available. Programs often need to access data from memory, disk, or the network. Because waiting for data is incredibly slow, there’s often sufficient time to do something else in the meantime.

## Processes

Threads exist within a process. The distinguishing characteristic of a process is that its memory is independent from other processes. The OS, in conjunction with the CPU, protects a process’s memory from all others.

To share data between processes you need some support from the OS. For this, reusing network sockets is common. Most operating systems provide specialized forms of interprocess communication (IPC), which are faster, while being less portable.

## Task And Process In WebAssembly

WebAssembly (Wasm) isolates tasks within the process boundary itself. It’s impossible for tasks running inside a Wasm module to access memory available to other tasks. Originating in web browsers, Wasm treats all code as potentially hostile. Wasm modules are given access to address spaces within your process’s address space. Wasm address spaces are called linear memory. The Wasm runtime interprets any request for data within linear memory and makes its own request to the actual virtual memory. Code within the Wasm module is unaware of any memory addresses that the process has access to.

## Containers

Containers are extensions to processes with further isolation provided by the OS. Processes share the same filesystem, whereas containers have a filesystem created for them. The same is true for other resources, such as the network. Rather than address space, the term used for protections covering these other resources is namespaces.

## Do All Applications Need An OS?

It’s possible to run an application as its own OS. An application that runs without an OS is generally described as freestanding, in the sense that it does not require the support of an OS. Freestanding binaries are used by embedded software developers when there is no OS to rely on.

Using freestanding binaries can involve significant limitations, though. Without an OS, applications no longer have virtual memory or multithreading. All of those concerns become your application’s concerns. To reach a middle ground, it is possible to compile a unikernel. A unikernel is a minimal OS paired with a single application. The compilation process strips out everything from the OS that isn’t used by the application that’s being deployed.