


### Threads

#### Motivation
- issues with single process software
    - only one thing can happen at a time
    - if one thing takes a long time, everything else has to wait
    - if one thing crashes, everything else crashes
    - not great resource utilization
        - e.g. other CPU cores are idle
    -  creating a new process for every task is expensive
        - e.g. memory, startup time, message passing, 
        - lots of processes means lots of context switching
            - processes are good for isolation, but not for performance
            - in interdependent tasks in a single program, isolation/protection is not needed
- need a solution
    - to run multiple sequences of code for different objects
    - that shares data effectively
    - that switches between sequences of code efficiently
#### Threads
- designed for the above
- allow concurrent execution of a sequence of code
- also known as a lightweight process
    - instead of duplicating the entire process, only the minimal information needed to run the code is duplicated
        - PC, registers, stack, state, misc
        - processes also include data, heap, code regions
    - executed within a process
        - if any thread calls exit(), the entire process terminates
            - all threads in the process are terminated along with it
        - all threads share the same address space
            - can access the same data
                - no protection (but that shouldn't be needed)
            - can communicate with each other easily
    - smaller context
        - i.e. less information to save and restore from PCB
        - does not require duplicating the process's address space
            - only the stack is duplicated
        - slightly faster context switching
    - single address space for all threads in a process
    - shared data:
        - process instructions
        - most data
        - open files
        - signals and signal handlers
        - current working directory
        - user and group id
    - thread specific data:
        - thread id
        - set of registers, stack pointer
        - stack
        - thread-specific data keys (e.g. values set with `pthread_setspecific`)
        - signal mask
            - thread can block signals
        - scheduling properties
            - e.g. priority
        - return value
- <img src="images/threads.png" width="800px">
    - stack is partitioned for thread 1 and 2
- benefits
    - responsiveness
        - user interface can remain responsive while performing long-running tasks
        - if one thread is blocked, another thread can run
    - resource sharing
        - threads share resources of the process
            - memory, files, etc.
            - requires synchronization to avoid conflicts (e.g. two threads writing to the same file)
        - easier than shared memory or message passing
    - economy
        - creating a new thread is cheaper than creating a new process
            - less overhead
            - faster
            - less memory
            - easier
    - scalable
        - can take advantage of multiprocessor architectures
            - e.g. one thread per processor
- primary drawback
    - no inbuilt protection
        - threads share the same address space
        - one thread can easily corrupt another thread's stack
        - one thread can easily overwrite another thread's data, code, etc.
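
The sharing and synchronization points above can be illustrated with a small sketch. It assumes the POSIX threads calls covered in the Linux Implementation section; `NUM_THREADS`, `INCREMENTS`, and the function names are hypothetical, and error checks are left out for brevity. The global `counter` is visible to every thread, while each thread's loop variable lives on its own stack.

```C
#include <pthread.h>

#define NUM_THREADS 4        /* hypothetical values chosen for this sketch */
#define INCREMENTS  100000

static long counter = 0;     /* data region: shared by every thread */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in each thread; `i` is private to the thread's stack, `counter` is shared. */
static void *add_to_counter(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);   /* without the lock, updates can be lost */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the threads, waits for all of them, and returns the final count. */
long run_shared_counter(void)
{
    pthread_t tids[NUM_THREADS];

    counter = 0;
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_create(&tids[t], NULL, add_to_counter, NULL);
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tids[t], NULL);
    return counter;
}
```

Calling `run_shared_counter()` from `main` (built with, e.g., `gcc -pthread`) should return `NUM_THREADS * INCREMENTS`; removing the lock and unlock calls makes the result unpredictable, which is the "no inbuilt protection" drawback in practice.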

### Linux Implementation
- some flavors have different implementations or do not support threads
- most Linux supports POSIX threads (pthreads)
    - POSIX is a standard for portable operating systems
    - Pthreads is a standard IEEE POSIX C library for threads
    - can be implemented at either user level or kernel level
- Pthreads API
    - thread management
        - functions to create, join, and detach threads, and to set thread attributes
    - mutexes
        - functions to enforce synchronization
        - create, destroy, lock, unlock mutexes
    - condition variables
        - functions to manage thread communication
        - create, destroy, wait on, signal, and broadcast on condition variables
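
As a rough illustration of how the mutex and condition variable calls fit together, the sketch below has one thread publish a value and another wait for it. The names `producer`, `wait_for_value`, `ready`, and `shared_value` are hypothetical, and error handling is omitted.

```C
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  data_ready = PTHREAD_COND_INITIALIZER;
static int ready = 0;          /* predicate protected by `lock` */
static int shared_value = 0;   /* data protected by `lock` */

/* Publishes a value, then signals the waiting thread. */
static void *producer(void *arg)
{
    int value = *(int *)arg;

    pthread_mutex_lock(&lock);
    shared_value = value;
    ready = 1;
    pthread_cond_signal(&data_ready);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts a producer and blocks until it has published `value`. */
int wait_for_value(int value)
{
    pthread_t tid;
    int result;

    ready = 0;
    pthread_create(&tid, NULL, producer, &value);

    pthread_mutex_lock(&lock);
    while (!ready)                               /* re-check: wakeups can be spurious */
        pthread_cond_wait(&data_ready, &lock);   /* releases the lock while waiting */
    result = shared_value;
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return result;
}
```

The waiter tests the `ready` flag in a loop rather than relying on the signal alone, so it behaves correctly whether the producer runs before or after the wait begins.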


```C
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h> /* atoi */

int sum; /* data shared by all threads */
void *runner(void *param); /* thread function prototype */

int main(int argc, char *argv[])
{
    pthread_t tid; /* thread identifier */
    pthread_attr_t attr; /* set of thread attributes */

    if (argc != 2) { /* exactly one argument is expected */
        fprintf(stderr, "usage: %s <non-negative integer>\n", argv[0]);
        return -1;
    }
    if (atoi(argv[1]) < 0) { // atoi converts string to integer
        fprintf(stderr, "%d must be >= 0\n", atoi(argv[1]));
        return -1;
    }
    /* get the default thread attributes */
    pthread_attr_init(&attr);

    /* create the thread */
    pthread_create(&tid, &attr, runner, argv[1]); // pass tid and attr by reference, runner is the function to be executed, and argv[1] is the argument to the function runner

    /* wait for the thread to exit */
    pthread_join(tid, NULL);
    fprintf(stdout, "sum = %d\n", sum);
    return 0;
}

/* The thread will begin control in this function */
void *runner(void *param)
{
    int i, upper = atoi(param);
    sum = 0;
    for (i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}
```
- `void *` functions are functions that return a pointer to void
    - i.e. the returned pointer can be cast to any object pointer type
- changing the attributes object after `pthread_create` does not affect a thread that has already been created
- API Calls
    - `pthread_attr_init` – initialize the thread attributes object
        - `int pthread_attr_init(pthread_attr_t *attr);`
        - defines the attributes of the thread to be created
            - scope, detach state, stack size, stack address, scheduling policy
    - `pthread_create` – create a new thread
        - `int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void *), void *restrict arg);`
        - upon success, the new thread's id is stored in `thread`
    - `pthread_join` – wait for a thread to exit
        - `int pthread_join(pthread_t thread, void **value_ptr);`
        - the calling thread blocks until the target thread exits
    - `pthread_exit` – terminate the calling thread
        - `void pthread_exit(void *value_ptr);`
        - makes the return value available to the joining thread
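
The `value_ptr` arguments of `pthread_exit` and `pthread_join` can be used to hand a result back without a global variable. The following is a minimal sketch, not the only way to do it; `sum_to_n` and `threaded_sum` are hypothetical names. The result is placed on the heap because the exiting thread's stack is no longer valid once it terminates.

```C
#include <pthread.h>
#include <stdlib.h>

/* Thread function: sums 1..upper and returns the result through pthread_exit. */
static void *sum_to_n(void *param)
{
    int upper = *(int *)param;
    long *result = malloc(sizeof *result);   /* heap: outlives this thread's stack */

    if (result == NULL)
        pthread_exit(NULL);
    *result = 0;
    for (int i = 1; i <= upper; i++)
        *result += i;
    pthread_exit(result);                    /* value becomes visible to the joiner */
}

/* Creates the thread, joins it, and returns the sum (or -1 on failure). */
long threaded_sum(int upper)
{
    pthread_t tid;
    void *value_ptr = NULL;
    long sum = -1;

    if (pthread_create(&tid, NULL, sum_to_n, &upper) != 0)
        return -1;
    pthread_join(tid, &value_ptr);           /* blocks until sum_to_n exits */
    if (value_ptr != NULL) {
        sum = *(long *)value_ptr;
        free(value_ptr);
    }
    return sum;
}
```

Passing `NULL` as the attributes argument of `pthread_create` requests the default attributes, so no explicit `pthread_attr_init` call is needed in this sketch.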

# Exam Information
- Wednesday, 04Oct23
- review questions have been posted
- similar format and questions to quizzes
    - some open-ended questions as well
- in-class review Monday


### User vs Kernel Threads
- user level
    - advantages
        - efficient and flexible in space, speed, switching, and scheduling
    - disadvantages
        - one thread blocked on IO blocks all threads
        - difficult to automatically take advantage of SMP (symmetric multiprocessing)
- kernel level
    - advantages
        - removes disadvantages of user level threads
    - disadvantages
        - slower and more expensive
        - less flexible
        - more overhead
        - less portable
        - above are due to kernel involvement
    - provided by most modern general purpose operating systems
        - e.g. Linux, Windows, Solaris, etc.
### Multithreading Models
- many to one
    - many user level threads mapped to one kernel thread 
    - <img src="images/mto1.png" height="300px">
- one to one
    - one user level thread mapped to one kernel thread
    - <img src="images/1to1.png" height="150px">
    - e.g. Linux pthreads (used in the lab)
- many to many
    - many user level threads mapped to many kernel threads
    - <img src="images/mtom.png" height="300px">
- two level model
    - similar to many to many but allows a user thread to be bound to a kernel thread
    - <img src="images/2level.png" height="300px">

### Threading Issues
- fork() and exec()
    - fork() duplicates the calling process
        - whether the child receives every thread or only the calling thread depends on the system
        - some systems include two versions of fork()
            - one that duplicates all threads
                - child process has the same number of threads as the parent
            - one that duplicates only the calling thread
        - linux only has one version of fork()
            - duplicates only the thread that called fork()
    - exec() replaces the process's memory space
        - all threads other than the one that called exec() are destroyed
        - the calling thread continues, running the new program
        - linux
            - complete replacement of the process's memory space
                - i.e. every thread's code, data, and stack are overwritten
    - solution
        - call fork() then call exec() immediately after
            - duplicating only the calling thread is then sufficient, since exec() replaces the whole process anyway
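
A minimal sketch of the fork()-then-exec() pattern, assuming a POSIX system where `/bin/sh` exists; `run_shell_command` is a hypothetical helper name, not a library function. The child does nothing between fork() and exec(), so it does not matter that only the calling thread was copied.

```C
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `command` in a child process and returns its exit status, or -1 on error. */
int run_shell_command(const char *command)
{
    int status;
    pid_t pid = fork();                 /* child starts with a single thread */

    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {
        /* Child: replace the whole address space right away. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                     /* reached only if exec failed */
    }
    /* Parent: wait for the child to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_shell_command("exit 7")` should return 7, since the shell's exit status is passed back through `waitpid`.
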
- thread cancellation of target thread
    - termination of a thread before it has finished
    - asynchronous cancellation
        - terminates the target thread immediately
        - allocated resources may not be freed
        - status of shared data may be ill-defined
    - deferred cancellation
        - target thread terminates itself
            - periodically checks if it should terminate
        - orderly cancellation can be easily achieved
        - failure to check for cancellation requests may result in issues
    - linux supports both
        - pthread_cancel()
            - see man page for details
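
Deferred cancellation can be sketched with `pthread_cancel`, `pthread_testcancel`, and a cleanup handler. This is an illustrative outline under the default cancellation settings (enabled, deferred); `worker`, `release_buffer`, and `cancel_worker_demo` are hypothetical names.

```C
#include <pthread.h>
#include <stdlib.h>

static int cleanup_ran = 0;

/* Cleanup handler: frees the worker's buffer when the thread is cancelled. */
static void release_buffer(void *buffer)
{
    free(buffer);
    cleanup_ran = 1;
}

/* Worker: loops until cancelled, checking for a request once per iteration. */
static void *worker(void *unused)
{
    (void)unused;
    char *buffer = malloc(64);

    pthread_cleanup_push(release_buffer, buffer);
    for (;;) {
        /* ... one unit of work would go here ... */
        pthread_testcancel();           /* explicit cancellation point */
    }
    pthread_cleanup_pop(1);             /* never reached; pairs with the push above */
    return NULL;
}

/* Returns 1 if the worker was cancelled and its cleanup handler ran. */
int cancel_worker_demo(void)
{
    pthread_t tid;
    void *result;

    cleanup_ran = 0;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_cancel(tid);                /* request only; the worker acts on it later */
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED && cleanup_ran;
}
```

Because the worker only terminates at `pthread_testcancel`, it can always release its buffer first; a loop that never reached a cancellation point would ignore the request, which is the failure mode noted above.
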
- signal handling
    - signals notify a process that an event has occurred
        - e.g. divide by zero, kill, etc.
    - signal handlers process signals
        - e.g. ignore, catch, etc.
        - OS may deliver the signal to the appropriate process
        - OS or process handles the signal
    - signal types
        - synchronous
            - generated by the process
            - something **in** your process caused the signal
                - you can point to the line of code that caused the signal
                - e.g. divide by zero, segmentation fault, etc.
                    - segfault is caused by the assembly instructions load or store
                    - if the address to be accessed is not valid, the OS sends a segfault signal
        - asynchronous
            - generated by the OS or another process
            - something **outside** of your process caused the signal
                - you cannot point to the line of code that caused the signal
                - e.g. kill, ctrl-c, etc.
    - delivery options
        - deliver to the thread to which the signal applies
            - e.g. divide by zero
        - deliver to every thread in the process
            - e.g. ctrl-c
        - deliver to certain threads in the process
        - assign a specific thread to receive all signals for the process
- implicit threading
    - writing correct multi-threaded programs by hand
        - is difficult to write and debug
        - can introduce latency and performance issues
    - compilers and runtime libraries aid in creating and managing threads
        - semi-automatic parallelization
            - compiler identifies loops that can be parallelized
            - compiler generates code to create threads and manage them
            - compiler generates code to synchronize threads
    - some methods of implicit threading
        - thread pools
            - motivation
                - creating a new thread is expensive
                - overshooting the bound on concurrent threads is wasteful
            - create a number of threads at startup where they wait for work
            - number of threads is based on the number of processors
            - advantages
                - faster to service a request with an existing thread than to create a new one
                - allows the number of threads to be bound to the number of processors
        - OpenMP
            - a set of compiler directives and an API for C, C++, and Fortran
            - supports parallel programming in shared memory environments
            - user specifies parallel regions
            - create as many threads as there are cores
            - run for loop in parallel
            - MPI (e.g. the Open MPI implementation) fills a similar role for distributed memory environments
                - i.e. multiple computers won't have shared memory so you need to use message passing
            - tl;dr
                - user tells compiler when, where, and how to parallelize
                - compiler handles most of the menial labor and detail work
        - Grand Central Dispatch, Intel Threading Building Blocks (TBB), java.util.concurrent package
            - libraries that provide thread pools and other threading features
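
The thread pool idea described above can be sketched in a few dozen lines of Pthreads code. This is a simplified illustration, not how any of the listed libraries is implemented: the pool size and queue capacity are hypothetical constants, the queue is not circular (so it accepts at most `QUEUE_CAP` tasks per run), and error handling is omitted.

```C
#include <pthread.h>

#define POOL_SIZE 4      /* hypothetical; the notes suggest sizing by processor count */
#define QUEUE_CAP 64     /* hypothetical fixed capacity */

typedef void (*task_fn)(int);

static task_fn queue_fn[QUEUE_CAP];
static int     queue_arg[QUEUE_CAP];
static int     head = 0, tail = 0;        /* pending tasks occupy [head, tail) */
static int     shutting_down = 0;
static pthread_t workers[POOL_SIZE];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_available = PTHREAD_COND_INITIALIZER;

/* Each worker sleeps until a task arrives, runs it, and goes back to waiting. */
static void *worker_loop(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail && !shutting_down)
            pthread_cond_wait(&work_available, &lock);
        if (head == tail) {               /* shutting down and queue drained */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        task_fn fn = queue_fn[head];
        int arg = queue_arg[head];
        head++;
        pthread_mutex_unlock(&lock);
        fn(arg);                          /* run the task outside the lock */
    }
}

/* Creates the workers once, at startup. */
void pool_start(void)
{
    head = tail = 0;
    shutting_down = 0;
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_create(&workers[i], NULL, worker_loop, NULL);
}

/* Queues a task; returns 1 on success, 0 if the queue is full. */
int pool_submit(task_fn fn, int arg)
{
    int ok = 0;

    pthread_mutex_lock(&lock);
    if (tail < QUEUE_CAP) {
        queue_fn[tail] = fn;
        queue_arg[tail] = arg;
        tail++;
        ok = 1;
        pthread_cond_signal(&work_available);   /* wake one idle worker */
    }
    pthread_mutex_unlock(&lock);
    return ok;
}

/* Lets the workers finish the queued tasks, then joins them. */
void pool_shutdown(void)
{
    pthread_mutex_lock(&lock);
    shutting_down = 1;
    pthread_cond_broadcast(&work_available);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(workers[i], NULL);
}

/* Demo task and driver: each task writes to its own slot, so no extra locking. */
static int squares[QUEUE_CAP];
static void store_square(int n) { squares[n] = n * n; }

int pool_demo(void)
{
    int total = 0;

    pool_start();
    for (int n = 0; n < 10; n++)
        pool_submit(store_square, n);
    pool_shutdown();
    for (int n = 0; n < 10; n++)
        total += squares[n];
    return total;
}
```

Ten requests are serviced by four long-lived threads here, so thread creation cost is paid once at startup rather than once per request.
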

#### OpenMP example
```C
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("In a parallel region\n");
    }
    return 0;
}
```
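- should print "In a parallel region" once per thread in the team, which by default is usually one thread per core
    - requires compiling with OpenMP enabled (e.g. `gcc -fopenmp`)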
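
The "run for loop in parallel" point from the OpenMP notes can be sketched with a `parallel for` directive and a reduction. `sum_of_squares` is a hypothetical function name. If the program is compiled without OpenMP support the pragma is ignored and the loop simply runs serially with the same result.

```C
/* Sums i*i for i = 1..n, splitting the iterations across the thread team. */
long sum_of_squares(int n)
{
    long total = 0;

    /* Each thread accumulates a private copy of `total`; the copies are
       added together when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += (long)i * i;
    return total;
}
```

The `reduction` clause is what handles synchronization here; without it, the threads would race on `total`.
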

### Linux Threads
- Linux refers to them as tasks rather than threads
- relies on the `clone()` system call
    - `clone()` can be used to create a new process or thread 
    - flags
        - `CLONE_VM`
            - allows a child process to share the address space of the parent
        - `CLONE_FILES`
            - allows a child process to share the set of open files with the parent
        - `CLONE_FS`
            - allows a child process to share file system information with the parent (root directory, current working directory, umask)
        - `CLONE_SIGHAND`
            - allows a child process to share the same signal handlers as the parent
### Multicore Processors
- <img src="images/microprocessortrends.png">
- multiple cores on a single chip
- motivation
    - power wall
        - power consumption increases with frequency
    - frequency scaling has hit a wall
        - heat dissipation is the primary limiting factor
            - heat generation is $\propto$ frequency$^3$
    - transistor density scaling
        - transistor density increases with time
        - transistor size decreases with time
        - transistor speed increases with time
        - transistor power consumption decreases with time
        - transistor cost decreases with time
- multicore vs multiprocessor programming
    - same-chip communication is faster
    - shared memory is faster and easier than message passing
- multicore vs multiprocessor vs multicomputer
    - multicore
        - multiple cores on a single chip
        - cores share the same memory
        - cores share the same cache
        - cores share the same bus
        - cores communicate via shared memory
    - multiprocessor
        - multiple processor chips in a single computer
        - processors have their own memory
        - processors have their own cache
        - processors share the same bus
        - processors communicate via message passing
    - multicomputer
        - multiple computers
        - computers have their own memory
        - computers have their own cache
        - computers have their own bus
        - computers communicate via message passing
- code must be written to utilize multiple cores
    - e.g. OpenMP
    - means that simply having more cores does not mean that your program will run faster
    - one of the primary reasons that PCs don't have more cores
        - most programs are not written to take advantage of more cores
- challenges
    - division of work
        - how to divide the work among the cores
        - how to assign work to cores
    - data splitting
        - how to divide the data among the cores
    - data dependency
        - how to handle dependencies between data and concurrent tasks
    - testing and debugging
        - how to test and debug concurrent programs
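
One way to approach the division-of-work and data-splitting challenges is to give each thread a contiguous slice of the data and combine the partial results afterwards. The sketch below assumes Pthreads and a fixed, hypothetical `NUM_WORKERS`; `slice`, `sum_slice`, and `parallel_sum` are illustrative names.

```C
#include <pthread.h>

#define NUM_WORKERS 4    /* hypothetical; could instead match the core count */

struct slice {
    const int *data;
    int begin, end;      /* half-open range [begin, end) owned by one thread */
    long partial;        /* that thread's partial sum */
};

/* Sums only the elements in this thread's slice. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;

    for (int i = s->begin; i < s->end; i++)
        s->partial += s->data[i];
    return NULL;
}

/* Splits data[0..n) into NUM_WORKERS slices and adds up the partial sums. */
long parallel_sum(const int *data, int n)
{
    pthread_t tids[NUM_WORKERS];
    struct slice slices[NUM_WORKERS];
    int chunk = (n + NUM_WORKERS - 1) / NUM_WORKERS;   /* ceiling division */
    long total = 0;

    for (int w = 0; w < NUM_WORKERS; w++) {
        int begin = w * chunk;
        int end = begin + chunk;
        if (begin > n) begin = n;      /* trailing slices may be empty */
        if (end > n) end = n;
        slices[w] = (struct slice){ data, begin, end, 0 };
        pthread_create(&tids[w], NULL, sum_slice, &slices[w]);
    }
    for (int w = 0; w < NUM_WORKERS; w++) {
        pthread_join(tids[w], NULL);
        total += slices[w].partial;    /* combine only after the thread is done */
    }
    return total;
}
```

Each thread writes only to its own `partial` field, so there is no data dependency between the workers and no lock is needed; the dependency is confined to the final combining step.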