## Outline

#### [Introduction](00_intro.ipynb) (Aug. 30)
* Parallelism
* Concurrency versus Parallelism

#### [Parallel Python](01_joblib.ipynb) (Aug. 30)
* A First Program
* Python parallel processes
* Concurrency versus parallelism
* The Global Interpreter Lock and Python threads

#### [Strong Scaling](02_strong_scaling.ipynb) (Sep. 6)
* Amdahl's Law
* Speedup
* Parallel Efficiency
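
The three quantities above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the notebook: the function names and the 90% parallel fraction are assumptions chosen for the example. It uses the standard statement of Amdahl's Law, in which a fraction `p` of the work parallelizes perfectly across `n` workers.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n workers when a fraction p of the work is parallelizable."""
    serial_fraction = 1.0 - p
    return 1.0 / (serial_fraction + p / n)


def parallel_efficiency(p: float, n: int) -> float:
    """Speedup divided by the number of workers (1.0 means perfect scaling)."""
    return amdahl_speedup(p, n) / n


if __name__ == "__main__":
    # Hypothetical workload: 90% of the runtime is parallelizable.
    for n in (1, 2, 4, 8, 16):
        s = amdahl_speedup(0.9, n)
        e = parallel_efficiency(0.9, n)
        print(f"n={n:2d}  speedup={s:5.2f}  efficiency={e:4.2f}")
```

With `p = 0.9` the speedup can never exceed 10 no matter how many workers are added, so efficiency falls steadily as `n` grows.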
    
#### [CPU Parallelism: Multicore](03_moores_multicore.ipynb) (Sep. 11)
* What is a CPU? (evolution of CPUs, motivation for parallelism, Moore's Law)
* Multicore
    * Shared-memory
    
#### [OpenMP](04_open_mp.ipynb) (Sep. 11)
* What is OpenMP (parallel C/Fortran on multicore, shared-memory architectures)
* Serial to Parallel Refactoring
* System overview
    * Preprocessor
    * Library
    * Runtime
* Example: the `#pragma omp parallel` directive
    * Blocks and scoping
    * Interacting with the environment
* Reference materials: https://hpc-tutorials.llnl.gov/openmp/

#### [Memory Hierarchy](05_cache_hierarchy.ipynb) (Sep. 13)
* Latencies
* Cache coherency
* Row versus column order
* LMbench
* **Read**: https://siboehm.com/articles/22/Fast-MMM-on-CPU

#### [Cilk](06_fork_join.ipynb) (Sep. 18)
* work and span
* fork/join parallelism
* loop parallelism
* work stealing and scheduling
* **Read**: Work and Span https://en.wikipedia.org/wiki/Analysis_of_parallel_algorithms

#### [Loop Parallelism](07_openmp_loops.ipynb) (Sep. 20)

* Scoping and thread local variables
* Loop dependencies
* Loop fusion
* Loop fission
* False sharing
* Reductions in OpenMP
    

#### [CPU Parallelism: ILP](08_ILP.ipynb) (Sep. 25)
* ILP
    * Pipelines 
    * Vectorization
    * Other Sources (out-of-order execution, branch prediction, speculative execution)

#### [Exercise Numba Jit Compilation](examples/ex_jit.ipynb) (Sep. 25)
* **Read**:
    * https://numba.pydata.org/
    * https://numba.readthedocs.io/en/stable/user/5minguide.html
* [Solutions](solutions/ex_jit_soln.ipynb)

#### [Factors Against Parallelism](09_factors.ipynb) (Sep. 27)
* Interference
* Skew
* Startup costs
* Overlap

#### [Processes and Threads](10_processthread.ipynb) (Sep. 27)
* OS processes
* OS threads
* virtual memory
* simultaneous multi-threading

#### [Vector Programming](11_vectorization.ipynb) (Oct. 2)
* vector processing
* assembly code (overview)
* vector registers
* programming with intrinsics
* (see examples on godbolt linked in notebooks)


#### [Java Thread Programming](12_javathreads.ipynb) (Oct. 4)
  * fork/join in java
  * Thread classes and Runnable interfaces

#### [Java Synchronization and Thread Safety](13_synchronization.ipynb) (Oct. 4)
* fork/join in java
* Thread classes and Runnable interfaces
* [Example: Synchronization in Java](examples/13_ex_javasycnh.ipynb)
    * [Solution: Synchronization in Java](examples/solutions/13_ex_javasynch.ipynb)
    
#### [Mutual Exclusion](14_mutex.ipynb) (Oct. 9)
* Peterson's algorithm
* Bakery algorithm
* Fast Mutual Exclusion
* **Read**: Herlihy and Shavit. _Art of Multiprocessor Programming_. 
  * Chapter 1: all
  * Chapter 2: 2.1-2.6
  * Chapter 7: 7.1-7.3.
* [Example: Fast Mutex in Java](examples/14_ex_fastmutex.ipynb)
  * [Solution: Fast Mutex in Java](examples/solutions/14_ex_fastmutex_soln.ipynb)
  
#### [Roofline Performance Model](15_roofline.ipynb) (Oct. 11)
* I/O intensity
* Off-chip bandwidth
* I/O and compute limited kernels
* **Read**: Williams et al. [Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures](https://escholarship.org/uc/item/5tz795vq), CACM, 52(4), 2009.
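
The roofline bound from the reading can be sketched as follows. This is an illustrative sketch only: the function names and the machine numbers (100 GFLOP/s peak, 25 GB/s off-chip bandwidth) are hypothetical, and "intensity" here means floating-point operations per byte moved to or from memory, which Williams et al. call operational intensity.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Intensity (flops/byte) at which a kernel stops being bandwidth limited."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine, chosen only for illustration.
    PEAK, BW = 100.0, 25.0
    for intensity in (0.25, 1.0, 4.0, 16.0):
        bound = attainable_gflops(PEAK, BW, intensity)
        kind = "I/O limited" if intensity < ridge_point(PEAK, BW) else "compute limited"
        print(f"intensity={intensity:5.2f} flops/byte  bound={bound:6.1f} GFLOP/s  ({kind})")
```

Kernels whose intensity lies below the ridge point are limited by off-chip bandwidth. Those at or above it are limited by peak compute.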

#### [Distributed Memory Architectures](16_hpc_cloud.ipynb) (Oct. 16)
* Distributed Memory Programming
    * Message Passing
    * Remote Procedure Calls
* Client/Server and Service-Oriented Architectures
* Clusters and Supercomputers
* Clouds and Frameworks
    
#### [Hadoop Example](18_hadoop.ipynb) (Oct. 18)
* WordCount example
* Java programming
* Toolchain
* [FoF In-Class Example](examples/ex_FoF.ipynb)
  
#### [Hadoop Systems and Semantics](19_MRsystems.ipynb) (Oct. 23)
  * [Friend of Friends programming example](examples/18_ex_FoF.ipynb)
  
#### [Dask Arrays](20_dask_arrays.ipynb) (Nov. 1)
  * Data parallel and declarative programming
  * Execution graphs 
  * Lazy evaluation
  
#### [Dask Dataframes](21_dask_dataframes.ipynb) (Nov. 1)
  * Parallel pandas
  * Slicing and aggregation
  * Indexing



#### [Introduction to Spark](22_spark.ipynb) (Nov. 6)
  * Resilient Distributed Datasets
  * Caching
  * M/R equivalence
  * Checkpointing
  * **Read**: Zaharia et al. [Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing](https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf), NSDI, 2012.
  
#### [Spark Programming](examples/22_FoF_Spark.ipynb) (Nov. 8)
  * Spark join
  * Using memory
  
#### [Ray](23_Ray.ipynb) (Nov. 13)
* Remote functions
* Distributed objects
* Distributed memory management
* **Read**: P. Moritz et al. [Ray: A Distributed Framework for Emerging AI Applications](https://arxiv.org/pdf/1712.05889.pdf), OSDI, 2018.

#### [Ray Actors](24_Ray.ipynb) (Nov. 15)
  * Bulk synchronous parallel
  * barrier synchronization
  * stateful distributed objects 
  * service centers
  * `ray.get()` as a synchronization primitive

#### Not presented or organized


#### Tool 5: MPI
* Send/Receive
* Deadlock
* Collective Operations
#### Concept 13: Deadlock
#### Concept 14: Data-Parallel Cloud Programming
* Map/Reduce: I/O Streaming
* In-Memory Computing
* Parallel Python
#### Concept 15: Distributed Arrays
#### Tool 6: Dask
#### Concept 16: Distributed Data-Frames
#### Concept 17: Resilient-Distributed Data Structures
* Lineage
* Recovery
* Checkpoints
#### Tool 7: Spark
#### Concept 18: Barriers and BSP
#### Concept 19: Remote Functions
#### Tool 8: Ray
#### Concept 20: Actors
  * TODO fix deadlock example or abandon


#### Concepts that didn't make it yet

* Simultaneous multithreading

#### What happened to Numba?