## Outline

#### [Introduction](00_intro.ipynb) (Aug. 26)
* Course Goals and Culture
* Parallelism
* Modern processors from a programmer's perspective

#### [Parallel Python](01_joblib.ipynb) (Aug. 28)
* A First Program
* Python parallel processes
* Concurrency versus parallelism
* The Global Interpreter Lock and Python threads

#### [Strong Scaling](02_strong_scaling.ipynb) (Sep. 4)
* Amdahl's Law
* Speedup
* Parallel Efficiency
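
For quick reference, here is the standard statement of Amdahl's Law. The notation is chosen for illustration and may differ from the lecture's symbols. Let $p$ be the fraction of the serial running time that can be parallelized and $n$ the number of processors. Assuming the parallel part scales perfectly, speedup and parallel efficiency are

$$
S(n) = \frac{T_1}{T_n} = \frac{1}{(1 - p) + p/n}, \qquad E(n) = \frac{S(n)}{n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}.
$$

For example, with $p = 0.9$ and $n = 8$: $S(8) = 1/(0.1 + 0.1125) \approx 4.71$ and $E(8) \approx 0.59$. No number of processors can push the speedup past 10.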
    
#### [OpenMP](04_open_mp.ipynb) (Sep. 9)
* What is OpenMP (parallel C/Fortran on multicore, shared-memory architectures)
* Serial to Parallel Refactoring
* System overview
    * Preprocessor
    * Library
    * Runtime
* Example `#pragma omp parallel` directive
    * Blocks and scoping
    * Interacting with the environment
* Reference materials: https://hpc-tutorials.llnl.gov/openmp/

#### [CPU Parallelism: Multicore](03_moores_multicore.ipynb) (Sep. 11)
* What is a CPU? (evolution of CPUs, motivation for parallelism, Moore's Law)
* Multicore
    * Shared-memory

#### [Memory Hierarchy](05_memory_access.ipynb) (Sep. 16)
* Latencies
* Cache coherency
* Row versus column order
* LMbench
* **Read**: https://siboehm.com/articles/22/Fast-MMM-on-CPU

#### [Cilk](06_fork_join.ipynb) (Sep. 18)
* work and span
* fork/join parallelism
* loop parallelism
* reducers
* **Read**: Work and Span https://en.wikipedia.org/wiki/Analysis_of_parallel_algorithms
* *Note*: The recorded lecture is incomplete; the first 10 minutes are missing. Sorry.
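
Here is a compact summary of the work/span bounds in the standard notation used in the Cilk literature; the lecture may phrase them differently. Let $T_1$ be the work, $T_\infty$ the span, and $T_P$ the running time on $P$ processors.

$$
T_P \ge \frac{T_1}{P} \quad \text{(work law)}, \qquad T_P \ge T_\infty \quad \text{(span law)}, \qquad \text{parallelism} = \frac{T_1}{T_\infty}.
$$

A greedy scheduler achieves $T_P \le T_1/P + T_\infty$. Speedup is therefore close to linear when $P$ is much smaller than the parallelism $T_1/T_\infty$.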

#### [Loop Parallelism](07_openmp_loops.ipynb) (Sep. 23 and Sep. 25)

* Scoping and thread local variables
* Loop dependencies
* Loop fusion
* Loop fission
* Reductions in OpenMP

#### [Java Thread Programming](08_javathreads.ipynb) (Sep. 30)
  * Fork/join in Java
  * Thread classes and Runnable interfaces

#### [Java Synchronization and Thread Safety](09_synchronized.ipynb) (Oct. 2)
* Fork/join in Java
* Thread classes and Runnable interfaces
* [Example: Synchronization in Java](examples/08_ex_javasycnh.ipynb)
* **COMPLETE** the example exercise. 

#### [Mutual Exclusion](12_mutex.ipynb) (Oct. 7)
* Peterson's algorithm
* Bakery algorithm
* Fast Mutual Exclusion
* **Read**: Herlihy and Shavit. _The Art of Multiprocessor Programming_.
  * Chapter 1: all
  * Chapter 2: 2.1-2.6
  * Chapter 7: 7.1-7.3.
* [Example: Fast Mutex in Java](examples/14_ex_fastmutex.ipynb)
  * Please **COMPLETE** the exercise in the example. Then look at the solutions.
  * [Solution: Fast Mutex in Java](examples/solutions/14_ex_fastmutex_soln.ipynb)
 
#### [Roofline Performance Model](11_roofline.ipynb) (Oct. 9)
* I/O intensity
* Off-chip bandwidth
* I/O-limited and compute-limited kernels
* **Read**: Williams et al. [Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures](https://escholarship.org/uc/item/5tz795vq), CACM, 52(4), 2009.
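
The core bound from the Roofline paper is restated here as a reminder. The paper calls the x-axis quantity *operational intensity*, measured in flops per byte of DRAM traffic.

$$
\text{Attainable GFLOP/s} = \min\left(\text{Peak GFLOP/s},\ \text{Peak memory bandwidth} \times \text{Operational intensity}\right)
$$

As a hypothetical illustration, take a machine with 100 GB/s of memory bandwidth and a 1000 GFLOP/s peak. Its ridge point is at 10 flops/byte. A kernel with an intensity of 0.25 flops/byte on that machine is memory-bound at $\min(1000, 25) = 25$ GFLOP/s.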

#### [Instruction Level Parallelism](12_ILP.ipynb) (Oct. 14)
* Pipelines 
* Vectorization
* Other Sources (out-of-order execution, branch prediction, speculative execution)

#### [Example: Numba JIT Compilation](examples/ex_jit.ipynb) (Oct. 14 or 16)
* **Read**:
    * https://numba.pydata.org/
    * https://numba.readthedocs.io/en/stable/user/5minguide.html
* [Solutions](solutions/ex_jit_soln.ipynb)


## MIDTERM (October 21) material ends here!

All topics prior to Oct. 14 will be covered on the midterm.
     
#### [Factors Against Parallelism](13_factors.ipynb) (Oct. 16)
* Interference
* Skew
* Startup costs
* Overlap
 
#### Midterm (Oct. 21)


#### [Dask Arrays](15_dask_arrays.ipynb) (Oct. 30)
  * Data-parallel and declarative programming
  * Execution graphs 
  * Lazy evaluation
  
#### [Dask Dataframes](16_dask_dataframes.ipynb) (Nov. 4)
  * Parallel pandas
  * Slicing and aggregation
  * Indexing
    
#### [Hadoop Systems and Semantics](17_hadoop.ipynb) (Nov. 6)
  * [Friend of Friends programming example](examples/18_ex_FoF.ipynb)

#### [Introduction to Spark](19_spark.ipynb) (Nov. 11)
  * Resilient Distributed Datasets
  * Caching
  * M/R equivalence
  * Checkpointing
  * __Reading__: [Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI, 2012](https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf)
  
#### [Spark Programming](examples/22_FoF_Spark.ipynb) (Nov. 13)
  * Spark join
  * Using memory
  
#### [Ray](23_Ray.ipynb) (Nov. 18)
* Remote functions
* Distributed objects
* Distributed memory management
* **Reading:** [P. Moritz et al. Ray: A Distributed Framework for Emerging AI Applications. OSDI, 2018.](https://arxiv.org/pdf/1712.05889.pdf)

#### [Ray Actors](24_Ray.ipynb) (Nov. 20)
  * Bulk synchronous parallel
  * barrier synchronization
  * stateful distributed objects 
  * service centers
  * `ray.get()` as a synchronization primitive

#### No Class: Holiday (Nov. 25 and 27)


#### MPI (Dec. 2): Recorded lecture
  * No class.

#### All Reduce (Dec. 4)

#### TBD (Dec. 4)

#### Final Exam (Dec. 12)
  * In class (Hodson 11), 9 am to 12 pm

#### Things to Understand (Study Guide)

These examples and readings embody concepts that I think are important to know. I want to highlight them because they go beyond the treatment in the lecture notes and homework.

* Row/Column example: [row_column.c](./examples/openmp/row_column.c)
* False sharing example: [sharing.c](./examples/openmp/sharing.c)
* Fast matrix multiplication example: https://siboehm.com/articles/22/Fast-MMM-on-CPU
* Cilk: the Integral and Matrix Multiplication examples on [http://preview.speedcode.org/](http://preview.speedcode.org/)
* [Fast mutual exclusion example](examples/10_ex_fastmutex.ipynb)
* Roofline paper Figure 3. [Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures](https://escholarship.org/uc/item/5tz795vq)