Added detail to codegen section #1216

Closed · wants to merge 6 commits into from
39 changes: 29 additions & 10 deletions src/parallel-rustc.md
@@ -15,19 +15,38 @@ use the `parking_lot` crate as well.

## Codegen

Parallel codegen occurs in the `rustc_codegen_ssa::base` module.

There are two underlying thread safe data structures used in code generation:

- `Lrc`
  - [`Arc`][Arc] if `parallel_compiler` is true
  - [`Rc`][Rc] if it is not
- `MetadataRef`
  - A `rustc` version of an [OwningRef][OwningRef]
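The `Lrc` selection can be illustrated with a small standalone sketch. This is simplified: the real alias lives in `rustc_data_structures::sync`, and `parallel_compiler` is a rustc-internal cfg flag, so a plain build takes the `Rc` branch.

```rust
// Simplified sketch of the `Lrc` pattern: pick a reference-counted pointer
// type at compile time based on a cfg flag. (`parallel_compiler` is a
// rustc-internal cfg; outside rustc's build it is unset.)
#[cfg(parallel_compiler)]
pub use std::sync::Arc as Lrc; // thread-safe, atomic reference counting

#[cfg(not(parallel_compiler))]
pub use std::rc::Rc as Lrc; // single-threaded, cheaper reference counting

fn main() {
    // Code written against `Lrc` works the same under either definition.
    let shared: Lrc<Vec<u8>> = Lrc::new(vec![1, 2, 3]);
    let clone = Lrc::clone(&shared); // cheap refcount bump, no deep copy
    assert_eq!(*clone, vec![1, 2, 3]);
}
```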

First, we collect and partition the [monomorphized][monomorphization] version of the program
that is being compiled. The individual partitions are then sorted from largest to smallest.

Once the partitions have been sorted, the smallest and largest halves are iterated over separately.
Their elements are paired and stored in a `Vec` so that the largest
and smallest partitions are first and second, the second largest and smallest are
third and fourth, and so on. Each partition is then handed to the codegen backend,
which translates it into that backend's intermediate representation (LLVM-IR in the
case of `cg_llvm`).
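The pairing scheme described above can be sketched as a small function over CGU sizes. This is a hypothetical illustration of the interleaving, not rustc's actual code; `interleave` is a made-up name.

```rust
// Hypothetical sketch: after sorting CGU sizes largest-first, alternate
// elements from the large half and the small half so that memory-heavy and
// memory-light units are interleaved in the work queue.
fn interleave(mut sizes: Vec<usize>) -> Vec<usize> {
    sizes.sort_unstable_by(|a, b| b.cmp(a)); // largest first
    let mut out = Vec::with_capacity(sizes.len());
    let (mut lo, mut hi) = (0, sizes.len());
    while lo < hi {
        out.push(sizes[lo]); // next largest remaining
        lo += 1;
        if lo < hi {
            hi -= 1;
            out.push(sizes[hi]); // next smallest remaining
        }
    }
    out
}

fn main() {
    // Largest and smallest come first and second, and so on.
    assert_eq!(interleave(vec![5, 1, 4, 2, 3]), vec![5, 1, 4, 2, 3]);
    assert_eq!(interleave(vec![10, 20]), vec![20, 10]);
}
```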



Organizing the partitions in this way is a compromise between throughput and memory consumption.
Initially, they were sorted from largest to smallest to increase thread utilization:
this minimized the number of idle threads, since leaving the largest units until the end
would mean the other threads finish their work early and wait for them. However, it also
meant that all of the largest partitions would be in memory at the same time, increasing
memory consumption and hurting overall performance.

Once the partitions have been organized, they are translated by the codegen backend into
its intermediate representation and then passed to independent instances of LLVM running
in parallel. It is important to note that if `parallel_compiler` is _not_ true, these
translations can only occur on a single thread. This creates a staircase effect where all
of the LLVM threads must wait on a single thread to generate work for them. If
`parallel_compiler` _is_ true, the LLVM queue is loaded in parallel.

At the end, the compiled codegen units are combined together into the final artifact.
For binary executables this is done by the linker. Executables are not the only output
kind, however: rlibs, shared objects, and object files are some of the others.


## Query System
