Added detail to codegen section #1216

Closed · wants to merge 6 commits into from
39 changes: 29 additions & 10 deletions src/parallel-rustc.md
@@ -15,19 +15,38 @@ use the `parking_lot` crate as well.

## Codegen

Parallel codegen occurs in the `rustc_codegen_ssa::base` module.

There are two underlying thread safe data structures used in code generation:

- `Lrc`
  - [`Arc`][Arc] if `parallel_compiler` is true
  - [`Rc`][Rc] if it is not
- `MetadataRef`
  - A `rustc` version of an [OwningRef][OwningRef]
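The `Lrc` selection can be illustrated with a small standalone sketch. This is simplified: the real alias lives in `rustc_data_structures::sync`, and `parallel_compiler` is a rustc-internal cfg flag, so a plain build takes the `Rc` branch.

```rust
// Simplified sketch of the `Lrc` pattern: pick a reference-counted pointer
// type at compile time based on a cfg flag. (`parallel_compiler` is a
// rustc-internal cfg; outside rustc's build it is unset.)
#[cfg(parallel_compiler)]
pub use std::sync::Arc as Lrc; // thread-safe, atomic reference counting

#[cfg(not(parallel_compiler))]
pub use std::rc::Rc as Lrc; // single-threaded, cheaper reference counting

fn main() {
    // Code written against `Lrc` works the same under either definition.
    let shared: Lrc<Vec<u8>> = Lrc::new(vec![1, 2, 3]);
    let clone = Lrc::clone(&shared); // cheap refcount bump, no deep copy
    assert_eq!(*clone, vec![1, 2, 3]);
}
```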

First, we collect and partition the [monomorphized][monomorphization] version of the program
that is being compiled. The individual partitions are then sorted from largest to smallest.

Once the partitions have been sorted, the smallest and largest halves are iterated over separately.
Their elements are paired and stored in a `Vec` so that the largest
and smallest partitions are first and second, the second largest and smallest are
third and fourth, and so on. Each partition is then handed to the codegen backend,
which translates it into that backend's intermediate representation (LLVM-IR in the
case of `cg_llvm`).
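The pairing scheme described above can be sketched as a small function over CGU sizes. This is a hypothetical illustration of the interleaving, not rustc's actual code; `interleave` is a made-up name.

```rust
// Hypothetical sketch: after sorting CGU sizes largest-first, alternate
// elements from the large half and the small half so that memory-heavy and
// memory-light units are interleaved in the work queue.
fn interleave(mut sizes: Vec<usize>) -> Vec<usize> {
    sizes.sort_unstable_by(|a, b| b.cmp(a)); // largest first
    let mut out = Vec::with_capacity(sizes.len());
    let (mut lo, mut hi) = (0, sizes.len());
    while lo < hi {
        out.push(sizes[lo]); // next largest remaining
        lo += 1;
        if lo < hi {
            hi -= 1;
            out.push(sizes[hi]); // next smallest remaining
        }
    }
    out
}

fn main() {
    // Largest and smallest come first and second, and so on.
    assert_eq!(interleave(vec![5, 1, 4, 2, 3]), vec![5, 1, 4, 2, 3]);
    assert_eq!(interleave(vec![10, 20]), vec![20, 10]);
}
```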



Organizing the partitions in this way is a compromise between throughput and memory consumption.
Initially, they were sorted from largest to smallest to increase thread utilization:
this minimized the number of idle threads, since leaving the largest units until the end
would mean the other threads finish their work early and wait for them. However, it also
meant that all of the largest partitions would be in memory at the same time, increasing
memory consumption and hurting overall performance.

Once the partitions have been organized, they are translated by the codegen backend into
its intermediate representation and then passed to independent instances of LLVM running
in parallel. It is important to note that if `parallel_compiler` is _not_ true, these
translations can only occur on a single thread. This creates a staircase effect where all
of the LLVM threads must wait on a single thread to generate work for them. If
`parallel_compiler` _is_ true, the LLVM queue is loaded in parallel.

At the end, the compiled codegen units are combined together into the final artifact.
For binary executables this is done by the linker. Executables are not the only output
kind, however: rlibs, shared objects, and object files are some of the others.


## Query System
