Added detail to codegen section #1216
Changes from all commits
d335642
a0296a5
650d401
6e505b2
0a94bf7
2fd8ec9
@@ -15,19 +15,38 @@ use the `parking_lot` crate as well.

## Codegen

Parallel codegen occurs in the `rustc_codegen_ssa::base` module.

There are two underlying thread safe data structures used in code generation:

- `Lrc`
    - Which is an [`Arc`][Arc] if `parallel_compiler` is true, and a [`Rc`][Rc]
      if it is not.
- `MetadataRef` -> [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef]
    - This data structure is specific to `rustc`.

During [monomorphization][monomorphization] the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary. This process
occurs in the `rustc_codegen_ssa::base` module.
    - [`Arc`][Arc] if `parallel_compiler` is true
    - [`Rc`][Rc] if it is not
- `MetadataRef`
    - A `rustc` version of an [OwningRef][OwningRef]
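
To illustrate the `Lrc` entry in the list above, here is a minimal sketch of a pointer alias selected at compile time. The `cfg` name mirrors the `parallel_compiler` flag named in the text, but the way this snippet wires it up is an assumption for illustration, not rustc's actual definition, and `share` is a hypothetical helper.

```rust
// Sketch only: a stand-in for an `Lrc`-style alias, not rustc's real code.
// Assumption: a custom `parallel_compiler` cfg is passed to the compiler
// (for example with `--cfg parallel_compiler`); without it, `Rc` is used.
#[cfg(parallel_compiler)]
pub type Lrc<T> = std::sync::Arc<T>;

#[cfg(not(parallel_compiler))]
pub type Lrc<T> = std::rc::Rc<T>;

/// Hypothetical helper: wraps `bytes` and returns two handles to the same data.
pub fn share(bytes: Vec<u8>) -> (Lrc<Vec<u8>>, Lrc<Vec<u8>>) {
    let first = Lrc::new(bytes);
    // Cloning copies the pointer and bumps the reference count; the
    // underlying `Vec` is not duplicated.
    let second = Lrc::clone(&first);
    (first, second)
}
```

Code written against the alias compiles in both configurations, because `Arc` and `Rc` expose the same `new`, `clone`, and `strong_count` operations; only the thread-safety guarantees differ.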

First, we collect and partition the [monomorphized][monomorphization] version of the program
that has been compiled. The individual partitions are then sorted from largest to smallest.
Once the partitions have been sorted, the smallest and largest halves are iterated over separately.
Their elements are paired and stored in a `Vec` so that the largest
and smallest partitions are first and second, the second largest and smallest are
third and fourth, and so on. These partitions are then translated into LLVM-IR.

Review comment: I think it is misleading to mention LLVM IR explicitly, especially now that there are 5+ rustc codegens for all kinds of IRs. It should mention that a codegen backend is invoked to translate the cgu to its particular IR, especially because you don't mention cg_llvm, only cg_ssa, and cg_ssa does not do any LLVM things.

Review comment: I also don't think it's all that important to talk about the particularities of how the order of the CUs to translate is established. It suffices to say, I feel, that the compiler partitions and orders the CUs in an effort to [... whatever the goals are at the given time...]

Organizing the partitions in this way is a compromise between throughput and memory consumption.
Initially, they were sorted from largest to smallest to increase thread utilization.
This minimized the amount of idle threads, as larger units at the end meant more threads
finishing their work early and waiting for the others to finish. However, this meant that all of
the largest partitions would be in memory at the same time; increasing memory consumption and
impacting overall performance.
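
The ordering described in the diff (sort by size, then alternate between the largest and smallest remaining partitions) can be sketched as follows. This is an illustration under stated assumptions rather than the compiler's code: partitions are modelled as plain size estimates, and `interleave_by_size` is a hypothetical name.

```rust
/// Hypothetical helper: takes partition size estimates and returns them in
/// the order largest, smallest, second largest, second smallest, and so on.
pub fn interleave_by_size(mut sizes: Vec<usize>) -> Vec<usize> {
    // Sort from largest to smallest.
    sizes.sort_unstable_by(|a, b| b.cmp(a));

    // Split into a larger half and a smaller half. With an odd count the
    // larger half gets the extra (median) element.
    let mid = (sizes.len() + 1) / 2;
    let (larger, smaller) = sizes.split_at(mid);

    // Walk the larger half from the front and the smaller half from the
    // back, so each big partition is followed by a small one.
    let mut ordered = Vec::with_capacity(sizes.len());
    let mut small_iter = smaller.iter().rev();
    for &big in larger {
        ordered.push(big);
        if let Some(&small) = small_iter.next() {
            ordered.push(small);
        }
    }
    ordered
}
```

For sizes 1 through 6 this yields `6, 1, 5, 2, 4, 3`, so the first units handed out mix large and small work instead of placing all of the largest units in flight at once.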

Once the partitions have been organized they must be translated into LLVM-IR, where they are

Review comment: same thing as above

then passed to independent instances of LLVM running in parallel. It is important to note
that if `parallel_compiler` is _not_ true, these translations can only occur on a single thread.
This creates a staircase effect where all of the LLVM threads must wait on a single
thread to generate work for them. If `parallel_compiler` _is_ true, the LLVM queue is
loaded in parallel.
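
The difference between one thread loading the queue and several threads loading it can be modelled with standard-library channels. The sketch below is a rough analogy only: `translate` and `load_queue` are hypothetical names, codegen units are plain integers, and nothing here reflects how the compiler actually schedules work.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for translating one codegen unit into backend IR.
fn translate(cgu: usize) -> String {
    format!("ir-for-cgu-{cgu}")
}

/// Translates `cgus` on up to `producers` threads and pushes every result
/// onto one shared queue. `producers == 1` models the single-threaded case,
/// where all downstream workers wait on a single thread for work.
pub fn load_queue(cgus: &[usize], producers: usize) -> Vec<String> {
    let producers = producers.max(1);
    let (tx, rx) = mpsc::channel();
    // Ceiling division, so each producer gets a contiguous slice.
    let chunk = ((cgus.len() + producers - 1) / producers).max(1);

    thread::scope(|s| {
        for part in cgus.chunks(chunk) {
            let tx = tx.clone();
            s.spawn(move || {
                for &cgu in part {
                    tx.send(translate(cgu)).expect("receiver is alive");
                }
            });
        }
    });

    // Drop the last sender so that draining the receiver terminates.
    drop(tx);
    let mut queued: Vec<String> = rx.iter().collect();
    // Arrival order depends on scheduling, so sort for a stable result.
    queued.sort();
    queued
}
```

Both configurations produce the same set of translated units; what changes is how quickly the queue fills, which is the staircase effect the paragraph describes.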

At the end, the linker is ran and combines all the compiled codegen units together into one binary.

Review comment: nit:
Suggested change

Review comment:
Suggested change
Executable binary is not the only kind of output we have.

Review comment: @nagisa I think your suggestion may be missing the words "are combined together" or similar.

## Query System