From 09862a7c00442d49a2f6c8472dc4117d6356820b Mon Sep 17 00:00:00 2001
From: Tshepang Mbambo
Date: Sat, 29 Nov 2025 02:05:46 +0200
Subject: [PATCH 1/2] sembr backend/libs-and-metadata.md

---
 src/backend/libs-and-metadata.md | 146 ++++++++++++++++---------------
 1 file changed, 77 insertions(+), 69 deletions(-)

diff --git a/src/backend/libs-and-metadata.md b/src/backend/libs-and-metadata.md
index 44b647d28..4cf0dfa6a 100644
--- a/src/backend/libs-and-metadata.md
+++ b/src/backend/libs-and-metadata.md
@@ -1,36 +1,41 @@
 # Libraries and metadata
 
 When the compiler sees a reference to an external crate, it needs to load some
-information about that crate. This chapter gives an overview of that process,
+information about that crate.
+This chapter gives an overview of that process,
 and the supported file formats for crate libraries.
 
 ## Libraries
 
-A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A
-key point of these file formats is that they contain `rustc`-specific
-[*metadata*](#metadata). This metadata allows the compiler to discover enough
+A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file.
+A key point of these file formats is that they contain `rustc`-specific
+[*metadata*](#metadata).
+This metadata allows the compiler to discover enough
 information about the external crate to understand the items it contains,
 which macros it exports, and *much* more.
 
 ### rlib
 
-An `rlib` is an [archive file], which is similar to a tar file. This file
-format is specific to `rustc`, and may change over time. This file contains:
+An `rlib` is an [archive file], which is similar to a tar file.
+This file format is specific to `rustc`, and may change over time.
+This file contains:
 
-* Object code, which is the result of code generation. This is used during
-  regular linking. There is a separate `.o` file for each [codegen unit]. The
-  codegen step can be skipped with the [`-C
+* Object code, which is the result of code generation.
+  This is used during regular linking.
+  There is a separate `.o` file for each [codegen unit].
+  The codegen step can be skipped with the [`-C
   linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o`
   file will only contain LLVM bitcode.
 * [LLVM bitcode], which is a binary representation of LLVM's intermediate
-  representation, which is embedded as a section in the `.o` files. This can
-  be used for [Link Time Optimization] (LTO). This can be removed with the
+  representation, which is embedded as a section in the `.o` files.
+  This can be used for [Link Time Optimization] (LTO).
+  This can be removed with the
   [`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times
   and reduce disk space if LTO is not needed.
 * `rustc` [metadata], in a file named `lib.rmeta`.
 * A symbol table, which is essentially a list of symbols with offsets to the
-  object files that contain that symbol. This is pretty standard for archive
-  files.
+  object files that contain that symbol.
+  This is pretty standard for archive files.
 
 [archive file]: https://en.wikipedia.org/wiki/Ar_(Unix)
 [LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html
@@ -41,46 +46,46 @@ format is specific to `rustc`, and may change over time. This file contains:
 
 ### dylib
 
-A `dylib` is a platform-specific shared library. It includes the `rustc`
-[metadata] in a special link section called `.rustc`.
+A `dylib` is a platform-specific shared library.
+It includes the `rustc` [metadata] in a special link section called `.rustc`.
 
 ### rmeta
 
-An `rmeta` file is a custom binary format that contains the [metadata] for the
-crate. This file can be used for fast "checks" of a project by skipping all code
+An `rmeta` file is a custom binary format that contains the [metadata] for the crate.
+This file can be used for fast "checks" of a project by skipping all code
 generation (as is done with `cargo check`), collecting enough information for
 documentation (as is done with `cargo doc`), or for [pipelining](#pipelining).
 This file is created if the [`--emit=metadata`][emit] CLI option is used.
-`rmeta` files do not support linking, since they do not contain compiled
-object files.
+`rmeta` files do not support linking, since they do not contain compiled object files.
 
 [emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit
 
 ## Metadata
 
-The metadata contains a wide swath of different elements. This guide will not go
-into detail about every field it contains. You are encouraged to browse the
+The metadata contains a wide swath of different elements.
+This guide will not go into detail about every field it contains.
+You are encouraged to browse the
 [`CrateRoot`] definition to get a sense of the different elements it contains.
-Everything about metadata encoding and decoding is in the [`rustc_metadata`]
-package.
+Everything about metadata encoding and decoding is in the [`rustc_metadata`] package.
 
 Here are a few highlights of things it contains:
 
-* The version of the `rustc` compiler. The compiler will refuse to load files
-  from any other version.
-* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the
-  correct dependency is loaded.
-* The [Stable Crate Id](#stable-crate-id). This is a hash used
-  to identify crates.
-* Information about all the source files in the library. This can be used for
-  a variety of things, such as diagnostics pointing to sources in a
+* The version of the `rustc` compiler.
+  The compiler will refuse to load files from any other version.
+* The [Strict Version Hash](#strict-version-hash) (SVH).
+  This helps ensure the correct dependency is loaded.
+* The [Stable Crate Id](#stable-crate-id).
+  This is a hash used to identify crates.
+* Information about all the source files in the library.
+  This can be used for a variety of things, such as diagnostics pointing to sources in a
   dependency.
-* Information about exported macros, traits, types, and items. Generally,
-  anything that's needed to be known when a path references something inside a
-  crate dependency.
-* Encoded [MIR]. This is optional, and only encoded if needed for code
-  generation. `cargo check` skips this for performance reasons.
+* Information about exported macros, traits, types, and items.
+  Generally,
+  anything that's needed to be known when a path references something inside a crate dependency.
+* Encoded [MIR].
+  This is optional, and only encoded if needed for code generation.
+  `cargo check` skips this for performance reasons.
 
 [`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html
 [`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
@@ -89,10 +94,10 @@ Here are a few highlights of things it contains:
 ### Strict Version Hash
 
 The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit
-hash that is used to ensure that the correct crate dependencies are loaded. It
-is possible for a directory to contain multiple copies of the same dependency
-built with different settings, or built from different sources. The crate
-loader will skip any crates that have the wrong SVH.
+hash that is used to ensure that the correct crate dependencies are loaded.
+It is possible for a directory to contain multiple copies of the same dependency
+built with different settings, or built from different sources.
+The crate loader will skip any crates that have the wrong SVH.
 
 The SVH is also used for the [incremental compilation] session filename,
 though that usage is mostly historic.
@@ -114,14 +119,15 @@ See [`compute_hir_hash`] for where the hash is actually computed.
 ### Stable Crate Id
 
 The [`StableCrateId`] is a 64-bit hash used to identify different crates with
-potentially the same name. It is a hash of the crate name and all the
-[`-C metadata`] CLI options computed in [`StableCrateId::new`]. It is
-used in a variety of places, such as symbol name mangling, crate loading, and
+potentially the same name.
+It is a hash of the crate name and all the
+[`-C metadata`] CLI options computed in [`StableCrateId::new`].
+It is used in a variety of places, such as symbol name mangling, crate loading, and
 much more.
 
 By default, all Rust symbols are mangled and incorporate the stable crate id.
-This allows multiple versions of the same crate to be included together. Cargo
-automatically generates `-C metadata` hashes based on a variety of factors, like
+This allows multiple versions of the same crate to be included together.
+Cargo automatically generates `-C metadata` hashes based on a variety of factors, like
 the package version, source, and target kind (a lib and test can have the same
 crate name, so they need to be disambiguated).
 
@@ -131,30 +137,31 @@ crate name, so they need to be disambiguated).
 
 ## Crate loading
 
-Crate loading can have quite a few subtle complexities. During [name
-resolution], when an external crate is referenced (via an `extern crate` or
+Crate loading can have quite a few subtle complexities.
+During [name resolution], when an external crate is referenced (via an `extern crate` or
 path), the resolver uses the [`CStore`] which is responsible for finding
-the crate libraries and loading the [metadata] for them. After the dependency
-is loaded, the `CStore` will provide the information the resolver needs
+the crate libraries and loading the [metadata] for them.
+After the dependency is loaded, the `CStore` will provide the information the resolver needs
 to perform its job (such as expanding macros, resolving paths, etc.).
 
 To load each external crate, the `CStore` uses a [`CrateLocator`] to
-actually find the correct files for one specific crate. There is some great
-documentation in the [`locator`] module that goes into detail on how loading
+actually find the correct files for one specific crate.
+There is some great documentation in the [`locator`] module that goes into detail on how loading
 works, and I strongly suggest reading it to get the full picture.
 
-The location of a dependency can come from several different places. Direct
-dependencies are usually passed with `--extern` flags, and the loader can look
-at those directly. Direct dependencies often have references to their own
-dependencies, which need to be loaded, too. These are usually found by
+The location of a dependency can come from several different places.
+Direct dependencies are usually passed with `--extern` flags, and the loader can look
+at those directly.
+Direct dependencies often have references to their own dependencies, which need to be loaded, too.
+These are usually found by
 scanning the directories passed with the `-L` flag for any file whose metadata
-contains a matching crate name and [SVH](#strict-version-hash). The loader
-will also look at the [sysroot] to find dependencies.
+contains a matching crate name and [SVH](#strict-version-hash).
+The loader will also look at the [sysroot] to find dependencies.
 
 As crates are loaded, they are kept in the [`CStore`] with the crate metadata
-wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the
-`CStore` will make its way into the [`GlobalCtxt`] for the rest of the
-compilation.
+wrapped in the [`CrateMetadata`] struct.
+After resolution and expansion, the
+`CStore` will make its way into the [`GlobalCtxt`] for the rest of the compilation.
 
 [name resolution]: ../name-resolution.md
 [`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html
@@ -167,20 +174,21 @@ compilation.
 ## Pipelining
 
 One trick to improve compile times is to start building a crate as soon as the
-metadata for its dependencies is available. For a library, there is no need to
-wait for the code generation of dependencies to finish. Cargo implements this
-technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
-dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will
-save the `rmeta` file to disk before it continues to the code generation
-phase. The compiler sends a JSON message to let the build tool know that it
+metadata for its dependencies is available.
+For a library, there is no need to wait for the code generation of dependencies to finish.
+Cargo implements this technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
+dependency as well as an [`rlib`](#rlib).
+As early as it can, `rustc` will
+save the `rmeta` file to disk before it continues to the code generation phase.
+The compiler sends a JSON message to let the build tool know that it
 can start building the next crate if possible.
 
 The [crate loading](#crate-loading) system is smart enough to know when it
-sees an `rmeta` file to use that if the `rlib` is not there (or has only been
-partially written).
+sees an `rmeta` file to use that if the `rlib` is not there (or has only been partially written).
 
 This pipelining isn't possible for binaries, because the linking phase will
-require the code generation of all its dependencies. In the future, it may be
+require the code generation of all its dependencies.
+In the future, it may be
 possible to further improve this scenario by splitting linking into a separate
 command (see [#64191]).
 
From caafc24171530b2ee1321a19a69796f354325e32 Mon Sep 17 00:00:00 2001
From: Tshepang Mbambo
Date: Sat, 29 Nov 2025 02:24:02 +0200
Subject: [PATCH 2/2] link text spanning separate lines is awkward

---
 src/backend/libs-and-metadata.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/backend/libs-and-metadata.md b/src/backend/libs-and-metadata.md
index 4cf0dfa6a..4a0b75c69 100644
--- a/src/backend/libs-and-metadata.md
+++ b/src/backend/libs-and-metadata.md
@@ -23,9 +23,8 @@ This file contains:
 * Object code, which is the result of code generation.
   This is used during regular linking.
   There is a separate `.o` file for each [codegen unit].
-  The codegen step can be skipped with the [`-C
-  linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o`
-  file will only contain LLVM bitcode.
+  The codegen step can be skipped with the [`-C linker-plugin-lto`][linker-plugin-lto] CLI option,
+  which means each `.o` file will only contain LLVM bitcode.
 * [LLVM bitcode], which is a binary representation of LLVM's intermediate
   representation, which is embedded as a section in the `.o` files.
   This can be used for [Link Time Optimization] (LTO).