Augment build json messages for app-bundle post-processing #13612

dvdhrm · 2024-03-21T10:31:39Z

Hi

We are working on tools to produce final application artifacts from Rust applications, allowing distribution of Rust applications through AppStores and other application hubs. The following list of target formats gives a rough idea of the scope:

Producing Foobar.app application bundles for macOS, code-signing with suitable Apple provisioning information and certificates. On top, creating Foobar.pkg, Foobar.dmg, or Foobar.zip product/image/archive builds for distribution. (+analogous approach for iOS)
Producing Foobar.msix application bundles for Windows, signed and augmented with suitable metadata ready to distribute on the Windows Store.
Producing Foobar.apk Android packages ready for execution on Android devices. On top, creating Foobar.aab Android bundles for distribution on the Play Store.
Producing Flatpak repositories ready for distribution on Fedora-style Linux systems.
Producing Foobar.snap for distribution on Ubuntu-style Linux systems.
...

Design

Our strategy is to put Cargo at the center, and ideally make cargo build produce the artifacts the user desires. Since Cargo does not support this yet, we provide our own cargo subcommand, as is common with such extensions. We want Cargo.toml as the root-level configuration, possibly extended with package.metadata keys for whatever cannot be deduced from Cargo metadata. However, we want the process to deduce all information from the Cargo package, and only require metadata configuration if the user desires full control over specific parts of the build. We also want all of this to work on stable, without relying on unstable features (though the user is of course free to use them).

Whether Cargo ever gains capabilities to produce such artifacts is not important, but we still model the process to allow moving required steps/configuration piece by piece from the custom subcommand to Cargo, if possible. (e.g., make Cargo emit universal binaries on macOS, stripping this process from our build).

Application bundles differ greatly across platforms, but they all require some kind of application entry-point(s). Hence, our build process is rooted in a single Cargo package which must provide the entry-point. This package is built and bundled up with all the artifacts produced as part of the build. Unstable Cargo bindeps allow full freedom on splitting such builds into multiple executables, shared objects, or other artifacts, if desired.

The custom subcommand currently starts by parsing cargo metadata of the target package, building a dependency tree of involved crates, parsing relevant metadata of each involved crate, and thus has a pretty good overview of what will end up in the build. It then invokes cargo rustc --lib --crate-type [..] for each target-architecture required for the desired application bundle. It will collect all produced artifacts, possibly post-process matching binaries into fat/universal-binaries, and then package all data into the application bundle. Afterwards, it will produce the desired archive formats from the application bundle for distribution.
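The collection step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual code: the helper names `cargo_rustc_args` and `compiler_artifacts` are hypothetical, and the elided `--crate-type` value is left as a parameter. It assumes Cargo's `--message-format=json` output, which prints one JSON object per line, each carrying a `reason` field.

```python
import json


def cargo_rustc_args(target_triple, crate_type):
    """Build the argument list for one per-target build (not executed here)."""
    return [
        "cargo", "rustc", "--lib",
        "--crate-type", crate_type,
        "--target", target_triple,
        "--message-format=json",
    ]


def compiler_artifacts(stdout_text):
    """Yield every `compiler-artifact` message found in Cargo's JSON output."""
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON message
        message = json.loads(line)
        if message.get("reason") == "compiler-artifact":
            yield message
```

The argument list would typically be run once per target triple, for example through `subprocess.run(args, capture_output=True, text=True)`, with the resulting `stdout` passed to `compiler_artifacts`.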

Struggles

The tool as described works fine and can produce suitable artifacts, and the design seems to work out. Yet, there are some things we have to solve via rather ugly heuristics, where we would love for Cargo to provide us more information:

Artifact Types: compiler-artifact messages tell us the files a build-target emitted, but do not tell us what the files represent. We have to judge based on build-type, crate-type, and file-extension whether a given file is of interest (e.g., *.rmeta files are not of interest to us, but *.so or split-debuginfo files certainly are).

This is particularly annoying for executables, since they often lack file-extensions and can even override the output filename via configuration (which is not reported in the metadata). We use the executable: ... key in the compiler-artifact message to improve the heuristics.

Ideally, we would be able to tell Cargo which artifact-types we can make use of, and it would only report those, ideally even telling us the type of each artifact.
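To make the kind of heuristic discussed here concrete, the following is a minimal sketch, assuming a `compiler-artifact` message already parsed into a Python dict. The `classify` function and the extension table are illustrative guesses, not something Cargo defines and not the tool's real logic. Only the `filenames` and `executable` keys come from the actual message.

```python
import os

# Illustrative extension table (an assumption, not defined by Cargo).
EXTENSION_KINDS = {
    ".so": "shared-library",
    ".dylib": "shared-library",
    ".dll": "shared-library",
    ".pdb": "debuginfo",
    ".rmeta": "ignored",
    ".d": "ignored",
}


def classify(message):
    """Guess a kind for every file listed in one compiler-artifact message."""
    executable = message.get("executable")
    kinds = {}
    for path in message.get("filenames", []):
        if executable is not None and path == executable:
            # The `executable` key is the only explicit hint Cargo gives.
            kinds[path] = "executable"
            continue
        extension = os.path.splitext(path)[1].lower()
        kinds[path] = EXTENSION_KINDS.get(extension, "unknown")
    return kinds
```

Everything that is not covered by the `executable` key falls back to the file extension, which is exactly the part that breaks when a file name happens to end in an extension from the table.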

Also note that bigger applications will often split into many shared-objects and executables, for reasons like faster ELF/PE load times, delayed distribution, or licensing. And it is very convenient to simply pick up all artifacts Cargo reports, rather than requiring users to configure builds for each part, possibly duplicating it for each supported bundle-type, making this process very fragile.

Entrypoint: The main entry-point of applications can be either in shared-objects, loaded by the system with some known symbol(s), or a standard executable with a known start address. It depends on how the system was designed. Yet, we want to support both such styles from a single code-base. The common approach is to provide a src/lib.rs plus src/bin/runner.rs, and the build-system builds the target suitable for the selected system.

Unfortunately, src/bin/runner.rs cannot be guarded based on platform, so it has to be implemented for each system, even if it makes no sense on the system, thus leading to unnecessary stubs. Furthermore, it can be ambiguous which binary provides the entry-point, if the package has multiple ones. Lastly, it feels unnecessarily complex to have to pick a different build-target, if there is no technical requirement for it, and entry-point definitions end up in different files, depending on the target.

While it is certainly great to allow users to create a different build-target for each platform, we did not see it as a suitable default. Hence, we instead always build src/lib.rs and expect it to provide the entry-points for all targets. We use --crate-type {bin,cdylib} to ensure the correct artifact is produced. This is especially nice if the application employs cross-platform frameworks, since you can now simply use their macros to generate the entry-points in lib.rs and be done. No need to create stub src/bin/....

Unfortunately, such executables are not reported as executable: ... in compiler-artifact messages, making them awkward to find. Furthermore, cargo run will not run them, even if no other binary build-target exists, nor does it support a --lib flag (and cargo run is just very convenient during development, even without bundling the application).

Proposals

If Cargo would report filetypes: [] alongside filenames: [] in its compiler-artifact message, we would have a much easier time figuring out which files we are interested in, and it would certainly make false positives much less likely. Whether this would use short keys like dll, so, exe, debuginfo, rmeta, ..., or whether it reports mime-types, we do not mind.
If libraries built as --crate-type bin would be reported via the executable: ... key, we would be happy with the entry-point design. I implemented this as core/compiler: report executable paths for all binary crates #13605. We would also love for cargo run to run executables built from libraries via one of the suggested approaches.
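As a rough illustration of how a consumer could use the first proposal, here is a minimal sketch. Note that `filetypes` is only the key proposed in this issue. Cargo does not emit it, and the short type names used below (`exe`, `debuginfo`, `rmeta`) are taken from the examples floated above. The function name `wanted_files` is hypothetical.

```python
def wanted_files(message, wanted_types):
    """Pair each file with its reported type and keep only the wanted ones.

    `filetypes` is the key proposed in this issue; Cargo does not emit it.
    Returns None when the key is missing, so callers can fall back to
    heuristics.
    """
    filenames = message.get("filenames", [])
    filetypes = message.get("filetypes")
    if filetypes is None or len(filetypes) != len(filenames):
        return None
    return [
        (path, kind)
        for path, kind in zip(filenames, filetypes)
        if kind in wanted_types
    ]
```

With an explicit type per file, unknown or future artifact kinds are simply not in `wanted_types` and get ignored, and a file name that looks like a known extension no longer matters.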

We maintain a longer list of things we would like to see improved in Cargo and Rust, but those can all be dealt with now. The listed issues are the ones requiring rather unsatisfying workarounds.

Thanks
David


soloturn · 2024-04-27T05:59:40Z

there is a 5-year-old flag --out-dir which looks like it would help: #6790 to get the binaries. if you get the binaries, the rest is just normal packaging, isn't it? if not, can you add an example of the directory listing to better understand which files you mean to include in the package?

dvdhrm · 2024-04-29T07:53:36Z

> there is a 5-year-old flag --out-dir which looks like it would help: #6790 to get the binaries. if you get the binaries, the rest is just normal packaging, isn't it?

We already parse the compiler-artifact message, so we know where the artifacts are. --out-dir just copies those to a separate directory, which does not really help in our situation, if I understand you correctly.

> if not, can you add an example of the directory listing to better understand which files you mean to include in the package?

Can you specify which directory listing you want?

We package applications for arbitrary cargo packages of a user. Hence, we have no control over the naming-scheme they use for the different parts of their product. Instead, we rely on compiler-artifact messages to tell us about all the artifacts created by a standard build. The artifact of the lib-target of the main package is what we pick as entry-point, all other artifacts are copied as dependencies.
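The selection rule in this paragraph could look roughly like the sketch below. It assumes parsed `compiler-artifact` messages, whose `package_id` and `target.kind` fields are part of Cargo's JSON output. The function name `split_artifacts` and the idea of treating every kind outside `NON_LIB_KINDS` as a library target are assumptions made for illustration.

```python
# Target kinds that are not library targets (assumption for this sketch).
NON_LIB_KINDS = {"bin", "example", "test", "bench", "custom-build"}


def split_artifacts(messages, main_package_id):
    """Separate the main package's lib-target messages from all others."""
    entry_points, dependencies = [], []
    for message in messages:
        kinds = set(message["target"]["kind"])
        is_main_lib = (
            message["package_id"] == main_package_id
            and not kinds & NON_LIB_KINDS
        )
        (entry_points if is_main_lib else dependencies).append(message)
    return entry_points, dependencies
```

This only narrows things down to messages. Each message can still list several files, so the per-file question of what each file actually is remains open.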

The problem we face is that we cannot reliably tell what kind of artifact a compiler-artifact message is for. --out-dir would collect all artifacts in a single directory, but we would still have to filter those artifacts depending on what the given platform allows in its application bundles, as well as ensuring each artifact is copied to the correct location.

For instance, the main entry-point executable needs to be copied to MyApp.app/Contents/MacOS/MyApp for macOS application bundles. Other executables usually go to MyApp.app/Contents/Helpers/ (I will skip the specifics, let me know if you want details). But how do we know which of the compiler-artifact messages is about the main-executable? We can filter based on package_id and target, but we might still get multiple compiler-artifact messages. So now we have to filter based on executable and filenames, because there is nothing else we can filter on.
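The placement step mentioned here can be sketched as a small mapping function. This is a minimal illustration that covers only the two macOS locations named above. The function name `bundle_destination` is hypothetical, and the `is_main_executable` flag stands in for exactly the decision that is hard to make from the messages alone.

```python
import posixpath


def bundle_destination(app_name, artifact_path, is_main_executable):
    """Return the path inside a macOS .app bundle for one executable."""
    bundle = f"{app_name}.app"
    if is_main_executable:
        # The main executable is renamed to match the bundle name.
        return posixpath.join(bundle, "Contents", "MacOS", app_name)
    # Other executables keep their file name under Helpers/.
    helper_name = posixpath.basename(artifact_path)
    return posixpath.join(bundle, "Contents", "Helpers", helper_name)
```

For example, `bundle_destination("MyApp", "/target/release/myapp", True)` returns `MyApp.app/Contents/MacOS/MyApp`.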

This works most of the time, sure, but it is really fragile as soon as crates use conflicting names. What if the main executable is called foobar.rmeta? Or foobar.d? Or foobar.pdb? What if some existing compiler-target decides to support split-debug in the future, and uses a file-extension that clashes with names existing packages use?
Also be aware that reverse-domain-style is common for executables in application bundles, so it is much easier to cause extension clashes and not notice it (e.g., org.example.analyzer.pdb looks like a suitable reverse-domain name for a pdb-analyzer cmdline tool, but will have a conflicting file-name).

On top, we support dynamic-library builds via the Cargo bindeps feature. So a single build might produce a main binary plus dynamic dependencies or even other executables needed at runtime. We want to pull all these into the application bundle, to ensure they are available at runtime.

Long story short: We look at compiler-artifact messages and want to know:

Is this an executable, a shared library, split-debuginfo, or anything else?

Currently, we cannot tell. We can guess, but there are several possible clashes that make it really hard to tell reliably. And we would like to be safe against new features in the future. We don't want new split-debuginfo (or other artifacts) to suddenly mess with heuristics, but instead ignore them until we introduce support for them as well.

epage · 2024-04-29T14:34:13Z

The title mentioned "cargo metadata and output". For Cargo maintainers, it's easy to assume this would be referring to the cargo metadata output (which also gets referenced in the Design section) but the problem and solution sound like they are focused on the compiler's json output. As such, I've renamed the issue and adjusted the labels. If there is something I missed about this, let us know!

dvdhrm · 2024-04-30T11:44:13Z

> The title mentioned "cargo metadata and output". For Cargo maintainers, it's easy to assume this would be referring to the cargo metadata output (which also gets referenced in the Design section) but the problem and solution sound like they are focused on the compiler's json output. As such, I've renamed the issue and adjusted the labels. If there is something I missed about this, let us know!

They are tightly intertwined, aren't they? At least we cross-match the data from the build messages to the parsed cargo-metadata output (e.g., package_id, target-dict, profile-dict). Additionally, we prefer as much information up front as possible. So if cargo-metadata can tell us which artifacts of which type will be built, we wouldn't even care for the build messages.

But I don't mind the change of title, fine with me!

dvdhrm mentioned this issue Mar 21, 2024

core/compiler: report executable paths for all binary crates #13605

Draft

weihanglo added the Command-metadata, S-triage, and C-feature-request labels Mar 29, 2024

epage added A-json-output, Command-build and removed Command-metadata, A-json-output labels Apr 29, 2024

epage changed the title ~~Augment Cargo metadata and output for app-bundle post-processing~~ Augment build json messages for app-bundle post-processing Apr 29, 2024
