On MacOS, dtrace fails due to running as the wrong CPU architecture if the "cargo" binary's arch doesn't match the arch of the program being profiled #302

Closed
zbentley opened this issue Feb 3, 2024 · 1 comment · Fixed by #304

Comments

Contributor

zbentley commented Feb 3, 2024

Environment

  • M1 (arm/aarch64) MacOS 14.
  • rustc 1.75 aarch64 stable toolchain and target.
  • flamegraph-rs 0.5.1.
  • SIP disabled.
  • sudo-capable account.
  • Rosetta 2 installed.
  • cargo (the binary executable) compiled as non-native x86_64, running via Rosetta. E.g. you can reproduce the following:
> arch
arm64
> file $(which cargo)
/path/to/cargo: Mach-O 64-bit executable x86_64

That last part is important, and isn't as uncommon as you'd think due to e.g. subpar brew installations of rustup-init or other workstation provisioning automations that run an x86 binary somewhere in the stack before Cargo is installed.

It's also rarely a problem. Cargo-the-build-system still works totally fine compiling and cross compiling Rust programs for user-selected architectures/toolchains even if cargo-the-binary isn't built for the native hardware architecture. While not optimal, it doesn't even have major performance implications, since cargo-the-binary is usually delegating out to other (hopefully native-arch) binaries like rustc for most heavy lifting.

In some rare cases like this one, though, it does cause a problem:

Steps to reproduce

  • Create a crate with a Rust binary and a target architecture of aarch64. It doesn't matter what the binary does; hello world is fine.
  • Run cargo flamegraph --root --bin $project

Expected behavior

Successful emission of a flamegraph SVG.

Observed behavior

> cargo flamegraph --root --bin untitled
    Finished release [optimized + debuginfo] target(s) in 0.00s
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: "/usr/lib/dtrace/darwin.d", line 26: syntax error near "uthread_t"
failed to sample program

Pathology

I'm writing a longer-form blog post about this which I'll link shortly. The short version is that when cargo is used as a subcommand runner, Rosetta's "architecture preference inheritance" kicks in. If a subcommand (e.g. the flamegraph binary) is spawned as a child of a non-native-architecture binary running under Rosetta, MacOS will try to use the non-native architecture for all child and grandchild processes, recursively.

In other words, if cargo is an x86 binary, even if flamegraph is an ARM binary (which causes MacOS to say "can you run as x86? No? Ok, ARM is fine then" and fall back), when flamegraph runs sudo, or when sudo runs dtrace, MacOS will keep trying to launch sub-sub-subcommands as x86.

This is problematic when dtrace is involved, since dtrace (and sudo) are distributed on MacOS as universal binaries, meaning that they can be run as either architecture. When its spawner expresses a preference to run as x86, dtrace will gladly do so, and will then attempt to trace an aarch64 user program.

This results in the less-than-informative "invalid probe specifier ... syntax error" failure in the dtrace stdlib import, which really means "x86 dtrace has no idea how to load symbols needed to trace an aarch64 binary".
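As a rough diagnostic for the inheritance described above, a process can ask whether it ended up translated by Rosetta. The sketch below shells out to `sysctl -n sysctl.proc_translated`, which prints 1 for a translated process and 0 for a native one; the key may be missing on machines without Rosetta support. Because `sysctl` is itself spawned as a child, its answer reflects the inherited preference, which is the interesting part here. The function names are made up for illustration and are not part of flamegraph.

```rust
use std::process::Command;

/// Interprets the text printed by `sysctl -n sysctl.proc_translated`.
/// Returns Some(true) for "1" (translated), Some(false) for "0" (native),
/// and None for anything else (e.g. the key does not exist).
fn parse_proc_translated(output: &str) -> Option<bool> {
    match output.trim() {
        "1" => Some(true),
        "0" => Some(false),
        _ => None,
    }
}

/// Hypothetical helper: spawns `sysctl` and reports whether that child
/// was run under Rosetta. Returns None if sysctl is unavailable or fails.
fn child_runs_under_rosetta() -> Option<bool> {
    let out = Command::new("sysctl")
        .args(&["-n", "sysctl.proc_translated"])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    parse_proc_translated(&String::from_utf8_lossy(&out.stdout))
}

fn main() {
    match child_runs_under_rosetta() {
        Some(true) => println!("children are being launched as x86_64 (Rosetta)"),
        Some(false) => println!("children are being launched natively"),
        None => println!("could not determine translation status"),
    }
}
```

If this reports translation when run via an x86_64 cargo on an ARM machine, the same preference is most likely being handed down to sudo and dtrace.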

Proposed resolution

flamegraph should probe the binary being traced and see what architecture(s) it is built for. If it's built for the native architecture (potentially among others), or if it's only built for one architecture, it should override the architecture-preference "hints" with which sudo and dtrace are spawned in order to force the architecture-matching versions of those universal binaries to be used, even if cargo-the-binary has the wrong arch.
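A minimal sketch of that idea, not necessarily what the eventual PR does: read the Mach-O header of the target binary, pick an architecture using the rule above, and prefix the command with the MacOS `arch` utility. All function names are hypothetical. It assumes only thin 64-bit Mach-O files and 32-bit-offset universal headers matter (the `FAT_MAGIC_64` layout is ignored), that `std::env::consts::ARCH` of the profiler is a good enough stand-in for the native architecture, and that the preference set by `arch` is inherited by sudo and dtrace.

```rust
use std::{env, fs};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Arch { X86_64, Arm64, Other(u32) }

const MH_MAGIC_64: u32 = 0xfeed_facf; // thin 64-bit Mach-O (little-endian on disk)
const FAT_MAGIC: u32 = 0xcafe_babe; // universal binary (big-endian header)
const CPU_TYPE_X86_64: u32 = 0x0100_0007;
const CPU_TYPE_ARM64: u32 = 0x0100_000c;

fn arch_from_cputype(cputype: u32) -> Arch {
    match cputype {
        CPU_TYPE_X86_64 => Arch::X86_64,
        CPU_TYPE_ARM64 => Arch::Arm64,
        other => Arch::Other(other),
    }
}

/// Reads a u32 at `offset`, or None if the slice is too short.
fn read_u32(bytes: &[u8], offset: usize, big_endian: bool) -> Option<u32> {
    let s = bytes.get(offset..offset.checked_add(4)?)?;
    let raw = [s[0], s[1], s[2], s[3]];
    Some(if big_endian { u32::from_be_bytes(raw) } else { u32::from_le_bytes(raw) })
}

/// Lists the architectures a Mach-O file is built for, or None if the
/// bytes do not look like a thin 64-bit or universal Mach-O file.
fn macho_archs(bytes: &[u8]) -> Option<Vec<Arch>> {
    if read_u32(bytes, 0, false)? == MH_MAGIC_64 {
        return Some(vec![arch_from_cputype(read_u32(bytes, 4, false)?)]);
    }
    if read_u32(bytes, 0, true)? == FAT_MAGIC {
        let count = read_u32(bytes, 4, true)? as usize;
        let mut archs = Vec::new();
        for i in 0..count {
            // Each fat_arch entry is 20 bytes; cputype is its first field.
            archs.push(arch_from_cputype(read_u32(bytes, 8 + i * 20, true)?));
        }
        return Some(archs);
    }
    None
}

/// Prefer the native arch if the binary supports it; otherwise use the
/// binary's only arch; otherwise give no hint.
fn arch_flag(archs: &[Arch], native: Arch) -> Option<&'static str> {
    let chosen = if archs.contains(&native) {
        native
    } else if archs.len() == 1 {
        archs[0]
    } else {
        return None;
    };
    match chosen {
        Arch::Arm64 => Some("-arm64"),
        Arch::X86_64 => Some("-x86_64"),
        Arch::Other(_) => None,
    }
}

/// Prefixes a command line with `arch <flag>`.
fn wrap_command(flag: &str, command: &[&str]) -> Vec<String> {
    let mut argv = vec!["arch".to_string(), flag.to_string()];
    argv.extend(command.iter().map(|s| s.to_string()));
    argv
}

fn main() {
    let path = match env::args().nth(1) {
        Some(p) => p,
        None => return eprintln!("usage: probe <path-to-binary>"),
    };
    let bytes = fs::read(&path).expect("could not read target binary");
    let native = if env::consts::ARCH == "aarch64" { Arch::Arm64 } else { Arch::X86_64 };
    match macho_archs(&bytes).and_then(|a| arch_flag(&a, native)) {
        // The real dtrace arguments are omitted here.
        Some(flag) => println!("{:?}", wrap_command(flag, &["sudo", "dtrace"])),
        None => println!("no architecture hint; spawn sudo/dtrace unchanged"),
    }
}
```

For an aarch64-only target this would print a command starting with `arch -arm64 sudo dtrace`, regardless of which architecture cargo itself was built for.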

I'm drafting a PR to do this now, but I definitely am not invested in that fix. If this behavior is deemed rare enough as an edge case (though see the top of this report; I don't think it's too terribly rare), I could just update flamegraph's readme or help docs to explain the error and resolution steps (reinstall cargo as the right architecture) instead.

Contributor Author

zbentley commented Feb 5, 2024

I wrote up some additional info about the discovery/identification of this issue on my blog, in case the troubleshooting steps are useful to others.

@djc djc closed this as completed in #304 Feb 12, 2024