Skip to content

moq-native(jemalloc): drop runtime activation; fixes moq-boy startup crash#1509

Merged
kixelated merged 2 commits into
mainfrom
claude/fervent-mestorf-b64c98
May 26, 2026
Merged

moq-native(jemalloc): drop runtime activation; fixes moq-boy startup crash#1509
kixelated merged 2 commits into
mainfrom
claude/fervent-mestorf-b64c98

Conversation

@kixelated
Copy link
Copy Markdown
Collaborator

@kixelated kixelated commented May 26, 2026

Summary

Root cause of moq-boy crashing on first startup with jemalloc profiler failed: name or mib specifies an unknown/invalid value:

rs/moq-native/src/jemalloc.rs had a branch that tried to flip on profiling at runtime:

Ok(false) => {
    unsafe { raw::write(prof_active, true) }
        .map_err(|err| anyhow::anyhow!("failed to activate jemalloc profiling: {err}"))?;
}

But jemalloc only initializes the profiling backend if opt.prof=true at init. Writing prof.active later returns EINVAL. The branch was dead code in the success case and a crash in the failure case.

moq-relay's nix module sets MALLOC_CONF=prof:true,prof_active:true at the systemd unit, so it hits the Ok(true) branch and never noticed. moq-boy compiles in the same jemalloc feature but is launched without any MALLOC_CONF wrapper, takes the broken Ok(false) path, and crashes. systemd restarts it, restart works fine if the env got applied elsewhere, masking the bug.

This drops the activation attempt entirely. If prof.active reads false, log a hint to set MALLOC_CONF=prof:true,prof_active:true at startup and return Ok(()).

Closes #1507.

Test plan

  • cargo check --features jemalloc on rs/moq-native builds cleanly
  • Manual: launch moq-boy without MALLOC_CONF (previously crashed; now logs a hint and continues without profiling)
  • Manual: launch moq-relay via the nix unit (already worked, should still work — Ok(true) branch unchanged)
  • Manual: launch a binary with MALLOC_CONF=prof:true,prof_active:true,prof_prefix=/tmp/heap and send SIGUSR1 — should still dump

Follow-up (separate PR)

moq-boy probably wants a launcher / nix module mirroring the relay's MALLOC_CONF env so heap profiling is opt-in via the same surface as the relay, instead of relying on the caller to set it manually.

(Written by Claude)

kixelated and others added 2 commits May 25, 2026 21:03
Heap profiling is a debug aid, but jemalloc::run() returned anyhow::Result<()>
and both moq-relay and moq-boy treated a returned Err as fatal via
`Err(err) = jemalloc => return Err(err).context(...)` arms in their top-level
select. The result: an unrelated jemalloc setup hiccup (prof.active unwritable,
SIGUSR1 handler install failing) could take down a running relay.

Convert all `?` paths inside run() to `tracing::warn!` + `return Ok(())`. The
existing `Err(err) = jemalloc` select arms in callers stay (signature is
unchanged) but the pattern simply never matches, so the binaries keep running.

Closes #1507.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of moq-boy crashing on first startup with "name or mib specifies
an unknown/invalid value": the `Ok(false) => write(prof.active, true)` branch
tries to flip profiling on at runtime, but jemalloc only initializes the
profiling backend when `opt.prof=true` at init. Writing prof.active later
returns EINVAL.

moq-relay's nix module sets MALLOC_CONF=prof:true,prof_active:true at the
systemd unit, so it hits the `Ok(true)` branch and never noticed. moq-boy has
no such wrapper, takes the dead `Ok(false)` path, and crashes.

Drop the activation attempt entirely. If prof.active reads false, log a hint
to set MALLOC_CONF=prof:true,prof_active:true at startup and return Ok(()).

Reverts the warn-on-EINVAL coverup from the previous commit on this branch in
favor of fixing the actual broken path.
@kixelated kixelated changed the title moq-native(jemalloc): swallow setup errors instead of killing the process moq-native(jemalloc): drop runtime activation; fixes moq-boy startup crash May 26, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

Review Change Stack

Caution

Review failed

Failed to post review comments

Walkthrough

The PR makes jemalloc profiling activation in run() best-effort: read failures or a compiled-but-inactive prof.active are logged and cause run() to return Ok(()) instead of attempting raw::write(prof.active, true) or propagating errors. The run() docstring was updated to state profiling must be enabled at startup via MALLOC_CONF=prof:true and to note that toggling prof.active later returns EINVAL.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: removing runtime activation of jemalloc profiling and fixing the moq-boy startup crash.
Description check ✅ Passed The description clearly explains the root cause, the problem, the solution, and provides test plans related to the jemalloc profiling changes.
Linked Issues check ✅ Passed The PR successfully addresses #1507 by converting jemalloc setup failures from fatal errors to non-fatal warnings that return Ok(()).
Out of Scope Changes check ✅ Passed All changes in jemalloc.rs are directly related to fixing the profiling activation error handling as specified in #1507.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/fervent-mestorf-b64c98

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
rs/moq-native/src/jemalloc.rs (1)

8-9: 💤 Low value

Clarify the early-return behavior in the docstring.

The phrase "Returns when profiling can't be set up" could be clearer. The function returns early with Ok(()) on setup failure, but runs indefinitely in a signal-handling loop on success.

📝 Suggested docstring improvement
-/// Profiling is a debug aid, so any setup failure is logged and swallowed
-/// rather than propagated. Returns when profiling can't be set up.
+/// Profiling is a debug aid, so any setup failure is logged and the function
+/// returns early with Ok(()). On successful setup, loops indefinitely handling SIGUSR1.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-native/src/jemalloc.rs` around lines 8 - 9, Clarify the docstring for
the profiling setup function in rs/moq-native/src/jemalloc.rs: state explicitly
that on setup failure the function returns early with Ok(()) (logging the error
and swallowing it), but on successful setup it does not return and instead runs
indefinitely in the signal-handling loop; update the sentence "Returns when
profiling can't be set up" to something like "Returns Ok(()) if profiling setup
fails (error logged and swallowed); on success the function enters the
signal-handling loop and does not return."
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@rs/moq-native/src/jemalloc.rs`:
- Around line 8-9: Clarify the docstring for the profiling setup function in
rs/moq-native/src/jemalloc.rs: state explicitly that on setup failure the
function returns early with Ok(()) (logging the error and swallowing it), but on
successful setup it does not return and instead runs indefinitely in the
signal-handling loop; update the sentence "Returns when profiling can't be set
up" to something like "Returns Ok(()) if profiling setup fails (error logged and
swallowed); on success the function enters the signal-handling loop and does not
return."

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e6a1a97-f909-422a-8456-1a441937c2dd

📥 Commits

Reviewing files that changed from the base of the PR and between 568838a and 124b97f.

📒 Files selected for processing (1)
  • rs/moq-native/src/jemalloc.rs

@kixelated kixelated enabled auto-merge (squash) May 26, 2026 04:09
@kixelated
Copy link
Copy Markdown
Collaborator Author

@coderabbitai the docstring nit is moot — commit 3b8c269 (after the review snapshot at 124b97f) rewrote the docstring to describe the new contract ("Profiling must be enabled at startup via MALLOC_CONF=prof:true ... jemalloc only initializes the profiling backend when opt.prof is set at init; toggling prof.active later returns EINVAL"), since I also dropped the runtime-activation branch entirely as the real root cause fix. (Written by Claude)

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

Tip

For best results, initiate chat on the files or code changes.

@kixelated Thanks for the context — makes sense. Dropping the runtime-activation branch entirely is the right fix at the root, and the updated docstring clearly captures the new contract (MALLOC_CONF=prof:true at init, with the EINVAL note for late toggling). The nit is fully resolved.

@kixelated kixelated merged commit 8f257a7 into main May 26, 2026
1 check passed
@kixelated kixelated deleted the claude/fervent-mestorf-b64c98 branch May 26, 2026 04:26
@moq-bot moq-bot Bot mentioned this pull request May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

moq_native::jemalloc: profiling activation errors should warn, not be fatal

1 participant