Skip to content

fix: misc token fixes#4569

Open
MasterPtato wants to merge 1 commit into04-06-fix_pb_clean_up_actor_stop_decision_handlingfrom
04-06-fix_misc_token_fixes
Open

fix: misc token fixes#4569
MasterPtato wants to merge 1 commit into04-06-fix_pb_clean_up_actor_stop_decision_handlingfrom
04-06-fix_misc_token_fixes

Conversation

@MasterPtato
Copy link
Copy Markdown
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Contributor Author

MasterPtato commented Apr 6, 2026

@claude
Copy link
Copy Markdown

claude bot commented Apr 6, 2026

Code Review

Overall this is a solid set of bug fixes with a good security improvement. A few observations below.


Security: Constant-time token comparison (ctx.rs, envoy.rs)

The subtle::ConstantTimeEq addition is the right call — timing-safe comparison prevents leaking token validity through response time differences.

One nuance worth noting: ConstantTimeEq on [u8] slices only guarantees constant-time behavior when both slices have equal length. If lengths differ, it returns "not equal" immediately, leaking whether the submitted token length matches the expected length. For an admin token this is an acceptable trade-off (an attacker who already knows the expected length hasn't gained much), but it's worth being aware of.

The ct_ne logic is correct: returns Forbidden when not-equal, falls through to Ok(()) when equal.


Bug fix: Missing tx.delete in pegboard-envoy/src/conn.rs

tx.add_conflict_key(&old_lb_key, ConflictRangeType::Read)?;
tx.delete(&old_lb_key);  // ← new

This is a real correctness fix. Without the delete, old load balancer index entries would accumulate across re-registrations. The conflict key for read is already present and remains correct.


Bug fix: Protocol version for serverful pools (pegboard-envoy/src/conn.rs)

tx.write(
    &pegboard::keys::runner_config::ProtocolVersionKey::new(namespace_id, pool_name.clone()),
    protocol_version,
)?;

Good fix. The comment correctly explains that metadata_poller only runs for serverless pools, leaving serverful pools without a protocol version update on re-registration. This addresses that gap.


Refactor: pegboard-outbound/src/lib.rs reordering

The new order — check namespace → check pool exists → check pool is serverless → build payload — is more logical. Previously the payload was built before confirming the pool is serverless, wasting work. The namespace check before the pool check is also more consistent with the error reporting (namespace errors use report_error, pool absence is a silent debug log).

The token is now passed from the call site rather than re-reading config inside serverless_outbound_req. The lifetime works out correctly: ctx.config() returns &Config, so a.admin_token.read() returns &String with a lifetime tied to ctx, which outlives the function call.


TypeScript: Token refresh in startEnvoySync (envoy-client/src/tasks/envoy/index.ts)

GLOBAL_SHARED_CTX.config.token = config.token;

The fix is correct — serverless envoys outlive the /start request that spawned them, so the token needs updating on subsequent calls. The cleanup on shutdown (GLOBAL_ENVOY = undefined; GLOBAL_SHARED_CTX = undefined) is important for correctness and was missing before.

Minor note: mutating config.token on a shared context object is a side-effectful pattern. It works here since startEnvoySync is the sole writer, but a brief comment explaining the token refresh mechanism (beyond the one already there) might help the next reader understand why mutation is safe.


Removed comment in pegboard-kv-channel/src/lib.rs

-/// Defense-in-depth: verifies the actor belongs to the authenticated namespace.
-/// The admin_token is a global credential, so this is not strictly necessary
-/// today, but prevents cross-namespace access if a less-privileged auth
-/// mechanism is introduced in the future.

The code remains — just the explanation is gone. The "defense-in-depth" rationale is useful context for future readers, especially the note about why a global-credential check is still doing a namespace verification. Consider keeping a shorter version of the rationale rather than removing it entirely.


update_ping.rs: comment/code reorder

// Clear old key          ← moved from after add_conflict_key
tx.add_conflict_key(&old_lb_key, ConflictRangeType::Read)?;
tx.delete(&old_lb_key);

The comment now sits above both the conflict-key registration and the delete, which reads more naturally as "this block clears the old key." No functional change.


channel.rs comment

Minor doc cleanup (/// Authentication token. without the implementation-detail suffix). Fine as-is given the field name is self-explanatory.

@claude
Copy link
Copy Markdown

claude bot commented Apr 6, 2026

PR Review: fix: misc token fixes

Summary

This PR makes several security and correctness fixes across token authentication and envoy connection code:

  1. Constant-time token comparison via subtle::ConstantTimeEq (timing-attack mitigation)
  2. Token refresh fix for serverless envoy reuse (updating token on global envoy reconnect)
  3. Missing tx.delete(&old_lb_key) added in handle_init (genuine bug fix)
  4. Refactoring in pegboard-outbound to pass token as a parameter rather than re-reading config
  5. Cleanup: replace HashMap allocation with find_map in pegboard-kv-channel URL param parsing
  6. Protocol version write added for serverful pools in handle_init

Overall: mostly correct with real bug fixes. A couple of points worth discussing below.


Issues

1. Removed Security Rationale Comment (pegboard-kv-channel/src/lib.rs)

The removed comment explained the design rationale for cross-namespace verification in resolve_actor:

// The admin_token is a global credential, so this is not strictly necessary
// today, but prevents cross-namespace access if a less-privileged auth
// mechanism is introduced in the future.

Per CLAUDE.md, comments documenting "deltas" are not useful, but this comment documented a security design decision (defense in depth), not a delta. Without it, future maintainers won't understand why the check exists and may remove it thinking it's redundant. Consider restoring it in a revised form.

2. TypeScript Token Mutation on Shared Context (envoy-client/src/tasks/envoy/index.ts:104)

GLOBAL_SHARED_CTX.config.token = config.token;

The token is mutated directly on the shared config object while startConnection(shared) may concurrently reference config.token. In JS's single-threaded event loop this is generally safe, but it's worth clarifying in a comment whether this update is guaranteed to be picked up by the current reconnection attempt or only the next one. If the envoy reconnect loop reads ctx.shared.config.token at reconnect time (not cached at connect time), this is fine — but that guarantee isn't obvious from the diff.


Minor Observations

  • conn.rs:262-272: The new tx.write(ProtocolVersionKey, protocol_version) in handle_init for serverful pools is correct and intentional. It writes on every init, which is slightly redundant for serverless pools where metadata_poller already handles it, but harmless.

  • ct_ne correctness: The constant-time comparison is semantically correct — ct_ne returns Choice(1) on mismatch, so the if branch fires on inequality. Note that subtle's ct_ne on &[u8] short-circuits on length difference, so it's not strictly constant-time for tokens of differing lengths, but length revelation is low-risk here.

  • pegboard-kv-channel find_map cleanup: Good improvement. Avoids unnecessary allocation and is more idiomatic. The behavior difference (first vs. last occurrence for duplicate query params) is not a real concern for well-formed URLs.

  • update_ping.rs: Comment reordering is a minor readability improvement; no semantic change.


Verdict

The missing tx.delete(&old_lb_key) fix is the most impactful change — without it, old load balancer entries would accumulate on reconnection. The constant-time token comparison is a good security hardening. No blocking issues found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant