Skip to content
This repository was archived by the owner on Apr 29, 2026. It is now read-only.

Remove mutex and chan reg#113

Merged
navicore merged 1 commit intomainfrom
skynet-performance
Dec 21, 2025
Merged

Remove mutex and chan reg#113
navicore merged 1 commit intomainfrom
skynet-performance

Conversation

@navicore
Copy link
Copy Markdown
Owner

No description provided.

@claude
Copy link
Copy Markdown

claude Bot commented Dec 21, 2025

PR Review: Remove mutex and chan reg

Summary

This PR removes the global channel registry and associated mutex contention, transitioning to a zero-mutex design where channels are passed directly as Value::Channel on the stack. This is an excellent performance optimization that eliminates lock contention while maintaining correctness.

Code Quality & Architecture ✅

Strengths:

  • Excellent simplification: Removes ~400 lines of complex registry management code
  • Clear documentation: The updated module docs explain the zero-mutex design well
  • Idiomatic Rust: Proper use of Arc<ChannelData> for shared ownership
  • Good test coverage: Tests verify channel sharing via dup and cross-strand communication

Suggestions:

  1. crates/runtime/src/channel.rs:82-84 - The ChannelData::clone() implementation is redundant since sender and receiver already implement Clone. Rust will auto-derive this correctly:

    #[derive(Debug, Clone)]  // Just add Clone to the derive
    pub struct ChannelData {
        pub sender: mpmc::Sender<Value>,
        pub receiver: mpmc::Receiver<Value>,
    }
  2. crates/runtime/src/value.rs:27-33 - The PartialEq implementation for ChannelData uses pointer comparison, which means two channels with the same underlying sender/receiver won't compare equal if they're different Arc instances. Consider documenting this semantic choice or using a different approach if value equality is desired.

Potential Bugs 🟡

Critical - Memory Safety:

crates/runtime/src/channel.rs:305+ (in tests) - The test test_arena_string_send_between_strands uses unsafe Arc::from_raw patterns that could cause use-after-free:

let ch_ptr = Arc::as_ptr(&channel) as i64;
CHANNEL_PTR.store(ch_ptr, Ordering::Release);
std::mem::forget(channel.clone());  // Keeps Arc alive, but...

// In strand:
let channel = Arc::from_raw(ch_ptr);
let channel_clone = Arc::clone(&channel);
std::mem::forget(channel); // Don't drop

Issue: This pattern is error-prone and relies on manual reference counting. If the number of forget calls doesn't match the number of from_raw calls, you'll get either a memory leak or use-after-free.

Recommendation: Use a safer pattern with proper scoping:

static CHANNEL: OnceLock<Arc<ChannelData>> = OnceLock::new();

// In test setup:
CHANNEL.set(channel.clone()).unwrap();

// In strands:
let channel = CHANNEL.get().unwrap().clone();

This same pattern appears in test_channel_dup_sharing and test_arena_with_channel_send in scheduler.rs:702-807.

Performance Considerations ✅

Excellent improvements:

  • Zero mutex contention: Direct channel handle passing eliminates the global registry lock
  • No registry lookups: O(1) send/receive operations instead of O(1) + lock acquisition
  • Reduced memory overhead: No per-channel statistics or ID tracking
  • Simpler memory model: Channels are just values, reference counted automatically

Measurement suggestion: Consider adding a benchmark comparing the old vs new implementation to quantify the performance improvement (though the architectural benefits alone justify this change).

Security Concerns 🟢

No security issues identified.

The removal of the global registry actually improves security by:

  • Reducing attack surface (no global mutable state)
  • Eliminating potential deadlock scenarios
  • Making channel lifetime explicit (tied to value ownership)

Test Coverage ✅

Strong test coverage:

  • ✅ Channel creation and basic send/receive
  • ✅ Channel sharing via dup (value semantics)
  • ✅ Multi-value FIFO ordering
  • ✅ Channel close semantics
  • ✅ Arena string promotion on send
  • ✅ Cross-strand communication

Gaps identified:

  1. No test for channel drop behavior: What happens when all senders are dropped? The receiver should see a closed channel error. Add a test:

    #[test]
    fn test_channel_close_on_sender_drop() {
        // Create channel, drop all senders, verify receive fails
    }
  2. No test for MPMC work-stealing: The docs claim "work-stealing semantics" but there's no test verifying that multiple receivers correctly compete for messages without duplication.

  3. Error path coverage: The _safe variants are tested, but not thoroughly. Consider testing:

    • Send to a channel where all receivers are dropped
    • Receive from a channel where all senders are dropped

Additional Observations

Documentation Quality ✅

The module documentation is excellent and clearly explains:

  • The zero-mutex design rationale
  • Stack effects for each operation
  • MPMC semantics

Diagnostics Impact 🟡

crates/runtime/src/diagnostics.rs:187-188 - Good note that channel stats are no longer available:

// Note: Channel stats are not available with the zero-mutex design.

Consideration: For production debugging, you might want a way to track "how many channels exist". This could be done with a simple atomic counter in make_channel and the Drop impl for ChannelData, with zero contention cost.

API Consistency ✅

The transition from channel_id (Int) to Channel (Arc) is clean and type-safe. The compiler will catch any code still using the old pattern.

Conclusion

This is a high-quality refactoring that significantly improves the runtime's architecture. The main concerns are:

  1. Fix the unsafe Arc pattern in tests (critical for correctness)
  2. Add missing test coverage for drop behavior and MPMC work-stealing
  3. Consider auto-deriving Clone for ChannelData

The architectural benefits far outweigh the minor issues identified. The zero-mutex design is the right approach for a high-performance concurrent runtime.

Recommendation: ✅ Approve with minor fixes

Please address the unsafe Arc usage in tests before merging. The other suggestions are nice-to-haves but not blockers.


Great work on this refactoring! The codebase is significantly cleaner and more performant.

@navicore navicore merged commit 702fdbd into main Dec 21, 2025
2 checks passed
@navicore navicore deleted the skynet-performance branch December 21, 2025 05:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant