This repository was archived by the owner on Apr 29, 2026. It is now read-only.

Performance: Channel throughput is 400-10,000x slower than alternatives #306

@navicore

Description


Current State

Fanout benchmark: 1 producer → 10 workers → 100k messages

Language           Time        vs Seq
Seq                100,000ms   1x (baseline)
Python (asyncio)   230ms       ~400x faster
Go                 30ms        ~3,300x faster
Rust               9ms         ~11,000x faster
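For reference, the Go baseline in the table corresponds to a pattern like the sketch below: one producer feeding ten workers over a single buffered channel. The `fanout` name and buffer size are illustrative, not the actual benchmark harness.

```go
package main

import (
	"fmt"
	"sync"
)

// fanout runs one producer feeding `workers` goroutines over a single
// channel and returns the total number of messages consumed.
func fanout(n, workers int) int {
	ch := make(chan int, 1024) // buffered to reduce contention
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count := 0
			for range ch { // receive until the channel is closed
				count++
			}
			mu.Lock()
			total += count
			mu.Unlock()
		}()
	}

	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)
	wg.Wait()
	return total
}

func main() {
	fmt.Println(fanout(100_000, 10)) // 100000
}
```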

Root Causes

  1. Value boxing: every message is wrapped in a 40-byte Value struct
  2. Channel synchronization: Lock contention on each send/receive
  3. No batching: Each message is an individual operation
  4. Yielding overhead: chan.yield calls between operations
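The boxing cost (cause 1) is visible in any runtime. The Go sketch below contrasts an `interface{}` channel, where each int is boxed (heap-allocated) on send, with a primitive `int` channel that moves the value directly. The function names are illustrative, not part of any existing API.

```go
package main

import (
	"fmt"
	"time"
)

// timeBoxed pushes n values through a channel of empty interfaces,
// forcing most ints to be boxed (wrapped in an interface value) on send.
func timeBoxed(n int) time.Duration {
	ch := make(chan interface{}, 1024)
	done := make(chan struct{})
	go func() {
		for range ch { // drain
		}
		close(done)
	}()
	start := time.Now()
	for i := 0; i < n; i++ {
		ch <- i // i escapes to the heap as an interface value
	}
	close(ch)
	<-done
	return time.Since(start)
}

// timeUnboxed does the same with a primitive int channel: no boxing.
func timeUnboxed(n int) time.Duration {
	ch := make(chan int, 1024)
	done := make(chan struct{})
	go func() {
		for range ch { // drain
		}
		close(done)
	}()
	start := time.Now()
	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)
	<-done
	return time.Since(start)
}

func main() {
	n := 1_000_000
	fmt.Println("boxed:  ", timeBoxed(n))
	fmt.Println("unboxed:", timeUnboxed(n))
}
```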

Potential Approaches

Near-term

  • Primitive channels: an IntChannel type that passes i64 directly without boxing
  • Buffered channels: reduce lock contention with a ring buffer
  • Batch send/receive: chan.send-all / chan.receive-n operations
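A batch send/receive API could take the shape of the Go sketch below: each channel operation moves a whole slice of messages, amortizing synchronization over the batch. The `sendAll`/`receiveN` names only mirror the proposed `chan.send-all`/`chan.receive-n`; this is an assumption about shape, not the actual design.

```go
package main

import "fmt"

// sendAll slices messages into batches so each channel operation moves
// up to batchSize values, amortizing synchronization over the batch.
func sendAll(ch chan<- []int, msgs []int, batchSize int) {
	for len(msgs) > 0 {
		k := batchSize
		if k > len(msgs) {
			k = len(msgs)
		}
		ch <- msgs[:k]
		msgs = msgs[k:]
	}
	close(ch)
}

// receiveN drains batches and returns the total count of messages seen.
func receiveN(ch <-chan []int) int {
	total := 0
	for batch := range ch {
		total += len(batch)
	}
	return total
}

func main() {
	msgs := make([]int, 100_000)
	ch := make(chan []int, 16)
	go sendAll(ch, msgs, 256)
	fmt.Println(receiveN(ch)) // 100000
}
```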

Long-term

  • Lock-free channels: use atomic operations instead of a mutex
  • Zero-copy for large values: Pass pointers instead of copying
  • Channel fusion: Optimize known patterns (fan-out, pipeline)
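One possible shape for a lock-free channel is a single-producer/single-consumer ring buffer coordinated only by atomic index updates, sketched in Go below. This is a minimal illustration (names and sizes are assumptions); a real channel would need multi-producer/multi-consumer support and blocking instead of spinning.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// spscRing is a single-producer single-consumer ring buffer using only
// atomic loads/stores on the head and tail indices -- no mutex.
type spscRing struct {
	buf        []int64
	mask       uint64
	head, tail atomic.Uint64 // head: next read slot, tail: next write slot
}

func newSPSCRing(sizePow2 int) *spscRing {
	return &spscRing{buf: make([]int64, sizePow2), mask: uint64(sizePow2 - 1)}
}

// push spins until a slot is free, then publishes the value.
func (r *spscRing) push(v int64) {
	t := r.tail.Load()
	for t-r.head.Load() == uint64(len(r.buf)) { // buffer full
		runtime.Gosched()
	}
	r.buf[t&r.mask] = v
	r.tail.Store(t + 1) // value is written before tail advances
}

// pop spins until a value is available, then consumes it.
func (r *spscRing) pop() int64 {
	h := r.head.Load()
	for r.tail.Load() == h { // buffer empty
		runtime.Gosched()
	}
	v := r.buf[h&r.mask]
	r.head.Store(h + 1)
	return v
}

func main() {
	r := newSPSCRing(1024)
	done := make(chan int64)
	go func() {
		var sum int64
		for i := 0; i < 100_000; i++ {
			sum += r.pop()
		}
		done <- sum
	}()
	for i := 0; i < 100_000; i++ {
		r.push(1)
	}
	fmt.Println(<-done) // 100000
}
```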

Benchmark Code

: worker-loop ( Channel Channel Int -- )
  2 pick chan.receive drop       ( receive from input channel; drop extra result )
  chan.yield                     ( yield between channel operations )
  dup 0 i.< if                   ( negative message signals shutdown )
    drop swap chan.send drop drop   ( report count on done channel; clean up )
  else
    drop 1 i.+ worker-loop       ( count the message and loop )
  then
;

: producer ( Channel Int -- )
  dup 0 i.> if
    dup 2 pick chan.send drop    ( send counter to channel; drop result )
    1 i.- producer               ( decrement and recurse )
  else
    drop drop                    ( finished: clear channel and counter )
  then
;

Success Criteria

  • Throughput within 100x of Go (target: < 3,000ms for 100k messages)
  • Primitive channels within 10x of Go

Labels: enhancement (New feature or request)