v0.5.0

mthrok released this on 12 Jun 16:05
f043423

SPDL v0.5.0 Release Notes

Highlights

spdl autoresearch for automated, LLM-driven pipeline optimization. This release introduces a new top-level package, spdl.autoresearch, and the spdl autoresearch CLI command. It is an automated experiment engine that drives a coding agent to analyze pipeline metrics, identify bottlenecks, propose parameter and code changes, and iterate toward an objective (for the bundled workflow, minimizing steady-state step time). The framework is split into a domain-neutral scheduler with checkpoint/resume (spdl.autoresearch.core) and a concrete SPDL data-loading optimization workflow (spdl.autoresearch.pipeline_optimization), and it is pluggable via the spdl.autoresearch.workflows entry-point group (#1463, #1465, #1466, #1467).

Shared-memory arena for iterate_in_subprocess. A new shared-memory arena framework lets you move large payloads (big bytes, NumPy arrays, Torch tensors) across the subprocess boundary through a single pre-allocated buffer instead of allocating a fresh shared-memory segment per object. Two backends ship: SharedMemorySegmentPool and SharedMemoryRingBuffer, both exported from spdl.pipeline. Pass one via the new arena= argument to iterate_in_subprocess to reduce per-item IPC overhead and allocator churn (#1520, #1521, #1522).
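
The difference between one segment per object and a pre-allocated arena can be illustrated with the standard library alone. The sketch below is a hypothetical ToySegmentPool, not SPDL's SharedMemorySegmentPool. It creates one multiprocessing.shared_memory block up front, splits it into fixed-size slots, and hands out a small (slot, length) handle per payload, so only the handle would need to cross the pipe. Cross-process slot coordination, which a real arena must handle, is deliberately left out.

```python
from multiprocessing import shared_memory


class ToySegmentPool:
    """One pre-allocated shared-memory block split into fixed-size slots.

    Illustration only (hypothetical class). Slot bookkeeping is local to
    one process here; a real arena must coordinate slots across processes.
    """

    def __init__(self, segment_size: int, count: int) -> None:
        self.segment_size = segment_size
        # Allocated once, instead of one SharedMemory per payload.
        self.shm = shared_memory.SharedMemory(create=True, size=segment_size * count)
        self.free = list(range(count))

    def write(self, payload: bytes) -> tuple[int, int]:
        """Copy payload into a free slot and return a (slot, length) handle."""
        if len(payload) > self.segment_size:
            raise ValueError("payload larger than a segment")
        # Raises IndexError when every slot is in use; a real pool would wait.
        slot = self.free.pop()
        start = slot * self.segment_size
        self.shm.buf[start : start + len(payload)] = payload
        return slot, len(payload)

    def view(self, slot: int, length: int) -> memoryview:
        """Zero-copy view of a slot's contents (no new allocation)."""
        start = slot * self.segment_size
        return self.shm.buf[start : start + length]

    def release(self, slot: int) -> None:
        """Return a slot to the free list once its reader is done."""
        self.free.append(slot)

    def destroy(self) -> None:
        """Unmap and remove the block; all views must be released first."""
        self.shm.close()
        self.shm.unlink()
```

A receiving process would attach to the block once by name (shared_memory.SharedMemory(name=pool.shm.name)) and then read each payload through a slice like view(). Every memoryview slice must be released before destroy(), otherwise close() raises BufferError.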

Packets serialization and zero-copy transport. AudioPackets, VideoPackets, and ImagePackets are now picklable, so demuxed packets can be sent across processes (e.g. demux in a worker, decode on GPU elsewhere). A public Packets.deserialize static method is the single entry point for reconstructing packets, and a new Packets.deserialize_view reconstructs packets pointing directly into a caller-supplied buffer with no payload copy. Serialized payloads are padded to a 64-byte boundary so zero-copy views land on SIMD-aligned addresses (#1439, #1513, #1514, #1517).
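
The 64-byte padding rule can be sketched in a few lines. This illustrates the layout idea only and is not SPDL's serialization format: each payload starts at an offset that is a multiple of 64, and readers take memoryview slices instead of copying bytes. Aligned offsets only give aligned addresses if the buffer's base address is itself aligned (shared-memory and mmap regions are page-aligned, for example). The padded_size, pack, and views helpers are hypothetical.

```python
ALIGN = 64  # boundary stated in the release notes


def padded_size(n: int, align: int = ALIGN) -> int:
    """Round n up to the next multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)


def pack(payloads: list[bytes], align: int = ALIGN) -> tuple[bytearray, list[tuple[int, int]]]:
    """Concatenate payloads, zero-padding each so the next one starts aligned."""
    buf = bytearray()
    spans = []  # (offset, length) of each payload inside buf
    for p in payloads:
        spans.append((len(buf), len(p)))
        buf += p
        buf += bytes(padded_size(len(p), align) - len(p))
    return buf, spans


def views(buf, spans: list[tuple[int, int]]) -> list[memoryview]:
    """Zero-copy slices into buf; no payload bytes are copied."""
    mv = memoryview(buf)
    return [mv[off : off + n] for off, n in spans]
```

For payloads of 10, 70, and 1 bytes, the offsets are 0, 64, and 192, and the packed buffer is 256 bytes long.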

BC-Breaking Changes

No BC-breaking changes to existing public APIs. All new arguments are optional with behavior-preserving defaults, and all new symbols are purely additive, so no migration is required.

New Features

  • spdl command-line entry point.
    A new spdl console script (and python -m spdl) provides a utility CLI; spdl autoresearch is its first subcommand (#1476).

    spdl --help
    spdl autoresearch --help
  • spdl.autoresearch package for automated, LLM-driven experiment workflows, with a domain-neutral scheduling engine (spdl.autoresearch.core), a pluggable workflow contract (WorkflowProtocol / WorkflowSpec), and a bundled pipeline-optimization workflow (#1449, #1450, #1463, #1464).

    # Interactive: a supervisor agent fills in missing config, starts the
    # engine, and monitors progress.
    spdl autoresearch supervisor <workdir> \
      --pipeline-script path/to/pipeline.py \
      --source-dir path/to/source/ \
      --build-command "<build command>" \
      --base-launch-command "<launch command with \$IMAGE>"
  • Shared-memory arena backends.
    SharedMemorySegmentPool and SharedMemoryRingBuffer, exported from spdl.pipeline, plus the new arena= parameter on iterate_in_subprocess (#1520, #1521, #1522).

    import numpy as np
    from spdl.pipeline import iterate_in_subprocess, SharedMemorySegmentPool
    
    def make_items():
        # Hypothetical producer: a module-level (picklable) zero-argument
        # callable returning an iterable; it runs in the subprocess.
        for _ in range(8):
            yield np.zeros(1 << 16, dtype=np.uint8)  # 64 KiB per item
    
    if __name__ == "__main__":
        # Pre-allocate the arena in the parent; ownership transfers to the call,
        # which closes and unlinks it at teardown (do not reuse it afterwards).
        pool = SharedMemorySegmentPool(segment_size=1 << 18, count=4)
        for item in iterate_in_subprocess(make_items, arena=pool, buffer_size=2):
            ...  # large bytes / NumPy / Torch fields arrive as zero-copy views
    
        # SharedMemoryRingBuffer is the alternative backend; size it to the
        # in-flight high-water mark, roughly (buffer_size + 2) * max_unit_bytes.
        from spdl.pipeline import SharedMemoryRingBuffer
        ring = SharedMemoryRingBuffer(capacity=1 << 20)
  • Picklable Packets and public deserialization API.
    Packets objects support pickle/copy.deepcopy, and gain public deserialize and zero-copy deserialize_view static methods (#1439, #1513, #1514).

    import pickle
    import spdl.io
    
    packets = spdl.io.demux_video("sample.mp4")
    
    # Packets are now picklable (e.g. demux in a worker, decode elsewhere).
    restored = pickle.loads(pickle.dumps(packets))
    
    # Or reconstruct explicitly from the serialized bytes.
    state = packets.__getstate__()
    rebuilt = spdl.io.VideoPackets.deserialize(state)  # avoid shadowing the `copy` module
    
    # Zero-copy: the packets point directly into `state` (keep it alive).
    view = spdl.io.VideoPackets.deserialize_view(memoryview(state))
    frames = spdl.io.decode_packets(view)
  • Full-coverage distributed sampler mode.
    DistributedDeterministicSampler and DistributedRandomSampler gain a ddp_drop_last_distributed_round argument (default True, preserving prior behavior). Set it to False to cover every sample exactly once across ranks, with rank lengths differing by at most one (#1515).

    from spdl.source import DistributedDeterministicSampler
    
    # N = 11, world_size = 4
    # default (drop): every rank gets 2 indices; indices 8, 9, 10 are dropped.
    # full coverage : ranks 0-2 get 3 indices, rank 3 gets 2; all 11 covered.
    rank = 0  # this process's rank, e.g. torch.distributed.get_rank()
    sampler = DistributedDeterministicSampler(
        11, rank=rank, world_size=4,
        ddp_drop_last_distributed_round=False,
    )
    indices = list(sampler)
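
The counts in the sampler example above follow from simple arithmetic. The sketch below reproduces them under the assumption of an interleaved, unshuffled assignment (range(rank, end, world_size)). The real samplers may order indices differently, so treat the hypothetical rank_indices helper as a model of how many indices each rank gets and which tail is dropped, not of the exact order. Its drop_last_round flag only mirrors ddp_drop_last_distributed_round.

```python
def rank_indices(n: int, rank: int, world_size: int, drop_last_round: bool = True) -> list[int]:
    """Indices for one rank, assuming a strided (interleaved) assignment.

    drop_last_round=True truncates to the largest multiple of world_size,
    so every rank gets n // world_size items and the tail is dropped.
    drop_last_round=False keeps all n items; lengths differ by at most one.
    """
    end = (n // world_size) * world_size if drop_last_round else n
    return list(range(rank, end, world_size))


if __name__ == "__main__":
    for r in range(4):
        print(r, rank_indices(11, r, 4), rank_indices(11, r, 4, drop_last_round=False))
```

With n = 11 and world_size = 4, truncation keeps 8 indices (2 per rank) and drops 8, 9, and 10, while full coverage gives ranks 0-2 three indices each and rank 3 two.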

Bug Fixes

  • Fix GeneratorExit from a consumer break incorrectly shutting down the subprocess worker in iterate_in_subprocess (#1433).
  • Make the local job runner and engine command Windows-friendly (#1486).
  • Fix the headspace throughput ceiling to use steady-state rather than epoch-average throughput (#1492).
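
For context on the GeneratorExit fix: when a consumer stops iterating early and the generator is then closed (explicitly via close(), or when it is garbage-collected), Python raises GeneratorExit at the suspended yield. That is a normal early stop, not a failure. The plain-Python sketch below uses a hypothetical relay generator, unrelated to SPDL's internals, to show how the two cases can be told apart so cleanup happens without treating the stop as an error.

```python
def relay(source, log: list[str]):
    """Forward items from source, recording how iteration ended."""
    try:
        for item in source:
            yield item
    except GeneratorExit:
        # The consumer stopped early and the generator was closed:
        # a normal stop, so note it and re-raise as the protocol requires.
        log.append("consumer stopped early")
        raise
    except Exception:
        # A genuine producer-side error.
        log.append("producer failed")
        raise
    else:
        log.append("exhausted")
    finally:
        log.append("cleanup")
```

GeneratorExit derives from BaseException rather than Exception, which is why the generic except Exception clause does not intercept it.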

Documentation

  • Add a video color-space documentation page (#1518, #1519).
  • Add an FFmpeg CLI cheat sheet to the docs (#1516).
  • Document the shared-memory arena (CPU savings, noisy-neighbor behavior).
  • Add demux codec_mutex concurrency guidance to Demuxer / decode_packets_nvdec docstrings (#1501, #1503).
  • Switch the video-classification example to an MTP pipeline with optimized concurrency, and add an autoresearch example doc (#1494, #1499, #1505).

Other Changes

  • Add an arena transport benchmark (throughput, CPU, memory) (#1524, #1527).
  • Exclude the tests/ directory from the package (#1488).
  • Enable Pyrefly type checking in fbcode/spdl and resolve the resulting type errors (internal).
  • Numerous internal refactors and prompt/knowledge updates supporting the new spdl.autoresearch framework.