SPDL v0.5.0 Release Notes
Highlights
spdl autoresearch — automated, LLM-driven pipeline optimization. This release introduces a new top-level package, spdl.autoresearch, and the spdl autoresearch CLI command. It is an automated experiment engine that drives
a coding agent to analyze pipeline metrics, identify bottlenecks, propose parameter and code changes, and iterate toward an objective (for the bundled workflow, minimizing steady-state step time). The framework is split into a domain-neutral scheduler with checkpoint/resume (spdl.autoresearch.core) and a concrete SPDL data-loading optimization workflow
(spdl.autoresearch.pipeline_optimization), and is pluggable via the spdl.autoresearch.workflows entry-point group (#1463, #1465, #1466, #1467).
Shared-memory arena for iterate_in_subprocess. A new shared-memory arena framework lets you move large payloads (bytes objects, NumPy arrays, PyTorch tensors) across the subprocess boundary through a single pre-allocated buffer instead of allocating a fresh shared-memory segment per object. Two backends ship: SharedMemorySegmentPool and SharedMemoryRingBuffer, both exported from spdl.pipeline. Pass one via the new arena= argument to iterate_in_subprocess to reduce per-item IPC overhead and allocator churn (#1520, #1521, #1522).
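The idea behind a pre-allocated arena is that the expensive step, creating a shared-memory segment, happens once. After that, each payload only costs a copy into a reusable slot. The sketch below illustrates this with the standard library's multiprocessing.shared_memory. ToySegmentPool is a hypothetical, single-process helper for explanation only. It is not SPDL's SharedMemorySegmentPool, and it omits the cross-process signalling and zero-copy views a real arena needs.

```python
from multiprocessing import shared_memory


class ToySegmentPool:
    """Hypothetical fixed-size slot pool carved out of ONE shared-memory block.

    Illustration only (not SPDL's implementation): single process, no locking.
    """

    def __init__(self, segment_size: int, count: int) -> None:
        self.segment_size = segment_size
        # One allocation up front instead of one new segment per payload.
        self.shm = shared_memory.SharedMemory(create=True, size=segment_size * count)
        self.free = list(range(count))

    def put(self, payload: bytes) -> tuple[int, int]:
        """Copy payload into a free slot; return (slot, length)."""
        if len(payload) > self.segment_size:
            raise ValueError("payload larger than a segment")
        slot = self.free.pop()  # IndexError if every slot is in flight
        start = slot * self.segment_size
        self.shm.buf[start:start + len(payload)] = payload
        return slot, len(payload)

    def get(self, slot: int, length: int) -> bytes:
        start = slot * self.segment_size
        return bytes(self.shm.buf[start:start + length])

    def release(self, slot: int) -> None:
        """Return a slot to the pool so a later payload can reuse it."""
        self.free.append(slot)

    def close(self) -> None:
        self.shm.close()
        self.shm.unlink()
```

Reusing slots means the number of in-flight payloads is bounded by count, which is why an arena is sized against the consumer's buffering rather than the total number of items.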
Packets serialization and zero-copy transport. AudioPackets,
VideoPackets, and ImagePackets are now picklable, so demuxed packets can be sent across processes (e.g. demux in a worker, decode on GPU elsewhere). A public Packets.deserialize static method is the single entry point for reconstructing packets, and a new Packets.deserialize_view reconstructs packets pointing directly into a caller-supplied buffer with no payload copy. Serialized payloads are padded to a 64-byte boundary so zero-copy views land on SIMD-aligned addresses (#1439, #1513, #1514, #1517).
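Padding each payload to a 64-byte boundary means every payload starts at an offset that is a multiple of 64, so a zero-copy view into the buffer is aligned whenever the buffer itself is. That holds for page-aligned shared memory but is not guaranteed for an ordinary Python bytearray. The sketch below assumes a hypothetical back-to-back layout; pack and views are illustrative names, and SPDL's actual serialized format may differ. It shows both the padding arithmetic and the "no payload copy" property: the views see later writes to the buffer.

```python
ALIGN = 64  # padding boundary stated in these notes


def padded_size(n: int, align: int = ALIGN) -> int:
    """Round n up to a multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)


def pack(payloads: list[bytes], align: int = ALIGN) -> tuple[bytearray, list[tuple[int, int]]]:
    """Hypothetical layout: payloads back to back, each zero-padded to `align`.

    Returns the buffer and an (offset, length) entry per payload.
    """
    buf = bytearray()
    index = []
    for p in payloads:
        index.append((len(buf), len(p)))
        buf += p
        buf += bytes(padded_size(len(p), align) - len(p))  # zero padding
    return buf, index


def views(buf: bytearray, index: list[tuple[int, int]]) -> list[memoryview]:
    """Zero-copy: each view points into `buf`, so `buf` must outlive the views."""
    mv = memoryview(buf)
    return [mv[off:off + n] for off, n in index]
```

For payloads of 10, 64, and 100 bytes this layout places them at offsets 0, 64, and 128, for a 256-byte buffer.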
BC-Breaking Changes
No BC-breaking changes to existing public APIs. All new arguments are optional with behavior-preserving defaults and all new symbols are purely additive, so no migration is required.
New Features
- spdl command-line entry point. A new spdl console script (and python -m spdl) provides a utility CLI; spdl autoresearch is its first subcommand (#1476).
  ```shell
  spdl --help
  spdl autoresearch --help
  ```
- spdl.autoresearch package for automated, LLM-driven experiment workflows, with a domain-neutral scheduling engine (spdl.autoresearch.core), a pluggable workflow contract (WorkflowProtocol/WorkflowSpec), and a bundled pipeline-optimization workflow (#1449, #1450, #1463, #1464).
  ```shell
  # Interactive: a supervisor agent fills in missing config, starts the
  # engine, and monitors progress.
  spdl autoresearch supervisor <workdir> \
      --pipeline-script path/to/pipeline.py \
      --source-dir path/to/source/ \
      --build-command "<build command>" \
      --base-launch-command "<launch command with \$IMAGE>"
  ```
- Shared-memory arena backends SharedMemorySegmentPool and SharedMemoryRingBuffer, exported from spdl.pipeline, plus the new arena= parameter on iterate_in_subprocess (#1520, #1521, #1522).
  ```python
  from spdl.pipeline import (
      SharedMemoryRingBuffer,
      SharedMemorySegmentPool,
      iterate_in_subprocess,
  )

  # make_items is a placeholder for your own picklable callable that
  # produces the items; it runs in the subprocess.
  # Pre-allocate the arena in the parent; ownership transfers to the call,
  # which closes and unlinks it at teardown (do not reuse it afterwards).
  pool = SharedMemorySegmentPool(segment_size=1 << 18, count=4)
  for item in iterate_in_subprocess(make_items, arena=pool, buffer_size=2):
      ...  # large bytes / NumPy / Torch fields arrive as zero-copy views

  # SharedMemoryRingBuffer is the alternative backend; size it to the
  # in-flight high-water mark, roughly (buffer_size + 2) * max_unit_bytes.
  ring = SharedMemoryRingBuffer(capacity=1 << 20)
  ```
- Picklable Packets and public deserialization API. Packets objects support pickle and copy.deepcopy and gain public deserialize and zero-copy deserialize_view static methods (#1439, #1513, #1514).
  ```python
  import pickle

  import spdl.io

  packets = spdl.io.demux_video("sample.mp4")

  # Packets are now picklable (e.g. demux in a worker, decode elsewhere).
  restored = pickle.loads(pickle.dumps(packets))

  # Or reconstruct explicitly from the serialized bytes.
  state = packets.__getstate__()
  rebuilt = spdl.io.VideoPackets.deserialize(state)

  # Zero-copy: the packets point directly into `state` (keep it alive).
  view = spdl.io.VideoPackets.deserialize_view(memoryview(state))
  frames = spdl.io.decode_packets(view)
  ```
- Full-coverage distributed sampler mode. DistributedDeterministicSampler and DistributedRandomSampler gain a ddp_drop_last_distributed_round argument (default True, preserving prior behavior). Set it to False to cover every sample exactly once across ranks, with rank lengths differing by at most one (#1515).
  ```python
  from spdl.source import DistributedDeterministicSampler

  # N = 11, world_size = 4
  # default (drop): every rank gets 2 indices; indices 8, 9, 10 are dropped.
  # full coverage:  ranks 0-2 get 3 indices, rank 3 gets 2; all 11 covered.
  rank = 0  # this process's rank, e.g. from torch.distributed.get_rank()
  sampler = DistributedDeterministicSampler(
      11,
      rank=rank,
      world_size=4,
      ddp_drop_last_distributed_round=False,
  )
  indices = list(sampler)
  ```
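To make the two modes concrete, here is a pure-Python sketch that reproduces the per-rank lengths from the sampler example above. It assumes a simple strided assignment (rank r takes indices r, r + world_size, ...) with no shuffling. The names partition and drop_last_round are hypothetical, and SPDL's samplers may order or assign indices differently; DistributedRandomSampler, for one, shuffles.

```python
def partition(n: int, rank: int, world_size: int, drop_last_round: bool = True) -> list[int]:
    """Hypothetical strided split of range(n) across ranks.

    drop_last_round=True:  truncate n to a multiple of world_size, so every
                           rank gets the same number of indices.
    drop_last_round=False: keep every index; rank lengths differ by at most one.
    """
    usable = n - n % world_size if drop_last_round else n
    return list(range(rank, usable, world_size))


if __name__ == "__main__":
    for drop in (True, False):
        print(drop, [partition(11, r, 4, drop) for r in range(4)])
```

With N = 11 and world_size = 4, dropping leaves 8 usable indices (2 per rank), while full coverage gives rank 0 the indices 0, 4, 8 and rank 3 only 3, 7.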
Bug Fixes
- Fix GeneratorExit from a consumer break incorrectly shutting down the subprocess worker in iterate_in_subprocess (#1433).
- Make the local job runner and engine command Windows-friendly (#1486).
- Fix the headspace throughput ceiling to use steady-state rather than epoch-average throughput (#1492).
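For context on #1433: when a consumer breaks out of a for loop and the generator is later closed (explicitly, or when it is garbage-collected), Python raises GeneratorExit inside the generator at its paused yield. The standard-library sketch below shows that ordinary early-exit path, which per the fix above should not shut the subprocess worker down. It does not reproduce iterate_in_subprocess or the fix itself; producer is a hypothetical name.

```python
def producer(log: list[str]):
    """Toy generator that records how it was exited."""
    try:
        for i in range(10):
            yield i
    except GeneratorExit:
        # Raised at the paused `yield` when the consumer closes us early.
        log.append("GeneratorExit")
        raise  # must propagate (or return); yielding again would be an error
    finally:
        log.append("cleanup")


log: list[str] = []
gen = producer(log)
for item in gen:
    if item == 2:
        break  # consumer stops early; the generator is merely suspended
gen.close()  # explicit close raises GeneratorExit inside producer
print(log)  # ['GeneratorExit', 'cleanup']
```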
Documentation
- Add a video color-space documentation page (#1518, #1519).
- Add an FFmpeg CLI cheat sheet to the docs (#1516).
- Document the shared-memory arena (CPU savings, noisy-neighbor behavior).
- Add demux codec_mutex concurrency guidance to Demuxer/decode_packets_nvdec docstrings (#1501, #1503).
- Switch the video-classification example to an MTP pipeline with optimized concurrency, and add an autoresearch example doc (#1494, #1499, #1505).
Other Changes
- Add an arena transport benchmark (throughput, CPU, memory) (#1524, #1527).
- Exclude the tests/ directory from the package (#1488).
- Enable Pyrefly type checking in fbcode/spdl and resolve the resulting type errors (internal).
- Numerous internal refactors and prompt/knowledge updates supporting the new spdl.autoresearch framework.