Receiving
=========

The classes associated with receiving are in the :py:mod:`spead2.recv` package. A *stream* represents a logical stream, in that packets with the same heap ID are assumed to belong to the same heap. A stream can have multiple physical transports.

Streams yield *heaps*, which are the basic units of data transfer and contain both item descriptors and item values. While it is possible to directly inspect heaps, this is not recommended or supported. Instead, heaps are normally passed to :py:meth:`spead2.ItemGroup.update`.

.. note::

   Malformed packets (such as an unsupported SPEAD version, or inconsistent heap lengths) are dropped, with a log message. However, errors in interpreting a fully assembled heap (such as invalid/unsupported formats, data of the wrong size and so on) are reported as :py:exc:`ValueError` exceptions. Robust code should thus be prepared to catch exceptions from heap processing.
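
For example, a receive loop might contain heap-processing errors like this. This is only a sketch: it assumes a stream that has already been constructed and connected to a transport, as described in the sections below.

.. code-block:: python

   import logging

   import spead2
   import spead2.recv

   def consume(stream: spead2.recv.Stream) -> None:
       item_group = spead2.ItemGroup()
       for heap in stream:
           try:
               # Returns a dict of the items that were updated by this heap
               changed = item_group.update(heap)
           except ValueError:
               # e.g. an item descriptor with an invalid/unsupported format
               logging.exception("Failed to process heap %d", heap.cnt)
               continue
           for item in changed.values():
               print(item.name, item.value)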

Configuration
-------------

Once a stream is constructed, the configuration cannot be changed. The configuration is captured in two classes, :py:class:`~spead2.recv.StreamConfig` and :py:class:`~spead2.recv.RingStreamConfig`. The split is a reflection of the C++ API and not particularly relevant in Python. The configuration options can either be passed to the constructors (as keyword arguments) or set as properties after construction.
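
For example (the option values here are arbitrary illustrations):

.. code-block:: python

   import spead2.recv

   # Options can be passed as constructor keyword arguments...
   config = spead2.recv.StreamConfig(max_heaps=4, allow_out_of_order=True)

   # ...or set as properties after construction.
   ring_config = spead2.recv.RingStreamConfig()
   ring_config.heaps = 8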

Blocking receive
----------------

To do blocking receive, create a :py:class:`spead2.recv.Stream`, and add transports to it with :py:meth:`~spead2.recv.Stream.add_buffer_reader`, :py:meth:`~spead2.recv.Stream.add_udp_reader`, :py:meth:`~spead2.recv.Stream.add_tcp_reader` or :py:meth:`~spead2.recv.Stream.add_udp_pcap_file_reader`. Then either iterate over it, or repeatedly call :py:meth:`~spead2.recv.Stream.get`.
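
A minimal blocking receiver might look like the following sketch (the port number and configuration values are arbitrary examples):

.. code-block:: python

   import spead2
   import spead2.recv

   thread_pool = spead2.ThreadPool()
   stream = spead2.recv.Stream(
       thread_pool,
       spead2.recv.StreamConfig(max_heaps=4),
       spead2.recv.RingStreamConfig(heaps=8),
   )
   stream.add_udp_reader(8888)   # listen on UDP port 8888

   item_group = spead2.ItemGroup()
   for heap in stream:
       # Update item descriptors and values from each received heap
       item_group.update(heap)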

Asynchronous receive
--------------------

Asynchronous I/O is supported through Python's :py:mod:`asyncio` module. It can be combined with other asynchronous I/O frameworks like twisted and Tornado. Use :py:class:`spead2.recv.asyncio.Stream` instead of :py:class:`spead2.recv.Stream`; its :py:meth:`~spead2.recv.asyncio.Stream.get` method is a coroutine.

The stream is also asynchronously iterable, i.e., it can be used in an ``async for`` loop to iterate over the heaps.
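
The sketch below illustrates this, using the same (arbitrary) construction parameters as the blocking example above:

.. code-block:: python

   import asyncio

   import spead2
   import spead2.recv.asyncio

   async def receive() -> None:
       stream = spead2.recv.asyncio.Stream(
           spead2.ThreadPool(),
           spead2.recv.StreamConfig(max_heaps=4),
           spead2.recv.RingStreamConfig(heaps=8),
       )
       stream.add_udp_reader(8888)   # example port
       item_group = spead2.ItemGroup()
       async for heap in stream:
           item_group.update(heap)

   asyncio.run(receive())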

Packet ordering
---------------

SPEAD is typically carried over UDP, and by its nature, UDP allows packets to be reordered. Packets may also arrive interleaved if they are produced by multiple senders. We consider two sorts of packet ordering issues:

  1. Re-ordering within a heap. By default, spead2 assumes that all the packets that form a heap will arrive in order, and discards any packet that does not have the expected payload offset. In most networks this is a safe assumption provided that all the packets originate from the same sender (IP address and port number) and have the same destination.

    If this assumption is not appropriate, it can be changed with the :py:attr:`~spead2.recv.StreamConfig.allow_out_of_order` attribute of :py:class:`spead2.recv.StreamConfig`. This has minimal impact when packets do in fact arrive in order, but reassembling arbitrarily ordered packets can be expensive. Allowing for out-of-order arrival also makes handling lost packets more expensive (because one must cater for them arriving later), which can lead to a feedback loop as this more expensive processing can lead to further packet loss.

  2. Interleaving of packets from different heaps. This is always supported, but to a bounded degree so that lost packets don't lead to heaps being kept around indefinitely in the hope that the packet may arrive. The :py:attr:`~spead2.recv.StreamConfig.max_heaps` attribute of :py:class:`spead2.recv.StreamConfig` determines the amount of overlap allowed: once a packet in heap *n* is observed, it is assumed that heap *n* − ``max_heaps`` is complete. When there are many producers it will likely be necessary to increase this value. Larger values increase the memory usage for partial heaps, and have a small performance impact.

    It's possible to get more predictable results when the producers interleave their heap cnts (for example, by using :py:meth:`spead2.send.Stream.set_cnt_sequence`) such that the remainder when dividing the heap cnt by the number of producers identifies the producer. In this case, set the :py:attr:`~spead2.recv.StreamConfig.substreams` attribute of :py:class:`spead2.recv.StreamConfig` to the number of producers (see the configuration sketch after this list). Note that :py:attr:`~spead2.recv.StreamConfig.max_heaps` applies separately to each producer, and can usually be very low (1 or 2) if the producer sends one heap at a time.
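
A configuration covering both of the above might look like the following sketch, for a hypothetical setup with four producers (the values are arbitrary examples):

.. code-block:: python

   import spead2.recv

   config = spead2.recv.StreamConfig(
       allow_out_of_order=True,  # packets within a heap may be reordered
       substreams=4,             # four producers with interleaved heap cnts
       max_heaps=2,              # per-substream bound on concurrent partial heaps
   )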

Memory allocators
-----------------

To allow for performance tuning, it is possible to use an alternative memory allocator for heap payloads. A few allocator classes are provided; new classes must currently be written in C++. The default (which is also the base class for all allocators) is :py:class:`spead2.MemoryAllocator`, which has no constructor arguments or methods. An alternative is :py:class:`spead2.MmapAllocator`.

The most important custom allocator is :py:class:`spead2.MemoryPool`. It allocates from a pool, rather than directly from the system. This can lead to significant performance improvements when the allocations are large enough that the C library allocator does not recycle the memory itself, but instead requests memory from the kernel.

A memory pool has a range of sizes that it will handle from its pool, by allocating the upper bound size. Thus, setting too wide a range will waste memory, while setting too narrow a range will prevent the memory pool from being used at all. A memory pool is best suited for cases where the heaps are all roughly the same size.

A memory pool can optionally use a background task (scheduled onto a thread pool) to replenish the pool when it gets low. This is useful when heaps are being captured and stored indefinitely rather than processed and released.
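
As a sketch, a pool suitable for heaps whose payloads are all a few megabytes could be attached to the stream configuration like this (the sizes and counts are arbitrary examples):

.. code-block:: python

   import spead2
   import spead2.recv

   # Handle allocations between 16 KiB and 32 MiB from the pool, keeping up
   # to 12 free buffers of 32 MiB each, with 8 allocated up front.
   pool = spead2.MemoryPool(16384, 32 * 1024 * 1024, 12, 8)
   config = spead2.recv.StreamConfig(memory_allocator=pool)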

Incomplete Heaps
----------------

By default, an incomplete heap (one for which some but not all of the packets were received) is simply dropped and a warning is printed. Advanced users might need finer control, such as recording metrics about the number of these heaps. To do so, set :py:attr:`~spead2.recv.RingStreamConfig.contiguous_only` to ``False`` in the :py:class:`~spead2.recv.RingStreamConfig`. The stream will then yield instances of :py:class:`~spead2.recv.IncompleteHeap`.
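
The sketch below counts incomplete heaps while processing the complete ones (stream construction follows the earlier examples; the port is arbitrary):

.. code-block:: python

   import spead2
   import spead2.recv

   stream = spead2.recv.Stream(
       spead2.ThreadPool(),
       spead2.recv.StreamConfig(),
       spead2.recv.RingStreamConfig(contiguous_only=False),
   )
   stream.add_udp_reader(8888)   # example port

   item_group = spead2.ItemGroup()
   incomplete = 0
   for heap in stream:
       if isinstance(heap, spead2.recv.IncompleteHeap):
           # Only metadata (cnt, heap_length, received_length, ...) is available
           incomplete += 1
       else:
           item_group.update(heap)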

Statistics
----------

Refer to :ref:`recv-stats` for general information about statistics.

Additional statistics are available on the ringbuffer underlying the stream (:py:attr:`~spead2.recv.Stream.ringbuffer` property), with similar caveats about synchronisation.

The :py:mod:`spead2.recv.stream_stat_indices` module contains constants for the indices of the core statistics, so that they can be retrieved by index without first having to look the index up by name.
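
For example, assuming a stream constructed as in the earlier sketches (the statistic names and index constants shown here are illustrative; the available set depends on the spead2 version):

.. code-block:: python

   import spead2.recv
   import spead2.recv.stream_stat_indices

   stats = stream.stats
   # Lookup by name is convenient...
   print("heaps:", stats["heaps"])
   # ...while lookup by index avoids the name lookup on a hot path.
   print("packets:", stats[spead2.recv.stream_stat_indices.PACKETS])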

Explicit start
--------------

When using multiple readers with a stream or multiple streams, it is sometimes desirable to have them all begin listening to the network at the same time, so that their data can be matched up. Adding readers can be slow (mostly due to the cost of allocating buffers), so when adding multiple readers serially, they will start listening at very different times.

If one sets the :py:attr:`~spead2.recv.StreamConfig.explicit_start` parameter of :py:class:`spead2.recv.StreamConfig` to true, then adding a reader will do the expensive work of allocating buffers, but will not start it listening to the network. That is done by calling :py:meth:`spead2.recv.Stream.start`. This still iterates serially over the readers, so they will not start listening at exactly the same time, but the skew will be much smaller because the operation is much more light-weight.
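
A sketch of this pattern with two streams (the ports and configuration are arbitrary examples):

.. code-block:: python

   import spead2
   import spead2.recv

   thread_pool = spead2.ThreadPool()
   config = spead2.recv.StreamConfig(explicit_start=True)
   streams = [
       spead2.recv.Stream(thread_pool, config, spead2.recv.RingStreamConfig())
       for _ in range(2)
   ]
   # Readers are prepared (buffers allocated) but not yet listening.
   streams[0].add_udp_reader(8888)
   streams[1].add_udp_reader(8889)
   # Start them all, as close together in time as possible.
   for stream in streams:
       stream.start()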

When this feature is in use, no readers can be added to a stream after calling :py:meth:`~spead2.recv.Stream.start` (doing so will raise an exception). This gives the implementation a hint that adding readers cannot happen concurrently with packets arriving. At present the implementation does not take advantage of this assumption, but that is subject to change in future versions of spead2.