Pescador's primary goal is to provide fine-grained control over data streaming and sampling. These problems can get complex quickly, so this section provides an overview of the concepts underlying Pescador's design, and a summary of the provided functionality.
To understand what Pescador does, it helps to establish some common terminology. If you're not already familiar with Python's iterator and generator concepts, here's a quick synopsis:
- An iterator is an object that produces a sequence of data, i.e. via __next__ / next().
- An iterable is an object that can produce iterators, i.e. via __iter__ / iter().
- See: iterable definition
- A generator (or more precisely generator function) is a callable object that returns a single iterator.
- See: generator definition
- Pescador defines a stream as the sequence of objects produced by an iterator.
- For example: range is a callable that returns an iterable; range(8) is an iterable, and its iterator produces the stream: 0, 1, 2, ..., 7
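The distinction between these terms is easier to see in code. The sketch below uses only plain Python; the names Countdown, CountdownIterator, and countdown are hypothetical and exist only for this illustration.

```python
class Countdown:
    """An iterable: every call to __iter__ returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """An iterator: __next__ produces the next item of the stream."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Iterators are conventionally iterable too, returning themselves.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def countdown(start):
    """A generator function: calling it returns a single iterator."""
    while start > 0:
        yield start
        start -= 1


# An iterable can be traversed repeatedly, because each pass
# builds a new iterator.
iterable = Countdown(3)
first_pass = list(iterable)
second_pass = list(iterable)

# A generator is a single iterator: once exhausted, it stays empty.
gen = countdown(3)
gen_first = list(gen)
gen_second = list(gen)

# The stream produced by the iterator of range(8).
stream = list(iter(range(8)))
```

The second pass over the generator comes back empty, which is the limitation that motivates an object able to (re)create iterators on demand.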
- Pescador defines an object called a Streamer for the purposes of (re)creating iterators indefinitely and (optionally) interrupting them prematurely.
- Streamer implements the iterable interface, and can be iterated directly.
- A Streamer can be initialized with one of two types:
- Any iterable type, e.g. range(7), ['foo', 'bar'], "abcdef", or another Streamer
- A generator function and its arguments + keyword arguments.
- A Streamer transparently yields the data stream flowing through it.
- A Streamer should not modify objects in its stream.
- In the spirit of encapsulation, the modification of data streams is achieved through separate functionality (see processing-data-streams).
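To make the Streamer idea concrete, here is a toy stand-in written in plain Python. It is a sketch of the behaviour described above, not Pescador's implementation: ToyStreamer, its iterate method, the max_iter parameter, and the counter generator are names chosen for this illustration, so consult the Streamer API reference for the real interface.

```python
import itertools


class ToyStreamer:
    """Toy stand-in for the Streamer concept (not Pescador's code)."""

    def __init__(self, source, *args, **kwargs):
        # `source` is either an iterable, or a generator function
        # together with its positional and keyword arguments.
        self.source = source
        self.args = args
        self.kwargs = kwargs

    def _new_iterator(self):
        # (Re)create a fresh iterator each time one is requested.
        if callable(self.source):
            return self.source(*self.args, **self.kwargs)
        return iter(self.source)

    def iterate(self, max_iter=None):
        # Optionally interrupt the stream after max_iter items;
        # islice with stop=None runs until the source is exhausted.
        return itertools.islice(self._new_iterator(), max_iter)

    def __iter__(self):
        # Implementing the iterable interface allows direct iteration.
        return self.iterate()


def counter(start, step=1):
    """An endless generator function."""
    value = start
    while True:
        yield value
        value += step


# Built from a generator function plus its arguments.
streamer = ToyStreamer(counter, 10, step=5)
first = list(streamer.iterate(max_iter=3))
again = list(streamer.iterate(max_iter=3))  # a fresh iterator, same items

# Built from any iterable, including another streamer.
letters = list(ToyStreamer("abc"))
nested = list(ToyStreamer(ToyStreamer(range(3))))
```

Note that the toy passes items through untouched, in keeping with the rule that a Streamer should not modify objects in its stream.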
- Pescador defines a family of multiplexer or Mux classes for the purposes of multiplexing streams of data. For stochastic sampling applications, ShuffledMux and StochasticMux are the most useful classes.
- BaseMux inherits from Streamer, which makes all muxes both iterable and recomposable. Muxes allow you to construct arbitrary trees of data streams. This is useful for hierarchical sampling.
- Muxes are initialized with a container of one or more streamers, and parameters to control the mux's sampling behavior.
- As a subclass of Streamer, a Mux also transparently yields the stream flowing through it (see streaming-data).
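The multiplexing idea can be sketched without Pescador at all. The function below is a hypothetical toy_mux that repeatedly picks one input stream at random and yields its next item; it does not reproduce the sampling behaviour or parameters of ShuffledMux or StochasticMux, and the weights and seed arguments are assumptions made for this illustration.

```python
import random


def toy_mux(streams, weights=None, seed=None):
    """Toy stochastic multiplexer (not Pescador's implementation).

    Repeatedly selects one stream at random, in proportion to its
    weight, and yields that stream's next item. Exhausted streams
    are dropped until none remain.
    """
    rng = random.Random(seed)
    iterators = [iter(s) for s in streams]
    if weights is None:
        weights = [1.0] * len(iterators)
    else:
        weights = list(weights)

    while iterators:
        idx = rng.choices(range(len(iterators)), weights=weights)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # Remove the exhausted stream and its weight together.
            del iterators[idx]
            del weights[idx]


# Interleave two streams; items within each stream keep their order.
merged = list(toy_mux(["abc", range(3)], seed=0))
merged_letters = [x for x in merged if isinstance(x, str)]
merged_numbers = [x for x in merged if isinstance(x, int)]

# Because a mux is itself iterable, muxes can be nested into a tree,
# which is the basis of hierarchical sampling.
inner = toy_mux(["ab", "cd"], seed=1)
tree = list(toy_mux([inner, range(2)], weights=[2.0, 1.0], seed=2))
```

The nested call works only because the output of a mux has the same interface as its inputs, which is the property the text calls recomposable.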
Pescador adopts the concept of "transformers" for processing data streams.
- A transformer takes as input a single object in the stream.
- A transformer yields an object.
- Transformers are iterators, i.e. they implement a __next__ method, to preserve iteration.
- An example of a built-in transformer is enumerate [ref]
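A generator is one simple way to write a transformer with these properties. In the sketch below, scale is a hypothetical transformer defined only for this example; enumerate is the Python built-in mentioned above.

```python
def scale(stream, factor):
    """Hypothetical transformer: multiply each item by `factor`."""
    for item in stream:
        # Takes one object from the stream, yields one object.
        yield item * factor


doubled = scale(range(4), 2)

# The transformer is itself an iterator, so iteration is preserved.
has_next = hasattr(doubled, "__next__")
first = next(doubled)
rest = list(doubled)

# The built-in enumerate behaves the same way: it pairs each item
# of the stream with a running index.
indexed = list(enumerate("ab"))

# Because transformers are iterators, they can be chained.
chained = list(enumerate(scale(range(3), 10)))
```

Keeping the transformation outside the stream source leaves the source untouched, so the same stream can feed several different processing chains.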