AD: Data Type, Caps, and Flow Design
Note: This document has been moved here.
Rank counting with other/tensor types
All NNStreamer's GStreamer data types as pad capabilities (other/tensor*
) have the following common rules
- In each buffer, there is only ONE frame. That is for each buffer, with the type of
other/tensor
, there is only one instance of tensor for each buffer at any time. There cannot be multiple tensors in each buffer. - The data types do not hold data semantics. Filters should NOT try to determine data semantics (e.g., is it a video?) dynamically based solely on the dimensions, framerates, or element types of the data types. However, if a filter has additional information available including property values from pipeline developers or users, a filter may determine data semantics. For example,
tensor_decoder
transformsother/tensor
stream intovideo/x-raw
ortext/x-raw
depending on the property values.
The GST pad capability has the following structure:
other/tensor
framerate: (fraction) [ 0/1, 2147483647/1 ]
# We are still unsure how to handle framerate w/ filters.
dimension: (string with int:int:int:int) [1, 65535]:[1, 65535]:[1, 65535]:[1, 65535]
# We support up to 4th dimensions only. Supporting higher arbitrary dimension is TBD item.
type: (string) { uint8, int8, uint16, int16, uint32, int32, uint64, int64, float32, float64 }
The buffer with offset 0 looks like the following with dim1=2, dim2=2, dim3=2, dim4=2
:
dim4 | 0 | 1 | ||||||||||||||
dim3 | 0 | 1 | 0 | 1 | ||||||||||||
dim2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | ||||||||
dim1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
offset/type=uint8 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
offset/type=uint16 | 0 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 | 26 | 28 | 30 |
Therefore, array in C corresponding to the buffer of a other/tensor
becomes:
type buffer[dim4][dim3][dim2][dim1];
Note that in some context (such as properties of tensor_*
plugins), dimensions are described in a colon-separated string that allows omitting unused dimensions:
dim1:dim2:dim3:dim4
If rank = 2 (dim3 = 1 and dim4 = 1), then, it can be expressed as well as:
dim1:dim2
Be careful! Colon-separated tensor dimension expression has the opposite order to the C-array type expression.
other/tensors
is defined to have multiple instances of other/tensor
in a buffer of a single stream path. Compared to having multiple streams (thus multiple pads) with other/tensor
that goes to or comes from a single element, having a single stream with other/tensors
has the following advantages:
-
tensor_filter
, which is the main engine to communicate deep neural network frameworks and models, becomes simple and robust. With neural network models requiring multiple input tensors, if the input tensor streams are not fully synchronized, we need to somehow synchronize them and provide all input tensors at the same time to the model. Hereby, being fully synchronized means that the streams should provide new data simultaneously, which requires to have the exactly same framerate and timing, which is mostly impossible. Withother/tensors
and single input and output streams fortensor_filter
, we can delegate the responsibilities of synchronization to GStreamer and its basic plugins, who are extremely good at such tasks. - During transmissions on a stream pipeline, passing through various stream filters, we can guarantee that the same set of input tensors are being processed without the worries of synchronizations after the point of merging or muxing.
The GStreamer pad capability of other/tensors
is as follows:
other/tensors
num_tensors = (int) [1, 16] # GST_MAX_MEMCHUNK_PER_BUFFER
framerate = (fraction) [0/1, 2147483647/1]
types = (string) Typestrings
dimensions = (string) Dimensions
Typestrings = (string) Typestring
| (string) TypeString, TypeStrings
Typestring = (string) { float32, float64, int64, uint64, int32, uint32, int16, uint16, int8, uint8 }
Dimensions = (string) Dimension
| (string) Dimension, Dimensions
Dimension = (string) [1-65535]:[1-65535]:[1-65535]:[1-65535]
The buffer of other/tensors
streams have multiple memory chunks. Each memory chunk represents one single tensor in the buffer format of other/tensor
. With default configurations of Gstreamer 1.0, the maximum allowed number of memory chunks in a buffer is 16. Thus, with such configurations of Gstreamer 1.0, other/tensors
may include up to 16 other/tensor
.
other/tensorsave
, along with its typefind
definition, is defined to enable to save other/tensors
streams as files and load such files are other/tensors
stream. With the definitions of headers defined with other/tensorsave
, GStreamer can decode the given file and determine that the file belongs to other/tensorsave
.
The detailed description of the file format is at Design External Save Format for other/tensor and other/tensors Stream for TypeFind
We have the following principles for timestamp policies. Timestamping policies of tensor_*
filters should follow the given principles.
- Timestamp from the input source (sensors) should be preserved to sink elements.
- When there are multiple flows merging into one (an element with multiple sink pads), a timestamp of the most recent should be preserved.
- For the current frame buffer in a sink pad,
i
in1 ... n
,FB(i)
, andT(FB(i))
is the timestamp of the current frame buffer, the timestamp of the corresponding frame buffer at source pads generated byFB(1)
...FB(n)
ismax(i = 1 .. n, T(FB(i)))
, where larger timestamp value means the more recent event. - Example: when multiple frames of the same stream are aggregated by
tensor_aggregator
, according to this principle, the timestamp of the last frame of the aggregated frames is used for output. - Note that this principle might cause confusion when we apply
tensor_demux
ortensor_split
, where we may extract some "old" frames from incoming aggregated frames. However, as a single frame in a GStreamer stream has a single timestamp, we are going to ignore it for now.
- For the current frame buffer in a sink pad,
Besides timestamping, we have additional synchronization issues when there are merging streams. We need to determine which frames are going to be merged (or muxed) when we have multiple available and unused frames in an incoming sink pad. In general, we might say that the synchronization of frames determines which frames to be used for mux/merge and timestamping rule determins which timestamp to be used among the chosen frames for mux/merge.
In principle and by default,
- If there are mutliple unused and available frames in a sink pad, unlike most media filters,
In some usage cases, we may need to drop frames from a queue. With the timestamp values of frames, a queue may drop frames with different policies according to GStreamer applications. Such policies include:
- Drop more recent frames, keep older frames
- Drop older frames, keep newer frames
Note that in the case of many multi-modal neural networks, mux/merge elements are supposed to drop any older frames with incoming frames in the incoming (sink pad) queue.
This is an obvious case. The timestamp is copied to all source pads from the sink pads. We do not preserve original timestamps in Merge or Mux; thus, the processed timestamp after Mux or Merge will only be applied.
Unlike mux and merge, aggregator merges tensors chronologically, not spatially. Moreover, unlike mux and merge, which merges entries into one entry, aggregator, depending on the properties, may divide or even simultaneously merge and divide entries. Thus, timestamping and synchronization may become much more complicated.
In general, tensor_* chooses the most recent timestamp when there are multiple candidates. For example, if we are merging/muxing/aggregating two frames from sinkpads, T
and T+a
, where a > 0
, the source pads are supposed to have T+a
.