Accessing data from within a kernel

Jonathan Beard edited this page Apr 5, 2021 · 17 revisions

How to use ports

So you've built a compute kernel; how do you access information from the external environment? The answer lies in ports. Ports can be implemented in many different ways, and the run-time handles those details. The thing to remember is that you access data in a "first in, first out" (FIFO) manner. This means that all you, as a programmer, have to worry about is pulling data in from "input" ports and writing data to "output" ports. Once you write data, it is gone from your kernel and released to the downstream compute kernel.

There are two basic kinds of methods at play here: those that are zero copy and those that aren't. There's often a trade-off between the number of instructions required and the benefit gained from not moving data around. For things like fundamental types (or things that fit within a single cache line), the copying methods are probably faster. Otherwise, the zero copy methods will be much more efficient.

The zero copy methods generally consist of allocates and sends. If you call allocate on an "object" type, the function will properly call the constructor to initialize any internal data members so that the returned memory is fully ready to use.

   /**
    * allocate - returns a reference to a writeable 
    * member at the tail of the FIFO.  You must have 
    * a subsequent call to send() in order to release
    * this object to the FIFO once it is written.
    * If the user needs to de-allocate the memory without
    * using it, they can call the deallocate function.
    * @return  T&
    */
   template < class T > T& allocate();
   template < class T, class... Args > T& allocate( Args&&... params );
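
To make the allocate/send/deallocate contract concrete, here is a minimal single-threaded sketch. It is not RaftLib's FIFO: toy_fifo, sample and write_in_place are hypothetical names made up for illustration, and the toy skips the constructor call and blocking behavior that the real run-time provides. The point is only that the writer fills in the slot the reader will later see, so no copy is made.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy single-threaded ring buffer (hypothetical, NOT RaftLib's FIFO).
// It only models the allocate()/send()/deallocate() contract.
template <class T, std::size_t N>
class toy_fifo {
public:
    // Hand out a reference to the slot at the tail. Nothing becomes
    // visible to the reader until send() is called.
    T& allocate() {
        assert(count_ < N && !pending_);
        pending_ = true;
        return slots_[(head_ + count_) % N];
    }
    // Publish the slot obtained from allocate().
    void send() {
        assert(pending_);
        pending_ = false;
        ++count_;
    }
    // Give the slot back without publishing it.
    void deallocate() { pending_ = false; }

    std::size_t size() const { return count_; }
    const T& front() const {
        assert(count_ > 0);
        return slots_[head_];
    }
private:
    std::array<T, N> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
    bool pending_ = false;
};

struct sample { int id; double value; };

// Writes one element in place and reports whether the element the
// reader sees lives at the address the writer filled in.
inline bool write_in_place(toy_fifo<sample, 4>& q) {
    sample& s = q.allocate();
    s.id = 7;
    s.value = 1.5;
    const sample* written = &s;
    q.send();
    return written == &q.front();
}
```

A slot that is allocated and then handed back with deallocate() never shows up in size(), which mirrors the "release without sending" behavior described in the comment above.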

Analogous to these two are the allocate_s methods, which are slightly different. Instead of returning a reference to the allocated memory, they return an object. To access the memory returned via the allocate_s call, the user must de-reference the object. The advantage of using this allocate function is brevity: when the returned object exits the scope it was created in, the memory is released to the downstream port. More information on auto-release objects is available elsewhere in the wiki.

   /**
    * allocate_s - "auto-release" version of allocate,
    * where the action of pushing the memory allocated
    * to the consumer is handled by the returned object
    * exiting the calling stack frame. There are two functions
    * here, one that uses the object constructor type. This
    * one is for object types.
    * @return autorelease< T, allocatetype >
    */
   template < class T > autorelease_obj allocate_s();
   template < class T, class... Args > autorelease_obj allocate_s( Args&&... params );
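
The auto-release idea is ordinary C++ scope-based cleanup. The sketch below shows the pattern with a hypothetical guard class; toy_queue, auto_release and size_inside_scope are stand-in names, not RaftLib types, and the real autorelease object may well differ in its details. The guard allocates on construction, is de-referenced to write, and calls send() from its destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy output queue (hypothetical). std::deque keeps references to
// existing elements valid when new elements are appended at the back.
class toy_queue {
public:
    int& allocate() {
        slots_.emplace_back(0);
        return slots_.back();
    }
    void send() {
        assert(published_ < slots_.size());
        ++published_;
    }
    std::size_t size() const { return published_; }
    int back() const {
        assert(published_ > 0);
        return slots_[published_ - 1];
    }
private:
    std::deque<int> slots_;
    std::size_t published_ = 0;
};

// Hypothetical guard playing the role of the object returned by
// allocate_s: de-reference to write, destructor releases downstream.
class auto_release {
public:
    explicit auto_release(toy_queue& q) : q_(q), slot_(q.allocate()) {}
    ~auto_release() { q_.send(); }
    auto_release(const auto_release&) = delete;
    auto_release& operator=(const auto_release&) = delete;
    int& operator*() { return slot_; }
private:
    toy_queue& q_;
    int& slot_;
};

// Returns how many elements were visible while the guard was alive.
inline std::size_t size_inside_scope(toy_queue& q, int value) {
    std::size_t visible = 0;
    {
        auto_release out(q);
        *out = value;
        visible = q.size();  // element is written but not yet sent
    }                        // guard leaves scope here, send() runs
    return visible;
}
```

There is no explicit send() at the call site, which is the brevity the paragraph above refers to.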

So what happens if you allocate memory that you don't end up using? Enter the deallocate function: it returns the memory to the free pool so that it can be used elsewhere. It is quite important that you call this function rather than holding on to the memory, since the queue cannot be resized while a kernel has a "hold" on an allocation.

   /**
    * deallocate - call if you've decided that you want to release
    * the memory obtained via any of the above allocate calls
    * without sending it to the downstream kernel. Fails silently
    * if allocate wasn't called in the first place.
    */
   void deallocate();

Now we're on to the non-zero copy functions. For small transfers, this is the way to go. With the push family of functions, you pass a reference and a copy of the item is placed on the queue. Object types must have a copy constructor defined; fundamental types need no further action.

   /**
    * push - function which takes an object of type T and a 
    * signal, makes a copy of the object using the copy 
    * constructor and passes it to the FIFO along with the
    * signal which is guaranteed to be delivered at the 
    * same time as the object (if of course the receiving 
    * object is responding to signals). There are specific
    * templates for object types and fundamental types to
    * optimize the logic for each (i.e. calling constructors
    * vs. simply copy).
    * @param   item -  T&
    * @param   signal -  raft::signal, default raft::none
    */
   template < class T > void push( const T &item,
                                   const raft::signal signal = raft::none );
   template < class T > void push( const T &&item,
                                   const raft::signal signal = raft::none );
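
A small sketch of what "makes a copy" means in practice. The queue, the tracked type and the g_copies counter are all hypothetical stand-ins (not RaftLib code, and signals are left out); the copy counter simply makes the cost of a copying push visible.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Global copy counter, used only to observe the copy constructor.
static int g_copies = 0;

// An "object" type with a user-defined copy constructor.
struct tracked {
    int value;
    explicit tracked(int v) : value(v) {}
    tracked(const tracked& other) : value(other.value) { ++g_copies; }
};

// Toy queue (hypothetical): push() stores a copy of its argument.
template <class T>
class toy_queue {
public:
    void push(const T& item) { items_.push_back(item); }
    const T& front() const {
        assert(!items_.empty());
        return items_.front();
    }
    std::size_t size() const { return items_.size(); }
private:
    std::deque<T> items_;
};

// Pushes a local object, then changes the local. The queued element
// keeps the value it had at the time of the push.
inline int push_then_mutate(toy_queue<tracked>& q) {
    tracked local(10);
    q.push(local);     // the copy constructor runs here
    local.value = 99;  // only the local changes
    return q.front().value;
}
```

For a type this small the copy is cheap, which is why the copying functions suit small transfers; for large objects the same copy would be the dominant cost.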

There are many cases when you might need to transfer a C++ container to another kernel. To do so, you must transfer the items contained in it downstream. The way to do that is through the insert method which uses standard C++ iterator semantics.

   /**
    * insert - inserts the range from begin to end in the FIFO,
    * blocks until space is available.  If the range is greater
    * than the space available it'll simply block and add items
    * as space becomes available.  There is the implicit assumption
    * that another thread is consuming the data, so eventually there 
    * will be room.
    * @param   begin - iterator_type, iterator to begin of range
    * @param   end   - iterator_type, iterator to end of range
    * @param   signal - raft::signal, default raft::none
    */
   template< class iterator_type >
   void insert(   iterator_type begin,
                  iterator_type end,
                  const raft::signal signal = raft::none )
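
The iterator-range idea can be sketched with a toy queue. Everything here (toy_queue, round_trip) is a hypothetical stand-in: the toy is unbounded and single-threaded, so it never blocks, whereas the real insert waits for space as described in the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy unbounded queue (hypothetical, not RaftLib's FIFO).
template <class T>
class toy_queue {
public:
    void push(const T& item) { items_.push_back(item); }

    // Copies every element in [begin, end) into the queue, in order.
    template <class iterator_type>
    void insert(iterator_type begin, iterator_type end) {
        for (; begin != end; ++begin) {
            push(*begin);
        }
    }

    T pop() {
        assert(!items_.empty());
        T value = items_.front();
        items_.pop_front();
        return value;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::deque<T> items_;
};

// Sends a whole container through the queue and collects it again,
// showing that element order is preserved.
inline std::vector<int> round_trip(const std::vector<int>& src) {
    toy_queue<int> q;
    q.insert(src.begin(), src.end());
    std::vector<int> out;
    while (q.size() > 0) {
        out.push_back(q.pop());
    }
    return out;
}
```

Because insert only needs a begin/end pair, the same call shape works for any standard container, not just std::vector.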

Now there is the issue of getting data from the input side. The simplest (single copy) way involves passing a reference to the pop function. The user also has the option (as in fact they do with many of the functions) of passing a pointer to a signal element so that asynchronous signals might be received. With all the pop methods, once you call them, the memory is yours to do with as you please. No further action is needed.

   /**
    * pop - pops the head of the queue.  If the receiving
    * object wants to use the signal, then the signal
    * parameter should not be null.
    * @param   item - T&
    * @param   signal - raft::signal
    */
   template< class T > void pop( T &item, raft::signal *signal = nullptr )

To pop a range of elements, the pop_range function removes n items from the input stream and places them in the std::vector that the user passes via a parameter. The user must pre-allocate the vector to the correct size, as in std::vector x( size ).

   /**
    * pop_range - pops n_items from the buffer into the 
    * std::vector referred to by items.  There are
    * two different ways this function could operate,
    * either with a push_back type semantic which would 
    * mean three copies or dealing with a pre-allocated
    * vector.  This function assumes that the user has
    * allocated a vector with the correct size (= n_items).
    * @param   items    - std::vector< std::pair< T, raft::signal > >& 
    * @param   n_items  - std::size_t
    */
   template< class T > void pop_range( pop_range_t< T >  &items, const std::size_t n_items )
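
Here is a sketch of the pre-sized vector convention. The names toy_signal, toy_pop_range_t, toy_queue and take_three are hypothetical; the pair layout follows the @param line above, but the signal values are invented for the example and the toy asserts where the real function would block.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Stand-in for raft::signal (the values are assumptions).
enum class toy_signal { none, eof };

// Stand-in for pop_range_t< T >: one (item, signal) pair per element.
template <class T>
using toy_pop_range_t = std::vector<std::pair<T, toy_signal>>;

template <class T>
class toy_queue {
public:
    void push(const T& item, toy_signal sig = toy_signal::none) {
        items_.emplace_back(item, sig);
    }
    // Overwrites items[0 .. n_items). The caller must already have
    // sized the vector; nothing is appended with push_back.
    void pop_range(toy_pop_range_t<T>& items, std::size_t n_items) {
        assert(items.size() >= n_items);
        assert(items_.size() >= n_items);
        for (std::size_t i = 0; i < n_items; ++i) {
            items[i] = items_.front();
            items_.pop_front();
        }
    }
    std::size_t size() const { return items_.size(); }
private:
    std::deque<std::pair<T, toy_signal>> items_;
};

// Queues four values and pops the first three in one call.
inline toy_pop_range_t<int> take_three() {
    toy_queue<int> q;
    q.push(10);
    q.push(20);
    q.push(30, toy_signal::eof);
    q.push(40);
    toy_pop_range_t<int> items(3);  // pre-sized, as in std::vector x( size )
    q.pop_range(items, 3);
    return items;
}
```

Writing into existing slots is what lets the function avoid the extra copies that a push_back style interface would need.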

When designing RaftLib, I knew I wanted a zero copy return as well. The only way to effectively do that is to provide a peek function. The only slight issue with a peek function, and a stream that automatically optimizes itself, is that the user must essentially "unpeek" the element before the queue can manage itself. The peek function call returns a reference to the head of the stream so that the user can operate on it.

   /**
    * peek - returns a reference to the head of the
    * queue.  unpeek() must be called after this to 
    * tell the runtime that the reference is no longer
    * being used. This function will block on an empty
    * queue, so a call to size() should be done prior 
    * to calling this function if the user wishes non-blocking
    * behavior. 
    * @param   signal - raft::signal, default: nullptr
    * @return T&
    */
   template< class T > T& peek( raft::signal *signal = nullptr );

There is also a peek_range function, which enables the user to access multiple items from the incoming stream while keeping them "valid" if they so choose. More information on the auto-release object it returns is available elsewhere in the wiki.

   /**
    * peek_range - analogous to peek, only the user gets
    * a list of items.  unpeek() must be called after
    * using this function to let the runtime know that
    * the user is done with the references.
    * @param   n - const std::size_t, number of items to peek
    * @return - std::vector< std::reference_wrapper< T > >
    */
   template< class T > autorelease_obj  peek_range( const std::size_t n );

The unpeek call can be called after any of the peek functions if the user wishes to keep the memory in a valid state. This can be quite useful for maintaining state within the kernel.

   /**
    * unpeek - call after peek to let the runtime know that 
    * all references to the returned value are no longer in
    * use. Keeps the memory location in a valid state for that
    * input port, i.e., it doesn't release the memory location
    * so that the next call to peek will return the same 
    * exact location. A call to recycle will release the memory,
    * or invalidate it.
    */
   virtual void unpeek();

If you're done with the data on the input stream, either because you've peeked at it and you're done or because you want to skip specific elements, the recycle function is the way to do it. The function is quite low overhead for fundamental types; for objects, it will call the requisite destructor (if the element wasn't pushed to the output queue) before releasing the memory. If you've made a push call to an output port using a reference from a peek call on the input port, you can still use this function safely. The memory will be returned to the input queue and transferred to the output queue.

   /** 
    * recycle - so you want to ignore some items from the
    * input stream without ever even looking at them?
    * This is the function for you. It is also used with the
    * peek call in order to invalidate or free memory to the
    * queue so that the next peek or pop operation will see a
    * different location. This function should always be called
    * just in case you don't use the item again. If you've pushed
    * items from a peek call, the recycle call will be ignored
    * so that the same memory allocation can be used down stream
    * (NOTE: this is only for object types, for primitive types or
    * items less than or equal to a single cache line in size then
    * the item is simply copied, as in by value to the down
    * stream queue). Bottom line, if you want to remove items and
    * ignore the item from the input queue in the most efficient way, 
    * use this function, it's totally safe.
    * @param   range - const std::size_t
    */
   void recycle( const std::size_t range = 1 )
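
The interplay of peek, unpeek and recycle can be sketched with a toy input queue. toy_input and peek_is_stable are hypothetical names, not RaftLib code; the toy is single-threaded, so unpeek has nothing to do here, while in the real run-time it is how the queue learns that the reference is no longer in use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy input queue (hypothetical, not RaftLib's FIFO).
template <class T>
class toy_input {
public:
    void push(const T& item) { items_.push_back(item); }

    // Reference to the head element; the element stays in the queue.
    T& peek() {
        assert(!items_.empty());
        return items_.front();
    }
    // No-op in this single-threaded toy.
    void unpeek() {}

    // Drops up to `range` elements from the head.
    void recycle(std::size_t range = 1) {
        for (std::size_t i = 0; i < range && !items_.empty(); ++i) {
            items_.pop_front();
        }
    }
    std::size_t size() const { return items_.size(); }
private:
    std::deque<T> items_;
};

// Peeks twice with an unpeek in between and reports whether both
// peeks referred to the same memory location.
inline bool peek_is_stable(toy_input<int>& in) {
    int* first = &in.peek();
    *first += 1;  // in-place update, e.g. state kept on the stream
    in.unpeek();
    int* second = &in.peek();
    in.unpeek();
    return first == second;
}
```

Only recycle moves the head forward, so a kernel can keep updating the same element across several peeks and release it when it is finally done.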

Checking capacity on ports

Sometimes it is necessary to see if a port would block when attempting to allocate or push. Several helper functions enable the programmer to check before calling blocking functions. These functions are:

Size

   /**
    * size - returns the current size of this FIFO
    * @return  std::size_t
    */
   virtual std::size_t size() = 0;

Space_Avail

   /**
    * space_avail - convenience function to get the current
    * space available in the FIFO, could otherwise be calculated
    * by taking the capacity() - size().
    * @return std::size_t
    */
   virtual std::size_t space_avail() = 0;

Capacity

   /**
    * capacity - returns the set maximum capacity of the 
    * FIFO.
    * @return std::size_t
    */
   virtual std::size_t capacity() = 0;
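
A short sketch of how these three helpers fit together to avoid a blocking call. toy_bounded and try_push are hypothetical names for illustration; the toy asserts where a real port would block, and the check-then-push pattern assumes a single writer per port, so that the available space cannot shrink between the check and the push.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy bounded queue (hypothetical, not RaftLib's FIFO).
class toy_bounded {
public:
    explicit toy_bounded(std::size_t cap) : cap_(cap) {}

    std::size_t size() const { return items_.size(); }
    std::size_t capacity() const { return cap_; }
    // Same relation as in the comment above: capacity() - size().
    std::size_t space_avail() const { return capacity() - size(); }

    void push(int value) {
        assert(space_avail() > 0);  // a real port would block here
        items_.push_back(value);
    }
private:
    std::size_t cap_;
    std::deque<int> items_;
};

// Pushes only when there is room and reports whether it did.
inline bool try_push(toy_bounded& q, int value) {
    if (q.space_avail() == 0) {
        return false;
    }
    q.push(value);
    return true;
}
```

The mirror-image check on the input side is size() > 0 before a pop or peek.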

Choosing between ports

Often the programmer wants to select amongst a set of incoming ports, popping data only from the ports that have data or from just the first port that has data. To facilitate this type of transaction, RaftLib implements a select statement.

/**
 * within a kernel, e.g., within the run function, the programmer
 * wishes to select amongst several input ports, the return value 
 * is a raft::select_t object (described in the text below) containing both the
 * count of available input data elements and the port on which those
 * elements can be found.
 */
auto ret_val = raft::select::in( input, "x_1", "x_2", "x_3" );

The raft::select_t type returned by raft::select::in contains a count (as the first element) and a std::reference_wrapper to the FIFO object that contains those elements. The raft::select::in function randomly returns a port that has data, which gives a relatively uniform return and (in most cases) keeps the programmer from having to track which port the data came from.

When accessing ret_val the programmer (as an example) could do the following:

/**
 * check ret_val.first for the length, if greater than zero, 
 * proceed.
 */
if( ret_val.first > 0 )
{
   type_t x;
   /** 
    * don't forget the "get()" that must be added on 
    * the second element of the raft::select_t object
    * since it is a reference_wrapper we must do so until
    * we get a dot overload, which hopefully will be soon
    * for this. Doesn't necessarily need a pop, any port
    * operation for input will do.
    */
   ret_val.second.get().pop( x );
   output[ "y_1" ].push( x ); 
}
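
To show the shape of the returned pair without depending on RaftLib, here is a toy select. It is only a guess at the behavior described above, not the library's implementation: toy_port, toy_select_t and toy_select are hypothetical names, and the choice to return a zero count paired with the first port when nothing is ready is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Toy input port (hypothetical, not RaftLib's FIFO).
class toy_port {
public:
    void push(int value) { items_.push_back(value); }
    void pop(int& out) {
        assert(!items_.empty());
        out = items_.front();
        items_.pop_front();
    }
    std::size_t size() const { return items_.size(); }
private:
    std::deque<int> items_;
};

// Count first, reference_wrapper to the port second.
using toy_select_t =
    std::pair<std::size_t, std::reference_wrapper<toy_port>>;

// Picks uniformly at random among the ports that currently hold data.
// If none do, returns a count of zero paired with the first port.
inline toy_select_t toy_select(
    std::vector<std::reference_wrapper<toy_port>>& ports,
    std::mt19937& rng) {
    assert(!ports.empty());
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < ports.size(); ++i) {
        if (ports[i].get().size() > 0) {
            ready.push_back(i);
        }
    }
    if (ready.empty()) {
        return toy_select_t(std::size_t{0}, ports.front());
    }
    std::uniform_int_distribution<std::size_t> pick(0, ready.size() - 1);
    toy_port& chosen = ports[ready[pick(rng)]].get();
    return toy_select_t(chosen.size(), std::ref(chosen));
}
```

As in the snippet above, the caller checks the first element and then goes through get() on the second element to reach the port.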