diff --git a/api/data-elements.md b/api/data-elements.md

---
uid: data-elements
title: Data Elements
---

Data elements are produced by Bonsai operators. These pages contain information
about data elements that can help you interpret and load the data produced by
OpenEphys.Onix1 operators.

In general, a data element comprises properties which together contain
timestamped data from a particular device. For example, the operator that
streams data from a BNO055 device outputs
[Bno055DataFrames](xref:OpenEphys.Onix1.Bno055DataFrame):

- The data produced by the BNO055 is contained in the Acceleration,
  Calibration, EulerAngle, Gravity, and Temperature properties of the
  Bno055DataFrame. Any of these properties can be individually selected and
  [visualized](xref:visualize-data) in Bonsai.
- The Clock property contains the precise hardware timestamp for the data in
  the properties described in the first bullet point, created using the global
  ONIX Controller clock. This Clock property can be used to synchronize BNO055
  data with data from every other device from which ONIX is acquiring, putting
  it all on the same timeline.
- The HubClock property contains the precise hardware timestamp created using
  the clock on the hardware that contains the device.

There are some exceptions to the pattern described above. For example:

- One data element is an object passed through the configuration chain for
  writing to and reading from the ONIX hardware.
- Another holds the parameters used to set the precise hardware output clock
  when the workflow starts.

These pages also describe the type of each property. This type information can
be used to calculate the rate of data produced by the devices enabled in your
experiment. For example, the operator that outputs the data from a single
Neuropixels 2.0 probe device produces a sequence of
[NeuropixelsV2eDataFrames](xref:OpenEphys.Onix1.NeuropixelsV2eDataFrame). Using
the fact that each sample comprises a Clock property (8 bytes), a HubClock
property (8 bytes), and an AmplifierData property (384 × 2 bytes), this
device's data rate is:

$$
\begin{equation}
  \frac{(2*384+8+8)\,bytes}{sample}*\frac{30,000\,samples}{s}*\frac{1\,MB}{10^6\,bytes} = 23.52\,MB/s
  \label{eq:1x_npx2_bw}
\end{equation}
$$
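For a quick numeric check of the figure above, the same arithmetic can be written
out as a short Python sketch. The byte counts come from the property sizes listed
above; nothing here is specific to the OpenEphys.Onix1 API.

```python
# Back-of-the-envelope check of the Neuropixels 2.0 data rate worked out above.
bytes_per_sample = 2 * 384 + 8 + 8   # AmplifierData + Clock + HubClock, in bytes
samples_per_second = 30_000          # Neuropixels 2.0 sample rate

rate_mb_per_s = bytes_per_sample * samples_per_second / 1e6
print(f"{rate_mb_per_s:.2f} MB/s")   # 23.52 MB/s
```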
NeuropixelsV2eDataFrame is actually a buffered data frame (as indicated by the
presence of NeuropixelsV2eData's BufferSize property), meaning that several data
samples and their timestamps are buffered into a single NeuropixelsV2eDataFrame.
The calculation above assumes that NeuropixelsV2eData's BufferSize property is
set to 1. Although the calculation is slightly different when BufferSize is
greater than 1, the end result is the same. When BufferSize is greater than 1,
NeuropixelsV2eDataFrames are produced at a rate of 30 kHz divided by the value
of BufferSize. Each NeuropixelsV2eDataFrame comprises:

- a Clock property: an array of ulong (each 8 bytes) of length N
- a HubClock property: an array of ulong (each 8 bytes) of length N
- an AmplifierData property: a matrix of ushort (each 2 bytes) of size 384 × N

where N is a stand-in for BufferSize. Therefore, the calculation becomes:

$$
\begin{equation}
  \frac{(2*384+8+8)*N\,bytes}{frame}*\frac{30,000/N\,frames}{s}*\frac{1\,MB}{10^6\,bytes} = 23.52\,MB/s
  \label{eq:1x_npx2_bw_buffersize}
\end{equation}
$$

N cancels out and the result is the same.

Knowing the type of each property is also helpful in two more ways:

- A property's type indicates how the property can be used in Bonsai. Operators
  typically accept only a specific type or set of types as inputs. When types
  don't match, Bonsai indicates an error.
- If a property is saved as a raw binary file, knowing its type informs how to
  load the data. For example, the
  [dtypes](https://numpy.org/doc/stable/reference/arrays.dtypes.html) in our
  [example Breakout Board data-loading script](xref:breakout_load-data) were
  selected according to the size of each data element being saved: digital
  input clock samples are saved using 8 bytes, which requires `dt=np.uint64`
  when loading, and digital input pin samples are saved using a single byte,
  which requires `dt=np.uint8` when loading.
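As an illustration of how those sizes translate into loading code, here is a
minimal NumPy sketch. The file names are hypothetical placeholders, not paths
produced by any particular workflow; substitute your own recordings and consult
the example script linked above for the full procedure.

```python
import numpy as np

# Hypothetical file names -- substitute the raw binary files written by your own workflow.
clock = np.fromfile("digital-input_clock.raw", dtype=np.uint64)  # 8 bytes per sample
pins = np.fromfile("digital-input_pins.raw", dtype=np.uint8)     # 1 byte per sample

print(clock.shape, pins.shape)
```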
diff --git a/articles/tutorials/tune-readsize.md b/articles/tutorials/tune-readsize.md

---
uid: tune-readsize
title: Optimizing Closed Loop Performance
---

This tutorial shows how to retrieve data from the ONIX hardware as quickly as
possible for experiments with strict low-latency closed-loop requirements by
tuning the workflow for your particular data sources and computer
specifications. In most situations, sub-200 microsecond closed-loop response
times can be achieved.

> [!NOTE]
> Performance will vary based on your computer's capabilities and your results
> might differ from those presented below. The computer used to create this
> tutorial has the following specs:
>
> - CPU: Intel i9-12900K
> - RAM: 64 GB
> - GPU: NVIDIA GTX 1070 8GB
> - OS: Windows 11

## Data Transmission from ONIX Hardware to Host Computer

ONIX is capable of transferring data directly from production to the host
computer. However, if the host is busy when ONIX starts producing data, ONIX
temporarily stores the new data in its hardware buffer while it waits for the
host to be ready to accept it.

Key details about this process:

- The size of hardware-to-host data transfers is determined by the `ReadSize`
  property of the `StartAcquisition` operator, which is in every Bonsai
  workflow that uses OpenEphys.Onix1 to acquire data from ONIX.
- Increasing `ReadSize` allows the host to read larger chunks of data from
  ONIX per read operation without significantly increasing the duration of the
  read operation, thereby increasing the maximum rate at which data can be
  read.
- If the host is busy or cannot perform read operations rapidly enough to keep
  up with the rate at which ONIX produces data, the ONIX hardware buffer will
  start to accumulate excessive data.
- Accumulation of excess data in the hardware buffer collapses real-time
  performance and risks a hardware buffer overflow, which would prematurely
  terminate the acquisition session. `ReadSize` can be increased to avoid this
  situation.
- As long as this situation is avoided, decreasing `ReadSize` means that ONIX
  doesn't need to produce as much data before the host can access it. This, in
  effect, means software can start operating on data closer to the time that
  the data was produced, thus achieving lower-latency feedback loops.

In other words, a small `ReadSize` helps the host access data sooner after that
data was created. However, each data transfer incurs overhead. If `ReadSize` is
so small that ONIX produces `ReadSize` bytes of data in less time than it
takes, on average, for the host computer to perform a read operation, the
hardware buffer will accumulate excessive data. This will destroy real-time
performance and eventually cause the hardware buffer to overflow, terminating
acquisition. The goal of this tutorial is to tune StartAcquisition's `ReadSize`
so that data flows from production to the software running on the host as
quickly as possible by minimizing the amount of time that it sits idle in both
the ONIX hardware buffer and the API-allocated buffer. This provides software
access to the data as close to when the data was produced as possible, which
helps achieve lower-latency closed-loop feedback.
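The tradeoff described above can be made concrete with a little arithmetic. The
sketch below is illustrative only: the per-read overhead is an assumed figure,
not a measurement, and the ~47 MB/s rate is the two-probe figure used later in
this tutorial. The real numbers depend on your devices, driver, CPU, and
operating-system load.

```python
# Rough feasibility check for candidate ReadSize values. A value is only
# sustainable if ONIX takes longer to produce ReadSize bytes than the host
# takes, on average, to complete one read operation.
data_rate = 47e6           # bytes/s produced by the enabled devices (assumption)
read_overhead_s = 30e-6    # assumed average cost of one read operation (illustrative)

for read_size in (1024, 2048, 16384):
    time_to_produce = read_size / data_rate
    sustainable = time_to_produce > read_overhead_s
    print(f"ReadSize={read_size:>6}: one block is produced every "
          f"{time_to_produce * 1e6:6.1f} us -> "
          f"{'sustainable' if sustainable else 'hardware buffer will accumulate'}")
```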
### Technical Details

> [!NOTE]
> This section explains in more depth how data is transferred from ONIX to the
> host computer. Although these details provide additional context about ONIX,
> they are more technical and are not required for following the rest of the
> tutorial.

When the host computer reads data from the ONIX hardware, it retrieves a
**ReadSize**-byte chunk of data using the following procedure:

1. A `ReadSize`-byte block of memory is allocated in the host computer's RAM by
   the host API for the purpose of holding incoming data from ONIX.
1. A pointer to that memory is provided to the
   [RIFFA](https://open-ephys.github.io/ONI/v1.0/api/liboni/driver-translators/riffa.html)
   driver (the PCIe backend/kernel driver for the ONIX system), which moves the
   allocated memory block into a more privileged state known as kernel mode so
   that it can initiate a
   [DMA transfer](https://en.wikipedia.org/wiki/Direct_memory_access). DMA
   allows the data transfer to be performed by ONIX hardware without additional
   CPU intervention.
1. The data transfer completes once this block of memory has been populated
   with `ReadSize` bytes of data from ONIX.
1. The RIFFA driver moves the memory block from kernel mode back to user mode
   so that it can be accessed by software. The API function returns with a
   pointer to the filled buffer.

During this process, memory is allocated only once by the API, and the transfer
is [zero-copy](https://en.wikipedia.org/wiki/Zero-copy). The API-allocated
buffer is written autonomously by ONIX hardware using minimal resources from
the host computer.
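The allocate-once, zero-copy idea is easiest to see in a small, generic example.
The sketch below is not the ONIX or liboni API; it only illustrates, in
Python/NumPy, how a single preallocated buffer can be filled in place and
reinterpreted without any per-read allocation or copying in user code.

```python
import numpy as np

READ_SIZE = 2048
buffer = bytearray(READ_SIZE)                   # allocated once, reused for every read
view = np.frombuffer(buffer, dtype=np.uint16)   # zero-copy view onto the same memory


def fake_dma_fill(buf):
    # Stand-in for the DMA engine: writes new data into the existing buffer in place.
    buf[:] = (np.arange(len(buf)) % 256).astype(np.uint8).tobytes()


fake_dma_fill(buffer)
print(view[:4])   # the view reflects the new data without any copy being made
```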
So far, all of this occurs on the host side. Meanwhile, on the ONIX side:

- If ONIX produces new data before the host is able to consume the data in the
  API-allocated buffer, this new data is added to the back of the ONIX hardware
  buffer FIFO. The ONIX hardware buffer consists of 2 GB of RAM that belongs to
  the acquisition hardware (it is _not_ RAM in the host computer) dedicated to
  temporarily storing data that is waiting to be transferred to the host. Data
  is removed from the front of the hardware buffer and transferred to the host
  once it is ready to accept more data.
- If the memory is allocated on the host side and the data transfer is
  initiated by the host API before any data is produced, ONIX transfers new
  data directly to the host, bypassing the hardware buffer. In this case, ONIX
  is literally streaming data to the host _the moment it is produced_. This
  data becomes available for reading by the host once ONIX has transferred the
  full `ReadSize` bytes.

## Tuning `ReadSize` to Optimize Closed Loop Performance

ONIX provides a mechanism for tuning the value of `ReadSize` to optimize closed
loop performance that takes into account the idiosyncrasies of your host
computer and experimental acquisition setup.

> [!NOTE]
> Set up your Bonsai environment and familiarize yourself with using the
> library to acquire data from ONIX before proceeding.

Copy the following workflow into the Bonsai workflow editor by hovering over
the workflow image and clicking on the clipboard icon that appears. Open Bonsai
and paste this workflow by clicking the Bonsai workflow editor pane and
pressing Ctrl+V.

::: workflow
:::

### Hardware Configuration

The top-row configuration chain includes a `ConfigureLoadTester` operator. This
configures ONIX's Load Tester Device, which produces and consumes data at
user-specified rates for testing and tuning the latency between data production
and real-time feedback. This device is _not an emulator_. It is a real hardware
device that produces and consumes data using the selected driver and physical
link (e.g. PCIe bus) and thus provides accurate measurements of feedback
performance for a given host computer.

To understand how we came up with the bandwidth calculation for two Neuropixels
2.0 probes, visit the [Data Elements](xref:data-elements) page.

We'll set up `ConfigureLoadTester` to produce data at the same frequency and
bandwidth as two Neuropixels 2.0 probes with the following settings:

*screenshot of ConfigureLoadTester's property editor*

- `DeviceAddress` is set to 11 because that is how this device is indexed in
  the ONIX system.
- `DeviceName` is set to "Load Tester".
- `Enable` is set to True to enable the Load Tester device.
- `FramesPerSecond` is set to 60,000 Hz, the rate at which frames are produced
  by two probes, since each probe is acquired independently.
- `ReceivedWords` is set to 392, the size of a single frame from one probe
  including its clock members (together with `FramesPerSecond`, this reproduces
  the two-probe bandwidth; see the sketch after this list).
- `TransmittedWords` is set to 100. This simulates the amount of data required
  to, e.g., send a stimulus waveform.
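A quick check that these settings do reproduce the two-probe bandwidth. This
assumes each received word is 2 bytes, so that 392 words correspond to the
784-byte frame worked out on the Data Elements page; adjust the numbers if your
device's frame layout differs.

```python
# Sanity check: do the Load Tester settings above match two Neuropixels 2.0 probes?
frames_per_second = 60_000            # FramesPerSecond: two probes, each sampled at 30 kHz
bytes_per_frame = 2 * 384 + 8 + 8     # 784 bytes; equals ReceivedWords = 392 if words are 2 bytes

rate_mb_per_s = frames_per_second * bytes_per_frame / 1e6
print(f"{rate_mb_per_s:.2f} MB/s")    # ~47 MB/s, the figure used in the rest of this tutorial
```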
> [!NOTE]
> The `DeviceAddress` must be manually configured because the Load Tester
> device is used for diagnostics and testing and therefore is not made
> available through the top-level configuration operator like the rest of the
> local devices (analog IO, digital IO, etc.). The device address can be found
> using
> [oni-repl](https://open-ephys.github.io/onix-docs/Software%20Guide/oni-repl/usage.html#repl-commands).

Next, we configure StartAcquisition's `WriteSize` and `ReadSize` properties.

`WriteSize` is set to 16384 bytes. This defines a readily-available pool of
memory for the creation of output data frames. Data is written to hardware as
soon as an output frame has been created, so the effect on real-time
performance is typically not as large as that of the `ReadSize` property.

To start, `ReadSize` is also set to 16384. Later in this tutorial, we'll
examine the effect of this value on real-time performance.

### Real-time Loop

The bottom half of the workflow is used to stream data back to the load testing
device from hardware so that it can perform a measurement of round-trip
latency. The load tester data operator acquires a sequence of
`LoadTesterDataFrame`s from hardware, each of which is split into its members.
The `HubClock` member indicates the acquisition clock count when the
`LoadTesterDataFrame` was produced. The `EveryNth` operator is a condition
operator which only allows through every Nth element in the observable
sequence. This is used to simulate an algorithm, such as spike detection, that
only triggers closed-loop feedback in response to input data meeting some
condition. The value of `N` can be changed to simulate different feedback
frequencies. You can inspect its logic by double-clicking the node when the
workflow is not running. In this case, `N` is set to 100, so every 100th sample
is delivered to `LoadTesterLoopback`.
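For readers less familiar with this decimation pattern, here is a rough Python
analogue of what `EveryNth` does to the incoming stream. In the workflow itself
this is a Bonsai condition operator, not Python; the sketch only shows the
behavior being simulated.

```python
# Pass through every Nth element of a stream, standing in for a detector that
# only occasionally triggers closed-loop feedback.
def every_nth(stream, n=100):
    for i, item in enumerate(stream):
        if i % n == 0:
            yield item


# e.g. 1000 incoming frames -> 10 feedback triggers
print(sum(1 for _ in every_nth(range(1000), n=100)))   # 10
```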
`LoadTesterLoopback` is a _sink_ which writes the HubClock values it receives
back to the load tester device. When the load tester device receives a HubClock
from the host computer, it is subtracted from the current acquisition clock
count. That difference is sent back to the host computer as the `HubClockDelta`
property of subsequent `LoadTesterDataFrames`. In other words, `HubClockDelta`
indicates the amount of time that has passed between the creation of a frame in
hardware and the receipt in hardware of a feedback signal based on that frame:
it is a complete measurement of closed-loop latency. This value is converted to
milliseconds and then used to help visualize the distribution of closed-loop
latencies.

Finally, at the bottom of the workflow, a branch monitors how much of the
hardware buffer is in use. For more information, visit the
[Memory Monitor](xref:breakout_memory-monitor) page.

::: workflow
![SVG of load tester workflow memorymonitor branch](../../workflows/tutorials/tune-readsize/memory-monitor.bonsai)
:::

### Relevant Visualizers

The desired outputs of this workflow are the [visualizers](xref:visualize-data)
for the Histogram1D and PercentUsed nodes. Below is an example of each, which
we will explore more in the next section:

![screenshot of Histogram1D visualizers with `ReadSize` 16384](../../images/tutorials/tune-readsize/histogram1d_16384.webp)
![screenshot of PercentUsed visualizers with `ReadSize` 16384](../../images/tutorials/tune-readsize/percent-used_16384.webp)

The Histogram1D visualizer shows the distribution of closed-loop feedback
latencies. The x-axis is in units of μs, and the y-axis represents the number
of samples in a particular bin. The histogram is configured to have 1000 bins
between 0 and 1000 μs. For low-latency closed-loop experiments, the goal is to
concentrate the distribution of closed-loop feedback latencies as close to 0 μs
as possible.

The PercentUsed visualizer shows a time series of the amount of the hardware
buffer that is occupied by data, as a percentage of the hardware buffer's total
capacity. The x-axis shows timestamps, and the y-axis is a percentage. To
ensure data is available as soon as possible after it is produced, and to avoid
a potential buffer overflow, the goal is to keep this percentage at or near
zero.

### Real-time Latency for Different `ReadSize` Values

#### `ReadSize` = 16384 bytes

With `ReadSize` set to 16384 bytes, start the workflow and open the visualizers
for the PercentUsed and Histogram1D nodes:

![screenshot of Histogram1D visualizers with `ReadSize` 16384](../../images/tutorials/tune-readsize/histogram1d_16384.webp)
![screenshot of PercentUsed visualizers with `ReadSize` 16384](../../images/tutorials/tune-readsize/percent-used_16384.webp)

The Histogram1D visualizer shows that the average latency is about 300 μs, with
most latencies ranging from ~60 μs to ~400 μs. This roughly matches our
expectations. Since data is produced at about 47 MB/s, it takes about 340 μs to
produce 16384 bytes of data. This means that the data contained in a single
`ReadSize` block was generated over a span of approximately 340 μs. Because we
are using every 100th sample to generate feedback, the sample that actually
triggers LoadTesterLoopback could be any sample from that 340 μs span,
resulting in a range of latencies. The long tail of the distribution
corresponds to instances when the hardware buffer was used or the CPU was busy
with other tasks.
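The figure in the previous paragraph comes straight from the read size and the
data rate; here it is as a one-line check:

```python
# How much wall-clock time one ReadSize block of data represents at ~47 MB/s.
data_rate = 47e6     # bytes per second produced by the simulated probes
read_size = 16384    # bytes per read

print(f"one block spans ~{read_size / data_rate * 1e6:.0f} us of data")   # ~349 us
```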
The PercentUsed visualizer shows that the percentage of the hardware buffer
being used remains close to zero. This indicates minimal usage of the hardware
buffer, and that the host is safely reading data faster than ONIX produces it.
For experiments without hard real-time constraints, this latency is perfectly
acceptable.

For experiments with harder real-time constraints, let's see how much lower we
can get the closed-loop latency.

#### `ReadSize` = 2048 bytes

Set `ReadSize` to 2048 bytes, restart the workflow (`ReadSize` is a
[configuration](xref:OpenEphys.Onix1#configuration) property, so it only
updates when a workflow starts), and open the same visualizers:

![screenshot of Histogram1D visualizers with `ReadSize` 2048](../../images/tutorials/tune-readsize/histogram1d_2048.webp)
![screenshot of PercentUsed visualizers with `ReadSize` 2048](../../images/tutorials/tune-readsize/percent-used_2048.webp)

The Histogram1D visualizer shows that closed-loop latencies now average about
80 μs, with lower variability.

The PercentUsed visualizer shows that the hardware buffer is still stable at
around zero. This means that, even with the increased overhead associated with
a smaller `ReadSize`, the host is reading data rapidly enough to prevent
excessive accumulation in the hardware buffer. Let's see if we can decrease
latency even further.
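The "increased overhead" mentioned above is simply that a smaller `ReadSize`
requires many more read operations per second to move the same amount of data.
The sketch below quantifies the difference between the two values tested so
far, again using the ~47 MB/s figure from this setup:

```python
# Read-operation rate and per-block data span for the two ReadSize values tested so far.
data_rate = 47e6   # bytes per second

for read_size in (16384, 2048):
    reads_per_s = data_rate / read_size
    span_us = read_size / data_rate * 1e6
    print(f"ReadSize={read_size:>6}: ~{reads_per_s:,.0f} reads/s, "
          f"each block spanning ~{span_us:.0f} us of data")
```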
#### `ReadSize` = 1024 bytes

Set `ReadSize` to 1024 bytes, restart the workflow, and open the same
visualizers.

The Histogram1D visualizer appears to be empty. This is because the latency
immediately exceeds the x-axis upper limit of 1 ms. You can see this by
inspecting the visualizer for the node prior to Histogram1D. Because of the
very small read size (which is on the order of a single Neuropixels 2.0
sample), the computer cannot perform read operations at the rate required to
keep up with data production. This causes excessive accumulation of data in the
hardware buffer. In this case, when new data is produced, it is added to the
end of the hardware buffer queue, requiring several read operations before this
new data can be read. As more data accumulates in the buffer, the time between
when data is produced and when it can finally be read increases. In other
words, latencies increase dramatically, and closed-loop performance collapses.

The PercentUsed visualizer shows that the percentage of the hardware buffer
that is occupied is steadily increasing. The acquisition session will
eventually terminate in an error when the MemoryMonitor PercentUsed reaches
100% and the hardware buffer overflows.
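How long the session survives in this regime depends on how quickly the backlog
grows. The sketch below is purely illustrative: the sustained read rate is an
assumed number, not a measurement from the setup above, and in practice latency
collapses long before the buffer actually overflows.

```python
# Rough estimate of time-to-overflow once the host falls behind.
buffer_bytes = 2e9             # ONIX hardware buffer: 2 GB
data_rate = 47e6               # bytes/s produced
sustained_read_rate = 34e6     # assumed bytes/s the host manages with a too-small ReadSize

net_fill_rate = data_rate - sustained_read_rate
print(f"hardware buffer overflows after ~{buffer_bytes / net_fill_rate:.0f} seconds")
```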
#### Summary

The results of our experimentation are as follows:

| `ReadSize`  | Latency        | Buffer Usage   | Notes                                                                                             |
| ----------- | -------------- | -------------- | ------------------------------------------------------------------------------------------------- |
| 16384 bytes | ~300 μs        | Stable at 0%   | Perfectly adequate if there are no strict low-latency requirements; lowest risk of buffer overflow |
| 2048 bytes  | ~80 μs         | Stable near 0% | Balances latency requirements with low risk of buffer overflow                                     |
| 1024 bytes  | Rises steadily | Unstable       | Certain buffer overflow and terrible closed-loop performance                                        |

The same procedure was repeated with `ConfigureLoadTester` set to produce data
at the rate of a single 64-channel Intan chip, ~4.3 MB/s:

![screenshot of ConfigureLoadTester's property editor for a single Intan chip](../../images/tutorials/tune-readsize/load-tester-configuration_properties-editor_64ch.webp)

| `ReadSize` | Latency | Buffer Usage | Notes                                                                                |
| ---------- | ------- | ------------ | ------------------------------------------------------------------------------------ |
| 1024 bytes | ~200 μs | Stable at 0% | Perfectly adequate if there are no strict low-latency requirements                    |
| 512 bytes  | ~110 μs | Stable at 0% | Lower latency, no risk of buffer overflow                                              |
| 256 bytes  | ~80 μs  | Stable at 0% | Lowest achievable latency with this setup, still no risk of buffer overflow            |
| 128 bytes  | -       | -            | Results in an error -- 128 bytes is too small for the current hardware configuration  |

Regarding the last row of the above table, the lowest `ReadSize` possible is
determined by the size of the largest data frame produced by the enabled
devices. With the smallest working `ReadSize`, the PercentUsed visualizer shows
that the hardware buffer does not accumulate data:

![](../../images/tutorials/tune-readsize/percent-used_256_lower-payload.png)

> [!TIP]
>
> - The only constraint on `ReadSize` is the lower limit demonstrated in the
>   example of tuning `ReadSize` for a single 64-channel Intan chip. We only
>   tested `ReadSize` values that are powers of 2, but `ReadSize` can be
>   fine-tuned further to achieve even tighter latencies if necessary.
> - **As of OpenEphys.Onix1 0.7.0**: Although `ReadSize` can be set to any
>   value by the user (besides the constraint described in the previous bullet
>   point), the ONIX1 Bonsai package rounds `ReadSize` to the nearest multiple
>   of four and uses that value instead. For example, if you try to set
>   `ReadSize` to 887, the software will use the value 888 instead.

These two tables together demonstrate why it is impossible to recommend a
single correct value for `ReadSize` that is adequate for all experiments. The
diversity of experiments (in particular, the wide range of rates at which they
produce data) requires a range of `ReadSize` values.
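As a starting point for your own rig, the bounds discussed above can be rolled
into a small helper. Everything here is a sketch built on the numbers from this
tutorial: the largest-frame lower bound comes from the paragraph above, the
round-to-a-multiple-of-four step reflects the OpenEphys.Onix1 0.7.0 note in the
tip, and the 784-byte frame and 47 MB/s rate are the two-probe figures used
earlier. Substitute the values for your own devices and verify with the
Histogram1D and PercentUsed visualizers.

```python
# Sketch of a ReadSize "first guess" for a given setup, under the assumptions above.
def candidate_read_sizes(largest_frame_bytes, data_rate_bytes_per_s, candidates):
    for c in candidates:
        c = max(c, largest_frame_bytes)    # ReadSize cannot be smaller than the largest frame
        c = 4 * round(c / 4)               # the package rounds to the nearest multiple of four
        yield c, c / data_rate_bytes_per_s * 1e6


for size, span_us in candidate_read_sizes(784, 47e6, (512, 1024, 2048, 4096)):
    print(f"ReadSize={size:>5} -> one block spans ~{span_us:.0f} us of data")
```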
Finally, in this tutorial the workflow itself imposed minimal computational
load. In most applications, some processing is performed on the incoming data
to generate the feedback signal. It is important to take this into account when
tuning your system, potentially by modifying the workflow to perform
representative computations on incoming data, so that the effect of
computational demand on closed-loop performance is captured.