diff --git a/api/data-elements.md b/api/data-elements.md
index aa64334..110fd5b 100644
--- a/api/data-elements.md
+++ b/api/data-elements.md
@@ -3,4 +3,95 @@ uid: data-elements
title: Data Elements
---
-Data elements are produced by Bonsai operators.
\ No newline at end of file
+Data elements are produced by Bonsai operators. These
+pages contain information about data elements that can help interpret and load
+data produced by .
+
+In general, a data element comprises properties which together contain
+timestamped data from a particular device. For example,
+ outputs
+[Bno055DataFrames](xref:OpenEphys.Onix1.Bno055DataFrame) which contain data
+produced by a BNO055 device:
+- The data produced by the BNO055 is contained in the Acceleration,
+ Calibration, EulerAngle, Gravity, and Temperature properties of the
+ Bno055DataFrame. Any of these properties can be individually selected and
+ [visualized](xref:visualize-data) in Bonsai.
+- The property contains the precise
+  hardware timestamp, created using the global ONIX Controller clock, for the
+  data in the properties described in the first bullet point. This Clock
+  property can be used to sync BNO055 data with data from all other devices
+  that ONIX is acquiring from and put it all onto the same timeline.
+- The property contains the precise
+ hardware timestamp created using the clock on the hardware that contains
+ the device.
+
+There are some exceptions to the pattern described above. For example:
+- is an object passed through the
+ configuration chain for writing to and reading from the ONIX hardware.
+- outputs the parameters used to
+ set the precise hardware output clock when the workflow starts.
+
+These pages also describe the type of each property. This type information can
+be used to calculate the rate of data produced by the devices enabled in your
+experiment. For example, the operator
+(which outputs the data from a single Neuropixels 2.0 probe device) produces a
+sequence of
+[NeuropixelsV2eDataFrames](xref:OpenEphys.Onix1.NeuropixelsV2eDataFrame). Using
+the fact that each sample comprises a Clock property (8 bytes), a HubClock
+property (8 bytes), and an AmplifierData property (384*2 bytes), this device's
+data rate is:
+
+$$
+\begin{equation}
+  \frac{(2*384+8+8)\,bytes}{sample}*\frac{30{,}000\,samples}{s}*\frac{1\,MB}{10^6\,bytes} = 23.52\,MB/s
+ \label{eq:1x_npx2_bw}
+\end{equation}
+$$
+
+NeuropixelsV2eDataFrame is actually a buffered data frame (as indicated by the
+presence of NeuropixelsV2eData's BufferSize property), meaning that several data
+samples and their timestamps are buffered into a single NeuropixelsV2eDataFrame.
+The calculation above assumes that NeuropixelsV2eData's BufferSize property is
+set to 1. Although the calculation is slightly different when BufferSize is
+greater than 1, the end result is the same. When BufferSize is greater than 1,
+NeuropixelsV2eDataFrames are produced at a rate of 30 kHz divided by the value
+of BufferSize. Each NeuropixelsV2eDataFrame comprises:
+
+- a Clock property: an array of ulong (each 8 bytes) of length N
+- a HubClock property: an array of ulong (each 8 bytes) of length N
+- an AmplifierData property: a of ushort (each 2 bytes)
+ of size 384 x N
+
+where N is a stand-in for BufferSize. Therefore, the calculation becomes:
+
+$$
+\begin{equation}
+  \frac{(2*384+8+8)*N\,bytes}{frame}*\frac{30{,}000/N\,frames}{s}*\frac{1\,MB}{10^6\,bytes} = 23.52\,MB/s
+ \label{eq:1x_npx2_bw_buffersize}
+\end{equation}
+$$
+
+N cancels out and the result is the same.
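+
+As a worked check of the reasoning above, the short Python sketch below
+computes the data rate of a single Neuropixels 2.0 probe device for an
+arbitrary BufferSize and shows that N cancels out. The function name is ours
+for illustration; it is not part of any OpenEphys.Onix1 API.
+
+```python
+def npx2_data_rate_mb_per_s(buffer_size: int = 1) -> float:
+    """Data rate of a single Neuropixels 2.0 probe device in MB/s."""
+    bytes_per_frame = (2 * 384 + 8 + 8) * buffer_size  # AmplifierData + Clock + HubClock
+    frames_per_second = 30_000 / buffer_size            # one frame per BufferSize samples
+    return bytes_per_frame * frames_per_second / 1e6
+
+print(npx2_data_rate_mb_per_s(1))    # 23.52
+print(npx2_data_rate_mb_per_s(100))  # 23.52
+```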
+
+Knowing the type of each property can also be helpful in two more ways:
+
+- A property's type indicates how a property can be used in Bonsai. Operators
+ typically accept only a specific type or set of types as inputs. When types
+ don't match, Bonsai indicates an error.
+- If a property is saved using a (i.e. as a raw
+  binary file), knowing its type informs how to load the data. For example,
+  the [dtypes](https://numpy.org/doc/stable/reference/arrays.dtypes.html) in
+  our [example Breakout Board data-loading script](xref:breakout_load-data)
+  were selected according to the size of each value being saved: digital input
+  clock samples are saved using 8 bytes, which requires `dt=np.uint64` when
+  loading, and digital input pin samples are saved using a single byte, which
+  requires `dt=np.uint8` when loading (see the sketch below).
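+
+A minimal sketch of this loading pattern, assuming raw binary files written by
+such a workflow (the file names below are placeholders, not the names used in
+the example script):
+
+```python
+import numpy as np
+
+# 8 bytes per clock sample -> unsigned 64-bit integers
+clock = np.fromfile("digital-input-clock.raw", dtype=np.uint64)
+
+# 1 byte per pin-state sample -> unsigned 8-bit integers
+pins = np.fromfile("digital-input-pins.raw", dtype=np.uint8)
+```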
+
+
diff --git a/articles/getting-started/onix-configuration.md b/articles/getting-started/onix-configuration.md
index fbf61ba..84c02ad 100644
--- a/articles/getting-started/onix-configuration.md
+++ b/articles/getting-started/onix-configuration.md
@@ -54,7 +54,8 @@ The data acquisition process is started when ContextTask passes through
. StartAcquisition allows the user to set parameters that are
related to data acquisition such as ReadSize and WriteSize. Setting the ReadSize property for a
particular workflow is a balancing act of minimizing latency of data data transfers from the ONIX
-system and avoiding data accumulation in the ONIX system's hardware buffer.
+system and avoiding data accumulation in the ONIX system's hardware buffer. To learn about the
+process of tuning ReadSize, check out the tutorial.
::: workflow

diff --git a/articles/tutorials/toc.yml b/articles/tutorials/toc.yml
index f91f9f1..977e714 100644
--- a/articles/tutorials/toc.yml
+++ b/articles/tutorials/toc.yml
@@ -2,3 +2,4 @@
items:
- href: ephys-processing-listening.md
- href: ephys-socket.md
+ - href: tune-readsize.md
diff --git a/articles/tutorials/tune-readsize.md b/articles/tutorials/tune-readsize.md
new file mode 100644
index 0000000..15a0969
--- /dev/null
+++ b/articles/tutorials/tune-readsize.md
@@ -0,0 +1,437 @@
+---
+uid: tune-readsize
+title: Optimizing Closed Loop Performance
+---
+
+This tutorial shows how to retrieve data from the ONIX hardware as quickly as
+possible for experiments with strict low-latency closed-loop requirements by
+tuning the workflow for your particular data sources and computer
+specifications. In most situations, sub-200 microsecond closed-loop response
+times can be achieved.
+
+> [!NOTE]
+> Performance will vary based on your computer's capabilities and your results
+> might differ from those presented below. The computer used to create this
+> tutorial has the following specs:
+>
+> - CPU: Intel i9-12900K
+> - RAM: 64 GB
+> - GPU: NVIDIA GTX 1070 8GB
+> - OS: Windows 11
+
+## Data Transmission from ONIX Hardware to Host Computer
+
+ONIX is capable of transferring data directly from production to the
+host computer. However, if the host is busy when ONIX starts
+producing data, ONIX will temporarily store this new data in its hardware buffer
+while it waits for the host to be ready to accept new data.
+
+Key details about this process:
+
+- The size of hardware-to-host data transfers is determined by the
+ property of the
+ operator which is in every Bonsai
+ workflow that uses to acquire data from ONIX.
+- Increasing `ReadSize` allows the host to read larger chunks of data from
+ ONIX per read operation without significantly increasing the duration of the
+  read operation, thereby increasing the maximum rate at which data can be
+ read.
+- If the host is busy or cannot perform read operations rapidly enough to keep
+ up with the rate at which ONIX produces data, the ONIX hardware buffer will
+ start to accumulate excessive data.
+- Accumulation of excess data in the hardware buffer collapses real-time
+  performance and risks hardware buffer overflow, which would prematurely
+ terminate the acquisition session. `ReadSize` can be increased to avoid this
+ situation.
+- As long as this situation is avoided, decreasing `ReadSize` means that ONIX
+ doesn't need to produce as much data before the host can access it. This,
+ in effect, means software can start operating on data closer to the time
+  that the data was produced, thus achieving lower-latency feedback loops.
+
+In other words, a small `ReadSize` can help the host access data closer to the
+time that data was created. However, each data transfer incurs overhead. If
+`ReadSize` is so small that ONIX produces `ReadSize` bytes of data in less time
+than it takes the host computer, on average, to perform a read operation,
+the hardware buffer will accumulate excessive data. This will destroy real-time
+performance and eventually cause the hardware buffer to overflow, terminating
+acquisition. The goal of this tutorial is to tune StartAcquisition's `ReadSize`
+so that data flows from production to the software running on the host as
+quickly as possible by minimizing the amount of time that it sits idly in both
+the ONIX hardware buffer and the host computer's buffer. This provides software
+access to the data as close to when the data was produced as possible, which
+helps achieve lower-latency closed-loop feedback.
+
+### Technical Details
+
+> [!NOTE]
+> This section explains more in-depth how data is transferred from ONIX to the
+> host computer. Although these details provide additional context about ONIX,
+> they are more technical and are not required for following the rest of the
+> tutorial.
+
+When the host computer reads data from the ONIX
+hardware, it retrieves a `ReadSize`-byte chunk of data using the
+following procedure:
+
+1. A `ReadSize`-byte block of memory is allocated in the host computer's
+   RAM by the host API to hold incoming data from ONIX.
+1. A pointer to that memory is provided to the
+ [RIFFA](https://open-ephys.github.io/ONI/v1.0/api/liboni/driver-translators/riffa.html)
+ driver (the PCIe backend/kernel driver for the ONIX system) which moves the
+ allocated memory block into a more privileged state known as kernel mode so
+ that it can initiate a [DMA
+ transfer](https://en.wikipedia.org/wiki/Direct_memory_access). DMA allows
+ data transfer to be performed by ONIX hardware without additional CPU
+ intervention.
+1. The data transfer completes once this block of data has been populated with
+ `ReadSize` bytes of data from ONIX.
+1. The RIFFA driver moves the memory block from kernel mode to user mode so
+ that it can be accessed by software. The API function returns with a pointer
+ to the filled buffer.
+
+During this process, memory is allocated only once by the API, and the transfer
+is [zero-copy](https://en.wikipedia.org/wiki/Zero-copy). The API-allocated
+buffer is written autonomously by ONIX hardware using minimal resources from
+the host computer.
+
+So far, all this occurs on the host-side. Meanwhile, on the ONIX-side:
+
+- If ONIX produces new data before the host is able to consume the data in the
+  API-allocated buffer, this new data is added to the back of the ONIX hardware
+  buffer FIFO. The ONIX hardware buffer consists of 2 GB of RAM that belongs to
+  the acquisition hardware (it is _not_ RAM in the host computer) dedicated to
+  temporarily storing data that is waiting to be transferred to the host. Data
+  is removed from the front of the hardware buffer and transferred to the host
+  once the host is ready to accept more data.
+- If the memory is allocated on the host side and the data transfer is
+  initiated by the host API before any data is produced, ONIX transfers new
+  data directly to the host, bypassing the hardware buffer. In this case, ONIX
+ is literally streaming data to the host _the moment it is produced_. This
+ data becomes available for reading by the host once ONIX transfers the full
+ `ReadSize` bytes.
+
+## Tuning `ReadSize` to Optimize Closed Loop Performance
+
+ONIX provides a mechanism for tuning the value of `ReadSize` to optimize
+closed-loop performance while taking into account the idiosyncrasies of your
+host computer and experimental acquisition setup.
+
+> [!NOTE]
+> If you are not familiar with the basic usage of the `OpenEphys.Onix1` library,
+> then visit the [Getting Started](xref:getting-started) guide to set up your
+> Bonsai environment and familiarize yourself with using the library to acquire
+> data from ONIX before proceeding.
+
+Copy the following workflow by hovering over the
+workflow image and clicking the clipboard icon that appears. Then open Bonsai
+and paste the workflow by clicking the Bonsai workflow editor pane and pressing
+Ctrl+V.
+
+::: workflow
+
+:::
+
+### Hardware Configuration
+
+The top-row configuration chain includes a
+ operator. This configures ONIX's Load
+Tester Device, which produces and consumes data at user-specified rates for
+testing and tuning the latency between data production and real-time feedback.
+This device is _not an emulator_. It is a real hardware device that produces and
+consumes data using the selected driver and physical link (e.g. PCIe bus) and
+thus provides accurate measurements of feedback performance for a given host
+computer.
+
+::: workflow
+
+:::
+
+We need to configure the load tester to produce and consume the same amount of
+data as our real experimental hardware would. For example, let's say that during
+our closed-loop experiment, feedback signals will be generated as a function of
+data acquired from two Neuropixels 2.0 probes, each of which generates a
+384-channel sample at 30 kHz. The overall bandwidth is
+
+$$
+\begin{equation}
+  2\,probes*\frac{(2*384+16)\,bytes}{sample}*\frac{30{,}000\,samples}{s}*\frac{1\,MB}{10^6\,bytes} \approx 47\,MB/s
+ \label{eq:2xnpx2bw}
+\end{equation}
+$$
+
+To understand how we came up with this calculation, visit the
+ page.
+
+We'll set up `ConfigureLoadTester` to produce data at the same frequency and
+bandwidth as two Neuropixels 2.0 probes with the following settings:
+
+
+
+- `DeviceAddress` is set to 11 because that's how this device is indexed in
+ the ONIX system.
+- `DeviceName` is set to "Load Tester".
+- `Enable` is set to True to enable the LoadTester device.
+- `FramesPerSecond` is set to 60,000, the rate at which frames are
+  produced by two probes, since each probe is acquired independently at 30 kHz.
+- `ReceivedWords` is set to 392, the size in words of a single
+   including its clock members (see the sketch after the note below).
+- `TransmittedWords` is set to 100. This simulates the amount of data
+  required to, e.g., send a stimulus waveform.
+
+> [!NOTE]
+> The `DeviceAddress` must be manually configured because
+> is used for diagnostics and testing
+> and therefore is not made available through
+> like the rest of the local
+> devices (analog IO, digital IO, etc.). The device address can be found using
+> [oni-repl](https://open-ephys.github.io/onix-docs/Software%20Guide/oni-repl/usage.html#repl-commands).
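+
+The sketch below shows how the `FramesPerSecond` and `ReceivedWords` values
+above can be derived from the Neuropixels 2.0 frame layout described on the
+data elements page. It assumes 16-bit words; treat it as a back-of-the-envelope
+check rather than a definitive device specification.
+
+```python
+# Back-of-the-envelope derivation of the load tester settings above.
+channels = 384
+bytes_per_frame = channels * 2 + 8 + 8      # AmplifierData + Clock + HubClock = 784 bytes
+received_words = bytes_per_frame // 2       # 392 sixteen-bit words per frame
+frames_per_second = 2 * 30_000              # two independently sampled probes
+
+print(received_words, frames_per_second)    # 392 60000
+```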
+
+Next we configure 's
+ and
+ properties.
+
+`WriteSize` is set to 16384 bytes. This defines a readily available pool of
+memory for the creation of output data frames. Data is written to hardware as
+soon as an output frame has been created, so the effect on real-time performance
+is typically not as large as that of the `ReadSize` property.
+
+To start, `ReadSize` is also set to 16384. Later in this tutorial, we'll examine
+the effect of this value on real-time performance.
+
+### Real-time Loop
+
+The bottom half of the workflow is used to stream data acquired from the
+hardware back to the load testing device so that it can measure round-trip
+latency. The operator acquires a sequence of
+[LoadTesterDataFrames](xref:OpenEphys.Onix1.LoadTesterDataFrame) from the
+hardware, each of which is split into its
+ member and
+ member.
+
+::: workflow
+
+:::
+
+The `HubClock` member indicates the acquisition clock count when the
+`LoadTesterDataFrame` was produced. The `EveryNth` operator is a
+ operator which passes only every Nth
+element in the observable sequence. This is used to simulate an algorithm, such
+as spike detection, that only triggers closed loop feedback in response to input
+data meeting some condition. The value of `N` can be changed to simulate
+different feedback frequencies. You can inspect its logic by double-clicking the
+node when the workflow is not running. In this case, `N` is set to 100, so every
+100th sample is delivered to .
+
+`LoadTesterLoopback` is a _sink_ which writes the HubClock values it receives back
+to the load tester device. When the load tester device receives a HubClock from
+the host computer, it subtracts it from the current acquisition clock count.
+That difference is sent back to the host computer as the `HubClockDelta`
+property of subsequent `LoadTesterDataFrames`. In other words, `HubClockDelta`
+indicates the amount of time that passed between the creation of a frame in
+hardware and the receipt, in hardware, of a feedback signal based on that frame:
+it is a complete measurement of closed-loop latency. This value is converted to
+microseconds and then used to help visualize
+the distribution of closed-loop latencies.
+
+Finally, at the bottom of the workflow, a
+ operator is used to examine the state
+of the hardware buffer. To learn about the
+ branch, visit the [Breakout Board
+Memory Monitor](xref:breakout_memory-monitor) page.
+
+::: workflow
+
+:::
+
+### Relevant Visualizers
+
+The desired outputs of this workflow are the [visualizers](xref:visualize-data)
+for the Histogram1D and PercentUsed nodes. Below is an example of each, which we
+will explore further in the next section:
+
+
+
+
+The Histogram1D visualizer shows the distribution of closed-loop feedback
+latencies. The x-axis is in units of μs, and the y-axis represents the number of
+samples in a particular bin. The histogram is configured to have 1000 bins
+between 0 and 1000 μs. For low-latency closed-loop experiments, the goal is to
+concentrate the distribution of closed-loop feedback latencies towards 0 μs as
+much as possible.
+
+The PercentUsed visualizer shows a time series of the amount of the hardware
+buffer that is occupied by data, as a percentage of the hardware buffer's total
+capacity. The x-axis shows timestamps, and the y-axis shows the percentage. To
+ensure data is available as soon as possible after it was produced and to avoid
+potential buffer overflow, the goal is to maintain the percentage at or near zero.
+
+### Real-time Latency for Different `ReadSize` Values
+
+#### `ReadSize` = 16384 bytes
+
+With `ReadSize` set to 16384 bytes, start the workflow and open the visualizers for the PercentUsed and
+Histogram1D nodes:
+
+
+
+
+The Histogram1D visualizer shows that the average latency is about 300 μs, with
+most latencies ranging from ~60 μs to ~400 μs. This roughly matches our
+expectations. Since data is produced at about 47 MB/s, it takes about 350 μs to
+produce 16384 bytes of data. This means that the data contained in a single
+`ReadSize` block was generated over a span of approximately 350 μs. Because we
+are using every 100th sample to generate feedback, the sample that actually
+triggers LoadTesterLoopback could be any sample from that 350 μs span, resulting
+in a range of latencies. The long tail in the distribution corresponds to
+instances when the hardware buffer was used or the CPU was busy with other
+tasks.
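+
+To make this arithmetic concrete, the sketch below computes how long ONIX takes
+to produce one `ReadSize` block at roughly 47 MB/s for each of the `ReadSize`
+values tested in this tutorial. This is only a lower bound on latency; it
+ignores read overhead and host scheduling.
+
+```python
+# Minimum time for the hardware to produce one ReadSize block at ~47 MB/s.
+data_rate = 47.04e6  # bytes per second for two Neuropixels 2.0 probes
+
+for read_size in (16384, 2048, 1024):
+    fill_time_us = read_size / data_rate * 1e6
+    print(f"ReadSize {read_size:>5} bytes: ~{fill_time_us:.0f} us to fill")
+
+# ReadSize 16384 bytes: ~348 us to fill
+# ReadSize  2048 bytes: ~44 us to fill
+# ReadSize  1024 bytes: ~22 us to fill
+```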
+
+The PercentUsed visualizer shows that the percentage of the hardware buffer being
+used remains close to zero. This indicates minimal usage of the hardware buffer,
+and that the host is safely reading data faster than ONIX produces it.
+For experiments without hard real-time constraints, this latency is
+perfectly acceptable.
+
+For experiments with harder real-time constraints, let's see how much lower we
+can get the closed-loop latency.
+
+#### `ReadSize` = 2048 bytes
+
+Set `ReadSize` to 2048 bytes, restart the workflow (`ReadSize` is a
+[](xref:OpenEphys.Onix1#configuration)
+property so it only updates when a workflow starts), and open the same visualizers:
+
+
+
+
+The Histogram1D visualizer shows closed-loop latencies now average about 80
+μs with lower variability.
+
+The PercentUsed visualizer shows the hardware buffer is still stable at
+around zero. This means that, even with the increased overhead associated
+with a smaller `ReadSize`, the host is reading data rapidly enough to prevent
+excessive accumulation in the hardware buffer. Let's see if we can decrease
+latency even further.
+
+#### `ReadSize` = 1024 bytes
+
+Set `ReadSize` to 1024 bytes, restart the workflow, and open the same visualizers.
+
+
+
+
+The Histogram1D visualizer appears to be empty. This is because the latency
+immediately exceeds the x-axis upper limit of 1 ms. You can see this by
+inspecting the visualizer for the node prior to Histogram1D. Because of the very
+small read size (which is on the order of a single Neuropixels 2.0 sample),
+the computer cannot perform read operations at the rate required to keep up with
+data production. This causes excessive accumulation of data in the hardware
+buffer. In this case, when new data is produced, it gets added to the end of the
+hardware buffer queue, requiring several read operations before this new data
+can be read. As more data accumulates in the buffer, the duration of time
+between when that data was produced and when it can finally be read increases.
+In other words, latencies increase dramatically, and closed-loop performance
+collapses.
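+
+This collapse can be reasoned about with a toy model: if one `ReadSize` block is
+produced faster than the average time a read operation takes, the hardware
+buffer has no choice but to grow. The 30 μs per-read overhead below is an
+illustrative assumption chosen to be consistent with the results in this
+tutorial, not a measured property of any particular host.
+
+```python
+# Toy check of whether the host can keep up with data production.
+data_rate = 47.04e6        # bytes/s produced by two Neuropixels 2.0 probes
+read_overhead_s = 30e-6    # assumed average cost of one read operation (illustrative)
+
+for read_size in (16384, 2048, 1024):
+    fill_time_s = read_size / data_rate
+    keeps_up = fill_time_s >= read_overhead_s
+    print(read_size, "keeps up" if keeps_up else "hardware buffer accumulates")
+```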
+
+The PercentUsed visualizer shows that the percentage of the hardware buffer that
+is occupied is steadily increasing. The acquisition session will eventually
+terminate in an error when the MemoryMonitor PercentUsed reaches 100% and the
+hardware buffer overflows.
+
+#### Summary
+
+The results of our experimentation are as follows:
+
+| `ReadSize` | Latency | Buffer Usage | Notes |
+| ----------- | -------------- | -------------- | -------------------------------------------------------------------------------------------------- |
+| 16384 bytes | ~300 μs        | Stable at 0%   | Perfectly adequate if there are no strict low-latency requirements, lowest risk of buffer overflow  |
+| 2048 bytes  | ~80 μs         | Stable near 0% | Balances latency requirements with low risk of buffer overflow                                      |
+| 1024 bytes  | Rises steadily | Rises steadily | Certain buffer overflow and severely degraded closed-loop performance                               |
+
+These results may differ for your experimental system. For example, your system
+might have different bandwidth requirements (different devices produce data at
+different rates) or use a computer with different
+performance capabilities (which changes how quickly read operations can occur).
+As an illustration, here is a similar table made by configuring the Load Tester
+device to produce data at a rate similar to a single 64-channel Intan chip (such
+as what is on the ), ~4.3 MB/s:
+
+
+
+| `ReadSize` | Latency | Buffer Usage | Notes |
+| ---------- | ------- | ------------ | --------------------------------------------------------------------------------- |
+| 1024 bytes | ~200 μs | Stable at 0% | Perfectly adequate if there are no strict low-latency requirements                 |
+| 512 bytes | ~110 μs | Stable at 0% | Lower latency, no risk of buffer overflow |
+| 256 bytes | ~80 μs | Stable at 0% | Lowest achievable latency with this setup, still no risk of buffer overflow |
+| 128 bytes | - | - | Results in error -- 128 bytes is too small for the current hardware configuration |
+
+Regarding the last row of the above table, the lowest possible `ReadSize` is
+determined by the size of the largest data frame produced by the enabled devices
+(plus some overhead). Even with the lowest workable `ReadSize` value for this
+configuration, 256 bytes,
+there is very little risk of overflowing the buffer. The PercentUsed visualizer
+shows that the hardware buffer does not accumulate data:
+
+
+
+> [!TIP]
+> - The only constraint on `ReadSize` is the lower limit, as demonstrated in
+>   the example of tuning `ReadSize` for a single 64-channel Intan chip.
+>   We only tested `ReadSize` values that are powers of 2, but `ReadSize` can
+> be fine-tuned further to achieve even tighter latencies if necessary.
+> - **As of OpenEphys.Onix1 0.7.0:** As long as you stay above the minimum
+> mentioned in the previous bullet point, `ReadSize` can be set to any value
+> by the user. The OpenEphys.Onix1 Bonsai package will round this `ReadSize`
+>     to the nearest multiple of four and use that value instead. For example,
+> if you try to set `ReadSize` to 887, the software will use the value 888
+> instead.
+> - If you are using a data I/O operator that can produce data at
+>   various rates (like ), test your chosen
+>   `ReadSize` by configuring the load tester to produce data at the lower and
+>   upper limits at which you expect data to be produced during your experiment.
+> This will help ensure excess data doesn't accumulate in the hardware
+> buffer and desired closed-loop latencies are maintained throughout the
+> range of data throughput of these devices.
+> - Running other processes that demand the CPU's attention might cause
+> spurious spikes in data accumulation in the hardware buffer. Either reduce
+>   the number of other running processes or verify that they don't interfere with your
+> experiment.
+
+These two tables together demonstrate why it is impossible to recommend a
+single correct value for `ReadSize` that is adequate for all experiments. The
+diversity of experiments (in particular, the wide range of rates at which they
+produce data) requires a range of `ReadSize` values.
+
+Last, the workflow used in this tutorial imposed minimal computational load.
+In most applications, some processing is
+performed on the data to generate the feedback signal. It's important to take
+this into account when tuning your system, potentially by modifying the workflow
+to perform comparable computations on incoming data, so that the effect of
+computational demand on closed-loop performance is captured.
+
+### Measuring Latency in an Actual Experiment
+
+After tuning `ReadSize`, it is important to experimentally verify the latencies
+using the actual devices in your experiment. For example, if your feedback
+involves toggling ONIX's digital output (which in turn toggles a stimulation
+device like a [Stimjim](https://github.com/open-ephys/stimjim) or a [RHS2116
+external trigger](xref:OpenEphys.Onix1.ConfigureRhs2116Trigger.TriggerSource)),
+you can loop that digital output signal back into one of ONIX's digital inputs
+to measure when the feedback physically occurs. Feedback latency is then the
+difference between the clock count at which the trigger condition occurred and
+the clock count at which the looped-back signal was received by ONIX.
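+
+A minimal sketch of that calculation, assuming a 250 MHz acquisition clock
+(substitute the clock rate reported for your own system):
+
+```python
+# Convert a difference in acquisition clock counts into a latency in microseconds.
+acq_clock_hz = 250e6  # assumed acquisition clock rate; use your system's reported value
+
+def feedback_latency_us(trigger_clock: int, feedback_clock: int) -> float:
+    return (feedback_clock - trigger_clock) / acq_clock_hz * 1e6
+
+# e.g. a 75,000-tick difference corresponds to 300 us of feedback latency
+print(feedback_latency_us(1_000_000, 1_075_000))  # 300.0
+```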
+
+You might wonder why you'd even use the LoadTester device if you can measure
+latency using the actual devices that you intend to use in your experiment. The
+benefit of the LoadTester device is that you're able to collect at least tens of
+thousands of latency samples to plot in a histogram in a short amount of time.
+Using digital I/O to take that many latency measurements in a similar
+amount of time can render those measurements unrepresentative of the actual
+experiment you intend to perform. In particular, toggling digital inputs faster
+necessarily increases the total data throughput of
+`DigitalInput`. If the data throughput of
+`DigitalInput` significantly exceeds what is required for your experiment,
+the latency measurements will not reflect the latencies you will experience
+during the actual experiment.
+
+
\ No newline at end of file
diff --git a/images/tutorials/tune-readsize/histogram1d_1024.webp b/images/tutorials/tune-readsize/histogram1d_1024.webp
new file mode 100644
index 0000000..bb975f2
Binary files /dev/null and b/images/tutorials/tune-readsize/histogram1d_1024.webp differ
diff --git a/images/tutorials/tune-readsize/histogram1d_16384.webp b/images/tutorials/tune-readsize/histogram1d_16384.webp
new file mode 100644
index 0000000..259e3e7
Binary files /dev/null and b/images/tutorials/tune-readsize/histogram1d_16384.webp differ
diff --git a/images/tutorials/tune-readsize/histogram1d_2048.webp b/images/tutorials/tune-readsize/histogram1d_2048.webp
new file mode 100644
index 0000000..93db118
Binary files /dev/null and b/images/tutorials/tune-readsize/histogram1d_2048.webp differ
diff --git a/images/tutorials/tune-readsize/load-tester-configuration_properties-editor.webp b/images/tutorials/tune-readsize/load-tester-configuration_properties-editor.webp
new file mode 100644
index 0000000..5399714
Binary files /dev/null and b/images/tutorials/tune-readsize/load-tester-configuration_properties-editor.webp differ
diff --git a/images/tutorials/tune-readsize/load-tester-configuration_properties-editor_64ch.webp b/images/tutorials/tune-readsize/load-tester-configuration_properties-editor_64ch.webp
new file mode 100644
index 0000000..c1269a0
Binary files /dev/null and b/images/tutorials/tune-readsize/load-tester-configuration_properties-editor_64ch.webp differ
diff --git a/images/tutorials/tune-readsize/percent-used_1024.webp b/images/tutorials/tune-readsize/percent-used_1024.webp
new file mode 100644
index 0000000..f70338c
Binary files /dev/null and b/images/tutorials/tune-readsize/percent-used_1024.webp differ
diff --git a/images/tutorials/tune-readsize/percent-used_16384.webp b/images/tutorials/tune-readsize/percent-used_16384.webp
new file mode 100644
index 0000000..5effe5e
Binary files /dev/null and b/images/tutorials/tune-readsize/percent-used_16384.webp differ
diff --git a/images/tutorials/tune-readsize/percent-used_2048.webp b/images/tutorials/tune-readsize/percent-used_2048.webp
new file mode 100644
index 0000000..6231002
Binary files /dev/null and b/images/tutorials/tune-readsize/percent-used_2048.webp differ
diff --git a/images/tutorials/tune-readsize/percent-used_256_lower-payload.png b/images/tutorials/tune-readsize/percent-used_256_lower-payload.png
new file mode 100644
index 0000000..80c5080
Binary files /dev/null and b/images/tutorials/tune-readsize/percent-used_256_lower-payload.png differ
diff --git a/src/bonsai-onix1 b/src/bonsai-onix1
index b9bdc3c..6dbe5b8 160000
--- a/src/bonsai-onix1
+++ b/src/bonsai-onix1
@@ -1 +1 @@
-Subproject commit b9bdc3c0bb340843ff4288b6145308176307d81e
+Subproject commit 6dbe5b83875fcb14d7e187d03aaf9c9e7da146df
diff --git a/template/partials/hardware/configuration.tmpl.partial b/template/partials/hardware/configuration.tmpl.partial
index dadc33d..ba173a2 100644
--- a/template/partials/hardware/configuration.tmpl.partial
+++ b/template/partials/hardware/configuration.tmpl.partial
@@ -61,8 +61,10 @@
{{{blockReadSize}}} bytes, meaning data collection will wait until {{{blockReadSize}}} bytes of
data have been produced by the hardware. At {{{dataRate}}} MB/s the hardware will produce
{{{blockReadSize}}} bytes every ~{{{timeUntilFullBuffer}}}. This is a hard bound on the latency of
- the system. If lower latencies were required, the hardware would need to produce data more quickly
- or the ReadSize property value would need to be reduced.
+ the system. If lower latencies are required, the hardware would need to produce data more quickly
+ or the ReadSize property value would need to be reduced. To learn about the process of tuning ReadSize,
+ check out the
+ Tune ReadSize tutorial.