# SD Card Profiling

This document profiles the SD card to gauge write performance and derive the speed at which we can write.  When doing *fast* writes to the SD card, there are two concerns:
- The write speed may not keep up with the runloop, causing the entire vario to run slower while we're collecting telemetry.
- Flushes to the card, or background filesystem work, may occasionally stall individual iterations of the runloop.

# Test
The test wrote the current microsecond timestamp into a data file as quickly as possible, then used this script to unpack the data.  The microsecond timestamp on the Arduino is 4 bytes long.
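
To make the file format concrete, the sketch below writes a few synthetic timestamps as consecutive 4-byte values and reads them back. The little-endian unsigned layout (`<I`), the helper names, and the sample values are assumptions for illustration; the firmware code that produced the real files is not shown in this document.

```python
import os
import struct
import tempfile

def write_timestamps(path, timestamps):
    """Write each timestamp as a 4-byte little-endian unsigned integer."""
    with open(path, "wb") as f:
        for t in timestamps:
            f.write(struct.pack("<I", t))

def read_timestamps(path):
    """Read back consecutive 4-byte little-endian unsigned integers."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % 4  # ignore a trailing partial record
    return [t[0] for t in struct.iter_unpack("<I", data[:usable])]

# Synthetic microsecond timestamps: mostly 9 us apart, with one long stall.
sample = [1000, 1009, 1018, 35018, 35027]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dat")
    write_timestamps(path, sample)
    file_size = os.path.getsize(path)  # 4 bytes per record
    restored = read_timestamps(path)

print(file_size)
print(restored)
```

The difference between neighbouring timestamps is the time one write took, which is what the rest of this notebook analyses.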

# Findings:
- There is no significant difference between running this in the Arduino and PlatformIO environments.
- P95 is 9 microseconds and P99 is 13 microseconds; beyond P99 the intervals become erratic.
- Many of the records above P99 take ~34ms.
- Towards the P100 end, we're seeing runloop durations of ~276ms.
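
The findings above are tail-latency statistics, so a small helper that reports the upper percentiles together with a stall count may be useful. This is a minimal sketch: the function name, the 1000 microsecond stall threshold (chosen to match the 1ms budget discussed in this document), and the synthetic data are all assumptions, not part of the original analysis.

```python
import numpy as np

def summarize_intervals(intervals_us, stall_threshold_us=1000):
    """Summarize tail latency for a sequence of write intervals (microseconds)."""
    data = np.asarray(intervals_us)
    p50, p95, p99, p100 = np.percentile(data, [50, 95, 99, 100])
    stall_count = int(np.count_nonzero(data > stall_threshold_us))
    return {
        "p50": float(p50),
        "p95": float(p95),
        "p99": float(p99),
        "p100": float(p100),
        "stall_count": stall_count,
        "stall_fraction": stall_count / data.size,
    }

# Synthetic data for illustration only: 997 fast writes and 3 stalls.
example = [9] * 997 + [34_000, 34_000, 276_000]
summary = summarize_intervals(example)
print(summary)
```

With the real data, the same function could be called on `arduino_intervals` or `platformio_intervals`.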

## Typical Write Speed
- To err on the side of caution: if we don't want to *typically* back up the runloop, and we allow 1ms for writing telemetry data, we should NOT log more than 134 bytes per write.  Ideally, we should use a binary data structure, since in ASCII this budget is only 134 characters.
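
As a rough illustration of that budget, the sketch below compares a packed binary record with its ASCII equivalent at ~134 bytes/ms. The telemetry fields, their types, and the sample values are hypothetical and are not the vario's actual log format; only the throughput figure comes from the measurements in this document.

```python
import struct

BYTES_PER_MS = 134  # measured typical throughput, rounded down

# Hypothetical record: timestamp (uint32) plus latitude, longitude,
# altitude and climb rate (float32 each).
RECORD_FORMAT = "<Iffff"

sample = (123456789, 47.60621, -122.33207, 1523.4, -1.25)

binary_record = struct.pack(RECORD_FORMAT, *sample)
ascii_record = ",".join(str(v) for v in sample) + "\n"

def write_budget_ms(num_bytes, bytes_per_ms=BYTES_PER_MS):
    """Approximate typical time in milliseconds needed to write num_bytes."""
    return num_bytes / bytes_per_ms

for name, record in (("binary", binary_record), ("ascii", ascii_record)):
    print(f"{name}: {len(record)} bytes, ~{write_budget_ms(len(record)):.2f} ms")
```

The estimate only covers the typical case; the occasional multi-millisecond stalls listed in the findings are not captured by a simple bytes-per-millisecond figure.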


# Setup
```bash
[scott@sob-desktop local-scott]$ python3 -m venv ~/data_science_env
[scott@sob-desktop local-scott]$ . ~/data_science_env/bin/activate
(data_science) [scott@sob-desktop local-scott]$ pip install -r requirements.txt
```

In [27]:
import struct
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import pandas as pd

def collect_values(file_path):
    values = []
    with open(file_path, "rb") as f:
        while chunk := f.read(4):
            if len(chunk) != 4:
                print("Incomplete data at the end of file.")
                break
        
            value = struct.unpack("<I", chunk)[0]  # Little-endian unsigned 32-bit, matching the Arduino's unsigned micros() value
            values.append(value)
    return values


def collect_intervals(values):
    intervals = []
    prev_value = values[0]
    for value in values[1:]:    
        interval = (value - prev_value) & 0xFFFFFFFF  # Stays correct if the 32-bit counter wraps around
        intervals.append(interval)
        prev_value = value
    return intervals



def graph_percentiles(data):
    percentiles = np.percentile(data, np.arange(1, 101))
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=np.arange(1, 101), y=percentiles, mode='lines+markers'))
    fig.update_layout(
        title='Percentile Plot',
        xaxis_title='Percentile',
        yaxis_title='Value'
    )
    fig.show()

def table_percentiles(data):
    steps = np.arange(0, 101, 10)  # 0th to 100th percentiles in steps of 10
    percentiles = np.percentile(data, steps)
    # Both columns must have the same length, so reuse `steps` for the labels
    return pd.DataFrame({
        'Percentile': steps,
        'Value': percentiles
    })
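
With roughly 15 million records per file, the pure-Python loops above can be slow. A possible alternative, sketched here under the assumption that the file holds little-endian unsigned 32-bit timestamps, is to load and difference the data with NumPy. The function name `load_intervals` is hypothetical. Because the subtraction happens in unsigned 32-bit arithmetic, an interval that spans a counter wraparound still comes out positive.

```python
import os
import tempfile

import numpy as np

def load_intervals(file_path):
    """Load 4-byte little-endian unsigned timestamps and return their differences."""
    raw = np.fromfile(file_path, dtype="<u4")
    # uint32 subtraction is modular, so a wrapped counter still yields the true gap
    return np.diff(raw)

# Synthetic example: the counter wraps from near 2**32 back to a small value.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wrap.dat")
    np.array([4294967290, 5, 14], dtype="<u4").tofile(path)
    wrapped = load_intervals(path)

print(wrapped.tolist())
```

The resulting array can be passed directly to `graph_percentiles` or wrapped in a DataFrame, since both accept array-like input.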


In [28]:
# Collect microsecond timestamps and intervals for the Arduino IDE and PlatformIO builds
arduino_values = collect_values('/tmp/arduino.dat')
platformio_values = collect_values('/tmp/platformio.dat')

arduino_intervals = collect_intervals(arduino_values)
platformio_intervals = collect_intervals(platformio_values)

In [29]:
print("# Arduino IDE Stat example")
print(f"Datapoints processed: {len(arduino_intervals)}")
graph_percentiles(arduino_intervals)

print("# PlatformIO Stat example")
print(f"Datapoints processed: {len(platformio_intervals)}")
graph_percentiles(platformio_intervals)



# Arduino IDE Stat example
Datapoints processed: 15272999


# PlatformIO Stat example
Datapoints processed: 15496999


In [30]:
# Create a pandas DataFrame for arduino_intervals
pd.options.display.float_format = '{:.2f}'.format
df_arduino_intervals = pd.DataFrame(arduino_intervals, columns=['Interval'])
df_arduino_intervals.describe()

| Statistic | Interval (µs) |
|---|---|
| count | 15272999.0 |
| mean | 29.6 |
| std | 1012.68 |
| min | 8.0 |
| 25% | 9.0 |
| 50% | 9.0 |
| 75% | 9.0 |
| max | 276250.0 |


In [31]:
# Largest 1000 intervals in df_arduino_intervals, sorted ascending by Interval
df_arduino_intervals_sorted = df_arduino_intervals.sort_values(by='Interval')
df_arduino_intervals_sorted.tail(1000)

| Index | Interval (µs) |
|---|---|
| 946999 | 36730 |
| 3170999 | 36742 |
| 1821999 | 36742 |
| 10235999 | 36830 |
| 14585999 | 36832 |
| ... | ... |
| 12357999 | 253984 |
| 3047999 | 254751 |
| 6074999 | 260235 |
| 10784999 | 264125 |
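
The record indices visible in this output all end in 999, which may hint that the long stalls line up with something periodic, such as a flush every 1000 records. That is only an inference from the few rows shown, not a confirmed cause. A minimal sketch for checking it, using a hypothetical helper and an assumed period of 1000:

```python
from collections import Counter

def stall_positions(indices, period=1000):
    """Count how often each position within `period` appears among stall indices."""
    return Counter(i % period for i in indices)

# Indices copied from the rows shown in the output above.
shown = [946999, 3170999, 1821999, 10235999, 14585999,
         12357999, 3047999, 6074999, 10784999]
positions = stall_positions(shown)
print(positions.most_common(3))
```

On the full data set, the same function could be given `df_arduino_intervals_sorted.tail(1000).index` to see whether the pattern holds beyond the rows displayed here.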


In [32]:
df_platformio_intervals = pd.DataFrame(platformio_intervals, columns=['Interval'])
df_platformio_intervals.describe()

| Statistic | Interval (µs) |
|---|---|
| count | 15496999.0 |
| mean | 29.65 |
| std | 1014.53 |
| min | 8.0 |
| 25% | 9.0 |
| 50% | 9.0 |
| 75% | 9.0 |
| max | 277137.0 |


In [33]:
# Write speed statistics

total_bytes = len(platformio_values) * 4  # 4 bytes per recorded timestamp
total_duration = platformio_values[-1] - platformio_values[0]  # Microseconds from first to last write

# Convert microseconds to seconds
total_seconds = total_duration / 1000 / 1000
print(f"Write speed was: {total_bytes / total_seconds / 1024} KiB/s")

# Bytes we can write per millisecond
print(f"Bytes to write in a ms: {total_bytes / total_seconds / 1000} bytes/ms")

Write speed was: 131.76071265046716 KiB/s
Bytes to write in a ms: 134.92296975407837 bytes/ms
