# Working out measurement timings and interpolation for sensors
This notebook shows how QuinCe handles sensors that take measurements at different intervals.

This is rarely a problem for ships which have the power to operate continuously, taking regular measurements (e.g. every minute). However, many moorings have limited power, so will either take one measurement every few hours or take a group of measurements before sleeping for a period.

## General principle
Assuming that QuinCe is taking a carbon dioxide measurement, it will look for corresponding measurements for other required parameters (e.g. SST, Salinity) at the same time. If no such measurement exists, QuinCe will look for measurements either side of the CO₂ measurement within five minutes and perform a simple linear interpolation to estimate the value at the required time. If no such measurements are available, the measurement is not processed.
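
The interpolation step described above can be sketched in a few lines of plain Python. This is only an illustration, not QuinCe's implementation: the function name `interpolate_at`, the `(time, value)` tuple layout and the choice to require a neighbour on both sides are assumptions made for this sketch.

```python
from typing import Optional, Tuple

INTERPOLATION_LIMIT = 300  # seconds; the five-minute window described above

def interpolate_at(target: float,
                   before: Optional[Tuple[float, float]],
                   after: Optional[Tuple[float, float]],
                   limit: float = INTERPOLATION_LIMIT) -> Optional[float]:
    """Linearly interpolate a value at `target` (seconds).

    `before` and `after` are (time_in_seconds, value) pairs either side
    of the target, or None if no such measurement exists.
    Assumption: both neighbours must exist and lie within `limit`.
    """
    if before is None or after is None:
        return None
    t0, v0 = before
    t1, v1 = after
    if target - t0 > limit or t1 - target > limit:
        return None
    if t1 == t0:
        # Both neighbours share the target's timestamp
        return v0
    fraction = (target - t0) / (t1 - t0)
    return v0 + fraction * (v1 - v0)

# Target halfway between two SST values taken 120 seconds apart
print(interpolate_at(100, (40, 10.0), (160, 12.0)))    # 11.0
# Neighbours too far away: the measurement is not processed
print(interpolate_at(1000, (100, 10.0), (1100, 12.0)))  # None
```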

For instruments that take measurements at extended intervals, the five minute limit is not suitable. QuinCe must therefore detect this situation and act differently. There are a number of measurement regimes that must be handled.

- Regular measurements at short intervals (< 5 minutes)
- Regular single measurements at long intervals (> 5 minutes)
- Groups of measurements at long intervals (e.g. 5 measurements at one minute intervals, every 4 hours)

QuinCe will interpolate values differently for each of these situations.

### Regular measurements at short intervals (CONTINUOUS mode)
QuinCe will find the closest value before and after the required time within five minutes, and perform a linear interpolation between them. If the closest value is not flagged Good, QuinCe will look further away, still within five minutes, for a Good value. If none is found, it will use a Questionable or Bad value from within the five minutes (and flag the calculation result accordingly). If there are no values available at all, the calculation is not performed.

### Regular single measurements at long intervals (SPACED mode)
QuinCe determines the length of time between measurements and uses that as the interpolation limit instead of five minutes. QuinCe then finds the closest values within that time and interpolates between them. If there is no value within that time (e.g. if the sensor skipped one measurement), then QuinCe will not look further to find a usable value. Flagged values will be treated as for the short interval situation above.
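
As a rough sketch of this behaviour, the snippet below derives an interpolation limit from the measurement times and then looks for neighbours within it. The text does not say how the interval is determined, so using the median time step is an assumption, as are the helper names `spaced_limit` and `neighbours_within`.

```python
from bisect import bisect_left
from statistics import median

def spaced_limit(times):
    """Assumed rule: the typical (median) step between measurements."""
    steps = [b - a for a, b in zip(times, times[1:])]
    return median(steps)

def neighbours_within(times, target, limit):
    """Return the (before, after) times closest to `target`.

    `times` must be sorted. A side is None if its closest measurement
    is missing or further than `limit` from the target.
    """
    i = bisect_left(times, target)
    before = times[i - 1] if i > 0 else None
    after = times[i] if i < len(times) else None
    if before is not None and target - before > limit:
        before = None
    if after is not None and after - target > limit:
        after = None
    return before, after

# One measurement every 30 minutes, with the one at 5400 s skipped
times = [0, 1800, 3600, 7200]
limit = spaced_limit(times)
print(limit)                                  # 1800
print(neighbours_within(times, 2700, limit))  # (1800, 3600)
print(neighbours_within(times, 5000, limit))  # (3600, None)
```

In the last call the next measurement is 2200 seconds away, which is beyond the limit, so no interpolation would be possible for that time.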

### Groups of measurements at long intervals (GROUP PLUS SLEEP mode)
In this situation QuinCe will not treat values individually. Instead the values from each group will be averaged, and then those averages will be treated as single values and interpolated per 'Regular single measurements at long intervals'. This will happen even if the required timestamp falls within one of the groups of measurements.

When grouping measurements, any values flagged Questionable or Bad will be excluded from the mean, so only Good values are used. If only Questionable or Bad values are available in a group then that flag will be applied to the calculated mean value (and passed on to the calculation result). If no values are available to construct a mean within the time limit of the longer period, then QuinCe will not look beyond that limit. Selecting values according to their flags is beyond the scope of this notebook, which is purely for understanding the measurement grouping algorithm.
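
The averaging of groups can be illustrated with a short, pandas-free sketch. It ignores flags entirely, and it assigns each group the mean of its timestamps; that choice of timestamp, like the function name `average_groups`, is an assumption made for illustration and is not stated in the description above.

```python
from statistics import mean

GROUP_GAP = 300  # seconds; a longer gap starts a new group

def average_groups(points, gap=GROUP_GAP):
    """Collapse time-ordered (time_seconds, value) pairs into one
    (mean_time, mean_value) pair per group of measurements."""
    groups = []
    current = []
    for point in points:
        # A gap longer than the limit closes the current group
        if current and point[0] - current[-1][0] > gap:
            groups.append(current)
            current = []
        current.append(point)
    if current:
        groups.append(current)

    return [(mean([t for t, _ in g]), mean([v for _, v in g]))
            for g in groups]

# Two groups of measurements, four hours apart
points = [(0, 400.0), (60, 402.0), (120, 404.0),
          (14400, 410.0), (14460, 412.0)]
print(average_groups(points))
```

The resulting averages would then be interpolated in the same way as single measurements at long intervals.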

*Note:* When calibration gas standards are processed, these are averaged within their group. However, this calculation is independent of the detection of the measurement mode and does not need to be considered here.

### Combining GROUP PLUS SLEEP and SPACED
In practice, the SPACED mode can be treated in exactly the same way as the GROUP PLUS SLEEP mode, since this is conceptually the same with groups of only one measurement. Therefore SPACED mode will not be used further in this notebook.

## Determining which strategy to use
This notebook contains the algorithm for determining which of the above strategies will be used. The first part of the notebook is a step-by-step guide to how the algorithm works, and the second part runs the complete algorithm for a number of different scenarios.

# Part One: The Algorithm
This section builds the algorithm step by step using a single example so it's easy to follow the logic.

## Python Setup
First we initialise all the libraries and constants we'll need.

In [5]:
import pandas as pd
from IPython.display import display, HTML

CONTINUOUS_MEASUREMENT_INTERVAL = 300

## Example 1

### Load file and extract timestamps
First we load the data file. For this notebook, each example file contains two columns: `Date/Time` and `Value`. The `Value` column is not used to define groups of measurements.

In [6]:
data = pd.read_csv('example1.csv', parse_dates=['Date/Time'])
timestamps = data['Date/Time']
display(HTML(pd.DataFrame(timestamps).head(15).to_html(index=False)))

Date/Time
2023-05-12 19:54:41+00:00
2023-05-12 19:54:42+00:00
2023-05-12 19:54:43+00:00
2023-05-12 19:54:44+00:00
2023-05-12 19:54:45+00:00
2023-05-12 19:54:46+00:00
2023-05-12 19:54:47+00:00
2023-05-12 19:54:48+00:00
2023-05-12 19:54:49+00:00
2023-05-12 19:54:50+00:00


### Calculate intervals between measurements
We convert all the timestamps to UNIX seconds, and for each pair of measurements calculate the time between them.

In [7]:
# Convert timestamps to seconds
seconds = timestamps.apply(lambda x: x.round('s').timestamp())

# Get the difference between each timestamp
# The first row will be empty (because there is no previous stamp to
# get a difference from), so we drop NA values
timesteps = seconds.diff().dropna().astype('int64')

# For display
timesteps_df = pd.DataFrame(timesteps).rename(columns={'Date/Time':'Time steps'})
display(HTML(timesteps_df.head(15).to_html(index=False)))

Time steps
1
1
1
1
1
1
1
1
1
21591


### Group measurements in time
Now we can collect the measurements into groups of "consecutive" measurements. We define measurements as being in a group if the time between them is no more than the limit used for continuous measurements, i.e. 5 minutes.

To build these groups, we iterate through the `timesteps` series above. If the timestep is ≤ the 5 minute limit, we add it to the current group. If the timestep is larger than that, we start a new group.

In [8]:
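# Sizes of the completed groups, in time order
groups = []

# Number of time steps counted in the current group. Because this counts
# steps rather than measurements, the first group is reported one smaller
# than its number of measurements (the first measurement has no
# preceding step).
group_size = 0
for step in timesteps:
    # A step longer than the continuous limit ends the current group
    if step > CONTINUOUS_MEASUREMENT_INTERVAL:
        if group_size > 0:
            groups.append(group_size)
            group_size = 0

    # The measurement after this step belongs to the current group
    group_size += 1

# Store the final group
if group_size > 0:
    groups.append(group_size)

print(groups)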

[9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10

### Analysing the groups
We pull out some basic statistics about the groups:

In [9]:
print(f'Number of groups: {len(groups)}')
print(f'Mean group size: {sum(groups) / len(groups)}')
print(f'Max group size: {max(groups)}')

Number of groups: 374
Mean group size: 9.98663101604278
Max group size: 10


### Determination of measurement mode
The largest group contains 10 measurements, and almost all of the groups contain 10 measurements (a few contain 9), giving a mean group size very close to 10. From this it is easy to conclude that the measurements are taken in GROUP PLUS SLEEP mode.
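
To try the grouping step without the example file, the sketch below applies the same loop to synthetic timestamps using only plain Python lists. The data is invented for illustration (three bursts of ten one-second measurements, six hours apart), and `group_sizes` is a hypothetical helper name.

```python
def group_sizes(seconds, limit=300):
    """Mirror the notebook's grouping loop for a list of UNIX seconds."""
    steps = [b - a for a, b in zip(seconds, seconds[1:])]
    groups = []
    size = 0
    for step in steps:
        # A long step closes the current group
        if step > limit and size > 0:
            groups.append(size)
            size = 0
        size += 1
    if size > 0:
        groups.append(size)
    return groups

# Synthetic data: bursts starting every 21600 s, 10 measurements each
seconds = [start + i for start in (0, 21600, 43200) for i in range(10)]
groups = group_sizes(seconds)
print(groups)                     # [9, 10, 10]
print(sum(groups) / len(groups))  # mean group size
print(max(groups))                # 10
```

The first group comes out as 9 rather than 10 because the loop counts time steps, and the first measurement has no step before it.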

## A second example
Here is another, slightly more complex example. It's from a General Oceanics system, which is configured to take 100 measurements, then 5 measurements from gas standards. This results in group sizes alternating between 100 and 5.

In [10]:
data = pd.read_csv('example2.csv', parse_dates=['Date/Time'])
timestamps = data['Date/Time']

# Convert timestamps to seconds
seconds = timestamps.apply(lambda x: x.round('s').timestamp())

# Get the difference between each timestamp
# The first row will be empty (because there is no previous stamp to
# get a difference from), so we drop NA values
timesteps = seconds.diff().dropna().astype('int64')

groups = []

group_size = 0
for i in range(0, len(timesteps)):
    if timesteps.iloc[i] > CONTINUOUS_MEASUREMENT_INTERVAL:
        if group_size > 0:
            groups.append(group_size)
            group_size = 0

    group_size += 1

if group_size > 0:
    groups.append(group_size)

print(groups)
print()
print(f'Number of groups: {len(groups)}')
print(f'Mean group size: {sum(groups) / len(groups)}')
print(f'Max group size: {max(groups)}')

[4, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 100, 5, 56, 41, 5, 100, 5, 100, 5, 39]

Number of groups: 111
Mean group size: 51.44144144144144
Max group size: 100


### Conclusion
In this case both the mean group size and maximum group size are much larger, indicating that there are extended periods of measurements. We will treat these as CONTINUOUS mode measurements.

## Thresholds
Two statistics decide between CONTINUOUS mode and GROUP PLUS SLEEP mode (labelled `GROUP_WITH_SLEEP` in the code): the mean group size and the maximum group size. Experimentation shows that 25 is a good cutoff for both. If either statistic is at or below 25, the data is treated as GROUP PLUS SLEEP; otherwise it is treated as CONTINUOUS. The decision algorithm is therefore:

```python
if mean_group_size <= 25 or max_group_size <= 25:
    mode = GROUP_WITH_SLEEP
else:
    mode = CONTINUOUS
```
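
As a quick check, the rule can be wrapped in a small function and applied to statistics printed elsewhere in this notebook (Examples 1, 2 and 7). The function name `classify` is hypothetical; the numbers are copied from the notebook's outputs.

```python
MODE_THRESHOLD = 25  # same cutoff for mean and maximum group size

def classify(mean_group_size, max_group_size, threshold=MODE_THRESHOLD):
    """Hypothetical helper wrapping the decision rule above."""
    if mean_group_size <= threshold or max_group_size <= threshold:
        return 'GROUP_WITH_SLEEP'
    return 'CONTINUOUS'

# (mean group size, max group size) from the outputs in this notebook
examples = {
    'Example 1': (9.98663101604278, 10),
    'Example 2': (51.44144144144144, 100),
    'Example 7': (6.0, 30),
}

for name, (mean_size, max_size) in examples.items():
    print(f'{name}: {classify(mean_size, max_size)}')
```

Example 7 shows why the two conditions are joined with `or`: its maximum group size (30) is above the threshold, but its mean (6.0) is not, so it is still classified as `GROUP_WITH_SLEEP`.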

# Part Two: More Examples
This section contains the complete algorithm in one function, together with several examples of its use.

## The complete algorithm

In [11]:
CONTINUOUS_MEASUREMENT_INTERVAL = 300
MODE_THRESHOLD = 25

def extract_timestamps(file):
    data = pd.read_csv(file, parse_dates=['Date/Time'])
    return data['Date/Time']

def get_strategy(timestamps):
    # Convert the timestamps to seconds, and calculate the time differences between each
    seconds = timestamps.apply(lambda x: x.round('s').timestamp())
    timesteps = seconds.diff().dropna().astype('int64')
    
    groups = []
    
    group_size = 0
    for i in range(0, len(timesteps)):
        if timesteps.iloc[i] > CONTINUOUS_MEASUREMENT_INTERVAL:
            if group_size > 0:
                groups.append(group_size)
                group_size = 0
        
        group_size += 1
    
    if group_size > 0:
        groups.append(group_size)
    
    mean_group_size = sum(groups) / len(groups)
    max_group_size = max(groups)
    
    if mean_group_size <= MODE_THRESHOLD or max_group_size <= MODE_THRESHOLD:
        mode = 'GROUP_WITH_SLEEP'
    else:
        mode = 'CONTINUOUS'
    
    print(f'Group sizes: {groups}')
    print()
    print(f'Mean group size: {sum(groups) / len(groups)}')
    print(f'Max group size: {max(groups)}')
    print(f'Mode: {mode}')

def run_example(filename):
    timestamps = extract_timestamps(filename)
    get_strategy(timestamps)

## Example 3: Varying Continuous Measurements
This dataset is from a ship that measures a little more often than every two minutes, with occasional periods where no measurements are taken. This should be detected as CONTINUOUS mode.

In [12]:
run_example('example3.csv')

Group sizes: [1602, 1644, 1624, 1370, 1621, 1649, 1674, 1605, 1641, 1564, 1576, 1579, 1577, 1554, 1535, 1488, 1516, 2, 1543, 1587, 465, 67, 30, 829, 16, 1585, 1585, 1631, 1604, 1576, 1623, 1585, 1613, 1537, 1636, 1591, 1632, 1629, 1617, 5, 1572, 1653, 1661, 1670, 1668, 1683, 1585, 1677, 1660, 1660, 1644, 1665, 1645, 1647, 1684, 1667, 1675, 1662, 1662, 1649, 27, 1636, 1663, 1633, 1672, 1645]

Mean group size: 1442.0
Max group size: 1684
Mode: CONTINUOUS


## Example 4: General Oceanics-like
This dataset is from an instrument similar to a General Oceanics system that measures approximately every two minutes. It is configured to take 80 measurements between gas standards. This should be detected as CONTINUOUS mode.

In [13]:
run_example('example4.csv')

Group sizes: [4, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 80, 5, 15, 5, 80, 5, 78]

Mean group size: 42.104651162790695
Max group size: 80
Mode: CONTINUOUS


## Example 5: Varying Continuous Measurements II
This dataset is from an instrument that has a variety of running modes of varying lengths. Importantly, though, there are periods of several hundred continuous measurements. These can be considered the 'real' measurements, while the short groups of 6 or 10 measurements are gas standards, flushing, or other non-measurement activities. This should be detected as CONTINUOUS mode.

In [14]:
run_example('example5.csv')

Group sizes: [5, 6, 10, 10, 6, 10, 6, 10, 10, 6, 10, 547, 679, 671, 679, 546, 6, 6, 10, 6, 361, 6, 6, 10, 469, 6, 6, 10, 477, 6, 6, 10, 486, 6, 6, 10, 428, 6, 6, 10, 386, 6, 6, 10, 6, 10, 10, 6, 10, 6, 473, 386, 287, 64, 6, 207, 6, 6, 10, 309, 6, 6, 10, 448, 6, 6, 10, 6, 91, 6, 6, 10, 6, 10, 10, 6, 10, 378, 6, 6, 10, 423, 6, 6, 10, 343, 6, 6, 10]

Mean group size: 108.34831460674157
Max group size: 679
Mode: CONTINUOUS


## Example 6: Two measurements then sleep
This dataset is from a mooring that wakes up, takes two measurements five seconds apart, and then sleeps. Sometimes the sleep fails, resulting in two groups of measurements being taken consecutively. This should be detected as GROUP PLUS SLEEP mode (reported by the code as `GROUP_WITH_SLEEP`).

In [15]:
run_example('example6.csv')

Group sizes: [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

## Example 7: CONTROS
This example is from a CONTROS HydroC sensor. After a startup period of consecutive measurements, it settles into its configured setup of taking 5 measurements between sleeps. While the maximum group size is above the threshold for CONTINUOUS measurements, the mean group size is not.

This example shows that the algorithm does not need to have knowledge of the instruments' running modes - the statistics of the timestamps will be sufficient to detect the true measurement mode. While it is true that a very short dataset encompassing only the first four or five groups would likely be misclassified, in reality a dataset that small would not be considered large enough for processing in QuinCe.

In [16]:
run_example('example7.csv')

Group sizes: [18, 30, 5, 5, 1, 5, 5, 5, 2, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]

Mean group size: 6.0
Max group size: 30
Mode: GROUP_WITH_SLEEP


## Example 8: Single measurements every 30 minutes
Data from a mooring that takes one measurement every 30 minutes, and then goes to sleep. This would qualify as SPACED measurements (see introduction), but as described there, treating it as GROUP PLUS SLEEP (reported by the code as `GROUP_WITH_SLEEP`) is equivalent.

In [17]:
run_example('example8.csv')

Group sizes: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,