# 2.06 - Defining an Objective Function
This workbook takes us through the formulation of an objective function that indicates the 'goodness' of a night in terms of glycaemic control, where the night is defined as the period between 22:00 and 06:00 the following day. To recap, the following measures have been studied and will contribute as components to the objective function. These are recorded at each 30-minute interval in our dataset:
- Blood Glucose Mean
- Blood Glucose Standard Deviation
- Coefficient of Variation
- Intervals Outside Level 1 Target Range
- Intervals Outside Level 2 Target Range
- Amplitude of Glycaemic Variability (absolute value above 1 SD from the mean)
- Peaks of Carbohydrate Intake

The goal is still to produce a continuous variable where the score indicates the quality of the night, encompassing the physiological impact observed through glycaemic metrics and carbohydrate intake, with a higher score indicating greater disturbance.

For each night (22:00 to 06:00), the 30-minute intervals are aggregated into the following nightly metrics:
- Blood Glucose Mean (BG_Mean_Night): The average of all 30-minute BG means throughout the night. A higher average BG suggests potential discomfort or dysregulation.
- Blood Glucose Standard Deviation (BG_SD_Night): The average of all 30-minute BG standard deviations (or the overall standard deviation of all BG readings during the night). High variability can indicate instability and potential physiological stress.
- NOTE: Because the data are already aggregated to 30-minute intervals from the original irregular time series, the Law of Total Variance provides a more statistically robust measure of overall variability than the simple average of the interval standard deviations:
  - $\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])$
  - This means that the variance of the aggregated data is the sum of the expected variance within each group and the variance of the group means. The theoretical formula for the overall standard deviation of the aggregated data is as follows:
Let $X$ be the original, high-frequency blood glucose readings.
Let $Y$ represent the 30-minute time intervals (or groups).
Let $g_i$ be the blood glucose mean for interval $i$.
Let $s_i$ be the blood glucose standard deviation for interval $i$.
Let $s_i^2$ be the variance within interval $i$ (which is $s_i$ squared).
Let $n_i$ be the number of original readings that were aggregated into interval $i$.
Leading to the formula for variance:

$\text{Var}_{overall} = \frac{\sum_{i=1}^K n_i \cdot s_i^2}{\sum_{i=1}^K n_i} + \frac{\sum_{i=1}^K n_i \cdot (g_i-\bar G)^2}{\sum_{i=1}^K n_i}$

Where:
- $K$ is the total number of 30-minute intervals in the night.
- $\bar G$ is the overall mean of all blood glucose readings during the night.
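
The decomposition above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `pooled_variance` and the readings are hypothetical, and it assumes each $s_i$ is a population standard deviation (dividing by $n_i$), which is the case in which the identity holds exactly.

```python
from statistics import pvariance


def pooled_variance(means, sds, counts):
    """Overall population variance from per-interval means, SDs and counts."""
    total = sum(counts)
    # Count-weighted mean of interval means equals the mean of all readings
    grand_mean = sum(n * g for n, g in zip(counts, means)) / total
    # Expected within-interval variance
    within = sum(n * s ** 2 for n, s in zip(counts, sds)) / total
    # Variance of the interval means around the grand mean
    between = sum(n * (g - grand_mean) ** 2 for n, g in zip(counts, means)) / total
    return within + between


# Hypothetical raw readings (mmol/L) falling into two 30-minute intervals
intervals = [[5.0, 6.0, 7.0], [8.0, 10.0]]
means = [sum(x) / len(x) for x in intervals]
sds = [pvariance(x) ** 0.5 for x in intervals]
counts = [len(x) for x in intervals]

overall = pooled_variance(means, sds, counts)
# Variance computed directly on the un-aggregated readings, for comparison
direct = pvariance([r for x in intervals for r in x])
```

Both routes give the same variance, so `overall ** 0.5` could serve as BG_SD_Night when only the interval-level summaries are available.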

- Coefficient of Variation (CV_Night): The average of the 30-minute Coefficient of Variation values. This normalizes variability to the mean, providing a robust measure of relative glucose fluctuations, which can be disturbing.
- Intervals Outside Level 1 Target Range (L1_Excursions_Night): The sum or count of 30-minute intervals where a Level 1 (clinically defined) excursion occurred. These indicate deviations from a generally acceptable range, potentially causing discomfort or impacting restorative processes.
- Intervals Outside Level 2 Target Range (L2_Excursions_Night): The sum or count of 30-minute intervals where a Level 2 (more severe) excursion occurred. These represent more significant deviations, very likely disruptive.
- Mean Amplitude of Glycaemic Variability (MAGE): The sum or average of the absolute amplitude values (above 1 SD from the mean). These capture significant, rapid swings in glucose, which are often associated with physiological stress or symptoms that could disturb sleep, even if within clinical thresholds. If the dataset is very sparse for this, consider summing non-zero amplitudes or taking the maximum amplitude for the night to highlight the presence of any large swings.
- Peaks of Carbohydrate Intake (COB_Peaks): The count of peaks of COB for the night.

- $G_t$: Blood glucose reading at time $t$.
- $T$: Total number of time intervals in the observed period.
- $N$: The set of all glucose readings during the nocturnal period, $\{G_1, G_2, ..., G_T\}$.
- $G_{L1\_lower}$: Lower threshold for Level 1 excursion.
- $G_{L1\_upper}$: Upper threshold for Level 1 excursion.
- $G_{L2\_lower}$: Lower threshold for Level 2 excursion.
- $G_{L2\_upper}$: Upper threshold for Level 2 excursion.

All features are scaled using `Standard Scaler` to ensure they contribute equally to the objective function. The scaling is done using the mean and standard deviation of each feature across all nights:

$X_{scaled} = \frac{X - \mu_X}{\sigma_X}$

where $X$ is the original feature value, $\mu_X$ is the mean of the feature across all nights, and $\sigma_X$ is the standard deviation of the feature across all nights.
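
As an illustration of the scaling formula, here is a minimal standard-library sketch. The helper `standard_scale` and the sample values are hypothetical; it uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses, and it is not the scaling code inside `FeatureSet`.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Z-score a feature: subtract the mean, divide by the population SD."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical nightly BG means (mmol/L) for three nights
bg_mean_night = [6.0, 8.0, 10.0]
scaled = standard_scale(bg_mean_night)
```

After scaling, the feature has mean 0 and standard deviation 1 across nights, so a count such as L1_Excursions_Night and a concentration such as BG_Mean_Night sit on a comparable scale before weighting.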

1. Minimising Variance Using Standard Deviation

$\mu_G = \frac{1}{T} \sum_{t=1}^{T} G_t$ is the mean of the glucose readings during the nocturnal period, thus the standard deviation is calculated as $\sigma_G = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (G_t - \mu_G)^2}$.

3. Minimising Glycaemic Variability Using Coefficient of Variation

Standard deviation is a measure of variability, but it does not account for the mean level of glucose. The Coefficient of Variation (CV) is defined as $CV = \frac{\sigma_G}{\mu_G}$, which normalises the standard deviation by the mean glucose level, providing a relative measure of variability.
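
The difference between the two measures can be seen in a small sketch. The function `nightly_cv` and the readings below are hypothetical; the standard deviation follows the population form given for $\sigma_G$.

```python
from math import sqrt


def nightly_cv(readings):
    """Return (mean, population SD, coefficient of variation) for one night."""
    T = len(readings)
    mu = sum(readings) / T
    sigma = sqrt(sum((g - mu) ** 2 for g in readings) / T)
    return mu, sigma, sigma / mu


# Two hypothetical nights with identical spread but different mean levels
mu_a, sigma_a, cv_a = nightly_cv([4.0, 6.0, 8.0])
mu_b, sigma_b, cv_b = nightly_cv([9.0, 11.0, 13.0])
```

Both nights have the same $\sigma_G$, but the night with the lower mean has the higher CV, which is the relative view of variability that the standard deviation alone does not give.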

4. Minimising Excursions Outside Target Ranges

The ranges used here provide a clinically relevant context for evaluating glucose levels. The Level 1 target range is defined as $G_{L1\_lower} \leq G_t \leq G_{L1\_upper}$, and the Level 2 target range as $G_{L2\_lower} \leq G_t \leq G_{L2\_upper}$; a reading outside a range counts as an excursion at that level. The number of intervals outside these ranges can be counted as follows:
- $L1\_Excursions = \sum_{t=1}^{T} \mathbb{1}_{G_t < G_{L1\_lower} \lor G_t > G_{L1\_upper}}$
- $L2\_Excursions = \sum_{t=1}^{T} \mathbb{1}_{G_t < G_{L2\_lower} \lor G_t > G_{L2\_upper}}$
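
The two indicator sums translate directly into a count. In the sketch below, `count_excursions` is a hypothetical helper, and the thresholds (3.9 to 10.0 mmol/L for Level 1, 3.0 to 13.9 mmol/L for Level 2) are assumed values chosen for illustration, since the text does not state the thresholds used in this project.

```python
def count_excursions(readings, lower, upper):
    """Count readings strictly below `lower` or strictly above `upper`."""
    return sum(1 for g in readings if g < lower or g > upper)


# Hypothetical 30-minute BG means for part of a night (mmol/L)
readings = [3.5, 5.0, 2.8, 11.0, 14.5, 7.0]

# Assumed thresholds, not taken from the source
l1_excursions = count_excursions(readings, lower=3.9, upper=10.0)
l2_excursions = count_excursions(readings, lower=3.0, upper=13.9)
```

Every Level 2 excursion is also a Level 1 excursion under nested thresholds like these, so the two counts are correlated, which may be worth keeping in mind when choosing $w_4$ and $w_5$.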

5. Minimising Amplitude of Glycaemic Variability

....

6. Minimising Peaks of Carbohydrate Intake

Carbohydrate intake is a significant factor in glucose regulation. Peaks of carbohydrate intake can be counted as the number of times the carbohydrate intake exceeds a certain threshold during the nocturnal period. This can be defined as:
- $COB\_Peaks = \sum_{t=1}^{T} \mathbb{1}_{COB_t > COB\_threshold}$, where $COB_t$ is the carbohydrate intake at time $t$.

No carbohydrate height parameter is set, as any peak is considered an indicator that the person is awake and eating, which is not ideal for a good night.
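
A literal reading of the indicator sum can be sketched as below. The helper `count_cob_above` is hypothetical, and the default threshold of 0 is an assumption that mirrors the absence of a height parameter; this counts intervals above the threshold and is not necessarily how the `cob_peaks` feature is computed in `FeatureSet`.

```python
def count_cob_above(cob, threshold=0.0):
    """Count intervals whose COB value exceeds `threshold`."""
    return sum(1 for c in cob if c > threshold)


# Hypothetical COB values for six consecutive 30-minute intervals
cob_night = [0.0, 0.0, 12.0, 20.0, 8.0, 0.0]

any_intake = count_cob_above(cob_night)          # threshold of 0
larger_only = count_cob_above(cob_night, 10.0)   # stricter, illustrative threshold
```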

The objective function $J$ can be defined as a weighted sum of these components:

$J = w_1 \cdot BG\_Mean\_Night + w_2 \cdot BG\_SD\_Night + w_3 \cdot CV\_Night + w_4 \cdot L1\_Excursions\_Night + w_5 \cdot L2\_Excursions\_Night + w_6 \cdot MAGE + w_7 \cdot COB\_Peaks$
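
To make the weighted sum concrete, the following is a minimal sketch for a single night. The function `objective`, the equal weights, and the scaled component values are all hypothetical and chosen only for illustration; the keys reuse the component names from the formula.

```python
def objective(components, weights):
    """Weighted sum of scaled nightly components; higher means more disturbance."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"Missing components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical scaled values for one night
night = {
    "BG_Mean_Night": 0.5,
    "BG_SD_Night": -0.2,
    "CV_Night": 0.1,
    "L1_Excursions_Night": 1.0,
    "L2_Excursions_Night": 0.0,
    "MAGE": -0.4,
    "COB_Peaks": 2.0,
}

# Equal weights as a neutral starting point (an assumption, not a recommendation)
weights = {name: 1.0 for name in night}

J = objective(night, weights)
```

Because the components are standardised before weighting, each weight expresses the relative importance of its component rather than compensating for differences in units.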


In [1]:
%load_ext autoreload
%autoreload 2

from src.nights import Nights, consolidate_df_from_nights
from datetime import time
import pandas as pd
from src.config import PROCESSED_DATA_DIR
from src.features import FeatureSet

2025-07-16 21:38:15.561 | INFO     | src.config:<module>:11 - PROJ_ROOT path is: C:\Users\ross\OneDrive\Documents\Masters\Project\masters_project


In [2]:
df_all = pd.read_parquet(PROCESSED_DATA_DIR / 'night_clusters.parquet')
df_all['time'] = df_all.index.get_level_values('datetime').time

sample_rate = 30
nights_objects = []

night_start = time(22, 0)
morning_end = time(6, 0)

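# Build one Nights object per individual from that person's readings only
for zip_id, df_ind in df_all.groupby('id'):
    # Drop the 'id' level from the index; it is passed separately as zip_id
    df_ind_reset = df_ind.reset_index(level='id', drop=True)
    nights_objects.append(
        Nights(df=df_ind_reset, zip_id=zip_id, night_start=night_start,
               morning_end=morning_end, sample_rate=sample_rate))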

df_overnight = consolidate_df_from_nights(nights_objects)

In [3]:
features = FeatureSet(df_overnight, sample_rate=sample_rate)
df_features = features.get_all_features(scale=False)

bg min    float64
bg max    float64
dtype: object


In [5]:
features_scaled = FeatureSet(df_overnight, sample_rate=sample_rate)
df_scaled = features_scaled.get_all_features(scale=True)

bg min    float64
bg max    float64
dtype: object
Scaling ['iob mean', 'cob mean', 'bg mean', 'bg min', 'iob max', 'cob max', 'bg max', 'day_type_weekday', 'day_type_weekend', 'iob mean_rate_of_change', 'cob mean_rate_of_change', 'bg mean_rate_of_change', 'iob mean hourly_mean', 'cob mean hourly_mean', 'bg mean hourly_mean', 'bg max_peaks_above_mean', 'iob max_peaks_above_mean', 'cob max_peaks_above_mean', 'hour_of_day', 'hour_sin', 'hour_cos', 'l1_hypo', 'l1_hyper', 'l2_hypo', 'l2_hyper', 'excursion_amplitude', 'excursion_flag', 'cob_peaks'] columns


In [24]:
feature_cols = ['bg mean', 'l1_hypo', 'l1_hyper', 'l2_hypo', 'l2_hyper', 'excursion_amplitude', 'excursion_flag', 'cob_peaks']

In [25]:
df_scaled.columns

Index(['iob mean', 'cob mean', 'bg mean', 'bg min', 'iob max', 'cob max',
       'bg max', 'day_type_weekday', 'day_type_weekend',
       'iob mean_rate_of_change', 'cob mean_rate_of_change',
       'bg mean_rate_of_change', 'iob mean hourly_mean',
       'cob mean hourly_mean', 'bg mean hourly_mean',
       'bg max_peaks_above_mean', 'iob max_peaks_above_mean',
       'cob max_peaks_above_mean', 'hour_of_day', 'hour_sin', 'hour_cos',
       'l1_hypo', 'l1_hyper', 'l2_hypo', 'l2_hyper', 'excursion_amplitude',
       'excursion_flag', 'cob_peaks'],
      dtype='object')