# 2.01 - Modelling COB Peaks Timeseries
Using the peaks to assess the distribution of the data has been a helpful way to understand how meal intake is distributed over time. It has also made it easier to check that the data maps correctly onto a daily pattern, especially given the issues with datetimes not aligning with the timezones they were recorded in. We'll now use the peaks to identify the COB values we want to model. The aim is to establish what a standard day looks like and whether it is possible to identify days that are not standard, either because of errors in the data or because the individual followed a different pattern of meal intake. Once the peaks have identified the relevant COB values, the timeseries will be used to assess the distribution and amplitude of those values over time. We use the 15-minute resampled data here and focus on one of the candidates with the most defined distributions, one that clearly shows a 3-meal intake.
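The notebook reads data that has already been resampled to 15-minute intervals. How that file was produced is not shown here; the sketch below assumes, based only on the column name, that `cob max` is the maximum COB observed within each 15-minute bin. The raw 5-minute readings are hypothetical.

```python
import pandas as pd

# Hypothetical raw COB readings at 5-minute intervals (illustrative values only).
idx = pd.date_range("2019-09-01 07:00", periods=12, freq="5min")
raw = pd.Series([0, 10, 35, 42, 40, 38, 30, 25, 20, 15, 10, 5], index=idx, name="cob")

# Assumption: 'cob max' is the maximum COB within each 15-minute bin.
# Bins are left-closed and left-labelled: [07:00, 07:15), [07:15, 07:30), ...
cob_15min = raw.resample("15min").max().rename("cob max")
print(cob_15min)
```

Taking the maximum rather than the mean keeps the top of each absorption curve intact, which matters when the next step looks for peaks.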

```python
# Reload project modules automatically when their source changes
%load_ext autoreload
%autoreload 2

import pandas as pd
from loguru import logger
import matplotlib.pyplot as plt
import seaborn as sns

from src.cob_analysis import Cob
from src.data_processing.read import read_profile_offsets_csv
from src.config import INTERIM_DATA_DIR
from src.configurations import Configuration
```



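```python
# Remove loguru's default handler to silence log output
logger.remove()

# Candidate individuals; 41131654 shows a clear 3-meal pattern
candidates = [13029224, 21946407, 27700103, 32407882, 41131654, 42360672, 67208817, 74175219, 79526193, 86025410, 95851255, 96254963, 96805916, 97417885]
individual = 41131654
# Peak-detection parameters: minimum peak height (h) and minimum distance between peaks (d)
args = {'height': 15, 'distance': 5, 'suppress': False}
config = Configuration()

profile_offsets = read_profile_offsets_csv(config)

# Read the 15-minute resampled IOB/COB/BG data and detect peaks for each individual
cob = Cob()
cob.read_interim_data(file_name='15min_iob_cob_bg', sampling_rate=15)
df = cob.process_one_tz_individuals(profile_offsets, args)
```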


```
Number of records: 786757
Number of people: 133
Systems used: 	['OpenAPS']
Categories (1, object): ['OpenAPS']
From 120, ignored 22 individuals not found in dataset, leaving 98 processed records.
The following stats are based on parameters h=15 and d=5:
	Number of records: 637125
	Number of days with peaks: 1459
	Number of peaks: 19539
```
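The `height` and `distance` keys in `args` (reported as h=15 and d=5) match the keyword arguments of `scipy.signal.find_peaks`. Assuming `Cob` applies something equivalent to each day of 15-minute data (this notebook does not confirm it), the sketch below shows what the two parameters mean on a hypothetical single day with three meals. Note that `distance` is measured in samples, so at 15-minute resolution d=5 means peaks must be at least 75 minutes apart.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical one-day COB series at 15-minute resolution (96 samples).
# Illustrative values only: three triangular "meal" bumps on a zero baseline.
cob = np.zeros(96)
for start, top in [(28, 40.0), (50, 60.0), (74, 55.0)]:  # meals starting 07:00, 12:30, 18:30
    cob[start:start + 4] = np.linspace(top / 4, top, 4)         # carbs on board ramp up
    cob[start + 4:start + 12] = np.linspace(top * 0.875, 0, 8)  # then decay to zero

# height=15: discard local maxima below 15 g COB.
# distance=5: neighbouring peaks must be at least 5 samples (5 x 15 min = 75 min) apart.
peaks, props = find_peaks(cob, height=15, distance=5)

peak_minutes = peaks * 15  # sample index -> minutes after midnight
print(peaks, props["peak_heights"], peak_minutes)
```

With these toy values the three meals are found at samples 31, 53 and 77, i.e. 07:45, 13:15 and 19:15, with heights 40, 60 and 55.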


The data has a 'cob max' column that needs to be transformed so that it only holds the values relevant for modelling. The peaks identify which COB values are used as features, which removes the noise from all other values. Note that imputed values play no part here: since we focus purely on the values at peaks, and these would always be original values, imputation is irrelevant.
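The masking step can be illustrated on a small synthetic frame. The values and the boolean `peak` flags below are hypothetical; the point is that `Series.where` keeps a value only where the condition is true and replaces everything else with NaN, so the features reduce to the peak values alone.

```python
import pandas as pd

# Hypothetical 15-minute slice for one individual (illustrative values only).
idx = pd.date_range("2019-09-01 07:00", periods=6, freq="15min", name="datetime")
df = pd.DataFrame(
    {
        "cob max": [10.0, 35.0, 42.0, 30.0, 20.0, 0.0],
        "peak": [False, False, True, False, False, False],
    },
    index=idx,
)

# Keep 'cob max' only where a peak was detected; all other rows become NaN.
df["cob max"] = df["cob max"].where(df["peak"])
print(df)

# The modelling features are then just the non-missing values.
features = df["cob max"].dropna()
```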

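```python
# Select the relevant columns for the chosen individual (copy to avoid chained-assignment warnings)
df = df[['date', 'time', 'cob max', 'peak']].loc[individual].copy()
# Keep 'cob max' only at detected peaks; all other rows become NaN
df['cob max'] = df['cob max'].where(df['peak'].astype(bool))
```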

| datetime | date | time | cob max | peak |
|---|---|---|---|---|
| 2019-09-01 01:15:00 | 2019-09-01 | 01:15:00 | 0.0 | 0 |
| 2019-09-01 01:30:00 | 2019-09-01 | 01:30:00 | NaN | 0 |
| 2019-09-01 01:45:00 | 2019-09-01 | 01:45:00 | NaN | 0 |
| 2019-09-01 02:00:00 | 2019-09-01 | 02:00:00 | NaN | 0 |
| 2019-09-01 02:15:00 | 2019-09-01 | 02:15:00 | NaN | 0 |
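With 'cob max' restricted to peaks, one simple way to approach the "standard day" question from the introduction is to summarise the peaks by hour of day: how many peaks fall in each hour and how large they are on average. The sketch below uses a hypothetical three-day, peak-only series (NaN where there is no peak); the hourly binning is an illustrative choice, not necessarily the approach taken later in this analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical peak-only series over three days at 15-minute resolution.
idx = pd.date_range("2019-09-01", periods=3 * 96, freq="15min", name="datetime")
cob_peaks = pd.Series(np.nan, index=idx, name="cob max")
for day in ["2019-09-01", "2019-09-02", "2019-09-03"]:
    cob_peaks.loc[pd.Timestamp(f"{day} 07:45")] = 40.0  # breakfast
    cob_peaks.loc[pd.Timestamp(f"{day} 13:15")] = 60.0  # lunch
    cob_peaks.loc[pd.Timestamp(f"{day} 19:15")] = 55.0  # dinner

peaks_only = cob_peaks.dropna()
# Count peaks and average amplitude per hour of day to approximate a "standard day".
profile = peaks_only.groupby(peaks_only.index.hour).agg(["count", "mean"])
print(profile)
```

For a clear 3-meal individual the profile concentrates in three hours; a day whose peaks fall outside those hours, or whose number of peaks differs from the usual three, would be a candidate non-standard day.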
