## Limits of Autoregulation - Summary Statistics

In this notebook we look at the characteristics of the physiologic data when patients move from inside to outside the limits of autoregulation.

#### Useful Imports and Preliminaries

In [None]:
import os
import sys
sys.path.append("..")  # add project root

import h5py

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from tqdm import tqdm

from src.data_utils import *
from src.constants import *

# pending useful sklearn imports

In [None]:
pd.options.display.float_format = '{:10,.2f}'.format

In [None]:
np.random.seed(420)

In [None]:
sns.set_theme(context="talk")

In [None]:
# path constants
data_dir = "/home/mr2238/project_pi_np442/mr2238/accelerate/data"
img_dir = "/home/mr2238/project_pi_np442/mr2238/accelerate/imgs/overview"
global_path = "/home/mr2238/project_pi_np442/mr2238/accelerate/data/processed/all_data.hdf5"
labels_path = os.path.join(data_dir, "labels")
raw_data_path = os.path.join(data_dir, "raw_data")

In [None]:
# list files
h5py_files = [f for f in os.listdir(raw_data_path) if f.endswith(".icmh5")]
print(f"Number of h5py files: {len(h5py_files)}")
print(f"Example file: {h5py_files[0]}")

### In and Out Statistics

#### Define Limits

Here we explore how much of the time a given patient spends within their autoregulatory limits, to get a sense of how this time is distributed across patients.
1) Definition: Patient is outside limits of autoregulation if at a given time *t*, the mean ABP value over the minute prior to time *t* is outside the limits calculated at *t*.
2) What do we want?
    - Distribution of % time outside limits
    - Duration of time outside limits: per patient, distribution, median, average
    - Number of times outside limits per patient
    - Correlation between time elapsed and likelihood of being outside limits: for each elapsed time *t*, what fraction of patients are outside limits?
    - Time spent outside autoregulation vs. time to autoregulation calculation
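
The definition in item 1 can be sketched on synthetic data as follows. This is only an illustration, not the project's preprocessing: the helper name `inside_limits`, the column names `abp`, `lower`, and `upper`, and the 10-second sampling interval are all assumptions made for the example.

```python
import pandas as pd


def inside_limits(df: pd.DataFrame, window: str = "60s") -> pd.Series:
    """Return True where the trailing-window mean ABP lies within [lower, upper].

    Hypothetical helper: expects a DatetimeIndex and columns 'abp', 'lower', 'upper'.
    """
    # Mean ABP over the time window that ends at each timestamp t.
    mean_abp = df["abp"].rolling(window).mean()
    return (mean_abp >= df["lower"]) & (mean_abp <= df["upper"])


# Synthetic example: one sample every 10 s, constant limits of 60 and 100.
idx = pd.date_range("2024-01-01", periods=12, freq="10s")
abp = [80.0] * 6 + [130.0] * 6  # ABP jumps above the upper limit halfway through
demo = pd.DataFrame({"abp": abp, "lower": 60.0, "upper": 100.0}, index=idx)

inside = inside_limits(demo)
print(inside.tolist())
print(f"Fraction of time outside limits: {1 - inside.mean():.2f}")
```

Because the mean is taken over the preceding minute, the flag lags the raw signal: in this toy series the ABP jumps at the seventh sample, but the patient is only marked as outside limits two samples later.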

#### Count Time In and Out

In [None]:
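true_false = []  # per-patient fraction of samples flagged True
ins = []         # per-patient boolean in/out arrays
mode = 'mean'
with h5py.File(global_path, "r") as f:
    for pt in f:
        # load the boolean in/out flags for this patient
        in_out_df = pd.DataFrame(f[f"{pt}/processed/in_out_{mode}/in_out"][...])
        # fraction of samples that are True
        in_out = in_out_df.sum() / in_out_df.shape[0]

        true_false.append(in_out)
        ins.append(in_out_df)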


In [None]:
# quick check that the ins bool arrays make sense
bool_in_out = pd.concat(ins).reset_index(drop=True)
bool_in_out.describe()
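
The per-patient boolean arrays collected in `ins` also contain what is needed for the episode counts and durations listed under Define Limits. Below is a minimal sketch on a synthetic array, assuming `True` marks a sample inside the limits (as the `1 - ...` step in this notebook suggests). The helper name `outside_episodes` is hypothetical, and durations are in samples because the sampling interval is not stated here.

```python
import numpy as np


def outside_episodes(inside: np.ndarray) -> np.ndarray:
    """Return the length (in samples) of each consecutive run spent outside limits.

    Assumes `inside` is a 1-D boolean array where True means "within limits".
    """
    outside = ~np.asarray(inside, dtype=bool)
    # Pad with False on both sides so every run has a detectable start and end.
    padded = np.concatenate(([False], outside, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # index where a run of "outside" begins
    ends = np.flatnonzero(edges == -1)    # index one past where the run ends
    return ends - starts


# Synthetic example: two excursions outside the limits, lasting 2 and 3 samples.
demo_inside = np.array([True, False, False, True, True, False, False, False, True])
durations = outside_episodes(demo_inside)
print(f"Episodes: {len(durations)}, durations (samples): {durations.tolist()}")
```

Applied per patient, `len(durations)` gives the number of excursions, and `np.median(durations)` or `durations.mean()` give the typical episode length.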

In [None]:
# calculate fraction of time outside limits per patient (scaled to percent when plotting)
percent_time = 1 - np.array(true_false).squeeze()
print(percent_time.shape)

In [None]:
# to actually get percentage of time spent outside autoregulation, we need to get 

In [None]:
# plot distribution
with plt.rc_context({'xtick.bottom': True, 'ytick.left': True}):
    fig, ax = plt.subplots(figsize=(12,6))
    fig.tight_layout(pad=2)
    sns.histplot(percent_time * 100, ax=ax, stat="probability", edgecolor=(0, 0, 0, 0.5), alpha=0.5)
    # sns.kdeplot(
    #     percent_time / percent_time.sum(),
    #     ax=ax,
    #     bw_adjust=1,
    #     linewidth=2,
    #     cut=0,
    # )
    ax.set_title("Percentage of time spent outside autoregulatory limits (192 files, 189 patients)")
    ax.set_xlabel('Percentage of Time Outside Limits')
    ax.set_ylabel("Probability")  # matches stat="probability" in histplot
    ax.set_ylim(0, 0.25)
    ax.set_xticks(np.arange(0, 101, 10), minor=True)  # x-axis is in percent (0-100)

    mean_value = pd.Series(percent_time).mean()
    ax.axvline(x=mean_value * 100, color='red', linestyle='--', label=f'Mean: {mean_value * 100:.1f}%')

    med_value = pd.Series(percent_time).median()
    ax.axvline(x=med_value * 100, color='green', linestyle='--', label=f'Median: {med_value * 100:.1f}%')
    ax.legend()  # single call, after both reference lines are drawn

    img_name = "in_out_percent_distribution.png"
    plt.savefig(os.path.join(img_dir, img_name), bbox_inches='tight')
    plt.show()
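
Another quantity listed under Define Limits is the fraction of patients outside limits at each elapsed time *t*. One possible approach, sketched here on synthetic arrays, is to align recordings on their sample index and average across the recordings that are still ongoing. The three toy recordings and all variable names are assumptions for illustration, with `True` again taken to mean inside limits.

```python
import numpy as np
import pandas as pd

# Synthetic boolean arrays (True = inside limits) for three hypothetical
# recordings of different lengths.
recordings = [
    np.array([True, True, False, False]),
    np.array([True, False, False]),
    np.array([True, True, True, True, False]),
]

# Align recordings on the sample index; shorter recordings are padded with NaN.
outside = pd.concat([pd.Series(~r, dtype=float) for r in recordings], axis=1)

# At each elapsed sample t, average over the recordings that still have data.
fraction_outside = outside.mean(axis=1, skipna=True)
n_contributing = outside.notna().sum(axis=1)

print(pd.DataFrame({"fraction_outside": fraction_outside, "n": n_contributing}))
```

Tracking `n_contributing` alongside the fraction matters because late time points are supported by only a few long recordings, so the estimate there is noisier.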