# Saliva Example

<div class="alert alert-block alert-info">
This example notebook illustrates how to import saliva data (cortisol, amylase, etc.), how to compute commonly used parameters, and how to export the data for further analysis.
</div>

In [None]:
from pathlib import Path

import re

import pandas as pd
import numpy as np

import biopsykit as bp

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib widget
%load_ext autoreload
%autoreload 2

In [None]:
plt.close('all')
sns.set_theme(style='ticks')

palette = bp.colors.fau_palette

sns.set_palette(palette)
palette

In [None]:
path = Path("../example_data")

## Set Saliva Time Points

In [None]:
saliva_times = [-30, -1, 30, 40, 50, 60, 70]

## Load Condition List

In [None]:
condition_list = bp.io.load_subject_condition_list(path.joinpath("condition_list.csv"), return_dict=False)
condition_list.head()

## Load Data

### Option 0: Load BioPsyKit example data

In [None]:
df_cort = bp.example_data.get_saliva_example()

#### Example: Exclude Subjects 'Vp01' and 'Vp02' from Condition List and Cortisol DataFrame

In [None]:
dict_result = bp.utils.data.exclude_subjects(['Vp01', 'Vp02'], condition_list=condition_list, cortisol=df_cort)

dict_result
# reassign cleaned data
# df_cort = dict_result['cortisol']
# condition_list = dict_result['condition_list']

### Option 1: Use BioPsyKit to load saliva data in 'plate' format

Load the data into a pandas DataFrame (without a condition list)

In [None]:
cort_path = path.joinpath("cortisol_sample_plate.xlsx")
df_cort = bp.io.saliva.load_saliva_plate(file_path=cort_path, biomarker_type="cortisol")

df_cort.head()

Load the data and pass a condition list directly to the function

In [None]:
cort_path = path.joinpath("cortisol_sample_plate.xlsx")
df_cort = bp.io.saliva.load_saliva_plate(file_path=cort_path, biomarker_type="cortisol", condition_list=condition_list)

df_cort.head()

Specify a custom regular expression string to extract the subject ID and the saliva ID (see the documentation of `bp.io.saliva.load_saliva_plate()` for further information).

For example, this `regex_str` will extract the subject IDs **without** the `Vp` prefix and the sample IDs **without** the `S` prefix:

In [None]:
cort_path = path.joinpath("cortisol_sample_plate.xlsx")
# raw string, so that "\d" is not interpreted as a (invalid) string escape sequence
regex_str = r"Vp(\d+) S(\d)"
df_cort = bp.io.saliva.load_saliva_plate(file_path=cort_path, biomarker_type="cortisol", regex_str=regex_str)

df_cort.head()
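
To see what such a pattern captures, the regular expression can be tried on a few labels with Python's built-in `re` module. This is only a minimal sketch: the labels below are hypothetical and assume that the entries in the plate file look like `Vp01 S0`. It does not show how `load_saliva_plate()` applies the pattern internally.

```python
import re

# Same pattern as in the cell above: group 1 is the subject number, group 2 the sample number.
regex_str = r"Vp(\d+) S(\d)"

# Hypothetical sample labels (assumption: the plate file uses this naming scheme).
labels = ["Vp01 S0", "Vp01 S1", "Vp12 S6"]

# Collect the (subject ID, sample ID) pair captured for each label.
extracted = [re.search(regex_str, label).groups() for label in labels]

for label, (subject_id, sample_id) in zip(labels, extracted):
    print(label, "->", subject_id, sample_id)
```

Note that `(\d)` matches exactly one digit, so sample IDs with two digits (e.g. `S10`) would need `(\d+)` instead.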

### Option 2: Use BioPsyKit to load saliva data that's already in the 'correct' format

In [None]:
cort_path = path.joinpath("cortisol_sample.csv")
df_cort = bp.io.saliva.load_saliva_wide_format(file_path=cort_path, biomarker_type="cortisol", condition_col='condition')
df_cort.head()

## Save Data

Save the DataFrame as a CSV file (in standardized format)

In [None]:
# bp.io.saliva.save_saliva(path.joinpath("cortisol_example.csv"), df_cort)

## Compute Parameters

In [None]:
df_cort.head()

### Mean and Standard Error over all Samples

In [None]:
df_cort_mean_se = bp.saliva.saliva_mean_se(df_cort)
df_cort_mean_se
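
As a rough illustration of the two quantities, the following sketch computes the mean and the standard error of the mean for a single saliva sample by hand. The values are hypothetical, and the sketch assumes the usual definition of the standard error (sample standard deviation divided by the square root of the number of subjects). It is not the implementation of `saliva_mean_se()`.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical cortisol values (nmol/l) of four subjects at one saliva sample.
values = [4.0, 6.0, 5.0, 9.0]

sample_mean = mean(values)

# Standard error of the mean: sample standard deviation (n - 1 in the denominator)
# divided by the square root of the number of subjects.
sample_se = stdev(values) / sqrt(len(values))

print(f"mean = {sample_mean:.2f}, se = {sample_se:.2f}")
```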

### Other parameters

Compute a set of "Standard Features", including:
* `argmax`: location of maximum
* `mean`: mean value
* `std`: standard deviation
* `skew`: skewness
* `kurt`: kurtosis

In [None]:
bp.saliva.standard_features(df_cort).head()

Area under the Curve (AUC), in different variations (according to Pruessner et al. 2003):
* `auc_g`: Total Area under the Curve
* `auc_i`: Area under the Curve with respect to increase
* `auc_i_post`: (if `compute_auc_post=True`) Area under the Curve with respect to increase *after* the stressor: This assumes that we have an acute stress scenario and only *one* saliva sample before the stress test (except a possible `S0` for baseline)

In [None]:
bp.saliva.auc(df_cort, remove_s0=True, saliva_times=saliva_times).head()
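
The two basic AUC variants can be written down with the trapezoidal formulas from Pruessner et al. (2003). The sketch below is a plain-Python illustration for a single subject, not the code used by `bp.saliva.auc()`. The function names `auc_ground` and `auc_increase` and the cortisol values are hypothetical; the time points are the ones defined at the top of this notebook with the first sample removed.

```python
def auc_ground(values, times):
    """Area between the curve and zero (AUC_G), computed with the trapezoidal rule."""
    return sum(
        (values[i] + values[i + 1]) / 2 * (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    )


def auc_increase(values, times):
    """Area relative to the first sample (AUC_I): AUC_G minus the baseline rectangle."""
    return auc_ground(values, times) - values[0] * (times[-1] - times[0])


# Time points from this notebook without the first sample (S0).
times = [-1, 30, 40, 50, 60, 70]
# Hypothetical cortisol values (nmol/l) of one subject.
values = [5.0, 9.0, 8.0, 7.0, 6.0, 5.0]

auc_g = auc_ground(values, times)
auc_i = auc_increase(values, times)
print(auc_g, auc_i)
```

`auc_i` can become negative if most samples lie below the first one, which is why it is read as a measure of change rather than of total output.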

Absolute maximum increase (or the relative increase in percent if `percent=True`) between the *first* sample in the data and *all others*:

In [None]:
bp.saliva.max_increase(df_cort, remove_s0=True).head()
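
For a single subject, this boils down to subtracting the first sample from the largest of the remaining samples. The following is a minimal sketch under that reading; the function name `max_increase_sketch` and the values are hypothetical, and the percent variant assumes that the increase is expressed relative to the first sample.

```python
def max_increase_sketch(values, percent=False):
    """Maximum increase of all later samples relative to the first sample."""
    baseline = values[0]
    increase = max(values[1:]) - baseline
    if percent:
        # Assumption: relative increase in percent of the first sample.
        return 100.0 * increase / abs(baseline)
    return increase


# Hypothetical cortisol values (nmol/l) of one subject, first sample (S0) already removed.
values = [5.0, 9.0, 8.0, 7.0, 6.0, 5.0]

absolute_increase = max_increase_sketch(values)
relative_increase = max_increase_sketch(values, percent=True)
print(absolute_increase, relative_increase)
```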

Slope between two saliva samples (specified by `sample_idx`):

In [None]:
bp.saliva.slope(df_cort, sample_idx=(1, 4), saliva_times=saliva_times).head()
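
The slope is the difference in concentration divided by the difference in time between the two selected samples. The sketch below shows this for one subject. The function name `slope_between` and the values are hypothetical, and the indices are applied directly to the full list of samples, which may differ from how `bp.saliva.slope()` handles a removed `S0` sample.

```python
def slope_between(values, times, sample_idx):
    """Slope (concentration change per time unit) between two samples."""
    first, second = sample_idx
    return (values[second] - values[first]) / (times[second] - times[first])


# Time points from this notebook (in minutes relative to the start of the test).
saliva_times = [-30, -1, 30, 40, 50, 60, 70]
# Hypothetical cortisol values (nmol/l) of one subject, one per time point.
values = [4.0, 5.0, 9.0, 8.0, 7.0, 6.0, 5.0]

# Samples 1 and 4 were taken at -1 min and 50 min, i.e. 51 minutes apart.
slope_1_4 = slope_between(values, saliva_times, sample_idx=(1, 4))
print(slope_1_4)
```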

## Plot Data

### Using Seaborn (some very simple Examples)

In [None]:
fig, ax = plt.subplots(figsize=(7,5))
sns.lineplot(data=df_cort.reset_index(), x='sample', y='cortisol', hue='condition', hue_order=['Control', 'Intervention'], ci=None, ax=ax);
ax.set_xticks(df_cort.index.get_level_values('sample').unique())
ax.legend().remove()
ax.set_ylabel("Cortisol [nmol/l]")
ax.set_xlabel("Measurement Time Points")
fig.tight_layout()

In [None]:
sns.relplot(data=df_cort.reset_index(), x='sample', y='cortisol', kind='line', hue='condition', hue_order=['Control', 'Intervention'])

In [None]:
fig, ax = plt.subplots()
sns.boxplot(data=df_cort.reset_index(), x='sample', y='cortisol', hue='condition', hue_order=['Control', 'Intervention'], ax=ax)

In [None]:
fig, ax = plt.subplots()
sns.boxplot(data=bp.saliva.max_increase(df_cort).reset_index(), x='condition', y='cortisol_max_inc', order=['Control', 'Intervention'], ax=ax)

In [None]:
fig, ax = plt.subplots()

display(bp.saliva.standard_features(df_cort).groupby('condition').mean())

# convert the feature columns ("cortisol_<feature>") from wide into long format for plotting
data_long = pd.wide_to_long(bp.saliva.standard_features(df_cort).reset_index(), stubnames="cortisol", sep='_', i=['subject', 'condition'], j='feature', suffix=r"\w+")
sns.boxplot(data=data_long.reset_index(), x='feature', y='cortisol', hue='condition', hue_order=['Control', 'Intervention'], ax=ax);

### Using functions from `BioPsyKit`

In [None]:
df_cort_mean_se.T

In [None]:
bp.protocols.plotting.saliva_plot(df_cort_mean_se, biomarker="cortisol", saliva_times=saliva_times[1:], test_times=[0, 30], figsize=(10, 5), test_text="TEST")