# Questionnaire Example

<div class="alert alert-block alert-info">
This example notebook illustrates how to process questionnaire data.
</div>

## Setup and Helper Functions

In [None]:
from pathlib import Path

import re

import pandas as pd
import numpy as np

import biopsykit as bp
import pingouin as pg

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib widget
%load_ext autoreload
%autoreload 2

In [None]:
plt.close("all")

palette = bp.colors.fau_palette
sns.set_theme(context="notebook", style="ticks", palette=palette)

plt.rcParams['figure.figsize'] = (8,4)
plt.rcParams['pdf.fonttype'] = 42
plt.rcParams['mathtext.default'] = "regular"

palette

## Load Questionnaire Data

In [None]:
data = pd.read_csv("../example_data/questionnaire_sample.csv", index_col='subject')
data.head()

## Example 1: Compute PSS

**Slice Dataframe and Select Columns**  
(here: all items belonging to *PSS*)

In [None]:
data_pss, columns_pss = bp.questionnaires.utils.find_cols(data, starts_with="PSS")
data_pss.head()
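To make the column selection concrete, here is a minimal pandas sketch of prefix-based selection that returns both the sliced dataframe and the list of column names. It is not BioPsyKit's implementation. The helper `select_items` and the column names (`PSS_01`, `PSS_02`, ...) are hypothetical.

```python
from typing import List, Tuple

import pandas as pd


def select_items(data: pd.DataFrame, starts_with: str) -> Tuple[pd.DataFrame, List[str]]:
    """Hypothetical helper: return the columns whose names start with a prefix, plus their names."""
    # Boolean mask over the column labels
    mask = data.columns.str.startswith(starts_with)
    cols = list(data.columns[mask])
    return data[cols], cols


# Made-up wide-format data: one row per subject, one column per item
data = pd.DataFrame(
    {"PSS_01": [2, 3], "PSS_02": [1, 5], "PANAS_01_Pre": [4, 2], "Age": [25, 31]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)
data_pss, columns_pss = select_items(data, "PSS")
print(columns_pss)  # ['PSS_01', 'PSS_02']
```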

In [None]:
# Option 1: pass the sliced dataframe, containing only columns of the questionnaire
pss = bp.questionnaires.pss(data_pss)

# Option 2: pass the whole dataframe + a list of columns containing the questionnaire column names 
# (better suited for loops, more on that later)
pss = bp.questionnaires.pss(data, columns=columns_pss)

Notice that the `pss` function in `BioPsyKit` raises an error. This is because the *PSS* items in this dataset are coded from `1` to `5`, whereas the *PSS* score is computed from items coded from `0` to `4`. Hence, we first need to convert the items to the correct scale by subtracting `1` from all values (i.e., applying an offset of `-1`) using the function `biopsykit.questionnaires.utils.convert_scale()`:

**Convert Scale**

In [None]:
# For Option 1: convert the sliced PSS dataframe
data_pss_conv = bp.questionnaires.utils.convert_scale(data_pss, offset=-1)
data_pss_conv.head()

In [None]:
# For Option 2: convert only the PSS columns and leave the other columns unchanged
data_conv = bp.questionnaires.utils.convert_scale(data, cols=columns_pss, offset=-1)
data_conv.head()
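Conceptually, converting the scale with an offset amounts to adding that offset to the selected item columns while leaving all other columns untouched. The following pandas sketch illustrates this on made-up values. It is not BioPsyKit's actual implementation, and the column names are hypothetical.

```python
import pandas as pd

# Made-up PSS responses coded 1 to 5 (hypothetical column names)
data = pd.DataFrame(
    {"PSS_01": [1, 3, 5], "PSS_02": [2, 5, 1], "Age": [25, 31, 40]},
    index=pd.Index(["Vp01", "Vp02", "Vp03"], name="subject"),
)
columns_pss = ["PSS_01", "PSS_02"]

# Shift only the PSS columns by an offset of -1 (1..5 becomes 0..4); "Age" is not modified
data_conv = data.copy()
data_conv[columns_pss] = data_conv[columns_pss] + (-1)

print(data_conv[columns_pss].min().min(), data_conv[columns_pss].max().max())  # 0 4
```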

Now the scores are in the correct range and we can compute the *PSS* score:

In [None]:
# Option 1: the sliced PSS dataframe
pss = bp.questionnaires.pss(data_pss_conv)
pss.head()

In [None]:
# Option 2: the whole dataframe + PSS columns
pss = bp.questionnaires.pss(data_conv, columns=columns_pss)
pss.head()
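The `0` to `4` coding matters because reverse-coded items are typically scored as `4 - x`, which only stays in range if the items start at `0`. Assume the standard 10-item PSS, in which items 4, 5, 7, and 8 are positively worded and reverse-coded. Under that assumption, a total score (range 0 to 40) could be computed as in the sketch below. The helper `pss_total_sketch` is hypothetical, and BioPsyKit's own `pss` function may differ, e.g., by also returning subscale scores.

```python
import pandas as pd

# Assumption: 10-item PSS coded 0..4; items 4, 5, 7 and 8 are reverse-coded
REVERSED_ITEMS = [4, 5, 7, 8]


def pss_total_sketch(items: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: PSS-10 total score from ten item columns given in item order."""
    if items.shape[1] != 10:
        raise ValueError("expected exactly 10 PSS items")
    # Mirror the range check: items coded 1..5 are rejected
    if items.min().min() < 0 or items.max().max() > 4:
        raise ValueError("items must be coded 0 to 4")
    scored = items.copy()
    # Reverse-code the positively worded items: 0 <-> 4, 1 <-> 3, 2 stays 2
    rev_cols = [items.columns[i - 1] for i in REVERSED_ITEMS]
    scored[rev_cols] = 4 - scored[rev_cols]
    return scored.sum(axis=1).rename("PSS_Total")


cols = [f"PSS_{i:02d}" for i in range(1, 11)]
items = pd.DataFrame(
    [[0] * 10, [4] * 10], columns=cols, index=pd.Index(["Vp01", "Vp02"], name="subject")
)
print(pss_total_sketch(items).tolist())  # [16, 24]
```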

## Example 2: Compute PANAS

In our study, the PANAS was assessed *pre* and *post* stress.

In [None]:
data_panas_pre, columns_panas_pre = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Pre")
data_panas_post, columns_panas_post = bp.questionnaires.utils.find_cols(data, starts_with="PANAS", ends_with="Post")
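Selecting by both a prefix and a suffix can also be expressed as a single regular expression with pandas' `DataFrame.filter`. The sketch below uses hypothetical PANAS item names for both assessment times. It only illustrates the selection logic and is not how `find_cols` is implemented.

```python
import pandas as pd

# Made-up wide-format data with hypothetical PANAS item names for both assessment times
data = pd.DataFrame(
    {
        "PANAS_01_Pre": [3, 2],
        "PANAS_02_Pre": [1, 4],
        "PANAS_01_Post": [2, 2],
        "PANAS_02_Post": [1, 3],
        "PSS_01": [0, 4],
    },
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)

# "^PANAS" anchors the prefix and "Pre$" / "Post$" anchor the suffix
data_panas_pre = data.filter(regex=r"^PANAS.*Pre$")
data_panas_post = data.filter(regex=r"^PANAS.*Post$")
print(list(data_panas_pre.columns))  # ['PANAS_01_Pre', 'PANAS_02_Pre']
```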

In [None]:
panas_pre = bp.questionnaires.panas(data_panas_pre)
panas_pre.head()

In [None]:
panas_post = bp.questionnaires.panas(data_panas_post)
panas_post.head()

## Compute Multiple Scores at Once

Build a dictionary where each key is the name of the questionnaire score to be computed and each value is the list of columns belonging to that questionnaire. If a questionnaire was assessed repeatedly (e.g., PANAS was assessed *pre* and *post*), separate the questionnaire name from a suffix with a `-` (e.g., `panas-pre` and `panas-post`).

In [None]:
from biopsykit.questionnaires.utils import find_cols
dict_scores = {
    'pss': find_cols(data, starts_with='PSS')[1],
    'pasa': find_cols(data, starts_with='PASA')[1],
    'panas-pre': find_cols(data, starts_with='PANAS', ends_with='Pre')[1],
    'panas-post': find_cols(data, starts_with='PANAS', ends_with='Post')[1],
}

In [None]:
# Convert scale
data_conv = bp.questionnaires.utils.convert_scale(data, cols=dict_scores['pss'], offset=-1)

In [None]:
# Compute all scores and store in result dataframe
data_scores = bp.questionnaires.utils.compute_scores(data_conv, dict_scores)
data_scores.head()
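The `-` convention lets a loop recover which questionnaire to compute from each key. Everything before the `-` names the score, and everything after it is a suffix for the result column. The following is only a sketch of that idea, not BioPsyKit's `compute_scores`. The scoring function `sum_score`, the lookup table `SCORE_FUNCS`, and the output column naming are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

import pandas as pd


def sum_score(data: pd.DataFrame, columns: List[str]) -> pd.Series:
    # Hypothetical stand-in for a real questionnaire function: sum of the item columns
    return data[columns].sum(axis=1)


SCORE_FUNCS: Dict[str, Callable[[pd.DataFrame, List[str]], pd.Series]] = {
    "pss": sum_score,
    "panas": sum_score,
}


def compute_scores_sketch(data: pd.DataFrame, dict_scores: Dict[str, List[str]]) -> pd.DataFrame:
    results = {}
    for key, columns in dict_scores.items():
        # "panas-pre" -> name "panas", suffix "pre"; "pss" -> name "pss", empty suffix
        name, _, suffix = key.partition("-")
        col_name = f"{name.upper()}_{suffix}" if suffix else name.upper()
        results[col_name] = SCORE_FUNCS[name](data, columns)
    return pd.DataFrame(results)


data = pd.DataFrame(
    {"PSS_01": [0, 4], "PANAS_01_Pre": [3, 2], "PANAS_01_Post": [1, 5]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)
dict_scores = {
    "pss": ["PSS_01"],
    "panas-pre": ["PANAS_01_Pre"],
    "panas-post": ["PANAS_01_Post"],
}
print(compute_scores_sketch(data, dict_scores).columns.tolist())  # ['PSS', 'PANAS_pre', 'PANAS_post']
```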

## Convert Scores into Long Format

In [None]:
data_scores.head()

For questionnaires that only have different *subscales*, create one new index level, `subscale`:

In [None]:
print(list(data_scores.filter(like='PASA').columns))

In [None]:
pasa = bp.questionnaires.utils.wide_to_long(data_scores, quest_name='PASA', levels=['subscale'])
pasa.head()
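For intuition, the one-level conversion can be reproduced with plain pandas. First strip the questionnaire prefix from the column names, then `melt` the subscale columns into rows. The sketch assumes columns named `<QUESTIONNAIRE>_<subscale>`, and the subscale names and values are made up. BioPsyKit's `wide_to_long` may handle naming differently.

```python
import pandas as pd

# Made-up wide-format scores; "<QUESTIONNAIRE>_<subscale>" naming is an assumption
data_scores = pd.DataFrame(
    {"PASA_Threat": [2.5, 3.0], "PASA_Challenge": [4.0, 3.5], "PSS_Total": [16, 24]},
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)

# Keep only the PASA columns and strip the prefix, leaving the subscale name
wide = data_scores.filter(like="PASA")
wide.columns = wide.columns.str.replace("PASA_", "", regex=False)

# melt() turns each subscale column into rows; (subject, subscale) then becomes the index
pasa = (
    wide.reset_index()
    .melt(id_vars="subject", var_name="subscale", value_name="PASA")
    .set_index(["subject", "subscale"])
    .sort_index()
)
print(pasa)
```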

For questionnaires that have different *subscales* and different *assessment times*, create two new index levels, `subscale` and `time`:

In [None]:
print(list(data_scores.filter(like='PANAS').columns))

`bp.questionnaires.utils.wide_to_long()` converts the data into long format recursively, from the *first* level (here: `subscale`) to the *last* level (here: `time`):

In [None]:
panas = bp.questionnaires.utils.wide_to_long(data_scores, quest_name='PANAS', levels=['subscale', 'time'])
panas.head()
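With two levels, the same idea needs one extra step. After melting, the original column name is split into its parts so that subscale and time become separate index levels. The sketch assumes columns named `<QUESTIONNAIRE>_<subscale>_<time>` and uses made-up values. It is an illustration, not BioPsyKit's recursive implementation.

```python
import pandas as pd

# Made-up scores; "<QUESTIONNAIRE>_<subscale>_<time>" naming is an assumption
data_scores = pd.DataFrame(
    {
        "PANAS_NegativeAffect_pre": [1.8, 2.1],
        "PANAS_NegativeAffect_post": [2.4, 2.0],
        "PANAS_PositiveAffect_pre": [3.2, 2.9],
        "PANAS_PositiveAffect_post": [2.7, 3.1],
    },
    index=pd.Index(["Vp01", "Vp02"], name="subject"),
)

# One row per subject and original column; the old column name lands in "variable"
long = data_scores.filter(like="PANAS").reset_index().melt(id_vars="subject", value_name="PANAS")

# Split "PANAS_<subscale>_<time>" and keep subscale and time as separate columns
parts = long["variable"].str.split("_", expand=True)
long["subscale"] = parts[1]
long["time"] = parts[2]

panas = long.drop(columns="variable").set_index(["subject", "subscale", "time"]).sort_index()
print(panas.head())
```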

## Plotting

### In One Plot

In [None]:
panas

In [None]:
fig, ax = plt.subplots()
bp.plotting.feature_boxplot(data=panas, x="subscale", y="PANAS", hue="time", hue_order=["pre", "post"], ax=ax)
fig.tight_layout()

### In Subplots

#### Regular

In [None]:
fig, axs = plt.subplots(ncols=3)
bp.plotting.multi_feature_boxplot(
    data=panas, 
    x="time", 
    y="PANAS", 
    features=["NegativeAffect", "PositiveAffect", "Total"], 
    group="subscale", 
    order=["pre", "post"], 
    ax=axs
)
fig.tight_layout()

#### With Significance Brackets

**Note**: See `StatsPipeline_Plotting_Example.ipynb` for further information!

In [None]:
pipeline = bp.stats.StatsPipeline(
    steps=[
        ("prep", "normality"),
        ("prep", "equal_var"),
        ("test", "pairwise_ttests")
    ],
    params={
        "dv": "PANAS",
        "groupby": "subscale",
        "subject": "subject",
        "within": "time"
    }
)

pipeline.apply(panas);
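For intuition about the `pairwise_ttests` step with `within="time"`: for each subscale, the pre and post values of the same subject are compared with a paired t-test, whose statistic is `t = mean(d) / (sd(d) / sqrt(n))` for the per-subject differences `d`. The pandas sketch below computes only that statistic on made-up data. It omits the p-value as well as the `normality` and `equal_var` checks that the pipeline also runs, and the helper `paired_t` is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up long-format data with a (subject, subscale, time) index and a "PANAS" column
index = pd.MultiIndex.from_product(
    [["Vp01", "Vp02", "Vp03"], ["NegativeAffect"], ["pre", "post"]],
    names=["subject", "subscale", "time"],
)
panas = pd.DataFrame({"PANAS": [1.0, 2.0, 2.0, 4.0, 3.0, 3.0]}, index=index)


def paired_t(group: pd.DataFrame) -> float:
    # One row per subject, one column per time point
    wide = group["PANAS"].unstack("time")
    diff = wide["post"] - wide["pre"]
    n = len(diff)
    # Mean difference divided by its standard error (sample SD with ddof=1)
    return diff.mean() / (diff.std(ddof=1) / np.sqrt(n))


t_values = panas.groupby(level="subscale").apply(paired_t)
print(t_values)
```

With these values the differences are 1, 2, and 0, so `t = sqrt(3)`, about 1.73. `scipy.stats.ttest_rel(post, pre)` would return the same statistic together with a p-value.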

In [None]:
fig, axs = plt.subplots(ncols=3)

features = ["NegativeAffect", "PositiveAffect", "Total"]

box_pairs, pvalues = pipeline.sig_brackets(
    "test", 
    stats_type="within", 
    plot_type="single", 
    x="time", 
    features=features, 
    subplots=True
)

bp.plotting.multi_feature_boxplot(
    data=panas, 
    x="time", 
    y="PANAS", 
    features=features, 
    group="subscale", 
    order=["pre", "post"], 
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues},
    ax=axs
)
for ax, feature in zip(axs, features):
    ax.set_title(feature)

fig.tight_layout()