# CARWatch – Questionnaire Data Cleaning and Processing

This notebook processes the questionnaire data and extracts the relevant columns. The following information is used from the questionnaire data:
* Chronotype: assessed by the *Morningness-Eveningness Questionnaire (MEQ)*
* Sleep Information: Self-reported Bed Time, Sleep Onset, Wake Onset, Get-Up Time

For chronotype assessment, we use the Morningness-Eveningness Questionnaire (MEQ) by Horne and Östberg (1976).

```
Horne, J. A., & Östberg, O. (1976). A self-assessment questionnaire to determine morningness-eveningness in human circadian rhythms. International Journal of Chronobiology, 4(2), 97-110.
```

In [None]:
from pathlib import Path
import json

import pandas as pd
import numpy as np
import pingouin as pg

import biopsykit as bp
from biopsykit.questionnaires import pss
from biopsykit.questionnaires.utils import invert, find_cols, wide_to_long, convert_scale
from biopsykit.utils.dataframe_handling import int_from_str_idx, camel_to_snake

from fau_colors import cmaps

import matplotlib.pyplot as plt
import seaborn as sns

from carwatch_analysis.datasets import CarWatchDatasetRaw

%matplotlib widget
%load_ext autoreload
%autoreload 2

In [None]:
plt.close("all")

palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

pg.options["round"] = 4

palette

## Load Questionnaire Data

In [None]:
deploy_type = "local"

In [None]:
# build path to data folder
config_dict = json.loads(Path("../../../config.json").read_text(encoding="utf-8"))
base_path = Path("..").joinpath(config_dict[deploy_type]["base_path"])
base_path

In [None]:
export_path = base_path.joinpath("questionnaire/processed")
bp.utils.file_handling.mkdirs(export_path)

In [None]:
dataset = CarWatchDatasetRaw(base_path)

In [None]:
quest_data = dataset.questionnaire
quest_data

## Condition

In [None]:
cond_data = bp.questionnaires.utils.wide_to_long(quest_data.filter(like="condition"), quest_name="condition", levels="night")
# extract the night number from labels matching "N<digit>" and make it 0-based
cond_data = bp.utils.dataframe_handling.int_from_str_idx(cond_data, idx_levels="night", regex=r"N(\d)", func=lambda x: x - 1)
cond_data = cond_data.reset_index().set_index(["subject", "night", "condition"])

cond_data = bp.utils.dataframe_handling.apply_codebook(cond_data, dataset.codebook)
cond_data.head()
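
`int_from_str_idx` is biopsykit's own helper. As a rough, pandas-only sketch of the same conversion, assuming night labels of the form `N1`, `N2`, ... (as the regex `N(\d)` suggests), the hypothetical helper `night_str_to_int` below extracts the digit and shifts it so that the first night becomes 0. It is an illustration with made-up data, not biopsykit's implementation.

```python
import pandas as pd


def night_str_to_int(data: pd.DataFrame, level: str = "night") -> pd.DataFrame:
    """Hypothetical helper: convert night labels like "N1" into 0-based integers."""
    index_names = list(data.index.names)
    data = data.reset_index()
    # extract the digit following "N" and subtract 1 so that the first night is 0
    data[level] = data[level].str.extract(r"N(\d)", expand=False).astype(int) - 1
    return data.set_index(index_names)


# toy data with made-up subject IDs and condition codes
idx = pd.MultiIndex.from_tuples(
    [("VP_01", "N1"), ("VP_01", "N2"), ("VP_02", "N1")], names=["subject", "night"]
)
toy = pd.DataFrame({"condition": [1, 2, 1]}, index=idx)
print(night_str_to_int(toy))
```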

## Chronotype

### Convert MEQ Questionnaire Items into the Required Format

In [None]:
# select all MEQ item columns (names starting with "MEQ" and ending with a digit); copy to leave quest_data untouched
df_meq = find_cols(quest_data, starts_with="MEQ", ends_with="[0-9]")[0].copy()

# Recode MEQ_01, MEQ_02, and MEQ_10: map the 8 answer options onto the 1-5 MEQ scores
recode_map = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4, 8: 5}
for col in ["MEQ_01", "MEQ_02", "MEQ_10"]:
    df_meq[col] = df_meq[col].replace(recode_map)

# Invert items that were presented in inverted order in the questionnaire (to comply with the biopsykit implementation).
# The result is assigned back explicitly: df_meq.loc[:, cols] returns a copy, so inverting it in place would leave df_meq unchanged.
invert_cols = ["MEQ_03", "MEQ_08", "MEQ_09", "MEQ_19"]
df_meq.loc[:, invert_cols] = invert(df_meq.loc[:, invert_cols], score_range=[1, 4])

invert_cols = ["MEQ_17", "MEQ_18"]
df_meq.loc[:, invert_cols] = invert(df_meq.loc[:, invert_cols], score_range=[1, 5])

meq = bp.questionnaires.meq(df_meq)
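
Reverse-scoring an item on a score range [min, max] maps a response x to min + max - x, so on a 1-4 scale 1 becomes 4 and 4 becomes 1. The minimal sketch below uses a hypothetical helper `reverse_score` (not biopsykit's `invert`) that returns a modified copy; returning a new DataFrame and assigning it back makes it explicit which object is changed.

```python
import pandas as pd


def reverse_score(data: pd.DataFrame, cols: list, score_range: tuple) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `data` with `cols` reverse-scored."""
    low, high = score_range
    out = data.copy()
    # x -> low + high - x, e.g. on a 1-4 scale: 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1
    out[cols] = low + high - out[cols]
    return out


toy = pd.DataFrame({"MEQ_03": [1, 2, 3, 4], "MEQ_04": [1, 2, 3, 4]})
toy = reverse_score(toy, ["MEQ_03"], score_range=(1, 4))
print(toy)
```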

### Compute MEQ and Classify Chronotypes

Based on the MEQ score, chronotypes can be classified in two ways:
* Fine Classification (5 levels, column `Chronotype_Fine`):
    - 0: definite evening type (MEQ score 14-30)
    - 1: moderate evening type (MEQ score 31-41)
    - 2: intermediate type (MEQ score 42-58)
    - 3: moderate morning type (MEQ score 59-69)
    - 4: definite morning type (MEQ score 70-86)
* Coarse Classification (3 levels, column `Chronotype_Coarse`):
    - 0: evening type (MEQ score 14-41)
    - 1: intermediate type (MEQ score 42-58)
    - 2: morning type (MEQ score 59-86)
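
The chronotype columns are produced by `bp.questionnaires.meq`. To make the cut-offs concrete, here is a minimal sketch (not biopsykit's implementation) that reproduces both classifications with `pandas.cut`, assuming integer MEQ scores. The bins are right-inclusive, so a score of 30 falls into the definite evening group and 31 into the moderate evening group; `classify_chronotype` is a hypothetical name.

```python
import pandas as pd

# Bin edges follow the score ranges listed above; intervals are right-inclusive,
# so (13, 30] covers integer scores 14-30, (30, 41] covers 31-41, and so on
FINE_BINS = [13, 30, 41, 58, 69, 86]
COARSE_BINS = [13, 41, 58, 86]


def classify_chronotype(meq_scores: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: map MEQ sum scores to fine (0-4) and coarse (0-2) codes."""
    return pd.DataFrame(
        {
            # labels=False returns the integer index of the bin each score falls into
            "Chronotype_Fine": pd.cut(meq_scores, bins=FINE_BINS, labels=False),
            "Chronotype_Coarse": pd.cut(meq_scores, bins=COARSE_BINS, labels=False),
        }
    )


scores = pd.Series([14, 30, 31, 41, 42, 58, 59, 69, 70, 86], name="MEQ")
print(classify_chronotype(scores))
```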

In [None]:
meq.head()

### Further Information

#### MEQ Histogram

In [None]:
fig, ax = plt.subplots()
meq["MEQ"].plot(kind="hist", ax=ax)
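# dashed lines: boundaries of the coarse chronotype classification (evening <= 41, intermediate 42-58, morning >= 59)
ax.axvline(41, color="grey", ls="--")
ax.axvline(58, color="grey", ls="--")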
ax.set_xlabel("MEQ Score")
ax.set_ylabel("Count")
fig.tight_layout()

#### Chronotype Prevalence

In [None]:
pd.DataFrame(meq["Chronotype_Coarse"].value_counts())

In [None]:
meq.describe().T

## Sleep Information

### Ideal Bedtime Ranges

In [None]:
# ideal bedtime range (start, end) for each recoded answer option of MEQ_02
bedtime_ranges = {
    1: ["01:45:00", "03:00:00"],
    2: ["00:30:00", "01:45:00"],
    3: ["22:15:00", "00:30:00"],
    4: ["21:00:00", "22:15:00"],
    5: ["20:00:00", "21:00:00"],
}

bedtime_ranges = pd.DataFrame(bedtime_ranges, index=["start", "end"]).T
# look up the ideal bedtime range (start and end) for each subject's recoded MEQ_02 answer
bedtime = pd.DataFrame({
    f"ideal_bed_{key}": df_meq["MEQ_02"].replace(bedtime_ranges[key])
    for key in ["start", "end"]
})
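
MEQ item 2 asks for the preferred bedtime, and the cell above looks up the time range belonging to each recoded answer. The toy example below (made-up subject IDs and answers) shows this lookup with `Series.map`, which finds each answer in the index of the range Series and yields NaN for unmatched answers. It also shows that range 3 (22:15 to 00:30) crosses midnight: zero-padded `HH:MM:SS` strings sort in clock order, so a start later than its end flags a range that wraps past midnight.

```python
import pandas as pd

# Ideal bedtime ranges per recoded MEQ_02 answer (same values as in the cell above)
ranges = pd.DataFrame(
    {
        1: ["01:45:00", "03:00:00"],
        2: ["00:30:00", "01:45:00"],
        3: ["22:15:00", "00:30:00"],
        4: ["21:00:00", "22:15:00"],
        5: ["20:00:00", "21:00:00"],
    },
    index=["start", "end"],
).T

# made-up recoded answers for three subjects
answers = pd.Series([3, 5, 1], index=pd.Index(["VP_01", "VP_02", "VP_03"], name="subject"))

# map looks up each answer in the index of ranges[key] (the answer codes 1-5)
ideal_bed = pd.DataFrame({f"ideal_bed_{key}": answers.map(ranges[key]) for key in ["start", "end"]})
print(ideal_bed)

# a range whose start is later on the clock than its end wraps past midnight
crosses_midnight = ranges["start"] > ranges["end"]
print(crosses_midnight)
```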

### Self-Report Sleep Data

In [None]:
sleep_cols = ["bed", "sleepOnset", "wakeOnset", "getup"]
times_selfreport = quest_data.filter(regex=f"({'|'.join(sleep_cols)})Selfreport_*")
times_selfreport.head()
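
`DataFrame.filter(regex=...)` keeps every column label for which `re.search` finds the pattern, so the match is not anchored. Since `_*` matches zero or more underscores, the pattern effectively selects any label containing, for example, `bedSelfreport`. The column names in this toy frame are hypothetical and only mimic the naming scheme the notebook suggests.

```python
import pandas as pd

sleep_cols = ["bed", "sleepOnset", "wakeOnset", "getup"]
pattern = f"({'|'.join(sleep_cols)})Selfreport_*"

# hypothetical column names: self-reported and sensor-derived times plus an MEQ item
toy = pd.DataFrame(
    columns=[
        "bedSelfreport_N1",
        "sleepOnsetSelfreport_N1",
        "sleepOnsetSensor_N1",
        "getupSelfreport_N2",
        "MEQ_01",
    ]
)
selected = toy.filter(regex=pattern).columns.tolist()
print(pattern)   # (bed|sleepOnset|wakeOnset|getup)Selfreport_*
print(selected)  # ['bedSelfreport_N1', 'sleepOnsetSelfreport_N1', 'getupSelfreport_N2']
```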

## Restructure Questionnaire Data for Export

In [None]:
# drop all unnecessary columns:
## bedtimes are exported separately
quest_copy = quest_data.drop(columns=times_selfreport.columns)
## cortisol values are exported separately
quest_copy = quest_copy.drop(columns=quest_copy.filter(like="cort").columns)
## not needed anymore (sleep endpoints are recomputed)
quest_copy = quest_copy.drop(columns=quest_copy.filter(regex="(sleepOnset|wakeOnset)Sensor_*").columns)
## condition is extracted separately
quest_copy = quest_copy.drop(columns=quest_copy.filter(like="condition").columns)
## Night 3 is not needed
quest_copy = quest_copy.drop(columns=quest_copy.filter(like="N3").columns)
## MEQ is exported separately
quest_copy = quest_copy.drop(columns=quest_copy.filter(like="MEQ").columns)

## manual weekend and chronotype labels not needed
quest_copy = quest_copy.drop(columns=["chronotypeManual", "hasWeekend"])

## extract PSS data and convert to long-format
pss_columns = wide_to_long(quest_copy, "PSS", levels=["night"])
## drop PSS columns from dataframe
quest_copy = quest_copy.drop(columns=quest_copy.filter(like="PSS").columns)
## PSS-L (Labor) is inconsistent => drop
pss_columns = pss_columns.drop("L", level="night")

## compute PSS scores (item responses are shifted by -1 before scoring)
pss_data = convert_scale(pss_columns, -1)
pss_data = pss(pss_data)


## per-night items (column suffix _N<digit>) are converted to long format separately
nightly_data = quest_copy.filter(regex=r"\w+_N\d")
quest_copy = quest_copy.drop(columns=nightly_data.columns)

nightly_data = pd.wide_to_long(
    nightly_data.reset_index(),
    stubnames=["wakeupSource", "SubjectiveSleepQuality"],
    i="subject",
    j="night",
    sep="_",
    suffix=r"N\w",
)

quest_copy = quest_copy.join(pss_data).join(nightly_data)

# extract night-id from index and let it start from 0
quest_copy = int_from_str_idx(quest_copy, "night", regex=r"N(\d)", func=lambda x: x - 1)

quest_copy = quest_copy.join(dataset.condition_map)
quest_copy = quest_copy.set_index("condition", append=True)
quest_copy.head()
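
The PSS scores above are computed by biopsykit's `pss()`. For orientation, standard scoring of the 10-item Perceived Stress Scale (PSS-10) with items coded 0-4 reverse-scores the positively worded items 4, 5, 7, and 8 (4 - x) and sums all ten items into a total between 0 and 40. The sketch below assumes the 10-item version, raw responses coded 1-5 (hence the shift by -1, mirroring `convert_scale(pss_columns, -1)`), and hypothetical column names `PSS_01` to `PSS_10`; it is not biopsykit's implementation and does not reproduce any subscales it may report.

```python
import pandas as pd

ITEMS = [f"PSS_{i:02d}" for i in range(1, 11)]  # hypothetical column names
REVERSED = ["PSS_04", "PSS_05", "PSS_07", "PSS_08"]  # positively worded PSS-10 items


def pss10_total(raw: pd.DataFrame) -> pd.Series:
    """Sketch: PSS-10 total score from raw responses assumed to be coded 1-5."""
    items = raw[ITEMS] - 1  # shift 1-5 coding to 0-4
    items[REVERSED] = 4 - items[REVERSED]  # reverse-score positively worded items
    return items.sum(axis=1).rename("PSS_total")


# two made-up respondents: all answers 1 (lowest) and all answers 5 (highest)
raw = pd.DataFrame([[1] * 10, [5] * 10], columns=ITEMS)
print(pss10_total(raw).tolist())  # [16, 24]
```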

## Export

### Merge Data and Convert to Long-Format

In [None]:
quest_sleep = pd.concat([bedtime, meq, times_selfreport], axis=1)
quest_sleep = pd.wide_to_long(
    df=quest_sleep.reset_index(),
    stubnames=[f"{s}Selfreport" for s in sleep_cols],
    i="subject",
    j="night",
    sep="_",
    suffix=r"\w+",
).sort_index()
quest_sleep = int_from_str_idx(quest_sleep, "night", r"N(\w)", lambda x: x - 1)
quest_sleep.head()
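
`pd.wide_to_long` melts every column whose name starts with one of the `stubnames` followed by `sep`; the remainder of the name (matched by `suffix`) becomes the new `j` index level, while all other columns (here, the MEQ and ideal-bedtime columns) are repeated for each night. The toy frame below uses made-up subjects and values and a single stub to show the resulting structure. Because suffixes such as `N1` are not numeric, the `night` level keeps the string labels, which is why they are converted to integers afterwards.

```python
import pandas as pd

# made-up wide-format data: one row per subject, one self-report column per night
wide = pd.DataFrame(
    {
        "subject": ["VP_01", "VP_02"],
        "MEQ": [45, 62],
        "bedSelfreport_N1": ["23:00", "22:15"],
        "bedSelfreport_N2": ["23:30", "22:00"],
    }
)

long = pd.wide_to_long(
    wide, stubnames=["bedSelfreport"], i="subject", j="night", sep="_", suffix=r"\w+"
).sort_index()
print(long)
```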

### Rename Columns for Consistent Naming

In [None]:
quest_sleep = quest_sleep.rename(
    columns={s: camel_to_snake(s) for s in [f"{col}Selfreport" for col in sleep_cols]}
)
quest_sleep = quest_sleep.rename(
    columns={s: s.lower() for s in ["Chronotype_Coarse", "Chronotype_Fine"]}
)
quest_sleep.head()

In [None]:
quest_copy = quest_copy.rename(
    columns={col: camel_to_snake(col) for col in quest_copy.columns if "PSS" not in col}
)
quest_copy.head()
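
`camel_to_snake` comes from biopsykit. To illustrate the kind of conversion applied to names such as `sleepOnsetSelfreport` or `SubjectiveSleepQuality`, here is a hypothetical regex-based sketch (not biopsykit's implementation) that inserts an underscore before every capital letter following a lowercase letter or digit and then lowercases the result. In this sketch, a run of capitals such as `PSS` is not split into separate words.

```python
import re


def camel_to_snake_sketch(name: str) -> str:
    """Hypothetical helper: convert camelCase / PascalCase names to snake_case."""
    # insert "_" before an uppercase letter that directly follows a lowercase letter or digit
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    return name.lower()


for col in ["sleepOnsetSelfreport", "wakeupSource", "SubjectiveSleepQuality"]:
    print(col, "->", camel_to_snake_sketch(col))
```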

In [None]:
quest_copy.to_csv(export_path.joinpath("questionnaire_data.csv"))

In [None]:
quest_sleep.to_csv(export_path.joinpath("chronotype_bedtimes.csv"))

In [None]:
cond_data.to_csv(export_path.joinpath("condition_map.csv"))