# Analysis of Experimental Data

Experimental data will be available for download from our [G-Node repository](https://gin.g-node.org/pspitzner/stimulating_modular_cultures) (after acceptance).
Place the downloaded files under the base directory of _this_ repo, i.e. in `./dat/experiments/`.

Most of the analysis is the same for experiments and simulations.
The comparison across conditions is implemented in a stand-alone script, `ana/process_conditions.py`, which can be run from a terminal and takes the following arguments:
* `-i` the base path to the folder where the data is stored.
* `-t` the type of experiment (yields the right subfolders and file names):
    - `exp` for main results, from optogenetic stimulation and different topologies
    - `exp_chemical` for the experiments with KCl
    - `exp_bic` for the experiments with Bicuculline
    - `sim_partial` for simulations where only part of the system was targeted.
* `-o` the path where the output is stored.

To create the preprocessed data, navigate to the base directory and run:
```bash
python ./ana/process_conditions.py -t exp -i ./dat/experiments/raw/  -o ./dat/experiments/processed/
python ./ana/process_conditions.py -t exp_chemical -i ./dat/experiments/raw/ -o ./dat/experiments/processed/
python ./ana/process_conditions.py -t exp_bic -i ./dat/experiments/raw/ -o ./dat/experiments/processed/
```
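
The three calls above differ only in the `-t` argument, so they can also be generated in a loop. The following is a minimal sketch and not part of the repository: the helper names `build_commands` and `run_all` and the `dry_run` switch are hypothetical, and it assumes the script is invoked from the base directory with the same paths as above.

```python
import subprocess
import sys

# Experiment types taken from the commands above.
EXPERIMENT_TYPES = ["exp", "exp_chemical", "exp_bic"]


def build_commands(
    types=EXPERIMENT_TYPES,
    raw_dir="./dat/experiments/raw/",
    out_dir="./dat/experiments/processed/",
    script="./ana/process_conditions.py",
):
    """Return one argument list per experiment type (hypothetical helper)."""
    return [
        [sys.executable, script, "-t", t, "-i", raw_dir, "-o", out_dir]
        for t in types
    ]


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the script fails
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` would execute them one after another and stop at the first failure.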


This should yield the following files:
```bash
>>> tree -L 2 --dirsfirst ./dat/experiments/processed/
dat/experiments/processed/
├── 1b
│   ├── 210315_A
│   ├── 210315_C
│   ├── 210405_C
│   ├── 210406_B
│   ├── 210406_C
│   ├── 210719_B
│   ├── 210719_C
│   └── 210726_B
├── 3b
│   ├── 210316_A
│   ...
├── Bicuculline_1b
│   ├── 210907_1bB
│   ...
├── KCl_1b
│   ├── 210420_C
│   ...
├── merged
│   ├── 210401_A
│   ...
├── 1b.hdf5
├── 3b.hdf5
├── Bicuculline_1b.hdf5
├── KCl_1b.hdf5
└── merged.hdf5
```

where the `*.hdf5` files contain the preprocessed data and the folders for each experiment have some additional info. See also `save_analysed_h5f` in `ana/process_conditions.py`.

Low-level plotting functions are contained in `ana/plot_helper.py` and
the higher-level wrappers as well as further analysis are in `ana/paper_plots.py`.
In particular, most of the content of this notebook can also be found in `paper_plots.py/fig_x()`.

Experiments are depicted in Figures 1 and 2, and in the Supplemental Material.
For fine-grained control, we produced every figure panel as a stand-alone and combined them later.

# Plotting

In [None]:
# The autoreload extension allows you to tweak the code in the imported modules (`pp`)
# and rerun cells to reflect the changes.
%load_ext autoreload
%autoreload 2
%load_ext ipy_dict_hierarchy
%matplotlib inline
%config InlineBackend.figure_format = 'retina'


import sys
sys.path.append("../ana/")
sys.path.append("./../")

from ana import paper_plots as pp
# reduce the printed output, we have lots of details on the INFO level.
pp.log.setLevel("ERROR")

In [None]:
# print(pp.fig_1.__doc__)
pp.show_ylabel = True
pp.show_title = True
pp.fig_1()

In [None]:
# print(pp.fig_2.__doc__)
pp.log.setLevel("ERROR")
pp.show_xlabel = False
pp.show_ylabel = True
pp.show_title = True
pp.fig_2(
    pd_folder = f"{pp.p_exp}/processed",
    out_prefix = f"{pp.p_fo}/exp_f2_",
)

In [None]:
# table s5
df = pp.table_for_violins()
# we want core delay in milliseconds
df["Core delays (ms)"] = df["Core delays"].apply(lambda x: x * 1000)
df = df.drop("Core delays", axis=1)

df.to_excel(f"{pp.p_exp}/processed/fig_2_table_violins.xlsx", engine="openpyxl")
df.to_latex(
    f"{pp.p_exp}/processed/fig_2_table_violins.tex",
    na_rep="",
    bold_rows=False,
    multirow=True,
    multicolumn=True,
    float_format="{:3.2f}".format,
)
df

In [None]:
# table s4
df = pp.table_for_rij()
df.to_excel(f"{pp.p_exp}/processed/fig_2_table_rij_barplots.xlsx", engine="openpyxl")
df.to_latex(
    f"{pp.p_exp}/processed/fig_2_table_rij_barplots.tex",
    na_rep="",
    bold_rows=False,
    multirow=True,
    multicolumn=True,
    float_format="{:3.2f}".format,
)
df

## Supplemental Material and Statistical Tests

In [None]:
# print(pp.fig_sm_exp_trialwise_observables.__doc__)
pp.log.setLevel("ERROR")
pp.show_xlabel = True
pp.show_ylabel = True
pp.show_title = False
pp.fig_sm_exp_trialwise_observables(
    pd_folder=f"{pp.p_exp}/processed",
    out_prefix=f"{pp.p_fo}/exp_sr_rij500_",
)


In [None]:
# table s6
df = pp.table_for_trials()
# add the number of trials of the trial data frame to the layout description
df = df.reset_index()
df["layout"] = df["layout"] + " ($N=" + df["trials"].map(str) + "$ realizations)"
df = df.drop("trials", axis=1)
df = df.set_index(["layout", "condition", "kind"])

df.to_excel(f"{pp.p_exp}/processed/fig_2_table_trial_estimates.xlsx", engine="openpyxl")
df.to_latex(
    f"{pp.p_exp}/processed/fig_2_table_trial_estimates.tex",
    na_rep="",
    bold_rows=False,
    multirow=True,
    multicolumn=True,
    float_format="{:3.2f}".format,
)
df

### NHST
Frequentist perspective, with the null hypothesis that the pre- and stim conditions are equal.
The table below contains p-values from two-sided, paired-sample t-tests
(corresponding to the stick plots above).
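
For reference, a paired-sample t-test reduces to a one-sample test on the per-trial differences. The sketch below computes only the test statistic, using the standard library. The function name `paired_t_statistic` and the numbers are made up for illustration; they are not taken from the recordings or from `pp.nhst_pairwise_for_trials`. The two-sided p-value would then follow from the Student's t-distribution with `n - 1` degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(pre, stim):
    """Return (t, dof) for a paired-sample t-test of stim vs. pre."""
    if len(pre) != len(stim):
        raise ValueError("paired samples need equal length")
    if len(pre) < 2:
        raise ValueError("need at least two pairs")
    # one difference per trial
    diffs = [s - p for p, s in zip(pre, stim)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # sample standard deviation (n - 1 in the denominator)
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# made-up values, one entry per trial
pre = [10.0, 12.0, 9.0]
stim = [11.0, 14.0, 12.0]
t, dof = paired_t_statistic(pre, stim)
```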

In [None]:
# table s1
# the test results are logged at INFO level and
# returned as a pandas dataframe.
pp.log.setLevel("WARNING")
nhst_stats = pp.nhst_pairwise_for_trials(
    observables=[
        "Mean Rate",
        "Median Fraction", # this is the event size
        "Median Neuron Correlation",
        "Functional Complexity",
        "Mean IBI",
        "Mean Core delays",
    ],
    layouts=["1b", "3b", "merged", "KCl_1b"],
)

nhst_stats.to_excel(f"{pp.p_exp}/processed/fig_2_table_pvals.xlsx", engine="openpyxl")
nhst_stats.to_latex(
    f"{pp.p_exp}/processed/fig_2_table_pvals.tex",
    na_rep="",
    bold_rows=False,
    multirow=True,
    multicolumn=True,
    float_format="{:5.4f}".format,
)
nhst_stats

### Bayesian
Taking a Bayesian perspective, we ask which changes of the observables are _credible_ given the recorded data. We assume the differences between the conditions to follow a Student's t-distribution, sample the posterior of the _mean of the differences_, and list:
* the highest density intervals (HDI) of the posterior distribution
* the probability of direction (PD)
* PD converted to a two-sided p-value
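
How these three quantities relate can be illustrated on a plain list of posterior samples. The sketch below is not the implementation behind `pp.bayesian_best_for_trials`: the function names are hypothetical, the samples are made up, and the conversion `p = 2 * (1 - PD)` is a commonly used convention that is assumed here.

```python
import math


def probability_of_direction(samples):
    """Fraction of posterior samples that share the majority sign."""
    n = len(samples)
    frac_pos = sum(s > 0 for s in samples) / n
    frac_neg = sum(s < 0 for s in samples) / n
    return max(frac_pos, frac_neg)


def pd_to_two_sided_p(pd):
    """Convert PD to a two-sided p-value (assumed convention)."""
    return 2.0 * (1.0 - pd)


def highest_density_interval(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # slide a window of k consecutive samples and keep the narrowest one
    widths = [ordered[i + k - 1] - ordered[i] for i in range(n - k + 1)]
    start = widths.index(min(widths))
    return ordered[start], ordered[start + k - 1]


# made-up posterior samples of the mean difference
samples = [-5, 0.5, 1, 1.5, 2, 2.5, 6, 9, 12, 20]
pd = probability_of_direction(samples)
p_two_sided = pd_to_two_sided_p(pd)
hdi_low, hdi_high = highest_density_interval(samples, mass=0.5)
```

With real posterior chains, the number of samples is much larger, so the window-based interval becomes a smooth estimate of the HDI.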

#TODO: add refs

In [None]:
bayesian_stats = pp.bayesian_best_for_trials(
    observables=["Mean Correlation", "Mean Fraction", "Functional Complexity"],
    layouts=["1b", "3b", "merged", "KCl_1b"],
)

In [None]:
bayesian_stats