### Import libraries 

```python
import numpy as np
import pandas as pd
import seaborn as sns
import scipy.io as sio
import os
import matplotlib.pyplot as plt
import itertools
from itertools import groupby, combinations
# Modules provided with this repository
from all_observables import *
from discrepancy_measures import *
from test_retest_reliability import *
```

### Load data

```python
# Replace "..." with the full path to the sample_data directory before running the code
file_path = "/.../fmri_state_transition_dynamics/sample_data/"
# Use the first session of the first participant
df = pd.read_csv(file_path + "sample_data_participant1_session1.csv")
```

### Assign a cluster label to each time point and compute the centroid of each cluster using K-means clustering

**If you only need the cluster labels and the cluster centroids from a clustering method (e.g., `Kmeans`), use the following command.**

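```python
# Convert the DataFrame to a NumPy array
data = np.array(df)
# K-means with 4 clusters: one label per time point and one centroid per cluster
cluster_labels, cluster_centroids = Kmeans(data=data, num_clusters=4)
```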
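As a rough illustration of what the K-means step produces, here is a minimal NumPy sketch of Lloyd's K-means. It assumes that each row of `data` is one time point. The helpers `kmeans_sketch` and `assign_labels` are hypothetical; this is not the package's `Kmeans` function, whose initialization and stopping rules may differ.

```python
import numpy as np

def assign_labels(data, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans_sketch(data, num_clusters, max_iter=100, seed=0):
    """Hypothetical Lloyd's K-means helper, not the package's Kmeans function.

    data: array of shape (time points, features); returns one cluster label per
    time point and one centroid per cluster.
    """
    rng = np.random.default_rng(seed)
    # Initialize the centroids with randomly chosen, distinct time points
    start = rng.choice(len(data), size=num_clusters, replace=False)
    centroids = data[start].astype(float)
    for _ in range(max_iter):
        labels = assign_labels(data, centroids)
        # Move each centroid to the mean of its time points (keep it if the cluster is empty)
        new_centroids = np.array([
            data[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(num_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign_labels(data, centroids), centroids

# Synthetic example: 200 time points, 10 features of random noise
toy = np.random.default_rng(1).normal(size=(200, 10))
cluster_labels, cluster_centroids = kmeans_sketch(toy, num_clusters=4)
print(cluster_labels.shape, cluster_centroids.shape)  # (200,) (4, 10)
```

The final relabeling after the loop guarantees that every label refers to the nearest of the returned centroids.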

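### Get cluster labels, GEV, and all observables

**If you want the cluster labels, the GEV, and all five observables (centroid positions, coverage time, frequency of appearance, average lifespan, and transition probability matrix) from a clustering method (e.g., `Kmeans`), use the following command. It returns these three outputs in that order.**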

```python
# Returns the cluster labels, the GEV, and the five observables
cluster_labels, GEV, observables = lav_observ(data=data, method=Kmeans, num_clusters=4)
```
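The four observables that depend only on the label sequence can be sketched with `itertools.groupby`, which the notebook already imports. The definitions in the docstring are common choices assumed for illustration (for example, frequency counted per time point rather than per second, and transitions counted only between distinct consecutive states); `observables_from_labels` is a hypothetical helper, and the article's definitions may differ. The fifth observable, the centroid positions, comes from the clustering step itself.

```python
import numpy as np
from itertools import groupby

def observables_from_labels(cluster_labels, num_clusters):
    """Hypothetical sketch of four label-based observables (assumed definitions):
    coverage time     = fraction of time points spent in each state
    frequency         = number of visits (runs) to each state per time point
    average lifespan  = mean run length of each state, in time points
    transition matrix = row-normalized counts of switches i -> j between runs (i != j)
    """
    labels = np.asarray(cluster_labels)
    T = len(labels)
    # Collapse the label sequence into runs of consecutive identical labels
    runs = [(int(state), len(list(group))) for state, group in groupby(labels)]
    counts = np.bincount(labels, minlength=num_clusters)
    visits = np.bincount([state for state, _ in runs], minlength=num_clusters)
    coverage = counts / T
    frequency = visits / T
    lifespan = np.divide(counts, visits, out=np.zeros(num_clusters), where=visits > 0)
    trans = np.zeros((num_clusters, num_clusters))
    for (a, _), (b, _) in zip(runs[:-1], runs[1:]):
        trans[a, b] += 1  # consecutive runs always have different states
    row_sums = trans.sum(axis=1, keepdims=True)
    trans = np.divide(trans, row_sums, out=np.zeros_like(trans), where=row_sums > 0)
    return coverage, frequency, lifespan, trans

coverage, frequency, lifespan, trans = observables_from_labels([0, 0, 1, 1, 1, 0, 2, 2], 3)
print(coverage.tolist())   # [0.375, 0.375, 0.25]
print(frequency.tolist())  # [0.25, 0.125, 0.125]
print(lifespan.tolist())   # [1.5, 3.0, 2.0]
print(trans.tolist())      # [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

A state that is never left (here state 2, which ends the sequence) gets an all-zero row in this sketch rather than an undefined one.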

### Discrepancy measures
- **Dissimilarity between centroid positions**
- **Total variation (TV) of coverage time**
- **TV of frequency of appearance**
- **TV of average lifespan**
- **Frobenius distance between two transition probability matrices**
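The discrepancy measures listed above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: coverage time, frequency, and average lifespan are rescaled to sum to 1 before the TV distance is taken, and the centroid dissimilarity is taken as the mean cosine distance between centroids (the notebook passes `distance = "cosine"`, which we assume plays this role) after matching clusters across sessions by the best permutation. All function names are hypothetical. In practice, the same cluster matching would also have to be applied to the state-wise observables before computing the TV and Frobenius distances.

```python
import numpy as np
from itertools import permutations

def total_variation(p, q):
    """TV distance after rescaling each nonnegative vector to sum to 1 (assumption)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return 0.5 * np.abs(p - q).sum()

def frobenius_distance(P, Q):
    """Frobenius norm of the difference between two transition probability matrices."""
    return np.linalg.norm(np.asarray(P, dtype=float) - np.asarray(Q, dtype=float), ord="fro")

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def centroid_dissimilarity(C1, C2):
    """Mean cosine distance between centroids after matching the clusters of the two
    sessions by the permutation that minimizes it (assumed matching rule; brute force
    over all K! permutations, which is cheap for small K such as 4)."""
    K = len(C1)
    return min(
        np.mean([cosine_distance(C1[k], C2[perm[k]]) for k in range(K)])
        for perm in permutations(range(K))
    )

print(total_variation([0.5, 0.5, 0, 0], [0.25, 0.25, 0.25, 0.25]))  # 0.5
print(frobenius_distance(np.eye(2), [[0, 1], [1, 0]]))              # 2.0
C1 = np.eye(3)
print(centroid_dissimilarity(C1, C1[[2, 0, 1]]))                     # 0.0
```

The last example shows why matching matters: the second session contains the same centroids in a different order, so the dissimilarity is zero only after the clusters are aligned.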

To compare observables between two sessions, either from the same participant or from different participants, the code below first computes `all_results`, which holds the observables (not yet the discrepancy measures) of every session. `all_results` is a list of lists: if there are $p$ participants with $s$ sessions each, `all_results` has length $p$, and each of its elements is a list of length $s$ whose entries are the observables of one session (the third output of `lav_observ`).

The function `reproducibility` then computes the discrepancy measures. With `comparison = "WP"` it returns `within_participants`, the discrepancy measures between two sessions of the same participant; with `comparison = "BP"` it returns `between_participants`, the discrepancy measures between two sessions of different participants. Both outputs are lists, and each element is a NumPy array of five values: the dissimilarity in centroid position between the two sessions, the TV of coverage time, the TV of frequency of appearance, the TV of average lifespan, and the Frobenius distance between the two transition probability matrices. For within-participant comparison, the length of the output list is $p \times {s \choose 2}$, where $p$ is the number of participants for which we want the discrepancy measures and $s$ is the number of sessions per participant. For between-participant comparison, the length of the list is $s \times {p \choose 2}$. The example code uses sample data with $p=2$ participants and $s=2$ sessions per participant, so each list has $2 \times {2 \choose 2} = 2$ elements.
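To see where the counts $p \times {s \choose 2}$ and $s \times {p \choose 2}$ come from, the sketch below enumerates the compared pairs as zero-based (participant, session) indices into `all_results`. The between-participant pairing, which holds the session index fixed, is one pairing consistent with the stated count and is an assumption; `comparison_pairs` is a hypothetical helper.

```python
from itertools import combinations

def comparison_pairs(p, s):
    """Hypothetical helper listing the (participant, session) index pairs compared.

    Within-participant (WP): every pair of sessions of the same participant,
    giving p * C(s, 2) comparisons.
    Between-participant (BP): every pair of participants with the session index
    held fixed (assumed pairing), giving s * C(p, 2) comparisons.
    """
    wp = [((i, a), (i, b)) for i in range(p) for a, b in combinations(range(s), 2)]
    bp = [((i, k), (j, k)) for k in range(s) for i, j in combinations(range(p), 2)]
    return wp, bp

wp, bp = comparison_pairs(p=2, s=2)
print(len(wp), len(bp))  # 2 2
print(wp)                # [((0, 0), (0, 1)), ((1, 0), (1, 1))]
```

For a larger study, for example 3 participants with 4 sessions each, the same helper gives 3 × 6 = 18 within-participant and 4 × 3 = 12 between-participant comparisons.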

```python
# All observables from all sessions: all_results[p][s] holds the observables
# of session s+1 of participant p+1 ([2] keeps only the observables)
all_results = [
    [lav_observ(data=np.array(pd.read_csv(f"{file_path}sample_data_participant{p}_session{s}.csv")),
                method=Kmeans, num_clusters=4)[2]
     for s in range(1, 3)]
    for p in range(1, 3)
]

# Discrepancy in terms of the five observables between sessions of the same participant (WP)
# and between sessions of different participants (BP)
within_participants = reproducibility(all_results=np.array(all_results), distance="cosine",
                                      comparison="WP")
between_participants = reproducibility(all_results=np.array(all_results), distance="cosine",
                                       comparison="BP")
```

### Permutation test
We hypothesize that the state-transition dynamics estimated from fMRI data are more consistent between different sessions of the same participant than between sessions of different participants. To test this hypothesis, we compare the dissimilarity between two sessions originating from the same participant with the dissimilarity between two sessions originating from different participants. For each observable, we compare the within-participant and between-participant dissimilarities using the normalized distance (ND) combined with a permutation test. The function `permuted_ND` returns three outputs:

- `ND_empirical`: a NumPy array of the ND values of the five observables computed from the empirical data (see Eq. 17 in the article).
- `ND_permuted`: a NumPy array of shape (`N`, 5), i.e., (10000, 5) in the example, whose rows give the ND values of the five observables computed after each permutation.
- `p_value`: the p-value calculated from `ND_empirical` and `ND_permuted` (see Section 2.6.3 in the article).

```python
# Permutation test with N = 10000 permutations
ND_empirical, ND_permuted, p_value = permuted_ND(N=10000,
                                                 all_results=np.array(all_results),
                                                 distance="cosine")
```
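The mechanics of the permutation test, and the shapes of its three outputs, can be sketched generically. Two loud assumptions apply: `stand_in_statistic` is a hypothetical placeholder and NOT the ND of Eq. 17 in the article, and the shuffling scheme (reassigning the within/between labels of the pooled discrepancy vectors) and the one-sided p-value are common conventions that may differ from Section 2.6.3. The inputs mimic `within_participants` and `between_participants` stacked with `np.vstack`, using synthetic values.

```python
import numpy as np

def stand_in_statistic(within, between):
    """Hypothetical stand-in for ND, NOT Eq. 17 of the article: per observable,
    (mean between - mean within) divided by the pooled standard deviation."""
    pooled = np.vstack([within, between])
    return (between.mean(axis=0) - within.mean(axis=0)) / pooled.std(axis=0)

def permutation_test_sketch(within, between, N=10000, seed=0):
    """Shuffle the within/between labels of the discrepancy vectors N times (assumed scheme)."""
    within = np.asarray(within, dtype=float)    # shape (n_within, 5)
    between = np.asarray(between, dtype=float)  # shape (n_between, 5)
    pooled = np.vstack([within, between])
    n_w = len(within)
    rng = np.random.default_rng(seed)
    ND_empirical = stand_in_statistic(within, between)   # shape (5,)
    ND_permuted = np.empty((N, pooled.shape[1]))          # shape (N, 5)
    for n in range(N):
        idx = rng.permutation(len(pooled))
        ND_permuted[n] = stand_in_statistic(pooled[idx[:n_w]], pooled[idx[n_w:]])
    # One-sided p-value per observable: share of permutations at least as large as observed
    p_value = np.mean(ND_permuted >= ND_empirical, axis=0)  # shape (5,)
    return ND_empirical, ND_permuted, p_value

# Synthetic discrepancies for 5 observables; within-participant values are smaller on purpose
rng = np.random.default_rng(2)
within = rng.normal(0.2, 0.05, size=(20, 5))
between = rng.normal(0.4, 0.05, size=(40, 5))
ND_empirical, ND_permuted, p_value = permutation_test_sketch(within, between, N=1000)
print(ND_empirical.shape, ND_permuted.shape, p_value.shape)  # (5,) (1000, 5) (5,)
```

Swapping in the article's ND definition for `stand_in_statistic` would leave the rest of the loop unchanged.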

To compute only the empirical ND values of the five observables, without running the permutation test, use the following command.

```python
ND_empirical = ND_value(all_results=np.array(all_results), distance="cosine")
```