Skip to content

harp-lab/brainPH

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Brain Persistent Homology

Using persistent homology and multidimensional scaling on Wasserstein distance matrix

Data pre-processing

Input matrix

  • S x C x N x N, where S = #subjects, C = #cohorts, N = #ROIs
  • Mat file
  • Size: 316 x 3 x 114 x 114
  • Three cohorts: mx645, mx1400, std2500

Data normalization

  • Removed NaN values by removing column 24 and row 24 from each 114 x 114 matrix
  • Used correlation coefficients on transposed matrix and then applied square root on the 1 - squared distance
  • 316 x 3 files, 3 files for each subject
  • Each file contains 113 x 113 matrix
  • Example: subject_1_mx645.txt, subject_1_mx1400.txt, subject_1_std2500.txt

Pipeline 1: Comparison across cohorts

Persistent homology

  • Computed 0-dimensional persistent homology (PH) for all three cohorts of each subjects
  • Generated 0-dimensional barcodes from calculated PH values with maximum value of 1
  • To use persistent homology features from Gudhi library set manual=False in get_barcodes_single_subject method in distance_calculation.py. Otherwise, set manual=True for raw calculation of persistent homology and 0-dimensional barcodes.

Distance calculation

  • Computed 1-Wasserstein Distance (WD) between cohorts for each subjects from the 0-dimensional barcodes
  • For each subject, computed WD on the 0-dimensional barcodes:
    • WD(mx645 - mx1400)
    • WD(mx1400 - std2500)
    • WD(std2500 - mx645)
  • Generated 1 JSON file with 316 arrays, each array contains 3 values
  • Generated file: distances_between_cohorts_ws.json

Pipeline 2: Comparison within a cohort

Persistent homology

  • Computed 0-dimensional persistent homology (PH) for all three cohorts of each subjects
  • Generated 0-dimensional barcodes from calculated PH values with maximum value of 1
  • To use persistent homology features from Gudhi library set manual=False in get_barcodes_single_subject method in distance_calculation.py. Otherwise, set manual=True for raw calculation of persistent homology and 0-dimensional barcodes.

Distance matrix (Wasserstein distance)

Multidimensional scaling (Wasserstein distance)

  • Applied classical metric Multidimensional scaling (MDS) with precomputed distance (1-Wasserstein)
  • Calculated MDS of 2 components for each 1-Wasserstein distance matrix
  • Generated 3 JSON files each with 316 x 2 matrix
  • Generated files:
  • Applied Kmeans++ clustering by selecting the number of clusters n using Silhouette Coefficient.

Statistical analysis

  • Calculate p-value using ANOVA test on the [316 x 3] size Wasserstein distances between the cohorts
  • ANOVA test p-value: 0.133
  • Wasserstein distance for the following three pairs: (1) TR=645ms and TR=1400ms, (2) TR=1400ms and TR=2500ms and, (1) TR=2500ms and TR=645ms plotted using box plots: boxplots
  • Plot WD distances between:
  • Plot MDS value for all three cohorts: mds graph
  • Clustering on the MDS results

Local Setup

Requirements

  • Python 3

Install dependencies

  • Clone the repository.
  • Open a terminal / powershell in the cloned repository.
  • Create a virtual environment and activate it. If you are using Linux / Mac:
python3 -m venv venv
source venv/bin/activate

Create and activate venv in Windows (Tested in Windows 10):

python -m venv venv
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
.\venv\Scripts\Activate.ps1

After activating venv, the terminal / powershell will have (venv) added to the prompt.

  • Check pip version:
pip --version

It should point to the pip in the activated venv.

  • Install required packages:
pip install -r requirements.txt

Run the project:

  • Calculate distance between cohorts and MDS within a cohort using WD:
python distance_calculation.py --method ws --start 1 --end 316 --distance y --mds y --data_dir full_data_linear --output_dir output_linear
  • Draw plots and ANOVA test:
python statistical_calculation_linear.py --output_dir output_linear
  • Generate clusters on the MDS data:
python cluster_calculation.py --output_dir output_linear

Results

  • Generate distance and MDS:
python distance_calculation.py --method ws --start 1 --end 316 --distance y --mds y --data_dir full_data_linear --output_dir output_linear
  • Running statistical analysis on the generated file:
python statistical_calculation_linear.py --output_dir output_linear
T-values:
0.059044 0.459634 0.286013 
P-values:
0.088131 0.518936 0.387180 
ANOVA test p-value: 0.289941
Mean WD_MX645_MX1400: 4.304
Mean WD_MX1400_STD2500: 4.01
Mean WD_STD2500_MX645: 4.135
WD_MX645_MX1400: Distance:   2, number of subjects:  42, percentage: 13.29%
WD_MX645_MX1400: Distance:   5, number of subjects: 207, percentage: 65.51%
WD_MX645_MX1400: Distance>  10, number of subjects:   4, percentage: 1.27%
Method main executed in 128.7125 seconds
  • T-values and p-values obtained by pairwise t-tests comparing the WDs between data cohorts. Since all p-values are greater than 0.05, the means of WD distributions for each cohort comparison are statistically similar.
t-value p-value
WD(P1, P2) WD(P2, P3) 0.059044 0.088131
WD(P2, P3) WD(P3, P1) 0.459634 0.518936
WD(P3, P1) WD(P1, P2) 0.286013 0.387180
  • Wasserstein distance for the following three pairs: (1) TR=645ms and TR=1400ms, (2) TR=1400ms and TR=2500ms and, (1) TR=2500ms and TR=645ms plotted using box plots: alt boxplots
  • WD for all 316 subjects for mx645 and mx1400: alt WD_mx645_mx1400
  • WD for all 316 subjects for mx1400 and std2500: alt WD_mx1400_std2500
  • WD for all 316 subjects for std2500 and mx645: alt WD_std2500_mx645
  • Clustering result for all three cohorts using Wasserstein distance:
    • mx1400: alt clusters_mx1400_ws
    • mx645: alt clusters_mx645_ws
    • std2500: alt clusters_std2500_ws

Generated files:

Clustering results (review update)

TDA cluster generation (linear data)

Clustering result (within cohort):

python cluster_calculation.py --output_dir output_linear

Number of clusters in 3 cohorts: [2, 2, 2]
output_linear:
Cluster group: 000: #match: 24
Cluster group: 001: #match: 7
Cluster group: 010: #match: 26
Cluster group: 011: #match: 83
Cluster group: 100: #match: 115
Cluster group: 101: #match: 12
Cluster group: 110: #match: 20
Cluster group: 111: #match: 29

Max + reverse: 115 + 83 = 198

645-1400 : 236
1400-2500 : 251
2500-645 : 225

Adjacency matrix:
output_linear:
Rows X Columns: [645 clusters, 1400 clusters, 2500 clusters]
140 0 31 109 50 90 
0 176 127 49 135 41 
31 127 158 0 139 19 
109 49 0 158 46 112 
50 135 139 46 185 0 
90 41 19 112 0 131 
  • Clustering result for full data for all three cohorts using Wasserstein distance:
    • mx645: alt clusters_mx645_tda_linear
    • mx1400: alt clusters_mx1400_tda_linear
    • std2500: alt clusters_std2500_tda_linear

Statistical analysis on tda pipeline with linear data (across cohort):

python statistical_calculation_linear.py --output_dir output_linear
T-values:
0.059044 0.459634 0.286013 
P-values:
0.088131 0.518936 0.387180 
ANOVA test p-value: 0.289941
  • T-values and p-values obtained by pairwise t-tests comparing the WDs between data cohorts. Since all p-values are larger than 0.05, the means of WD distributions for each cohort comparison are statistically similar.
t-value p-value
WD(P1, P2) WD(P2, P3) 0.059044 0.088131
WD(P2, P3) WD(P3, P1) 0.459634 0.518936
WD(P3, P1) WD(P1, P2) 0.286013 0.387180

TDA cluster generation (random data - 1 sample)

Clustering result (within cohort):

python cluster_calculation.py --output_dir output_random
Number of clusters in 3 cohorts: [2, 2, 2]
output_random:
Cluster group: 000: #match: 35
Cluster group: 001: #match: 38
Cluster group: 010: #match: 34
Cluster group: 011: #match: 43
Cluster group: 100: #match: 36
Cluster group: 101: #match: 32
Cluster group: 110: #match: 42
Cluster group: 111: #match: 56

Max + reverse: 56 + 35 = 91

Adjacency matrix:
output_random:
Rows X Columns: [645 clusters, 1400 clusters, 2500 clusters]
150 0 73 77 69 81 
0 166 68 98 78 88 
73 68 141 0 71 70 
77 98 0 175 76 99 
69 78 71 76 147 0 
81 88 70 99 0 169 
  • Clustering result for random data for all three cohorts using Wasserstein distance:
    • mx645: alt clusters_mx645_tda_random
    • mx1400: alt clusters_mx1400_tda_random
    • std2500: alt clusters_std2500_tda_random

Mean and standard deviation of random clusters (49 out of 50)

Mean value of (Max + Reverse): 84.06122448979592
Standard deviation value of (Max + Reverse): 5.738786759358441

Notes

  • Within cohort: clustering
  • Across cohorts: statistical analysis
  • Original dataset: timeseries.Yeo2011.mm316.mat
  • Total negative in correlation coefficient: 1234732 from all_positive_linear.m
  • Total positive in correlation coefficient: 10870280 from all_negative_linear.m
  • 3 out of 148 random dataset returns cluster 2, 2, 4, 1 returns 4, 2, 2, and 144 returns 2, 2, 2.

To Do:

  • non-TDA experiments for within cohort and comparison across cohort
  • nonTDA on random for second pipeline
  • create two matrices one for positive values and one for negative values and apply the distance function on them. Since, this will be a lot of experiments, if we do this for everything, let us just start by doing with only pipeline 1 (box plots, p/t-value tests). the original mat file which we normalized using matlab. 113 x 113 with all positive (padded by 0) and 113 x 113 with all negative (padded by 0).

References

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published