## Overview of the Script
This notebook reads and analyzes the outputs of previously run bootstrapped analyses. It loads the bootstrapped data,
summarizes the results, and visualizes the distributions of key statistics such as F-statistics and p-values.

**Workflow:**
1. **Importing Libraries:**
   - Necessary Python libraries are imported, including `pandas` for data manipulation, `numpy` for numerical operations,
     and visualization libraries like `matplotlib` and `seaborn`.

2. **Loading Bootstrapped Data:**
   - The notebook reads the bootstrapped outputs of previously run analyses from CSV files, one file per measure for each data type
     (rest, DTI, and sMRI), and combines them into a single table for further analysis.

3. **Summarizing Results:**
   - The core of the notebook summarizes each statistic across the bootstrap samples, for example by its mean, median, standard deviation,
     or confidence interval. In the tables below, F-statistics and p-values are reported as mean (SD) across bootstrap samples.

4. **Visualizing Distributions:**
   - The distributions of key statistics are plotted as histograms, box plots, or other suitable visualizations.
     These plots show how variable, and therefore how robust, the bootstrapped results are.

5. **Interpreting Outputs:**
   - Finally, the summarized and visualized results are interpreted to draw conclusions about the stability
     and significance of the findings.
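
The summarizing step can be sketched as a small function that reduces one statistic's bootstrap distribution to its mean, SD, median, and a 95% percentile confidence interval. This is a minimal sketch, not the notebook's own code: the function name `summarize_bootstrap` is hypothetical, the percentile interval is only one of several bootstrap intervals (basic and BCa intervals are alternatives), and the simulated F-values are placeholders rather than results from this study.

```python
import numpy as np

def summarize_bootstrap(values, ci=0.95):
    """Summarize a 1-D array of bootstrap replicates of one statistic (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    alpha = 1.0 - ci
    # Percentile interval: the alpha/2 and 1 - alpha/2 quantiles of the replicates
    lower, upper = np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return {
        'mean': values.mean(),
        'sd': values.std(ddof=1),  # sample SD across replicates
        'median': np.median(values),
        'ci_lower': lower,
        'ci_upper': upper,
    }

# Placeholder data: 1,000 simulated bootstrap F-values
rng = np.random.default_rng(0)
boot_f = rng.gamma(shape=4.0, scale=1.0, size=1000)
summary = summarize_bootstrap(boot_f)
print({k: round(v, 3) for k, v in summary.items()})
```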

**Purpose:**
The primary goal of this notebook is to provide a structured approach to analyzing and interpreting bootstrapped outputs, so that
the robustness of the bootstrap findings can be assessed and clearly understood.


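```python
import sys

# Make the project's Python environment and the analysis directory importable
sys.path.append('/pl/active/banich/studies/Relevantstudies/abcd/env/lib/python3.7/site-packages')
sys.path.append('/pl/active/banich/studies/Relevantstudies/abcd/data/clustering/analysis/')
from functions import *  # helper functions from functions.py in the analysis directory
```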

```python
import glob

import pandas as pd

folder_path = '/pl/active/banich/studies/Relevantstudies/abcd/data/clustering/analysis/group_differences/bootstrapped_outputs/anova/msd'

# Collect the per-measure output files for each data type (names ending in rest.csv, dti.csv, or smri.csv)
rest_files = sorted(glob.glob(folder_path + '/*rest.csv'))
dti_files = sorted(glob.glob(folder_path + '/*dti.csv'))
smri_files = sorted(glob.glob(folder_path + '/*smri.csv'))

# Files are paired with these labels by position, so the sorted file names must follow
# the same alphabetical order; zip() silently stops at the shorter of the two lists.
measures = ['adversity', 'cbcl', 'cog', 'ef', 'nback', 'stroop', 'upps']

# Read each file and tag its rows with the data type and measure it came from.
# All three lists are needed by the concatenation in the next cell.
rest_dfs = [pd.read_csv(f).assign(datatype='rest', measure=m) for f, m in zip(rest_files, measures)]
dti_dfs = [pd.read_csv(f).assign(datatype='dti', measure=m) for f, m in zip(dti_files, measures)]
smri_dfs = [pd.read_csv(f).assign(datatype='smri', measure=m) for f, m in zip(smri_files, measures)]
```
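
Pairing files with measure labels by position means that a single missing or extra file would shift every label after it. A more defensive alternative is to read the measure from each file name. The sketch below assumes, hypothetically, that file names are underscore-separated and contain exactly one measure label as a token (for example `cbcl_rest.csv`); the real naming scheme is not shown in this notebook, and `load_tagged` is an illustrative name.

```python
import glob
import os

import pandas as pd

def load_tagged(folder, datatype, measures):
    """Read every '*<datatype>.csv' in folder, tagging rows with datatype and measure.

    Assumes (hypothetically) underscore-separated file names that contain exactly
    one label from `measures` as a token, e.g. 'cbcl_rest.csv'.
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, f'*{datatype}.csv'))):
        stem = os.path.splitext(os.path.basename(path))[0]
        matches = [m for m in measures if m in stem.split('_')]
        if len(matches) != 1:
            # Fail loudly instead of silently mislabeling the file
            raise ValueError(f'cannot infer a unique measure from {path!r}: {matches}')
        frames.append(pd.read_csv(path).assign(datatype=datatype, measure=matches[0]))
    return frames
```

Under that naming assumption, `rest_dfs = load_tagged(folder_path, 'rest', measures)` would replace the positional pairing.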

```python
# Variables excluded from the summary tables
rm_list = ['nb_all_beh_c2b_rate', 'nb_mean_rt_corect', 'nb_total_n_correct', 'nb_total_rate_correct', 'adversity']

dfs = (pd.concat([pd.concat(rest_dfs),
                  pd.concat(dti_dfs),
                  pd.concat(smri_dfs)])
       .query('var not in @rm_list')   # drop the excluded variables
       .drop('Unnamed: 0', axis=1)     # drop the unnamed index column saved in the CSVs
       .sort_values(['var', 'Sample', 'measure']))

# Format each bootstrapped statistic as "mean (SD)"
dfs['bootF'] = dfs['f_mean'].astype(str) + ' (' + dfs['f_std'].astype(str) + ')'
dfs['bootp'] = dfs['p_mean'].astype(str) + ' (' + dfs['p_std'].astype(str) + ')'

dfs = dfs[['Sample', 'var', 'bootF', 'bootp', 'datatype', 'measure']]
```
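
The CSVs already contain `f_mean`, `f_std`, `p_mean`, and `p_std`; the step that produced them is not shown here. The sketch below illustrates, under assumptions, how such columns could be computed from per-replicate results (the columns `F` and `p` and all values are made up), together with a formatter that keeps a fixed number of decimals. `astype(str)` drops trailing zeros, which is why the displayed table mixes entries such as `4.09 (1.425)` and `3.897 (1.338)`. `mean_sd_label` is a hypothetical helper.

```python
import pandas as pd

def mean_sd_label(mean, sd, decimals=3):
    """Format paired mean and SD Series as 'mean (SD)' strings with fixed decimals."""
    labels = [f'{m:.{decimals}f} ({s:.{decimals}f})' for m, s in zip(mean, sd)]
    return pd.Series(labels, index=mean.index)

# Hypothetical per-replicate results: one row per bootstrap draw (made-up values)
reps = pd.DataFrame({
    'var': ['LMT_r'] * 4,
    'Sample': ['Full_Sample'] * 4,
    'F': [15.0, 16.5, 17.0, 18.5],
    'p': [0.0, 0.0, 0.002, 0.002],
})

# Mean and SD (ddof=1) of each statistic across draws, per variable and sample
boot_summary = (reps.groupby(['var', 'Sample'])
                    .agg(f_mean=('F', 'mean'), f_std=('F', 'std'),
                         p_mean=('p', 'mean'), p_std=('p', 'std'))
                    .reset_index())
boot_summary['bootF'] = mean_sd_label(boot_summary['f_mean'], boot_summary['f_std'])
boot_summary['bootp'] = mean_sd_label(boot_summary['p_mean'], boot_summary['p_std'])
print(boot_summary[['var', 'Sample', 'bootF', 'bootp']])
```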

```python
# One row per (var, measure); one column per sample for each formatted statistic
rest_df = (dfs.query('datatype == "rest"')
              .pivot(index=['var', 'measure'], columns='Sample', values=['bootF', 'bootp'])
              .reset_index()
              .sort_values(['measure', 'var']))

# The same tables for the DTI and sMRI outputs:
# dti_df = (dfs.query('datatype == "dti"')
#              .pivot(index=['var', 'measure'], columns='Sample', values=['bootF', 'bootp'])
#              .reset_index()
#              .sort_values(['measure', 'var']))

# smri_df = (dfs.query('datatype == "smri"')
#               .pivot(index=['var', 'measure'], columns='Sample', values=['bootF', 'bootp'])
#               .reset_index()
#               .sort_values(['measure', 'var']))
```
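
Because `pivot` receives two value columns, the resulting table has two levels of column labels (statistic, then sample), so `rest_df` carries two header rows. If a single header row is preferred, for example before writing the table out with `to_csv`, the levels can be joined into one name. The sketch below runs on made-up rows that only mirror the columns kept in `dfs`; the `<stat>_<sample>` naming is an arbitrary choice.

```python
import pandas as pd

# Toy long-format rows mirroring the columns kept in `dfs` (values are made up)
long = pd.DataFrame({
    'Sample':  ['Full_Sample', 'Sample1', 'Full_Sample', 'Sample1'],
    'var':     ['LMT_r', 'LMT_r', 'RAVLT_r', 'RAVLT_r'],
    'measure': ['cog'] * 4,
    'bootF':   ['1.0 (0.1)', '2.0 (0.2)', '3.0 (0.3)', '4.0 (0.4)'],
    'bootp':   ['0.5 (0.1)', '0.4 (0.1)', '0.3 (0.1)', '0.2 (0.1)'],
})

# Long -> wide: one row per (var, measure), columns are (statistic, sample) pairs
wide = long.pivot(index=['var', 'measure'], columns='Sample', values=['bootF', 'bootp'])

# Flatten the two column levels into single names such as 'bootF_Sample1'
wide.columns = [f'{stat}_{sample}' for stat, sample in wide.columns]
wide = wide.reset_index().sort_values(['measure', 'var'])
print(wide.columns.tolist())
```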


```python
rest_df
```

| | var | measure | bootF Full_Sample | bootF Sample1 | bootF Sample2 | bootp Full_Sample | bootp Sample1 | bootp Sample2 |
|---|---|---|---|---|---|---|---|---|
| 24 | attention_problems_r | cbcl | 3.897 (1.338) | 4.09 (1.425) | 1.225 (0.729) | 0.046 (0.069) | 0.057 (0.091) | 0.496 (0.261) |
| 25 | ext_factor | cbcl | 5.91 (1.675) | 4.743 (1.572) | 4.234 (1.385) | 0.01 (0.02) | 0.031 (0.054) | 0.075 (0.105) |
| 26 | int_factor | cbcl | 2.696 (1.127) | 1.086 (0.679) | 2.329 (1.042) | 0.122 (0.138) | 0.489 (0.255) | 0.248 (0.2) |
| 28 | p_factor | cbcl | 3.272 (1.255) | 2.877 (1.195) | 1.078 (0.66) | 0.076 (0.106) | 0.12 (0.147) | 0.525 (0.258) |
| 36 | social_problems_r | cbcl | 4.71 (1.491) | 3.738 (1.325) | 2.021 (0.942) | 0.027 (0.052) | 0.068 (0.099) | 0.328 (0.243) |
| 37 | thought_problems_r | cbcl | 0.559 (0.415) | 0.493 (0.378) | 0.646 (0.505) | 0.673 (0.222) | 0.727 (0.201) | 0.664 (0.234) |
| 8 | LMT_r | cog | 16.782 (2.898) | 11.447 (2.319) | 6.183 (1.767) | 0.0 (0.0) | 0.0 (0.0) | 0.004 (0.01) |
| 9 | Pattern_r | cog | 3.701 (1.345) | 2.771 (1.154) | 1.826 (0.96) | 0.037 (0.061) | 0.093 (0.118) | 0.23 (0.205) |
| 10 | Picture_r | cog | 14.995 (2.72) | 10.771 (2.234) | 5.384 (1.724) | 0.0 (0.0) | 0.0 (0.0) | 0.009 (0.021) |
| 11 | RAVLT_r | cog | 18.595 (3.163) | 8.997 (2.131) | 9.971 (2.333) | 0.0 (0.0) | 0.0 (0.004) | 0.0 (0.001) |
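
One way to start the interpretation step is to flag variables whose mean bootstrap p-value stays below .05 in the full sample and in both split samples. This consistency rule is an illustrative assumption, not a criterion stated in this notebook, and a mean p-value below .05 is not the same as the share of bootstrap replicates that reached significance. The sketch parses the `mean (SD)` strings back into numbers; `consistent_significance` is a hypothetical name, and the rows are copied from the rest table above.

```python
import pandas as pd

def consistent_significance(bootp, alpha=0.05):
    """True for rows whose mean bootstrap p is below alpha in every sample column.

    `bootp` holds 'mean (SD)' strings, one column per sample.
    """
    # Keep only the mean part of each 'mean (SD)' string and convert it to float
    means = bootp.apply(lambda col: col.str.extract(r'^(\S+) \((\S+)\)$')[0].astype(float))
    return (means < alpha).all(axis=1)

# bootp values copied from the rest table above
bootp = pd.DataFrame({
    'Full_Sample': ['0.01 (0.02)', '0.0 (0.0)', '0.0 (0.0)', '0.0 (0.0)', '0.037 (0.061)'],
    'Sample1':     ['0.031 (0.054)', '0.0 (0.0)', '0.0 (0.0)', '0.0 (0.004)', '0.093 (0.118)'],
    'Sample2':     ['0.075 (0.105)', '0.004 (0.01)', '0.009 (0.021)', '0.0 (0.001)', '0.23 (0.205)'],
}, index=['ext_factor', 'LMT_r', 'Picture_r', 'RAVLT_r', 'Pattern_r'])

flags = consistent_significance(bootp)
print(flags[flags].index.tolist())  # → ['LMT_r', 'Picture_r', 'RAVLT_r']
```

Since `rest_df['bootp']` selects one column per sample, the same function should apply to the pivoted table, e.g. `rest_df[consistent_significance(rest_df['bootp'])]`.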
