## Analyze statistical maps

Using data created with PrepareMaps.ipynb

Hypotheses to be tested:

Parametric effect of gain:

- 1: Positive effect in ventromedial PFC - for the equal indifference group
- 2: Positive effect in ventromedial PFC - for the equal range group
- 3: Positive effect in ventral striatum - for the equal indifference group
- 4: Positive effect in ventral striatum - for the equal range group

Parametric effect of loss:
- 5: Negative effect in VMPFC - for the equal indifference group
- 6: Negative effect in VMPFC - for the equal range group
- 7: Positive effect in amygdala - for the equal indifference group
- 8: Positive effect in amygdala - for the equal range group

Equal range vs. equal indifference:

- 9: Greater positive response to losses in amygdala for equal range condition vs. equal indifference condition.


In [1]:
import numpy,pandas
import nibabel
import json
import pickle
import os,glob
import nilearn.image
import nilearn.input_data
import nilearn.plotting
from collections import OrderedDict,Counter
import shutil
import warnings
import sklearn.metrics
import matplotlib.pyplot as plt
import seaborn
import scipy.cluster.hierarchy
import scipy.spatial.distance
import scipy.stats
from sklearn.cluster import AgglomerativeClustering
from utils import get_masked_data,get_metadata,get_decisions,get_teamID_to_collectionID_dict,matrix_jaccard
from narps import Narps,NarpsDirs

hypotheses= {1:'+gain: equal indiff',
            2:'+gain: equal range',
            3:'+gain: equal indiff',
            4:'+gain: equal range',
            5:'-loss: equal indiff',
            6:'-loss: equal range',
            7:'+loss: equal indiff',
            8:'+loss: equal range',
            9:'+loss:ER>EI'}

# we don't need to work with 3 and 4 because their maps are the same as 1 and 2, just different regions of interest
hypnums = [1,2,5,6,7,8,9]

# create some variables used throughout

cut_coords = [-24,-10,4,18,32,52,64]

unthresh_dataset_to_use = 'zstat' # or 'zstats' to limit to those with valid zstat images



In [2]:
# set an environment variable called NARPS_BASEDIR with location of base directory
if 'NARPS_BASEDIR' in os.environ:
    basedir = os.environ['NARPS_BASEDIR']
else:
    basedir = '/data'
assert os.path.exists(basedir)


narps = Narps(basedir,overwrite=False)
narps.load_data()

orig_dir = os.path.join(basedir,'orig')
metadata_dir = os.path.join(basedir,'metadata')
output_dir = narps.dirs.dirs['output']
figure_dir = os.path.join(basedir,'figures')
if not os.path.exists(figure_dir):
    os.mkdir(figure_dir)
template_img = narps.dirs.MNI_template
mask_img = narps.dirs.MNI_mask


found 54 input directories
found 54 teams with complete original datasets


### Load metadata

Metadata file contains details regarding the analysis of each team.

In [3]:
metadata = pandas.read_csv(os.path.join(metadata_dir,'all_metadata.csv')) 
#decisions = get_decisions(os.path.join(metadata_dir,'all_metadata.csv'))
#teamID_to_collectionID_dict = get_teamID_to_collectionID_dict(metadata)  

In [4]:
metadata.columns

Index(['Unnamed: 0', 'teamID', 'Decision', 'varnum', 'Similar', 'Confidence',
       'NV_collection_string', 'results_comments', 'preregistered',
       'link_preregistration_form', 'regions_definition', 'softwares',
       'TSc_SW', 'Unnamed: 8', 'n_participants', 'exclusions_details',
       'used_fmriprep_data', 'preprocessing_order', 'brain_extraction',
       'segmentation', 'slice_time_correction', 'motion_correction', 'motion',
       'gradient_distortion_correction', 'intra_subject_coreg',
       'distortion_correction', 'inter_subject_reg', 'intensity_correction',
       'intensity_normalization', 'noise_removal', 'volume_censoring',
       'spatial_smoothing', 'TSc_smoothing', 'preprocessing_comments',
       'data_submitted_to_model', 'spatial_region_modeled',
       'independent_vars_first_level', 'independent_vars_higher_level',
       'model_type', 'model_settings', 'inference_contrast_effect',
       'search_region', 'statistic_type', 'pval_computation',
       'multiple

In [5]:
Counter(metadata.query('hyp==1').TSc_SW)

Counter({'FSL': 16,
         'SPM12': 18,
         'SPM': 4,
         'nistats': 3,
         'randomise': 4,
         'AFNI': 8})

In [6]:
Counter(metadata.query('hyp==1').used_fmriprep_data)

Counter({'Yes': 30, 'No': 23})

## Diagnostics on statistical images

#### Overlap maps for thresholded images

Showing proportion of supra-threshold voxels across teams.

In [7]:
# display overlap maps for thresholded maps

masker = nilearn.input_data.NiftiMasker(mask_img=mask_img)
max_overlap = {}
fig, ax = plt.subplots(7,1,figsize=(12,24))
for i,hyp in enumerate(hypnums):
    imgfile=os.path.join(output_dir,'overlap_binarized_thresh/hypo%d.nii.gz'%hyp)
    nilearn.plotting.plot_stat_map(imgfile, threshold=0.1, display_mode="z", 
                colorbar=True,title='hyp %d:'%hyp+hypotheses[hyp],vmax=1.,cmap='jet',
                                  cut_coords = cut_coords,axes = ax[i],figure=fig)
    # compute the maximum overlap (proportion of teams) across voxels
    thresh_concat_file = os.path.join(output_dir,'thresh_concat_resampled/hypo%d.nii.gz'%hyp)
    thresh_concat_data = masker.fit_transform(thresh_concat_file)
    overlap = numpy.mean(thresh_concat_data,0)
    print('hyp%d'%hyp,numpy.max(overlap))
    max_overlap[hyp]=numpy.max(overlap)
plt.savefig(os.path.join(figure_dir,'overlap_map.png'))


hyp1 0.6111111
hyp2 0.7407407
hyp5 0.5185185
hyp6 0.3888889
hyp7 0.7222222
hyp8 0.537037
hyp9 0.24074075
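
The overlap value printed for each hypothesis is the largest per-voxel proportion of teams whose thresholded map is non-zero at that voxel. As a minimal sketch of that computation, assuming binarized maps arranged as a teams-by-voxels array (the toy values below are made up and are not NARPS data):

```python
import numpy

# Toy stand-in for the concatenated thresholded maps:
# rows are teams, columns are voxels, 1 = supra-threshold.
thresh_maps = numpy.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
])

# proportion of teams reporting each voxel as supra-threshold
overlap = thresh_maps.mean(axis=0)
print(overlap)        # per-voxel proportions
print(overlap.max())  # the summary value reported per hypothesis
```

A maximum of 0.75 here would mean that no voxel was reported by more than three of the four teams.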


#### Range and standard deviation maps

Showing range/standard deviation of statistical values in unthresholded maps across teams
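
The range and SD images plotted here are loaded from disk, so how they were computed is not shown in this notebook. The sketch below illustrates what such maps presumably contain, assuming a teams-by-voxels array of z values and a population SD (`ddof=0`); both the toy values and the `ddof` choice are assumptions.

```python
import numpy

# made-up z values: rows are teams, columns are voxels
zmaps = numpy.array([
    [1.0, -2.0, 0.5],
    [3.0,  0.0, 0.5],
    [2.0,  2.0, 0.5],
])

# per-voxel range across teams (max minus min)
range_map = zmaps.max(axis=0) - zmaps.min(axis=0)
# per-voxel standard deviation across teams (ddof=0 is an assumption)
std_map = zmaps.std(axis=0)

print(range_map)
print(std_map)
```

A voxel on which all teams agree (the third column) has zero range and zero SD.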

In [8]:
# show range maps
#fig = plt.fig
fig, ax = plt.subplots(7,1,figsize=(12,24))
for i,hyp in enumerate(hypnums):
    range_img=nibabel.load(os.path.join(output_dir,'unthresh_range_%s/hypo%d.nii.gz'%(unthresh_dataset_to_use,hyp)))
    nilearn.plotting.plot_stat_map(range_img, threshold=.1, display_mode="z", 
                           colorbar=True,title='Range: hyp %d:'%hyp+hypotheses[hyp],vmax=25,
                                  cut_coords = cut_coords,axes = ax[i])
plt.savefig(os.path.join(figure_dir,'range_map.pdf'))

In [9]:
# show std maps
fig, ax = plt.subplots(7,1,figsize=(12,24))
for i,hyp in enumerate(hypnums):
    std_img=nibabel.load(os.path.join(output_dir,'unthresh_std_%s/hypo%d.nii.gz'%(unthresh_dataset_to_use,hyp)))
    nilearn.plotting.plot_stat_map(std_img, threshold=.1, display_mode="z", 
                           colorbar=True,title='SD: hyp %d:'%hyp+hypotheses[hyp],vmax=4,
                                   cut_coords = cut_coords,axes = ax[i])
plt.savefig(os.path.join(figure_dir,'std_map.pdf'))

#### Display unthresholded maps 

Display rectified unthresholded maps for each team and compute some statistics on them.

In [10]:
imgtype='unthresh'
imginfo = {}
plot_data=False
show_md = False
nnz = []
nonzero_volume = []

dim_values = []
missing_metadata = []


for hyp in hypnums:
    hmaps = glob.glob(os.path.join(output_dir,'%s/*/hypo%d_unthresh.nii.gz'%(unthresh_dataset_to_use,hyp)))
    hmaps.sort()
    fig, ax = plt.subplots(len(hmaps),1,figsize=(12,len(hmaps)*2.5))
    print('making figure for hypothesis',hyp,len(hmaps),'maps')
    # load all maps and get dims
    for i,m in enumerate(hmaps):
        img = nibabel.load(m)
        collection = m.split('/')[-2]
        collection_string,teamID = collection.split('_')
        dims = img.header.get_data_shape()
        dim_values.append(dims)
        
        md = metadata.query('varnum==%d'%hyp).query('NV_collection_string == "%s"'%collection_string).replace(numpy.nan,'na')
        if md.shape[0]==0:
            # try other identifier
            md = metadata.query('varnum==%d'%hyp).query('teamID == "%s"'%teamID)
            if md.shape[0]==0:
                missing_metadata.append(collection)
                continue
        qform = img.header.get_qform()

        # check for thresholding
        imgdata=img.get_fdata()
        nonzero_vox = numpy.nonzero(imgdata)
        n_nonzero_vox = len(nonzero_vox[0])
        nnz.append(n_nonzero_vox)
        # voxel volume (mm^3) comes from the voxel sizes, not the image dimensions
        vox_vol = numpy.prod(img.header.get_zooms()[:3])
        nonzero_volume.append(n_nonzero_vox*vox_vol)
        #print(collection,dims, numpy.prod(dims),n_nonzero_vox*vox_vol)
        if show_md:
            print(md['inter_subject_reg'].values)
        if plot_data:
            if md['used_fmriprep_data'].values[0].find('Yes')>-1:
                prep_string = 'fmriprep'
            else:
                prep_string = 'other'
            nilearn.plotting.plot_stat_map(img, threshold=2., display_mode="z", 
                           colorbar=True,title='_'.join([collection,md['TSc_SW'].values[0],prep_string]),
                                          cut_coords = cut_coords,axes=ax[i])
    if plot_data:
        plt.savefig(os.path.join(figure_dir,'hyp%d_individual_maps.pdf'%hyp))


making figure for hypothesis 1 53 maps
making figure for hypothesis 2 53 maps
making figure for hypothesis 5 53 maps
making figure for hypothesis 6 53 maps
making figure for hypothesis 7 53 maps
making figure for hypothesis 8 53 maps
making figure for hypothesis 9 53 maps
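
One of the statistics collected in this cell is the number of non-zero voxels, which helps flag maps that were thresholded before submission. A minimal sketch of that check on a synthetic volume follows; the array and the 2 mm voxel size are hypothetical, and with real images the voxel sizes would come from the image header.

```python
import numpy

# hypothetical 3D statistical map with most voxels set to zero
data = numpy.zeros((4, 4, 4))
data[1:3, 1:3, 1:3] = 2.5            # 8 non-zero voxels
voxel_size_mm = (2.0, 2.0, 2.0)      # assumed voxel dimensions in mm

n_nonzero = int(numpy.count_nonzero(data))
voxel_volume = float(numpy.prod(voxel_size_mm))   # mm^3 per voxel
nonzero_volume_mm3 = n_nonzero * voxel_volume
prop_nonzero = n_nonzero / data.size

print(n_nonzero, nonzero_volume_mm3, prop_nonzero)
```

A very small proportion of non-zero voxels inside the brain mask suggests that an "unthresholded" map was in fact thresholded.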


#### Correlation maps for unthresholded images

Load the unthresholded images and compute the correlation between each pair of maps. These correlation matrices are clustered using Ward clustering, with the number of clusters for each hypothesis determined by visual examination.
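
The pipeline described here can be sketched on synthetic data. The example assumes six toy "maps" built from two unrelated patterns plus noise; the data, seed, and the choice of two clusters are illustrative only.

```python
import numpy
import scipy.stats
import scipy.cluster.hierarchy

rng = numpy.random.default_rng(0)
n_vox = 500
pattern_a = rng.normal(size=n_vox)
pattern_b = rng.normal(size=n_vox)

# six toy maps (rows): three noisy copies of each pattern
maps = numpy.vstack(
    [pattern_a + 0.1 * rng.normal(size=n_vox) for _ in range(3)]
    + [pattern_b + 0.1 * rng.normal(size=n_vox) for _ in range(3)]
)

# spearmanr treats columns as variables, hence the transpose
cc = scipy.stats.spearmanr(maps.T).correlation

# Ward linkage on the rows of the correlation matrix, then cut the tree
linkage = scipy.cluster.hierarchy.ward(cc)
clustlabels = scipy.cluster.hierarchy.cut_tree(linkage, n_clusters=2)[:, 0]
print(clustlabels)
```

Each row of the correlation matrix is treated as an observation, so two maps end up in the same cluster when they correlate similarly with all other maps.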

In [11]:
dendrograms = {}
membership={}
cluster_colors = ['r','g','b','y','k']

corr_type = 'spearman'
n_clusters = {1:4,2:3,5:4,6:3,7:4,8:4,9:3}
use_dynamicTreeCut = False

cc_unthresh={}
for i,hyp in enumerate(hypnums):
    print('hypothesis',hyp)
    maskdata,labels = get_masked_data(hyp,mask_img,output_dir,dataset=unthresh_dataset_to_use)        
    if corr_type == 'spearman':
        cc = scipy.stats.spearmanr(maskdata.T).correlation
    else:
        cc = numpy.corrcoef(maskdata)
    cc = numpy.nan_to_num(cc)
    df = pandas.DataFrame(cc,index=labels,columns=labels)
    
    ward_linkage = scipy.cluster.hierarchy.ward(cc)
    distances = scipy.spatial.distance.pdist(cc, "euclidean")
    if use_dynamicTreeCut:
        try:
            clusters = dynamicTreeCut.cutreeHybrid(ward_linkage, distances,
                                               minClusterSize = 1)
        except: # sometimes it breaks with smaller clusters
            clusters = dynamicTreeCut.cutreeHybrid(ward_linkage, distances,
                                                  minClusterSize = 12)

        
        clustlabels = [s-1 for s in clusters['labels']]
    else:
        clustlabels = [s[0] for s in scipy.cluster.hierarchy.cut_tree(ward_linkage,n_clusters=n_clusters[hyp])]
        
    # get decisions for column colors
    md = metadata.query('varnum==%d'%hyp).set_index('teamID')
    
    col_colors = [cluster_colors[md.loc[teamID,'Decision']] for teamID in labels]
    
    row_colors = [cluster_colors[s-1] for s in clustlabels]
    cm = seaborn.clustermap(df,cmap='vlag',figsize=(16,16),method='ward',
                            row_colors=row_colors,col_colors=col_colors,center=0,vmin=-1,vmax=1)
    plt.title('hyp %d:'%hyp+hypotheses[hyp])
    cc_unthresh[hyp]=(cc,labels)
    plt.savefig(os.path.join(figure_dir,'hyp%d_%s_map_unthresh.pdf'%(hyp,corr_type)))
    dendrograms[hyp]=ward_linkage
    
    # get cluster membership
    membership[hyp]={}
    for j in cm.dendrogram_row.reordered_ind:
        cl=clustlabels[j]
        if not cl in membership[hyp]:
            membership[hyp][cl]=[]
        membership[hyp][cl].append(labels[j])

    
with open(os.path.join(output_dir,'unthresh_dendrograms_%s.pkl'%corr_type),'wb') as f:
    pickle.dump((dendrograms,membership),f)  

hypothesis 1
hypothesis 2
hypothesis 5
hypothesis 6
hypothesis 7
hypothesis 8
hypothesis 9


### Clustering of unthresholded images

Use the dendrogram computed by seaborn clustermap to identify clusters, and then create a separate mean statistical map for each cluster.
The number of clusters (3 or 4, depending on the hypothesis) was selected based on visualization of the clustering solutions.

*NB*: The cluster numbers do not align with the ordering of the clusters from left to right in the heatmap dendrograms.
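
Conceptually, the per-cluster mean map is the voxelwise average of the maps assigned to that cluster. A minimal sketch on a toy teams-by-voxels array (made-up values and labels):

```python
import numpy

# rows are teams, columns are voxels
maps = numpy.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [10.0, 20.0],
])
clustlabels = numpy.array([0, 0, 1])   # cluster assignment per team

# voxelwise mean of the maps belonging to each cluster
cluster_means = {
    int(cl): maps[clustlabels == cl].mean(axis=0)
    for cl in numpy.unique(clustlabels)
}
print(cluster_means)
```

In the notebook the same averaging is done on masked image data, and the result is mapped back into image space for plotting.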


In [12]:
# use clustering from seaborn to separate the different clusters into images

corr_type = 'spearman'
mean_smoothing = {}
mean_decision = {}
thresh = 2 # for plotting


# load the dendrograms and cluster membership saved by the previous cell
with open(os.path.join(output_dir,'unthresh_dendrograms_%s.pkl'%corr_type),'rb') as f:
    dendrograms,membership = pickle.load(f)

use_surface = False  # this doesn't work very well, so not using it for now

if use_surface:
    if not os.path.exists(os.path.join(figure_dir,'cluster_surface_plots')):
        os.mkdir(os.path.join(figure_dir,'cluster_surface_plots'))

masker = nilearn.input_data.NiftiMasker(mask_img=mask_img)

for i,hyp in enumerate(hypnums):
    n_clusters=4
    print('hyp',hyp)
    clusters = list(membership[hyp].keys())
    clusters.sort()
    if not use_surface:
        fig, ax = plt.subplots(len(clusters),1,figsize=(12,12))
    mean_smoothing[hyp]={}
    mean_decision[hyp]={}
    for i,cl in enumerate(clusters):
        # get all images for this cluster and average them
        member_maps = []
        members_cids = []
        member_smoothing = []
        member_decision = []
        for member in membership[hyp][cl]:
            member_md = metadata.query('varnum==%d'%hyp).query('teamID=="%s"'%member)
            cid = narps.teams[member].datadir_label
            infile = os.path.join(output_dir,'%s/%s/hypo%d_unthresh.nii.gz'%(unthresh_dataset_to_use,cid,hyp))
            if os.path.exists(infile):
                member_maps.append(infile)
                member_smoothing.append(metadata.query('varnum==%d'%hyp).query('teamID=="%s"'%member)['fwhm'].iloc[0])
                member_decision.append(metadata.query('varnum==%d'%hyp).query('teamID=="%s"'%member)['Decision'].iloc[0])
                
        #members_metadata = 
        print('found %d maps'%len(member_maps))
        mean_smoothing[hyp][cl]=numpy.mean(numpy.array(member_smoothing))
        mean_decision[hyp][cl]=numpy.mean(numpy.array(member_decision))
        print('mean fwhm:',mean_smoothing[hyp][cl])
        print('pYes:',mean_decision[hyp][cl])
        maskdata = masker.fit_transform(member_maps)
        meandata = numpy.mean(maskdata,0)
        mean_img = masker.inverse_transform(meandata)
        
        if use_surface:
            splot = nilearn.plotting.view_img_on_surf(mean_img,threshold=thresh)
            splot.save_as_html(os.path.join(figure_dir,'cluster_surface_plots/hyp%d_cluster%d_means_surf.html'%(hyp,cl)))
        else:
            nilearn.plotting.plot_stat_map(mean_img, threshold=thresh, display_mode="z", 
                    colorbar=True,title='hyp%d - cluster%d (fwhm=%0.2f, pYes = %0.2f)'%(hyp,cl,
                                                        mean_smoothing[hyp][cl],mean_decision[hyp][cl]),
                            cut_coords = cut_coords,axes=ax[i])

    # save once per hypothesis, after all clusters have been plotted
    if not use_surface:
        plt.savefig(os.path.join(figure_dir,'hyp%d_cluster_means.pdf'%hyp))
        

hyp 1
found 27 maps
mean fwhm: 11.152540386628276
pYes: 0.4444444444444444
found 18 maps
mean fwhm: 8.2520832327271
pYes: 0.2222222222222222
found 4 maps
mean fwhm: 9.243985771784967
pYes: 0.25
found 4 maps
mean fwhm: 11.295623277223605
pYes: 0.25
hyp 2
found 24 maps
mean fwhm: 9.890770732220277
pYes: 0.25
found 21 maps
mean fwhm: 8.99036053333939
pYes: 0.09523809523809523
found 8 maps
mean fwhm: 9.786418124715906
pYes: 0.375
hyp 5
found 30 maps
mean fwhm: 11.39896058326638
pYes: 0.9666666666666667
found 5 maps
mean fwhm: 10.161925342533033
pYes: 0.6
found 15 maps
mean fwhm: 7.45208877289627
pYes: 0.8
found 3 maps
mean fwhm: 8.495577984012028
pYes: 0.0
hyp 6




found 25 maps
mean fwhm: 8.134521558613267
pYes: 0.24
found 6 maps
mean fwhm: 8.958562398766805
pYes: 0.3333333333333333
found 22 maps
mean fwhm: 8.128358730932193
pYes: 0.3181818181818182
hyp 7




found 24 maps
mean fwhm: 11.450006610236498
pYes: 0.0
found 20 maps
mean fwhm: 8.748178343145671
pYes: 0.05
found 6 maps
mean fwhm: 7.554460760491547
pYes: 0.0
found 3 maps
mean fwhm: 11.119750285768134
pYes: 0.0
hyp 8




found 28 maps
mean fwhm: 8.270546265881936
pYes: 0.0
found 12 maps
mean fwhm: 9.014586967188707
pYes: 0.0
found 10 maps
mean fwhm: 7.094863274247362
pYes: 0.0
found 3 maps
mean fwhm: 8.608360097228086
pYes: 0.3333333333333333
hyp 9




found 35 maps
mean fwhm: 8.708953524284116
pYes: 0.02857142857142857
found 10 maps
mean fwhm: 7.3127345766993646
pYes: 0.0
found 8 maps
mean fwhm: 8.312796290208139
pYes: 0.0


In [13]:
# create a data frame containing cluster metadata
smoothness = pandas.read_csv(os.path.join(metadata_dir,'smoothness_est.csv'))

cluster_metadata={}
cluster_metadata_df = pandas.DataFrame(columns = ['hyp%d'%i for i in hypnums],
                                      index=metadata.teamID)
for i,hyp in enumerate(hypnums):
    cluster_metadata[hyp]={}
    print('Hypothesis',hyp)
    clusters = list(membership[hyp].keys())
    clusters.sort()
    for i,cl in enumerate(clusters):
        print('cluster %d (%s)'%(cl,cluster_colors[i-1]))
        print(membership[hyp][cl])
        cluster_metadata[hyp][cl]=metadata[metadata.teamID.isin(membership[hyp][cl])]
        for m in membership[hyp][cl]:
            cluster_metadata_df.loc[m,'hyp%d'%hyp]=cl
        
    print('')

cluster_metadata_df = cluster_metadata_df.dropna()

        

Hypothesis 1
cluster 0 (k)
['UI76', '2T6S', 'T54A', '27SS', '6VV2', 'X1Y5', 'DC61', '9U7M', '3C6G', 'C88N', 'J7F9', '46CD', 'C22U', 'I52Y', 'VG39', 'R9K3', 'R7D1', 'E6R3', '3TR7', 'Q6O0', 'U26C', 'L7J7', 'K9P0', 'X19V', '9Q6R', 'B5I6', '50GV']
cluster 1 (r)
['0ED6', 'R5K7', 'O6R6', 'SM54', 'O03M', 'B23O', '08MQ', '1KB2', '94GU', '0I4U', '51PW', '5G9K', '3PQ2', 'I9D6', '0JO0', 'AO86', '43FJ', 'O21U']
cluster 2 (g)
['L9G5', '9T8E', 'XU70', 'R42Q']
cluster 3 (b)
['UK24', '1P0Y', '80GC', 'IZ20']

Hypothesis 2
cluster 0 (k)
['2T6S', '3TR7', 'Q6O0', 'U26C', 'L7J7', 'DC61', '50GV', 'O21U', 'X1Y5', 'B5I6', '9Q6R', 'K9P0', 'X19V', '27SS', '6VV2', 'UI76', 'R7D1', 'E6R3', 'R9K3', 'J7F9', 'C22U', 'I52Y', 'C88N', '46CD']
cluster 1 (r)
['L9G5', '0ED6', 'R5K7', '3PQ2', '5G9K', 'I9D6', '0JO0', 'O03M', 'B23O', '3C6G', '43FJ', '9U7M', 'AO86', '9T8E', '08MQ', 'SM54', '0I4U', '1KB2', '1P0Y', '94GU', '51PW']
cluster 2 (g)
['VG39', '80GC', 'IZ20', 'T54A', 'O6R6', 'R42Q', 'XU70', 'UK24']

Hypothesis 5
cluste

In [15]:
# create membership data frame for computing cluster similarity across hypotheses

randmtx = numpy.zeros((10,10))
for i,j in enumerate(hypnums):
    for k in hypnums[i:]:
        if j==k:
            continue
        randmtx[j,k]=sklearn.metrics.adjusted_rand_score(cluster_metadata_df['hyp%d'%j],cluster_metadata_df['hyp%d'%k])
        if randmtx[j,k]>.2:
            print(j,k,randmtx[j,k])
        


1 2 0.5928993036048549
1 6 0.27824595533836705
1 7 0.3235055949597047
1 8 0.5502648969609603
2 6 0.3255885915439646
2 7 0.2649898620925434
2 8 0.48603833340485864
5 6 0.35204566989905917
5 7 0.3263584252564892
6 7 0.2717385443779359
6 8 0.4595361454186065
7 8 0.3443201363495264
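
The adjusted Rand index compares two partitions of the same teams while ignoring how the clusters are numbered, which matters here because cluster labels are arbitrary across hypotheses. The hand-rolled function below is only a sketch to show what the score measures; the notebook itself uses `sklearn.metrics.adjusted_rand_score`, and `adjusted_rand` is a hypothetical name.

```python
from collections import Counter
from math import comb

def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index from pair counts (undefined for trivial partitions)."""
    n = len(labels_a)
    # pairs of items placed together in both partitions
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)   # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    return (pairs_both - expected) / (max_index - expected)

# same grouping with swapped label names: perfect agreement
same = adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0])
# groupings that never agree on a pair: worse than chance
different = adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])
print(same, different)
```

Values near 0 indicate chance-level agreement, so the scores of roughly 0.3 to 0.6 printed above reflect partial but far from perfect consistency of cluster membership across hypotheses.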


### Determine distance from mean and compare across hypotheses



In [16]:
mean_corr=pandas.DataFrame(numpy.zeros((len(labels),len(hypnums))),
                                columns = ['hyp%d'%i for i in hypnums],
                                index=labels)
for i,hyp in enumerate(hypnums):
    print('hypothesis',hyp)
    maskdata,labels = get_masked_data(hyp,mask_img,output_dir,dataset=unthresh_dataset_to_use)   
    meandata = numpy.mean(maskdata,0)
    for t in range(maskdata.shape[0]):
        mean_corr.iloc[t,i] = scipy.stats.spearmanr(maskdata[t,:],meandata).correlation

hypothesis 1
hypothesis 2
hypothesis 5
hypothesis 6
hypothesis 7
hypothesis 8
hypothesis 9
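
The quantity computed for each team is the Spearman correlation between its map and the mean map across teams, so a low value marks a team whose pattern departs from the consensus. A sketch on synthetic data, assuming three maps that share a pattern and one unrelated map (data and seed are made up):

```python
import numpy
import scipy.stats

rng = numpy.random.default_rng(1)
n_vox = 1000
shared = rng.normal(size=n_vox)

# three teams share a pattern; the fourth has an unrelated one
maps = numpy.vstack(
    [shared + 0.3 * rng.normal(size=n_vox) for _ in range(3)]
    + [rng.normal(size=n_vox)]
)

mean_map = maps.mean(axis=0)
corr_with_mean = numpy.array(
    [scipy.stats.spearmanr(m, mean_map).correlation for m in maps]
)
print(corr_with_mean)
```

Note that the mean includes each team's own map, so even an unrelated map correlates somewhat with it; the effect shrinks as the number of teams grows.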


#### Is distance from the mean similar across hypotheses?



In [17]:
mean_corr.corr(method='spearman')

Unnamed: 0,hyp1,hyp2,hyp5,hyp6,hyp7,hyp8,hyp9
hyp1,1.0,0.793904,0.496533,0.462587,0.703354,0.797855,0.470811
hyp2,0.793904,1.0,0.638849,0.63643,0.674085,0.785518,0.542977
hyp5,0.496533,0.638849,1.0,0.920013,0.718029,0.63522,0.424609
hyp6,0.462587,0.63643,0.920013,1.0,0.670537,0.660458,0.414933
hyp7,0.703354,0.674085,0.718029,0.670537,1.0,0.889856,0.460087
hyp8,0.797855,0.785518,0.63522,0.660458,0.889856,1.0,0.481132
hyp9,0.470811,0.542977,0.424609,0.414933,0.460087,0.481132,1.0


### Plot distance from mean across teams

Plot the median correlation with the mean pattern for all teams, then separately for the teams with particularly low (<0.2) and particularly high (>0.7) median correlations.

In [18]:
median_distance = mean_corr.median(1).sort_values()
plt.bar(median_distance.index,median_distance)

<BarContainer object of 53 artists>

In [19]:
median_distance_low = median_distance[median_distance<0.2]
plt.bar(median_distance_low.index,median_distance_low)

<BarContainer object of 4 artists>

In [20]:
median_distance_high = median_distance[median_distance>0.7]
plt.bar(median_distance_high.index,median_distance_high)
median_distance_high.shape

(24,)

In [21]:
median_distance_df = pandas.DataFrame(median_distance,columns=['mean_distance'])
# combine with metadata
median_distance_df.to_csv(os.path.join(metadata_dir,'mean_pattern_distance.csv'))

### Closer look at metadata

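Focusing on Hypothesis 1, for each cluster print a summary of the metadata along with some details for each team. Because `cluster_metadata` is not restricted to a single hypothesis, each team contributes one row per hypothesis, so the counts below are inflated accordingly (cluster 0 has 27 teams but 243 rows).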

In [22]:
# look more closely at the data for hyp 1, to try to better understand the clusters

hyp=1
strings_to_find = [('motion','movement'),'compcor']
for cl in range(4):
    print('cluster',cl)
    md = cluster_metadata[hyp][cl]
    print(Counter(md.TSc_SW))
    print(Counter(md.used_fmriprep_data))
    found={}
    for s in strings_to_find:
        found[s]=[]
    for i in md.index:
        print(i,md.loc[i,'TSc_SW'],md.loc[i,'used_fmriprep_data'])
        # lower-cased model description for this team
        desc = str(md.loc[i,'independent_vars_first_level']).lower()
        for s in strings_to_find:
            # a tuple holds synonyms; record the team once if any of them appears
            terms = s if isinstance(s, tuple) else (s,)
            if any(term in desc for term in terms):
                found[s].append(i)
        print(md.loc[i,'independent_vars_first_level'])
        print('')
    for s in strings_to_find:
        if isinstance(s, tuple):
            ss=s[0]
        else:
            ss=s
        print('found possible modeling of %s in %0.2f percent'%(ss,100*(len(found[s])/md.shape[0])))
    print('')

cluster 0
Counter({'SPM12': 90, 'FSL': 54, 'randomise': 36, 'AFNI': 36, 'SPM': 27})
Counter({'Yes': 171, 'No': 72})
0 FSL Yes
In FSL FEAT, we used an event-related design with three predictors: (1) the duration of the gamble choice period, (2) parametric modulation of the gain amount, and (3) parametric modulation of the loss amount. All 3 regressors were convolved with a double-gamma HRF. No orthogonalization of regressors was applied.

1 FSL No
Event-related design predictors:
- Modeled duration = 4
- EVs  (3): Mean-centered Gain, Mean-Centered Loss, Events (constant)
Block design:
- baseline not explicitly modeled
HRF:
- FMRIB's Linear Optimal Basis Sets
Movement regressors:
- FD, six parameters (x, y, z, RotX, RotY, RotZ)

3 SPM12 Yes
First-level analyses were performed using a GLM in an event related design. Stimulus onsets (gambling task) were convolved with the canonical HRF basis function as provided by the SPM with a stimulus duration of 4 s. We additionally included two param

#### SKIP FOR NOW Similarity maps for thresholded images

For each pair of thresholded images, compute the similarity of the thresholded/binarized maps using the Jaccard coefficient.
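
The Jaccard coefficient of two binarized maps is the number of voxels that are supra-threshold in both, divided by the number that are supra-threshold in either. The function below is an illustrative sketch; `jaccard` is a hypothetical helper and is not the `matrix_jaccard` function imported from `utils`, whose interface is not shown here.

```python
import numpy

def jaccard(map_a, map_b):
    """Jaccard coefficient of two maps after binarizing at non-zero."""
    a = numpy.asarray(map_a) != 0
    b = numpy.asarray(map_b) != 0
    union = numpy.logical_or(a, b).sum()
    if union == 0:
        return float('nan')   # undefined when both maps are empty
    return float(numpy.logical_and(a, b).sum() / union)

# made-up thresholded values for five voxels
thresh_a = numpy.array([0.0, 3.1, 4.2, 0.0, 2.9])
thresh_b = numpy.array([0.0, 0.0, 3.5, 2.7, 3.3])
j = jaccard(thresh_a, thresh_b)
print(j)
```

Returning NaN for two empty maps is a design choice in this sketch; it keeps maps with no supra-threshold voxels from being counted as either identical or disjoint.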