# Intersubject Correlation: Synchronizing our Social Cognition (Part 2)
In the previous notebook, [isc-generation](https://github.com/lindseytepfer/psypose-isc/blob/main/code/isc-generation.ipynb), we correlated each subject with the other subjects who watched the same movie, repeating this correlation every 20 seconds across the full length of the movie. We then averaged across subjects (within each movie) so that we could model the averaged data with a general linear model (GLM).
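As a reminder of what that step looks like, here is a minimal sketch of a windowed, leave-one-out ISC on synthetic data. The 20-sample window comes from the description above; the leave-one-out scheme, the one-sample step size, and the function name `windowed_loo_isc` are assumptions made for illustration and may differ from what the previous notebook actually does.

```python
import numpy as np

def windowed_loo_isc(data, window=20):
    """Hypothetical helper: windowed leave-one-out ISC.

    data: array of shape (n_subjects, n_timepoints) for a single voxel or region.
    Returns an array of shape (n_subjects, n_timepoints - window + 1) holding, for each
    window start, the Pearson correlation between one subject and the mean of the others.
    """
    n_subjects, n_timepoints = data.shape
    n_windows = n_timepoints - window + 1
    isc = np.full((n_subjects, n_windows), np.nan)
    total = data.sum(axis=0)
    for s in range(n_subjects):
        # Mean time series of everyone except subject s
        others = (total - data[s]) / (n_subjects - 1)
        for t in range(n_windows):
            isc[s, t] = np.corrcoef(data[s, t:t + window], others[t:t + window])[0, 1]
    return isc

# Synthetic example: four subjects sharing one signal plus independent noise
rng = np.random.default_rng(0)
shared = rng.standard_normal(120)
data = shared + 0.5 * rng.standard_normal((4, 120))

isc = windowed_loo_isc(data, window=20)
avg_isc = isc.mean(axis=0)  # average across subjects, one value per window
```

The averaged time series (`avg_isc` here) plays the role of the dependent variable in the GLM later in this notebook.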

Before we get to the model, we need to prepare a few things. First, we get the regressors themselves in order. Then, before creating a design matrix, we make sure that the regressors and the averaged ISC data have the same shape.
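To make the shape requirement concrete, the sketch below truncates a regressor table and a 4D array to their shared number of timepoints. The helper name `match_lengths` and the choice to truncate from the end are assumptions for illustration; the right way to reconcile the two depends on why the lengths differ in the first place.

```python
import numpy as np
import pandas as pd

def match_lengths(regressors, isc_data):
    """Hypothetical helper: truncate both inputs to their shared number of timepoints.

    regressors: DataFrame with one row per timepoint.
    isc_data: array with time on the last axis (as in a 4D NIfTI image).
    """
    n = min(len(regressors), isc_data.shape[-1])
    return regressors.iloc[:n].reset_index(drop=True), isc_data[..., :n]

# Toy inputs: 105 rows of regressor values, 100 volumes of ISC data
regressors = pd.DataFrame({"person": np.arange(105)})
isc_data = np.zeros((2, 2, 2, 100))

regressors_matched, isc_matched = match_lengths(regressors, isc_data)
```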

In [None]:
# keys: task names
tasknames = ['12yearsaslave', '500daysofsummer', 'backtothefuture', 'citizenfour',
             'littlemisssunshine', 'pulpfiction', 'split', 'theprestige',
             'theshawshankredemption', 'theusualsuspects']
# values: the matching video names
vidnames = ['12_years_a_slave', '500_days_of_summer', 'back_to_the_future', 'citizenfour',
            'little_miss_sunshine', 'pulp_fiction', 'split', 'the_prestige',
            'the_shawshank_redemption', 'the_usual_suspects']

# Map each task name to its video name
tasktovidmap = dict(zip(tasknames, vidnames))

## Speaker Change Detection


In [None]:
import os

import pandas as pd

isc_outs = "/Volumes/Scraplab/psypose_fmri/isc_analysis/"

for task in tasknames:
    diar_df = pd.read_csv(isc_outs+task+os.sep+task+"_diarization_cleaned.csv")
    pose_df = pd.read_csv(isc_outs+task+os.sep+task+"_regressor.csv")
    
    # Expand each diarization segment into one entry per second it covers.
    # Start and stop are truncated to whole seconds, and stop is inclusive.
    # If two segments cover the same second, the later segment's speaker is kept.
    speaker_tracks = {}
    for start, stop, speaker in zip(diar_df["start"], diar_df["stop"], diar_df["speaker"]):
        for second in range(int(start), int(stop) + 1):
            speaker_tracks[second] = speaker
    
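    # One row per second of the movie, aligned to the pose regressor's index
    df = pd.DataFrame({"seconds": pose_df.index})
    df["speaker_change"] = 0
    # Object dtype lets the column hold speaker labels alongside the 0 placeholder (no speaker)
    df["speaker"] = pd.Series(0, index=df.index, dtype=object)

    # Label each second with its speaker
    for t, speaker in speaker_tracks.items():
        df.loc[df["seconds"] == t, "speaker"] = speaker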
                
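    # Keep only the seconds that have a speaker; copy so the edits below do not act on a view of df
    df2 = df.loc[df["speaker"] != 0].copy()
    df2_index = df2.index

    # Flag a change at each labelled second whose speaker differs from the previous labelled second
    for prev, curr in zip(df2_index[:-1], df2_index[1:]):
        if df2.loc[prev, "speaker"] != df2.loc[curr, "speaker"]:
            df2.loc[curr, "speaker_change"] = 1

    # Realign to every second of the movie; seconds without a speaker get an integer 0
    df["speaker_change"] = df2["speaker_change"].reindex(df.index, fill_value=0)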
    df.to_csv(isc_outs+task+os.sep+task+"_speaker_change.csv")
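The same change-detection idea can be expressed without an explicit loop by comparing each labelled second to the one before it with `shift`. The sketch below runs on a small made-up table (the speaker labels `"A"` and `"B"` are invented for illustration) and, like the cell above, ignores unlabelled seconds when deciding what counts as the previous speaker.

```python
import pandas as pd

# Toy data: 0 marks seconds with no detected speaker
toy = pd.DataFrame({
    "seconds": range(8),
    "speaker": ["A", "A", 0, "B", "B", "B", 0, "A"],
})

# Compare each labelled second to the previous labelled second
spoken = toy.loc[toy["speaker"] != 0, "speaker"]
changed = spoken.ne(spoken.shift()).astype(int)
if len(changed):
    changed.iloc[0] = 0  # the first labelled second has no predecessor, so it is not a change

# Realign to every second; unlabelled seconds get 0
toy["speaker_change"] = changed.reindex(toy.index, fill_value=0)
```

Here the changes land on seconds 3 (A to B) and 7 (B to A), even though a silent second sits between the two speakers in each case.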

## Speaker Overlap

## Running a GLM
Now that all of our regressors have been generated and we have made sure that their shapes match the averaged ISC data, we can run our GLM.

In [None]:
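import nibabel as nb
from nltools.data import Brain_Data, Design_Matrix

task_betas = []
TR = 1  # repetition time in seconds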

for task in tasknames[0:1]:
    #First, load the movie's average ISC; this is our dependent variable.
    avg_isc_nb = nb.load(isc_outs+task+"/"+task+"_average_isc.nii.gz")
    #Trim the last 20 volumes (20 seconds at TR = 1), where the rolling window extends beyond valid correlations.
    avg_isc_trimmed = avg_isc_nb.slicer[...,:avg_isc_nb.shape[3]-20]
    avg_isc = Brain_Data(avg_isc_trimmed)
    
    #Next, because the task and video names are formatted differently, we look up the video name for this task
    video = tasktovidmap[task]
    
    #Now we can load that video's regressor and convolve it
    vid_regr = pd.read_csv(isc_outs+task+"/"+video+'_person_regr_shifted.csv')
    dm = Design_Matrix(vid_regr, sampling_freq=1./TR) 
    dm = dm.convolve()
    dm = dm.add_poly(order=0)
    
    #We ensure that any NaNs in our design matrix are filled, and remove columns with duplicate data 
    dm_cleaned = dm.clean(verbose=True)
    dm_cleaned.to_csv(isc_outs+task+os.sep+task+"_design_matrix.csv")

    #Set the cleaned design matrix (dm_cleaned) as the X attribute of the Brain_Data object (avg_isc)
    avg_isc.X = dm_cleaned
    stats = avg_isc.regress()
    
    #write our results to a beta map nii file. 
    stats['beta'].write(isc_outs+task+"/"+video+'_betamap.nii.gz')
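For intuition about what the regression step produces, the sketch below fits an ordinary least squares model to synthetic data with plain NumPy: one regressor plus an intercept, fitted independently for each of a handful of "voxels". All values here are made up, and this is only a conceptual illustration; it does not reproduce the convolution or cleaning steps, and it is not a claim about how nltools implements `regress` internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_voxels = 200, 5

# Design matrix: one regressor plus a constant column (the order-0 polynomial)
regressor = rng.standard_normal(n_timepoints)
X = np.column_stack([regressor, np.ones(n_timepoints)])

# Synthetic data: each voxel has its own slope, a shared intercept, and a little noise
true_betas = np.array([
    [0.5, 0.0, -0.3, 1.0, 0.2],  # slopes on the regressor
    [0.1, 0.1, 0.1, 0.1, 0.1],   # intercepts
])
Y = X @ true_betas + 0.05 * rng.standard_normal((n_timepoints, n_voxels))

# Least squares solves for all voxels at once: one column of betas per voxel
betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Each row of `betas` corresponds to one column of the design matrix, which is why the beta map written above has one image per regressor.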
