# Creating EVs
In order to analyze event-based fMRI data, we need to create an **event vector (EV)**, for each scan. An EV is a dataframe that contains information regarding when events happened during the scan, how long they lasted, and what categories the events belong to. This vector gets plugged into the analysis in order to build the linear model for brain activation related to particular events. 

A typical EV contains three columns, `condition`, `onset`, and `duration`. In this notebook, we will read in the `psychopy_csv` directory that contains the output from PsychoPy and use these files to create the EVs for run 1 and run 2 for each subject. These are the EV's that we are going to use in our analyses, so we will need to make sure that they are correct. 

Note that the timings in these EVs are based on the *un-trimmed* epi files, so we will need to trim them in our final stage of our pre-processing pipeline to subtract 12\* seconds (6 TRs) from the `onset` column.

<mark>\* **TODO:**</mark> We need to confirm how many triggers the code waits for at the beginning in order to begin the scan. I recall that it's five, but it might have 6.

In [14]:
import glob
import os
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.max_columns = 100

import numpy as np

First, we will read in one of the `.csv` files to get a sense of what it looks like. 

In [17]:
test_file = "../psychopy_csv/3/302_RunOne_2019_Aug_08_1306.csv"

In [18]:
test_df = pd.read_csv(test_file)

---
So it looks like we will need to use the `trigger_count` variable as our `onset` for now, although we could calculate an onset in seconds from the `routine_time` and `global_time` variables if we wanted to eventually. We will use the `cat` column as our `condition` and set our `duration` to 2 seconds (1 TR), as it looks like this is how long the images remained on screen. 

**TODO**: Need to consider the `duration` as the time that the image started vs the time that the image stopped. 

In [21]:
subs = list(range(1,31))
runs = [1, 2]

files = glob.glob("../psychopy_csv/*/*")
files

['../psychopy_csv/3/302_RunTwo_2019_Aug_08_1314.csv',
 '../psychopy_csv/3/302_RunOne_2019_Aug_08_1306.csv',
 '../psychopy_csv/18/206_RunOne_2019_Sep_23_1344.csv',
 '../psychopy_csv/18/206_RunTwo_2019_Sep_23_1353.csv',
 '../psychopy_csv/13/103_RunOne_2019_Aug_27_1435.csv',
 '../psychopy_csv/13/103_RunTwo_2019_Aug_27_1442.csv',
 '../psychopy_csv/21/205_RunTwo_2019_Oct_03_1356.csv',
 '../psychopy_csv/21/205_RunOne_2019_Oct_03_1349.csv',
 '../psychopy_csv/5/305_RunOne_2019_Aug_08_1408.csv',
 '../psychopy_csv/5/305_RunTwo_2019_Aug_08_1416.csv',
 '../psychopy_csv/17/203_RunTwo_2019_Aug_28_1040.csv',
 '../psychopy_csv/17/203_RunOne_2019_Aug_28_1033.csv',
 '../psychopy_csv/20/105.2_RunOne_2019_Oct_03_1323.csv',
 '../psychopy_csv/20/105_RunTwo_2019_Oct_03_1330.csv',
 '../psychopy_csv/14/309_RunOne_2019_Aug_27_1511.csv',
 '../psychopy_csv/14/309_RunTwo_2019_Aug_27_1519.csv',
 '../psychopy_csv/7/201_RunTwo_2019_Aug_23_1314.csv',
 '../psychopy_csv/7/201_RunOne_2019_Aug_23_1306.csv',
 '../psychopy_

In [19]:
test_df

Unnamed: 0,trial,text,cat,img_file,type,task_loop_1.thisRepN,task_loop_1.thisTrialN,task_loop_1.thisN,task_loop_1.thisIndex,trial_loop_1.thisRepN,trial_loop_1.thisTrialN,trial_loop_1.thisN,trial_loop_1.thisIndex,instructions_4.started,instructions_4.stopped,key_resp_2.keys,key_resp_2.rt,key_resp_2.started,key_resp_2.stopped,trigger_count,global_time,routine_time,runcount,text_4.started,text_4.stopped,text_3.started,text_3.stopped,studyimage.started,studyimage.stopped,fixscreen.started,fixscreen.stopped,key_resp_3.keys,key_resp_3.started,key_resp_3.stopped,key_resp_3.rt,fixscreen2.started,fixscreen2.stopped,text_2.started,text_2.stopped,participant,session,date,expName,psychopyVersion,frameRate,Unnamed: 45
0,,,,,,,,,,,,,,12.771513,,space,14.744663,12.771513,,,,,,,,,,,,,,,,,,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
1,trial_2.csv,"Please rate the following images as either ""le...",alcohol,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,liquor,0.0,0.0,0.0,1.0,0.0,0.0,0.0,4.0,,,,,,,6.0,36.761319,10.161671,1.0,27.566852,,39.348673,,49.49603,51.31815,51.331735,,,49.49603,51.31815,,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
2,trial_2.csv,"Please rate the following images as either ""le...",nonalcohol,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,juice,0.0,0.0,0.0,1.0,0.0,1.0,1.0,24.0,,,,,,,7.0,40.264377,3.503055,1.0,,,,,52.983573,54.825062,54.842066,52.98357290000422,2.0,52.983573,54.825062,1.464586,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
3,trial_2.csv,"Please rate the following images as either ""le...",neutral,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,neutral,0.0,0.0,0.0,1.0,0.0,2.0,2.0,18.0,,,,,,,9.0,44.254329,3.989949,1.0,,,,,56.97376,58.831556,58.847865,56.9737601000088,2.0,56.97376,58.831556,1.795452,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
4,trial_2.csv,"Please rate the following images as either ""le...",family,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,family,0.0,0.0,0.0,1.0,0.0,3.0,3.0,1.0,,,,,,,11.0,47.271266,3.016935,1.0,,,,,59.990923,61.837884,61.837884,59.99092340000789,2.0,59.990923,61.837884,1.053333,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
5,trial_2.csv,"Please rate the following images as either ""le...",familyother,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,familyother,0.0,0.0,0.0,1.0,0.0,4.0,4.0,10.0,,,,,,,13.0,51.284854,4.013585,1.0,,,,,64.003991,65.845733,65.862684,64.00399120000657,1.0,64.003991,65.845733,1.426669,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
6,trial_2.csv,"Please rate the following images as either ""le...",nonalcohol,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,juice,0.0,0.0,0.0,1.0,0.0,5.0,5.0,5.0,,,,,,,15.0,54.788132,3.503275,1.0,,,,,67.507179,69.352655,69.368849,67.50717860000441,,67.507179,69.352655,,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
7,trial_2.csv,"Please rate the following images as either ""le...",familyother,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,familyother,0.0,0.0,0.0,1.0,0.0,6.0,6.0,27.0,,,,,,,17.0,58.793222,4.005088,1.0,,,,,71.512774,73.359587,73.376149,71.51277370000025,1.0,71.512774,73.359587,1.401659,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
8,trial_2.csv,"Please rate the following images as either ""le...",nonalcohol,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,water,0.0,0.0,0.0,1.0,0.0,7.0,7.0,28.0,,,,,,,18.0,62.793529,4.000304,1.0,,,,,75.512985,77.365736,77.382583,75.5129847000062,2.0,75.512985,77.365736,1.343338,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,
9,trial_2.csv,"Please rate the following images as either ""le...",family,C:/Users/TTNI/Desktop/TTNI fMRI Studies/TTNI f...,family,0.0,0.0,0.0,1.0,0.0,8.0,8.0,19.0,,,,,,,20.0,66.311504,3.517973,1.0,,,,,79.030823,80.873596,80.873596,79.0308226000052,2.0,79.030823,80.873596,0.995476,,,,,302.0,1.0,2019_Aug_08_1306,RunOne,3.1.5,59.502594,


In [17]:
test_df.loc[:, "global_time"] - test_df.loc[1, "global_time"] + (test_df.loc[1, "trigger_count"] * 2)

0            NaN
1      12.000000
2      15.509929
3      19.024261
4      22.520969
5      25.862749
6      29.059560
7      32.051422
8      35.582041
9      38.656566
10     42.414430
11     45.607877
12     49.459512
13     52.641226
14     55.482385
15     58.472395
16     61.673366
17     64.691812
18     68.710168
19     72.699943
20     75.715949
21     79.740585
22     83.754831
23     86.801610
24     89.811350
25     93.806220
26     97.315227
27    100.333396
28    103.357492
29    106.345740
30    109.369443
31    112.187725
32    136.005881
33    140.050122
34    143.053326
35    146.065863
36    150.095173
37    153.131815
38    156.578179
39    160.600916
40    164.118919
41    167.445578
42    171.142775
43    175.157615
44    178.188869
45    181.744508
46    185.207571
47    189.213759
48    193.226848
49    196.238845
50    200.084982
51    204.299588
52    208.300051
53    212.309462
54    215.146048
55    218.159501
56    221.408372
57    224.384200
58    227.9019

In [24]:
for sub in subs:
    files = glob.glob(f"../psychopy_csv/{sub}/*")
    for file in files:
        # Catching the fact that sub-2 has different filenames.
        if sub == 2:
            run = file.split(".")[-2][-1]
        else:
            if file.split("_")[-5] == "RunOne":
                run = 1
            if file.split("_")[-5] == "RunTwo":
                run = 2
        
        # Creating the filename that we will use to store the file
        fname = f"sub-{sub:02d}_task-images_run-{run}_events.tsv"
        fpath = f"/lustre/scratch/mzielins/collab_files/preproc/fmriprep/sub-{sub:02d}/func/"
        
        # Creating an empty dataframe that contains the columns that we want in the final EV
        ev = pd.DataFrame(columns = ["condition", "onset", "duration"])
        
        # Reading in the CSV file that corresponds to each run
        df = pd.read_csv(file)
        
        # Removing any rows in which the "trigger_count" column is null.
        df = df.loc[df["trigger_count"].notnull()]
        
        # Replacing the "NaN" that corresponds to the break at the end of the run with "break"
        df[["cat"]] = df[["cat"]].replace({np.nan: "instructions"})
        
        ev["condition"] = df.loc[:, "cat"]
        ev["onset"] = test_df.loc[:, "global_time"] - test_df.loc[1, "global_time"] + (test_df.loc[1, "trigger_count"] * 2)
        ev["duration"] = df.loc[:, "studyimage.stopped"] - df.loc[:, "studyimage.started"]
        
        try:
            ev.to_csv(os.path.join(fpath,fname), sep = "\t", index = False)
        except(FileNotFoundError):
            print(f"no events file for sub-{sub:02d} run {run}!")

no events file for sub-20 run 1!
no events file for sub-20 run 2!


In [19]:
ev

Unnamed: 0,condition,onset,duration
1,family,12.0,1.76862
2,alcohol,15.509929,1.755767
3,family,19.024261,1.766915
4,familyother,22.520969,1.783789
5,neutral,25.862749,2.006866
6,neutral,29.05956,1.768348
7,familyother,32.051422,1.790095
8,familyother,35.582041,1.765105
9,family,38.656566,1.714674
10,neutral,42.41443,1.999996


## Trigger Counting
Because there is some doubt as to when the scan was started relative to when the spacebar was pressed to begin logging triggers, we need to create some dataframes that allow us to inspect the data a little bit more closely. In this dataframe, I will include `condition`, `trigger_count`, and `global_time` (the time relative to when the spacebar was pressed).

In [49]:
for sub in subs:
    files = glob.glob(f"../psychopy_csv/{sub}/*")
    for file in files:
        # Catching the fact that sub-2 has different filenames.
        if sub == 2:
            run = file.split(".")[-2][-1]
        else:
            if file.split("_")[-5] == "RunOne":
                run = 1
            if file.split("_")[-5] == "RunTwo":
                run = 2
        
        # Creating the filename that we will use to store the file
        fname = f"sub-{sub:02d}_task-images_run-{run}_events.tsv"
        fpath = f"/lustre/scratch/mzielins/collab_files/test_files"
        
        # Creating an empty dataframe that contains the columns that we want in the final EV
        ev = pd.DataFrame(columns = ["condition", "trigger_count", "diffs", "global_time"])
        
        # Reading in the CSV file that corresponds to each run
        df = pd.read_csv(file)
        
        # Removing any rows in which the "trigger_count" column is null.
#         df = df.loc[df["trigger_count"].notnull()]
        
        # Replacing the "NaN" that corresponds to the break and the end and instructions at the beginning of the run with "break"
        df[["cat"]] = df[["cat"]].replace({np.nan: "break"})
        
        ev["condition"] = df.loc[:, "cat"]
        ev["global_time"] = df.loc[:, "global_time"]
        ev["diffs"] = df.loc[:, "trigger_count"].diff()
        ev["trigger_count"] = df.loc[:, "trigger_count"]
        
        for i,row in ev.iterrows():
            if pd.isnull(ev.loc[i, "diffs"]) or ev.loc[i, "diffs"] < 1:
                ev.loc[i, "trigger_total"] =  ev.loc[i, "trigger_count"]
            else: ev.loc[i, "trigger_total"] =  ev.loc[i, "diffs"]
        
        ev = ev.loc[ev["trigger_total"].notnull()]
        ev["trigger_count"] = ev.trigger_total.cumsum()
        
        ev_ = ev.loc[:, ["condition", "global_time", "trigger_count"]]
        
        try:
            ev_.to_csv(os.path.join(fpath,fname), sep = "\t", index = False)
        except(FileNotFoundError):
            print(f"no events file for sub-{sub:02d} run {run}!")

## Creating a summary dataframe with all of the subjects, runs, and trigger counts. 

In [78]:
files = glob.glob("/lustre/scratch/mzielins/collab_files/test_files/*")
summ_df = pd.DataFrame(columns = ["sub", "run", "triggers"])
for i, file in enumerate(np.sort(files)):
    df = pd.read_csv(file, sep = "\t")
    summ_df.loc[i, "sub"] = file.split("_")[-4].split("/")[1]
    summ_df.loc[i, "run"] = file.split("_")[-2]
    summ_df.loc[i, "triggers"] = df.iloc[-1, df.columns.get_loc("trigger_count")]

In [81]:
summ_df.to_csv("../summ_df.tsv", sep = "\t", index = False)