# Exercise 1.  Data Organization and Signal Averaging

### This exercise introduces the organization of EEG data for analysis. Several preprocessing steps were completed before this point, specifically:
*   Segmentation of each trial of the experiment
*   Removal of trials with excessive artifacts 
*   Signal processing to remove eye blinks and eye movements from the EEG 
*   Organization of EEG, trial labels, and behavioral data into a structure.  


## Target Detection Experiment 

### These data were extracted from the ERP CORE (https://osf.io/thsqg/wiki/home/).  

### These are data from the experiment, **Active Visual Oddball P3**

### Some details of the experiment - 
*   The stimulus consists of the letters A, B, C, D, E
*   In any block, one of the 5 letters was designated the "target", which required a response with one hand, while the other letters were designated "standards", which required a response with the other hand.
*   The probability of any letter appearing is 0.2.  
*   The probability of a target appearing is 0.2 

#### Load modules that we need 

In [None]:
import numpy as np 
from matplotlib import pyplot as plt
from hdf5storage import loadmat, savemat 


## Data Files 

### I provided example data files for this tutorial in HDF5 format.  Matlab users will know this as a .mat file.  
### Python also has a native serialization format called pickle, which is very convenient, but it is not appropriate for sharing because it is insecure (loading a pickle file can execute arbitrary code).

In [None]:
data = loadmat('data/2_P3.mat')

### `loadmat` will load a data file into a dictionary.  A **dictionary** is a data structure in python that allows us to keep related data (for example from one data collection) together.  

### To understand a dictionary's contents, the best thing to do is to print out the **keys**.  

In [None]:
data.keys()
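
Beyond the key names, it can help to see the type and shape of each entry. The following is a minimal sketch on a made-up dictionary (`demo` and all of its values are hypothetical stand-ins, not the contents of the real file); the same loop should work on `data`.

```python
import numpy as np

# Hypothetical stand-in for the dictionary returned by loadmat;
# the real file has more keys and much larger arrays.
demo = {
    "ntrials": 3,
    "samplingrate": 256.0,              # made-up value
    "eeg": np.zeros((3, 2, 5)),         # trials x channels x time points
    "goodtrials": np.array([1, 0, 1]),
}

summary = {}
for key, value in demo.items():
    # np.shape returns () for plain scalars and the array shape otherwise
    summary[key] = (type(value).__name__, np.shape(value))
    print(f"{key:>14s}  {summary[key][0]:<8s} shape={summary[key][1]}")
```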

### I habitually copy out the elements of a dictionary into simple variables, to make my life easy.  This is not required, and may use up memory. 

### To track everything, I use the key names as my variable names

In [None]:
blocktarget = data['blocktarget']
channelnames = data['channelnames']
eeg = data['eeg']
eeg_time = data['eeg_time']
goodtrials = data['goodtrials']
nchannels = data['nchannels']
ntrials = data['ntrials']
response = data['response']
responsetime = data['responsetime']
samplingrate = data['samplingrate']
stimulus = data['stimulus']
target = data['target']

## README 

### The information about the datafiles in an experiment is normally placed in a README file. 
### For convenience I am going to place that information here instead.

###  This data was obtained from the archive ERP CORE and reorganized for this class. 
###  Each file contains the data of one participant, indicated in the filename. 
###  The variables contained here are 

*   `ntrials` - number of trials in the experiment 
*   `nchannels` - number of EEG channels 
*   `samplingrate` - number of EEG samples per second
*   `eeg` - EEG data of the experiment, with dimensions ntrials x nchannels x ntimepoints.  The EEG is provided in units of volts. 
*   `eeg_time` - the time of each EEG sample relative to *stimulus onset*.
*   `channelnames` - the names of the EEG channels, indicating where each one is located.    
*   `stimulus` - the stimulus presented on each good trial, 1 = A, 2 = B, 3 = C, 4 = D, 5 = E 
*   `target` - the target stimulus on each good trial 1 = A, 2 = B, 3 = C, 4 = D, 5 = E
*   `blocktarget` - indicates which trials presented the target of the current block, 1 if a target, 0 if not a target. 
*   `response` - variable indicating the response accuracy, 1 = correct, -1 = incorrect, 0 = no response.
*   `responsetime` - time after stimulus onset when the subject provided a response. 
*   `goodtrials` - vector with value 1 if the trial had a response, 0 if no response or multiple responses.   
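
One way to confirm that a file matches this description is to compare the array shapes against the stated counts. This is only a sketch: `check_layout` is a hypothetical helper written for this illustration, and the arrays below are synthetic, with made-up sizes.

```python
import numpy as np

def check_layout(eeg, eeg_time, goodtrials, ntrials, nchannels):
    """Return a list of mismatches between the arrays and the README layout."""
    problems = []
    if eeg.ndim != 3:
        return ["eeg is not 3-dimensional"]
    # np.squeeze handles scalars that were loaded as 1x1 arrays
    if eeg.shape[0] != int(np.squeeze(ntrials)):
        problems.append("first eeg dimension does not match ntrials")
    if eeg.shape[1] != int(np.squeeze(nchannels)):
        problems.append("second eeg dimension does not match nchannels")
    if eeg.shape[2] != np.size(eeg_time):
        problems.append("third eeg dimension does not match eeg_time")
    if np.size(goodtrials) != eeg.shape[0]:
        problems.append("goodtrials does not have one entry per trial")
    return problems

# Synthetic example: 4 trials, 2 channels, 10 time points
fake_eeg = np.zeros((4, 2, 10))
fake_time = np.arange(10)
fake_good = np.array([1, 1, 0, 1])

ok = check_layout(fake_eeg, fake_time, fake_good, ntrials=4, nchannels=2)
bad = check_layout(fake_eeg, fake_time, fake_good, ntrials=5, nchannels=2)
print(ok)
print(bad)
```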


### We can and should take a look at these variables manually before proceeding.

### Stimulus and Target 

In [None]:
plt.plot(stimulus,'ro')
plt.plot(target,'bo')
plt.grid()
plt.legend(('Stimulus','Target'))
plt.show()

### Response 

In [None]:
values, instances = np.unique(response, return_counts=True)
print('Values are: ', values)
print('Occurring: ', instances)

### Response Time 

In [None]:
plt.hist(responsetime)
plt.title('Response Time Distribution')
plt.xlabel('Time (msec)')
plt.show()

In [None]:
# Boolean mask: True for good trials, False otherwise
test = (goodtrials == 1)


In [None]:
plt.hist(responsetime[goodtrials == 1])
plt.title('Response Time Distribution')
plt.xlabel('Time (msec)')
plt.show()

### Plot some EEG 

### The eeg variable is 3 dimensional.  The first dimension is the trial, the second dimension is the channel, 3rd dimension is time.  
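
To make the three-dimensional indexing concrete, here is a small sketch on a toy array. The name `toy` and its sizes are made up for illustration; only the trials x channels x time ordering comes from the description above.

```python
import numpy as np

# Toy array with 2 trials, 3 channels, 4 time points
toy = np.arange(24).reshape(2, 3, 4)

single = toy[0, 1, :]     # first trial, channel index 1, all time points
block = toy[0:2, 1, :]    # first two trials of the same channel

print(single)             # one value per time point
print(block.shape)        # (trials, time)
print(block.T.shape)      # (time, trials): time must come first to plot against a time vector
```

Note that indices start at 0, so channel index 1 is the second channel in the file.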

#### I am going to plot the first trial, channel index 20.

In [None]:
plt.plot(eeg[0,20,:])
plt.show()

In [None]:
plt.plot(eeg_time,eeg[0,20,:])
plt.title('Channel '+channelnames[20])
plt.show()

In [None]:
plt.plot(eeg_time,eeg[0:4,12,:].T)  # transpose so time is the first axis; one line per trial
plt.title('Channel '+channelnames[12])
plt.xlabel('Time (msec)')
plt.grid()
plt.show()

## EVENT RELATED POTENTIALS 
### It's just a **mean**

In [1]:
erp = np.mean(eeg,axis =0)



In [None]:
plt.plot(eeg_time,erp[12,:])
plt.title('Channel '+channelnames[12])
plt.xlabel('Time (msec)')
plt.grid()
plt.show()
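
Why does a simple mean work? If every trial contains the same stimulus-locked response plus independent noise, averaging keeps the response and shrinks the noise. The sketch below shows this on simulated data only; the bump shape, noise level, and array sizes are all assumptions made for illustration and have nothing to do with the real recording.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable

n_trials, n_times = 400, 200             # made-up sizes
t = np.linspace(-0.2, 0.8, n_times)      # hypothetical time axis in seconds
true_signal = np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))  # one bump near 0.3 s

# Every simulated trial = the same evoked signal + independent noise
trials = true_signal + rng.normal(0.0, 1.0, size=(n_trials, n_times))

erp_demo = np.mean(trials, axis=0)       # average over the trial axis

def rms(x):
    """Root-mean-square value of an array."""
    return np.sqrt(np.mean(x ** 2))

err_single = rms(trials[0] - true_signal)
err_average = rms(erp_demo - true_signal)
print(f"single-trial error: {err_single:.3f}")
print(f"averaged error:     {err_average:.3f}")
```

For independent noise, the error of the average falls roughly as one over the square root of the number of trials, which is why the number of trials in each average matters.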

### I need to take care of locating good trials

In [None]:
erp = np.mean(eeg[goodtrials == 1,:,:],axis =0)

In [None]:
plt.plot(eeg_time,erp[12,:])
plt.title('Channel '+channelnames[12])
plt.xlabel('Time (msec)')
plt.grid()
plt.show()

## TASK CONTRAST

### In analyzing neural data, there is usually a task contrast or a patient/control contrast that I am really interested in.  

### So averaging all the data together doesn't really tell me everything. 

### In this task, the critical thing we are looking for is the difference between a target and a standard. 

### A target trial would correspond to when the stimulus and the target objective were the same. 

### So I need to intersect the two conditions in a compound logical statement.  
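
Here is a toy sketch of how such a compound statement behaves. The variable names mirror the ones in the data file, but with a `toy_` prefix and made-up values, so nothing here reflects the real trial sequence.

```python
import numpy as np

# Six made-up trials
toy_good = np.array([1, 1, 0, 1, 1, 1])       # 1 = good trial
toy_stimulus = np.array([1, 3, 3, 2, 3, 5])   # letter shown on each trial
toy_target = np.array([3, 3, 3, 3, 3, 3])     # target letter of the block

# Parentheses are required: & binds more tightly than == in Python
is_target = (toy_good == 1) & (toy_stimulus == toy_target)
is_standard = (toy_good == 1) & (toy_stimulus != toy_target)

print(is_target)
print(is_standard)

# A boolean mask on the first axis keeps only the selected trials
toy_eeg = np.zeros((6, 2, 4))                 # trials x channels x time
print(toy_eeg[is_target, :, :].shape)
```

The third trial matches the target letter but is not a good trial, so it is excluded from both masks.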

In [None]:
erp_target = np.mean(eeg[(goodtrials == 1)&(stimulus==target),:,:],axis =0)

In [None]:
plt.plot(eeg_time,erp_target[12,:])
plt.title('Channel '+channelnames[12])
plt.xlabel('Time (msec)')
plt.grid()
plt.show()

In [None]:
erp_target = np.mean(eeg[(goodtrials == 1)&(blocktarget == 1),:,:],axis =0)
erp_standard = np.mean(eeg[(goodtrials == 1)&(blocktarget != 1),:,:],axis =0)

In [None]:
plt.plot(eeg_time,erp_target[12,:],'r')
plt.plot(eeg_time,erp_standard[12,:],'b')
plt.title('Channel '+channelnames[12])
plt.xlabel('Time (msec)')
plt.grid()
plt.show()

## Problem 1:

#### Make a new estimate of ERP_standard with the same number of trials included in the average as ERP_target. For your convenience I have made a variable blocktarget which contains a value of 1 for each trial on which the stimulus was the target and 0 otherwise. 

#### One way to do this is to use the trial just before the target as your "standard" trials.  The idea is that all other things are as equal as possible when you look at that response. 
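
As a starting point only, the index arithmetic for "the trial just before each target" might look like the sketch below. It uses made-up `toy_` vectors, and the extra filtering steps (dropping candidates that are themselves targets or that are not good trials) are my assumptions about what you would want, not part of the problem statement.

```python
import numpy as np

toy_blocktarget = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
toy_good = np.array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1])

target_idx = np.flatnonzero(toy_blocktarget == 1)   # positions of the targets
before_idx = target_idx - 1                         # the trial preceding each target

# Drop candidates that fall before the first trial
before_idx = before_idx[before_idx >= 0]
# Drop candidates that are themselves targets (back-to-back targets)
before_idx = before_idx[toy_blocktarget[before_idx] == 0]
# Drop candidates that are not good trials
before_idx = before_idx[toy_good[before_idx] == 1]

print(target_idx)
print(before_idx)
```

Because some candidates are dropped, this can leave fewer standards than targets, so check the counts before comparing the two averages.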



## Problem #2: 

#### Create 5 separate ERP averages for each letter using only the standards (don't include the targets).  Also make an average for just the targets. 

#### Investigate the channels O1, O2, PO7, PO8 to examine if the deflection of the signal around 170 ms shows any difference for different letter identities or for the target. 

#### Make a plot of each channel with all 6 waveforms.  
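
To find channels by name rather than by number, one option is to build a lookup from name to index. This is a sketch with a hypothetical channel list (`toy_channelnames`); the order of names in the real `channelnames` variable may differ, and the entries may need extra cleaning depending on how they were loaded.

```python
import numpy as np

# Hypothetical channel list; the real file has its own order and more channels
toy_channelnames = np.array(['Fz', 'Pz', 'O1', 'O2', 'PO7', 'PO8'])

# Map each channel name to its position along the channel axis
channel_index = {str(name).strip(): i for i, name in enumerate(toy_channelnames)}

wanted = ['O1', 'O2', 'PO7', 'PO8']
picks = [channel_index[name] for name in wanted]
print(picks)

# The picks can then index the channel axis of an ERP (channels x time)
toy_erp = np.zeros((6, 4))
print(toy_erp[picks, :].shape)
```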