The main purpose of this assignment is to guide you through navigating EEGLAB. We will load data, filter it, and plot our results. Throughout this assignment you will be asked questions, which you answer in the accompanying Gradescope assignment. The assignment is designed so that you answer each question as it is presented to you. Please remember to submit every answer individually in Gradescope so that you do not lose any of your work.

## Install EEGLAB
We only need to do this once, but let's install EEGLAB. EEGLAB is the most widely used and cited toolbox for EEG processing. It is written in MATLAB, although it does have a standalone port. Individual EEGLAB functions can be executed from Python via direct interfacing between MATLAB and Python. However, if you are using an existing analysis downloaded from the internet or generated through `eegh`, it is more practical to run the code in a MATLAB cell than to translate it to incorporate `.eng()` calls.

Execute the following line of code:

In [None]:
# Only run this once!
!git clone --recursive --depth=1 "https://github.com/sccn/eeglab.git"

## Setup
The code under here shouldn't be removed, but can be added to. Essentially it's just setting up our environment, loading libraries/toolboxes and getting things ready.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
% EEGLAB is installed above in the 'Install EEGLAB' section.
addpath(genpath('eeglab'));

Let's confirm that we're able to locate EEGLAB by executing the following command. If this doesn't work, please make sure you have EEGLAB installed per the first cell and that MATLAB and Python are able to communicate (check the README).

In [None]:
eeg_getversion()

You should see a version string. For the copy cloned above, this should be `'dev'`.

## Part 1: Loading the Data
Load in the synthetic data `SynthData.set` using EEGLAB's `pop_loadset()` function. The EEG data collected was sampled at 500 Hz. For those unfamiliar: 

$\text{Hz} = \frac{1}{\text{second}}$

This means that at a data collection rate of 500 Hz we collect 500 data points per second.
This rate is referred to as the **sampling rate** and is often denoted $F_s$ or $f_s$. EEGLAB stores this information under `EEG.srate`.
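
As a small illustration of what a sampling rate implies, the sketch below works out the spacing between samples and builds a time axis for one second of data. It is plain NumPy and does not touch the dataset; the variable names are made up for this example.

```python
import numpy as np

fs = 500.0                      # sampling rate in Hz
sample_period_ms = 1000.0 / fs  # time between consecutive samples, in ms
n_samples = int(fs * 1.0)       # number of samples collected in one second

# Time stamp of each sample in that second, in ms, starting at 0
times_ms = np.arange(n_samples) * sample_period_ms

print(sample_period_ms)         # spacing between samples
print(times_ms[:3], times_ms[-1])
```

Note that the last sample of the second sits one sample period before 1000 ms, because the first sample is at 0 ms.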

In [None]:
% Load data and use the EEG object to answer the following questions

#### Answer the following questions on Gradescope

**Question 1:** How many channels are in this dataset?

**Question 2:** What is the sampling rate of this dataset?

**Question 3:** What is the duration of this dataset (to the nearest second)?

**Question 4:** Plot the first 1000 msec (1 sec) of data from every channel. Make sure the x-axis (time) is accurate. 

* Label your x-axis as "Time (ms)" and your y-axis as "Amplitude (uV)"

* *(hint: Use EEG.data and EEG.times and be careful with the dimensions)*

* *(you can simply plot such that all channels overlay each other)*

In [None]:
% You can use this cell to plot, or you can do it above.

## Part 2: Filtering
The data we have is contaminated (intentionally) with high-frequency noise. Let's isolate our main signal by applying a 30 Hz lowpass filter.

For EEG data analysis, there are various decisions that need to be made when selecting an appropriate filter. All signals are corrupted by outside noise, and EEG data is particularly susceptible to noise from AC electronics, muscle movements, and skin conductance changes (sweat). Filtering allows us to get rid of some of the junk, but it is not magic. There is no substitute for collecting clean data. Filtering will also throw out some of the data we care about, and one of the goals is to minimize how much of the signal we are modifying.
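
To make the idea of a lowpass FIR filter concrete, here is a minimal NumPy sketch of a Hamming-windowed sinc filter applied to a made-up test signal. It is an illustration only: the filter length `n_taps` is a hypothetical choice, and this is not a reproduction of the filter design or default order that `pop_eegfiltnew()` uses.

```python
import numpy as np

fs = 500.0      # sampling rate in Hz
cutoff = 30.0   # lowpass cutoff in Hz
n_taps = 101    # hypothetical filter length (odd, so the kernel has a center sample)

# Ideal lowpass impulse response (a sinc), tapered by a Hamming window
n = np.arange(n_taps) - (n_taps - 1) / 2
h = 2 * cutoff / fs * np.sinc(2 * cutoff / fs * n)
h *= np.hamming(n_taps)
h /= h.sum()    # normalize to unit gain at 0 Hz

# Made-up test signal: a 5 Hz "signal" plus 100 Hz "noise"
t = np.arange(int(2 * fs)) / fs
slow = np.sin(2 * np.pi * 5 * t)
fast = 0.5 * np.sin(2 * np.pi * 100 * t)

# mode="same" keeps the output aligned with the input for a symmetric kernel
filtered = np.convolve(slow + fast, h, mode="same")

# Away from the edges, the output should closely track the 5 Hz component
core = slice(n_taps, -n_taps)
residual = np.max(np.abs(filtered[core] - slow[core]))
print(residual)
```

The samples near the start and end are excluded from the comparison because the convolution runs off the edge of the data there, which is one concrete way that filtering modifies the signal.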

**Question 5:** Apply a 30 Hz lowpass FIR filter (using EEGLAB's `pop_eegfiltnew()`) and save the output to the `EEG` variable. Plot the filtered data using the same time window as Question 4.
* Typing `pop_eegfiltnew()` will show you viable inputs

In [None]:
% Q5 answer here

## Part 3: Epoching
We will now investigate the events we have in our dataset and epoch to these events.

The events themselves can be found in `EEG.event`, which contains information about the type of event (`EEG.event.type`) and when the event occurred (`EEG.event.latency`). In the case of our stimuli, the duration, which is the third field, will always be equal to `1`.

It is good to note that the values stored in `EEG.event.latency` are indices of `EEG.times` and are not themselves temporal values.
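
The distinction between an index and a time can be sketched in NumPy as follows. The latencies here are hypothetical whole-number values, and the time vector is a stand-in that assumes a recording starting at 0 ms; the only point is the lookup itself.

```python
import numpy as np

fs = 500.0
# Stand-in for a time vector in ms (assumed to start at 0 ms)
times_ms = np.arange(5000) * 1000.0 / fs

# Hypothetical event latencies, stored as MATLAB-style 1-based sample indices
latencies = np.array([1, 251, 1001])

# Python indexes from 0, so subtract 1 before looking up the time
event_times_ms = times_ms[latencies - 1]
print(event_times_ms)
```

In MATLAB the subtraction is not needed, since indexing there already starts at 1.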

Let's explore the events with the following:

*Hint: you can capture all of the events that are in `EEG.event.type` by putting them in a cell array via `x = {EEG.event.type}`*

**Question 6:** What is the label of the very first event?

**Question 7:** How many events are there in total?

**Question 8:** How many unique events are there?

In [None]:
% Answer Q6 -> 8 here

In this case, the first marker is sent only one time and it serves as an indicator that recording has commenced. The remaining markers contain useful information as they represent a specific event occurring. We can extract the EEG activity relative to that information in a process known as binning, grouping, or epoching (which we'll use here).

Epoching is the step of pre-processing where we segment the data into the chunks we care about. We are only interested in the brief periods of time the participant has seen the stimulus. When epoching data, we must decide how much data is worth keeping. Our data in this case are synthetic and not collected from an actual experiment. In reality, the nature of the experiment will determine the length of your epoch, but it's not uncommon to look at signals a few hundred milliseconds before event onset all the way out to 1000 milliseconds after onset.

The window of time before the onset typically contains **baseline** information. Baseline activity should reflect task-unrelated neurological activity that we can use to standardize each of our trials. In traditional experiments, the baseline is often taken from a period when participants are looking at a blank screen before they are presented with the next trial. Our synthetic example does not require baseline correction, so we'll save that for the next assignment.

In order to explore our waveforms, we'll define an epoch with the following parameters:
* Epoch start: 500 ms before event (stimulus) onset.
* Epoch end: 1000 ms after stimulus onset.
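
As a rough sketch of what these two parameters mean in terms of samples, the NumPy example below cuts fixed windows out of a stand-in continuous recording. The data, the event onsets, and all variable names are hypothetical, and this is a conceptual illustration rather than a description of how EEGLAB implements epoching.

```python
import numpy as np

fs = 500.0
pre_ms, post_ms = 500.0, 1000.0            # window before and after onset, in ms
pre = int(round(pre_ms / 1000.0 * fs))     # samples before onset
post = int(round(post_ms / 1000.0 * fs))   # samples after onset

# Stand-in continuous recording: 2 channels x 5000 samples
rng = np.random.default_rng(0)
data = rng.standard_normal((2, 5000))
onsets = np.array([1000, 2500, 4000])      # hypothetical 0-based event onsets

# One slice per event, stacked as channels x time points x epochs
epochs = np.stack([data[:, s - pre:s + post] for s in onsets], axis=-1)

# Time axis relative to event onset, in ms
epoch_times_ms = np.arange(-pre, post) * 1000.0 / fs
print(epochs.shape, epoch_times_ms[0], epoch_times_ms[-1])
```

Each epoch shares the same relative time axis, with 0 ms at the event onset.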

Once these parameters have been decided, we can epoch the data around the event codes with the function `pop_epoch()`.

**Question 9:** Epoch the remaining events into their own variables (e.g. epoch0 and epoch1) after filling out the epoching parameters (take special note of the units). After you epoch, you ***must*** save the following data into these variables:
* `epoch0_data`: should contain the `.data` from `epoch0`
* `epoch1_data`: should contain the `.data` from `epoch1`
* `epoch_times`: should contain the `.times` from EITHER `epoch0` or `epoch1` (the time vectors are the same)

After that, plot the first occurrence of each epoch type on a single plot and submit it. Plot only the first channel.

*note 1: the dimensionality of `epoch.data` is as follows: Number of channels $\times$ Number of time points $\times$ Number of epochs*

*note 2: you can plot two traces on the same figure by using the command `hold on`*
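
The dimension order in note 1 can be sketched with a stand-in NumPy array. The array contents and names are made up for illustration; keep in mind that Python counts from 0 while MATLAB counts from 1, so "first channel, first epoch" is index 0 here.

```python
import numpy as np

# Stand-in epoched data: 3 channels x 750 time points x 4 epochs
epochs = np.arange(3 * 750 * 4, dtype=float).reshape(3, 750, 4)

first_trace = epochs[0, :, 0]        # first channel, all time points, first epoch
first_channel = epochs[0]            # first channel only: time points x epochs
mean_trace = epochs[0].mean(axis=1)  # first channel averaged across epochs

print(first_trace.shape, first_channel.shape, mean_trace.shape)
```

A single trace selected this way is one-dimensional, so it can be plotted directly against a time vector of the same length.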

In [None]:
% Answer Q9 Here

We're almost done! For the final piece of this assignment, we'll import the data into Python for plotting.

Note: Subsequent assignments will utilize Python more.

**Question 10:** Transfer the necessary data (i.e. all of your `epoch0_data`, `epoch1_data`, and `epoch_times`) over to Python using the `%get` magic and plot the data.

In [None]:
# Answer Q10 here!
# Remember, we're in Python now. Index with [] and plot with plt.plot()
# If epoch0_data and epoch1_data are not behaving well, you can cast them to numpy arrays with:
# epoch0_data = np.array(epoch0_data)