In [None]:
%matplotlib widget

import copy
import heartpy as hp
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import neurokit2 as nk
import numpy as np
import os
import pandas as pd
import pyxdf
import systole

plt.style.use('ggplot')

# __Event Markers__

What are event markers!? 

Event markers label the moments when something happens during a recording, such as the onset of a stimulus or the start of a task. Why does this matter? If we started the LSL recording before the actual experiment, an event marker tells us when the experiment really starts. Event markers can also mark when a part of the experiment ends. This way, we can __extract sections__ of the data and analyze them separately, rather than processing the whole dataset and wasting time on data we won't look at. These unwanted sections may also contain a lot of noise as the experiment transitions between parts.
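To make this concrete, here is a minimal sketch on synthetic data. The marker labels `'start'` and `'end'` and all values are hypothetical, not taken from the VR-CCT recordings. A boolean mask on the timestamps keeps only the samples between the two events.

```python
import numpy as np

# Synthetic 10 s recording at 100 Hz (illustrative values only)
srate = 100.0
ts = np.arange(1000) / srate              # timestamps in seconds
signal = np.sin(2 * np.pi * ts)           # stand-in for a physiological signal

# Hypothetical marker stream: labels and their timestamps
markers = np.asarray(['start', 'end'])
markers_ts = np.asarray([2.0, 5.0])

# Look up the onset and offset of the section of interest
t_start = markers_ts[markers == 'start'][0]
t_end = markers_ts[markers == 'end'][0]

# Boolean mask: True only for samples between the two events (inclusive)
mask = (ts >= t_start) & (ts <= t_end)
section = signal[mask]
section_ts = ts[mask]

print(len(section), section_ts[0], section_ts[-1])  # 301 2.0 5.0
```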

However, event markers are only as informative as the author of the experiment code made them. Sometimes we come across event markers that aren't informative at all (e.g. identical markers marking the start of multiple experimental parts). Different parts of an experiment should be marked by unique events. That said, repetitive markers are also useful. For instance, in an oddball experiment, a repetitive stimulus (audio or visual) is presented and occasionally interrupted by a deviant stimulus. For EEG, we can analyze event-related potentials (ERPs) using repetitive event markers, which let us extract all of these events at once. Sometimes markers are missing, and we may need to add our own events.
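A minimal sketch of mass-extracting epochs around repeated markers, assuming a synthetic single-channel signal and a hypothetical oddball marker stream (the `'standard'`/`'deviant'` labels and the -200 ms to +800 ms window are illustrative choices, not values from this project). Averaging the epochs of one marker type gives a crude ERP estimate; dedicated EEG toolboxes such as MNE-Python add baseline correction and artifact rejection on top of this idea.

```python
import numpy as np

# Synthetic 20 s single-channel recording at 250 Hz (illustrative only)
srate = 250.0
ts = np.arange(int(20 * srate)) / srate
eeg = np.random.default_rng(0).standard_normal(ts.size)

# Hypothetical oddball markers: repeated 'standard' events and one 'deviant'
markers = np.asarray(['standard', 'standard', 'deviant', 'standard'])
markers_ts = np.asarray([2.0, 6.0, 10.0, 14.0])

# Epoch window relative to each marker: 200 ms before to 800 ms after
n_pre = int(round(0.2 * srate))
n_post = int(round(0.8 * srate))

def epochs_for(label):
    """Stack one fixed-length epoch for every occurrence of `label`."""
    onsets = markers_ts[markers == label]
    idx = np.searchsorted(ts, onsets)     # first sample at or after each marker
    return np.stack([eeg[i - n_pre:i + n_post] for i in idx])

standard_epochs = epochs_for('standard')  # shape: (n_events, n_samples)
erp = standard_epochs.mean(axis=0)        # average across repetitions
print(standard_epochs.shape, erp.shape)   # (3, 250) (250,)
```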

# __Experimental Paradigm__

We will analyze the VR-CCT data, in which we recorded ECG (channels 66 and 67 in the EEG stream) and EEG from subjects meditating in VR (guided by a talking meditation ball). We will extract only the sections of the data in which the meditation actually takes place. In the first session (__Prestudy__), there is only one part where the subject meditated. In the second session (__Poststudy__), there are two meditation parts (one with a fixed duration and the other with a varying duration). Before each experiment, a baseline (__Prebaseline__) was recorded, which we can use to normalize our data.
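Since the channel numbers above count from 1 while NumPy columns count from 0, a short sketch of the conversion may help. It assumes, as the notebook's own code does, that `time_series` holds samples in rows and channels in columns; the synthetic 67-column array is only for illustration, and the real stream's channel count may differ.

```python
import numpy as np

# Synthetic stand-in for an EEG stream's 'time_series': 1000 samples x 67 channels
time_series = np.zeros((1000, 67))
time_series[:, 65] = 1.0   # column holding channel 66
time_series[:, 66] = 2.0   # column holding channel 67

# Channel numbers as described in the text (counting from 1)
ecg_channels = [66, 67]

# Subtract 1 to get zero-based column indices
ecg = time_series[:, [ch - 1 for ch in ecg_channels]]
print(ecg.shape, ecg[0])   # (1000, 2) [1. 2.]
```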

## __Prestudy__

We have an event marker stream named __EventStream__, and there is one event we are interested in, which marks when the meditation starts: __"Start meditation audio"__. However, we do not have a marker for when the meditation part ends. Fortunately, the meditation audio is identical in all LSL recordings, so it always lasts the same amount of time: 530 seconds. We will therefore extract the 530-second section that follows the marker.
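Because only the start is marked, the end time has to be computed as the start time plus 530 seconds. The sketch below (synthetic data; `extract_fixed_window` is a hypothetical helper, not part of any library) does this and also checks how much time the extracted window actually covers, which flags recordings that stop before the audio ends.

```python
import numpy as np

def extract_fixed_window(signal, ts, t0, duration):
    """Keep samples with t0 <= timestamp <= t0 + duration (hypothetical helper)."""
    mask = (ts >= t0) & (ts <= t0 + duration)
    return signal[mask], ts[mask]

# Synthetic 600 s recording at 100 Hz (illustrative values only)
srate = 100.0
ts = np.arange(int(600 * srate)) / srate
signal = np.zeros_like(ts)

# Audio starts at t = 30 s: the full 530 s window fits in the recording
seg, seg_ts = extract_fixed_window(signal, ts, t0=30.0, duration=530.0)
print(len(seg), seg_ts[-1] - seg_ts[0])  # 53001 530.0

# Audio starts at t = 100 s: the recording ends before t0 + 530 s
short, short_ts = extract_fixed_window(signal, ts, t0=100.0, duration=530.0)
if short_ts[-1] - short_ts[0] < 530.0 - 1 / srate:
    print('Warning: window shorter than the expected 530 s')
```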

## __Poststudy__

For this second session, we still have a marker stream named __EventStream__. In the first meditation part, the subject undergoes the same guided meditation as in the first session, guided by the same meditation ball, but for only __257 seconds__. It uses the same event marker: __"Start meditation audio"__. In the second meditation part, the meditation is guided by a "compassionate tree", and its duration varies. We still have an event marker for when this meditation audio starts: __"Start second meditation audio"__. To get the time elapsed in this second meditation part, we looked at the video recording to see how much time elapsed before the meditation audio ended. We created an Excel spreadsheet (__PostStudy_TreeEvents.xlsx__) that records the time elapsed in the second meditation audio for every subject who completed the second session.
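Looking up one subject's duration in the spreadsheet amounts to filtering on the `subject` column. The sketch below builds a small in-memory DataFrame instead of reading the real file (in the notebook, `pd.read_excel` would load it, which for `.xlsx` files typically needs the `openpyxl` package). The column names follow the spreadsheet description in section 3.3; the subject IDs and times are made up, and `lookup_duration` is a hypothetical helper.

```python
import pandas as pd

# Made-up stand-in for PostStudy_TreeEvents.xlsx (not real subject data)
tree_events = pd.DataFrame({
    'subject': ['p01', 'p02', 'p03'],
    'time_elapsed_since_start_audio': [301.5, 288.0, 275.25],
})

def lookup_duration(table, subject):
    """Return the single time-elapsed value for `subject` (hypothetical helper)."""
    rows = table.loc[table['subject'] == subject, 'time_elapsed_since_start_audio']
    if len(rows) != 1:
        raise ValueError(f'expected one row for {subject!r}, found {len(rows)}')
    return float(rows.iloc[0])

duration = lookup_duration(tree_events, 'p02')
print(duration)  # 288.0
```

Raising an error when a subject is missing or listed twice is a design choice: it stops the pipeline early instead of silently extracting a window of the wrong length.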

# Naming Convention

Please refer to the __001_ecg_fundamental__ Jupyter Notebook stored in the same directory as this one. It provides information on the importance of naming datasets consistently.

# __1. Overview__

We will process subject data from the VR-CCT project (see the description above). A prerequisite for this notebook is to go over the __00_lsl_fundamentals__ and __00_ecg_fundamentals__ Jupyter Notebooks. If you haven't done so, please go back and work through these Notebooks first.

For the prestudy data, we will find the event marker and create a boolean (logical) array that is True for the timestamps between the marker onset and 530 seconds after it. We can then use this array to extract the data.

For the poststudy data, we will find the event markers and create boolean arrays. The first meditation part can be extracted in the same way as the prestudy, but the second part additionally requires looking up the time elapsed after the second marker in the __PostStudy_TreeEvents__ spreadsheet.

This will be an interactive Notebook in which you will need to write code to extract the data. I will provide the answers in a different notebook, but please try to do it yourself. 

# __2. Prestudy Extraction__

## __2.1 Defining Necessary Variables__

In the following code block, we provide variables (not all of them will be used) for the interactive exercises.

In [None]:
#Directory where data is located (change this to where you will store your data)
dir_data = 'data'

#Directory where xdf is stored: 
dir_xdf = os.path.join(dir_data,'xdf')

#XDF file name template: vrcct_<subject>_<study>_<part>.<extension>
file_format = 'vrcct_%s_%s_%s.%s'

#Subject Identifiers: 
subjects = ['p16']

#Study Identifier: 
studies = ['prestudy','poststudy']

#Experiment Part: 
parts = ['condition','prebaseline']


## __2.2 INTERACTIVE EXERCISE: Loading ECG Data__

Load the ECG data from the prestudy recording using the __pyxdf.load_xdf__ function (for more info, see the help text via __?pyxdf.load_xdf__). The ECG is not stored as a dedicated stream, so we need to load the EEG stream and extract the ECG from it.
    
- Load in the only EEG stream from the prestudy dataset as a variable called __EEG_stream__
- ECG is stored in channel 66 of the EEG stream (remember that Python uses zero-based indexing); extract it and store it as a variable called __ECG__.
- Extract the timestamps of the ECG/EEG stream and store it as a variable called __ECG_ts__

In [None]:
#Subject id,study(prestudy),part(condition is meditation)
sub = subjects[0]
study = studies[0]
part = parts[0]

#Load in the Prestudy EEG stream and store it as a variable called EEG_stream

# Extract ECG from the EEG stream and store it as a variable called ECG (remember the structure of the output from the previous lines)

# Extract timestamps for ECG/EEG stream and store it as a variable called ECG_ts


## __2.3 INTERACTIVE EXERCISE: Loading Marker Stream__

Load the marker stream (named __EventStream__) using the __pyxdf.load_xdf__ function. 

- Load the __EventStream__ stream from the prestudy dataset as a variable called __event_stream__
- Extract the markers. Note that each marker is stored as a list containing a single string. Please use __list comprehension__ to extract them into a list variable called __events__
    - Convert this list to a NumPy array using the function __np.asarray__.
- Extract timestamps for these markers and store it as a variable called __events_ts__

In [None]:
# Load the Prestudy EventStream stream and store it as a variable called event_stream

# Extract markers and store them in a variable named events using list comprehension

# Convert list events in last line to a numpy array 

# Extract timestamps for events and store as variable called events_ts


## __2.4 Extracting Prestudy Meditation Part__

As mentioned earlier, in the prestudy guided meditation ball part, the audio played for a set amount of time (530 seconds). We will find the time when the meditation audio starts (event: __"Start meditation audio"__) and then extract the desired section using a boolean (logical) array. Additionally, we will create a list containing the sampling rate, with one entry per sample (you'll see why in the next section).
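Boolean indexing such as `events_ts[events == 'Start meditation audio']` returns an array, not a single number, and that array is empty if the marker is missing or has several entries if the marker repeats. As a sketch of a more defensive lookup (`find_marker_time` is a hypothetical helper, and the marker `'Recording started'` and the timestamps are invented for illustration), one can check the number of matches before using the onset:

```python
import numpy as np

def find_marker_time(events, events_ts, label):
    """Return the timestamp of the single marker equal to `label` (hypothetical helper)."""
    hits = np.asarray(events_ts)[np.asarray(events) == label]
    if hits.size == 0:
        raise ValueError(f'marker {label!r} not found')
    if hits.size > 1:
        raise ValueError(f'marker {label!r} occurs {hits.size} times')
    return float(hits[0])

# Invented marker stream for illustration
events = np.asarray(['Recording started', 'Start meditation audio'])
events_ts = np.asarray([1000.0, 1012.5])

t0 = find_marker_time(events, events_ts, 'Start meditation audio')
tf = t0 + 530  # end of the fixed-length meditation audio
print(t0, tf)  # 1012.5 1542.5
```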

In [None]:
#Finding the time the meditation audio starts ([0] turns the one-element array into a scalar)
t0 = events_ts[events=='Start meditation audio'][0]

#Defining time when audio ends
tf = t0 + 530 

#Boolean array to ECG within these times: 
ba = (ECG_ts >= t0) & (ECG_ts <= tf)

#Extract ECG and timestamps
ECG_medball = copy.deepcopy(ECG[ba])
ECG_medball_ts = copy.deepcopy(ECG_ts[ba])

#Sampling rate repeated for every sample (stored as a list)
ECG_srate = [EEG_stream[0]['info']['effective_srate']] * len(ECG_medball)

## __2.5 Organizing Extracted ECG as a DataFrame__

In this section we will finally organize everything into a Pandas DataFrame. 
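Repeating the sampling rate in a list is one way to fill the `srate` column. As an alternative sketch (synthetic values; column names copied from the cell below), pandas broadcasts a scalar to every row when the other columns are arrays, so the list is not strictly necessary:

```python
import numpy as np
import pandas as pd

# Synthetic extracted segment (illustrative values only)
srate = 500.0
t0 = 12.0
seg_ts = t0 + np.arange(5) / srate
seg = np.linspace(-0.1, 0.1, 5)

df = pd.DataFrame({
    'timestamp_lsl': seg_ts,
    'timestamp_norm': seg_ts - t0,   # time relative to the start event
    'ecg_raw': seg,
    'srate': srate,                  # scalar broadcast to every row
})
print(df)
```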

In [None]:
#DataFrame: 
df_prestudy_medBall = pd.DataFrame(
    {'timestamp_lsl' : ECG_medball_ts,
     'timestamp_norm' : ECG_medball_ts - t0,#normalizing timestamps
     'ecg_raw' : ECG_medball,
     'srate': ECG_srate}
)

df_prestudy_medBall.head()

# __3. Poststudy Extraction__

## __3.1 Helper Function__

The meditation ball part of the poststudy can be extracted in the same way as the prestudy; in this case, the audio plays for only 257 seconds instead of 530 seconds. The compassion tree part of the poststudy can be extracted the same way too, except that the time elapsed after the second marker varies between subjects. Rather than writing or copying code over and over again, we can write a helper function that we can reuse in future analyses. Feel free to modify it for your needs; you might even add preprocessing to this function.

In [None]:
def load_ecg(
    pathFile,
    channel = 0,
    select_streams = None,
    event_markers = None
):
       
    """
    Description: Load an XDF file and extract ECG data (optionally only the section after an event marker)

    Parameters
    ----------
    pathFile: String
        Path to xdf recording where ecg is stored
    channel: Integer
        Channel number where ecg data is stored 
    select_streams: list of dictionaries (default: None)
        The stream you want to load in if there are multiple data streams in the xdf file
    event_markers: dictionary (default: None)
        The information for what event stream, event marker to start from, and the time 
        elapsed after the event marker. The dictionary should have the following keys and 
        values: 'select_streams' should contain a list of dictionaries that is used by 
        pyxdf.load_xdf to find the event stream, 'start_event' should contain the 
        marker that you want start the extraction, and 'duration' should contain a 
        float that is the time elapsed after the start event. 

    Returns
    -------
    ecgOut: Pandas DataFrame
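        The DataFrame will have four columns corresponding to timestamps, normalized 
        timestamps (relative to the start event, or to the first sample if no event 
        markers are given), raw ecg data, and sampling rate. Returns None (after 
        printing the error) if the event markers cannot be extracted.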

    """
    #Loading in our ECG data: 
    stream, _ = pyxdf.load_xdf(
        pathFile,
        select_streams=select_streams
    )
    
    # Extract the ECG channel from the (EEG) stream
    ECG = stream[0]['time_series'][:,int(channel)]

    # Extract the timestamps of the ECG/EEG stream
    ECG_ts = stream[0]['time_stamps']
    
    # Reference time for normalization (replaced by the start event if one is given)
    t0 = ECG_ts[0]
    
    #Extracting only sections of our ECG data: 
    if event_markers is not None:
        
        try: 
            event_stream,_ = pyxdf.load_xdf(
                pathFile,
                select_streams=event_markers['select_streams']
            )
            markers = [m[0] for m in event_stream[0]['time_series']]
            markers_t = event_stream[0]['time_stamps']
            
            i = markers.index(event_markers['start_event'])
            t0 = markers_t[i]
            tf = t0 + event_markers['duration']
            
            #Boolean array to ECG within these times: 
            ba = (ECG_ts >= t0) & (ECG_ts <= tf)

            #Extract ECG and timestamps
            ECG = copy.deepcopy(ECG[ba])
            ECG_ts = copy.deepcopy(ECG_ts[ba])
            
        except Exception as e: 
            print('Could not extract Event Markers')
            print(e)
            return

    #Sampling rate repeated for every sample (stored as a list)
    ECG_srate = [stream[0]['info']['effective_srate']] * len(ECG)
    
    #OutPut DataFrame: 
    df_output = pd.DataFrame(
        {'timestamp_lsl' : ECG_ts,
         'timestamp_norm' : ECG_ts - t0,#normalizing timestamps
         'ecg_raw' : ECG,
         'srate': ECG_srate}
    )
    
    return df_output

## __3.2 Extracting the Poststudy Meditation Ball Part with the Helper Function__

In [None]:

#We load/Extract our Poststudy LSL recordings 
study = studies[1]

#Events Dictionary to extract meditation part
event_dict = {
    'select_streams' : [{'name' : 'EventStream'}],
    'start_event' : 'Start meditation audio',
    'duration' : 257
}

#Using our Helper Function
df_poststudy_medBall = load_ecg(
    os.path.join(dir_xdf,file_format%(sub,study,part,'xdf')),
    channel=65,
    select_streams=[{'type':'EEG'}],
    event_markers = event_dict
)

df_poststudy_medBall.head()

## __3.3 INTERACTIVE EXERCISE: Extracting the Poststudy Compassion Tree Part with the Helper Function__

To extract this section, we need to refer to __PostStudy_TreeEvents.xlsx__, located in the data directory (dir_data). It has three columns, but we only need two of them: __subject__ and __time_elapsed_since_start_audio__. We can load it as a Pandas DataFrame and look up the time elapsed for our subject using the __sub__ variable. After getting the time elapsed, we can use our Helper Function. Perform the following steps in the next code block.

- Load in the PostStudy_TreeEvents spreadsheet as a Pandas DataFrame (call it whatever you want)
- Get the time elapsed for our particular subject (call it whatever you want)
- Modify the event_dict variable (used in the previous code block) to change __start_event to "Start second meditation audio"__ and __duration to the time elapsed from the previous step__
- Use the Helper Function to extract the Compassion Tree meditation part and name the DataFrame __df_poststudy_compTree__

In [None]:
#Load in PostStudy_TreeEvents spreadsheet as dataframe

#Get time elapsed for subject

#modify event_dict variable

#Extract ECG


# __4. Saving All DataFrames__

Finally, we can save our data into a folder called csv in our data directory. With these CSVs, we can preprocess the data, find R-peaks, and calculate HRV metrics. Look at the 001_ecg_fundamentals Jupyter Notebook to see how to process the ECG and find peaks. Alternatively, you can modify the Helper Function to process the data and find peaks. You may wonder why we have the subjects, studies, and parts lists: you can iterate over them with for loops. These loops can be used to automate our preprocessing pipeline, so I encourage you to do so.
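As a starting point for that automation, the sketch below loops over every subject/study/part combination with `itertools.product` and builds the XDF path with the same `file_format` template used in section 2.1. It only builds paths; in a real pipeline the loop body would call `load_ecg` with the channel and event settings appropriate for each study and part, which this sketch does not assume.

```python
import itertools
import os

# Same values as the variables defined in section 2.1
dir_data = 'data'
file_format = 'vrcct_%s_%s_%s.%s'
subjects = ['p16']
studies = ['prestudy', 'poststudy']
parts = ['condition', 'prebaseline']

paths = []
for sub, study, part in itertools.product(subjects, studies, parts):
    # Fill the template: vrcct_<subject>_<study>_<part>.xdf
    paths.append(os.path.join(dir_data, 'xdf', file_format % (sub, study, part, 'xdf')))
    # A real pipeline would load and process each recording here

for p in paths:
    print(p)
```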

In [None]:
#Creating our csv directory: 
dir_csv = os.path.join(dir_data,'csv')

if not(os.path.isdir(dir_csv)):
    os.mkdir(dir_csv)
    print('Created %s'%(dir_csv))
    
df_prestudy_medBall.to_csv(
    os.path.join(dir_csv,file_format%(sub,studies[0],'medBall','csv')),
    index=False
)
df_poststudy_medBall.to_csv(
    os.path.join(dir_csv,file_format%(sub,studies[1],'medBall','csv')),
    index=False
)
df_poststudy_compTree.to_csv(
    os.path.join(dir_csv,file_format%(sub,studies[1],'compTree','csv')),
    index=False
)