In [None]:
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np
import os
import pyxdf

# Directory where the data is located
dir_data = 'data'

# XDF files used in this tutorial:
xdf_oddball = os.path.join(dir_data, 'hm_visual_oddball_s01_cond1.xdf')
xdf_soundHealing = os.path.join(dir_data, 'sound_healing_pilot04_bowls.xdf')
xdf_soundHealing_ecg = os.path.join(dir_data, 'meditation_music_pilot_tibetinbowls.xdf')
xdf_meditation = os.path.join(dir_data, 'vrcct_p34_poststudy_condition.xdf')

# __Main Points__

1. __Lab Streaming Layer__ is an open-source middleware ecosystem to stream, receive, synchronize, and record time series measurements from a wide array of sensor hardware (e.g. EEG headsets, ECG devices, and eye trackers). It has __various programming language interfaces__ (C, C++, Python, Java, C#, MATLAB) and is __cross-platform__ (OS support: Windows / Linux / macOS / Android / iOS; architecture support: x86 / amd64 / arm).

2. Data is recorded with __LabRecorder__, which saves it in the __XDF file format (Extensible Data Format)__. You can find more information about this file format [here](https://github.com/sccn/xdf).

3. XDF files can be loaded in Python with the package [__pyxdf__](https://github.com/xdf-modules/pyxdf), allowing you to analyze them. There are various types of data you need to be aware of, including EEG, event markers, eye tracking, ECG, and audio.


# __What is Lab Streaming Layer?__

From the [documentation](https://labstreaminglayer.readthedocs.io/info/intro.html), __Lab Streaming Layer__ (__LSL__) is a system for the unified collection of measurement time series in research experiments that handles both networking and time synchronization (sub-millisecond accuracy). The suite of tools built on top of the LSL distribution allows (near-) real-time access, viewing, and recording of the data. This system was developed by researchers at the Swartz Center for Computational Neuroscience (SCCN).

While this definition may sound somewhat nebulous, many (if not all) experiments carried out at the SCCN use LSL. With LSL, researchers can simultaneously record neural, physiological, and behavioral data acquired from a diverse collection of sensor hardware. That is, a single LSL recording, along with its metadata, can include EEG, ECG, eye tracking, and audio data, each with its own sampling rate and number of channels, all synchronized to one clock! Moreover, thanks to its various programming language interfaces, LSL can be integrated with a hardware API in just a few lines of code, allowing many kinds of data to be sent via LSL. It can be simple to use.
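Because every stream in a recording is timestamped on the same LSL clock, streams with different sampling rates can be lined up after loading. Here is a minimal sketch of one way to do that, assuming two numeric streams with sorted timestamps: linearly interpolate the slower stream onto the faster stream's timestamps with `numpy.interp`. The helper name `align_to_timestamps` and the synthetic data are hypothetical, and linear interpolation is only one reasonable choice.

```python
import numpy as np

def align_to_timestamps(values, ts, target_ts):
    """Linearly interpolate each channel of `values` (N x C, sampled at `ts`)
    onto `target_ts`. Hypothetical helper for illustration.

    Target times outside [ts[0], ts[-1]] become NaN instead of being extrapolated.
    """
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, np.newaxis]  # treat a 1-D signal as a single channel
    target_ts = np.asarray(target_ts, dtype=float)
    return np.column_stack([
        np.interp(target_ts, ts, values[:, ch], left=np.nan, right=np.nan)
        for ch in range(values.shape[1])
    ])

# Synthetic example: a 10 Hz stream placed onto a 100 Hz stream's timestamps
slow_ts = np.arange(20) / 10.0     # 10 Hz, seconds on a shared clock
slow_vals = 3.0 * slow_ts          # a simple ramp
fast_ts = np.arange(200) / 100.0   # 100 Hz
aligned = align_to_timestamps(slow_vals, slow_ts, fast_ts)
```

The result has one row per target timestamp, so it can be indexed alongside the faster stream sample by sample.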

So __how do you record this data?__ One way is with [__LabRecorder__](https://github.com/labstreaminglayer/App-LabRecorder). This LSL application records data into the XDF file format ([Extensible Data Format](https://github.com/sccn/xdf)).

__Unless you are designing your own experimental paradigm or collecting data, you won't interact with the LSL distribution or LabRecorder directly__. I will try to provide examples of how you can integrate LSL into any kind of script (using the Python package pylsl) in case some of you are interested.
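For the curious, here is a minimal sketch of what sending event markers over LSL from a Python script can look like with pylsl. It assumes pylsl (and the liblsl library it wraps) is installed; the stream name, type, and source ID are hypothetical placeholders. A nominal sampling rate of 0 declares an irregular-rate stream, which suits event markers.

```python
def create_marker_outlet(name='TutorialMarkers', source_id='tutorial_markers_01'):
    """Create a single-channel, irregular-rate string outlet (hypothetical names)."""
    # Imported here so that the rest of the notebook does not require pylsl
    from pylsl import StreamInfo, StreamOutlet
    # StreamInfo(name, type, channel_count, nominal_srate, channel_format, source_id)
    # nominal_srate = 0 marks an irregular sampling rate (typical for markers)
    info = StreamInfo(name, 'Markers', 1, 0, 'string', source_id)
    return StreamOutlet(info)

def send_marker(outlet, marker):
    """Push one marker; each sample is a list with one value per channel."""
    outlet.push_sample([marker])
```

In an experiment script you would create the outlet once at start-up (`outlet = create_marker_outlet()`) and then call, for example, `send_marker(outlet, 'experiment_start')` whenever an event happens. While the outlet exists, LabRecorder can see it and record it alongside the other streams.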


# __How do we access this data?__ 

Before we begin with the tutorial, we need to define a few terms you may be unfamiliar with:

- __Sample__: A single measurement of all channels from a device. For instance, it can be a single gaze data point from an eye tracker or the voltage measurements of all electrodes from an EEG headset.
- __Metadata__: data that provides information about the stream (see below), but not the actual data (e.g. name, device manufacturer, number of channels, etc.)
- __Stream__: The combination of sampled data from a device with its metadata. A stream can have a regular sampling rate (e.g. audio sampled at 44100 Hz) or an irregular sampling rate (e.g. key presses or experimental events), and it can have multiple channels (EEG headsets can have 24 channels or more). All channels in a stream must share the same data type (e.g. integers, floats, doubles, or strings).
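To make these terms concrete, here is a minimal sketch of a toy stream laid out the way pyxdf returns streams: a dictionary whose 'time_series' array has one row per sample and one column per channel, whose 'time_stamps' array holds one timestamp per sample, and whose 'info' dictionary holds the metadata. pyxdf stores most metadata values as single-element lists of strings, which is why later cells index them with `[0]`. All values below are made up for illustration.

```python
import numpy as np

n_samples, n_channels, srate = 500, 4, 250.0
rng = np.random.default_rng(0)

toy_stream = {
    'info': {  # metadata (made-up values)
        'name': ['ToyEEG'],
        'type': ['EEG'],
        'channel_count': [str(n_channels)],
        'nominal_srate': [str(srate)],
        'channel_format': ['float32'],
    },
    # one row per sample, one column per channel; every channel shares one data type
    'time_series': rng.standard_normal((n_samples, n_channels)).astype(np.float32),
    # one timestamp (seconds on the LSL clock) per sample
    'time_stamps': 1000.0 + np.arange(n_samples) / srate,
}

first_sample = toy_stream['time_series'][0, :]  # all channels at one instant
channel_2 = toy_stream['time_series'][:, 1]     # one channel over time
duration = toy_stream['time_stamps'][-1] - toy_stream['time_stamps'][0]
```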

We can access an XDF file in Python using the package __pyxdf__, which we imported in the first cell. __This tutorial will show how you can load an XDF file and what types of data you may expect.__

## __os - Miscellaneous operating system interfaces__

If you haven't loaded files in Python before, the module __os__ may be new to you. This module provides a portable way of using operating-system-dependent functionality. Your operating system determines how paths are structured:

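1. On Windows, paths start with a drive letter and use backslashes: `C:\Users\JohnSmith`. Inside a regular Python string the backslashes must be doubled (`"C:\\Users\\JohnSmith"`) or the string written as a raw string (`r"C:\Users\JohnSmith"`).
2. On Linux (and macOS), there are no drive letters as in Windows, and paths use forward slashes: `/home/JohnSmith`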

This is where the os module comes in. It lets us combine strings into a path that follows your operating system's conventions. Rather than guessing whether to use a backslash or a forward slash (or writing code that breaks for people on other operating systems), you can use os to always get the correct path structure. One function you will see here is __os.path.join__. I encourage you to read the [documentation](https://docs.python.org/3/library/os.path.html). Beyond building paths, os can also check whether a file exists (__os.path.isfile__), check whether a directory exists (__os.path.isdir__), create a new directory (__os.mkdir__), or list all files in a directory (__os.listdir__).
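Here is a small sketch of these functions working together: one helper lists the XDF files in a directory, and another creates an output directory if it does not exist yet. Both helper names are hypothetical.

```python
import os

def list_xdf_files(directory):
    """Return the full paths of all .xdf files in `directory` (hypothetical helper)."""
    if not os.path.isdir(directory):
        return []
    paths = []
    for fname in sorted(os.listdir(directory)):
        path = os.path.join(directory, fname)  # uses the separator of the current OS
        if os.path.isfile(path) and fname.lower().endswith('.xdf'):
            paths.append(path)
    return paths

def ensure_dir(directory):
    """Create `directory` (and any missing parent folders) if it does not exist yet."""
    if not os.path.isdir(directory):
        os.makedirs(directory)  # os.mkdir would only create the last folder
    return directory
```

For example, `list_xdf_files(dir_data)` should return the four tutorial files defined in the first cell, assuming they are present in the data folder.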

# __1 Loading Data with Pyxdf__

We will use pyxdf's __load_xdf__ function to see what these mysterious XDF files contain. You can find more information on this function with IPython's help syntax (__?pyxdf.load_xdf__) or Python's built-in __help(pyxdf.load_xdf)__. According to the documentation, this function returns __a list of dictionaries (one for each stream) and the file header as a dictionary__. These streams have metadata that will help you identify what type of data each stream represents. However, __the quality and comprehensiveness of the metadata depends on how carefully the device manufacturer or the script author documented it__ (I'll show you an example of confusing metadata later). Regardless, most streams will have relevant metadata. When the metadata is incomplete or unclear, ask us and we can find out.

## __1.1 Accessing Metadata of an XDF file__

In the following, we load an XDF file recorded for our Sound Healing experiment, where the subject was fitted with a __128-channel Biosemi EEG cap__. The subject was then asked to sit and listen to __a live performance, which was also recorded via LSL__. There was also an event marker stream that didn't serve a purpose in this experiment.

In [None]:
#This will give you more information on the function load_xdf from the pyxdf package (uncomment the next line of code):
# ?pyxdf.load_xdf

#Loading in streams (we don't really need the file header): 
streams,_ = pyxdf.load_xdf(xdf_soundHealing)

#Each stream is a dictionary. We can access the metadata using the 'info' key. 
# We can see name, stream type, number of channels, and other information:
print('The metadata can have the following information:')
print(list(streams[0]['info'].keys()),'\n')


# Print some relevant metadata to identify each stream:
print('This XDF File contains the following streams:')
for i,stream in enumerate(streams): 
    print('Stream %i'%(i+1))
    print('stream name: %s'%stream['info']['name'][0])
    print('stream type: %s'%stream['info']['type'][0])
    print('number of channels: %s'%stream['info']['channel_count'][0])
    print('channel format: %s'%stream['info']['channel_format'][0])
    print('nominal sampling rate: %s'%stream['info']['nominal_srate'][0])
    print('Effective (calculated) sampling rate: %f'%stream['info']['effective_srate'],'\n')

## __1.2 Accessing Streams Data__

Let's look at the EEG stream (Biosemi). From the metadata, we see that this stream has 137 channels sampled at 2048 Hz. We can access every sample across all channels along with its corresponding timestamp (LSL clock). Our Biosemi EEG data is the __second__ dictionary in our __streams__ list, and its data is stored under the '__time_series__' key. This should be an array with N rows (total number of samples) and 137 columns (number of channels: 128 EEG channels, 8 external channels, and 1 trigger). We can access the timestamp of each sample using the key '__time_stamps__'. For multi-channel streams, the metadata usually provides channel information under the 'desc' key; however, this depends on how well the manufacturer documented its metadata.

### __1.2.1 EEG__

In [None]:
#Access the second stream (EEG) in our xdf file:
EEG = streams[1]['time_series'] # EEG data
EEG_ts = streams[1]['time_stamps'] #timestamps
EEG_meta = streams[1]['info']


#Displaying the shape of this array: 
print('EEG stream has a total of %i samples'%EEG.shape[0])
print('EEG stream has %i channels'%EEG.shape[1])
print('EEG stream has a total run time of %0.3f seconds \n'%(EEG_ts[-1]-EEG_ts[0]))

# For multi-channel streams, we can access channel labels (if they are documented in the metadata)
channel_names = [ chan['label'][0] for chan in EEG_meta['desc'][0]['channels'][0]['channel'] ]
print('The channels in the BioSemi EEG cap include: ')
print(channel_names)
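The list comprehension above assumes the 'desc' metadata is fully populated; when a manufacturer leaves it out, that line raises an error. A minimal defensive sketch, assuming the nested layout used above (and assuming an empty description shows up as `[None]`, which is how pyxdf may represent it); `get_channel_labels` is a hypothetical helper, not part of pyxdf.

```python
def get_channel_labels(info):
    """Return channel labels from a pyxdf stream 'info' dict, or None if the
    metadata does not document them (hypothetical helper)."""
    desc = info.get('desc') or [None]
    if desc[0] is None:  # empty <desc> element
        return None
    try:
        channels = desc[0]['channels'][0]['channel']
    except (KeyError, IndexError, TypeError):
        return None      # desc exists but has no channel list
    return [ch.get('label', [None])[0] for ch in channels]
```

With the Sound Healing file, `get_channel_labels(streams[1]['info'])` should give the same list as `channel_names`, while a poorly documented stream returns `None` instead of crashing.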

### __1.2.2 Audio__

We only have one channel for our audio stream since we recorded the live performance with a single microphone. It is the __third stream__ in our streams list, and we can access it the same way as the EEG stream.

In [None]:
#Access the third stream (Audio) in our xdf file: 
audio = streams[2]['time_series'] # audio data
audio_ts = streams[2]['time_stamps'] #timestamps
audio_meta = streams[2]['info']

#Displaying the shape of this array: 
print('Audio stream has a total of %i samples'%audio.shape[0])
print('Audio stream has %i channels'%audio.shape[1])
print('Audio stream has a total run time of %0.3f seconds \n'%(audio_ts[-1]-audio_ts[0]))

## 1.3 Accessing a Specific Stream

You may have noticed that we loaded __ALL STREAMS__ from our XDF recording. Moreover, __if we load another XDF recording from another subject, the streams may be loaded in a different order__. For instance, the EEG could be the first stream and the audio the second. Additionally, we don't use the event markers, so we don't want to waste time loading them. Fortunately, we can load only specific streams using the __select_streams__ argument of load_xdf. Again, you can use the help function to find more information (__?pyxdf.load_xdf__). To use this, you must know some of the stream's metadata (it can be as simple as knowing its type or name). If you don't know the metadata of the devices, use the code in the sections above to display it. Other recordings from the same experiment should have the same device metadata. Let's demonstrate this.

In [None]:
#Loading in EEG stream and accessing data:
EEG_stream, _ = pyxdf.load_xdf(
    xdf_soundHealing,
    select_streams=[{'type':'EEG'}]
)

EEG = EEG_stream[0]['time_series']
EEG_ts = EEG_stream[0]['time_stamps']

#Loading in Audio stream and accessing data:
audio_stream, _ = pyxdf.load_xdf(
    xdf_soundHealing,
    select_streams=[{'type':'Audio'}]
)
audio = audio_stream[0]['time_series']
audio_ts = audio_stream[0]['time_stamps']
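If you have already loaded every stream (as in Section 1.1), a similar lookup can be done on the list itself instead of relying on the stream order. A minimal sketch, assuming the pyxdf metadata layout shown earlier (single-element lists of strings); `find_stream` is a hypothetical helper, not part of pyxdf.

```python
def find_stream(streams, **criteria):
    """Return the first stream whose metadata matches every criterion,
    e.g. find_stream(streams, type='EEG') (hypothetical helper)."""
    for stream in streams:
        info = stream['info']
        if all(info.get(key, [None])[0] == value for key, value in criteria.items()):
            return stream
    raise ValueError('No stream matches %s' % criteria)
```

For example, `find_stream(streams, type='Audio')['time_series']` returns the audio data regardless of where the audio stream sits in the list.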

# __2 What type of data will you encounter?__

Depending on the experiment, you may be asked to analyze different kinds of data. While this is not a comprehensive list, here are five that I encounter: 1) EEG, 2) event markers, 3) eye tracking, 4) ECG, and 5) audio.

## 2.1 EEG

EEG is the most prevalent data you will encounter (duh! we are working for the Swartz Center for Computational Neuroscience). However, we rarely analyze EEG data in Python; that is mainly done in MATLAB using EEGLAB. If you want to explore EEG data analysis in Python, check out [MNE-Python](https://mne.tools/stable/index.html).

## 2.2 Event Markers

Event Markers are used to annotate our data. They can be strings or integers. They mark when an event happens in the experiment. For instance, we can send event markers when the experiment starts and ends. We can even send event markers when the subject presses keys on the keyboard. You may notice that these types of streams have an irregular sampling rate (sampling rate of 0 Hz; see above in metadata section). We can use these events to analyze specific sections of an experiment. 
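A common way to use event markers is to cut a continuous stream into fixed-length windows (epochs) around each event. Because both streams share the LSL clock, the sample that lines up with an event can be found by searching the continuous stream's timestamps. A minimal sketch, assuming sorted timestamps and a known sampling rate; `extract_epochs` and its parameters are hypothetical names for illustration.

```python
import numpy as np

def extract_epochs(data, data_ts, event_ts, tmin, tmax, srate):
    """Cut fixed-length windows of `data` (N x C) around each event time.

    Hypothetical helper: tmin/tmax are in seconds relative to the event
    (e.g. -0.2, 0.8) and srate is the stream's sampling rate in Hz. Events
    whose window would run past either end of the recording are skipped.
    Returns an array of shape (n_epochs, n_samples_per_epoch, C).
    """
    data = np.asarray(data)
    start_offset = int(round(tmin * srate))
    n_epoch = int(round((tmax - tmin) * srate))
    epochs = []
    for t in event_ts:
        # index of the first data sample at or after the event timestamp
        onset = np.searchsorted(data_ts, t)
        start = onset + start_offset
        stop = start + n_epoch
        if start >= 0 and stop <= len(data):
            epochs.append(data[start:stop])
    return np.array(epochs)
```

With the oddball recording, for instance, the marker timestamps (`events_ts` below) could serve as `event_ts`, together with the 'time_series', 'time_stamps', and sampling rate of an EEG stream from the same file.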

Let's look at a recording. In an oddball experiment, two different types of cubes (standard and deviant) were shown to a subject in virtual reality at four different positions (as indicated in each event). Events are sent as strings. They tell you which cube is displayed (standard or deviant), the trial number (number of cubes displayed), and the relative position (in x, y, z coordinates). This experiment is used to characterize fixations, saccades, and smooth pursuits in immersive 3D VR with free head movement (ongoing as of Oct. 2022).

In [None]:
#Loading in Event Markers Stream for oddball
event_stream,_ = pyxdf.load_xdf(
    xdf_oddball,
    select_streams=[{'name':'EventMarker'}]
)

#note that string markers are stored as lists, so we use list iteration
events = [event[0] for event in event_stream[0]['time_series']]
events_ts = event_stream[0]['time_stamps']

#Displaying an event
# We have trial number, type of cube, and coordinate
# We also have event markers when this trial ends
print(events[1])
print(events[3])

## __2.3 Eye Tracking__

Eye tracking data can have many channels corresponding to 2D gaze points, 3D gaze points, angular velocity, etc. Gaze data can be used to identify fixations (runs of consecutive gaze samples whose movement stays below certain thresholds, such as a velocity threshold). Fixations can then be treated as events, allowing us to analyze the surrounding EEG and plot fixation-related potentials (FRPs).
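One common way to turn gaze samples into fixations is a velocity-threshold (I-VT) approach: samples whose angular velocity stays below a threshold are grouped into consecutive runs, and runs that last long enough are kept as fixations. A minimal sketch, assuming you already have an angular-velocity signal in degrees per second and its timestamps; the 30 deg/s and 100 ms defaults are illustrative values, not the thresholds used in our lab's pipeline.

```python
import numpy as np

def detect_fixations(velocity, ts, vel_thresh=30.0, min_dur=0.1):
    """Return (start_time, end_time) pairs of fixations using a velocity threshold.

    velocity: angular velocity per sample (deg/s); ts: timestamps in seconds.
    Duration is measured from the first to the last sample of each run.
    """
    below = np.asarray(velocity) < vel_thresh
    # pad with False so every run of True values has a rising and a falling edge
    edges = np.diff(np.concatenate(([False], below, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)      # first sample of each run
    stops = np.flatnonzero(edges == -1) - 1  # last sample of each run
    fixations = []
    for s, e in zip(starts, stops):
        if ts[e] - ts[s] >= min_dur:
            fixations.append((ts[s], ts[e]))
    return fixations
```

Each fixation onset could then be used as an event time when epoching EEG for FRPs.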

The oddball experiment described in the previous section does have an eye tracking stream. However, its metadata is poorly annotated: there is no channel information, so we rely on our notes to know what each channel contains. Also notice that the effective sampling rate is 90 Hz, not the 50 Hz indicated by the metadata.

In [None]:
# Load the eye tracking (gaze) stream for the oddball recording
gaze_stream,_ = pyxdf.load_xdf(
    xdf_oddball,
    select_streams=[{'name':'ProEyeGaze'}]
)

# 2D gaze coordinates of the left eye are contained in channels 1 and 2:
gaze2d_left = gaze_stream[0]['time_series'][:,:2]
print('Left Eye 2D Coordinates')
print(gaze2d_left[0,:],'\n') #first sample

# 3D gaze coordinates of the left eye are contained in channels 4, 5, and 6
print('Left Eye 3D Coordinates')
gaze3d_left = gaze_stream[0]['time_series'][:,3:6]
print(gaze3d_left[0,:]) #first sample
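The mismatch between the nominal and effective sampling rates mentioned above can be checked directly from the timestamps. A minimal sketch: the number of sample intervals divided by the elapsed time, plus the median inter-sample interval as a check that is less sensitive to dropouts. pyxdf's own 'effective_srate' may be computed somewhat differently, so small discrepancies are expected; `estimate_srate` is a hypothetical helper.

```python
import numpy as np

def estimate_srate(ts):
    """Estimate a stream's sampling rate (Hz) from its timestamps, two ways."""
    ts = np.asarray(ts, dtype=float)
    overall = (len(ts) - 1) / (ts[-1] - ts[0])   # intervals / elapsed time
    median_based = 1.0 / np.median(np.diff(ts))  # robust to occasional gaps
    return overall, median_based
```

For example, `estimate_srate(gaze_stream[0]['time_stamps'])` can be compared against `gaze_stream[0]['info']['nominal_srate'][0]`. A large gap between the two estimates hints at dropped samples.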

## __2.4 ECG__

ECG can be a little tricky to extract because some EEG systems record ECG on their external channels; in that case, there is no dedicated ECG stream. Other times, we do have a dedicated ECG stream with one (or two) channels.

I will show both scenarios. The first is from our Sound Healing pilot study (briefly described in __Section 1.1__), where we wanted to measure the ECG of a subject listening to a concert recording. It has a dedicated ECG stream (from the HeartyPatch device) with two channels: the ECG data and the device timestamp at which each sample was collected.

In [None]:
# Recording with a dedicated ECG stream 
ECG_stream,_ = pyxdf.load_xdf(
    xdf_soundHealing_ecg,
    select_streams=[{'type':'ECG'}],
)

#Raw ECG data: first channel
ECG = ECG_stream[0]['time_series'][:,0]
ECG_ts = ECG_stream[0]['time_stamps']

#Plotting first 10 seconds of recording: 
bool_array = (ECG_ts>=ECG_ts[0]) & (ECG_ts<=ECG_ts[0]+10)

fig1,ax1 = plt.subplots(figsize=(7,5))
ax1.plot(ECG_ts[bool_array],ECG[bool_array])
ax1.set_title('Sound Healing')
ax1.set_xlabel('LSL Clock Time (seconds)')
ax1.set_ylabel('ECG')

Next, we will look at a recording from our VRCCT experiment where subjects were in a VR space and guided through a meditation session. ECG (from the left and right arm) is in the EEG stream on its external channels (66 and 67). We will load the EEG stream and then extract the ECG data. 

In [None]:
#Loading in EEG stream:
EEG_stream, _ = pyxdf.load_xdf(
    xdf_meditation,
    select_streams=[{'type':'EEG'}]
)

#Extracting EEG:
EEG = EEG_stream[0]['time_series']

# Extract ECG (channel 66, i.e. zero-based index 65) from the EEG data:
ECG = EEG[:,65]

#Timestamps correspond to both EEG and ECG:
timestamps = EEG_stream[0]['time_stamps']

#Plotting first 10 seconds of recording: 
bool_array = (timestamps>=timestamps[0]) & (timestamps<=timestamps[0]+10)

fig2,ax2 = plt.subplots(figsize=(7,5))
ax2.plot(timestamps[bool_array],ECG[bool_array]) #note the baseline wander! 
ax2.set_title('VRCCT Experiment')
ax2.set_xlabel('LSL Clock Time (seconds)')
ax2.set_ylabel('ECG')
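The slow drift visible in this plot (baseline wander) is often reduced with a high-pass filter before further ECG analysis. A minimal sketch using SciPy, assuming a zero-phase 4th-order Butterworth high-pass at 0.5 Hz applied in second-order sections; the cutoff and order are illustrative choices, not settings from our pipeline. The sampling rate must be that of the stream the ECG came from (here, the EEG stream).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def remove_baseline_wander(ecg, srate, cutoff=0.5, order=4):
    """Zero-phase Butterworth high-pass filter to suppress baseline wander."""
    # second-order sections stay numerically stable at low cutoff/srate ratios
    sos = butter(order, cutoff, btype='highpass', fs=srate, output='sos')
    return sosfiltfilt(sos, np.asarray(ecg, dtype=float))
```

For example: `ECG_filtered = remove_baseline_wander(ECG, float(EEG_stream[0]['info']['nominal_srate'][0]))`. Filtering forward and backward (`sosfiltfilt`) avoids shifting the R peaks in time.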

## __2.5 Audio__

One of our projects involves subjects listening to audio. We may need to analyze these audio streams (e.g. creating spectrograms of the music or pinpointing certain musical features we are interested in). We can use Python packages like [__librosa__](https://librosa.org/doc/latest/index.html) to analyze such data. These streams are characterized by a high sampling rate (in our Sound Healing experiment, we sampled one channel at 11025 Hz). I mainly analyze these types of data in conjunction with event markers.
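As an example of the spectrogram analysis mentioned above, here is a minimal sketch. It uses SciPy's `scipy.signal.spectrogram` instead of librosa so that it only depends on SciPy; the 0.1 s window is an illustrative assumption. For a one-channel audio stream loaded with pyxdf, you would pass the single column (e.g. `audio[:, 0]`) and the stream's sampling rate.

```python
import numpy as np
from scipy.signal import spectrogram

def audio_spectrogram(audio, srate, window_sec=0.1):
    """Return frequencies (Hz), segment times (s from the first sample), and power in dB."""
    nperseg = int(window_sec * srate)  # samples per analysis window
    freqs, times, power = spectrogram(np.asarray(audio, dtype=float),
                                      fs=srate, nperseg=nperseg)
    power_db = 10 * np.log10(power + 1e-12)  # small offset avoids log(0)
    return freqs, times, power_db
```

The returned times are relative to the first audio sample; adding `audio_ts[0]` puts them on the LSL clock, so they can be compared with event marker timestamps.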