# File Reading 

- [**Imports**](#Imports)
- [**Introduction**](#Introduction)
- [**Reading EDF Files**](#Reading-EDF-Files)
    - [**Properties and Attributes**](#Properties-and-Attributes)
    - [**Header Information**](#Header-Information)
    - [**Reading EDF Data to Arrays**](#Reading-EDF-Data-to-Arrays)
    - [**File Resources and Context Management**](#File-Resources-and-Context-Management)
- [**Writing EDF Files**](#Writing-EDF-Files)
- [**EDF Annotations**](#EDF-Annotations)
- [**Producing from EDF Files with Annotations**](#Producing-from-EDF-Files-with-Annotations)

## Objectives for 7/7 - 7/14

- 1. <s>Review producers demo checking for typos, markdown errors, and hyperlink errors.</s>
- 2. Review differences between our producer demos. A few items I noticed:
    - a. Code simplicity -- my largest cell contains only 8 lines. Demos should use only small code snippets.
    - b. Show call signatures and help documentation
    - c. Highlight important points with bold text, colors, etc.
    - d. Print what instances look like on return where needed
    - e. Limit use of subfunctions with kwargs set to specific values; delegate kwargs to caller function (65)
    - f. Realistic examples - see the sizes of arrays chosen, the selection of items to mask in producers, etc.
- 3. This file reading demo

Todo:
- finish first draft
- colorize everything
- <s>fix hyperlinks</s>
- proof
- push clean notebook that demonstrates downloading of files from repo

## Imports

In [5]:
import numpy as np

from openseize.io.bases import Reader, Header, Writer
from openseize.io import edf, annotations
from openseize import demos
from openseize import producer

## Introduction

><s><font size=3> Openseize provides a host of tools for working with and analyzing EEG data. This data can be stored in various file types, and a critical step to performing any work on the data within a file is to read that data into memory. The openseize package provides <font color='darkcyan'><b>an EDF Reader class that can take raw EDF files and extract the EEG records from them for analysis.</b></font> 
>
><s><font size=3> If you would like to work with additional file types that support EEG data, we describe the requirements for creating your own Readers towards the end of this demo.</s>

><font color='red'><font size=3>Openseize currently provides tools for reading and writing European Data Format (EDF) binary files. The details of this file specification can be found here: https://www.edfplus.info/specs/edf.html</font>
>
><font color='red'><font size=3>This demo will describe how to open, read, and produce data from an EDF file using the <font color='darkcyan'><b>EDF Reader class</b></font> and write data to an EDF file using the <font color='darkcyan'><b>EDF Writer class</b></font>. Additionally, this demo will cover how to read Comma Separated (CSV) and Tab separated value (TSV) annotation text files and use the resulting annotations to mask produced EEG numpy arrays.</font>
    
    

## Reading EDF Files

<font color='red'> I think it is better to reverse the order and show the help call to the edf.Reader. This will show that the class needs to be initialized with a path that you will then get.</font>

><font size=3> To read from a file, we first need the path to that file on your machine. For these demos, the data is stored in a remote Zenodo repository. The demos module we imported has access to the files in this repo; we can see what's available by inspecting <font color='firebrick'><i>available</i></font> on <font color='firebrick'><i>demos.paths</i></font>.

In [6]:
demos.paths.available

---Available demo data files & location---
------------------------------------------
annotations_001.txt            '/home/matt/python...nnotations_001.txt'
recording_001.edf              '/home/matt/python.../recording_001.edf'
5872_Left_group A.txt          '/home/matt/python...2_Left_group A.txt'
CW0259_SWDs.npy                '/home/matt/python...ta/CW0259_SWDs.npy'
subset_001.edf                 '/home/matt/python...ata/subset_001.edf'
5872_Left_group A.edf          '/home/matt/python...2_Left_group A.edf'


><font size=3> If the file is currently on your system, you'll see a local location after that file's name. If not, you'll see a link to the Zenodo repo. Regardless of its location, we can get access to a file by calling the <font color='firebrick'><i>locate</i></font> method. If the file hasn't already been found on your local machine, it will be downloaded to the demos data folder. This may take a few minutes, but will occur only once.

In [7]:
# Get access to the file's path locally, downloading if needed
filepath = demos.paths.locate('recording_001.edf')

In [8]:
# We can see the file's location on our local machine now that it has downloaded.
demos.paths.available

---Available demo data files & location---
------------------------------------------
annotations_001.txt            '/home/matt/python...nnotations_001.txt'
recording_001.edf              '/home/matt/python.../recording_001.edf'
5872_Left_group A.txt          '/home/matt/python...2_Left_group A.txt'
CW0259_SWDs.npy                '/home/matt/python...ta/CW0259_SWDs.npy'
subset_001.edf                 '/home/matt/python...ata/subset_001.edf'
5872_Left_group A.edf          '/home/matt/python...2_Left_group A.edf'
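
The "download once, then reuse" behavior described above can be illustrated with a small stand-alone sketch. This is not openseize's implementation: the `locate` function below, its `data_dir` and `fetch` parameters, and the `fake_fetch` helper are all hypothetical names chosen for illustration, and the download is simulated by writing placeholder bytes.

```python
from pathlib import Path
import tempfile


def locate(name, data_dir, fetch):
    """Return a local path for name, calling fetch only if the file is missing.

    `fetch` is a hypothetical callable(name, destination) standing in for a
    real download step; it is an assumption, not part of openseize.
    """
    local = Path(data_dir) / name
    if not local.exists():
        # First request: the file is absent, so retrieve it once.
        fetch(name, local)
    return local


calls = []


def fake_fetch(name, destination):
    """Simulate a download by writing a few placeholder bytes."""
    calls.append(name)
    destination.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    first = locate("recording_001.edf", tmp, fake_fetch)
    second = locate("recording_001.edf", tmp, fake_fetch)
    n_fetches = len(calls)
    same_path = first == second

print("fetches:", n_fetches, "same path:", same_path)
```

The second call finds the file already on disk and skips the fetch, which is why the download cost is paid only once.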


><font size=3>Now that we have access to a filepath, we can build a Reader around it. Before we do so, however, <font color=firebrick><i>let's take a look at its built-in help file.</i></font>

In [9]:
help(edf.Reader)

Help on class Reader in module openseize.io.edf:

class Reader(openseize.io.bases.Reader)
 |  Reader(path)
 |  
 |  A reader of European Data Format (EDF/EDF+) files.
 |  
 |  The EDF specification has a header section followed by data records
 |  Each data record contains all signals stored sequentially. EDF+
 |  files include an annotation signal within each data record. To
 |  distinguish these signals we refer to data containing signals as
 |  channels and annotation signals as annotation. Currently, this reader
 |  does not support the reading of annotation signals.
 |  
 |  For details on the EDF/+ file specification please see:
 |  
 |  https://www.edfplus.info/specs/index.html
 |  
 |  Attributes:
 |      header: A dictionary representation of an EDF Header.
 |      shape: A tuple of channels, samples contained in this EDF
 |  
 |  Method resolution order:
 |      Reader
 |      openseize.io.bases.Reader
 |      abc.ABC
 |      openseize.core.mixins.ViewInstance
 |      builtin

><font size=3>As we can see, <b>the EDF Reader takes in a single parameter, the file path we obtained earlier</b>. <font color=firebrick><i>Let's pass in that path to create our Reader object.</i></font>

In [10]:
reader = edf.Reader(filepath)

### Properties and Attributes

><font size=3><s>Now that we have a Reader, we can do a printout to look at the attributes it provides us with.</s>
    
<font color='red'>To view the attributes and properties of this reader we can print the reader instance.</font>

In [11]:
# Print out the reader object to see its attributes
print(reader)

Reader Object
---Attributes & Properties---
{'path': PosixPath('/home/matt/python/nri/openseize/demos/data/recording_001.edf'),
 'header': {'version': '0',
            'patient': 'PIN-42 M 11-MAR-1952 Animal',
            'recording': 'Startdate 15-AUG-2020 X X X',
            'start_date': '15.08.20',
            'start_time': '09.59.15',
            'header_bytes': 1536,
            'reserved_0': 'EDF+C',
            'num_records': 3775,
            'record_duration': 1.0,
            'num_signals': 5,
            'names': ['EEG EEG_1_SA-B', 'EEG EEG_2_SA-B', 'EEG EEG_3_SA-B',
                      'EEG EEG_4_SA-B', 'EDF Annotations'],
            'transducers': ['8401 HS:15279', '8401 HS:15279', '8401 HS:15279',
                            '8401 HS:15279', ''],
            'physical_dim': ['uV', 'uV', 'uV', 'uV', ''],
            'physical_min': [-8144.31, -8144.31, -8144.31, -8144.31, -1.0],
            'physical_max': [8144.319, 8144.319, 8144.319, 8144.319, 1.0],
            'dig

><s><font size=3>As we can see, the Reader contains <b>three attributes:</b> the <font color=firebrick>path</font> to the file we are reading from, a <font color=firebrick>header</font> dictionary which contains a series of settings and information particular to this EEG reading, and the <font color=firebrick>shape</font> of the data in the Reader.</s> 
>
><s><font size=3>Now that's a lot of prelude. Let's get to the reason we're here: Reading data from a file.</s>
>
><font size=3><s>Reading is as simple as making a call to the <font color=firebrick>read</font> method on the reader; sensibly, it's the only method required to make a reader a Reader! <font color=firebrick><i>Let's take a closer look at it.</i></font></s>

<font color='red'> The reader contains three attributes: a path to the open file, a dictionary containing the EDF's header information, and the shape of the data, which is treated as a 2-D numpy array with channels along the 0th axis and samples along the 1st axis.<br>
The header dictionary contains all information stored in the header section of the EDF file. Details on the exact meaning of each of these fields can be found here: https://www.edfplus.info/specs/edf.html. To ease access to the header data, the header is a dict instance that has been extended to support '.' dot notation attribute access.</font>
    

In [12]:
# Fetch the names of the channels using '.' dot notation
print(reader.header.names)

['EEG EEG_1_SA-B', 'EEG EEG_2_SA-B', 'EEG EEG_3_SA-B', 'EEG EEG_4_SA-B', 'EDF Annotations']
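
For readers curious how a dict can support '.' dot notation, here is a minimal sketch of the general technique. The `DotDict` class is a hypothetical illustration, not openseize's Header class, and the field values are copied from the header printed above.

```python
class DotDict(dict):
    """A dict whose keys can also be read as attributes (illustrative only)."""

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails,
        # so dict methods such as .keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


header = DotDict(num_signals=5, record_duration=1.0,
                 physical_dim=['uV', 'uV', 'uV', 'uV', ''])

# Both access styles reach the same underlying value.
print(header.num_signals, header['num_signals'])
print(header.physical_dim[0])
```

Raising `AttributeError` for missing keys keeps built-ins like `hasattr` and `getattr` behaving as expected.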


<font color='red'>With the open reader instance, we can call the read method to read EDF data. To understand the parameters of this method, let's ask for help.</font>

In [13]:
help(reader.read)

Help on method read in module openseize.io.edf:

read(start, stop=None, channels=None, padvalue=nan) method of openseize.io.edf.Reader instance
    Reads samples from this EDF for the specified channels.
    
    Args:
        start: int
            The start sample index to read.
        stop: int
            The stop sample index to read (exclusive). If None, samples
            will be read until the end of file. Default is None.
        channels: sequence
            Sequence of channels to read from EDF. If None, all channels
            in the EDF will be read. Default is None.
        padvalue: float
            Value to pad to channels that run out of samples to return.
            Only applicable if sample rates of channels differ. Default
            padvalue is NaN.
    
    Returns: 
        A float64 array of shape len(chs) x (stop-start) samples.



><s><font size=3>Read takes in at minimum a <font color=firebrick>start value</font>. This indexes at what line in the file we should start reading out data. If you want, you can also give a <font color=firebrick>stop index</font>; otherwise it will read to the end of the file.</s>

<font color=red>The Reader's read method reads from a start sample up to, but not including, a stop sample within the file. If the stop sample is not given, the reader will read to the end of the file.</font>

In [14]:
# Important for demos -- state what you are showing
# read samples 0 to 5 for all 4 channels
reader.read(0, 5)

array([[-19.87908032,   7.95793213,  19.88808032,  18.89390131,
         18.89390131],
       [-86.4890744 ,  51.70180884,  63.63195703,  88.48643243,
         63.63195703],
       [-85.49489539,  44.74255573,  29.82987048,  79.53882129,
         52.69598785],
       [ 62.63777802,  95.44568555,  77.55046326,  36.7891236 ,
        109.36419177]])
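
Because an EDF stores samples in fixed-size data records, a request for samples [start, stop) has to be mapped onto the records that contain them. The sketch below shows that arithmetic under the assumption of 5000 samples per record, as reported in this recording's header. `records_spanning` is a hypothetical helper written for this demo, not an openseize function.

```python
def records_spanning(start, stop, samples_per_record):
    """Return (first, last) record indices covering samples [start, stop).

    `last` is exclusive, mirroring the exclusive stop of the read method.
    """
    first = start // samples_per_record
    # Ceiling division so a partially covered final record is included.
    last = -(-stop // samples_per_record)
    return first, last


# Samples 0 to 5 live entirely inside the first record.
small = records_spanning(0, 5, samples_per_record=5000)

# A read that starts near the end of record 0 and ends just inside record 2.
wide = records_spanning(4998, 10003, samples_per_record=5000)

print(small, wide)
```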

><s><font size=3>As you can see, this call read out five values from each channel in the EDF data. If we want, we can also pass in a <font color=firebrick>sequence of channel values</font> to limit ourselves to the ones we care about.</s>

<font color='red'>In addition to reading specific samples, the read method supports reading only a selection of channels.</font>

In [15]:
# read samples 0 to 5 for channels 0 and 2
reader.read(0, 5, channels=[0, 2])

array([[-19.87908032,   7.95793213,  19.88808032,  18.89390131,
         18.89390131],
       [-85.49489539,  44.74255573,  29.82987048,  79.53882129,
         52.69598785]])

><s><font size=3>Additionally, there is a <font color=firebrick>padvalue parameter</font>, that can be used if your channels do not all have values present at each record. Those empty spaces will be filled with this padvalue.</s>

<font color='red'>The EDF file specification allows signals sampled at different rates to be stored in the same file. In that case, a signal sampled at a lower rate will have fewer samples than the other signals in the file. In order to return non-ragged numpy arrays, the Reader appends padvalue to the shorter signals so that all signals have the same length. This padvalue defaults to np.nan but may be set to any value useful for your analysis.</font>
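
The padding idea can be sketched in plain Python, with lists standing in for numpy arrays. The `pad_signals` helper and the sample values are hypothetical and only illustrate the technique; they are not how openseize pads internally.

```python
import math


def pad_signals(signals, padvalue=math.nan):
    """Right-pad each signal with padvalue so all rows share one length."""
    longest = max(len(sig) for sig in signals)
    return [list(sig) + [padvalue] * (longest - len(sig)) for sig in signals]


fast = [1.0, 2.0, 3.0, 4.0]   # signal sampled at the higher rate
slow = [10.0, 20.0]           # signal sampled at half that rate

padded = pad_signals([fast, slow])
for row in padded:
    print(row)
```

Every row now has the same length, so the result could be converted to a rectangular 2-D array. Padded entries can be detected later with `math.isnan` (or `np.isnan` on an array).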

<font color='red'>Remove the Header Information section since we have covered the header in enough detail. Bytemaps, count_signals, and filter are internally used and should not need to be called by clients.

### <s>Header Information</s>

><s><font size=3>EDF files begin with a few lines of metadata, called a <b>header</b>. This header will contain important details regarding the EDF recording, including information about the patient as well as the recording technology. Generally, though, we will want to keep these lines separate from the actual EEG records.</s>
>
><s><font size=3>By passing an EDF file into an EDF Reader, <b><i>this information is automatically stored in a Header object</i></b>. There is a general Header abstract base class that the EDF Header implements; we will take a look at this, and how to create Headers of your own, later on. For now, let's look at what the EDF Header provides.</s>

In [16]:
#help(edf.Header)

<s>><font size=3>As we can see, the Header maintains a dictionary of all the information stored in those first few lines that make up the EDF file's header. The Header has a few methods you may find of interest:</s>
>    
><s><font size=3><ul style="list-style-type:square">
    <li><font color=firebrick>bytemap</font>, which outputs the list of possible header listings and their associated data types and sizes.</li>
    <li><font color=firebrick>count_signals</font>, which tracks the number of signals in the recording (including the implicit annotation channel).</li>
    <li><font color=firebrick>filter</font>, which you can use to filter down the information in the header to only those pieces that are relevant to a select number of channels.</li></s>


In [17]:
#header = reader.header
#print("Number of signals: {}".format(header.count_signals()))
#print("Filtered header by channels 0 and 2: \n{}".format(header.filter([0, 2])))

><s><font size=3>One very nice property of the Header object is that it extends '.' dot notation to access the underlying dictionary elements. <b><i>So we can directly reference attributes of the dictionary without needing to use 'get' methods.</i></b></s>

In [18]:
"""print("Patient: {}".format(header.patient))
print("Header Bytes: {}".format(header.header_bytes))
print("Transducers: {}".format(header.transducers))
"""

'print("Patient: {}".format(header.patient))\nprint("Header Bytes: {}".format(header.header_bytes))\nprint("Transducers: {}".format(header.transducers))\n'

### File Resources and Context Management

<font color='red'>We have seen how to create a Reader instance and use its read method to extract data from an EDF file. However, the file is still open and is using resources that you need to release. To do this, call the Reader instance's <font color='firebrick'>close</font> method.</font>

><s><font size=3>The Reader objects we present here are abstractions of Python's base level open and read operations. This being the case, files that we open with Readers will exist in your memory as open links, taking up space. <b>It's very important to close these files when you are done with them.</b> We provide a method on the Reader to do just that.</s>

In [19]:
reader.close()

<font color='red'>To address this potential resource leak, openseize supports opening files using context management. What does this mean?</font>
In Python, you typically open a text file using code that looks like this:<br>
> with open('somefile.text', 'r') as infile:<br>
  >> process file
    
<font color='red'>When opened this way, the file is automatically closed at the end of the "with" context. EDF Readers support the context manager protocol too. Here's how to open an EDF file with it.</font>

In [21]:
# Open Reader as Context Manager and read data from within context
with edf.Reader(filepath) as reader:
    data = reader.read(0)
    print(data[:5])

# Attempt to read from Reader after context has closed
try:
    reader.read(0,)
except ValueError as err:
    print("\nValueError:", err)

[[-1.98790803e+01  7.95793213e+00  1.98880803e+01 ...  4.50000000e-03
   4.50000000e-03  4.50000000e-03]
 [-8.64890744e+01  5.17018088e+01  6.36319570e+01 ...  4.50000000e-03
   4.50000000e-03  4.50000000e-03]
 [-8.54948954e+01  4.47425557e+01  2.98298705e+01 ...  4.50000000e-03
   4.50000000e-03  4.50000000e-03]
 [ 6.26377780e+01  9.54456855e+01  7.75504633e+01 ...  4.50000000e-03
   4.50000000e-03  4.50000000e-03]]

ValueError: seek of closed file


<font color='red'>This method of opening files inside a specific context and performing operations on the data is the preferred way to work with files in Openseize since the resources are automatically recovered at the end of the context.
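
To see what supporting the context manager protocol involves, here is a minimal sketch of a file-backed reader that defines `__enter__` and `__exit__`. `TinyReader` is a hypothetical class for illustration only. It reads raw bytes rather than EDF samples and makes no claim about how openseize's Reader is written.

```python
import tempfile
from pathlib import Path


class TinyReader:
    """A toy reader that owns an open binary file (illustrative only)."""

    def __init__(self, path):
        self.path = Path(path)
        self._fobj = open(self.path, 'rb')

    def read(self, start, stop=None):
        """Return the bytes in [start, stop), or to end of file if stop is None."""
        self._fobj.seek(start)
        count = -1 if stop is None else stop - start
        return self._fobj.read(count)

    def close(self):
        self._fobj.close()

    def __enter__(self):
        # The object bound by "as" in the with statement.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the file, even if the block raised.
        self.close()
        return False  # do not suppress exceptions


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'demo.bin'
    path.write_bytes(b'0123456789')

    with TinyReader(path) as tiny:
        chunk = tiny.read(2, 5)

    # The file was closed when the with block ended.
    try:
        tiny.read(0)
        error = None
    except ValueError as err:
        error = str(err)

print(chunk, '|', error)
```

Because `__exit__` runs even when the block raises, the file is released on both the normal and the error path.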

### <s>EDF Context Managers</s>

><s><font size=3>There are some more complex situations where you may want to keep track of your open readers with a Context Manager. This would watch the number of links to that open reader that exist in memory, and automatically close the open file when that number hits zero. This is usually not necessary and often inefficient for small use cases.</s>
>    
><s><font size=3><b><i>In order to open your Reader as a Context Manager, use the familiar syntax Python uses to open raw files, as seen in the snippet below.</i></b> Note that after the with block has ended, the Reader is considered closed and cannot be read from any more.</s>
    

## Writing EDF Files

><s><font size=3>While you are most likely only going to need to read from existing EDF files in your possession, openseize also allows you to write your own new EDFs using a <font color=firebrick>Writer</font> object. <b>Here, we will show off the current Writer type available in openseize, the EDF Writer. </b></s>
>    
<font color='red'>In addition to an EDF file Reader, Openseize provides an EDF file Writer. One use case for this Writer is splitting an EDF whose channels belong to multiple subjects into multiple EDFs, each containing the channels of a single subject. For example, if your EDF contains 3 subjects with 4 channels each, there are 12 signals in total in the EDF. The Writer can then be used to write 3 files, each containing 4 channels. Let's examine how to use this Writer. We'll again start by asking for help.</font>

In [16]:
help(edf.Writer)

Help on class Writer in module openseize.io.edf:

class Writer(openseize.io.bases.Writer)
 |  Writer(path)
 |  
 |  A writer of European Data Format (EDF) files.
 |  
 |  This writer does not support writing annotations to an EDF file.
 |  
 |  Method resolution order:
 |      Writer
 |      openseize.io.bases.Writer
 |      abc.ABC
 |      openseize.core.mixins.ViewInstance
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, path)
 |      Initialize this Writer. See base class for futher details.
 |  
 |  write(self, header, data, channels, verbose=True)
 |      Write header & data for each channel to file object.
 |      
 |      Args:
 |          header: dict
 |              A mapping of EDF compliant fields and values. For Further
 |              details see Header class of this module.
 |          data: 2-D array or Reader instance
 |              A channels x samples array or Reader instance.
 |          channels: sequence
 |              A sequence of 

><font size=3><s>As we can see, the Writer takes in only a <font color=firebrick>filepath</font> as a constructor input. The only method that is defined is, appropriately, <font color=firebrick>write</font>. As a demonstration, let's write a new EDF file that holds only the first and third channels of the demo EDF we've been working with, along with the appropriate header data. First, <b>let's look at what the write method requires as inputs.</b></s>
    
<font color='red'>To construct a Writer instance, you need to provide the file path where the writer will write the new EDF file. The <font color=firebrick>write</font> method is what you call to write data to that path. Let's examine this method by asking for its documentation.</font>

In [17]:
help(edf.Writer.write)

Help on function write in module openseize.io.edf:

write(self, header, data, channels, verbose=True)
    Write header & data for each channel to file object.
    
    Args:
        header: dict
            A mapping of EDF compliant fields and values. For Further
            details see Header class of this module.
        data: 2-D array or Reader instance
            A channels x samples array or Reader instance.
        channels: sequence
            A sequence of channel indices to write to this Writer's 
            open file instance.
        verbose: bool
            An option to print progress of write. Default (True) prints
            status update as each record is written.



><s><font size=3>Alright, so it looks like we're going to need <font color=firebrick>the Header object</font> corresponding to our data, <font color=firebrick>the data itself</font> as a 2-D array or just a Reader pointing to the data, and <font color=firebrick>a sequence of channel indices</font> to index our newly stored data. There is also an option to have the writing process print updates as it writes to a file.</s>
>
><font size=3><s>For now, <b>let's select a filepath for our new file and pass it into a new Writer object.</b></s>
    
<font color='red'>To write an EDF compliant file, the write method needs an EDF Header instance with all the fields and values required by the EDF file type. The required fields and values can be found by examining the header printed above or by reading the EDF file specification here: https://www.edfplus.info/specs/edf.html </font>
    
<font color='red'>In addition to an EDF compliant Header instance, the write method needs data. This data may be an in-memory 2-D array or a Reader instance from which data will be fetched.</font>
    
<font color='red'>Lastly, the write method takes a sequence of channel indices. These indices are used to filter both the Header instance and the data. For example, if you provide a Header containing metadata for 4 signals and an array containing 4 signals, you can request that only a subset be written, say channel indices [0, 2]. This allows a multichannel EDF to be split into multiple EDFs. Importantly, both the new data and the new Header will contain only the data and metadata for the 2 channels written. Let's demonstrate these ideas with an example.</font>
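
To make the channel filtering concrete, here is a plain-Python sketch of what selecting channels [0, 2] does to a header and to the data. The `filter_header` and `filter_data` helpers are hypothetical and are not openseize's implementation. The header size rule (256 bytes of general header plus 256 bytes per signal) comes from the EDF specification, and the data values are made up for illustration.

```python
def filter_header(header, channels):
    """Return a copy of header keeping only per-signal entries for channels."""
    filtered = {}
    for key, value in header.items():
        if isinstance(value, list):
            # Per-signal fields hold one entry per signal; keep the chosen ones.
            filtered[key] = [value[ch] for ch in channels]
        else:
            filtered[key] = value
    filtered['num_signals'] = len(channels)
    # EDF spec: 256 bytes of general header plus 256 bytes per signal.
    filtered['header_bytes'] = 256 * (len(channels) + 1)
    return filtered


def filter_data(data, channels):
    """Keep only the rows (channels) requested."""
    return [data[ch] for ch in channels]


header = {
    'num_signals': 5,
    'header_bytes': 1536,
    'names': ['EEG EEG_1_SA-B', 'EEG EEG_2_SA-B', 'EEG EEG_3_SA-B',
              'EEG EEG_4_SA-B', 'EDF Annotations'],
    'physical_dim': ['uV', 'uV', 'uV', 'uV', ''],
}
# Made-up samples: one row per signal in the header.
data = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]

new_header = filter_header(header, [0, 2])
new_data = filter_data(data, [0, 2])
print(new_header)
print(new_data)
```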

In [24]:
# Important Demos must work without the client having to change paths within specific cells

# Edit this path to a directory on your local machine if you would like to replicate this process.
#new_filepath = "/home/josh/work/openseize/demos/data/recording_001_edited.edf"

save_path = demos.paths.data_dir.joinpath('subset_001.edf')

# Create an EDF writer pointing to this path
writer = edf.Writer(save_path)
print(writer)

Writer Object
---Attributes & Properties---
{'path': PosixPath('/home/matt/python/nri/openseize/demos/data/subset_001.edf')}

Type help(Writer) for full documentation


><s><font size=3>Now, we note that we are only after the first and third channels of data. One would think you might need to filter the data first, as well as the Header, but the Writer does all that work for us. Just by passing in the values for the channels parameter, <b><i>the Writer will automatically filter the data from the reader as well as the header for those channels.</i></b></s>
>
><s><font size=3>With that in mind, we can simply pass those respective parameters into the Writer write method to write to our new file.</s>

<font color='red'>The writer knows where it will write data to and the write method can now be called to perform the writing. We will write channels 0 and 2 from the 'recording_001.edf' we used earlier. Since this file has a header, we will reuse that header. The write method will select metadata from the header corresponding to channels 0 and 2. The method will also only write data records corresponding to channels 0 and 2. Remember to open the reader as a context manager so the file resources are automatically recovered.</font>

In [27]:
#reader = edf.Reader(filepath) #Re open reader if previously closed
#writer.write(reader.header, reader, channels=[0,2]) # Write Reader data from channels 0 and 2 to new EDF file

#locate the path to the recording
fp = demos.paths.locate('recording_001.edf')

#open the reader as context manager
with edf.Reader(fp) as reader:
    
    #open the writer as a context manager
    with edf.Writer(save_path) as writer:
        
        #write channels 0 and 2 from the header and reader's data
        writer.write(reader.header, reader, channels=[0,2])
        

Writing data: 100.0% complete

<font color='red'>Notice here that we opened both the Reader and Writer as context managers. Just like reader instances, writer instances maintain an open file that uses your machine's resources. By opening both the reader and writer as context managers, these files are closed when the reading and writing is finished.</font>
<br>
<font color='red'>Now let's reopen the 'subset_001.edf' file we just wrote and make sure the header and data looks correct.</font>

><s><font size=3>Did that work like we planned? Let's find out. We'll create a new Reader for the file we just wrote to and take a look at its header and data.</s>

In [36]:
#second_file_reader = edf.Reader(new_filepath)
#print(second_file_reader.header)
with edf.Reader(save_path) as reader:
    
    # Print the reader's header; it should only have metadata for channels 0 and 2
    print('---EDF SUBSET HEADER---')
    print(reader.header)
    
    # Print the first 5 samples to check against the full data
    print('---EDF SUBSET DATA---')
    print(reader.read(0, 5))

---EDF SUBSET HEADER---
{'version': '0',
 'patient': 'PIN-42 M 11-MAR-1952 Animal',
 'recording': 'Startdate 15-AUG-2020 X X X',
 'start_date': '15.08.20',
 'start_time': '09.59.15',
 'header_bytes': 768,
 'reserved_0': 'EDF+C',
 'num_records': 3775,
 'record_duration': 1.0,
 'num_signals': 2,
 'names': ['EEG EEG_1_SA-B', 'EEG EEG_3_SA-B'],
 'transducers': ['8401 HS:15279', '8401 HS:15279'],
 'physical_dim': ['uV', 'uV'],
 'physical_min': [-8144.31, -8144.31],
 'physical_max': [8144.319, 8144.319],
 'digital_min': [-8192.0, -8192.0],
 'digital_max': [8192.0, 8192.0],
 'prefiltering': ['none', 'none'],
 'samples_per_record': [5000, 5000],
 'reserved_1': ['', '']}

{'Accessible Properties': ['annotated', 'annotation', 'channels', 'offsets',
                           'record_map', 'samples', 'slopes']}
---EDF SUBSET DATA---
[[-19.87908032   7.95793213  19.88808032  18.89390131  18.89390131]
 [-85.49489539  44.74255573  29.82987048  79.53882129  52.69598785]]


><s><font size=3>The Header is only showing data for two channels, which is a good sign. Now, let's check the data.</s>
    
<font color='red'>Both the header and the data appear to contain only the metadata and data for channels 0 and 2. Now let's confirm this by comparing all the data against the original 'recording_001.edf' demo file.</font>

In [37]:
#second_file_reader.read(0, 5) remove  me

><s><font size=3>That all looks well and good. But to be absolutely sure, <b>let's extract the first and third channels from our initial Reader, and compare them to our new Reader's data directly.</b></s>

In [35]:
# State what you're doing and don't use trailing ","
#print("Do the records match? ", np.allclose(reader.read(0, channels=[0, 2]), second_file_reader.read(0,)))

#fp is still the filepath to recording_001.edf
with edf.Reader(fp) as reader:
    
    #read all 4 channels from the file
    all_data = reader.read(0)

#save_path is where the subset_001.edf resides
with edf.Reader(save_path) as reader:
    
    #read the 2 channels from the subset file
    two_ch_data = reader.read(0)
    
print("Do the arrays match? -> ", np.allclose(all_data[[0,2], :], two_ch_data))

Do the arrays match? ->  True


## EDF Annotations

><font size=3>EDF Files will contain headers and data records as we have just seen. There is, in fact, a third piece of information that may be stored onto the EDF file alongside the raw data, the annotations. <font color=firebrick>Annotations</font> are extra labels that can be used to denote significant events or time periods in the EEG data. Examples may include recording artifacts to be removed, or periods of the subject being awake/asleep.
>
><font size=3>Annotations can be used to separate data by different needs, (e.g. filtering out all noted times where the subject is asleep.) Because of this, openseize provides a special Annotation object type to read these out of the EDF file and work with them directly. 
>    
><font size=3>Annotations can come in various formats, but for now, we will look only at the <font color=firebrick>Pinnacle</font> template of annotation, for which we have provided a specific Annotation object. <b>Let's take a look at what this Pinnacle Annotation looks like in openseize:</b>
    
<font color='red'>In addition to EDF file readers, Openseize provides annotation file readers. Typically, annotation files are comma-separated or tab-separated value text files that contain timestamps and labels of important events that occurred during an EEG recording session. Here we will show how to open a Pinnacle format annotation TSV text file. Let's start by looking at the documentation for this annotation reader.</font> 

In [23]:
help(annotations.Pinnacle)

Help on class Pinnacle in module openseize.io.annotations:

class Pinnacle(openseize.io.bases.Annotations)
 |  Pinnacle(path, **kwargs)
 |  
 |  A Pinnacle Technologies© annotations file reader.
 |  
 |  Pinnacle files store annotation data to a plain text file. This
 |  reader reads each row of this file extracting and storing annotation
 |  data to a sequence Annotation objects one per annotation (row) in the
 |  file.
 |  
 |  Method resolution order:
 |      Pinnacle
 |      openseize.io.bases.Annotations
 |      abc.ABC
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  channel(self, row)
 |      Extracts the annotation channel for a row in this file.
 |  
 |  duration(self, row)
 |      Measures the duration of an annotation for a row in this file.
 |  
 |  label(self, row)
 |      Extracts the annotation label for a row in this file.
 |  
 |  open(self, path, start=0, delimiter='\t', **kwargs)
 |      Opens a file returning a file handle and row iterator.
 |      
 

><s><font size=3>As the helpfile notes, the only parameter to a new Pinnacle Annotations object is a <font color=firebrick>path</font>. This path should point directly to a Pinnacle annotation file, which is usually a comma- or tab-separated value text document. Like before, <b>we will pull down a sample annotations file from our demos repo and create a Pinnacle object from it.</b></s>
    
<font color='red'>To construct an annotation reader, you will need to provide a path to an annotation file. This path is given to the open method (see above). Additionally, you may need to pass in a start line, the line of the file on which the column data begins. Let's fetch the demo file "annotations_001.txt", downloading it if it is not already on your system, and display the file's contents.</font>

In [44]:
#determine the local path using the locate method and download if necessary
annotations_path = demos.paths.locate('annotations_001.txt')

#let's take a look at the file
with open(annotations_path, 'r') as infile:
    for idx, row in enumerate(infile):
        print(idx, row)

0 Experiment ID	Experiment

1 Animal ID	Animal

2 Researcher	Test

3 Directory path	

4 

5 

6 Number	Start Time	End Time	Time From Start	Channel	Annotation

7 0	08/15/20 09:59:15.215	08/15/20 09:59:15.215	0.0000	ALL	Started Recording

8 1	08/15/20 10:00:00.000	08/15/20 10:00:00.000	44.7850	ALL	Qi_start

9 2	08/15/20 10:00:25.000	08/15/20 10:00:30.000	69.7850	ALL	grooming

10 3	08/15/20 10:00:45.000	08/15/20 10:00:50.000	89.7850	ALL	grooming

11 4	08/15/20 10:02:15.000	08/15/20 10:02:20.000	179.7850	ALL	grooming

12 5	08/15/20 10:04:36.000	08/15/20 10:04:41.000	320.7850	ALL	exploring

13 6	08/15/20 10:05:50.000	08/15/20 10:05:55.000	394.7850	ALL	exploring

14 7	08/15/20 10:08:50.000	08/15/20 10:08:55.000	574.7850	ALL	rest

15 8	08/15/20 10:10:14.000	08/15/20 10:10:19.000	658.7850	ALL	exploring

16 9	08/15/20 10:17:10.000	08/15/20 10:17:15.000	1074.7850	ALL	rest

17 10	08/15/20 10:35:49.000	08/15/20 10:35:54.000	2193.7850	ALL	rest

18 11	08/15/20 10:40:00.000	08/15/20 10:40:00.000	2444

<font color='red'>With this path we can now construct an Annotations reader instance. Just as with Readers and Writers, an instance can (and most of the time should) be constructed as a context manager. Below we construct the annotations reader starting from line 6, since that is the row containing the column headers of the file. Note that this initialization argument is passed to the open method, which can accept any argument that Python's built-in csv.DictReader can accept.</font>
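To see why the start line matters, here is a minimal sketch of how skipping the metadata rows and handing the rest to `csv.DictReader` could work. This is not openseize's implementation: the helper `rows_from` and the shortened sample text are hypothetical, made up only to mimic the layout of the file printed above.

```python
import csv
import io

# Hypothetical, shortened stand-in for a Pinnacle-style file: metadata rows
# come first and the column header row sits at line index 2.
text = (
    "Experiment ID\tExperiment\n"
    "\n"
    "Number\tStart Time\tEnd Time\tTime From Start\tChannel\tAnnotation\n"
    "0\t08/15/20 09:59:15.215\t08/15/20 09:59:15.215\t0.0000\tALL\tStarted Recording\n"
    "1\t08/15/20 10:00:25.000\t08/15/20 10:00:30.000\t69.7850\tALL\tgrooming\n"
)


def rows_from(handle, start=0, delimiter="\t"):
    """Skip `start` lines, then read the remaining lines as dict rows."""
    for _ in range(start):
        next(handle)
    # The first unread line is now the header row, which DictReader uses as keys.
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = rows_from(io.StringIO(text), start=2)
for row in rows:
    print(row["Annotation"], row["Time From Start"])
```

If `start` pointed at a metadata row instead, the dictionary keys would come from that row and lookups such as `row["Annotation"]` would fail.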

><s><font size=3>Now that we have the Pinnacle reader, we can <b>take a look at the list of annotations in the file.</b></s>

In [48]:
#open the annotations and read all the annotations in the file using the 'read' method
with annotations.Pinnacle(annotations_path, start=6) as reader:
    
    #call read to get the annotations as a sequence of Annotation instances (to be described in a moment)
    annotes = reader.read()
    
#print the sequence of annotation instances
for instance in annotes:
    print(instance)

Annotation(label='Started Recording', time=0.0, duration=0.0, channel='ALL')
Annotation(label='Qi_start', time=44.785, duration=0.0, channel='ALL')
Annotation(label='grooming', time=69.785, duration=5.0, channel='ALL')
Annotation(label='grooming', time=89.785, duration=5.0, channel='ALL')
Annotation(label='grooming', time=179.785, duration=5.0, channel='ALL')
Annotation(label='exploring', time=320.785, duration=5.0, channel='ALL')
Annotation(label='exploring', time=394.785, duration=5.0, channel='ALL')
Annotation(label='rest', time=574.785, duration=5.0, channel='ALL')
Annotation(label='exploring', time=658.785, duration=5.0, channel='ALL')
Annotation(label='rest', time=1074.785, duration=5.0, channel='ALL')
Annotation(label='rest', time=2193.785, duration=5.0, channel='ALL')
Annotation(label='Qi_stop', time=2444.785, duration=0.0, channel='ALL')
Annotation(label='Stopped Recording', time=3774.664, duration=0.0, channel='ALL')



><s><font size=3>As we can see, each annotation has been read into a unique <font color=firebrick>Annotation</font> object. We can consider the output of an Annotations Reader to be a list of these Annotation objects. Let's <b>look at a single one of these Annotations in finer detail.</b></s>
<br>
<font color='red'>You can see that we have fetched all of the annotations from the displayed file and stored each annotation to an Annotation instance. What is this instance? An Annotation object is a Python dataclass. If you haven't seen one before, you can think of it as a simple container with '.' dot notation access to the container's contents. Let's examine the fourth dataclass instance (index 3).</font>
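If dataclasses are new to you, the sketch below builds a small one by hand. `DemoAnnotation` is a hypothetical stand-in that only mirrors the four fields seen in the printed output; it is not the class openseize defines.

```python
from dataclasses import dataclass


@dataclass
class DemoAnnotation:
    """Hypothetical stand-in with the same four fields as the printed output."""
    label: str
    time: float      # seconds from the start of the recording
    duration: float  # seconds
    channel: str


item = DemoAnnotation(label="grooming", time=89.785, duration=5.0, channel="ALL")

# The dataclass decorator generates a readable repr for free
print(item)

# Fields are reached with dot notation
end = item.time + item.duration
print("label:", item.label, "| ends (s):", round(end, 3))
```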

In [60]:
#fetch the fourth annotation item (index 3) and display it
item = annotes[3]
print(item)

#access the item's time from recording start
print('This annotation occurred at {} s relative to the start time'.format(item.time))

Annotation(label='grooming', time=89.785, duration=5.0, channel='ALL')
This annotation occurred at 89.785 s relative to the start time


><font size=3>A single Annotation instance gives us the key pieces of information:
>    * <font color=firebrick>label</font> - a piece of text describing the annotation
>    * <font color=firebrick>time</font> - the time (in seconds) from the beginning of the recording at which the annotation starts
>    * <font color=firebrick>duration</font> - the length (in seconds) of the annotation from its start time
>    * <font color=firebrick>channel</font> - the channel(s) in the EEG recording that the annotation applies to ('ALL' for every annotation read from this demo file)
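The duration values are consistent with the difference between the "End Time" and "Start Time" columns of the file. The sketch below reproduces that arithmetic for one row. It assumes the timestamps are formatted as month/day/two-digit year, and the helper `duration_seconds` is hypothetical rather than part of openseize.

```python
from datetime import datetime

# Assumed timestamp layout: month/day/two-digit year, then a time with fractional seconds
FMT = "%m/%d/%y %H:%M:%S.%f"


def duration_seconds(start_text, end_text, fmt=FMT):
    """Return the elapsed seconds between two timestamp strings."""
    start = datetime.strptime(start_text, fmt)
    end = datetime.strptime(end_text, fmt)
    return (end - start).total_seconds()


# Start and end times copied from the first 'grooming' row of the demo file
duration = duration_seconds("08/15/20 10:00:25.000", "08/15/20 10:00:30.000")
print(duration)
```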

><s><font size=3>In the process of performing analysis, you may want to filter the list of annotations you have by what type of annotation it is. We use the label as the descriptor from which we can perform this filter. <b>By passing in a list of labels, you can read only the annotations you care about using the Pinnacle Reader.</b></s>
    
<font color='red'>In the preceding example we read all of the annotations from the Pinnacle-formatted file, but the Annotations 'read' method can also accept a sequence of labels to selectively read only some of the annotations. Let's show how this works on the demo annotation file.</font>

In [63]:
#read only the annotations with labels matching either rest or exploring
with annotations.Pinnacle(annotations_path, start=6) as reader:
    subset_annotes = reader.read(labels=['rest', 'exploring'])
    
for annote in subset_annotes:
    print(annote)

Annotation(label='exploring', time=320.785, duration=5.0, channel='ALL')
Annotation(label='exploring', time=394.785, duration=5.0, channel='ALL')
Annotation(label='rest', time=574.785, duration=5.0, channel='ALL')
Annotation(label='exploring', time=658.785, duration=5.0, channel='ALL')
Annotation(label='rest', time=1074.785, duration=5.0, channel='ALL')
Annotation(label='rest', time=2193.785, duration=5.0, channel='ALL')
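If you have already read every annotation, a similar subset can be picked out afterwards with plain Python. The sketch below uses a hypothetical `Note` namedtuple in place of real Annotation instances, since any object with a `label` attribute is filtered the same way.

```python
from collections import namedtuple

# Hypothetical stand-in for annotation instances
Note = namedtuple("Note", "label time duration channel")

notes = [
    Note("grooming", 69.785, 5.0, "ALL"),
    Note("exploring", 320.785, 5.0, "ALL"),
    Note("rest", 574.785, 5.0, "ALL"),
]

# A set gives fast membership checks for the labels we want to keep
wanted = {"rest", "exploring"}
subset = [note for note in notes if note.label in wanted]

for note in subset:
    print(note)
```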


## Producing from EDF Files with Annotations 

><s><font size=3>Here, we filter the annotations by the labels 'exploring' and 'resting', excluding the remainder. Now aside from solely filtering the annotations themselves, we can use them to also filter EEG data via a process called <i><b>masking</b></i>. Masking refers to applying a filter to the data so that only portions of it are read out, and the rest ignored. The mask itself is just <b>an array of True/False values, corresponding to each sample in the EEG data</b>, determining whether that sample should be kept or removed. Applying a mask to an EDF file can be done with the use of a producer (for more details, see the demo on Producers).</s>
> <font color='red'>Two important components of an Annotation instance are the time and duration attributes. These attributes allow for selective filtering of EEG data returned from either a Reader or a producer. To do this, the annotation dataclass instances are converted into a boolean mask that picks out samples of data to keep or discard. Here we will demonstrate how to construct a boolean mask from a list of annotation instances and use that mask to filter a producer's yielded NumPy arrays. Further details can be found in the producer demo.</font>
>
><font size=3>The annotations module provides a function for generating a mask automatically from a sequence of annotation objects, the <font color=firebrick>as_mask</font> function.

In [28]:
help(annotations.as_mask)

Help on function as_mask in module openseize.io.annotations:

as_mask(annotations, size, fs, include=True)
    Convert a sequence of annotation objects into a 1-D boolean array. 
    
    Args:
        annotations: list
            A sequence of annotation objects to convert to a mask.
        size: int
            The length of the boolean array to return.
        fs: int
            The sampling rate in Hz of the recorded EEG.
        include: bool
            A boolean to determine if annotations should be set to True or
            False in the returned array. Default is True, meaning all values
            are False in the returned array except for samples where the
            annotations are located.
    
    Returns:
        A 1-D boolean array of length size.



><s><font size=3>This method takes in <font color=firebrick>a list of Annotation objects</font>, the <font color=firebrick>size</font> of the mask to create (usually the number of samples in the EDF file you wish to mask), and the <font color=firebrick>sampling rate</font> of the recording. In addition, you can apply an <font color=firebrick>include</font> parameter, to determine if the annotations you pass in are meant to come back as True or False (this depends on if your goal is to filter for or against the annotated time periods).</s>
><font color='red'>To construct a mask, <font color=firebrick>as_mask</font> needs a sequence of annotation dataclass instances, the size of the mask along the sample axis, the sampling rate used to convert the annotation times to samples, and a boolean "include" parameter that determines whether the annotated samples are kept (True) or discarded (False) from the EEG data.</font>
><font size=3>Here, as an example, <b>we create such a mask.</b>
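Before calling the library function, it may help to see the idea in miniature. The pure-Python sketch below marks the samples covered by each annotation, from `time * fs` up to `(time + duration) * fs`. It is an illustration only: `sketch_mask` and `Event` are hypothetical names, the rounding rule is an assumption, and openseize's own `as_mask` returns a NumPy array rather than a list.

```python
from collections import namedtuple

# Hypothetical stand-in holding only the fields the mask needs
Event = namedtuple("Event", "label time duration")


def sketch_mask(events, size, fs, include=True):
    """Return a list of booleans of length `size` marking annotated samples."""
    # Start with every sample set to the opposite of `include`
    mask = [not include] * size
    for event in events:
        # Convert seconds to sample indices (rounding is an assumption here)
        start = round(event.time * fs)
        stop = min(round((event.time + event.duration) * fs), size)
        for idx in range(start, stop):
            mask[idx] = include
    return mask


# Tiny example: 3 seconds of data at 10 Hz with two half-second events
events = [Event("rest", 1.5, 0.5), Event("exploring", 2.0, 0.5)]
mask = sketch_mask(events, size=30, fs=10)
inverted = sketch_mask(events, size=30, fs=10, include=False)
print(sum(mask), sum(inverted))
```

With `include=False` the same samples are marked False instead, so the mask discards the annotated periods rather than keeping them.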

In [None]:
#Read the annotations from the demo file
with annotations.Pinnacle(annotations_path, start=6) as reader:
    subset_annotes = reader.read(labels=['rest', 'exploring'])

In [75]:
#Build the mask; fp is still the filepath to recording_001.edf; size and fs can be fetched from the reader
with edf.Reader(fp) as reader:    
    size = reader.shape[-1]
    #samples_per_record equals the sampling rate only when each EDF data record spans 1 second
    fs = reader.header.samples_per_record[0]
    
mask = annotations.as_mask(subset_annotes, size, fs, include=True)

In [76]:
#print the first 10 values of the mask
print(mask[:10])

#The first True values should occur at 320.785 secs * 5000 Hz since fs=5000 and the first selected
#annotation (see above) occurs at 320.785 secs. Let's confirm this by printing 10 samples around this sample
start = int(320.785 * 5000)
print(mask[start-5: start+5])

#lastly let's print out the total number of samples we will keep
expected = len(subset_annotes) * 5 * 5000 # each annote is 5 secs @ 5 kHz
actual = np.count_nonzero(mask)
print('Expected number of samples to keep is {} \nActual number kept is {}'.format(expected, actual))

[False False False False False False False False False False]
[False False False False False  True  True  True  True  True]
Expected number of samples to keep is 150000 
Actual number kept is 150000
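A boolean mask selects samples position by position: a sample is kept wherever the mask is True. As a rough illustration with plain Python lists (the values below are made up), `itertools.compress` performs the same selection that NumPy boolean indexing performs on arrays.

```python
from itertools import compress

# Made-up samples and a made-up mask of the same length
samples = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
mask = [False, False, True, True, False, False, False, True, False, False]

# Keep only the samples whose mask entry is True
kept = list(compress(samples, mask))
print(kept)

# The number of kept samples always equals the number of True entries
print(len(kept) == sum(mask))
```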


><s><font size=3>Here, we can see that filtering out any records that are not annotated results in 45000 records or 45 seconds of True values in our mask. This mask can then be applied to a producer to filter out these results directly from an EDF file.</s>

In [78]:
# Build a producer with this mask and show that it has the expected shape