from IPython.display import display, HTML, Image, clear_output

<center><span style="color:blue;font-family:helvetica; font-size:3.5rem; font-weight:700;">Reading hypnogram files in Visbrain Sleep</span></center>


# General
Visbrain Sleep provides the option to read hypnogram files in addition to polysomnography data. The information in the [Visbrain documentation](http://visbrain.readthedocs.io/en/latest/sleep.html#supported-files-and-format) is minimal. It includes a warning that there is no international standard for the hypnogram format. Visbrain Sleep supports three extensions for hypnogram files:
- .txt
- .csv
- .hyp (ELAN)  

The Physionet Sleep database contains hypnogram files. For each subject there are two files:
- `"subject identifier".edf.hyp` file
- `"subject identifier"-Hypnogram.edf` file  

The information in both files is essentially the same, but formatted differently.  

Note that `.edf.hyp` is not a format supported by Visbrain.  

At this stage, the best option for reading the Physionet hypnogram files in Visbrain is to convert the `"subject identifier"-Hypnogram.edf` file into a text file that Visbrain can read.



# The Visbrain .txt hypnogram file
The way Visbrain Sleep handles a hypnogram .txt file is rather basic. It actually requires two files to specify the hypnogram. According to the Visbrain Sleep documentation, this is to overcome problems caused by different sampling rates of hypnogram files and/or different values assigned to the sleep stages. The two files are:
- `name.txt`  
this file contains, for each time period, an integer that identifies the sleep stage. Both the identifiers and the time period are defined in the second file.
- `name_description.txt` file  
this file defines the time period for scoring the sleep stages; note that only a single, constant time period can be specified. It also defines the identifiers used for the various sleep stages. This file is not passed as an input: Visbrain reads it automatically after deriving its name from the name of the file containing the sleep stages. The program takes the first part of the filename (`name`, without the extension) and appends `_description.txt`, which results in `name_description.txt`.
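The naming rule can be sketched in a few lines of Python. The helper `description_path` is hypothetical and only illustrates how `name_description.txt` follows from `name.txt` (same directory, extension removed):

```Python
import os


def description_path(hypno_path):
    """Return the path of the description file that accompanies a .txt hypnogram.

    Strips the extension from 'name.txt' and appends '_description.txt',
    so the description file is expected in the same directory.
    """
    root, _ext = os.path.splitext(hypno_path)
    return root + '_description.txt'


print(description_path('data/Hypnogram_excerpt2.txt'))
# data/Hypnogram_excerpt2_description.txt
```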

There is no internal verification that the hypnogram files provide information for the same subject as the file for the channel information. 

The above can be illustrated by the example from Visbrain:  
- the `name.txt` file in this example is called `Hypnogram_excerpt2.txt`. The file has 362 lines:
 - the first line contains `[Hypnogram]`
 - the following lines all contain an integer between 0 and 5 identifying the sleep stage for that time period
- the `name_description.txt` file in this example is called `Hypnogram_excerpt2_description.txt`. The file contains the following lines:
 - line 1: `time 2.5`
 - line 2: `W 5`
 - line 3: `N1 3`
 - line 4: `N2 2`
 - line 5: `N3 1`
 - line 6: `N4 0`
 - line 7: `REM 4`  
 
This example can be run as follows:  
```Python
# example from the Visbrain documentation with an edf file and a hypnogram file
from visbrain import Sleep

dfile = 'data/excerpt2.edf'
hfile = 'data/Hypnogram_excerpt2.txt'

Sleep(data=dfile, hypno=hfile).show()
```

Note again that only the file `'Hypnogram_excerpt2.txt'` is specified in the program as an input to read. Visbrain internally works out that there should be a file `'Hypnogram_excerpt2_description.txt'` and reads this file as well.  

The hypnogram is plotted in the Visbrain window underneath the channel plots and looks like this:  
<img src=figs/fig_3.png>

# Reading a `hypnogram.txt` in Visbrain
See [Visbrain Sleep Documentation](http://visbrain.org/sleep)  

The filename is either passed to `Sleep()` through its `hypno` argument or selected when the popup for hypnogram files appears. Note that there is no verification that the selected hypnogram file is the file that accompanies the selected data file.  

From the filename, a second filename is derived by removing the extension and appending `'_description.txt'`. The path of this file is stored in a variable called `header`. The `description` file is assumed to be in the same directory as the hypnogram file.  

From this `description` file, the following is read:
- labels
- values  

From `labels` and `values`, a dictionary called `desc` is derived as follows:
```Python
labels = np.genfromtxt(header, dtype=str, delimiter=" ", usecols=0)
values = np.genfromtxt(header, dtype=float, delimiter=" ", usecols=1)
desc = {label: row for label, row in zip(labels, values)}
```
    
For the Visbrain example, the labels, values and description dictionary look like:  
- labels:  
`array(['time', 'W', 'N1', 'N2', 'N3', 'N4', 'REM'], dtype='<U4')`
- values:  
`array([ 2.5,  5. ,  3. ,  2. ,  1. ,  0. ,  4. ])`
- desc:  
`{'time': 2.5, 'W': 5.0, 'N1': 3.0, 'N2': 2.0, 'N3': 1.0, 'N4': 0.0, 'REM': 4.0}`  

The next step is to compute the sampling frequency of the hypnogram:  
```Python
sf_hyp = 1 / float(desc['time'])
```
    
In the Visbrain example, `desc['time'] = 2.5` (each scoring period lasts 2.5 seconds), so the sampling frequency is `sf_hyp = 0.4` Hz. 
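As a dependency-free cross-check of these two steps, the description file can also be parsed with plain string splitting instead of `np.genfromtxt`. This is an illustrative sketch, not Visbrain's code (the function `parse_description` is hypothetical), applied to the lines of the Visbrain example:

```Python
def parse_description(lines):
    """Build the label -> value dictionary from the lines of a description file."""
    desc = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:            # skip blank or malformed lines
            label, value = parts
            desc[label] = float(value)
    return desc


example = ['time 2.5', 'W 5', 'N1 3', 'N2 2', 'N3 1', 'N4 0', 'REM 4']
desc = parse_description(example)
sf_hyp = 1 / desc['time']              # sampling frequency of the hypnogram in Hz
print(desc)
print(sf_hyp)                          # 0.4
```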

Next, the actual hypnogram file is read. This is a text file and the statement used is:  
`hyp = np.genfromtxt(path, delimiter='\n', usecols=[0], dtype=None, skip_header=0)`  

When reading the hypnogram file from the Visbrain example, the first 10 entries look like:   
`array([b'[Hypnogram]', b'2', b'2', b'2', b'2', b'2', b'2', b'2', b'2', b'3'], dtype='|S11')`  

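The next step is to check whether the array already has an integer type (`np.integer`). If not, the entries are decoded to strings and only the entries that represent integers are kept, which drops the `[Hypnogram]` header line:  
```Python
if not np.issubdtype(hyp.dtype, np.integer):
    hyp = np.char.decode(hyp)
    hypno = np.array([s for s in hyp if s.lstrip('-').isdigit()],
                     dtype=int)
else:
    hypno = hyp.astype(int)
```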
        
After the decoding, the first 10 entries look like:
`array([2, 2, 2, 2, 2, 2, 2, 2, 3, 3])`  
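The filtering rule can be tried in isolation. The list `raw` below is made-up input that mimics the decoded lines of the example file, plus a `-1` to show that negative values survive the `lstrip('-')` test:

```Python
import numpy as np

# Lines as they appear after decoding the hypnogram file (header line included)
raw = ['[Hypnogram]', '2', '2', '2', '2', '2', '2', '2', '2', '3', '3', '-1']

# Keep only lines that represent (possibly negative) integers
hypno = np.array([s for s in raw if s.lstrip('-').isdigit()], dtype=int)
print(hypno.tolist())  # [2, 2, 2, 2, 2, 2, 2, 2, 3, 3, -1]
```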
        
The values read are then translated into the values required for plotting the hypnogram:  
```Python
import numpy as np


def swap_hyp_values(hypno, desc):
    # start with -1 (unscored / artefact) for every entry
    hypno_s = -1 * np.ones(shape=hypno.shape, dtype=int)

    if 'Art' in desc:
        hypno_s[hypno == desc['Art']] = -1
    if 'Nde' in desc:
        hypno_s[hypno == desc['Nde']] = -1
    if 'Mt' in desc:
        hypno_s[hypno == desc['Mt']] = -1
    if 'W' in desc:
        hypno_s[hypno == desc['W']] = 0
    if 'N1' in desc:
        hypno_s[hypno == desc['N1']] = 1
    if 'N2' in desc:
        hypno_s[hypno == desc['N2']] = 2
    if 'N3' in desc:
        hypno_s[hypno == desc['N3']] = 3
    if 'N4' in desc:
        hypno_s[hypno == desc['N4']] = 3   # N4 is merged into N3
    if 'REM' in desc:
        hypno_s[hypno == desc['REM']] = 4

    return hypno_s


hypno = swap_hyp_values(hypno, desc)
```
    
In the above code, an array `hypno_s` is created with the same shape as `hypno` (the array of values read from the hypnogram file), with all values initialized to -1. The -1 values are then replaced by 0, 1, 2, 3, 3 and 4 for the entries in `hypno` that identify the sleep stages `W`, `N1`, `N2`, `N3`, `N4` and `REM` respectively. For the stages `Art`, `Nde` and `Mt`, as well as for any value that does not appear in the description file, the values stay at -1.  

Note that stages `N3` and `N4` are combined into `N3` (value 3).
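The same mapping can be written more compactly with a lookup table. This is an equivalent sketch, not Visbrain's code; the names `TARGET` and `remap_hypno` are hypothetical. As in `swap_hyp_values`, the labels are applied in the same order and any code not listed in `desc` stays at -1:

```Python
import numpy as np

# Visbrain plotting value for each label (N3 and N4 both map to 3)
TARGET = {'Art': -1, 'Nde': -1, 'Mt': -1, 'W': 0, 'N1': 1, 'N2': 2,
          'N3': 3, 'N4': 3, 'REM': 4}


def remap_hypno(hypno, desc):
    """Translate file-specific stage codes into Visbrain plotting values."""
    hypno = np.asarray(hypno)
    out = np.full(hypno.shape, -1, dtype=int)   # unknown codes stay at -1
    for label, target in TARGET.items():
        if label in desc:
            out[hypno == desc[label]] = target
    return out


desc = {'time': 2.5, 'W': 5.0, 'N1': 3.0, 'N2': 2.0, 'N3': 1.0, 'N4': 0.0, 'REM': 4.0}
print(remap_hypno([5, 3, 2, 1, 0, 4, 7], desc).tolist())
# [0, 1, 2, 3, 3, 4, -1]
```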

Finally, the reading function returns:  
```Python
return vispy_array(hypno), sf_hyp
```

`vispy_array(hypno)` converts the entries to floats.  

The next step is to oversample and then downsample the hypnogram, so that the number of entries in the hypnogram equals the number of samples (columns) in the data after downsampling. 
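Visbrain does this with its own helpers (`oversample_hypno` among them). The idea can be illustrated with a simplified sketch that is not Visbrain's implementation: repeat each hypnogram value `sf_data / sf_hyp` times, then cut or pad the result to the number of data samples. The function name `match_hypno_to_data` and the padding value -1 are assumptions:

```Python
import numpy as np


def match_hypno_to_data(hypno, sf_hyp, sf_data, n_samples):
    """Repeat each hypnogram value so there is one value per data sample.

    Assumes sf_data / sf_hyp is (close to) an integer. The result is truncated,
    or padded with -1 (unscored), so that its length equals n_samples.
    """
    factor = int(round(sf_data / sf_hyp))
    upsampled = np.repeat(np.asarray(hypno), factor)
    if upsampled.size >= n_samples:
        return upsampled[:n_samples]
    pad = np.full(n_samples - upsampled.size, -1, dtype=upsampled.dtype)
    return np.concatenate([upsampled, pad])


# 3 epochs of 2.5 s (sf_hyp = 0.4 Hz) against 8 s of data sampled at 2 Hz
print(match_hypno_to_data([0, 1, 2], 0.4, 2.0, 16).tolist())
```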

# Physionet `XX...XX-Hypnogram.edf`
The Physionet `xx...xx-Hypnogram.edf` files use the EDF+ format. EDF+ is an extension of the EDF format that makes it possible to add annotations to a data file or to store the annotations in a separate file. Below is an outline of the essential aspects of the EDF+ format.
## The header info
The header information of an EDF+ file is basically the same as that of a standard EDF file. For details on the header of an EDF file, see the notebook [Reading edf files in Visbrain Sleep](Reading%20edf%20files%20in%20Visbrain%20Sleep.ipynb) and the [EDF format specifications](https://www.edfplus.info/specs/index.html). 
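The fixed part of the header is a sequence of fixed-width ASCII fields, so it can be parsed by slicing. A minimal sketch, with the field widths taken from the EDF specification; the dictionary keys follow the names used later in `read_hypno_edf`, while `parse_fixed_header` and `field` are hypothetical helpers. The header below is synthetic, not a real Physionet file:

```Python
# Names and widths (in bytes) of the fixed 256-byte part of an EDF/EDF+ header
FIELDS = [('version', 8), ('subject_id', 80), ('recording_id', 80),
          ('startdate', 8), ('starttime', 8), ('header_n_bytes', 8),
          ('reserved', 44), ('n_records', 8), ('record_length', 8),
          ('n_signals', 4)]


def parse_fixed_header(raw):
    """Split the first 256 bytes of an EDF(+) file into its ASCII fields."""
    hdr, pos = {}, 0
    for name, width in FIELDS:
        hdr[name] = raw[pos:pos + width].decode('ascii').strip()
        pos += width
    hdr['header_n_bytes'] = int(hdr['header_n_bytes'])
    hdr['n_records'] = int(hdr['n_records'])
    hdr['record_length'] = float(hdr['record_length'])   # seconds per data record
    hdr['n_signals'] = int(hdr['n_signals'])
    return hdr


def field(text, width):
    """Left-justify text in a space-padded ASCII field (for the synthetic header)."""
    return text.ljust(width).encode('ascii')


raw = (field('0', 8) + field('X X X X', 80) + field('Startdate X X X X', 80) +
       field('01.01.89', 8) + field('00.00.00', 8) + field('512', 8) +
       field('EDF+C', 44) + field('2', 8) + field('30', 8) + field('1', 4))
hdr = parse_fixed_header(raw)
print(hdr['n_records'] * hdr['record_length'])  # 60.0 seconds of recording
```

The widths also explain the `f.seek(68, 1)` in `read_hypno_edf`: start date, start time, header size and the reserved field take 8 + 8 + 8 + 44 = 68 bytes between `recording_id` and `n_records`.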
 

## The data records
The format of edf+ data records is as follows ([see edf+ format specifications](https://www.edfplus.info/specs/edfplus.html#additionalspecs)):
- annotations in an edf+ file are listed in Time-stamped Annotations Lists (TALs) as follows:
 - each TAL starts with a time stamp Onset`21`Duration`20`, in which `21` and `20` are single bytes with values 21 and 20 respectively; Onset and Duration are coded using US-ASCII characters with byte values 43, 45, 46 and 48-57 (the '+', '-', '.' and '0'-'9' characters respectively)
 - Onset must start with a '+' or a '-' character and specifies the number of seconds by which the onset of the annotated event follows ('+') or precedes ('-') the start date/time of the file
 - Duration must not contain any '+' or '-' and specifies the duration of the annotated event in seconds
 - after the time stamp, a list of annotations all sharing the same Onset and Duration may follow
 - each annotation is followed by a single `20` and must not contain any `20`
 - a `0`-byte follows after the last `20` of the TAL. So the TAL ends with a `20` followed by a `0`  
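To make the byte layout concrete, these rules can be turned into a small encoder. This is an illustrative sketch; the function `make_tal` is hypothetical and only handles TALs that include a duration:

```Python
def make_tal(onset, duration, annotations):
    """Encode one Time-stamped Annotations List (TAL) as bytes.

    onset: seconds relative to the file start (the sign is always written).
    duration: seconds.
    annotations: list of annotation strings sharing this onset and duration.
    """
    stamp = f'{onset:+}' + '\x15' + f'{duration}'   # byte 21 separates onset and duration
    body = stamp + '\x14' + ''.join(a + '\x14' for a in annotations)
    return body.encode('utf-8') + b'\x00'            # TAL ends with byte 20 then byte 0


print(make_tal(39480, 60, ['Sleep stage 3']))
# b'+39480\x1560\x14Sleep stage 3\x14\x00'
```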
 
A typical TAL from the Physionet hypnogram files (shown without its terminating `0`-byte) looks like:  
```Text
+39480\x1560\x14Sleep stage 3\x14
```

The information in this TAL is:
- onset: +39480  
this TAL starts 39480 seconds after the start time defined in the header
- onset is followed by `ASCII 21 = \x15`
- duration: `60`  
the sleep stage that follows lasts 60 seconds
- the duration is followed by `ASCII 20 = \x14`
- the annotation `Sleep stage 3`
- the annotation is followed by `ASCII 20 = \x14`
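A TAL such as the one above can be decomposed with two string splits. A minimal sketch (the function `parse_tal` is hypothetical); it converts onset and duration with `float`, which accepts the leading '+' or '-' as well as the decimal values allowed by the EDF+ specification:

```Python
def parse_tal(tal):
    """Split one TAL (a string without its terminating zero byte) into its parts.

    Returns (onset, duration, annotations); duration is None when the
    time stamp has no byte-21 separator.
    """
    stamp, *rest = tal.split('\x14')
    annotations = [a for a in rest if a]        # drop the empty piece after the last \x14
    if '\x15' in stamp:
        onset, duration = stamp.split('\x15')
        return float(onset), float(duration), annotations
    return float(stamp), None, annotations


print(parse_tal('+39480\x1560\x14Sleep stage 3\x14'))
# (39480.0, 60.0, ['Sleep stage 3'])
```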

# Approach to handle Physionet hypnogram files in Visbrain
- Visbrain can only handle hypnogram files with a constant scoring period. This is not a problem, as the Physionet hypnogram files are all scored with a constant period of 30 seconds.
- Visbrain has a few options for reading hypnogram files. One of them is a text file that gives, for each consecutive scoring period, a value indicating the sleep stage for that period. This text file is read together with a description file that contains the scoring period in seconds and the values used for the various sleep stages (see above).
- the approach for reading the Physionet `xx...xx-Hypnogram.edf` files is to develop a new function that reads these files and converts them into an array similar to the one obtained when reading a Visbrain hypnogram file in text format.

# Reading a hypnogram.edf in Visbrain
This section describes the changes in Visbrain required to read a `hypnogram.edf` file that uses the EDF+ format. Files of this type are available in the Physionet Sleep database.  

A new function called `read_hypno_edf` will be added to the program `rw_hypno.py`. A few minor changes are required in the program `read_sleep.py` in order to make this new function work in Visbrain.

The following is assumed: 
- the scoring period for the hypnogram is given by the duration of a data record in seconds. This is available in the header information of the accompanying polysomnograph file
- the labels for the various sleep stages are those used in the Physionet database:
 - Sleep stage ?
 - Movement time
 - Sleep stage W
 - Sleep stage R
 - Sleep stage 1
 - Sleep stage 2
 - Sleep stage 3
 - Sleep stage 4
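Under these assumptions, turning a sequence of annotations into one value per scoring epoch amounts to repeating each stage `duration / 30` times. A minimal sketch with made-up annotations; `STAGE_VALUES` and `expand_to_epochs` are hypothetical names, and the Visbrain values follow the table in the `read_hypno_edf` docstring below:

```Python
# Physionet annotation text -> value used by Visbrain for plotting
STAGE_VALUES = {'Sleep stage ?': -1, 'Movement time': -1, 'Sleep stage W': 0,
                'Sleep stage 1': 1, 'Sleep stage 2': 2, 'Sleep stage 3': 3,
                'Sleep stage 4': 3, 'Sleep stage R': 4}


def expand_to_epochs(annotations, epoch_len=30):
    """Turn (onset, duration, stage) tuples into one value per scoring epoch.

    Assumes the annotations are sorted and contiguous, and that every
    duration is a multiple of epoch_len.
    """
    epochs = []
    for _onset, duration, stage in annotations:
        epochs.extend([STAGE_VALUES[stage]] * int(duration // epoch_len))
    return epochs


tals = [(0, 60, 'Sleep stage W'), (60, 30, 'Sleep stage 1'), (90, 90, 'Sleep stage R')]
print(expand_to_epochs(tals))  # [0, 0, 1, 4, 4, 4]
```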
 

## Changes in read_sleep.py
To read and process the Physionet hypnogram files, which are formatted in EDF+, a new function has been added to the Visbrain `io/rw_hypno.py` program. This function returns a hypnogram array in the same format as the Visbrain .txt option for reading hypnogram files.  

To provide the option of reading EDF+ hypnogram files, a few minor modifications are made in the Visbrain `io/read_sleep.py` program. These modifications are:
- in the section for importing modules, line 18 has been changed from:
```Python
from .rw_hypno import (read_hypno, oversample_hypno)
```
to:
```Python
from .rw_hypno import (read_hypno, oversample_hypno, read_hypno_edf)
```
- Dialog window for hypnogram:  
 - lines 113 - 115 have been modified from:
```Python
    hypno = dialog_load(self, "Open hypnogram", upath,
                        "Elan (*.hyp);;Text file (*.txt);;"
                        "CSV file (*.csv);;All files (*.*)")
```
to:
```Python
    hypno = dialog_load(self, "Open hypnogram", upath,
                        "Elan (*.hyp);;Text file (*.txt);;"
                        "CSV file (*.csv);;EDF+ file (*.edf);;All files (*.*)")
```
- Reading the hypnogram file:
 - to access the function that reads EDF+ hypnogram files, lines 123-124 in `io/read_sleep.py` have been modified. The header info of the data file is required: the new function checks whether the hypnogram file is the appropriate accompanying file to the polysomnograph file by comparing the `subject_id` and the `recording_id`, so it needs the path to the polysomnograph file. The variable `file` contains this path (without the `.edf` extension, which `read_hypno_edf` appends itself) and is passed as an argument to the new function. Lines 123-124 have been modified from:
```Python
        if isinstance(hypno, str):  # (*.hyp / *.txt / *.csv)
            hypno, _ = read_hypno(hypno)
```
to:
```Python
        if isinstance(hypno, str):  # (*.hyp / *.txt / *.csv / *.edf)
            if hypno.endswith('.edf'):
                hypno, _ = read_hypno_edf(hypno, file)  # file: data file path without '.edf'
            else:
                hypno, _ = read_hypno(hypno)
```


## Changes in rw_hypno.py
The function `read_hypno_edf` is added to the program `rw_hypno.py`. This function reads EDF+ hypnogram files, assuming that they are formatted as done by Physionet. It returns a hypnogram array and its sampling frequency, which can be processed by Visbrain.  

The function is:
```Python
import numpy as np


def read_hypno_edf(hypno, file):
    """Read a hypnogram file formatted according to the EDF+ specifications
    (see https://www.edfplus.info/specs/index.html).

    The function was developed specifically to read and plot the hypnograms that
    are part of the Physionet Sleep Database (see https://physionet.org/pn4/sleep-edfx/).
    The Physionet Sleep Database contains 61 polysomnograms (PSGs) with accompanying
    hypnograms. The Physionet polysomnogram files use the EDF format, while the
    hypnograms use EDF+. In an EDF+ hypnogram the data records contain Time-stamped
    Annotation Lists (TALs); each TAL consists of an onset, a duration and a sleep stage.

    Parameters
    ----------
    hypno : str
        Path to the EDF+ hypnogram file.
    file : str
        Path to the accompanying polysomnogram file, without the '.edf' extension.

    The following assumptions have been made:
    - the scoring period of the hypnogram equals the number of seconds per data
      record, as given in the header of the accompanying polysomnogram file
    - all EDF+ hypnogram files use the same sleep-stage labels, which are
      converted to the values used by Visbrain:
          EDF+ label         value used by Visbrain
          Sleep stage ?              -1
          Movement time              -1
          Sleep stage W               0
          Sleep stage 1               1
          Sleep stage 2               2
          Sleep stage 3               3
          Sleep stage 4               3
          Sleep stage R               4
    The Physionet hypnogram files (EDF+) sometimes cover a slightly longer period
    than the polysomnogram files, with the last part scored as "Sleep stage ?".
    This function only reads the part of the hypnogram that is covered by the
    polysomnogram. It also checks whether the hypnogram file belongs to the
    polysomnogram file by comparing the `subject_id` and the `recording_id`.
    """
    with open(file + '.edf', 'rb') as f:                           # open EDF data file
        hdr1 = {}
        assert f.read(8) == b'0       '                            # EDF version field
        hdr1['subject_id'] = f.read(80).decode('utf-8').strip()    # patient info
        hdr1['recording_id'] = f.read(80).decode('utf-8').strip()  # recording date and time
        f.seek(68, 1)                    # skip start date, start time, header size, reserved
        hdr1['n_records'] = int(f.read(8))
        hdr1['record_length'] = float(f.read(8))                   # in seconds
    end_file = hdr1['n_records'] * hdr1['record_length']           # recording length in seconds

    with open(hypno, 'rb') as f:                                    # open EDF+ hypnogram file
        hdr2 = {}
        assert f.read(8) == b'0       '
        hdr2['subject_id'] = f.read(80).decode('utf-8').strip()
        hdr2['recording_id'] = f.read(80).decode('utf-8').strip()

        # compare the patient info and recording date of the two files
        if (hdr1['subject_id'] != hdr2['subject_id'] or
                hdr1['recording_id'] != hdr2['recording_id']):
            raise ValueError('Data file does not match hypnogram file')

        f.seek(16, 1)                                               # skip start date and time
        hdr2['header_n_bytes'] = int(f.read(8))                     # bytes in the header
        f.seek(hdr2['header_n_bytes'])                              # go to the end of the header
        data_hypno = f.read().decode('utf-8')                       # read the TALs

    time = hdr1['record_length']     # seconds per data record, used as the scoring period
    tr = {'Sleep stage ?': -1, 'Movement time': -1, 'Sleep stage W': 0,
          'Sleep stage 1': 1, 'Sleep stage 2': 2, 'Sleep stage 3': 3,
          'Sleep stage 4': 3, 'Sleep stage R': 4}
    hypno_s = []
    for tal in data_hypno.split('\x00'):                            # a 0-byte ends each TAL
        in_start = tal.find('\x15')                                 # onset/duration separator
        if in_start == -1:                                          # no duration: skip
            continue
        in_stop = tal.find('\x14', in_start)                        # end of the duration
        onset = float(tal[:in_start])                               # float() accepts '+'/'-'
        duration = float(tal[in_start + 1:in_stop])

        # clip the last annotation(s) to the end of the polysomnogram
        if onset + duration >= end_file:
            duration = end_file - onset

        sleepstage = tal[in_stop + 1:-1]                            # drop the trailing '\x14'
        nr = int(duration // time)                                  # number of scoring periods
        hypno_s.extend([tr[sleepstage]] * nr)

    hypno_s = np.array(hypno_s)
    sf_hyp = 1 / time
    return hypno_s, sf_hyp
```
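The text route mentioned at the start of this notebook (converting the Physionet hypnogram into a file Visbrain can already read) can also be sketched: the array returned by `read_hypno_edf` is written as a `name.txt` / `name_description.txt` pair. The function `write_visbrain_txt` is hypothetical; because its input already uses Visbrain's values, the description file simply maps each label onto that value, with `Art -1` covering the unscored epochs:

```Python
import os
import tempfile


def write_visbrain_txt(hypno, period, path):
    """Write a hypnogram (Visbrain values -1..4) as a .txt + _description.txt pair.

    hypno: sequence of ints already in Visbrain's convention
           (-1 = unscored/artefact, 0 = W, 1 = N1, 2 = N2, 3 = N3, 4 = REM).
    period: scoring period in seconds (e.g. 30 for Physionet).
    path: output file name ending in .txt.
    """
    root, _ext = os.path.splitext(path)
    with open(path, 'w') as f:
        f.write('[Hypnogram]\n')                       # header line, skipped when read
        f.writelines(f'{int(v)}\n' for v in hypno)
    with open(root + '_description.txt', 'w') as f:
        # label/value pairs in the format of the Visbrain example
        f.write(f'time {period}\nW 0\nN1 1\nN2 2\nN3 3\nREM 4\nArt -1\n')


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, 'example.txt')
    write_visbrain_txt([0, 0, 1, 2, 3, 4, -1], 30, out)
    with open(out) as f:
        written = f.read().split()
    with open(os.path.join(tmp, 'example_description.txt')) as f:
        description = f.read().splitlines()
print(written)         # ['[Hypnogram]', '0', '0', '1', '2', '3', '4', '-1']
print(description[0])  # time 30
```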
            
    