In [1]:
from IPython.display import display, HTML, Image, clear_output
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

<center><span style="color:blue;font-family:helvetica; font-size:3.5rem; font-weight:700;">Reading edf files in Visbrain with MNE</span></center>


# Objective
The objective of this notebook is to outline how edf files are read in Visbrain using the MNE software. The main reason to use MNE is that it can handle data whose channels have different sampling frequencies. 

# Issues with MNE
So far I have found one issue that affects the interface between MNE and Visbrain: when channel values are stored in $\mu V$, MNE converts them to $V$, while Visbrain uses the values as stored. This issue is addressed in the Visbrain software.  
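
A minimal sketch of the kind of rescaling this implies, assuming the values coming back from MNE are in $V$ and the values wanted for display are in $\mu V$. The helper name `volts_to_microvolts` and the sample values are hypothetical and are not taken from Visbrain or MNE.

```python
import math


def volts_to_microvolts(samples_in_volts):
    """Rescale samples from V to uV (hypothetical helper)."""
    return [value * 1e6 for value in samples_in_volts]


# Made-up EEG samples, expressed in V as MNE would return them.
samples_v = [1.5e-05, -2.0e-06, 0.0]
samples_uv = volts_to_microvolts(samples_v)

# Floating-point scaling is not exact, so compare with a tolerance.
print([math.isclose(a, b, abs_tol=1e-9)
       for a, b in zip(samples_uv, [15.0, -2.0, 0.0])])
```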

Also note that MNE cannot read hypnogram files.


# Accessing MNE from Visbrain
Using the MNE software to read the data file is triggered by setting the parameter `use_mne=True` in the command `Sleep(data, use_mne=True).show()`.  

# Test program
To understand how Visbrain reads edf files, a program is debugged which reads an edf file and then plots the results. This program is:  
```Python
# Read edf file from library
from visbrain import Sleep

Sleep("/users/kees/sleepdynamics/sleep-edfx/SC4001E0-PSG.edf", use_mne=True).show()```  

The `SC4001E0-PSG.edf` file is the first polysomnography file in the Physionet sleep database. The plots generated by the Visbrain software can be visually compared with the plots from [PhysioBank ATM](https://physionet.org/cgi-bin/atm/ATM).  

The program is initially run by pressing "Cancel" when the popup for a hypnogram file is shown.  


# `Sleep(data=dfile, use_mne=True).show()`
`Sleep(data=dfile, use_mne=True).show()` is the second statement in the test program.  

This notebook deals with how the data is read using the MNE software.

## Outline of program `sleep.py`
The class `Sleep` called in the test program is defined in `sleep.py`. The parts of this class relevant to reading the data are:
```Python
"""Top Level Sleep class"""
# import modules
...
class Sleep(PyQtModule, ReadSleepData, UiInit, Visuals, UiElements,
            MouseEventControl):
    def __init__(self, data=None, hypno=None, config_file=None,
                 annotations=None, channels=None, sf=None, downsample=100.,
                 axis=True, href=['art', 'wake', 'rem', 'n1', 'n2', 'n3'],
                 preload=True, use_mne=False, kwargs_mne={}, verbose=None):
...
      ReadSleepData.__init__(self, data, channels, sf, hypno, href, preload,
                               use_mne, downsample, kwargs_mne,
                               annotations)```

### `ReadSleepData`
`ReadSleepData` is a class defined in `read_sleep.py`
```Python
class ReadSleepData(object):
    """Main class for reading sleep data."""

    def __init__(self, data, channels, sf, hypno, href, preload, use_mne,
                 downsample, kwargs_mne, annotations):
        # dialog window if data is none
        ...
        if use_mne:
            ...
            args = mne_switch(file, ext, downsample, **kwargs_mne)
        else:
            ...
            args = sleep_switch(file, ext, downsample)
        ```

In this situation, the `mne_switch` function is called.

## `mne_switch`

`mne_switch` is a function defined in the `mneio.py` program in Visbrain. `mneio.py` is part of the `io` directory of Visbrain. The relevant statements in `mne_switch` are:
```Python
def mne_switch(file, ext, downsample, preload=True, **kwargs):
    """Read sleep datasets using mne.io."""
    from mne import io

    # Get full path :
    path = file + ext

    # Preload :
    if preload is False:
        preload = 'temp.dat'
    kwargs['preload'] = preload

    if ext.lower() in ['.edf', '.bdf', '.gdf']:  # EDF / BDF / GDF
        raw = io.read_raw_edf(path, **kwargs)```

The parameters when calling `io.read_raw_edf` in MNE are:
- `path = '/users/kees/sleepdynamics/sleep-edfx/SC4001E0-PSG.edf'`
- `kwargs = {'preload': True}`  
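
The argument handling above can be sketched as follows. This is only an illustration: the helper `build_read_args` is hypothetical, and splitting the path with `os.path.splitext` is an assumption about how `file` and `ext` are obtained before `mne_switch` is called.

```python
import os


def build_read_args(full_path, preload=True, **kwargs):
    """Assemble path and keyword arguments in the style of mne_switch (sketch)."""
    # Assumption: file and ext come from a plain extension split.
    file, ext = os.path.splitext(full_path)
    path = file + ext
    # As in the source: preload=False is replaced by a memory-map file name.
    if preload is False:
        preload = 'temp.dat'
    kwargs['preload'] = preload
    is_edf_like = ext.lower() in ['.edf', '.bdf', '.gdf']
    return path, kwargs, is_edf_like


path, kwargs, is_edf_like = build_read_args(
    '/users/kees/sleepdynamics/sleep-edfx/SC4001E0-PSG.edf')
print(path, kwargs, is_edf_like)
```

With the default `preload=True`, this reproduces the `path` and `kwargs` values listed above.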


`raw` is the object returned by MNE. Its important attributes are:
- `raw.info`: the instance of `mne.io.meas_info.Info` which contains the header data
- `raw._data`: the raw data
- `raw._raw_extras`: additional header data

# Architecture for reading edf files in MNE
With the statement:
```Python
    from mne import io```
    
a number of modules are imported. The important line here is:
```Python
from .edf import read_raw_edf, find_edf_events```

This imports the `edf.py` program from the directory `mne/io/edf`. The `edf.py` program contains one class and a number of functions:
- classes:
 - `RawEDF`
- relevant functions:
 - `read_raw_edf` 


# Steps in MNE
## `read_raw_edf`
The function `read_raw_edf` in `edf.py` does the following:
```Python
def read_raw_edf(input_fname, montage=None, eog=None, misc=None,
                 stim_channel='auto', annot=None, annotmap=None, exclude=(),
                 preload=False, verbose=None):
    """Reader function for EDF+, BDF, GDF conversion to FIF"""
    return RawEDF(input_fname=input_fname, montage=montage, eog=eog, misc=misc,
                  stim_channel=stim_channel, annot=annot, annotmap=annotmap,
                  exclude=exclude, preload=preload, verbose=verbose)```
                  

The parameters are:
- `input_fname = '/users/kees/sleepdynamics/sleep-edfx/SC4001E0-PSG.edf'`

This step returns an instance of `RawEDF`, whose most important attributes are:
- `info`: header information
- `_data`: the raw data
- `_raw_extras`

## `RawEDF`
`RawEDF` is a class in `edf.py` and has as parent `BaseRaw` from `base.py`.  

The relevant statements are:
```Python
class RawEDF(BaseRaw):
    """Raw object from EDF, EDF+, BDF file"""
    def __init__(self, input_fname, montage, eog=None, misc=None,
                 stim_channel=True, annot=None, annotmap=None, exclude=(),
                 preload=False, verbose=None):
        
        input_fname = os.path.abspath(input_fname)
        info, edf_info = _get_info(input_fname, stim_channel, annot,
                                   annotmap, eog, misc, exclude, preload)
        
        last_samps = [edf_info['nsamples'] - 1]
        super(RawEDF, self).__init__(
            info, preload, filenames=[input_fname], raw_extras=[edf_info],
            last_samps=last_samps, orig_format='int', verbose=verbose)```
            
This creates an instance of `RawEDF` with many attributes. The most important are:
- `self.info`
- `self._data`
- `self._raw_extras`  

The two important statements here are:
- ```Python
info, edf_info = _get_info(input_fname, stim_channel, annot,
                                   annotmap, eog, misc, exclude, preload)```
- ```Python
super(RawEDF, self).__init__(
            info, preload, filenames=[input_fname], raw_extras=[edf_info],
            last_samps=last_samps, orig_format='int', verbose=verbose)```
                                   



## `_get_info`
`_get_info` is a function in `edf.py` used to get the header information.  

The relevant first part of this function is:
```Python
def _get_info(fname, stim_channel, annot, annotmap, eog, misc, exclude,
              preload):
    """Extract all the information from the EDF+, BDF or GDF file."""
    if eog is None:
        eog = []
    if misc is None:
        misc = []

    # Read header from file
    ext = os.path.splitext(fname)[1][1:].lower()
    logger.info('%s file detected' % ext.upper())
    if ext in ('bdf', 'edf'):
        edf_info = _read_edf_header(fname, annot, annotmap, exclude)```
        
 `_read_edf_header` is another function in `edf.py`.  
 
 The header info is read according to the edf format:  
 `edf_info` is declared as a dictionary and updated with `annot, annotmap, events=[]`.  

The header data is read:
- `patient` is declared as a dictionary and the following is entered:
 - `patient['id']` here: `X`
 - `patient['name']` here: `F`
- `meas_id` is declared as a dictionary and the following is entered:
 - `meas_id['recording_id']` here `Startdate 24-APR-1989 X X X`
- some date and time manipulations are basically the same as in Visbrain;  
the only exception is `century = 2000 if year < 50 else 1900`, which leads to a better computation of the century
- `date` here `datetime.datetime(1989, 4, 24, 16, 13)`
- `header_nbytes` here `2048`
- `n_records` here `2650`
- `record_length=np.array([float(fid.read(8)), 1.])` here `array([30.,  1.])`
- `nchan` here `7`
- `channels = list(range(nchan))` here `[0, 1, 2, 3, 4, 5, 6]` used in next statement
- `ch_names = [fid.read(16).strip().decode() for ch in channels]` here  
`['EEG Fpz-Cz', 'EEG Pz-Oz', 'EOG horizontal', 'Resp oro-nasal', 'EMG submental', 'Temp rectal', 'Event marker']`
- `exclude` lists the channels to leave out; here it is empty, so all channels are kept
- the transducer fields are read but not stored
- `units` are read for all channels and entered into `edf_info['units']`:  
```Python
        units = [fid.read(8).strip().decode() for ch in channels]
        edf_info['units'] = list()
        include = list()
        for i, unit in enumerate(units):
            if i in exclude:
                continue
            if unit == 'uV':
                edf_info['units'].append(1e-6)
            else:
                edf_info['units'].append(1)
            include.append(i)```  
here `[1e-06, 1e-06, 1e-06, 1, 1e-06, 1, 1]`
- for all channels:
 - `physical_min` here `array([ -192.,  -197., -1009., -2048.,    -5.,    34., -2047.])`
 - `physical_max` here `array([ 192.,  196., 1009., 2047.,    5.,   40., 2048.])`
 - `digital_min` here `array([-2048., -2048., -2048., -2048., -2500., -2849., -2047.])`
 - `digital_max` here `array([2047., 2047., 2047., 2047., 2500., 2731., 2048.])`
 - `prefiltering` with exception of last channel (event marker) here  
 `['HP:0.5Hz LP:100Hz [enhanced cassette BW]', 'HP:0.5Hz LP:100Hz [enhanced cassette BW]', 'HP:0.5Hz LP:100Hz [enhanced cassette BW]', 'HP:0.03Hz LP:0.9Hz', 'HP:16Hz Rectification LP:0.7Hz', '']`
 - `highpass`  
 ```Python
         highpass = np.ravel([re.findall(r'HP:\s+(\w+)', filt)
                             for filt in prefiltering])```
  here: `array([], dtype=float64)`
 - `lowpass` is computed in the same way, but searching for `LP` instead of `HP`
- populate `edf_info` with `edf_info.update` for:
 - `ch_names=ch_names`
 - `data_offset=header_nbytes`
 - `digital_max=digital_max`
 - `digital_min=digital_min`
 - `exclude=exclude`
 - `highpass=highpass`
 - `include=include`
 - `lowpass=lowpass`
 - `meas_date=calendar.timegm(date.utctimetuple())`
 - `n_records=n_records`
 - `n_samps=n_samps` 
 - `nchan=nchan`
 - `subject_info=patient` 
 - `physical_max=physical_max`
 - `physical_min=physical_min`
 - `record_length=record_length`
 - `subtype=subtype`  
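
The empty `highpass` array above follows from the regular expression itself: `\s+` requires at least one whitespace character after `HP:`, and the prefiltering strings in this file have none. The sketch below shows this with one of the strings from the header. The second string and the relaxed pattern are illustrations only and are not part of MNE.

```python
import re

prefilter = 'HP:0.5Hz LP:100Hz [enhanced cassette BW]'

# Pattern from the source: needs whitespace between "HP:" and the value.
strict = re.findall(r'HP:\s+(\w+)', prefilter)

# The same pattern on a made-up string that does contain a space.
strict_with_space = re.findall(r'HP:\s+(\w+)', 'HP: 10Hz')

# Illustrative alternative (an assumption, not MNE's code):
# optional whitespace and a numeric capture group.
relaxed = re.findall(r'HP:\s*([\d.]+)', prefilter)

print(strict, strict_with_space, relaxed)
```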
 
For the EEG file `SC4001E0-PSG.edf`, the values of `edf_info` are:
```Text
annot: None
annotmap: None
events: []
units: [1e-06, 1e-06, 1e-06, 1, 1e-06, 1, 1]
ch_names: ['EEG Fpz-Cz', 'EEG Pz-Oz', 'EOG horizontal', 'Resp oro-nasal', 'EMG submental', 'Temp rectal', 'Event marker']
data_offset: 2048
digital_max: [2047. 2047. 2047. 2047. 2500. 2731. 2048.]
digital_min: [-2048. -2048. -2048. -2048. -2500. -2849. -2047.]
exclude: []
highpass: []
include: [0, 1, 2, 3, 4, 5, 6]
lowpass: []
meas_date: 609437580
n_records: 2650
n_samps: [3000 3000 3000   30   30   30   30]
nchan: 7
subject_info: {'id': 'X', 'name': 'F'}
physical_max: [ 192.  196. 1009. 2047.    5.   40. 2048.]
physical_min: [ -192.  -197. -1009. -2048.    -5.    34. -2047.]
record_length: [30.  1.]
subtype: edf```
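
The `meas_date` value above can be reproduced from the start date and time with the century rule mentioned earlier. This is a sketch: the helper `parse_edf_start` is hypothetical, and the raw strings `'24.04.89'` and `'16.13.00'` are assumptions based on the EDF `dd.mm.yy` / `hh.mm.ss` header layout and the `date` value shown above.

```python
import calendar
import datetime


def parse_edf_start(date_str, time_str):
    """Parse EDF-style 'dd.mm.yy' and 'hh.mm.ss' strings (hypothetical helper)."""
    day, month, year = (int(part) for part in date_str.split('.'))
    hour, minute, second = (int(part) for part in time_str.split('.'))
    # Century rule quoted from the MNE code above.
    century = 2000 if year < 50 else 1900
    return datetime.datetime(year + century, month, day, hour, minute, second)


date = parse_edf_start('24.04.89', '16.13.00')
# Same conversion as used for edf_info['meas_date'].
meas_date = calendar.timegm(date.utctimetuple())
print(date, meas_date)
```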

All information read into `edf_info` is directly taken from the header information of the edf file, with the exception of `units`. `units` in the header info is for the example:  
`units: ['uV', 'uV', 'uV', '', 'uV', 'DegC', '']`  

While in `edf_info` the `units` are:  
`units: [1e-06, 1e-06, 1e-06, 1, 1e-06, 1, 1]`
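
The mapping from unit strings to scale factors can be sketched in isolation as below. It follows the loop shown earlier but leaves out the `exclude` handling. The function name `unit_scale` is hypothetical.

```python
def unit_scale(unit):
    """Return the factor that converts a header unit to base units (sketch)."""
    # Only 'uV' is rescaled; every other unit string keeps a factor of 1.
    return 1e-6 if unit == 'uV' else 1


header_units = ['uV', 'uV', 'uV', '', 'uV', 'DegC', '']
scales = [unit_scale(unit) for unit in header_units]
print(scales)
```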


After the data for `edf_info` is read, the data is massaged and `info` is created. `info` is basically the header data that is handed back to Visbrain. At this stage, I think the following points are relevant:
- for each channel, as numpy arrays:
 - `physical_ranges = edf_info['physical_max'] - edf_info['physical_min']`   
 here `array([ 384.,  393., 2018., 4095.,   10.,    6., 4095.])`
 - `cals = edf_info['digital_max'] - edf_info['digital_min']`  
 here `array([4095., 4095., 4095., 4095., 5000., 5580., 4095.])`
- a list of dictionaries of eeg channels is created:
```Python
    chs = list()
    pick_mask = np.ones(len(ch_names))
    for idx, ch_info in enumerate(zip(ch_names, physical_ranges, cals)):
        ch_name, physical_range, cal = ch_info
        chan_info = {}
        chan_info['cal'] = cal
        chan_info['logno'] = idx + 1
        chan_info['scanno'] = idx + 1
        chan_info['range'] = physical_range
        chan_info['unit_mul'] = 0.
        chan_info['ch_name'] = ch_name
        chan_info['unit'] = FIFF.FIFF_UNIT_V
        chan_info['coord_frame'] = FIFF.FIFFV_COORD_HEAD
        chan_info['coil_type'] = FIFF.FIFFV_COIL_EEG
        chan_info['kind'] = FIFF.FIFFV_EEG_CH
        chan_info['loc'] = np.zeros(12)
        if ch_name in eog or idx in eog or idx - nchan in eog:
            chan_info['coil_type'] = FIFF.FIFFV_COIL_NONE
            chan_info['kind'] = FIFF.FIFFV_EOG_CH
            pick_mask[idx] = False
        if ch_name in misc or idx in misc or idx - nchan in misc:
            chan_info['coil_type'] = FIFF.FIFFV_COIL_NONE
            chan_info['kind'] = FIFF.FIFFV_MISC_CH
            pick_mask[idx] = False
        check1 = stim_channel == ch_name
        check2 = stim_channel == idx
        check3 = nchan > 1
        stim_check = np.logical_and(np.logical_or(check1, check2), check3)
        if stim_check:
            chan_info['coil_type'] = FIFF.FIFFV_COIL_NONE
            chan_info['unit'] = FIFF.FIFF_UNIT_NONE
            chan_info['kind'] = FIFF.FIFFV_STIM_CH
            pick_mask[idx] = False
            chan_info['ch_name'] = 'STI 014'
            ch_names[idx] = chan_info['ch_name']
            edf_info['units'][idx] = 1
            if isinstance(stim_channel, str):
                stim_channel = idx
        if tal_channel is not None and idx in tal_channel:
            chan_info['range'] = 1
            chan_info['cal'] = 1
            chan_info['coil_type'] = FIFF.FIFFV_COIL_NONE
            chan_info['unit'] = FIFF.FIFF_UNIT_NONE
            chan_info['kind'] = FIFF.FIFFV_MISC_CH
            pick_mask[idx] = False
        chs.append(chan_info)```
- the first dictionary in the list `chs` has the following values:
```Text
cal: 4095.0
logno: 1
scanno: 1
range: 384.0
unit_mul: 0.0
ch_name: EEG Fpz-Cz
unit: 107
coord_frame: 4
coil_type: 1
kind: 2
loc: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]```
- the other entries in the list `chs`  differ in:
 - `cal` which is the `cals` computed above
 - `logno` and `scanno` which is a sequential numbering
 - `range` is the `physical_range` computed above
 - `ch_name`
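
The `range` and `cal` entries of each channel dictionary can be checked with plain arithmetic on the header values listed above. The sketch uses Python lists instead of numpy arrays to stay self-contained.

```python
physical_min = [-192., -197., -1009., -2048., -5., 34., -2047.]
physical_max = [192., 196., 1009., 2047., 5., 40., 2048.]
digital_min = [-2048., -2048., -2048., -2048., -2500., -2849., -2047.]
digital_max = [2047., 2047., 2047., 2047., 2500., 2731., 2048.]

# 'range' per channel: physical max minus physical min.
physical_ranges = [hi - lo for lo, hi in zip(physical_min, physical_max)]
# 'cal' per channel: digital max minus digital min.
cals = [hi - lo for lo, hi in zip(digital_min, digital_max)]

print(physical_ranges)
print(cals)
```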
 
    


The next steps are:
- update of `edf_info` for `max_samp`:
```Python
    if any(pick_mask):
        picks = [item for item, mask in zip(range(nchan), pick_mask) if mask]
        edf_info['max_samp'] = max_samp = n_samps[picks].max()```  
all channels are in `pick_mask`. Here `edf_info['max_samp']=3000`
- `sfreq` is computed as  
```Python
    data_samps = n_samps
    sfreq = data_samps.max() * edf_info['record_length'][1] / edf_info['record_length'][0]```
here `edf_info['record_length'] = [30., 1.]` and `data_samps.max() = 3000`, so `sfreq = 3000 * 1 / 30 = 100`
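
The same arithmetic also shows why this file has channels with different sampling frequencies. The sketch below computes `sfreq` as above and, as an added illustration, the native rate of each channel from its number of samples per 30-second record. The name `native_rates` is not from MNE.

```python
n_samps = [3000, 3000, 3000, 30, 30, 30, 30]
record_length = [30., 1.]

# Sampling frequency reported by MNE: driven by the densest channel.
sfreq = max(n_samps) * record_length[1] / record_length[0]

# Native rate per channel: samples per record divided by record duration.
native_rates = [n / record_length[0] for n in n_samps]

print(sfreq)
print(native_rates)
```

The first three channels are stored at 100 Hz and the remaining four at 1 Hz, while `info['sfreq']` carries the single value 100.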


The next thing is that `info` is created as a dictionary through:
```Python
    info = _empty_info(sfreq)```  
    
- `_empty_info`, a function in `meas_info.py`, defines a number of `_none_keys` and a number of `_list_keys` as tuples
- `info` is created as an instance of `Info()`, so it has all the methods of `Info`. `Info` is a class with parent `dict` and is also defined in `meas_info.py`. So `info` is basically a dictionary
- the `_none_keys` and `_list_keys` are then set up in `info` and some values are set:  
```Python
def _empty_info(sfreq):
    """Create an empty info dictionary."""
    from ..transforms import Transform
    _none_keys = (
        'acq_pars', 'acq_stim', 'buffer_size_sec', 'ctf_head_t', 'description',
        'dev_ctf_t', 'dig', 'experimenter',
        'file_id', 'highpass', 'hpi_subsystem', 'kit_system_id',
        'line_freq', 'lowpass', 'meas_date', 'meas_id', 'proj_id', 'proj_name',
        'subject_info', 'xplotter_layout',
    )
    _list_keys = ('bads', 'chs', 'comps', 'events', 'hpi_meas', 'hpi_results',
                  'projs', 'proc_history')
    info = Info()
    for k in _none_keys:
        info[k] = None
    for k in _list_keys:
        info[k] = list()
    info['custom_ref_applied'] = False
    info['dev_head_t'] = Transform('meg', 'head')
    info['highpass'] = 0.
    info['sfreq'] = float(sfreq)
    info['lowpass'] = info['sfreq'] / 2.
    info._update_redundant()
    info._check_consistency()
    return info```
- here:
 - `info['dev_head_t']`: a $4\times 4$ array
 - `info['highpass'] = 0`
 - `info['sfreq'] = float(sfreq) = 100.0`; passed in when calling `_empty_info(sfreq)`
 - `info['lowpass'] = info['sfreq'] / 2 = 50.`
- after `info` is returned with keys and some values, other values are updated:
```Text
info
<Info | 17 non-empty fields
    bads : list | 0 items
    buffer_size_sec : float | 1.0
    ch_names : list | EEG Fpz-Cz, EEG Pz-Oz, EOG horizontal, Resp oro-nasal, EMG submental, Temp rectal, Event marker
    chs : list | 7 items (EEG: 7)  
      this list contains for each channel a dictionary with info for that channel. 
      The most important ones are:
      - cal first channel: 4095
      - range first channel: 384
    comps : list | 0 items
    custom_ref_applied : bool | False
    dev_head_t : Transform | 3 items
    events : list | 0 items
    highpass : float | 0.0 Hz
    hpi_meas : list | 0 items
    hpi_results : list | 0 items
    lowpass : float | 50.0 Hz
    meas_date : int | 609437580
    nchan : int | 7
    proc_history : list | 0 items
    projs : list | 0 items
    sfreq : float | 100.0 Hz
    acq_pars : NoneType
    acq_stim : NoneType
    ctf_head_t : NoneType
    description : NoneType
    dev_ctf_t : NoneType
    dig : NoneType
    experimenter : NoneType
    file_id : NoneType
    hpi_subsystem : NoneType
    kit_system_id : NoneType
    line_freq : NoneType
    meas_id : NoneType
    proj_id : NoneType
    proj_name : NoneType
    subject_info : NoneType
    xplotter_layout : NoneType```


#### Reading the data
The next steps are (line 173 in `__init__` in `RawEDF`):
```Python
        last_samps = [edf_info['nsamples'] - 1]   # here: 7949999
        super(RawEDF, self).__init__(
            info, preload, filenames=[input_fname], raw_extras=[edf_info],
            last_samps=last_samps, orig_format='int', verbose=verbose)```
This sits in `BaseRaw`:
```Python
        self._last_samps = np.array(last_samps)  
        self._first_samps = np.array(first_samps)
        info._check_consistency()  # make sure subclass did a good job
        self.info = info
        cals = np.empty(info['nchan'])
        for k in range(info['nchan']):
            cals[k] = info['chs'][k]['range'] * info['chs'][k]['cal']  # how cals is computed see below
        bad = np.where(cals == 0)[0]
        if len(bad) > 0:
            raise ValueError('Bad cals for channels %s'
                             % dict((ii, self.ch_names[ii]) for ii in bad))
        self.verbose = verbose
        self._cals = cals
        self._raw_extras = list(raw_extras)
        # deal with compensation (only relevant for CTF data, either CTF
        # reader or MNE-C converted CTF->FIF files)
        self._read_comp_grade = self.compensation_grade  # read property
        if self._read_comp_grade is not None:
            logger.info('Current compensation grade : %d'
                        % self._read_comp_grade)
        self._comp = None
        self._filenames = list(filenames)
        self.orig_format = orig_format
        self._projectors = list()
        self._projector = None
        self._dtype_ = dtype
        self.annotations = None
        # If we have True or a string, actually do the preloading
        self._update_times()
        if load_from_disk:
            self._preload_data(preload)```  
            
Here the part from above where `cals` is computed:
```Python
        cals = np.empty(info['nchan'])
        for k in range(info['nchan']):
            cals[k] = info['chs'][k]['range'] * info['chs'][k]['cal']```
here: 
- `info['nchan']=7`
- `info['chs'][k]['range']` is the physical range = physical max - physical min; 1st channel: 384
- `info['chs'][k]['cal']` is cal = digital max - digital min; 1st channel: 4095
- `cals = range*cal` first channel: 1572480
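
As a check, the product of `range` and `cal` for all seven channels reproduces the `self._cals` values reported during debugging. The sketch uses plain lists rather than numpy arrays.

```python
physical_ranges = [384., 393., 2018., 4095., 10., 6., 4095.]
digital_ranges = [4095., 4095., 4095., 4095., 5000., 5580., 4095.]

# Same product as in BaseRaw: range * cal, channel by channel.
cals = [rng * cal for rng, cal in zip(physical_ranges, digital_ranges)]
print(cals)
```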

now `self._preload_data(preload)`

- what I think is relevant here is:
 - `self._last_samps`: 7949999
 - `self._first_samps`: 0
 - `self.info`
 - `self._cals`: `array([ 1572480.,  1609335.,  8263710., 16769025.,    50000.,    33480., 16769025.])`
 - `self._raw_extras` is `edf_info` (wrapped in a list)
 - `self._preload_data(preload)`

`_preload_data` is a method of `BaseRaw` in `base.py`. The important statement is:
- `self._data = self._read_segment(data_buffer=data_buffer)`  
`_read_segment` is also a method of `BaseRaw`.  
Some values from its statements on lines 460 - 525:
- `start = 0`
- `stop = 7950000`
- `n_sel_channels = 7`
- `data_shape = (7, 7950000)`
- `dtype = <class 'numpy.float64'>`
- `data = np.zeros(data_shape, dtype)`
- `cumul_lens = np.concatenate(([0], np.array(self._raw_lengths, dtype='int')))` `array([   0,7950000])`
- `cals = self._cals.ravel()[np.newaxis, :]`  
here: `array([[ 1572480.,  1609335.,  8263710., 16769025.,    50000.,    33480., 16769025.]])`
- `cals = cals.T[idx]`  
- `start_file = 0`
- `stop_file = 7950000`
- `n_read = 7950000`   
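
The sample counts above are consistent with the header values. The sketch below assumes that the total number of samples per channel equals `n_records * max_samp`, which matches the numbers seen in the debugger. The duration in hours and minutes is an added illustration.

```python
n_records = 2650          # number of data records in the file
max_samp = 3000           # samples per record for the densest channel
record_duration_s = 30.   # duration of one data record in seconds

# Assumption: total samples per channel after reading.
n_samples = n_records * max_samp
last_samp = n_samples - 1

# Recording duration derived from the record count.
duration_s = n_records * record_duration_s
hours, remainder = divmod(duration_s, 3600)
minutes = remainder / 60

print(n_samples, last_samp, hours, minutes)
```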

Now at line 508 in `_read_segment` of `BaseRaw`, the data is read from the necessary file(s):
```Python
        offset = 0
        for fi in np.nonzero(files_used)[0]:
            start_file = self._first_samps[fi]
            # first iteration (only) could start in the middle somewhere
            if offset == 0:
                start_file += start - cumul_lens[fi]
            stop_file = np.min([stop - cumul_lens[fi] + self._first_samps[fi],
                                self._last_samps[fi] + 1])
            if start_file < self._first_samps[fi] or stop_file < start_file:
                raise ValueError('Bad array indexing, could be a bug')
            n_read = stop_file - start_file
            this_sl = slice(offset, offset + n_read)
            self._read_segment_file(data[:, this_sl], idx, fi,
                                    int(start_file), int(stop_file),
                                    cals, mult)
            offset += n_read
        return data```  
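
The index bookkeeping in this loop can be sketched without any file access. The function `plan_reads` below is hypothetical: it follows the same arithmetic but only returns, per file, the sample range to read and the slice of the output array to fill. The overlap test stands in for `files_used`, which is computed elsewhere in MNE.

```python
def plan_reads(first_samps, last_samps, start, stop):
    """Return (file index, start_file, stop_file, output slice) per file (sketch)."""
    lengths = [last - first + 1 for first, last in zip(first_samps, last_samps)]
    cumul_lens = [0]
    for length in lengths:
        cumul_lens.append(cumul_lens[-1] + length)

    plans = []
    offset = 0
    for fi, (first, last) in enumerate(zip(first_samps, last_samps)):
        # Skip files that do not overlap the requested range [start, stop).
        if cumul_lens[fi + 1] <= start or cumul_lens[fi] >= stop:
            continue
        start_file = first
        # Only the first file read can start somewhere in the middle.
        if offset == 0:
            start_file += start - cumul_lens[fi]
        stop_file = min(stop - cumul_lens[fi] + first, last + 1)
        n_read = stop_file - start_file
        plans.append((fi, start_file, stop_file, slice(offset, offset + n_read)))
        offset += n_read
    return plans


# The single-file case from this notebook.
single = plan_reads([0], [7949999], 0, 7950000)
# A made-up two-file case, reading samples 50..149 across both files.
double = plan_reads([0, 0], [99, 99], 50, 150)
print(single)
print(double)
```

For the edf file discussed here there is only one file, so the loop body runs once and reads everything in a single call to `_read_segment_file`.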
        

 `_read_segment_file` is a method of `RawEDF`:  
 ```Python
     def _read_segment_file(self, data, idx, fi, start, stop, cals, mult):
        """Read a chunk of raw data."""```
        
the parameters are:
- `data` an empty $7\times 7950000$ array
- `idx` `slice(None, None, None)`
- `fi` = 0
- `start` = 0
- `stop` = 7950000
- `cals` `array([ 1572480.,  1609335.,  8263710., 16769025.,    50000.,    33480., 16769025.])` but transposed
- `mult` = None

first statement: 
```Python
        from scipy.interpolate import interp1d```  
        
additional parameters are determined from `self`. The most important:
- `sel = np.arange(self.info['nchan'])[idx]` here `array([0, 1, 2, 3, 4, 5, 6])`
- `n_samps = self._raw_extras[fi]['n_samps']` here `array([3000, 3000, 3000,   30,   30,   30,   30])`
- `buf_len = int(self._raw_extras[fi]['max_samp'])` here = 3000
- `sfreq = self.info['sfreq']` here = 100
- `dtype = self._raw_extras[fi]['dtype_np']` here `<class 'numpy.int16'>`
- `dtype_byte = self._raw_extras[fi]['dtype_byte']` here = 2
- `data_offset = self._raw_extras[fi]['data_offset']` here = 2048  

Then the gains are constructed:  
```Python
        physical_range = np.array([ch['range'] for ch in self.info['chs']])
        cal = np.array([ch['cal'] for ch in self.info['chs']])
        cal = np.atleast_2d(physical_range / cal)  # physical / digital
        gains = np.atleast_2d(self._raw_extras[fi]['units'])```

- `physical_range` here `array([ 384.,  393., 2018., 4095.,   10.,    6., 4095.])`
- `cal` before the division (the digital range) `array([4095., 4095., 4095., 4095., 5000., 5580., 4095.])`
- `cal` after the second statement `array([[0.09377289, 0.0959707 , 0.49279609, 1.        , 0.002     ,
        0.00107527, 1.        ]])`
- `gains` here `array([[1.e-06, 1.e-06, 1.e-06, 1.e+00, 1.e-06, 1.e+00, 1.e+00]])`  

Then the offsets are computed in the physical dimension ($\mu V$ for the EEG channels):
```Python
        physical_min = self._raw_extras[fi]['physical_min']
        digital_min = self._raw_extras[fi]['digital_min']

        offsets = np.atleast_2d(physical_min - (digital_min * cal)).T```

- `physical_min` here `array([ -192.,  -197., -1009., -2048.,    -5.,    34., -2047.])`
- `digital_min` here `array([-2048., -2048., -2048., -2048., -2500., -2849., -2047.])`
- `offsets` here `array([[ 0.04688645, -0.45201465,  0.24639805,  0.,  0., 37.06344086,  0.]])` shown transposed
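
The `cal` and `offsets` values above can be recomputed from the four header arrays. This is a sketch with plain lists. The expected values are the ones printed in the debugger, compared with a small tolerance because they are rounded.

```python
import math

physical_min = [-192., -197., -1009., -2048., -5., 34., -2047.]
physical_max = [192., 196., 1009., 2047., 5., 40., 2048.]
digital_min = [-2048., -2048., -2048., -2048., -2500., -2849., -2047.]
digital_max = [2047., 2047., 2047., 2047., 2500., 2731., 2048.]

# Scale per channel: physical range divided by digital range.
cal = [(pmax - pmin) / (dmax - dmin)
       for pmin, pmax, dmin, dmax
       in zip(physical_min, physical_max, digital_min, digital_max)]

# Offset per channel: physical_min - digital_min * cal.
offsets = [pmin - dmin * c
           for pmin, dmin, c in zip(physical_min, digital_min, cal)]

expected = [0.04688645, -0.45201465, 0.24639805, 0., 0., 37.06344086, 0.]
print(all(math.isclose(o, e, abs_tol=1e-6) for o, e in zip(offsets, expected)))
```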
        






Roughly, from line 27 in `RawEDF` of `edf.py`:
- the comments in the code say that one EDF block could be read at a time, but that multiple blocks are read at once to speed things up
- some manipulation about starts, offsets, etc
- then at 253:
```Python
                # Read and reshape to (n_chunks_read, ch0_ch1_ch2_ch3...)
                many_chunk = _read_ch(fid, subtype, ch_offsets[-1] * n_read,
                                      dtype_byte, dtype).reshape(n_read, -1)```

- `_read_ch` is defined on line 341 of `edf.py`
- its signature is `def _read_ch(fid, subtype, samp, dtype_byte, dtype=None)`, with the docstring `"""Read a number of samples for a single channel."""`
- the parameters are:
 - `fid` here  
 `<_io.FileIO name='/users/kees/sleepdynamics/sleep-edfx/SC4001E0-PSG.edf' mode='rb' closefd=True>`
 - `subtype`: here `edf`
 - `samp` here = 5234880
 - `dtype_byte` here = 2
 - `dtype` here = `<class 'numpy.int16'>`
 - then:  
 `ch_data = np.fromfile(fid, dtype=dtype, count=samp)`

- Here the data is read in one chunk (what about the header data? It is presumably skipped by seeking past `data_offset` before reading)
- after the read the data is split into channels and we are ending up with an array `data` which has dimensions $7\times 7950000$
- then we have:  
```Python
        data *= cal.T[sel]  # scale
        data += offsets[sel]  # offset
        data *= gains.T[sel]  # apply units gain last```
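
The splitting of a data record into channels, mentioned above, can be sketched on a toy record. In the EDF layout each data record stores the samples of channel 0, then those of channel 1, and so on, as 2-byte little-endian integers. The helper `split_record` and the tiny `n_samps` values are hypothetical. How MNE brings the shorter channels to the common length is not covered here.

```python
import struct


def split_record(record_bytes, n_samps):
    """Split one EDF data record into per-channel sample lists (sketch)."""
    n_total = sum(n_samps)
    # EDF samples are 2-byte little-endian signed integers.
    values = struct.unpack('<%dh' % n_total, record_bytes)
    channels = []
    start = 0
    for n in n_samps:
        channels.append(list(values[start:start + n]))
        start += n
    return channels


# Toy record: two channels with 3 samples each, one channel with 1 sample.
n_samps = [3, 3, 1]
record = struct.pack('<7h', 10, 11, 12, 20, 21, 22, 30)
channels = split_record(record, n_samps)
print(channels)
```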
        


Back to `mne_switch` in Visbrain with `raw`. The data is in `raw._data`.

Here we have:  
```Python
    sf = raw.info['sfreq']
    dsf, downsample = get_dsf(downsample, sf)```
    
This might be the point where things are going haywire. 

What is returned is:
- `dsf` = 1
- `downsample` = 100.0  

What is returned from `mne_switch` is:
- `sf` =100
- `downsample` = 100
- `dsf` = 1
- `data[:,::dsf]`   this is a $7\times 7950000$ array
- `channels`
- `n` = 7950000
- `start_time`
- `anot`
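
The relation between `sf`, `downsample` and `dsf` can be sketched as below. This is an assumption about what `get_dsf` does, chosen to be consistent with the values returned here, and is not Visbrain's implementation. The function name `get_dsf_sketch` is hypothetical.

```python
def get_dsf_sketch(downsample, sf):
    """Return an integer decimation step and the resulting frequency (sketch)."""
    # Assumption: the step is the rounded ratio of the two frequencies.
    dsf = max(int(round(sf / downsample)), 1)
    return dsf, sf / dsf


dsf, downsample = get_dsf_sketch(100., 100.)

# Decimation by slicing, as in data[:, ::dsf], shown on a short made-up row.
row = list(range(10))
every_second = row[::2]

print(dsf, downsample, every_second)
```

With `dsf = 1` the slice `data[:, ::dsf]` keeps every sample, which matches the identical number of time points before and after down-sampling.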

It goes back to `ReadSleepData`, where we get the following output:
```Text
INFO | File successfully loaded (/users/kees/sleepdynamics/sleep-edfx/SC4001E0-PSG.edf):
- Sampling-frequency : 100.00Hz
- Number of time points (before down-sampling): 7950000
- Down-sampling frequency : 100.00Hz
- Number of time points (after down-sampling): 7950000
- Number of channels : 7```


## Conversion from digital values to physical values
`cals` is initially computed as the digital range. It is then entered into the header info as `cal` and the physical range is entered as `range` in the dictionary of header info per channel:
```Python
        chan_info['cal'] = cal
        chan_info['range'] = physical_range```
        
Then `cals` is computed for each channel as the product of physical range and digital range and added as an attribute of `BaseRaw`:
```Python
        cals = np.empty(info['nchan'])
        for k in range(info['nchan']):
            cals[k] = info['chs'][k]['range'] * info['chs'][k]['cal']
        self._cals = cals```
        
In `_read_segment_file`, `cal` is computed as the ratio between the physical range and the digital range. So `cal` has now the same values as `gain` in the previous section. Then `gains` is set to the units of measurement taken from the header info:  
`array([[1.e-06, 1.e-06, 1.e-06, 1.e+00, 1.e-06, 1.e+00, 1.e+00]])`  

Then `offsets` are calculated as `offsets = phys min - digital min*cal`.  

After the data has been read, we get the conversion back to physical values as done in the previous section, but also the application of the units (`gains`):  
```Python
        data *= cal.T[sel]  # scale
        data += offsets[sel]  # offset
        data *= gains.T[sel]  # apply units gain last```
        
The first two statements are required for the conversion to physical values. The last statement converts the result to base units, e.g. from $\mu V$ to $V$ for EEG channels.  
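
The three statements can be combined into one function for a single sample value. This is a sketch: the function name `digital_to_base_units` is hypothetical, and the check uses the first EEG channel, whose digital extremes should map to $\pm 192\,\mu V$, i.e. $\pm 192 \times 10^{-6}\,V$.

```python
import math


def digital_to_base_units(digital, phys_min, phys_max, dig_min, dig_max,
                          unit_gain):
    """Convert one digital sample to base units (sketch of the three steps)."""
    cal = (phys_max - phys_min) / (dig_max - dig_min)  # scale
    offset = phys_min - dig_min * cal                  # offset
    return (digital * cal + offset) * unit_gain        # apply units gain last


# First channel (EEG Fpz-Cz): header values from this notebook, gain 1e-6.
lowest = digital_to_base_units(-2048, -192., 192., -2048., 2047., 1e-6)
highest = digital_to_base_units(2047, -192., 192., -2048., 2047., 1e-6)
print(lowest, highest)
```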

## Handing back data to Visbrain
The last step is to hand back the data to Visbrain as `raw`. The two important attributes here of `raw` are:
- `raw.info`: the header data
- `raw._data`: the sample values (in base units)  

# Resampling in MNE
Resampling in MNE is not used when Visbrain uses MNE to read data files. It is here just to show how resampling is done in MNE.
```Python
def resample(x, up=1., down=1., npad=100, axis=-1, window='boxcar', n_jobs=1,
             pad='reflect_limited', verbose=None):
    """Resample an array.

    Operates along the last dimension of the array.

    Parameters
    ----------
    x : n-d array
        Signal to resample.
    up : float
        Factor to upsample by.
    down : float
        Factor to downsample by.
    npad : int | str
        Number of samples to use at the beginning and end for padding.
        Can be "auto" to pad to the next highest power of 2.
    axis : int
        Axis along which to resample (default is the last axis).
    window : string or tuple
        See :func:`scipy.signal.resample` for description.
    n_jobs : int | str
        Number of jobs to run in parallel. Can be 'cuda' if scikits.cuda
        is installed properly and CUDA is initialized.
    pad : str
        The type of padding to use. Supports all :func:`numpy.pad` ``mode``
        options. Can also be "reflect_limited" (default), which pads with a
        reflected version of each vector mirrored on the first and last
        values of the vector, followed by zeros.

        .. versionadded:: 0.15
    verbose : bool, str, int, or None
        If not None, override default verbose level (see :func:`mne.verbose`
        and :ref:`Logging documentation <tut_logging>` for more).

    Returns
    -------
    xf : array
        x resampled.

    Notes
    -----
    This uses (hopefully) intelligent edge padding and frequency-domain
    windowing improve scipy.signal.resample's resampling method, which
    we have adapted for our use here. Choices of npad and window have
    important consequences, and the default choices should work well
    for most natural signals.

    Resampling arguments are broken into "up" and "down" components for future
    compatibility in case we decide to use an upfirdn implementation. The
    current implementation is functionally equivalent to passing
    up=up/down and down=1.
    """
    from scipy.fftpack import fft, ifftshift, fftfreq, ifft
    
    from scipy.signal import get_window
    # check explicitly for backwards compatibility
    if not isinstance(axis, int):
        err = ("The axis parameter needs to be an integer (got %s). "
               "The axis parameter was missing from this function for a "
               "period of time, you might be intending to specify the "
               "subsequent window parameter." % repr(axis))
        raise TypeError(err)

    # make sure our arithmetic will work
    x = np.asanyarray(x)
    ratio = float(up) / down
    if axis < 0:
        axis = x.ndim + axis
    orig_last_axis = x.ndim - 1
    if axis != orig_last_axis:
        x = x.swapaxes(axis, orig_last_axis)
    orig_shape = x.shape
    x_len = orig_shape[-1]
    if x_len == 0:
        warn('x has zero length along last axis, returning a copy of x')
        return x.copy()
    bad_msg = 'npad must be "auto" or an integer'
    if isinstance(npad, string_types):
        if npad != 'auto':
            raise ValueError(bad_msg)
        # Figure out reasonable pad that gets us to a power of 2
        min_add = min(x_len // 8, 100) * 2
        npad = 2 ** int(np.ceil(np.log2(x_len + min_add))) - x_len
        npad, extra = divmod(npad, 2)
        npads = np.array([npad, npad + extra], int)
    else:
        if npad != int(npad):
            raise ValueError(bad_msg)
        npads = np.array([npad, npad], int)
    del npad

    # prep for resampling now
    x_flat = x.reshape((-1, x_len))
    orig_len = x_len + npads.sum()  # length after padding
    new_len = int(round(ratio * orig_len))  # length after resampling
    final_len = int(round(ratio * x_len))
    to_removes = [int(round(ratio * npads[0]))]
    to_removes.append(new_len - final_len - to_removes[0])
    to_removes = np.array(to_removes)
    # This should hold:
    # assert np.abs(to_removes[1] - to_removes[0]) <= int(np.ceil(ratio))

    # figure out windowing function
    if window is not None:
        if callable(window):
            W = window(fftfreq(orig_len))
        elif isinstance(window, np.ndarray) and \
                window.shape == (orig_len,):
            W = window
        else:
            W = ifftshift(get_window(window, orig_len))
    else:
        W = np.ones(orig_len)
    W *= (float(new_len) / float(orig_len))
    W = W.astype(np.complex128)

    # figure out if we should use CUDA
    n_jobs, cuda_dict, W = setup_cuda_fft_resample(n_jobs, W, new_len)

    # do the resampling using an adaptation of scipy's FFT-based resample()
    # use of the 'flat' window is recommended for minimal ringing
    if n_jobs == 1:
        y = np.zeros((len(x_flat), new_len - to_removes.sum()), dtype=x.dtype)
        for xi, x_ in enumerate(x_flat):
            y[xi] = fft_resample(x_, W, new_len, npads, to_removes,
                                 cuda_dict, pad)
    else:
        parallel, p_fun, _ = parallel_func(fft_resample, n_jobs)
        y = parallel(p_fun(x_, W, new_len, npads, to_removes, cuda_dict, pad)
                     for x_ in x_flat)
        y = np.array(y)

    # Restore the original array shape (modified for resampling)
    y.shape = orig_shape[:-1] + (y.shape[1],)
    if axis != orig_last_axis:
        y = y.swapaxes(axis, orig_last_axis)

    return y
```
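
The length bookkeeping in `resample` can be followed with a small sketch that repeats only the arithmetic for an integer `npad`. The function name `resample_lengths` is hypothetical and no actual resampling is done here.

```python
def resample_lengths(x_len, up, down, npad):
    """Repeat the length arithmetic of resample() for an integer npad (sketch)."""
    ratio = float(up) / down
    npads = [npad, npad]
    orig_len = x_len + sum(npads)            # length after padding
    new_len = int(round(ratio * orig_len))   # length after resampling
    final_len = int(round(ratio * x_len))    # length handed back to the caller
    to_removes = [int(round(ratio * npads[0]))]
    to_removes.append(new_len - final_len - to_removes[0])
    return orig_len, new_len, final_len, to_removes


# Made-up example: 1000 samples, downsampled by 2, default padding of 100.
orig_len, new_len, final_len, to_removes = resample_lengths(1000, 1., 2., 100)
print(orig_len, new_len, final_len, to_removes)
```

The padded signal is resampled as a whole and the resampled padding is then trimmed from both ends, so `new_len - sum(to_removes)` equals `final_len`.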