## File type demo

This notebook demonstrates some of the capabilities of the eegyolk library for reading the cnt file type. It also demonstrates the importance of the `data_format` argument in the mne library. Previous work had left some arguments for reading cnt files at their default values.
Specifically, this notebook shows how the choice between the int32 and int16 formats changes how the file is interpreted.
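To make the stakes concrete before touching any real file, here is a minimal sketch using only numpy. The four sample values are hypothetical, and the flat layout is a simplification (it does not reproduce how a cnt file actually arranges channels and blocks). It only shows that the same bytes yield different values, and a different number of values, depending on the assumed integer width.

```python
import numpy as np

# Four hypothetical 32-bit samples, written out as little-endian raw bytes.
samples_32 = np.array([6253748, -2513232, 749085, 8542372], dtype="<i4")
raw_bytes = samples_32.tobytes()

# Interpret the very same bytes with two different assumed sample widths.
as_int32 = np.frombuffer(raw_bytes, dtype="<i4")
as_int16 = np.frombuffer(raw_bytes, dtype="<i2")

print("int32 view:", len(as_int32), "values:", as_int32)
print("int16 view:", len(as_int16), "values:", as_int16)
```

The int16 view contains twice as many values as the int32 view, because each 4-byte sample is split into two 2-byte halves.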

#### Imports

In [1]:
import mne      # toolbox for analyzing and visualizing EEG data
import os       # using operating system dependent functionality (folders)
import pandas   # data analysis and manipulation
import numpy    # numerical computing (manipulating and performing operations on arrays of data)
import copy     # shallow and deep copies of objects, so the original is left untouched
import glob
import numpy as np
import pandas as pd
from numpy.fft import fft, fftfreq
from scipy import signal

import matplotlib
import matplotlib.pyplot as plt

from mne.time_frequency.tfr import morlet
from mne.viz import plot_filter, plot_ideal_filter


from IPython.display import clear_output
import warnings

import sys

Below, eegyolk is imported in its most recent version.
This cell could instead be replaced by importing the stable library and then importing its modules.

In [2]:

import sys
sys.path.insert(0, '../eegyolk') # path to helper functions
import helper_functions as hf # library useful for eeg and erp data cleaning
#import initialization_functions #library to import data
import epod_helper
import raw
from config import Config
from raw import RawData

#### Load EEG files
Edit your config file to change where the data is loaded from.

In [3]:
!pwd

/home/cmoore/eegyolk/demos


In [4]:
config = Config()




In [6]:
#eeg_file_root = "../../volume-ceph/ePodium_projectfolder"
acquired = RawData(config.get_directory('data'), config.get_directory('metadata'))

Note: we assume you are looking at cnt files.

In [7]:
acquired.raw.head()

Unnamed: 0,code,cnt_path,cnt_file,age_group,age_days,age_months,age_years
0,35,/volume-ceph/DDP_projectfolder/11mnd mmn/035_1...,035_11_jc_mmn36_slp_mmn25_slp,11,331.0,11.033333,0.919444
1,27,/volume-ceph/DDP_projectfolder/11mnd mmn/027_1...,027_11_jc_mmn25_wk,11,326.0,10.866667,0.905556
2,25,/volume-ceph/DDP_projectfolder/11mnd mmn/025_1...,025_11_mc_mmn36_wk,11,360.0,12.0,1.0
3,35,/volume-ceph/DDP_projectfolder/11mnd mmn/035_1...,035_11_jc_mmn36slp_mmn25_slp_2,11,331.0,11.033333,0.919444
4,30,/volume-ceph/DDP_projectfolder/11mnd mmn/030_1...,030_11_jc_mmn36_wk_mmn25_wk,11,328.0,10.933333,0.911111


In [8]:
data_raw = acquired.as_mne[2]

In [9]:
print('Data type: {}\n\n{}\n'.format(type(data_raw), data_raw))

# Get the sample rate
print('Sample rate:', data_raw.info['sfreq'], 'Hz')

# Get the size of the matrix
print('Size of the matrix: {}\n'.format(data_raw.get_data().shape))

# The mne.info class can be used to learn more about the data.
print(data_raw.info)

Data type: <class 'mne.io.cnt.cnt.RawCNT'>

<RawCNT | 025_11_mc_mmn36_wk.cnt, 64 x 195390 (390.8 s), ~95.5 MB, data loaded>

Sample rate: 500.0 Hz
Size of the matrix: (64, 195390)

<Info | 8 non-empty values
 bads: []
 ch_names: O2, O1, OZ, PZ, P4, CP4, P8, C4, TP8, T8, P7, P3, CP3, CPZ, CZ, ...
 chs: 62 EEG, 2 EOG
 custom_ref_applied: False
 highpass: 0.0 Hz
 lowpass: 250.0 Hz
 meas_date: 2002-03-19 11:50:16 UTC
 nchan: 64
 projs: []
 sfreq: 500.0 Hz
 subject_info: 5 items (dict)
>


#### Now we will show the data as a pandas dataframe

In [10]:
raw_df = data_raw.to_data_frame()
raw_df.head()

Unnamed: 0,time,O2,O1,OZ,PZ,P4,CP4,P8,C4,TP8,...,F2,F6,FC5,F1,AF4,AF8,F5,AF7,AF3,FPZ
0,0,6253748.0,8242944.0,15000950.0,24360910.0,10690540.0,1188921.0,3539428.0,17058890.0,-2513232.0,...,-4592944.0,3691768.0,12936850.0,21315070.0,7617701.0,5630349.0,7997737.0,5734013.0,749085.4,8542372.0
1,2,7166750.0,9477653.0,16187340.0,25425050.0,12539220.0,1337556.0,4167398.0,18106560.0,-898265.7,...,-3960995.0,4459465.0,13811430.0,22133570.0,8093100.0,6695248.0,8453425.0,6465057.0,4346442.0,10169480.0
2,4,6961324.0,9260436.0,15639770.0,25081770.0,11632090.0,1063176.0,3916206.0,17890190.0,-1182595.0,...,-5777863.0,2551495.0,11891890.0,20212370.0,6451839.0,4973270.0,6778696.0,4169124.0,-317753.9,8372844.0
3,6,4096767.0,6105062.0,12251700.0,21683370.0,8577715.0,-1349102.0,1404325.0,14656050.0,-5129023.0,...,-9208469.0,-1083805.0,8166398.0,16551840.0,3271193.0,1370719.0,3622926.0,251190.2,-6650079.0,4553613.0
4,8,2031094.0,4047205.0,10129880.0,19474970.0,6683092.0,-3052541.0,-296903.5,12583460.0,-6982832.0,...,-9569584.0,-1388626.0,7666639.0,16063020.0,2908989.0,804284.2,3326748.0,445368.4,-8068571.0,2350257.0


Above we used the default settings. Now let's switch the settings and compare.

In [11]:
def read_raw_agnostic(fname, int_arg):
    """
    For testing purposes: read a cnt file with an explicit data format.

    int_arg is passed to mne as data_format ('int16', 'int32' or 'auto').
    All other reader arguments are left at their mne defaults.
    """
    read = mne.io.read_raw_cnt(
        fname,
        data_format=int_arg,
        preload=True,
    )
    return read

In [12]:
n = 2

In [13]:
paths_df = acquired.raw
paths_df.cnt_path[n]

'/volume-ceph/DDP_projectfolder/11mnd mmn/025_11_mc_mmn36_wk.cnt'

In [14]:
read_on_32 = read_raw_agnostic(paths_df.cnt_path[n], 'int32')


Reading 0 ... 195389  =      0.000 ...   390.778 secs...




In [15]:
raw_df32 = read_on_32.to_data_frame()
raw_df32.head()

Unnamed: 0,time,O2,O1,OZ,PZ,P4,CP4,P8,C4,TP8,...,F2,F6,FC5,F1,AF4,AF8,F5,AF7,AF3,FPZ
0,0,6253748.0,8242944.0,15000950.0,24360910.0,10690540.0,1188921.0,3539428.0,17058890.0,-2513232.0,...,-4592944.0,3691768.0,12936850.0,21315070.0,7617701.0,5630349.0,7997737.0,5734013.0,749085.4,8542372.0
1,2,7166750.0,9477653.0,16187340.0,25425050.0,12539220.0,1337556.0,4167398.0,18106560.0,-898265.7,...,-3960995.0,4459465.0,13811430.0,22133570.0,8093100.0,6695248.0,8453425.0,6465057.0,4346442.0,10169480.0
2,4,6961324.0,9260436.0,15639770.0,25081770.0,11632090.0,1063176.0,3916206.0,17890190.0,-1182595.0,...,-5777863.0,2551495.0,11891890.0,20212370.0,6451839.0,4973270.0,6778696.0,4169124.0,-317753.9,8372844.0
3,6,4096767.0,6105062.0,12251700.0,21683370.0,8577715.0,-1349102.0,1404325.0,14656050.0,-5129023.0,...,-9208469.0,-1083805.0,8166398.0,16551840.0,3271193.0,1370719.0,3622926.0,251190.2,-6650079.0,4553613.0
4,8,2031094.0,4047205.0,10129880.0,19474970.0,6683092.0,-3052541.0,-296903.5,12583460.0,-6982832.0,...,-9569584.0,-1388626.0,7666639.0,16063020.0,2908989.0,804284.2,3326748.0,445368.4,-8068571.0,2350257.0


In [16]:
read_on_16 = read_raw_agnostic(paths_df.cnt_path[n], 'int16')
raw_df16 = read_on_16.to_data_frame()
raw_df16.head()

Reading 0 ... 390779  =      0.000 ...   781.558 secs...




Unnamed: 0,time,O2,O1,OZ,PZ,P4,CP4,P8,C4,TP8,...,F2,F6,FC5,F1,AF4,AF8,F5,AF7,AF3,FPZ
0,0,-274.444902,95.420848,211.836368,125.883886,104.424014,229.393813,166.724743,369.939166,382.993497,...,21.3517,75.96899,126.863087,115.004328,-40.58722,75.54019,107.605393,-40.784313,80.34632,118.103448
1,2,-271.136239,103.968602,220.539587,137.058045,115.462123,241.256002,171.777008,379.843596,390.802608,...,30.650021,85.615846,122.357021,121.769288,-29.706391,86.603284,102.911781,11.503268,94.199134,130.344827
2,4,-268.524137,109.376365,226.631841,144.740279,121.243989,247.535985,174.73868,386.099026,390.629072,...,37.193284,95.262702,113.171579,126.279262,-21.93437,94.0363,96.13212,52.28758,102.164502,145.862068
3,6,-266.086174,112.341913,230.287193,148.4068,122.99607,248.931537,175.087112,389.052979,386.117142,...,40.292725,101.808783,106.23917,128.707709,-18.825562,97.666378,71.621037,66.753811,100.779221,155.172413
4,8,-270.962099,106.236374,224.717133,141.42295,114.410874,239.162675,167.770039,380.886168,374.490243,...,32.716315,96.640824,100.866553,122.28967,-27.288429,88.677614,33.202956,44.444443,86.406926,148.275862


In [17]:
# The big difference

In [18]:
(raw_df32 - raw_df16).sum().sum()

-37583506229279.79

### Youch!

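Here we saw how reading the same file as int32 or as int16 gives dramatically different results: the values differ by orders of magnitude, and the int16 read even reports twice as many samples (390780 instead of 195390). We must know which format we have when we use cnt files.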
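A single summed difference can understate the mismatch: subtracting two dataframes of different lengths aligns them on the index, rows that exist in only one frame become NaN, and `sum()` skips NaNs while positive and negative differences cancel. The sketch below is one possible way to report the mismatch more explicitly. The two small frames and the helper `compare_frames` are hypothetical stand-ins, not part of eegyolk or mne.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for raw_df32 and raw_df16: same columns, different lengths.
df_a = pd.DataFrame({"O2": [1.0, 2.0, 3.0], "O1": [4.0, 5.0, 6.0]})
df_b = pd.DataFrame({"O2": [1.5, 2.5, 3.5, 9.0, 9.0, 9.0],
                     "O1": [4.0, 5.0, 6.5, 9.0, 9.0, 9.0]})


def compare_frames(a, b):
    """Summarise how two channel tables differ, without NaNs or sign cancellation hiding anything."""
    diff = a - b  # aligns on index; rows present in only one frame become NaN
    abs_diff = diff.abs().to_numpy()
    return {
        "shape_a": a.shape,
        "shape_b": b.shape,
        "unmatched_rows": int(diff.isna().all(axis=1).sum()),
        "max_abs_diff": float(np.nanmax(abs_diff)),
        "mean_abs_diff": float(np.nanmean(abs_diff)),
    }


summary = compare_frames(df_a, df_b)
print(summary)
```

Applied to the real data, the same helper could be called as `compare_frames(raw_df32, raw_df16)`; the differing shapes alone already show that the two reads disagree.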

## This is no floating point error!

So is there anything we can do to figure out which kind of files we had?
Why, yes of course. First of all, for a few files in the DDP dataset there is information on this
in the metadata (numchannels). Second, we can be a bit sneaky and ask ourselves whether
32 is even a possibility: is the number of bits of data inside the file a number
that we could divide by 32? It turns out that when you peel the header off
the files, you can calculate the number of bits/bytes inside. In all files tested so far, the bit
count is divisible by 32 (or the byte count by 4). But wait, that means they are de facto
also divisible by 16. So we can say the files should probably have been read as int32, but
we cannot say so with 100% certainty. This is not a problem outside the cnt format, but for cnt-formatted
data, back to cell 1.

In [1]:
4*8

32
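The divisibility argument can be sketched in a few lines. The function names below are hypothetical, and `data_bytes` is not measured from the file: it is reconstructed from the printed output above (64 channels, 195390 samples when read as int32), assuming the header has already been stripped. The point is only to show why the check cannot rule out int16.

```python
def plausible_sample_widths(data_bytes, n_channels, widths=(2, 4)):
    """Return the sample widths (in bytes) that split the data into whole samples per channel."""
    return [w for w in widths if data_bytes % (n_channels * w) == 0]


def samples_per_channel(data_bytes, n_channels, width):
    """Number of samples each channel would hold for a given sample width."""
    return data_bytes // (n_channels * width)


# Illustrative figures taken from the outputs above, not read from disk.
n_channels = 64
data_bytes = 64 * 195390 * 4

widths = plausible_sample_widths(data_bytes, n_channels)
counts = {w: samples_per_channel(data_bytes, n_channels, w) for w in widths}
print(widths)
print(counts)
```

Both widths pass the test, and the 2-byte interpretation yields exactly the doubled sample count that the int16 read reported. If an independent sample count or recording duration is available, comparing it against `counts` is one way to break the tie.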