# Temple EEG

This notebook explores the Temple University Hospital EEG Abnormal Corpus (`TUAB`) dataset and converts its recordings from `EDF` to `NumPy memmap` format for faster loading.

-----

## Configure environments

In [1]:
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
%cd ..

C:\Users\Minjae\Desktop\EEG_Project


In [2]:
# Load some packages
import os
import glob

import math
import json
import pyedflib
import numpy as np

import pprint
from tqdm.auto import tqdm

# custom package
from datasets.utils import *
from datasets.pipeline import *

In [4]:
# Data file path
origin_path = r'H:\Other_DB\Temple_EEG\tuh_eeg_abnormal\v2.0.0\edf'
desired_path = 'local/dataset/05_Temple_EEG/'

---
## `TUAB` dataset

In [29]:
# Count the text reports that mention 'year' and print the ones that do not
count = 0
text_files = glob.glob(os.path.join(origin_path, 'train/normal/01_tcp_ar/*/*/*/*.txt'))
for filename in text_files:
    with open(filename, 'rt', encoding='UTF-8') as f:
        text_script = f.read()
        if 'year' in text_script:
            count += 1
        else:
            print(text_script)
            print('-----' * 4)

# number of reports containing 'year' vs. total number of reports
print(count, len(text_files))

LENGTH OF THE RECORDING:  24 minutes.
ACTIVATION PROCEDURES:  Hyperventilation and photic stimulation.
CONDITIONS OF THE RECORDING:  The recording was done according to the standard 10-20 Hz system with additional T1 and T2 electrodes, and a single EKG lead.
CLINICAL HISTORY:  The patient presents with concerns of short-term memory as well as tinnitus bilaterally status post a fall with loss of consciousness on 09/10/2012.  Additionally, the patient has been having episodes of slurred speech and headaches.
MEDICATIONS:  Carbamazepine.
INTRODUCTION:  Digital video EEG is performed in the lab/bedside using standard 10-20 system of electrode placement with one channel of EKG.  Hyperventilation and photic stimulation are performed.
DESCRIPTION OF THE RECORDING:  During brief wakefulness the posterior dominant rhythm consists of a well-formed and modulated low to moderate amplitude 10.5 to 11 Hz alpha activity that attenuates with eyes opening.  There is an anterior to posterior frequency a
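
The cell above only checks whether the word `year` occurs in each report, presumably as a proxy for an age statement such as "55-year-old"; that reading is an assumption, not something the notebook states. Under that assumption, a minimal sketch of pulling the age out with a regular expression could look like this. The function name `extract_age` and both sample strings are hypothetical and are not taken from the corpus.

```python
import re

# Matches phrases like "55-year-old", "55 year old", or "55 years old".
# The pattern is an illustrative assumption about how ages are written.
AGE_PATTERN = re.compile(r'(\d{1,3})[\s-]*years?[\s-]*old', re.IGNORECASE)


def extract_age(report_text):
    """Return the first age mentioned in the report, or None if none is found."""
    match = AGE_PATTERN.search(report_text)
    return int(match.group(1)) if match else None


# Hypothetical report snippets, written only for this example
with_age = 'CLINICAL HISTORY:  A 55-year-old male with headaches.'
without_age = 'MEDICATIONS:  Carbamazepine.'

print(extract_age(with_age))     # 55
print(extract_age(without_age))  # None
```

Reports for which such a function returns `None` are the ones worth reading by hand, which is what the cell above does when it prints them.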

---
## Simple preprocessing with converting

#### Test trimming of trailing zero signals

In [None]:
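import datetime

# NOTE: `curate_path` is not defined in the cells above and must be set before running this cell.
# NOTE: 200 is the sampling rate (Hz) assumed when converting the sample count to seconds.
for i, f in enumerate(glob.glob(os.path.join(curate_path, 'signal/*.edf'))):
    signals, signal_headers, edf_header = pyedflib.highlevel.read_edf(f)
    signals = trim_trailing_zeros(signals)  # trim garbage zeros

    # print the file name, the recording start, and the estimated end after trimming
    print(f, end='\t\t')
    print(edf_header['startdate'], end='\t')
    print(edf_header['startdate'] + datetime.timedelta(seconds=signals.shape[1] / 200), end='\t')
    print()

    # only inspect the first 12 files
    if i > 10:
        break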
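
`trim_trailing_zeros` comes from the custom `datasets.utils` package and its implementation is not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch that assumes the signals are a `(channels, samples)` array and that "trailing zeros" means samples at the end where every channel is exactly zero. The name `trim_trailing_zeros_sketch` is hypothetical, and the real function may behave differently.

```python
import numpy as np


def trim_trailing_zeros_sketch(signals: np.ndarray) -> np.ndarray:
    """Drop the trailing samples in which every channel is zero.

    Hypothetical stand-in for `trim_trailing_zeros`; expects (channels, samples).
    """
    # True for every sample index where at least one channel is non-zero
    nonzero_samples = np.any(signals != 0, axis=0)
    if not nonzero_samples.any():
        # the whole recording is zero: return an empty (channels, 0) array
        return signals[:, :0]
    # index of the last sample that holds any non-zero value
    last = np.flatnonzero(nonzero_samples)[-1]
    return signals[:, :last + 1]


demo = np.array([[1, 2, 0, 3, 0, 0],
                 [0, 5, 0, 0, 0, 0]])
trimmed = trim_trailing_zeros_sketch(demo)
print(trimmed.shape)  # (2, 4)
```

Zeros in the middle of the recording (the third sample in `demo`) are kept; only the all-zero tail is removed.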

---
## Convert and Save

In [None]:
save_feather = True
save_memmap = True

In [None]:
import pandas as pd
from pyarrow import feather

# create the output directories
if save_feather:
    os.makedirs(os.path.join(curate_path, 'signal/feather'), exist_ok=True)

if save_memmap:
    os.makedirs(os.path.join(curate_path, 'signal/memmap'), exist_ok=True)

for f in tqdm(glob.glob(os.path.join(curate_path, 'signal/*.edf'))):
    # serial: the last five characters of the file name, without the extension
    serial = f.split('.edf')[0][-5:]

    # load signal, trim the garbage zeros, and cast to int32
    signals, signal_headers, edf_header = pyedflib.highlevel.read_edf(f)
    signals = trim_trailing_zeros(signals)
    signals = signals.astype('int32')

    # save as feather: one column per channel, one row per sample
    if save_feather:
        df = pd.DataFrame(data=signals.T, columns=[s_h['label'] for s_h in signal_headers], dtype=np.int32)
        feather.write_feather(df, os.path.join(curate_path, 'signal/feather', serial + '.feather'))

    # save as numpy memmap: raw int32 values with shape (channels, samples)
    if save_memmap:
        fp = np.memmap(os.path.join(curate_path, 'signal/memmap', serial + '.dat'),
                       dtype='int32', mode='w+', shape=signals.shape)
        fp[:] = signals[:]
        fp.flush()
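
A file written with `np.memmap` is a raw buffer with no header, so the array shape and dtype have to be known when it is read back. The sketch below shows one possible way to handle that: write a small JSON sidecar next to the `.dat` file and use it when reopening. The sidecar, the file names, and the toy `signals` array are assumptions made for illustration and are not part of the notebook's pipeline.

```python
import json
import os
import tempfile

import numpy as np

# Toy stand-in for one converted recording: (channels, samples), int32
signals = np.arange(12, dtype='int32').reshape(3, 4)

out_dir = tempfile.mkdtemp()
dat_path = os.path.join(out_dir, 'demo.dat')    # hypothetical file name
meta_path = os.path.join(out_dir, 'demo.json')  # hypothetical sidecar file

# Write the raw values, using the same pattern as the conversion loop above
fp = np.memmap(dat_path, dtype='int32', mode='w+', shape=signals.shape)
fp[:] = signals[:]
fp.flush()
del fp  # release the write handle

# The .dat file has no header, so record the shape and dtype separately
with open(meta_path, 'w') as f:
    json.dump({'shape': list(signals.shape), 'dtype': 'int32'}, f)

# Reopen read-only; only the slices that are accessed get read from disk
with open(meta_path) as f:
    meta = json.load(f)
loaded = np.memmap(dat_path, dtype=meta['dtype'], mode='r', shape=tuple(meta['shape']))

window = np.array(loaded[:, 1:3])  # copy a short window into memory
print(window.tolist())  # [[1, 2], [5, 6], [9, 10]]
```

Reading a short window this way avoids decoding the whole recording, which is where the speed-up over parsing `EDF` files is expected to come from.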