# Introduction
In this year's PhysioNet Challenge, we use a variety of physiological signals, collected during polysomnographic sleep studies, to detect non-apnea sources of arousal during sleep.

## Challenge data
- the number of subjects: 1985
    - training: 994
    - test: 989
- six sleep stages were annotated in contiguous 30-second intervals:
     - wakefulness
     - stage 1
     - stage 2
     - stage 3
     - rapid eye movement (REM)
     - undefined
- The annotated arousals were classified as either
    - spontaneous arousals
    - respiratory effort related arousals (RERA)
    - bruxisms
    - hypoventilations
    - hypopneas
    - apneas (central, obstructive and mixed)
    - vocalizations
    - snores
    - periodic leg movements
    - Cheyne-Stokes breathing or partial airway obstructions
- signals recorded
    - electroencephalography (EEG)
    - electrooculography (EOG)
    - electromyography (EMG)
    - electrocardiology (EKG)
    - oxygen saturation (SaO2)
- Excluding SaO2, all signals were sampled at 200 Hz
- For analytic convenience, SaO2 was resampled to 200 Hz, and is measured as a percentage
- The goal of the challenge is to use information from the available signals to correctly classify target arousal regions. For the purpose of the Challenge, target arousals are defined as regions where either of the following conditions were met:

# Preparations
## Load libraries

In [1]:
import os
import pandas as pd
import numpy as np

os.chdir('../src')
import physionetchallenge2018_lib as phyc

## Load data

In [29]:
# load record directory names (one entry per line in RECORDS)
record_names_training = pd.read_table('../input/training/RECORDS', header=None)

In [37]:
# load files
# NOTE: this_data is overwritten on every iteration, so only the last
# record remains available after the loop finishes
for rec in record_names_training[0]:
    rec = rec.strip('/')
    record_name = '../input/training/' + rec + '/' + rec
    header_file = record_name + '.hea'
    signal_file = record_name + '.mat'
    arousal_file = record_name + '-arousal.mat'
    
    # Get the signal names from the header file
    signal_names, Fs, n_samples = phyc.import_signal_names(header_file)
    signal_names = list(np.append(signal_names, 'arousals'))
    
    # Convert this subject's data into a pandas dataframe
    this_data = phyc.get_subject_data(arousal_file, signal_file, signal_names)
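
The path handling in the loop above can be isolated into a small helper. This is only a sketch: the function name `record_paths` and the example entry `rec-0001/` are hypothetical, and it assumes each line of `RECORDS` is a directory name with a trailing slash whose files share that directory's name.

```python
def record_paths(record_entry, base_dir):
    """Return (header, signal, arousal) file paths for one RECORDS entry.

    Assumption: an entry looks like 'rec-0001/' and the files live at
    <base_dir>/rec-0001/rec-0001.hea, .mat and -arousal.mat.
    """
    # Drop surrounding whitespace and the trailing slash of the entry
    name = record_entry.strip().strip('/')
    prefix = base_dir.rstrip('/') + '/' + name + '/' + name
    return prefix + '.hea', prefix + '.mat', prefix + '-arousal.mat'


# Hypothetical entry, used only to illustrate the resulting paths
header_file, signal_file, arousal_file = record_paths('rec-0001/', '../input/training')
print(header_file)
print(signal_file)
print(arousal_file)
```

Keeping the path logic in one place makes it easier to check the file names before handing them to the loader.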


In [None]:


# ----------------------------------------------------------------------
# Generate the features for the classification model - variance of SaO2
# ----------------------------------------------------------------------

# For the baseline, let's only look at how SaO2 might predict arousals

SaO2 = this_data.get(['SaO2']).values
arousals = this_data.get(['arousals']).values

# We select a window size of 60 seconds with no overlap to compute
# the features
step        = Fs * 60
window_size = Fs * 60

# Initialize the matrices that store our training data
X_subj = np.zeros([((n_samples) // step), 1])
Y_subj = np.zeros([((n_samples) // step), 1])

# Extract the variance of the SaO2 in 60 second windows as a feature
for idx, k in enumerate(range(0, (n_samples-step+1), step)):
    X_subj[idx, 0] = np.var(SaO2[k:k+window_size])
    Y_subj[idx]    = np.max(arousals[k:k+window_size])

# Report records that do not contain any arousals.
# (`return` is only valid inside a function, so we just write a warning here.)
import sys
if not np.any(Y_subj):
    sys.stderr.write('no arousals found in %s\n' % record_name)
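
The windowing step above can also be written without an explicit loop. The following is a minimal sketch on synthetic data, not the challenge's reference implementation: the function name `windowed_variance` is hypothetical, and the SaO2 trace and arousal labels are randomly generated stand-ins for `this_data`.

```python
import numpy as np


def windowed_variance(signal, labels, fs, window_sec=60):
    """Variance of `signal` and max of `labels` per non-overlapping window."""
    signal = np.asarray(signal, dtype=float).ravel()
    labels = np.asarray(labels).ravel()
    window = int(fs * window_sec)
    n_windows = len(signal) // window
    trimmed = n_windows * window  # drop the incomplete trailing window
    X = signal[:trimmed].reshape(n_windows, window).var(axis=1).reshape(-1, 1)
    Y = labels[:trimmed].reshape(n_windows, window).max(axis=1).reshape(-1, 1)
    return X, Y


# Synthetic stand-in: 3 minutes of SaO2 at 200 Hz with one arousal region
fs = 200
rng = np.random.default_rng(0)
sao2 = 95 + rng.normal(0, 0.5, size=fs * 180)
arousal = np.zeros(fs * 180, dtype=int)
arousal[fs * 70:fs * 80] = 1  # arousal between 70 s and 80 s

X, Y = windowed_variance(sao2, arousal, fs)
print(X.shape)    # (3, 1)
print(Y.ravel())  # [0 1 0]
```

Reshaping into `(n_windows, window)` gives the same window boundaries as the loop, since both discard any samples past the last full 60-second window.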