# Introduction
In this year's PhysioNet Challenge we will use a variety of physiological signals, collected during polysomnographic sleep studies, to detect sources of arousal during sleep other than apnea (non-apnea arousals).

## Challenge data
- Number of subjects: 1985
    - training: 994
    - test: 989
- Sleep stages were annotated in contiguous 30-second intervals, using six classes:
    - wakefulness
    - stage 1
    - stage 2
    - stage 3
    - rapid eye movement (REM)
    - undefined
- The annotated arousals were classified as one of the following:
    - spontaneous arousals
    - respiratory effort related arousals (RERA)
    - bruxisms
    - hypoventilations
    - hypopneas
    - apneas (central, obstructive and mixed)
    - vocalizations
    - snores
    - periodic leg movements
    - Cheyne-Stokes breathing or partial airway obstructions
- Signals recorded:
    - electroencephalography (EEG)
    - electrooculography (EOG)
    - electromyography (EMG)
    - electrocardiography (EKG)
    - oxygen saturation (SaO2)
- Excluding SaO2, all signals were sampled at 200 Hz.
- For analytic convenience, SaO2 was resampled to 200 Hz and is measured as a percentage.
- The goal of the Challenge is to use information from the available signals to correctly classify target arousal regions. For the purpose of the Challenge, target arousals are defined as regions where either of the following conditions is met:

## Signals
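| Channel | F3-M2 … O2-M1 | E1-M2 | Chin1-Chin2 | ABD | CHEST | Airflow | SaO2 | ECG | Arousals |
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Description | EEG | EOG | Chin EMG | Abdomen | Chest | Respiration | Oxygen saturation | Electrocardiogram | Label |
| Unit / values | uV | uV | uV | uV | uV | uV | % | mV | 1: arousal, 0: non-arousal, -1: unknown |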
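Because every channel (including the resampled SaO2) is stored at 200 Hz and sleep stages are scored in 30-second intervals, each scoring epoch spans 30 × 200 = 6000 samples. The sketch below shows that arithmetic and reshapes a 1-D signal into one row per epoch. The names `SAMPLES_PER_EPOCH`, `sample_to_epoch` and `to_epochs` are illustrative helpers, not part of the Challenge library, and dropping trailing samples that do not fill a whole epoch is an assumption made here for simplicity.

```python
import numpy as np

FS = 200                                # sampling rate in Hz (all channels, incl. resampled SaO2)
EPOCH_SECONDS = 30                      # sleep stages are scored in 30-second intervals
SAMPLES_PER_EPOCH = FS * EPOCH_SECONDS  # 6000 samples per epoch


def sample_to_epoch(sample_index: int) -> int:
    """Return the 0-based 30-second epoch that contains a given sample index."""
    return sample_index // SAMPLES_PER_EPOCH


def to_epochs(signal: np.ndarray) -> np.ndarray:
    """Reshape a 1-D signal into shape (n_epochs, SAMPLES_PER_EPOCH).

    Trailing samples that do not fill a complete epoch are discarded (assumption).
    """
    n_epochs = len(signal) // SAMPLES_PER_EPOCH
    return signal[: n_epochs * SAMPLES_PER_EPOCH].reshape(n_epochs, SAMPLES_PER_EPOCH)


# Example: an 8-hour recording at 200 Hz
n_samples = 8 * 3600 * FS                # 5,760,000 samples
print(n_samples // SAMPLES_PER_EPOCH)    # 960 epochs
print(sample_to_epoch(12_345))           # sample 12,345 lies in epoch 2
```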

# Preparations
## Load libraries

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from multiprocessing import Pool
import numpy as np
from tqdm import tqdm

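# Move into ../src so the Challenge helper library can be imported;
# the relative '../input/...' paths used below are resolved from there
os.chdir('../src')
import physionetchallenge2018_lib as phyc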

## Load data

In [None]:
# Load the list of training record directory names
record_names_training = pd.read_table('../input/training/RECORDS', header=None)
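The loading code builds each file path by joining the RECORDS entry with the same entry stripped of its `/`, which implies that each line of RECORDS is a directory name with a trailing slash and that each directory holds `<name>.hea`, `<name>.mat` and `<name>-arousal.mat`. A minimal sketch of that convention as a standalone helper follows; `record_paths` is a hypothetical name and `'tr03-0005/'` is only an example entry in the assumed format.

```python
def record_paths(root: str, entry: str) -> dict:
    """Build the header, signal and arousal file paths for one RECORDS entry.

    Assumes `entry` is a directory name with a trailing slash (e.g. 'tr03-0005/')
    and that the directory contains <name>.hea, <name>.mat and <name>-arousal.mat.
    """
    name = entry.strip().strip('/')               # 'tr03-0005/\n' -> 'tr03-0005'
    base = f"{root.rstrip('/')}/{name}/{name}"    # <root>/<name>/<name>
    return {
        'header': base + '.hea',
        'signal': base + '.mat',
        'arousal': base + '-arousal.mat',
    }


paths = record_paths('../input/training', 'tr03-0005/')
print(paths['header'])   # ../input/training/tr03-0005/tr03-0005.hea
```

Stripping surrounding whitespace as well as the slash makes the helper tolerant of entries read with a trailing newline.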

In [None]:
# Load several training records in parallel with a process pool
def feature_engineering(i):
    """Load training record i (all signals plus arousal labels) as a pandas DataFrame."""
    record_name = '../input/training/' + record_names_training[0][i] + record_names_training[0][i].strip('/')
    header_file = record_name + '.hea'
    signal_file = record_name + '.mat'
    arousal_file = record_name + '-arousal.mat'
    
    # Get the signal names from the header file
    signal_names, Fs, n_samples = phyc.import_signal_names(header_file)
    signal_names = list(np.append(signal_names, 'arousals'))
    
    # Convert this subject's data into a pandas dataframe
    this_data = phyc.get_subject_data(arousal_file, signal_file, signal_names)
    print('{} is done'.format(i))
    return this_data

# Load the first 10 records; the context manager shuts the pool down afterwards
with Pool(8) as p:
    resultList = p.map(feature_engineering, range(10))
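Each worker above returns a full recording (several hours at 200 Hz), and every one of those DataFrames has to be pickled back to the parent process. When only an overview is needed, one option is to let the worker return a compact summary instead. The sketch below is a hypothetical helper, `summarize_record`, assuming the `arousals` column uses the label values from the Signals table (1 = arousal, 0 = non-arousal, -1 = unknown) and a 200 Hz sampling rate.

```python
import pandas as pd

FS = 200  # Hz, sampling rate of every channel


def summarize_record(df: pd.DataFrame, fs: int = FS) -> dict:
    """Compact per-record summary based on the 'arousals' label column.

    Assumed label values: 1 = arousal, 0 = non-arousal, -1 = unknown.
    """
    labels = df['arousals'].to_numpy()
    scored = labels != -1                    # samples that carry a usable label
    n_scored = int(scored.sum())
    return {
        'n_samples': len(labels),
        'hours': len(labels) / fs / 3600,
        'frac_unscored': float((~scored).mean()) if len(labels) else 0.0,
        # share of arousal samples among the scored samples only
        'arousal_rate': float((labels[scored] == 1).mean()) if n_scored else float('nan'),
    }


toy = pd.DataFrame({'arousals': [-1, -1, 0, 0, 0, 1, 1, 0]})
print(summarize_record(toy))
```

To use it, the worker's final `return this_data` would become `return summarize_record(this_data)`, so that only a small dict crosses the process boundary.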


In [None]:
resultList

In [None]:
# Load files one at a time (sequential alternative to the pool above)
features_training = [None] * len(record_names_training)
#for i in tqdm(range(len(record_names_training))):
for i in range(10):
    
    record_name = '../input/training/' + record_names_training[0][i] + record_names_training[0][i].strip('/')
    header_file = record_name + '.hea'
    signal_file = record_name + '.mat'
    arousal_file = record_name + '-arousal.mat'
    
    # Get the signal names from the header file
    signal_names, Fs, n_samples = phyc.import_signal_names(header_file)
    signal_names = list(np.append(signal_names, 'arousals'))
    
    # Convert this subject's data into a pandas dataframe and keep it;
    # after the loop, this_data holds the last record loaded (used in the cells below)
    this_data = phyc.get_subject_data(arousal_file, signal_file, signal_names)
    features_training[i] = this_data

    

In [None]:
training = [''] * len(record_names_training)

In [None]:
this_data.head()

In [None]:
this_data.dtypes
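If the dtypes output shows the signal columns as `float64`, a whole night of 200 Hz data takes a lot of memory per record. One common option, sketched below as a hypothetical helper `downcast_record`, is to store the signals as `float32` and the labels as `int8`. This assumes the `arousals` column contains only -1, 0 and 1 with no missing values; whether `float32` precision is sufficient for a given feature pipeline is worth checking.

```python
import numpy as np
import pandas as pd


def downcast_record(df: pd.DataFrame, label_col: str = 'arousals') -> pd.DataFrame:
    """Return a copy with float64 signal columns as float32 and the label column as int8."""
    float_cols = df.select_dtypes(include='float64').columns.drop(label_col, errors='ignore')
    dtypes = {col: np.float32 for col in float_cols}
    if label_col in df.columns:
        dtypes[label_col] = np.int8   # labels are assumed to be -1, 0 or 1 (no NaN)
    return df.astype(dtypes)


toy = pd.DataFrame({'C3-M2': np.linspace(0.0, 1.0, 1000), 'arousals': np.zeros(1000)})
small = downcast_record(toy)
print(small.dtypes)
print(toy.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
```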

In [None]:
this_data.arousals.describe()

In [None]:
this_data['arousals'].plot.hist(bins=8, figsize=(20, 10), edgecolor='white', range=[-2, 2])

In [None]:
this_data.arousals.value_counts()/len(this_data)
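`value_counts` gives sample-level proportions, but arousals are events that span many consecutive samples. A minimal sketch for grouping the labels into contiguous runs and converting them to seconds follows; `label_runs` is a hypothetical helper that assumes a 200 Hz sampling rate by default and reports each run as `(start_s, end_s)` with an exclusive end.

```python
import numpy as np


def label_runs(labels, value=1, fs=200):
    """Return (start_s, end_s) pairs for contiguous runs where labels == value.

    end_s is exclusive: it is the time of the first sample after the run.
    """
    mask = np.asarray(labels) == value
    # Pad with False on both sides so runs touching either edge are detected.
    edges = np.diff(np.concatenate(([False], mask, [False])).astype(np.int8))
    starts = np.flatnonzero(edges == 1)    # False -> True transitions
    ends = np.flatnonzero(edges == -1)     # True -> False transitions
    return [(float(s) / fs, float(e) / fs) for s, e in zip(starts, ends)]


labels = np.array([0, 1, 1, 0, -1, 1])
print(label_runs(labels, fs=1))   # [(1.0, 3.0), (5.0, 6.0)]
```

Applied to a record, `label_runs(this_data['arousals'].to_numpy())` would list the arousal events, and `end_s - start_s` gives each event's duration in seconds.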