# sleep
The following is a diagram of the tree structure of the repository.
```
.
├── data
│   └── README.md
├── logs
│   ├── 1617040190.9484377
│   ├── 1617040246.9613357
│   ├── 1617040355.7140718
│   ├── 1617040428.4261482
│   └── 1617040538.6125515
├── makefile
├── model
│   ├── best_model.h5
│   ├── best_transformer.h5
│   ├── saved_model.pb
│   └── variables
├── notebooks
│   ├── data.ipynb
│   ├── main.ipynb
│   ├── mlp_from_split.ipynb
│   ├── presentation.ipynb
│   ├── timeseries.ipynb
│   ├── train.ipynb
│   └── transformer.ipynb
├── README.md
├── requirements.txt
├── scripts
│   ├── train_backup.py
│   ├── train.py
│   ├── transformer.py
│   └── unzipAndRenameData.py
└── utils
    ├── style.txt
    └── utils.py
```
# Obtain and Preprocess Data
## Obtain Data
The raw data is stored as a tarred and gzipped file on Google Drive. Run the following command in the top level of the tree structure to navigate to the file, then download it by hand:
```
make downloaddata
```
Make sure to place the file "RawSleepData.tar.gz" in ./data. The archive contains the following 24 data files:
```
 10secScoredDataPowerControl.xls
 10secScoredDataPowerSleepDeprivation.xls
'2020 Feb A Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb B Control 1 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb B Control 2 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb B Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb C Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb C Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb D Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb D Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb F Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb F Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun A Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun A Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun B Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun B Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun C Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun C Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun D Control Not Scored 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun D Sleep Dep Not Scored 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun E Control Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
'2020 Jun E Sleep Dep Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
'2020 Jun F Control Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
'2020 Jun F Sleep Dep Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
```
We want to unzip the package, extract the tar archive, and rename these raw data files. To do this, execute the following command:
```
make preprocessdata
```
This command will give the following directory structure in ./data:
```
.
├── mapping
├── raw
├── RawSleepData.tar
├── README.md
└── renamed
```
./data/raw contains the raw files (not renamed). ./data/renamed contains the raw files, renamed to the following naming structure:
```
0.xls
1.xls
.
.
.
23.xls
```
where the mapping between names is given in the file ./data/mapping.
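The extract-and-rename step can be pictured with a short sketch. It is not the repository's scripts/unzipAndRenameData.py, whose contents are not shown here. The function names, the sorted ordering, and the one-name-per-line mapping format are assumptions made for illustration; the last one matches how data/mapping is read in the "Remap names" cells of this README.

```python
import os
import shutil
import tarfile


def extract_archive(archive_path, raw_dir):
    """Extract a .tar or .tar.gz archive into raw_dir."""
    os.makedirs(raw_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(raw_dir)


def rename_with_mapping(raw_dir, renamed_dir, mapping_path):
    """Copy raw files to 0.xls, 1.xls, ... and record the original names."""
    os.makedirs(renamed_dir, exist_ok=True)
    # Assumption: files are numbered in sorted order of their original names.
    names = sorted(f for f in os.listdir(raw_dir) if f.endswith(".xls"))
    with open(mapping_path, "w") as mapping:
        for index, name in enumerate(names):
            shutil.copy(os.path.join(raw_dir, name),
                        os.path.join(renamed_dir, str(index) + ".xls"))
            # Line number `index` of the mapping file holds the original name.
            mapping.write(name + "\n")
    return names
```

Keeping the original name on line N of the mapping file means that N.xls can always be traced back to its descriptive name.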
To convert a file from .xls to .csv, run:
```
ssconvert input.xls output.csv
```


# Download Models
We want to download the trained machine learning models stored in Google Drive. Run the following command:
```
make downloadmodels
```
Make sure to save "best_model.h5", "best_transformer.h5", "saved_model.pb", and "rf_model" in the new "model" directory. 

# Download And Rename NeuroScore ZDB files
Download the ZDB files that correspond to the xls files from Dropbox.
Then run the following command to unzip and rename the ZDB files:
```
make renameZDB
```

This will create data/rawZDB, which contains the raw ZDB files, and data/renamedZDB, which contains the ZDB files renamed with the same mapping as the xls data files.


# Initial Preprocessing
We want to preprocess each file in ./data/renamed.

In [1]:
import os
from utils.utils import preprocess

## Preprocess
i = 0
for file in os.listdir("data/renamed"):
    print("Iteration " + str(i))
    preprocess(file)
    i += 1



Iteration 0


  df = pd.read_csv(file) # load csv file into pandas dataframe


Iteration 1


  df = pd.read_csv(file) # load csv file into pandas dataframe
mkdir: cannot create directory ‘data/preprocessed’: File exists


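# Anomalies
For some reason, certain files (21) contain anomalies. The cell below flags files whose first column is not "Class" as not scored, and renames the long header of the first EEG column to "0-0.5" where needed.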

In [2]:
import pandas as pd
unscored_list = []
for file in os.listdir("data/preprocessed"):
    df = pd.read_csv("data/preprocessed/"+file)
    print("======================================"+file)
    if(df.columns[0]!="Class"):
        print("NOT SCORED")
        unscored_list.append("data/preprocessed/"+file)
        continue
    if(df.columns[1]!="0-0.5"):
        print("ANOMALY")
        df.rename(columns={"EEG 1 (0-0.5 Hz, 0-0.5Hz , 10s) (Mean, 10s)":"0-0.5"},inplace=True)
    EEG_2 = df["EEG 2"]
    Activity = df["Activity"]
    X = df.iloc[:,:-2]
    X.insert(X.shape[1],"EEG 2",EEG_2)
    X.insert(X.shape[1],"Activity",Activity)
    X.to_csv("data/preprocessed/"+file,index=False)


NOT SCORED
NOT SCORED
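The two header checks in the cell above can be factored into a small helper. This is only a sketch: the name `classify_columns`, its return labels, and the example column "0.5-1" are introduced here for illustration and are not part of utils/utils.py.

```python
def classify_columns(columns):
    """Classify a preprocessed file by its header row.

    Returns "NOT SCORED" when the first column is not "Class",
    "ANOMALY" when the second column is not "0-0.5", and "OK" otherwise.
    """
    columns = list(columns)
    if not columns or columns[0] != "Class":
        return "NOT SCORED"
    if len(columns) < 2 or columns[1] != "0-0.5":
        return "ANOMALY"
    return "OK"


print(classify_columns(["Class", "0-0.5", "0.5-1"]))  # OK
print(classify_columns(["0-0.5", "0.5-1"]))           # NOT SCORED
```

It could be called as `classify_columns(df.columns)` on a loaded frame.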


In [3]:
l = unscored_list[0]
l[18:]

'1_preprocessed.csv'

# Window

In [4]:
from tqdm import tqdm
def window_data(target_filename):
    df = pd.read_csv(target_filename)
    # Flatten every run of 5 consecutive rows into a single row
    rows = []
    for i in tqdm(range(len(df)-4)):
        win = df.iloc[i:i+5]
        rows.append(win.values.flatten())
    # Build the frame once; calling pd.concat inside the loop copies all earlier rows each time
    df_win = pd.DataFrame(rows)
    os.makedirs('data/windowed', exist_ok=True)
    # Keep only the file name without its directory, e.g. "1_preprocessed"
    name = os.path.basename(target_filename).replace(".csv","")
    df_win.to_csv("data/windowed/"+name+"_windowed.csv",index=False)

In [5]:
from pandas import read_csv
for i, file in enumerate(unscored_list):
    print("Iteration: " + str(i))
    print(file)
    window_data(file)

Iteration: 0
data/preprocessed/1_preprocessed.csv


100%|██████████| 22436/22436 [05:12<00:00, 71.84it/s] 


Iteration: 1
data/preprocessed/0_preprocessed.csv


100%|██████████| 22436/22436 [04:50<00:00, 77.12it/s] 
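The progress bars above show roughly five minutes per file. A vectorized sketch of the same flattening is given below, assuming the frame holds only numeric columns. `window_array` is a name introduced here, not a function from the repository.

```python
import numpy as np


def window_array(values, size=5):
    """Flatten every run of `size` consecutive rows into one row."""
    values = np.asarray(values)
    n_windows = len(values) - size + 1
    if n_windows <= 0:
        return np.empty((0, size * values.shape[1]))
    # Column block k holds rows k, k+1, ..., so row i of the result is
    # rows i .. i+size-1 of the input laid end to end.
    return np.hstack([values[k:k + n_windows] for k in range(size)])


demo = np.arange(12).reshape(6, 2)   # 6 rows, 2 columns
print(window_array(demo).shape)      # (2, 10)
```

With a DataFrame, the call would be `pd.DataFrame(window_array(df.values))`, which gives the same integer column labels as the loop.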


# Scale

In [6]:
from sklearn.preprocessing import MinMaxScaler
os.makedirs('data/windowed_scaled', exist_ok=True)
for i, file in enumerate(os.listdir("data/windowed")):
    X = read_csv("data/windowed/"+file)
    # Scale each column of this file to the range [0, 1]
    scaler = MinMaxScaler()
    print("Iteration: " + str(i))
    X = scaler.fit_transform(X)
    file = file.replace("_preprocessed_windowed.csv", "")
    filename = "data/windowed_scaled/"+file+"_windowed_scaled.csv"
    pd.DataFrame(X).to_csv(filename, index=False)
    print(filename)


Iteration: 0
data/windowed_scaled/0_windowed_scaled.csv
Iteration: 1
data/windowed_scaled/1_windowed_scaled.csv
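With its default settings, MinMaxScaler rescales each column to the range [0, 1] using that column's own minimum and maximum. The sketch below reproduces the arithmetic with NumPy to make it explicit; `min_max_scale` is a hypothetical helper, not part of the repository. Because the cell above fits a new scaler on every file, the scaled values depend only on that file's contents.

```python
import numpy as np


def min_max_scale(X):
    """Scale each column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    span[span == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - col_min) / span


print(min_max_scale([[1, 10], [2, 10], [3, 10]]))
```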


# Score ANN

In [7]:
import numpy as np
from keras.models import load_model

def score_data_ann(target_filename):
    X = np.array(read_csv("data/windowed_scaled/"+target_filename))
    model = load_model('model/best_model.h5')
    # The predicted class is the index of the largest model output in each row
    y = np.argmax(model.predict(X),axis=1)
    os.makedirs('data/predictions_ann', exist_ok=True)
    target_filename = target_filename.replace("_windowed_scaled.csv", "")
    filename = "data/predictions_ann/"+target_filename+".csv"
    pd.DataFrame(y).to_csv(filename, index=False)
    print(filename)

In [8]:
for file in os.listdir("data/windowed_scaled"):
    score_data_ann(file)


data/predictions_ann/0.csv
data/predictions_ann/1.csv
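Judging by the argmax over axis 1, the model returns one row of class scores per window, and the predicted class is the index of the largest score in that row. A tiny example with made-up scores:

```python
import numpy as np

# Made-up scores for three windows and three classes (not real model output).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5]])
labels = np.argmax(scores, axis=1)
print(labels)  # [1 0 2]
```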


# Score RF

In [9]:
from pandas import read_csv
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
def score_data_rf(target_filename):
    import joblib
    rf_model = joblib.load("model/rf_model")
    X = read_csv("data/windowed/"+target_filename)
    y = rf_model.predict(X)
    dct = {0: 0, 1: 0, 2: 0}
    for i in y:
        dct[i] += 1
    print(dct)
    if ( not os.path.isdir('data/predictions_rf')):
        os.system('mkdir data/predictions_rf')
    target_filename = target_filename.replace("_preprocessed_windowed.csv", "")
    # filename = "data/predictions/"+target_filename+"_scored_rf.csv"
    filename = "data/predictions_rf/"+target_filename+".csv"
    pd.DataFrame(y).to_csv(filename, index=False)
    print(filename)
    
    # This might be used later, for now use other chunk of code on unscaled data
    # Downloaded manually
    # X = read_csv("data/windowed_scaled/"+target_filename)
    # y = rf_model.predict(X)
    # dct = {0: 0, 1: 0, 2: 0}
    # for i in y:
    #     dct[i] += 1
    # print(dct)
    # if ( not os.path.isdir('data/predictions')):
    #     os.system('mkdir data/predictions')
    # target_filename = target_filename.replace("_windowed_scaled.csv", "")
    # filename = "data/predictions/"+target_filename+"_scored_rf.csv"
    # pd.DataFrame(y).to_csv(filename, index=False)
    # print(filename)

In [10]:
for file in os.listdir("data/windowed"):
    score_data_rf(file)

{0: 621, 1: 17740, 2: 4075}
data/predictions_rf/0.csv
{0: 31, 1: 22010, 2: 395}
data/predictions_rf/1.csv
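The per-class counts printed above can also be computed without a hand-written loop. The sketch below uses `collections.Counter` on made-up labels. Unlike the `dct` loop, it does not raise a KeyError if a label outside 0, 1, 2 appears.

```python
from collections import Counter

y = [1, 1, 2, 0, 1]                      # made-up predictions
counts = Counter({0: 0, 1: 0, 2: 0})     # start every class at zero
counts.update(y)
print(dict(counts))  # {0: 1, 1: 3, 2: 1}
```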


# Expand Predictions

In [11]:
from tqdm import tqdm
import pandas as pd
import numpy as np
def expand_predictions_ann(file):
    df = pd.read_csv("data/predictions_ann/"+file)
    Y = np.array(df)
    Y = Y.reshape(Y.shape[0],)
    # Each prediction covers a window of 5 rows, so there are 4 more rows than predictions
    lo_limit = len(Y)-1
    hi_limit = len(Y)+3
    Y_new = []
    for i in tqdm(range(len(Y)+4)):
        if(i==0):
            # First row: copy the first prediction
            Y_new.append(Y[0])
        elif(i<5):
            # Leading rows: majority vote over the first i predictions
            Y_new.append(np.argmax(np.bincount(Y[0:i])))
        elif(i>lo_limit and i!=hi_limit):
            # Trailing rows: majority vote over a shrinking slice just before the last prediction
            Y_new.append(np.argmax(np.bincount(Y[lo_limit-(4-(i-lo_limit)):lo_limit])))
        elif(i==hi_limit):
            # Last row: copy the last prediction
            Y_new.append(Y[lo_limit])
        else:
            # Interior rows: majority vote over the 4 preceding predictions
            Y_new.append(np.argmax(np.bincount(Y[i-4:i])))

    if ( not os.path.isdir('data/expanded_predictions_ann')):
        os.system('mkdir data/expanded_predictions_ann')
        # Sanity check, printed only when the directory is first created
        print(Y.shape[0], len(Y_new))
    pd.DataFrame(Y_new).to_csv("data/expanded_predictions_ann/"+file,index=False)


In [12]:
for file in os.listdir("data/predictions_ann"):
    expand_predictions_ann(file)

22440it [00:00, 276554.83it/s]


22436 22440


22440it [00:00, 296062.00it/s]


In [13]:
def expand_predictions_rf(file):
    df = pd.read_csv("data/predictions_rf/"+file)
    Y = np.array(df)
    Y = Y.reshape(Y.shape[0],)
    # Each prediction covers a window of 5 rows, so there are 4 more rows than predictions
    lo_limit = len(Y)-1
    hi_limit = len(Y)+3
    Y_new = []
    for i in tqdm(range(len(Y)+4)):
        if(i==0):
            # First row: copy the first prediction
            Y_new.append(Y[0])
        elif(i<5):
            # Leading rows: majority vote over the first i predictions
            Y_new.append(np.argmax(np.bincount(Y[0:i])))
        elif(i>lo_limit and i!=hi_limit):
            # Trailing rows: majority vote over a shrinking slice just before the last prediction
            Y_new.append(np.argmax(np.bincount(Y[lo_limit-(4-(i-lo_limit)):lo_limit])))
        elif(i==hi_limit):
            # Last row: copy the last prediction
            Y_new.append(Y[lo_limit])
        else:
            # Interior rows: majority vote over the 4 preceding predictions
            Y_new.append(np.argmax(np.bincount(Y[i-4:i])))

    if ( not os.path.isdir('data/expanded_predictions_rf')):
        os.system('mkdir data/expanded_predictions_rf')
    pd.DataFrame(Y_new).to_csv("data/expanded_predictions_rf/"+file,index=False)


In [14]:
for file in os.listdir("data/predictions_rf"):
    expand_predictions_rf(file)

22440it [00:00, 207313.63it/s]
22440it [00:00, 233104.03it/s]
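Each prediction labels a window of 5 consecutive rows, so row r is covered by the windows that start at rows r-4 through r. The sketch below assigns every row the most frequent label among the windows that cover it. It is an alternative formulation, not the code used above (which votes over `Y[i-4:i]` for interior rows), and `expand_by_vote` is a hypothetical name. It assumes a non-empty array of non-negative integer labels.

```python
import numpy as np


def expand_by_vote(Y, size=5):
    """Map len(Y) window labels to len(Y) + size - 1 row labels."""
    Y = np.asarray(Y, dtype=int)
    n_rows = len(Y) + size - 1
    out = []
    for r in range(n_rows):
        first = max(0, r - size + 1)       # earliest window containing row r
        last = min(len(Y) - 1, r)          # latest window containing row r
        votes = np.bincount(Y[first:last + 1])
        out.append(int(np.argmax(votes)))  # ties go to the lowest class index
    return out


print(expand_by_vote([0, 0, 1, 1, 1]))  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```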


In [15]:
if ( not os.path.isdir('data/expanded_combined_rf')):
        os.system('mkdir data/expanded_combined_rf')

if ( not os.path.isdir('data/expanded_combined_ann')):
        os.system('mkdir data/expanded_combined_ann')

In [16]:
# Combine pairs of prediction files whose names share characters 7-10 but differ
# from character 12 onward. These slices assume long file names: for short names
# such as "0.csv" both slices are empty, so no combined file is written.
for i in os.listdir("data/expanded_predictions_rf"):
    df = pd.read_csv("data/expanded_predictions_rf/" + i)
    for j in os.listdir("data/expanded_predictions_rf"):
        if i[7:11] == j[7:11] and i[12:] != j[12:]:
            df1 = pd.read_csv("data/expanded_predictions_rf/" + j)
            df2 = pd.concat([df, df1])
            df2.to_csv("data/expanded_combined_rf/" + i[0:11] + "_combined.csv")

for i in os.listdir("data/expanded_predictions_ann"):
    df = pd.read_csv("data/expanded_predictions_ann/" + i)
    for j in os.listdir("data/expanded_predictions_ann"):
        if i[7:11] == j[7:11] and i[12:] != j[12:]:
            df1 = pd.read_csv("data/expanded_predictions_ann/" + j)
            df2 = pd.concat([df, df1])
            df2.to_csv("data/expanded_combined_ann/" + i[0:11] + "_combined.csv")



In [17]:
# i = 0
# import pandas as pd
# if ( not os.path.isdir('data/expanded_predictions_rf_renamed')):
#                 os.system('mkdir data/expanded_predictions_rf_renamed')

# if ( not os.path.isdir('data/expanded_predictions_ann_renamed')):
#                 os.system('mkdir data/expanded_predictions_ann_renamed')

# with open("data/mapping") as f:
#     df = pd.read_csv("expanded_predictions_ann/"+str(i)+"_scored_ann.csv")
#     for line in f:
#         print(len(line))
#         df.to_csv("data/expanded_predictions_ann_renamed/"+line.replace(".xls",".csv"),index=False)

#     df1 = pd.read_csv("expanded_predictions_rf/"+str(i)+"_scored_rf.csv")
#     for line in f:
#         print(len(line))
#         df.to_csv("data/expanded_predictions_rf_renamed/"+line.replace(".xls",".csv"),index=False)
#     i += 1


In [18]:
if ( not os.path.isdir('data/expanded_renamed_rf')):
        os.system('mkdir data/expanded_renamed_rf')

if ( not os.path.isdir('data/expanded_renamed_ann')):
        os.system('mkdir data/expanded_renamed_ann')

if ( not os.path.isdir('data/expanded_combined_renamed_rf')):
        os.system('mkdir data/expanded_combined_renamed_rf')

if ( not os.path.isdir('data/expanded_combined_renamed_ann')):
        os.system('mkdir data/expanded_combined_renamed_ann')

In [19]:
rename_dict = {0: 'P', 1: 'S', 2: 'W'}
rename_dict

{0: 'P', 1: 'S', 2: 'W'}

In [20]:
from pandas import read_csv
import pandas as pd

In [21]:
for file in os.listdir('data/expanded_predictions_ann'):
    df = read_csv('data/expanded_predictions_ann/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_renamed_ann/"+file,index=False)

In [22]:
for file in os.listdir('data/expanded_predictions_rf'):
    df = read_csv('data/expanded_predictions_rf/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_renamed_rf/"+file,index=False)

In [23]:
for file in os.listdir('data/expanded_combined_rf'):
    df = read_csv('data/expanded_combined_rf/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_combined_renamed_rf/"+file,index=False)

In [24]:
for file in os.listdir('data/expanded_combined_ann'):
    df = read_csv('data/expanded_combined_ann/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_combined_renamed_ann/"+file,index=False)
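The four cells above apply the same lookup. With pandas, the lookup can be written as a single `Series.map` call. The sketch below uses made-up labels and repeats `rename_dict` so that it runs on its own. One difference to keep in mind: `map` yields NaN for a label missing from the dictionary, while the loops above raise a KeyError.

```python
import pandas as pd

rename_dict = {0: 'P', 1: 'S', 2: 'W'}
y = pd.Series([2, 2, 1, 0])          # made-up numeric predictions
new_y = y.map(rename_dict)
print(new_y.tolist())  # ['W', 'W', 'S', 'P']
```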

# Remap names

In [25]:
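import os
import shutil
mapping = open('data/mapping').read().splitlines()
os.makedirs('data/final_ann', exist_ok=True)
for file in os.listdir('data/expanded_renamed_ann'):
    # The file name (e.g. "3.csv") is the line index of the original name in data/mapping
    index_str = file.replace('.csv', '')
    newName = mapping[int(index_str)].replace('.xls', '-ann.csv')
    # shutil.copy avoids shell quoting problems with spaces in the original names
    shutil.copy('data/expanded_renamed_ann/'+file, 'data/final_ann/'+newName)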

In [26]:
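import shutil
mapping = open('data/mapping').read().splitlines()
os.makedirs('data/final_rf', exist_ok=True)
for file in os.listdir('data/expanded_renamed_rf'):
    # The file name (e.g. "3.csv") is the line index of the original name in data/mapping
    index_str = file.replace('.csv', '')
    newName = mapping[int(index_str)].replace('.xls', '-rf.csv')
    # shutil.copy avoids shell quoting problems with spaces in the original names
    shutil.copy('data/expanded_renamed_rf/'+file, 'data/final_rf/'+newName)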
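The name lookup used in both cells can be isolated in a small function, which makes it easy to check without touching the file system. `final_name` and the mapping contents below are hypothetical and serve only as an illustration.

```python
def final_name(index_file, mapping, suffix):
    """Return the descriptive output name for a numbered prediction file."""
    index = int(index_file.replace('.csv', ''))
    return mapping[index].replace('.xls', suffix)


# Hypothetical mapping contents, one original name per line index.
mapping = ['first recording.xls', 'second recording.xls']
print(final_name('1.csv', mapping, '-rf.csv'))  # second recording-rf.csv
```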

# Import CSV Predictions into NeuroScore ZDB file



In [None]:
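import os
import sqlite3
from sqlite3 import Error

def ZDBConversion(csv, zdb):
    # Open the ZDB file as an SQLite database
    try:
        conn = sqlite3.connect(zdb)
    except Error as e:
        print(e)
        return
    # Importing the scores from csv into the database is not implemented yet
    conn.close()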


In [None]:
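import os
import shutil

os.makedirs('data/finalZDB_ann', exist_ok=True)
for csv in os.listdir('data/expanded_renamed_ann'):
    # The csv name (e.g. "3.csv") gives the index of the matching ZDB file
    index_str = csv.replace('.csv', '')
    zdb = 'data/finalZDB_ann/'+index_str+'.zdb'
    # Work on a copy so the files in data/renamedZDB stay untouched
    shutil.copy('data/renamedZDB/'+index_str+'.zdb', zdb)
    ZDBConversion('data/expanded_renamed_ann/'+csv, zdb)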
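ZDBConversion opens the ZDB file with sqlite3, which suggests the file can be read as an SQLite database. The schema NeuroScore uses is not described here, so a cautious first step is to list the tables before writing anything. The sketch below does only that; `list_tables` is a hypothetical helper.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables('data/renamedZDB/0.zdb')` on a copy of a real file would show which tables exist before any import logic is written.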