# sleep
The following is a diagram of the tree structure of the repository.
```
.
├── data
│   └── README.md
├── logs
│   ├── 1617040190.9484377
│   ├── 1617040246.9613357
│   ├── 1617040355.7140718
│   ├── 1617040428.4261482
│   └── 1617040538.6125515
├── makefile
├── model
│   ├── best_model.h5
│   ├── best_transformer.h5
│   ├── saved_model.pb
│   └── variables
├── notebooks
│   ├── data.ipynb
│   ├── main.ipynb
│   ├── mlp_from_split.ipynb
│   ├── presentation.ipynb
│   ├── timeseries.ipynb
│   ├── train.ipynb
│   └── transformer.ipynb
├── README.md
├── requirements.txt
├── scripts
│   ├── train_backup.py
│   ├── train.py
│   ├── transformer.py
│   └── unzipAndRenameData.py
└── utils
    ├── style.txt
    └── utils.py
```
# Obtain and Preprocess Data
## Obtain Data
The raw data is stored on Google Drive as a gzipped tar archive. To navigate to it and download it by hand, run the following command from the top level of the repository:
```
make downloaddata
```
Make sure to place the file "RawSleepData.tar.gz" in ./data. The archive contains 24 data files with the following names:
```
 10secScoredDataPowerControl.xls
 10secScoredDataPowerSleepDeprivation.xls
'2020 Feb A Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb B Control 1 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb B Control 2 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb B Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb C Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb C Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb D Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb D Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb F Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Feb F Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun A Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun A Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun B Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun B Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun C Control 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun C Sleep Dep 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun D Control Not Scored 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun D Sleep Dep Not Scored 10 sec Scored Data Power Activity and EMG.xls'
'2020 Jun E Control Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
'2020 Jun E Sleep Dep Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
'2020 Jun F Control Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
'2020 Jun F Sleep Dep Not Scored 10 sec Scored Data Power Activity and EMG No score.xls'
```
We want to decompress the archive, extract it, and rename the raw data files. To do this, run the following command:
```
make renameZIP
```
This command will give the following directory structure in ./data:
```
.
├── mapping
├── raw
├── RawSleepData.tar
├── README.md
└── renamed
```
./data/raw contains the raw files (not renamed). ./data/renamed contains the raw files, renamed to the following naming structure:
```
0.xls
1.xls
.
.
.
23.xls
```
where the mapping between names is given in the file ./data/mapping.
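
As a minimal sketch of how the mapping can be used, the snippet below looks up the original name for a renamed file. It assumes that ./data/mapping holds one original file name per line and that line `i` belongs to `i.xls`, which is how the notebook cells further down index it. `original_name` is a hypothetical helper, and the two sample lines are only illustrative; the real order is whatever ./data/mapping contains.

```python
def original_name(index, mapping_lines, suffix=".csv"):
    """Return the original file name for renamed file `index`, with a new suffix."""
    return mapping_lines[index].replace(".xls", suffix)


# Illustrative stand-in for: open("data/mapping").read().splitlines()
mapping_lines = [
    "10secScoredDataPowerControl.xls",
    "10secScoredDataPowerSleepDeprivation.xls",
]

name = original_name(1, mapping_lines, suffix="-ann.csv")
print(name)
```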
To convert a file from .xls to .csv:
```
ssconvert input.xls output.csv
```
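
To convert every renamed file rather than one at a time, the `ssconvert` calls can be generated from the file names. The sketch below only builds the command lines; `build_ssconvert_commands` is a hypothetical helper and the output directory `data/csv` is an assumption, not a path the repository defines.

```python
import os


def build_ssconvert_commands(names, src_dir, dst_dir):
    """Build one `ssconvert input.xls output.csv` command per .xls file name."""
    commands = []
    for name in sorted(names):
        if not name.endswith(".xls"):
            continue  # skip anything that is not an .xls file
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name[: -len(".xls")] + ".csv")
        commands.append(["ssconvert", src, dst])
    return commands


commands = build_ssconvert_commands(
    ["1.xls", "0.xls", "README.md"], "data/renamed", "data/csv"
)
for cmd in commands:
    print(" ".join(cmd))
```

Each command can then be run with `subprocess.run(cmd, check=True)`, provided `ssconvert` is installed and the output directory exists.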


# Download Models
We want to download the trained machine learning models stored in Google Drive. Run the following command:
```
make downloadmodels
```
Make sure to save "best_model.h5", "best_transformer.h5", "saved_model.pb", and "rf_model" in the new "model" directory. 

# Download And Rename NeuroScore ZDB files
Download the ZDB files that correspond to the xls files from Dropbox.

NOTE: Each ZDB file must have been scored at least once in NeuroScore.

Run the following command to unzip and rename the ZDB files:
```
make renameZDB
```

This will create data/rawZDB, which holds the raw ZDB files, and data/renamedZDB, which holds the ZDB files renamed following the same mapping as the xls data files.


# Initial Preprocessing
We want to preprocess each file in ./data/renamed.

In [None]:
import os
from utils.utils import preprocess

## Preprocess
i = 0
for file in os.listdir("data/renamed"):
    print("Iteration " + str(i))
    preprocess(file)
    i += 1

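# Anomalies
For some reason, certain files (21) contain anomalies. The cell below marks a file as unscored when its first column is not "Class", and renames the second column to "0-0.5" when it carries the longer EEG header.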

In [None]:
import pandas as pd
unscored_list = []
for file in os.listdir("data/preprocessed"):
    df = pd.read_csv("data/preprocessed/"+file)
    print("======================================"+file)
    if(df.columns[0]!="Class"):
        print("NOT SCORED")
        unscored_list.append("data/preprocessed/"+file)
        continue
    if(df.columns[1]!="0-0.5"):
        print("ANOMALY")
        df.rename(columns={"EEG 1 (0-0.5 Hz, 0-0.5Hz , 10s) (Mean, 10s)":"0-0.5"},inplace=True)
    EEG_2 = df["EEG 2"]
    Activity = df["Activity"]
    X = df.iloc[:,:-2]
    X.insert(X.shape[1],"EEG 2",EEG_2)
    X.insert(X.shape[1],"Activity",Activity)
    X.to_csv("data/preprocessed/"+file,index=False)
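
The header checks in the cell above can be read as a three-way classification. The following is a minimal sketch of that logic on a plain list of column names. `classify_header` is a hypothetical helper, not a function in `utils`, and the sample headers are illustrative.

```python
def classify_header(columns):
    """Classify a preprocessed file by its first two column names."""
    if columns[0] != "Class":
        return "unscored"  # no score column, so the file still needs predictions
    if columns[1] != "0-0.5":
        return "anomaly"  # scored, but the first band column has a long header
    return "ok"


long_header = "EEG 1 (0-0.5 Hz, 0-0.5Hz , 10s) (Mean, 10s)"
print(classify_header(["Class", "0-0.5", "EEG 2", "Activity"]))
print(classify_header(["Class", long_header, "EEG 2", "Activity"]))
print(classify_header(["0-0.5", "EEG 2", "Activity"]))
```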


In [None]:
l = unscored_list[0]
l[18:]

# Window

In [None]:
from tqdm import tqdm
def window_data(target_filename):
    df = pd.read_csv(target_filename)
    os.makedirs('data/windowed', exist_ok=True)
    # Strip the ".csv" extension and the "data/preprocessed/" prefix (18 characters)
    new_target_filename = target_filename.replace(".csv",'')
    new_target_filename = "data/windowed/"+new_target_filename[18:]+"_windowed.csv"

    # Slide a 5-row window over the data; each window becomes one flattened row
    for i in tqdm(range(len(df)-4)):
        win = df.iloc[i:i+5]
        x = win.values.flatten()
        X = pd.DataFrame(x).T
        if i==0:
            # Overwrite output from an earlier run and write the header once
            X.to_csv(new_target_filename, mode='w', index=False)
        else:
            X.to_csv(new_target_filename, mode='a', index=False, header=False)
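
To make the windowing concrete, here is a small dependency-free sketch of the same idea on plain lists: every run of 5 consecutive rows is flattened into a single row, so `n` input rows yield `n - 4` windows. `window_rows` is a hypothetical helper used for illustration only, and the toy rows are made up.

```python
def window_rows(rows, window_size=5):
    """Flatten each run of `window_size` consecutive rows into one row."""
    windows = []
    for start in range(len(rows) - window_size + 1):
        flat = []
        for row in rows[start:start + window_size]:
            flat.extend(row)  # rows are concatenated in order, oldest first
        windows.append(flat)
    return windows


# Six toy rows with two features each give 6 - 4 = 2 windows of 10 values
rows = [[i, i * 10] for i in range(6)]
windows = window_rows(rows)
for w in windows:
    print(w)
```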

In [None]:
from pandas import read_csv
i = 0
for file in unscored_list:
    print("Iteration: " + str(i))
    print(file)
    # X = read_csv(i)
    window_data(file)
    i += 1

# Scale

In [None]:
from sklearn.preprocessing import MinMaxScaler

os.makedirs('data/windowed_scaled', exist_ok=True)
for i, file in enumerate(os.listdir("data/windowed")):
    filename = "data/windowed/"+file
    X = read_csv(filename)

    # Fit a fresh scaler per file, so each file is scaled to its own [0, 1] range
    scaler = MinMaxScaler()
    print("Iteration: " + str(i))
    X = scaler.fit_transform(X)
    file = file.replace("_preprocessed_windowed.csv", "")
    filename = "data/windowed_scaled/"+file+"_windowed_scaled.csv"
    pd.DataFrame(X).to_csv(filename, index=False)
    print(filename)
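
For reference, min-max scaling maps each column to the range [0, 1] with `(value - min) / (max - min)`. The sketch below shows that arithmetic on plain lists. It is an illustration rather than a replacement for `MinMaxScaler`, and `min_max_scale_columns` is a hypothetical helper. Because the scaler above is fitted inside the loop, the minimum and maximum are taken per file.

```python
def min_max_scale_columns(rows):
    """Scale every column of a row-major table to the range [0, 1]."""
    scaled_columns = []
    for col in zip(*rows):  # iterate over columns
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # Constant column: map everything to 0.0 to avoid dividing by zero
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_columns)]  # back to rows


scaled = min_max_scale_columns([[1, 10], [2, 10], [3, 10]])
print(scaled)
```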


# Score ANN

In [None]:
def score_data_ann(target_filename):
    import numpy as np
    from keras.models import load_model

    X = np.array(read_csv("data/windowed_scaled/"+target_filename))

    model = load_model('model/best_model.h5')
    # The model returns one row of per-class scores per window; keep the top class
    y = np.argmax(model.predict(X), axis=1)
    os.makedirs('data/predictions_ann', exist_ok=True)
    target_filename = target_filename.replace("_windowed_scaled.csv", "")
    # filename = "data/predictions/"+target_filename+"_scored_ann.csv"
    filename = "data/predictions_ann/"+target_filename+".csv"
    pd.DataFrame(y).to_csv(filename, index=False)
    print(filename)

In [None]:
for file in os.listdir("data/windowed_scaled"):
    score_data_ann(file)

# Score RF

In [None]:
from pandas import read_csv
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
def score_data_rf(target_filename):
    import joblib
    rf_model = joblib.load("model/rf_model")
    X = read_csv("data/windowed/"+target_filename)
    y = rf_model.predict(X)
    dct = {0: 0, 1: 0, 2: 0}
    for i in y:
        dct[i] += 1
    print(dct)
    if ( not os.path.isdir('data/predictions_rf')):
        os.system('mkdir data/predictions_rf')
    target_filename = target_filename.replace("_preprocessed_windowed.csv", "")
    # filename = "data/predictions/"+target_filename+"_scored_rf.csv"
    filename = "data/predictions_rf/"+target_filename+".csv"
    pd.DataFrame(y).to_csv(filename, index=False)
    print(filename)
    
    # This might be used later, for now use other chunk of code on unscaled data
    # Downloaded manually
    # X = read_csv("data/windowed_scaled/"+target_filename)
    # y = rf_model.predict(X)
    # dct = {0: 0, 1: 0, 2: 0}
    # for i in y:
    #     dct[i] += 1
    # print(dct)
    # if ( not os.path.isdir('data/predictions')):
    #     os.system('mkdir data/predictions')
    # target_filename = target_filename.replace("_windowed_scaled.csv", "")
    # filename = "data/predictions/"+target_filename+"_scored_rf.csv"
    # pd.DataFrame(y).to_csv(filename, index=False)
    # print(filename)

In [None]:
for file in os.listdir("data/windowed"):
    score_data_rf(file)

# Expand Predictions

In [None]:
from tqdm import tqdm
import pandas as pd
import numpy as np
def expand_predictions_ann(file):
    df = pd.read_csv("data/predictions_ann/"+file)
    Y = np.array(df)
    Y = Y.reshape(Y.shape[0],)
    # print(len(Y))
    lo_limit = len(Y)-1
    hi_limit = len(Y)+3
    Y_new = []
    for i,x in tqdm(enumerate(range(len(Y)+4))):
        if(i==0):
            # print("i:",i)
            # print(Y[0])
            # print("Bincount:",Y[0])
            Y_new.append(Y[0])
        elif(i<5):
            # print("i:",i)
            # print(Y[0:i])
            # print("Bincount:",np.bincount(Y[0:i]))
            # print("Class:",np.argmax(np.bincount(Y[0:i])))
            Y_new.append(np.argmax(np.bincount(Y[0:i])))
        # elif(i>8635 and i!=8639):
        elif(i>lo_limit and i!=hi_limit):
            # print("i:",i)
            # print(Y[8635-(4-(i-8635)):8635])
            # print("Bincount:",np.bincount(Y[8635-(4-(i-8635)):8635]))
            # print("Class:",np.argmax(np.bincount(Y[8635-(4-(i-8635)):8635])))

            # Y_new.append(np.argmax(np.bincount(Y[8635-(4-(i-8635)):8635])))
            Y_new.append(np.argmax(np.bincount(Y[lo_limit-(4-(i-lo_limit)):lo_limit])))
        # elif(i==8639):
        elif(i==hi_limit): 
            # print("i:",i)
            # print(Y[8635])
            # print("Bincount:",Y[8365])
            # Y_new.append(Y[8365])
            Y_new.append(Y[lo_limit])
        else:
            # print("i:",i)
            # print(Y[i-4:i])
            # print("Bincount:",np.bincount(Y[i-4:i]))
            # print("Class:",np.argmax(np.bincount(Y[i-4:i])))
            Y_new.append(np.argmax(np.bincount(Y[i-4:i])))

    os.makedirs('data/expanded_predictions_ann', exist_ok=True)

    # Sanity check: the expanded output has 4 more rows than the input
    print(Y.shape[0], len(Y_new))
    pd.DataFrame(Y_new).to_csv("data/expanded_predictions_ann/"+file,index=False)

    # if file[-6:] == "rf.csv":
    #     if ( not os.path.isdir('data/expanded_predictions_rf')):
    #             os.system('mkdir data/expanded_predictions_rf')
    #     pd.DataFrame(Y_new).to_csv("data/expanded_predictions_rf/"+file,index=False)


In [None]:
for file in os.listdir("data/predictions_ann"):
    expand_predictions_ann(file)

In [None]:
def expand_predictions_rf(file):
    df = pd.read_csv("data/predictions_rf/"+file)
    Y = np.array(df)
    Y = Y.reshape(Y.shape[0],)
    # print(len(Y))
    lo_limit = len(Y)-1
    hi_limit = len(Y)+3
    Y_new = []
    for i,x in tqdm(enumerate(range(len(Y)+4))):
        if(i==0):
            # print("i:",i)
            # print(Y[0])
            # print("Bincount:",Y[0])
            Y_new.append(Y[0])
        elif(i<5):
            # print("i:",i)
            # print(Y[0:i])
            # print("Bincount:",np.bincount(Y[0:i]))
            # print("Class:",np.argmax(np.bincount(Y[0:i])))
            Y_new.append(np.argmax(np.bincount(Y[0:i])))
        # elif(i>8635 and i!=8639):
        elif(i>lo_limit and i!=hi_limit):
            # print("i:",i)
            # print(Y[8635-(4-(i-8635)):8635])
            # print("Bincount:",np.bincount(Y[8635-(4-(i-8635)):8635]))
            # print("Class:",np.argmax(np.bincount(Y[8635-(4-(i-8635)):8635])))
            
            # Y_new.append(np.argmax(np.bincount(Y[8635-(4-(i-8635)):8635])))
            Y_new.append(np.argmax(np.bincount(Y[lo_limit-(4-(i-lo_limit)):lo_limit])))
        elif(i==hi_limit):
            # print("i:",i)
            # print(Y[8635])
            # print("Bincount:",Y[8365])
            Y_new.append(Y[lo_limit])
        else:
            # print("i:",i)
            # print(Y[i-4:i])
            # print("Bincount:",np.bincount(Y[i-4:i]))
            # print("Class:",np.argmax(np.bincount(Y[i-4:i])))
            Y_new.append(np.argmax(np.bincount(Y[i-4:i])))

    if ( not os.path.isdir('data/expanded_predictions_rf')):
        os.system('mkdir data/expanded_predictions_rf')
    pd.DataFrame(Y_new).to_csv("data/expanded_predictions_rf/"+file,index=False)


In [None]:
for file in os.listdir("data/predictions_rf"):
    expand_predictions_rf(file)
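
The expansion cells above turn one prediction per 5-row window back into one label per row by majority vote. A minimal, dependency-free sketch of that idea follows. It is not the notebook's exact code: `expand_window_predictions` is a hypothetical helper, it votes over every window that contains the row (the notebook's slices differ slightly at the boundaries), and ties go to the smallest label, as `np.argmax(np.bincount(...))` does.

```python
from collections import Counter


def expand_window_predictions(window_labels, window_size=5):
    """Turn one label per window into one label per original row."""
    n_windows = len(window_labels)
    n_rows = n_windows + window_size - 1
    expanded = []
    for row in range(n_rows):
        # Window w covers rows w .. w + window_size - 1
        first = max(0, row - window_size + 1)
        last = min(row, n_windows - 1)
        votes = Counter(window_labels[first:last + 1])
        # Highest count wins; ties go to the smallest label
        best = min(votes, key=lambda label: (-votes[label], label))
        expanded.append(best)
    return expanded


expanded = expand_window_predictions([2, 2, 1, 1, 1, 0])
print(expanded)
```

Six window labels expand to ten row labels, matching the 4 extra rows the cells above add back.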

In [None]:
if ( not os.path.isdir('data/expanded_combined_rf')):
        os.system('mkdir data/expanded_combined_rf')

if ( not os.path.isdir('data/expanded_combined_ann')):
        os.system('mkdir data/expanded_combined_ann')

In [None]:
for i in os.listdir("data/expanded_predictions_rf"):
    df = pd.read_csv("data/expanded_predictions_rf/" + i)
    i = i.replace("data/expanded_combined_rf/", "")
    for j in os.listdir("data/expanded_predictions_rf"):
        df1 = pd.read_csv("data/expanded_predictions_rf/" + j)
        j = j.replace("data/expanded_combined_rf/", "")
        if i[7:11] == j[7:11]:
            if i[12:] != j[12:]:
                frames = [df, df1]
                df2 = pd.concat(frames)
                df2.to_csv("data/expanded_combined_rf/" + i[0:11] + "_combined.csv")

for i in os.listdir("data/expanded_predictions_ann"):
    df = pd.read_csv("data/expanded_predictions_ann/" + i)
    i = i.replace("data/expanded_combined_ann/", "")
    for j in os.listdir("data/expanded_predictions_ann"):
        df1 = pd.read_csv("data/expanded_predictions_ann/" + j)
        j = j.replace("data/expanded_combined_ann/", "")
        if i[7:11] == j[7:11]:
            if i[12:] != j[12:]:
                frames = [df, df1]
                df2 = pd.concat(frames)
                df2.to_csv("data/expanded_combined_ann/" + i[0:11] + "_combined.csv")



In [None]:
# i = 0
# import pandas as pd
# if ( not os.path.isdir('data/expanded_predictions_rf_renamed')):
#                 os.system('mkdir data/expanded_predictions_rf_renamed')

# if ( not os.path.isdir('data/expanded_predictions_ann_renamed')):
#                 os.system('mkdir data/expanded_predictions_ann_renamed')

# with open("data/mapping") as f:
#     df = pd.read_csv("expanded_predictions_ann/"+str(i)+"_scored_ann.csv")
#     for line in f:
#         print(len(line))
#         df.to_csv("data/expanded_predictions_ann_renamed/"+line.replace(".xls",".csv"),index=False)

#     df1 = pd.read_csv("expanded_predictions_rf/"+str(i)+"_scored_rf.csv")
#     for line in f:
#         print(len(line))
#         df.to_csv("data/expanded_predictions_rf_renamed/"+line.replace(".xls",".csv"),index=False)
#     i += 1


In [None]:
if ( not os.path.isdir('data/expanded_renamed_rf')):
        os.system('mkdir data/expanded_renamed_rf')

if ( not os.path.isdir('data/expanded_renamed_ann')):
        os.system('mkdir data/expanded_renamed_ann')

if ( not os.path.isdir('data/expanded_combined_renamed_rf')):
        os.system('mkdir data/expanded_combined_renamed_rf')

if ( not os.path.isdir('data/expanded_combined_renamed_ann')):
        os.system('mkdir data/expanded_combined_renamed_ann')

In [None]:
#rename_dict = {0: 'P', 1: 'S', 2: 'W'}
rename_dict = {0: 'Sleep-Paradoxical', 1: 'Sleep-SWS', 2: 'Sleep-Wake'}
rename_dict

In [None]:
from pandas import read_csv
import pandas as pd

In [None]:
for file in os.listdir('data/expanded_predictions_ann'):
    df = read_csv('data/expanded_predictions_ann/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_renamed_ann/"+file,index=False)

In [None]:
for file in os.listdir('data/expanded_predictions_rf'):
    df = read_csv('data/expanded_predictions_rf/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_renamed_rf/"+file,index=False)

In [None]:
for file in os.listdir('data/expanded_combined_rf'):
    df = read_csv('data/expanded_combined_rf/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_combined_renamed_rf/"+file,index=False)

In [None]:
for file in os.listdir('data/expanded_combined_ann'):
    df = read_csv('data/expanded_combined_ann/' + file)
    y = df['0']
    new_y = []
    for i in y:
        new_y.append(rename_dict[i])
    pd.DataFrame(new_y).to_csv("data/expanded_combined_renamed_ann/"+file,index=False)

# Remap names

In [None]:
import os
import shutil
mapping = open('data/mapping').read().splitlines()
os.makedirs('data/final_ann', exist_ok=True)
for file in os.listdir('data/expanded_renamed_ann'):
    # The file name is the index of its original name in data/mapping
    index_str = file.replace('.csv', '')
    newName = mapping[int(index_str)].replace('.xls', '-ann.csv')
    # shutil.copy copes with spaces in names without shell quoting
    shutil.copy('data/expanded_renamed_ann/'+file, 'data/final_ann/'+newName)

In [None]:
import os
import shutil
mapping = open('data/mapping').read().splitlines()
os.makedirs('data/final_rf', exist_ok=True)
for file in os.listdir('data/expanded_renamed_rf'):
    # The file name is the index of its original name in data/mapping
    index_str = file.replace('.csv', '')
    newName = mapping[int(index_str)].replace('.xls', '-rf.csv')
    # shutil.copy copes with spaces in names without shell quoting
    shutil.copy('data/expanded_renamed_rf/'+file, 'data/final_rf/'+newName)

# Import CSV Predictions into Neuroscore ZDB file


In [None]:
import os
import pandas as pd
import sqlite3
from sqlite3 import Error

def ZDBConversion(csv, zdb):
    offset = int(10e7)  #epoch time period, kept as an integer so timestamps are not turned into floats

    df = pd.read_csv(csv)
    try:
        conn = sqlite3.connect(zdb)
    except Error as e:
        # Without a connection nothing below can run, so stop here
        print(e)
        return

    #create sqlite table from csv    
    df.to_sql('raw_csv', conn, if_exists='replace', index=False)

    #copy csv data into formatted table    
    cur = conn.cursor()
    query = """
            CREATE TABLE IF NOT EXISTS temp_csv (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                status TEXT
            );
            """
    cur.execute(query)

    query = """
            INSERT INTO temp_csv (status)
            SELECT * FROM raw_csv;
            """
    cur.execute(query)

    #drop this table - creates issues
    query = "DROP TABLE IF EXISTS temporary_scoring_marker;"
    cur.execute(query)

    #get starting point for scoring
    query = "SELECT id FROM scoring_marker WHERE type LIKE 'Sleep%';"
    cur.execute(query)
    startid = cur.fetchall()[0][0]

    #get keyid of scoring
    query = "SELECT MAX(id) FROM scoring_revision WHERE name='Machine Data'"
    cur.execute(query)
    keyid = cur.fetchall()[0][0]

    #get start time to create epochs
    query = 'SELECT starts_at FROM scoring_marker WHERE id = '+str(startid)+';'
    cur.execute(query)
    start_time = cur.fetchall()[0][0]
    stop_time = 0

    #delete first score before adding machine data
    query = "DELETE FROM scoring_marker WHERE id = " + str(startid)+";"
    cur.execute(query)


    #insert new epochs with scoring into the table
    for i in range(len(df)):
        #calculate epoch
        if i != 0:
            start_time = stop_time
        stop_time = start_time+offset

        #insert epoch
        query = f"""
                INSERT INTO scoring_marker 
                (starts_at, ends_at, notes, type, location, is_deleted, key_id)
                VALUES 
                ({start_time}, {stop_time}, '', '', '', 0, {keyid});
                """ 
        cur.execute(query)
        
        #get current id by selecting max id
        query = "SELECT MAX(id) from scoring_marker"
        cur.execute(query)
        currentid = cur.fetchall()[0][0]

        #set score
        query = f"""
                UPDATE scoring_marker
                SET type = (Select status
                            FROM temp_csv
                            WHERE id = {i+1})
                WHERE id = {currentid};
                """
        cur.execute(query)
    
    cur.execute("DROP TABLE temp_csv;")
    cur.execute("DROP TABLE raw_csv;")

    conn.commit()
    conn.close()
    return
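
The per-epoch insert can also be written with SQL placeholders and a single `executemany` call, which avoids building query strings by hand. The sketch below runs against an in-memory stand-in table that contains only the columns the function above writes to. That schema is an assumption for illustration, not the real NeuroScore schema, and `insert_epochs` is a hypothetical helper.

```python
import sqlite3

EPOCH = 10**8  # same value as the 10e7 offset above, kept as an integer


def insert_epochs(conn, labels, start_time, key_id, epoch=EPOCH):
    """Insert one scored epoch per label, back to back from start_time."""
    rows = [
        (start_time + i * epoch, start_time + (i + 1) * epoch, label, key_id)
        for i, label in enumerate(labels)
    ]
    conn.executemany(
        "INSERT INTO scoring_marker "
        "(starts_at, ends_at, notes, type, location, is_deleted, key_id) "
        "VALUES (?, ?, '', ?, '', 0, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


# Stand-in table (hypothetical schema) so the sketch runs without a ZDB file
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scoring_marker ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, starts_at INTEGER, ends_at INTEGER, "
    "notes TEXT, type TEXT, location TEXT, is_deleted INTEGER, key_id INTEGER)"
)
inserted = insert_epochs(conn, ["Sleep-Wake", "Sleep-SWS"], start_time=1000, key_id=7)
stored = conn.execute(
    "SELECT starts_at, ends_at, type, key_id FROM scoring_marker ORDER BY id"
).fetchall()
print(inserted, stored)
```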


In [None]:
import os

os.system('mkdir data/ZDB_ann')
for csv in os.listdir('data/expanded_renamed_ann'):
    # Pair each CSV with the ZDB of the same index; os.listdir order is arbitrary
    index = csv.replace('.csv', '')
    zdb = 'data/renamedZDB/'+index+'.zdb'
    new_zdb = 'data/ZDB_ann/'+index+'.zdb'
    os.system("cp "+zdb+" "+new_zdb)
    csv_path = "data/expanded_renamed_ann/"+csv
    ZDBConversion(csv_path, new_zdb)

In [None]:
import os

os.system('mkdir data/ZDB_rf')
for csv in os.listdir('data/expanded_renamed_rf'):
    # Pair each CSV with the ZDB of the same index; os.listdir order is arbitrary
    index = csv.replace('.csv', '')
    zdb = 'data/renamedZDB/'+index+'.zdb'
    new_zdb = 'data/ZDB_rf/'+index+'.zdb'
    os.system("cp "+zdb+" "+new_zdb)
    csv_path = "data/expanded_renamed_rf/"+csv
    ZDBConversion(csv_path, new_zdb)

Rename ZDBs to original names

In [None]:
os.system('mkdir data/ZDB_final_ann')
mapping = open('data/ZDBmapping').read().splitlines()

for zdb in os.listdir('data/ZDB_ann'):
    index = int(zdb.replace('.zdb', ''))
    os.system(f'cp data/ZDB_ann/"{zdb}" data/ZDB_final_ann/"{mapping[index]}"')

In [None]:
os.system('mkdir data/ZDB_final_rf')
mapping = open('data/ZDBmapping').read().splitlines()

for zdb in os.listdir('data/ZDB_rf'):
    index = int(zdb.replace('.zdb', ''))
    os.system(f'cp data/ZDB_rf/"{zdb}" data/ZDB_final_rf/"{mapping[index]}"')