# CMI-SleepState-Detection
## Child Mind Institute - Detect Sleep States
### Detect sleep onset and wake from wrist-worn accelerometer data
_______________________________________________________________________
# [Kaggle Competition](https://www.kaggle.com/competitions/child-mind-institute-detect-sleep-states/overview)
________________________________________________________________________
# Author Details:
### Name: Najeeb Haider Zaidi
### Email: zaidi.nh@gmail.com
### Profiles: [Github](https://github.com/snajeebz)  [LinkedIn](https://www.linkedin.com/in/najeebz) [Kaggle](https://www.kaggle.com/najeebz)
### License: Private and unlicensed. No file in this repository, on any branch, may be used commercially, personally, communally, or privately without the author's written permission.
### Copyright (c) 2023-2024 Najeeb Haider Zaidi. All rights reserved.
________________________________________________________________________
# Attributions:
## The dataset is provided by the Child Mind Institute through the [Kaggle Competition](https://www.kaggle.com/competitions/child-mind-institute-detect-sleep-states/overview), in which the author is participating. The author is authorized to use the dataset solely for competition purposes.
________________________________________________________________________

In [1]:
import pandas as pd
import numpy as np 
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt 
import datetime as dt
#Disable warning
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Color printing
# inspired by https://www.kaggle.com/code/ravi20076/sleepstate-eda-baseline
from colorama import Fore, Style, init
from pprint import pprint
def PrintColor(text: str, color=Fore.BLUE, style=Style.BRIGHT):
    """Print text in the given colorama color and style, then reset the styling."""
    print(style + color + text + Style.RESET_ALL)
    
# inspired by https://www.kaggle.com/code/rishabh15virgo/cmi-dss-first-impression-data-understanding-eda
def summarize_dataframe(df):
    summary_df = pd.DataFrame(df.dtypes, columns=['dtypes'])
    summary_df['missing#'] = df.isna().sum().values  # absolute count of missing values (no *100 scaling)
    summary_df['missing%'] = (df.isna().sum().values*100)/len(df)
    summary_df['uniques'] = df.nunique().values
    summary_df['first_value'] = df.iloc[0].values
    summary_df['last_value'] = df.iloc[len(df)-1].values
    summary_df['count'] = df.count().values
    #summary_df['skew'] = df.skew().values
    desc = pd.DataFrame(df.describe().T)
    summary_df['min'] = desc['min']
    summary_df['max'] = desc['max']
    summary_df['mean'] = desc['mean']
    return summary_df

In [3]:
# Note: this reads test_series.parquet (the small 450-row sample) into the variable named train_series.
train_series=pd.read_parquet(path="Dataset/test_series.parquet", engine='auto')
train_events=pd.read_csv("Dataset/train_events.csv")
summarize_dataframe(train_events)

| | dtypes | missing# | missing% | uniques | first_value | last_value | count | min | max | mean |
|---|---|---|---|---|---|---|---|---|---|---|
| series_id | object | 0 | 0.0 | 277 | 038441c925bb | fe90110788d2 | 14508 | NaN | NaN | NaN |
| night | int64 | 0 | 0.0 | 84 | 1 | 35 | 14508 | 1.0 | 84.0 | 15.120072 |
| event | object | 0 | 0.0 | 2 | onset | wakeup | 14508 | NaN | NaN | NaN |
| step | float64 | 4923 | 33.933002 | 7499 | 4992.0 | NaN | 9585 | 936.0 | 739392.0 | 214352.123944 |
| timestamp | object | 4923 | 33.933002 | 9360 | 2018-08-14T22:26:00-0400 | NaN | 9585 | NaN | NaN | NaN |


In [4]:
print('Info:')
train_events.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print('\n Describe: \n',train_events.describe())
print('\n Head: \n',train_events.head(500))



Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14508 entries, 0 to 14507
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   series_id  14508 non-null  object 
 1   night      14508 non-null  int64  
 2   event      14508 non-null  object 
 3   step       9585 non-null   float64
 4   timestamp  9585 non-null   object 
dtypes: float64(1), int64(1), object(3)
memory usage: 566.8+ KB

 Describe: 
               night           step
count  14508.000000    9585.000000
mean      15.120072  214352.123944
std       10.286758  141268.408192
min        1.000000     936.000000
25%        7.000000   95436.000000
50%       14.000000  200604.000000
75%       21.000000  317520.000000
max       84.000000  739392.000000

 Head: 
         series_id  night   event     step                 timestamp
0    038441c925bb      1   onset   4992.0  2018-08-14T22:26:00-0400
1    038441c925bb      1  wakeup  10932.0  2018-08-15T06:41:

In [5]:
# isnull().sum() counts the nulls per column; isnull().count() would only count rows.
print(' Wakeup Nulls: \n',train_events[(train_events['event']=='wakeup')].isnull().sum())
print('\n Wakeup Entries: \n',train_events[(train_events['event']=='wakeup')].count())
print(' \n Onset Nulls: \n',train_events[(train_events['event']=='onset')].isnull().sum())
print('\n Onset Entries: \n',train_events[(train_events['event']=='onset')].count())



 Wakeup Nulls: 
 series_id       0
night           0
event           0
step         2460
timestamp    2460
dtype: int64

 Wakeup Entries: 
 series_id    7254
night        7254
event        7254
step         4794
timestamp    4794
dtype: int64
 
 Onset Nulls: 
 series_id       0
night           0
event           0
step         2463
timestamp    2463
dtype: int64

 Onset Entries: 
 series_id    7254
night        7254
event        7254
step         4791
timestamp    4791
dtype: int64


In [None]:
print(' Onset Nulls: \n',train_events[train_events['event']=='onset'].isnull().sum())


In [None]:
print(' Onset Nulls: \n',train_events[train_events['event']=='onset'].isnull().head(500))
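
The per-event null checks can also be done in a single pass with a group-by. The following is a minimal sketch on a toy frame; the values and the names `events`, `null_steps_per_event`, and `rows_per_event` are hypothetical, and only the column names mirror `train_events`. It also shows why `isnull().sum()` is the call that counts nulls: `isnull()` returns booleans, and summing them counts the `True` entries.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_events (hypothetical values, same column names).
events = pd.DataFrame({
    "series_id": ["a", "a", "a", "a", "b", "b"],
    "night": [1, 1, 2, 2, 1, 1],
    "event": ["onset", "wakeup", "onset", "wakeup", "onset", "wakeup"],
    "step": [100.0, 500.0, np.nan, np.nan, 120.0, np.nan],
})

# isnull() gives True for missing steps; summing per event type counts the nulls.
null_steps_per_event = events["step"].isnull().groupby(events["event"]).sum()

# size() counts all rows per event type, whether or not the step is missing.
rows_per_event = events.groupby("event").size()

print(null_steps_per_event)
print(rows_per_event)
```

Comparing the two results gives the share of unlabeled events per type without filtering the frame once per event.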

In [5]:
summarize_dataframe(train_series)


| | dtypes | missing# | missing% | uniques | first_value | last_value | count | min | max | mean |
|---|---|---|---|---|---|---|---|---|---|---|
| series_id | object | 0 | 0.0 | 3 | 038441c925bb | 0402a003dae9 | 450 | NaN | NaN | NaN |
| step | uint32 | 0 | 0.0 | 150 | 0 | 149 | 450 | 0.0 | 149.0 | 74.5 |
| timestamp | object | 0 | 0.0 | 450 | 2018-08-14T15:30:00-0400 | 2018-12-18T12:57:25-0500 | 450 | NaN | NaN | NaN |
| anglez | float32 | 0 | 0.0 | 305 | 2.6367 | 7.0299 | 450 | -88.367996 | 68.460503 | -56.177723 |
| enmo | float32 | 0 | 0.0 | 183 | 0.0217 | 0.0081 | 450 | 0.0 | 0.9802 | 0.030276 |


# Observation:
- As evident from the summary and the nature of the data, it should have 

In [12]:
train_events.describe()

| | night | step |
|---|---|---|
| count | 14508.0 | 9585.0 |
| mean | 15.120072 | 214352.123944 |
| std | 10.286758 | 141268.408192 |
| min | 1.0 | 936.0 |
| 25% | 7.0 | 95436.0 |
| 50% | 14.0 | 200604.0 |
| 75% | 21.0 | 317520.0 |
| max | 84.0 | 739392.0 |


In [13]:
train_series.describe()

| | step | anglez | enmo |
|---|---|---|---|
| count | 450.0 | 450.0 | 450.0 |
| mean | 74.5 | -56.177723 | 0.030276 |
| std | 43.3485 | 39.331936 | 0.06795 |
| min | 0.0 | -88.367996 | 0.0 |
| 25% | 37.0 | -88.216599 | 0.0 |
| 50% | 74.5 | -79.989449 | 0.0133 |
| 75% | 112.0 | -29.100624 | 0.03525 |
| max | 149.0 | 68.460503 | 0.9802 |


In [14]:
train_series.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 450 entries, 0 to 449
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   series_id  450 non-null    object 
 1   step       450 non-null    uint32 
 2   timestamp  450 non-null    object 
 3   anglez     450 non-null    float32
 4   enmo       450 non-null    float32
dtypes: float32(2), object(2), uint32(1)
memory usage: 12.4+ KB


## Plan:
- The data contains two categories: onset and sleep.
- We should train two models, Sleep Positive/Negative and Onset Positive/Negative, each producing a probability, and then combine their results.
- To train two models, we need to split the training data here and write out two CSV files.
- In the second file, we will create the two models and train each on its own set of data.
- Based on the results, we will decide on the further plan of action.
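
The split described in the plan could be sketched as follows. This is a rough illustration, not the author's implementation. It assumes the two training sets correspond to the two values of the `event` column (`onset` and `wakeup`), which is one reading of the plan, and that rows without a `step` are dropped. The toy rows and the file names mentioned in the comments are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_events (hypothetical rows, same column names).
events = pd.DataFrame({
    "series_id": ["s1", "s1", "s1", "s1", "s2", "s2"],
    "night": [1, 1, 2, 2, 1, 1],
    "event": ["onset", "wakeup", "onset", "wakeup", "onset", "wakeup"],
    "step": [4992.0, 10932.0, np.nan, np.nan, 5000.0, np.nan],
})

# Assumption: events without a recorded step carry no usable label and are dropped.
labeled = events.dropna(subset=["step"])

# One subset per event type, one for each of the two planned models.
onset_events = labeled[labeled["event"] == "onset"]
wakeup_events = labeled[labeled["event"] == "wakeup"]

# With no path, to_csv returns the CSV text; passing a path such as
# "onset_events.csv" (hypothetical name) would write a file instead.
onset_csv = onset_events.to_csv(index=False)
wakeup_csv = wakeup_events.to_csv(index=False)

print(onset_csv)
print(wakeup_csv)
```

Keeping `series_id` and `night` in both outputs leaves room to pair each onset with its matching wakeup later, if the combined model needs it.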