# [CMI-SleepState-Detection](https://www.kaggle.com/competitions/child-mind-institute-detect-sleep-states/overview)
## Child Mind Institute - Detect Sleep States
### Detect sleep onset and wake from wrist-worn accelerometer data
_______________________________________________________________________ 
# Author Details:
- Name: Najeeb Haider Zaidi
- Email: zaidi.nh@gmail.com
- Profiles: [Github](https://github.com/snajeebz)  [LinkedIn](https://www.linkedin.com/in/najeebz) [Kaggle](https://www.kaggle.com/najeebz)
- License: Private, unlicensed. All files in this repository, under any branch, are prohibited from commercial, personal, communal, or private use unless permitted by the author in writing.
- Copyright (c) 2023-2024. All rights are reserved by the author: Najeeb Haider Zaidi
________________________________________________________________________
# Attributions:
The dataset is provided by the Child Mind Institute through the [Kaggle Competition](https://www.kaggle.com/competitions/child-mind-institute-detect-sleep-states/overview), in which the author is participating. The author is authorized to use the dataset solely for the purposes of the competition.
________________________________________________________________________

In [1]:
!pip install pandarallel

Collecting pandarallel
  Downloading pandarallel-1.6.5.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pandarallel
  Building wheel for pandarallel (setup.py) ... done
  Created wheel for pandarallel: filename=pandarallel-1.6.5-py3-none-any.whl size=16672 sha256=7059498fb145601692b530697b0cf527bb69745b49a9b9db6ab038cf306c2231
  Stored in directory: /root/.cache/pip/wheels/50/4f/1e/34e057bb868842209f1623f195b74fd7eda229308a7352d47f
Successfully built pandarallel
Installing collected packages: pandarallel
Successfully installed pandarallel-1.6.5


In [2]:
import numpy as np # linear algebra
import pandas as pd# data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
from pandarallel import pandarallel
import plotly.express as px
import matplotlib.pyplot as plt
from datetime import datetime as dts
pd.set_option('display.max_rows', 500)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.max_colwidth', None)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
from tqdm import tqdm
tqdm.pandas()
pandarallel.initialize(progress_bar=True)


/kaggle/input/test-train-3-var-no-ts/X_test.pkl
/kaggle/input/test-train-3-var-no-ts/y_train.pkl
/kaggle/input/test-train-3-var-no-ts/X_train.pkl
/kaggle/input/test-train-3-var-no-ts/y_test.pkl
/kaggle/input/later-data/Full_merged.pkl
/kaggle/input/later-data/full_clustered .pkl
/kaggle/input/later-data/cluster_model.joblib
/kaggle/input/train-series-modified/df_mod.parquet
/kaggle/input/child-mind-institute-detect-sleep-states/train_series.parquet
/kaggle/input/child-mind-institute-detect-sleep-states/sample_submission.csv
/kaggle/input/child-mind-institute-detect-sleep-states/train_events.csv
/kaggle/input/child-mind-institute-detect-sleep-states/test_series.parquet
INFO: Pandarallel will run on 2 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


## To squeeze the processing into the 12-hour window, parallel processing will help.

# Reading the train series 

In [None]:
df=pd.read_parquet(path="/kaggle/input/train-series-modified/df_mod.parquet", engine='auto')

# Reading the Train Events.

### As per the dataset description, the two files work together.
- Train series holds the step, timestamp, enmo, and anglez values for multiple series.
- Train events records the step at which each event (onset/wakeup) occurred.

### Steps:
- The target is a single dataframe merging the two, so that each recorded step carries a status indicating whether the subject was asleep or awake.
- The timestamps also need work: convert them to Unix time and/or to a datetime column.
- Once done, we can compute the correlation between the columns to identify the training features. For a quick look, we may want to take a random sample of the series.


In [None]:
train_events=pd.read_csv("/kaggle/input/child-mind-institute-detect-sleep-states/train_events.csv")

### Function to convert the time to unix timestamp

In [None]:
#'2018-08-14T15:30:00-0400'
#"2022-04-07T08:53:42.06717+02:00"
def tscv(dt):
    d=dts.strptime(dt, "%Y-%m-%dT%H:%M:%S%z")
    #d = dts.fromisoformat(dt)
    ts=dts.timestamp(d)
    #print('=', end ="")
    return ts

In [None]:
#df['ts']=df['timestamp'].parallel_apply(lambda x: tscv(x))
#df.sort_values(by=['series_id', 'ts','step'])
#df.to_parquet('df_mod.parquet')
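
The commented-out call above applies `tscv` to every row. A vectorized alternative is sketched below, assuming the timestamps are ISO 8601 strings with a UTC offset, as in the sample comment (`2018-08-14T15:30:00-0400`). The helper name `to_unix_seconds` is hypothetical and is not part of this notebook.

```python
import pandas as pd

def to_unix_seconds(timestamps: pd.Series) -> pd.Series:
    """Convert ISO 8601 strings with a UTC offset to Unix seconds (float)."""
    # utc=True normalizes every offset to UTC, so mixed offsets are handled.
    parsed = pd.to_datetime(timestamps, format="%Y-%m-%dT%H:%M:%S%z", utc=True)
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    return (parsed - epoch).dt.total_seconds()

# Two timestamps taken from this notebook, used here as sample input.
sample = pd.Series(["2018-08-14T15:30:00-0400", "2018-08-14T22:26:00-0400"])
unix_seconds = to_unix_seconds(sample)
```

On the real data this would be used as `df['ts'] = to_unix_seconds(df['timestamp'])`, which avoids one Python-level call per row.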


## Investigating around onset event

In [None]:
df.index[(df['series_id']=='038441c925bb') & (df['timestamp']=="2018-08-14T22:26:00-0400")]

In [None]:
df['anglez'][df['step']==1]

In [None]:
print("Mean enmo before Event: ",df['enmo'].loc[0:4992].mean())
print("Mean enmo between Events: ",df['enmo'].loc[4992:10932].mean())
print('Time start: ', df['timestamp'].loc[4992]," Time End: ",df['timestamp'].loc[10932] )
print("Total Time Difference in hrs: ", (df['ts'].loc[10932] - df['ts'].loc[4992])/3600)
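
The index labels 4992 and 10932 used above are hard-coded from the earlier lookups. A small sketch of deriving them from the event timestamps instead, on an invented toy frame (`toy` and its values are illustrative only, not taken from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    "timestamp": [f"2018-08-14T22:{m:02d}:00-0400" for m in range(20, 28)],
    "enmo": [0.4, 0.2, 0.0, 0.0, 0.0, 0.0, 0.3, 0.5],
})

# Look up the index labels of the two events instead of hard-coding them.
onset_idx = toy.index[toy["timestamp"] == "2018-08-14T22:22:00-0400"][0]
wakeup_idx = toy.index[toy["timestamp"] == "2018-08-14T22:25:00-0400"][0]

# .loc slicing is label-based and includes both endpoints.
mean_before = toy.loc[:onset_idx - 1, "enmo"].mean()
mean_between = toy.loc[onset_idx:wakeup_idx, "enmo"].mean()
```

Because `.loc` includes both endpoints, the "before" slice here stops one row short of the onset so that the onset row is counted only once.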

In [None]:
print(df.loc[(4992-50):(4992+50)])

## Investigating around wakeup event

In [None]:
df['anglez'][(df['series_id']=='038441c925bb') & (df['timestamp']=="2018-08-15T06:41:00-0400")] #wakeup event recorded precisely at this time

In [None]:
px.line(x=df['step'].loc[10932-50:10932+50], y=df['anglez'].loc[10932-50:10932+50])

In [None]:
plt.plot(df['step'].loc[10932-50:10932+50], df['enmo'].loc[10932-50:10932+50])

## Separating the event columns to merge with the series.

In [None]:
#train_events['step'][train_events['step'].isna()==False]=train_events['step'][train_events['step'].isna()==False].astype(int)
#train_events['step'].isnull().sum()
#train_events_p=train_events.dropna()
events=train_events[['series_id', 'step','event']]




### Merging the two dataframes

In [None]:
m_df=pd.merge(df,events,on=["step","series_id"],how='left')
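
The left merge above keeps every series row and attaches an `event` only where `series_id` and `step` both match. A minimal sketch on toy data follows, assuming (as the commented-out lines in the previous cell suggest) that some event rows have a missing `step`. The frames `toy_series` and `toy_events` are invented for illustration.

```python
import numpy as np
import pandas as pd

toy_series = pd.DataFrame({
    "series_id": ["a"] * 5,
    "step": [0, 1, 2, 3, 4],
    "enmo": [0.20, 0.01, 0.00, 0.02, 0.30],
})
toy_events = pd.DataFrame({
    "series_id": ["a", "a", "a"],
    "step": [1.0, 3.0, np.nan],  # NaN: an event with no recorded step
    "event": ["onset", "wakeup", "onset"],
})

# Drop events without a step and align the key dtype with the series frame.
clean_events = toy_events.dropna(subset=["step"]).astype({"step": "int64"})

# validate="many_to_one" raises if an event key appears more than once.
merged = pd.merge(toy_series, clean_events, on=["series_id", "step"],
                  how="left", validate="many_to_one")
```

Rows without a matching event get `NaN` in `event`, which is what the later labeling step relies on.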

# Converting the time to UTC datetime

In [None]:
m_df.head()

In [None]:
x=pd.to_datetime(m_df['ts'].loc[4992], unit='s', utc=True)  # 'ts' holds Unix seconds, so the unit must be given
x

In [3]:
df=pd.read_pickle('/kaggle/input/later-data/full_clustered .pkl')

In [None]:
pandarallel.initialize(progress_bar=True)

#m_df['timstamp']=m_df['timestamp'].parallel_apply(lambda x: pd.to_datetime(x))

## Creating another column sleep for classification of the steps.

### Setting sleep = 1 at each onset and sleep = 0 at each wakeup

## Between an onset and the following wakeup, the subject should be sleeping.

### That leaves the rows before the first onset (index 4992) as NaN, so these are filled with 0.0

In [None]:
m_df['sleep']=np.nan
m_df.loc[m_df["event"]=="onset", "sleep"] = 1
m_df.loc[m_df["event"]=="wakeup", "sleep"] = 0
# Carry the last onset/wakeup state forward, then mark rows before the first event as awake
m_df['sleep'] = m_df['sleep'].ffill().fillna(0.0)
m_df['sleep'].mean()
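
Because `m_df` stacks several series one after another, a plain forward fill can carry the last state of one series into the start of the next. A sketch of a per-series variant on toy data, assuming each series starts awake; the frame `toy` is invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "series_id": ["a"] * 4 + ["b"] * 4,
    "step": [0, 1, 2, 3] * 2,
    "event": [None, "onset", None, None,       # series a ends asleep
              None, None, "onset", "wakeup"],  # series b starts awake
})

# Map events to states; rows without an event become NaN.
state = toy["event"].map({"onset": 1.0, "wakeup": 0.0})

# Forward-fill within each series, then treat leading gaps as awake (0.0).
toy["sleep"] = state.groupby(toy["series_id"]).ffill().fillna(0.0)
```

With a plain `ffill`, the first two rows of series `b` would inherit `1.0` from the end of series `a`; grouping by `series_id` keeps them at `0.0`.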

In [None]:
m_df.loc[4900:5000]

In [None]:
m_df.info()

In [None]:
m_df[['step','enmo','ts','sleep','anglez']].corr()


In [None]:
train_data=m_df[['series_id','step','enmo','timestamp','ts','sleep','anglez']].iloc[:20000000,:].copy()  # keep 'ts' for the correlation plot; copy to allow later column assignment
train_data.info()


In [None]:
figure= px.imshow(train_data[['step','enmo','ts','sleep','anglez']].corr(), text_auto=True, width=1200, height=1200)
figure.show()

In [None]:
figure= px.imshow(m_df[['step','enmo','ts','sleep','anglez']].corr(), text_auto=True, width=1200, height=1200)
figure.show()

In [None]:
train_data['timestamp']=train_data['timestamp'].parallel_apply(lambda x: pd.to_datetime(x))

In [None]:
train_data.to_pickle('train_data.pkl')

In [None]:
train_data

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
    go.Scatter(x=df['step'][df['series_id']=='038441c925bb'], y=df['enmo'][df['series_id']=='038441c925bb'], name="ENMO"),
    secondary_y=False,
)
fig.add_trace(
    go.Scatter(x=df['step'][df['series_id']=='038441c925bb'], y=df['onset'][df['series_id']=='038441c925bb'], name="Onset"),
    secondary_y=True,
)
fig.show()