# Exploratory Data Analysis for the Sleep-Accel dataset

Steps for downloading, preparing, and exploring the Sleep-Accel dataset. The dataset documentation and files are available on <a href="https://physionet.org/content/sleep-accel/1.0.0/steps/#files-panel">PhysioNet</a>.


## Download and prepare the data

Use the `./data/download.sh` script to download and prepare the raw dataset. The script reorganizes the downloaded files into one directory per user, using the following layout:

```
sleep-accel
├── user_id1
│   ├── heartrate.txt
│   ├── acceleration.txt
│   ├── steps.txt
│   └── labels.txt
├── user_id2
│   ├── heartrate.txt
│   ├── acceleration.txt
│   ├── steps.txt
│   └── labels.txt
└── ...
```

The script provides three commands, one per stage: `download`, `prepare`, and `cleanup`. Run `./data/download.sh --help` to see the usage instructions.

In [1]:
! ../data/download.sh --data-dir ../data --help

Download and process the PhysioNet Sleep-Accel dataset
Usage: ../data/download.sh [--data-dir PATH] [COMMAND]
Commands:
  download   Download the necessary files.
  prepare    Prepare the downloaded files.
  cleanup    Remove temporary files.
  -h, --help Display this help message.
Options:
  --data-dir PATH   Specify the data directory (default: ./data).
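
Once the script has finished, the per-user layout described above can be sanity-checked from Python. The following is a minimal sketch, not part of the script or of `nightwatch`: `find_incomplete_users` is a hypothetical helper, and it assumes only the four file names shown in the directory tree.

```python
from pathlib import Path

# Assumption: every user directory should contain exactly these four files,
# as shown in the directory layout.
EXPECTED_FILES = ("heartrate.txt", "acceleration.txt", "steps.txt", "labels.txt")


def find_incomplete_users(root):
    """Map each user directory name to the expected files it is missing."""
    missing = {}
    for user_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        absent = [name for name in EXPECTED_FILES if not (user_dir / name).is_file()]
        if absent:
            missing[user_dir.name] = absent
    return missing
```

Calling `find_incomplete_users("../data/sleep-accel")` should return an empty dictionary if every user directory has all four files.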


## Setup

Import the dependencies, load the dataset, and list the available user IDs.

In [8]:
from datetime import datetime
from pathlib import Path
from typing import List

import numpy as np
import pandas as pd

from matplotlib import pyplot as plt
import seaborn as sns

from nightwatch.data.sleep_accel import SleepAccel

data = SleepAccel("../data/sleep-accel")
data.user_ids

['1449548',
 '9618981',
 '1360686',
 '3509524',
 '5383425',
 '2638030',
 '7749105',
 '8692923',
 '3997827',
 '759667',
 '8173033',
 '46343',
 '8686948',
 '5498603',
 '5132496',
 '1455390',
 '1066528',
 '4426783',
 '8530312',
 '5797046',
 '4018081',
 '2598705',
 '8000685',
 '1818471',
 '844359',
 '6220552',
 '9106476',
 '4314139',
 '781756',
 '8258170',
 '9961348']

Load and print the data for a single user ID:

In [13]:
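user_id = "3509524"

# Load each signal for this user separately
acc = data.load_user_acc(user_id)
steps = data.load_user_steps(user_id)
bp = data.load_user_heartrate(user_id)  # heart rate, despite the `bp` name
labels = data.load_user_labels(user_id)

# Load the same user as a single UserData object
user_data = data.load_user_data(user_id)
print(user_data)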

UserData(labels=     timestamp  label
0            0      0
1           30      0
2           60      0
3           90      0
4          120      0
..         ...    ...
412      12360      2
413      12390      2
414      12420      2
415      12450      2
416      12480      2

[417 rows x 2 columns], motion=           timestamp     acc_x     acc_y     acc_z
100551      0.002704  0.032333 -0.457611 -0.888519
100552      0.017983  0.027435 -0.462997 -0.887161
100553      0.032728  0.023880 -0.471771 -0.879440
100554      0.047828  0.025223 -0.465408 -0.874557
100555      0.077778  0.026062 -0.460510 -0.868713
...              ...       ...       ...       ...
924637  12479.937852 -0.107315 -0.197815 -0.967606
924638  12479.953821 -0.105362 -0.197327 -0.968048
924639  12479.968439 -0.104874 -0.195862 -0.968048
924640  12479.983591 -0.105362 -0.196350 -0.968048
924641  12479.998345 -0.106293 -0.197815 -0.970016

[824091 rows x 4 columns], heartrate=        timestamp    bp
1328      2.97
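
In the output above, the labels are spaced 30 seconds apart, while the motion samples arrive roughly every 15 ms. To compare the two, each motion sample needs the label of the epoch it falls into. The sketch below shows one way to do that with `pandas.merge_asof`. It assumes both frames have a `timestamp` column in seconds, as printed above. `attach_epoch_labels` and `EPOCH_SECONDS` are hypothetical names, not part of `nightwatch`.

```python
import pandas as pd

# Assumption: epoch length inferred from the 0, 30, 60, ... label timestamps.
EPOCH_SECONDS = 30


def attach_epoch_labels(motion: pd.DataFrame, labels: pd.DataFrame) -> pd.DataFrame:
    """Give every motion sample the label of the most recent epoch start."""
    left = motion.sort_values("timestamp")
    # merge_asof needs both keys to share a dtype; label timestamps are integers.
    right = labels.sort_values("timestamp").astype({"timestamp": "float64"})
    # direction="backward" picks the last label whose timestamp is <= the sample's.
    # Samples more than one epoch length past the last label stay unlabeled (NaN).
    return pd.merge_asof(
        left, right, on="timestamp", direction="backward", tolerance=EPOCH_SECONDS
    )
```

With the objects loaded above, this would be called as `attach_epoch_labels(user_data.motion, user_data.labels)`.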

In [49]:
_compute_activity_counts(user_data.motion)

array([[2.70414352e-03, 0.00000000e+00],
       [1.50207494e+01, 2.45600000e+01],
       [3.00387947e+01, 1.53500000e+01],
       ...,
       [1.24499623e+04, 0.00000000e+00],
       [1.24649803e+04, 0.00000000e+00],
       [1.24799983e+04, 0.00000000e+00]])
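
The result above has two columns: what appear to be epoch start times spaced about 15 seconds apart, and a per-epoch activity value. The definition of `_compute_activity_counts` is not shown here, so the sketch below is not that function. It is a simplified stand-in that only illustrates the binning idea: it sums the absolute sample-to-sample change in z acceleration within fixed 15-second epochs. The name `epoch_activity`, the choice of the z axis, and the epoch length are all assumptions.

```python
import numpy as np


def epoch_activity(timestamps, acc_z, epoch_seconds=15.0):
    """Return rows of [epoch_start, summed |change in acc_z|] per epoch."""
    timestamps = np.asarray(timestamps, dtype=float)
    acc_z = np.asarray(acc_z, dtype=float)
    # Absolute change from the previous sample; the first sample contributes 0.
    change = np.abs(np.diff(acc_z, prepend=acc_z[0]))
    # Epoch index of each sample, counted from the first timestamp.
    epoch_idx = np.floor((timestamps - timestamps[0]) / epoch_seconds).astype(int)
    # Sum the changes that fall into each epoch (empty epochs get 0).
    totals = np.bincount(epoch_idx, weights=change)
    starts = timestamps[0] + epoch_seconds * np.arange(totals.size)
    return np.column_stack([starts, totals])
```

For the data loaded above, a call might look like `epoch_activity(user_data.motion["timestamp"], user_data.motion["acc_z"])`. Its values will not match the output of `_compute_activity_counts`, which presumably applies its own filtering and scaling.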