# Load data from Smart-Kages
Load data from the Smart-Kages folder structure.

In [1]:
from pathlib import Path

from smart_kages_movement.io import parse_data_into_df

## Summarise data into a single dataframe

First, let's define the path to the folder containing all the data.

In [2]:
data_dir = Path.home() / "Data" / "Smart-Kages"
assert data_dir.exists(), f"Data directory {data_dir} does not exist."

The data is stored per Smart-Kage, in folders named `kageN` (e.g. `kage1`, `kage2`).

Each Smart-Kage folder contains:
- Daily videos in `videos/YYYY/MM/DD/`, split into 1-hour segments. Each segment is an `.mp4` file named `kageN_YYYYMMDD_HHMMSS.mp4`.
- The corresponding DeepLabCut (DLC) predictions in `analysis/dlc_output/YYYY/MM/DD/`. Each 1-hour `.h5` file therein is prefixed with `kageN_YYYYMMDD_HHMMSS`.
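
The naming convention above can be decoded with a regular expression. The following is a minimal sketch, not the implementation used by `parse_data_into_df`: the helper name `parse_segment_name` and the `.h5` suffix in the second example are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime

# Pattern for the shared prefix `kageN_YYYYMMDD_HHMMSS` described above.
SEGMENT_PATTERN = re.compile(r"^(kage\d+)_(\d{8})_(\d{6})")


def parse_segment_name(file_name: str) -> tuple[str, datetime]:
    """Return the kage name and start datetime encoded in a file name.

    Hypothetical helper; it only inspects the prefix of the name.
    """
    match = SEGMENT_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected file name: {file_name}")
    kage, date, time = match.groups()
    return kage, datetime.strptime(date + time, "%Y%m%d%H%M%S")


# Works for videos and for DLC files that only share the prefix.
print(parse_segment_name("kage1_20240403_095420.mp4"))
print(parse_segment_name("kage1_20240403_095420_hypothetical_suffix.h5"))
```

Matching only the prefix means the same helper can pair a video with its pose file, whatever suffix DLC appends.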

Let's parse the relevant parts of the data structure into a single dataframe.

In [3]:
df = parse_data_into_df(data_dir)

Found 2 kage directories:  kage1 kage3
Found a total of 1615 .h5 pose files output by DLC.
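
For intuition, the discovery step reported above could look roughly like the sketch below. It is an assumption about the general approach, not the actual code of `parse_data_into_df`: the function `find_pose_files` is hypothetical, and the example runs against a throwaway folder tree rather than real data.

```python
import tempfile
from pathlib import Path


def find_pose_files(data_dir: Path) -> dict[str, list[Path]]:
    """Map each kage directory name to its sorted DLC .h5 files.

    Hypothetical helper that assumes the folder layout described earlier.
    """
    pose_files = {}
    for kage_dir in sorted(data_dir.glob("kage*")):
        if kage_dir.is_dir():
            dlc_dir = kage_dir / "analysis" / "dlc_output"
            # Search all YYYY/MM/DD subfolders recursively.
            pose_files[kage_dir.name] = sorted(dlc_dir.rglob("*.h5"))
    return pose_files


# Build a tiny throwaway folder tree so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    day_dir = root / "kage1" / "analysis" / "dlc_output" / "2024" / "04" / "03"
    day_dir.mkdir(parents=True)
    (day_dir / "kage1_20240403_095420_example.h5").touch()

    found = find_pose_files(root)
    n_files = sum(len(files) for files in found.values())
    print(f"Found {len(found)} kage directories: {' '.join(found)}")
    print(f"Found a total of {n_files} .h5 pose files.")
```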


In [4]:
df.head()

| kage | datetime | date | time | pose_file_path | video_exists | video_file_path |
|---|---|---|---|---|---|---|
| kage1 | 2024-04-03 09:54:20 | 20240403 | 95420 | /Users/nsirmpilatze/Data/Smart-Kages/kage1/ana... | True | /Users/nsirmpilatze/Data/Smart-Kages/kage1/vid... |
| kage1 | 2024-04-03 10:00:02 | 20240403 | 100002 | /Users/nsirmpilatze/Data/Smart-Kages/kage1/ana... | True | /Users/nsirmpilatze/Data/Smart-Kages/kage1/vid... |
| kage1 | 2024-04-03 11:01:03 | 20240403 | 110103 | /Users/nsirmpilatze/Data/Smart-Kages/kage1/ana... | True | /Users/nsirmpilatze/Data/Smart-Kages/kage1/vid... |
| kage1 | 2024-04-03 12:01:04 | 20240403 | 120104 | /Users/nsirmpilatze/Data/Smart-Kages/kage1/ana... | True | /Users/nsirmpilatze/Data/Smart-Kages/kage1/vid... |
| kage1 | 2024-04-03 13:01:03 | 20240403 | 130103 | /Users/nsirmpilatze/Data/Smart-Kages/kage1/ana... | True | /Users/nsirmpilatze/Data/Smart-Kages/kage1/vid... |
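
The dataframe is indexed by two levels, `kage` and `datetime`, which makes per-kage selection and sanity checks straightforward. The sketch below illustrates this on a toy stand-in, `toy_df`, that mimics only the index and the `video_exists` column; the `kage3` row and its `False` value are made up for illustration and do not come from the real data.

```python
import pandas as pd

# Toy stand-in for `df`: same index levels, one of its columns.
# The kage3 row is invented purely to demonstrate filtering.
index = pd.MultiIndex.from_tuples(
    [
        ("kage1", pd.Timestamp("2024-04-03 09:54:20")),
        ("kage1", pd.Timestamp("2024-04-03 10:00:02")),
        ("kage3", pd.Timestamp("2024-04-03 10:00:05")),
    ],
    names=["kage", "datetime"],
)
toy_df = pd.DataFrame({"video_exists": [True, True, False]}, index=index)

# Select all segments from one kage via the first index level.
kage1_segments = toy_df.loc["kage1"]

# Count segments per kage.
segments_per_kage = toy_df.groupby(level="kage").size()

# List pose files that have no matching video.
missing_videos = toy_df[~toy_df["video_exists"]]

print(kage1_segments)
print(segments_per_kage)
print(missing_videos)
```

The same three operations should apply unchanged to the real `df`, assuming its index levels are named as shown in the output above.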
