This is the official implementation of the MICCAI 2022 paper Weakly Supervised Online Action Detection for Infant General Movements.
The code was developed on Ubuntu 18.04.6 with NVIDIA 3090 GPUs. Other platforms and GPU cards have not been fully tested.
To reproduce our Python environment with conda, simply run the following command:
conda env create -f environment.yaml
If you already have Python 3.7, you can also reproduce the environment with pip:
pip install -r requirements.txt
git clone https://github.com/scofiedluo/WO-GMA.git
cd WO-GMA
Our model takes processed skeleton sequences as input. You can extract skeletons with OpenPose and preprocess them with this repository.
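As a rough illustration of the input layout, the sketch below stacks per-frame OpenPose-style keypoint dictionaries into a single $C \times T \times N$ array for one video. It is not the preprocessing code referenced above. The function name `frames_to_skeleton`, the choice of the first detected person, the zero padding for missing frames, and the reading of the three channels as (x, y, confidence) are all assumptions made for this example.

```python
import numpy as np


def frames_to_skeleton(frames, num_frames=6000, num_joints=18):
    """Stack per-frame OpenPose-style dicts into a (C, T, N) float32 array.

    `frames` is a list of dicts, one per video frame, each shaped like the
    JSON that OpenPose writes: {"people": [{"pose_keypoints_2d": [...]}]}.
    """
    # Assumption: C = 3 channels hold x, y and detection confidence.
    data = np.zeros((3, num_frames, num_joints), dtype=np.float32)
    # Assumption: longer videos are truncated, shorter ones zero-padded.
    for t, frame in enumerate(frames[:num_frames]):
        people = frame.get("people", [])
        if not people:
            continue  # no detection in this frame: keep zeros
        # Assumption: the first detected person is the infant.
        flat = np.asarray(people[0]["pose_keypoints_2d"], dtype=np.float32)
        # Keypoints arrive as a flat [x0, y0, c0, x1, y1, c1, ...] list.
        data[:, t, :] = flat.reshape(num_joints, 3).T
    return data
```

Stacking the per-video arrays along a new first axis (for example with `np.stack`) would then give the $B \times C \times T \times N$ layout.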
- skeleton data
  A `.npy` file with dimension $B \times C \times T \times N$, where $B$ is the number of videos, $C$ is the number of feature channels, $T$ is the length of the skeleton sequence, and $N$ is the number of skeleton vertices. In our setting, $C = 3$, $T = 6000$, $N = 18$.
- video-level labels
  A `.pkl` file containing a tuple of two lists: `(video_name list, label list)`. The index of each video is the same as in the skeleton data.
- video info
  A `.csv` file recording information for all videos in the dataset. This file needs to include the following fields:
video_name | nframes | duration (second) | label | FPS | width | height |
---|---|---|---|---|---|---|
string | int, number of frames in original video | double, time duration of video | string, video-level label | double | frame width | frame height |
- data split
  `.csv` files that split your dataset into different parts. These files need to include the following fields:
video_name | nframes | duration (second) | label | FPS | index |
---|---|---|---|---|---|
string | int, number of frames in original video | double, time duration of video | string, video-level label | double | int, video index in skeleton data |
- frame-level labels
  `.csv` files recording frame-level annotations. Each file needs to include the following fields:
video_name | nframes | duration (second) | label | FPS | segments_frame | segments_time |
---|---|---|---|---|---|---|
string | int, number of frames in original video | double, time duration of video | string, video-level label | double | frame annotations for one video, e.g. "[(228, 2834), (2983, 3233)]" | frame annotations converted to the time axis, e.g. "[(9.13, 113.39), (119.35, 129.33)]" |
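
The `segments_time` field can be derived from `segments_frame` and `FPS`. The helper below is a minimal sketch, assuming each frame index is simply divided by the video's FPS and rounded to two decimals. The function name `segments_frame_to_time` is hypothetical, and the exact rounding used for the annotations above is not specified here, so results may differ in the last digit.

```python
import ast


def segments_frame_to_time(segments_frame, fps, ndigits=2):
    """Convert a string like "[(228, 2834)]" into (start, end) times in seconds."""
    # The CSV stores segments as a string; literal_eval parses it without eval().
    segments = ast.literal_eval(segments_frame)
    # Assumption: time = frame index / FPS, rounded to `ndigits` decimals.
    return [(round(start / fps, ndigits), round(end / fps, ndigits))
            for start, end in segments]
```

For example, `segments_frame_to_time("[(250, 500)]", fps=25.0)` returns `[(10.0, 20.0)]`.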
Note: All video names should be unique.
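
Before training, it can help to check that the skeleton data and the video-level labels agree with the format described above. The following is a sketch rather than part of the released code: `check_dataset` and its arguments are hypothetical names, and it only verifies the array rank, the optional $(C, T, N)$ shape, the number of videos, and the uniqueness of video names.

```python
import pickle

import numpy as np


def check_dataset(data_path, label_path, expected_ctn=None):
    """Validate the .npy skeleton file against the .pkl label tuple."""
    # mmap_mode="r" reads only the header, not the whole array.
    data = np.load(data_path, mmap_mode="r")
    with open(label_path, "rb") as f:
        video_names, labels = pickle.load(f)  # (video_name list, label list)

    if data.ndim != 4:
        raise ValueError(f"expected a B x C x T x N array, got shape {data.shape}")
    if expected_ctn is not None and tuple(data.shape[1:]) != tuple(expected_ctn):
        raise ValueError(f"expected (C, T, N) = {expected_ctn}, got {data.shape[1:]}")
    if not (data.shape[0] == len(video_names) == len(labels)):
        raise ValueError("number of videos, names and labels do not match")
    if len(set(video_names)) != len(video_names):
        raise ValueError("video names must be unique")
    return tuple(data.shape)
```

With the setting described above, a call such as `check_dataset(data_path, label_path, expected_ctn=(3, 6000, 18))` would raise a `ValueError` on the first mismatch it finds.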
python main.py --experiment 'train_eval' --device {your GPU id, default 0} \
--data_path {path to your skeleton data} \
--label_path {path to your video level label} \
--video_info_path {path to your video information file} \
--split_train_csv_path {path to training split} \
--split_test_csv_path {path to testing split} \
--frame_annotations {path to your frame-level labels}
python inference.py --work-dir 'inference_results/' \
--experiment 'test_dev' --device {your GPU id, default 0} \
--checkpoint {path to your trained model checkpoint} \
--data_path {path to your skeleton data} \
--label_path {path to your video level label} \
--video_info_path {path to your video information file} \
--split_test_csv_path {path to testing split} \
--frame_annotations {path to your frame-level labels}
If you find this work or code helpful in your research, please cite:
@inproceedings{luo2022weakly,
title={Weakly Supervised Online Action Detection for Infant General Movements},
author={Luo, Tongyi and Xiao, Jia and Zhang, Chuncao and Chen, Siheng and Tian, Yuan and Yu, Guangjun and Dang, Kang and Ding, Xiaowei},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={721--731},
year={2022},
organization={Springer}
}
Please email scofiedluo@gmail.com for further questions.