A Video-based End-to-end Pipeline for Non-nutritive Sucking Action Recognition and Segmentation in Young Infants
This is the official code repository for our paper, introducing a computer vision based detector for infant non-nutritive sucking (NNS):
Zhu, S., Wan, M., Hatamimajoumerd, E., Jain, K., Zlota, S., Kamath, C.V., Rowan, C.B., Grace, E., Goodwin, M.S., Hayes, M.J., Schwartz-Mette, R.A., Zimmerman, E., Ostadabbas, S., "A Video-based End-to-end Pipeline for Non-nutritive Sucking Action Recognition and Segmentation in Young Infants." MICCAI'23. Preprint, arXiv:2303.16867 [cs.CV], 2023. [arXiv link]
Zhu, S., Wan, M., Manne, S. K. R., Hatamimajoumerd, E., Hayes, M. J., Zimmerman, E., & Ostadabbas, S. (2024). Subtle signals: Video-based detection of infant non-nutritive sucking as a neurodevelopmental cue. Computer Vision and Image Understanding, 104081. [paper link]
A quick-start guide is available for those who just want to get predicted NNS segmentation results from a video, using our pretrained model, with no machine learning training necessary.
- License & Patents
- Background: NNS & Action Segmentation
- Quick-Start: Obtain NNS Segmentation Predictions
- NNS Datasets
- Advanced: Full Pipeline Training & Evaluation
- BibTeX Citation
Fig. 1. Top: NNS signal extracted from a pressure transducer pacifier device. Bottom: Image samples from our training and testing datasets.
This code is for non-commercial purposes only. For other uses, please contact the ACLab at Northeastern University (NEU).
This work is related to the following invention disclosures:
- Ostadabbas, S. and Zimmerman, E. 2021. An AI-Guided Contact-less Non-Nutritive Suck Monitoring System. Invention Disclosure INV-22095.
- Ostadabbas, S., Zimmerman, E., Huang, X., Wan, M. 2022. Infant Facial Landmark Estimation. Invention Disclosure INV-22104.
Non-nutritive sucking (NNS) is an infant oral sucking pattern characterized by the absence of nutrient delivery. NNS reflects neural and motor development in early life and may reduce the risk of sudden infant death syndrome (SIDS), the leading cause of death for US infants aged 1-12 months. However, studying the relationship between NNS patterns and breathing, feeding, and arousal during sleep has been challenging because the NNS signal is difficult to measure. Current transducer-based approaches are effective but expensive, limited to research settings, and may themselves affect sucking behavior.
We present an end-to-end computer vision system to recognize and segment NNS actions from lengthy videos, enabling applications in automatic screening and telehealth. We focus on high precision, so that periods of sucking activity can be reliably extracted for analysis by human experts. Our NNS activity segmentation algorithm predicts start and end timestamps for NNS activity with high certainty: in our testing, it achieves up to 94.0% average precision and 84.9% average recall across 30 heterogeneous 60-second clips, drawn from our manually annotated NNS clinical in-crib dataset of 183 hours of overnight baby monitor footage from 19 infants.
Follow these steps to obtain NNS segmentation predictions from our model on an input infant video. We provide the pretrained model, and the inference can be done on CPU, so the main prerequisite is a modern Python installation. The output is a demo video illustrating the NNS predictions.
- Install the required Python libraries:
pip3 install -r requirements.txt
- Set up the video preprocessing pipeline, which includes cloning and building an optical flow package:
cd preprocessing
git clone https://github.com/pathak22/pyflow.git
cd pyflow/
python3 setup.py build_ext -i
cd ../..
Windows users: Open `pyflow\src\project.h` and comment out line 9 (i.e., replace it with `//#define _LINUX_MAC`).
- Download the model weights from here and unzip the .pth file into your preferred directory.
- Inference and visualization can be performed on a single video with the following command (after substituting the appropriate PATH_TO_ variables):
python3 inference.py --input_clip_path PATH_TO_INPUT_VIDEO_FILE --output_video_path PATH_TO_OUTPUT_RESULT_VIDEO --model_weights PATH_TO_MODEL_WEIGHT_FILE
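If you have several videos to process, a small wrapper around the command above can loop over them. This is a minimal sketch; the folder names, file extension, and weight file name below are hypothetical placeholders, not files shipped with the repo:

```python
# Hypothetical batch wrapper around inference.py; adjust paths and extensions
# to your own data layout.
import subprocess
from pathlib import Path

INPUT_DIR = Path("videos")                # folder of input infant videos (assumed .mp4)
OUTPUT_DIR = Path("results")              # where the annotated demo videos are written
WEIGHTS = Path("weights/nns_model.pth")   # hypothetical name for the downloaded .pth file

OUTPUT_DIR.mkdir(exist_ok=True)
for clip in sorted(INPUT_DIR.glob("*.mp4")):
    out = OUTPUT_DIR / f"{clip.stem}_nns.mp4"
    subprocess.run(
        [
            "python3", "inference.py",
            "--input_clip_path", str(clip),
            "--output_video_path", str(out),
            "--model_weights", str(WEIGHTS),
        ],
        check=True,
    )
```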
We present two new datasets in our work:
- the private NNS clinical in-crib dataset, consisting of 183 h of nighttime in-crib baby monitor footage collected from 19 infants and annotated for NNS activity and pacifier use by our interdisciplinary team of behavioral psychology and machine learning researchers, and
- the public NNS in-the-wild dataset [download], consisting of 10 naturalistic infant video clips annotated for NNS activity.
Fig. 1 shows sample frames from both datasets. Annotations include start and end timestamps for NNS and pacifier events, collected with the VGG VIA annotation software and saved in its export format. See our paper for collection details.
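For reference, the sketch below shows one way to read start/end timestamps from a VIA-3 style temporal annotation export. The file name, key layout, and attribute values are assumptions about a generic VIA export, not a description of our exact annotation files:

```python
# Minimal sketch: read temporal segments from a VIA-3 style JSON export.
# The file name and the meaning of the attribute values are assumptions;
# check your own export for the actual keys and labels.
import json

with open("annotations_via.json") as f:    # hypothetical file name
    project = json.load(f)

segments = []
for meta in project["metadata"].values():
    z = meta.get("z", [])
    if len(z) == 2:                         # temporal segment: [start_sec, end_sec]
        start, end = z
        labels = list(meta.get("av", {}).values())  # attribute values, e.g. event type
        segments.append((start, end, labels))

for start, end, labels in sorted(segments):
    print(f"{start:8.2f}s - {end:8.2f}s  {labels}")
```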
Fig. 2. Our global NNS action segmentation pipeline (a), built on our local NNS action recognition (classification) module (b).
The figure above illustrates our pipeline for NNS action segmentation, which takes in long-form videos of infants using pacifiers and predicts timestamps for NNS events across the entire video. We first cover the long video with short sliding windows, apply our NNS action recognition module to each window to obtain an NNS vs. non-NNS classification (or a confidence score), and then aggregate the per-window classes (or scores) into a segmentation result consisting of predicted start and end timestamps.
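As a concrete illustration of that final aggregation step, here is a minimal sketch that thresholds per-window confidence scores and merges consecutive positive windows into start/end timestamps. The threshold, window length, and stride values are illustrative defaults, not necessarily those used in the paper; see segmentation.py for the actual implementation:

```python
# Minimal sketch: aggregate per-window NNS confidence scores into segments.
# Window length, stride, and threshold are illustrative; see segmentation.py
# for the aggregation actually used in the pipeline.
from typing import List, Tuple

def windows_to_segments(
    scores: List[float],       # one NNS confidence score per sliding window
    window_size: int = 25,     # window length in frames
    stride: int = 5,           # sliding-window stride in frames
    fps: float = 25.0,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    segments, current_start, current_end = [], None, None
    for i, score in enumerate(scores):
        start_frame = i * stride
        end_frame = start_frame + window_size
        if score >= threshold:
            if current_start is None:
                current_start = start_frame
            current_end = end_frame
        elif current_start is not None:
            segments.append((current_start / fps, current_end / fps))
            current_start = None
    if current_start is not None:
        segments.append((current_start / fps, current_end / fps))
    return segments

# Three consecutive positive windows merge into a single predicted segment.
print(windows_to_segments([0.9, 0.8, 0.7, 0.1, 0.2]))  # [(0.0, 1.4)]
```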
Training and evaluating the NNS action recognition model involves two stages: data preprocessing and model training.
The data preprocessing includes the following steps:
- Trimming the long recordings (10+ hours) into short video clips (e.g., 2.5 s) to serve as classifier input.
- Cropping the full-scale frames in the short video clips to generate area-of-interest-only videos.
- Converting all cropped videos into optical flow videos (see the illustrative sketch below).
Details of the operations above can be found in the subfolder preprocessing.
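The actual preprocessing uses the pyflow package set up in the quick-start section. Purely to illustrate what the optical flow conversion step produces, here is a minimal sketch using OpenCV's Farneback dense flow (a different flow method than the repo uses), with hypothetical file names:

```python
# Illustrative sketch of converting a cropped clip into an optical flow video
# using OpenCV's Farneback flow; the actual preprocessing uses pyflow
# (see the preprocessing subfolder). File names are hypothetical.
import cv2
import numpy as np

cap = cv2.VideoCapture("cropped_clip.mp4")      # hypothetical cropped input clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
h, w = prev_gray.shape
writer = cv2.VideoWriter("flow_clip.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Encode flow angle/magnitude as an HSV image, then convert to BGR for writing.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((h, w, 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    writer.write(cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    prev_gray = gray

cap.release()
writer.release()
```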
Note: Training yields the pretrained model, whose weights can also be downloaded directly here.
The goal of this stage is to train a CNN-LSTM-based model that classifies NNS vs. non-NNS actions.
The training data are the optical flow versions of the area-of-interest-only short videos generated in the preprocessing step.
The training data setup details and training instructions are in the subfolder cnn_lstm.
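The model definition and training code live in cnn_lstm. As a rough sketch of what a CNN-LSTM video classifier looks like, here is a minimal PyTorch example; the ResNet-18 backbone, hidden size, and input shape are assumptions for illustration, not necessarily the configuration used in the paper:

```python
# Rough sketch of a CNN-LSTM binary classifier for short optical-flow clips.
# Backbone, hidden size, and input shape are assumptions; see the cnn_lstm
# subfolder for the model actually used in the paper.
import torch
import torch.nn as nn
import torchvision.models as models

class CNNLSTM(nn.Module):
    def __init__(self, hidden_size: int = 256, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet18()        # randomly initialized frame encoder
        backbone.fc = nn.Identity()         # keep 512-d frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)      # final hidden state summarizes the clip
        return self.head(h_n[-1])           # logits for NNS vs. non-NNS

# Example: a batch of 4 clips, 16 frames each, 112x112 3-channel frames.
logits = CNNLSTM()(torch.randn(4, 16, 3, 112, 112))
print(logits.shape)  # torch.Size([4, 2])
```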
To get the segmentation results, we can run the following script:
python3 segmentation.py --opt_flow_dir PATH_TO_OPTICAL_FLOW_DIR --raw_vid_dir PATH_TO_RAW_VIDEOS --results_dir DIR_TO_SAVE_RESULTS \
--model_weights PATH_TO_MODEL_WEIGHT_FILE --sample_dur FRAME_LENGTH_OF_WINDOW --window_stride SLIDING_WINDOW_STRIDE
To evaluate the NNS segmentation results on the datasets, we can run the following script:
python3 evaluation.py --datasets in-wild --agg_methods simple average --result_dir <PATH_TO_SEGMENTATION_NPY_FILES> --num_frames 600 --num_windows 116 --window_size 25 --window_stride 5
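For intuition about the reported precision and recall, the sketch below computes framewise precision and recall for a single clip from predicted and ground-truth segments. It illustrates the metric concept only and is not necessarily the exact computation in evaluation.py; the 10 fps and 600-frame clip length are assumptions for the example:

```python
# Illustrative framewise precision/recall for one clip; not necessarily the
# exact metric implemented in evaluation.py.
import numpy as np

def segments_to_mask(segments, num_frames, fps=10.0):
    """Rasterize (start_sec, end_sec) segments into a per-frame boolean mask."""
    mask = np.zeros(num_frames, dtype=bool)
    for start, end in segments:
        mask[int(start * fps):int(end * fps)] = True
    return mask

def precision_recall(pred_segments, gt_segments, num_frames=600, fps=10.0):
    pred = segments_to_mask(pred_segments, num_frames, fps)
    gt = segments_to_mask(gt_segments, num_frames, fps)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / pred.sum() if pred.any() else 1.0  # convention when no predictions
    recall = tp / gt.sum() if gt.any() else 1.0         # convention when no ground truth
    return precision, recall

print(precision_recall([(5.0, 20.0)], [(10.0, 25.0)]))  # (~0.667, ~0.667)
```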
Here is the BibTeX citation for our paper:
@inproceedings{zhu_video-based_2023,
title = {A {Video}-based {End}-to-end {Pipeline} for {Non}-nutritive {Sucking} {Action} {Recognition} and {Segmentation} in {Young} {Infants}},
author = {Zhu, Shaotong and Wan, Michael and Hatamimajoumerd, Elaheh and Jain, Kashish and Zlota, Samuel and Kamath, Cholpady Vikram and Rowan, Cassandra B. and Grace, Emma C. and Goodwin, Matthew S. and Hayes, Marie J. and Schwartz-Mette, Rebecca A. and Zimmerman, Emily and Ostadabbas, Sarah},
booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
url = {http://arxiv.org/abs/2303.16867},
year = {2023},
note = {arXiv:2303.16867 [cs]}
}
@article{zhu2024subtle,
title={Subtle signals: Video-based detection of infant non-nutritive sucking as a neurodevelopmental cue},
author={Zhu, Shaotong and Wan, Michael and Manne, Sai Kumar Reddy and Hatamimajoumerd, Elaheh and Hayes, Marie J and Zimmerman, Emily and Ostadabbas, Sarah},
journal={Computer Vision and Image Understanding},
pages={104081},
year={2024},
publisher={Elsevier}
}