ostadabbas/NNS-Detection-and-Segmentation

A Video-based End-to-end Pipeline for Non-nutritive Sucking Action Recognition and Segmentation in Young Infants

This is the official code repository for our paper, introducing a computer vision based detector for infant non-nutritive sucking (NNS):

Zhu, S., Wan, M., Hatamimajoumerd, E., Jain, K., Zlota, S., Kamath, C.V., Rowan, C.B., Grace, E., Goodwin, M.S., Hayes, M.J., Schwartz-Mette, R.A., Zimmerman, E., Ostadabbas, S., "A Video-based End-to-end Pipeline for Non-nutritive Sucking Action Recognition and Segmentation in Young Infants." MICCAI'23. Preprint, arXiv:2303.16867 [cs.CV], 2023. [arXiv link]

A quick-start guide is available for those who just want to get predicted NNS segmentation results from a video, using our pretrained model, with no machine learning training necessary.

Table of contents

  1. License & Patents
  2. Background: NNS & Action Segmentation
  3. Quick-Start: Obtain NNS Segmentation Predictions
  4. NNS Datasets
  5. Advanced: Full Pipeline Training & Evaluation
    1. NNS Action Recognition
    2. NNS Segmentation
  6. BibTeX Citation

Fig. 1. Top: NNS signal extracted from a pressure transducer pacifier device. Bottom: Image samples from our training and testing datasets.

License & Patents

This code is for non-commercial purposes only. For other uses, please contact ACLab at NEU.

This work is related to the following invention disclosures:

  1. Ostadabbas, S. and Zimmerman, Emily. 2021. An AI-Guided Contact-less Non-Nutrative Suck Monitoring System. Invention Disclosure INV-22095.
  2. Ostadabbas, S., Zimmerman, E., Huang, X., Wan, M. 2022. Infant Facial Landmark Estimation. Invention Disclosure INV-22104.

Background: NNS & Action Segmentation

Non-nutritive sucking (NNS) is an infant oral sucking pattern characterized by the absence of nutrient delivery. NNS reflects neural and motor development in early life and may reduce the risk of SIDS, the leading cause of death for US infants aged 1-12 months. However, studying the relationship between NNS patterns and breathing, feeding, and arousal during sleep has been challenging due to the difficulty of measuring the NNS signal. Current transducer-based approaches are effective but expensive, limited to research use, and may alter the sucking behavior itself.

We present an end-to-end computer vision system to recognize and segment NNS actions from lengthy videos, enabling applications in automatic screening and telehealth, with a focus on high precision to enable periods of sucking activity to be reliably extracted for analysis by human experts. Our NNS activity segmentation algorithm predicts start and end timestamps for NNS activity, with high certainty—in our testing, up to 94.0% average precision and 84.9% average recall across 30 heterogeneous 60 s clips, drawn from our manually annotated NNS clinical in-crib dataset of 183 hours of overnight baby monitor footage from 19 infants.

Quick-Start: Obtain NNS Segmentation Predictions

Follow these steps to obtain NNS segmentation predictions from our model on an input infant video. We provide the pretrained model, and the inference can be done on CPU, so the main prerequisite is a modern Python installation. The output is a demo video illustrating the NNS predictions.

  1. Install the required Python libraries:
pip3 install -r requirements.txt
  2. Set up the video preprocessing pipeline, which includes building an optical flow package:
cd preprocessing
git clone https://github.com/pathak22/pyflow.git
cd pyflow/
python3 setup.py build_ext -i
cd ../..

Windows users: Open pyflow\src\project.h and comment out Line 9 (i.e., replace it with //#define _LINUX_MAC).

  3. Download the model weights from here and unzip the .pth file into your preferred directory.

  4. Run inference and visualization on a single video with the following command (after substituting the appropriate PATH_TO_ variables):

python3 inference.py --input_clip_path PATH_TO_INPUT_VIDEO_FILE --output_video_path PATH_TO_OUTPUT_RESULT_VIDEO --model_weights PATH_TO_MODEL_WEIGHT_FILE

NNS Datasets

We present two new datasets in our work:

  • the private NNS clinical in-crib dataset, consisting of 183 h of nighttime in-crib baby monitor footage collected from 19 infants and annotated for NNS activity and pacifier use by our interdisciplinary team of behavioral psychology and machine learning researchers, and
  • the public NNS in-the-wild dataset [download], consisting of 10 naturalistic infant video clips annotated for NNS activity.

Fig. 1 shows sample frames from both datasets. Annotations include start and end timestamps for NNS and pacifier events, collected with the VGG VIA software and saved in their format. See our paper for collection details.
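For readers who want to work with the annotations programmatically, the sketch below parses a VIA-style temporal annotation export into (start, end) event lists. The JSON structure shown is a simplified assumption about the VIA project format (the real export contains many more fields), and `extract_events` is a hypothetical helper, not part of this repository.

```python
import json

# Minimal VIA3-style project snippet (structure assumed for illustration only).
# Each metadata entry has "z": [start_sec, end_sec] and "av" attribute labels.
via_json = """
{
  "metadata": {
    "1_a1b2c3": {"vid": "1", "z": [12.4, 15.9], "xy": [], "av": {"1": "NNS"}},
    "1_d4e5f6": {"vid": "1", "z": [40.0, 62.5], "xy": [], "av": {"1": "pacifier"}}
  }
}
"""

def extract_events(project_text, label):
    """Return sorted (start, end) timestamps for temporal segments with the given label."""
    project = json.loads(project_text)
    events = []
    for entry in project["metadata"].values():
        times = entry.get("z", [])
        if len(times) == 2 and label in entry.get("av", {}).values():
            events.append((times[0], times[1]))
    return sorted(events)

print(extract_events(via_json, "NNS"))       # -> [(12.4, 15.9)]
print(extract_events(via_json, "pacifier"))  # -> [(40.0, 62.5)]
```

Consult the VIA documentation for the exact export schema of your annotation project.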

Advanced: Full Pipeline Training & Evaluation

Fig. 2. Our global NNS action segmentation pipeline (a), built on our local NNS action recognition (classification) module (b).

The above figure illustrates our pipeline for NNS action segmentation, which takes in long-form videos of infants using pacifiers and predicts timestamps for NNS events in the entire video. We first cover the long video with short sliding windows, then apply our NNS action recognition module to obtain a classification for NNS vs non-NNS (or the confidence score), and aggregate the output classes (or scores) into a segmentation result consisting of predicted start and end timestamps.
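The aggregation step described above can be sketched as follows. This is a simplified illustration, not the logic in segmentation.py: `aggregate_windows` is a hypothetical helper that thresholds per-window confidence scores and merges overlapping positive windows into segments; the window and stride values echo the defaults suggested by the evaluation command later in this README.

```python
def aggregate_windows(scores, stride, window, threshold=0.5):
    """Merge positive sliding windows into (start_frame, end_frame) segments.

    scores[i] is the NNS confidence for the window starting at frame
    i * stride and covering `window` frames; overlapping positive
    windows are merged into a single segment.
    """
    segments = []
    for i, score in enumerate(scores):
        if score < threshold:
            continue
        start, end = i * stride, i * stride + window
        if segments and start <= segments[-1][1]:  # overlaps the previous segment
            segments[-1][1] = end
        else:
            segments.append([start, end])
    return [tuple(seg) for seg in segments]

# Windows of 25 frames with stride 5: three positive windows merge into one event.
print(aggregate_windows([0.1, 0.8, 0.9, 0.2, 0.7], stride=5, window=25))  # -> [(5, 45)]
```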

NNS Action Recognition

The NNS action recognition model is trained and evaluated in two stages: preprocessing the data and training the model.

Preprocessing

The data preprocessing includes the following steps:

  1. Trimming the long recordings (10 hr+) into short video clips (e.g. 2.5 s) for the classifier input.
  2. Cropping the full-scale frames in the short video clips to generate area-of-interest-only videos.
  3. Converting all cropped videos into optical flow videos.

Details of the operations above can be found in the subfolder preprocessing.
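As a sketch of the trimming arithmetic in step 1, the hypothetical helper below (not part of the repository) tiles a long recording into consecutive fixed-duration clips, assuming a known, constant frame rate; a short final remainder is dropped.

```python
def clip_boundaries(total_frames, fps, clip_sec=2.5):
    """Yield (start_frame, end_frame) pairs tiling a recording into
    consecutive clips of clip_sec seconds; a trailing partial clip is dropped."""
    clip_len = int(round(fps * clip_sec))
    for start in range(0, total_frames - clip_len + 1, clip_len):
        yield (start, start + clip_len)

# A 10-second recording at 30 fps yields four 75-frame clips:
print(list(clip_boundaries(300, fps=30)))  # -> [(0, 75), (75, 150), (150, 225), (225, 300)]
```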

Action Recognition Model Training

Note: This yields the pretrained model, whose weights can also be downloaded directly here.

The goal of the pipeline is to train a CNN-LSTM-based model to classify NNS vs. non-NNS actions.

The training data are the optical flow versions of the area-of-interest-only short videos generated in the preprocessing step.

The training data setup details and training instructions are in the subfolder cnn_lstm.

NNS Segmentation

NNS Segmentation Predictions

To get the segmentation results, run the following script:

python3 segmentation.py --opt_flow_dir PATH_TO_OPTICAL_FLOW_DIR --raw_vid_dir PATH_TO_RAW_VIDEOS --results_dir DIR_TO_SAVE_RESULTS \
--model_weights PATH_TO_MODEL_WEIGHT_FILE --sample_dur FRAME_LENGTH_OF_WINDOW --window_stride SLIDING_WINDOW_STRIDE

Evaluating Segmentation Predictions

To evaluate the NNS segmentation results on the datasets, run the following script:

python3 evaluation.py --datasets in-wild --agg_methods simple average --result_dir <PATH_TO_SEGMENTATION_NPY_FILES> --num_frames 600 --num_windows 116 --window_size 25 --window_stride 5
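To make the evaluation metrics concrete, the sketch below computes event-level precision and recall over (start, end) segments. It is a simplified stand-in for the matching criterion used in evaluation.py (here a prediction counts as correct if it overlaps any ground-truth segment at all); `precision_recall` and `overlap` are hypothetical helpers, not part of the repository.

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) intervals (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def precision_recall(pred, truth):
    """Event-level precision and recall: a predicted segment is a true positive
    if it overlaps any ground-truth segment, and vice versa for recall."""
    if not pred or not truth:
        return 0.0, 0.0
    tp_pred = sum(1 for p in pred if any(overlap(p, t) > 0 for t in truth))
    tp_truth = sum(1 for t in truth if any(overlap(p, t) > 0 for p in pred))
    return tp_pred / len(pred), tp_truth / len(truth)

pred = [(0, 10), (20, 30), (50, 60)]   # predicted NNS segments (frames)
truth = [(5, 12), (22, 28)]            # ground-truth NNS segments
print(precision_recall(pred, truth))   # -> (0.6666666666666666, 1.0)
```

Real evaluation protocols typically add a minimum-overlap (e.g. IoU) threshold before counting a match; see the paper for the exact criterion used.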

BibTeX Citation

Here is the BibTeX citation for our paper:

@misc{zhu_video-based_2023,
	title = {A {Video}-based {End}-to-end {Pipeline} for {Non}-nutritive {Sucking} {Action} {Recognition} and {Segmentation} in {Young} {Infants}},
	url = {http://arxiv.org/abs/2303.16867},
	author = {Zhu, Shaotong and Wan, Michael and Hatamimajoumerd, Elaheh and Jain, Kashish and Zlota, Samuel and Kamath, Cholpady Vikram and Rowan, Cassandra B. and Grace, Emma C. and Goodwin, Matthew S. and Hayes, Marie J. and Schwartz-Mette, Rebecca A. and Zimmerman, Emily and Ostadabbas, Sarah},
	booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) \textbf{(acceptance rate: 32\%)}},
	month = mar,
	year = {2023},
	note = {arXiv:2303.16867 [cs]}
}
