
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention

by Katsuyuki Nakamura, Hiroki Ohashi, and Mitsuhiro Okada.

This repository contains the dataset for the paper "Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention", accepted at ACMMM2021.

MMAC Captions dataset

We provide a dataset called MMAC Captions for sensor-augmented egocentric-video captioning. The dataset contains 5,002 activity descriptions, created by extending the CMU-MMAC dataset. Some examples of activity descriptions are shown below:

- Spreading tomato sauce on pizza crust with a spoon.
- Taking a fork, knife, and peeler from a drawer.
- Cutting a zucchini in half with a kitchen knife.
- Moving a paper plate slightly to the left.
- Stirring brownie batter with a fork.


We split the dataset into training, validation, and test sets of 2,923, 838, and 1,241 samples, respectively. Please see our paper for details.

Usage

Preparation

Make sure to download the CMU-MMAC dataset and unzip the archives into the layout shown below. The wireless IMU data (6DOFv4.zip) and the wired IMU data (3DMGX1.zip) are required for the following pre-processing. A quick layout check is sketched after the listing.

./data/cmu_sensor_data/
    S07_Brownie_3DMGX1/
        2794_01-30_16_30_49-time.txt
        2795_01-30_16_30_49-time.txt
        2796_01-30_16_30_49-time.txt
        3261_01-30_16_30_49-time.txt
        3337_01-30_16_30_49-time.txt
    S07_Brownie_6DOFv4/
        000666015711_01-30_16_30_30-time-synch.txt
        000666015715_01-30_16_30_30-time-synch.txt
        000666015735_01-30_16_30_30-time-synch.txt
        0006660160E3_01-30_16_30_30-time-synch.txt
    S07_Eggs_3DMGX1/
        2794_01-30_17_11_20-time.txt
        2795_01-30_17_11_20-time.txt
        2796_01-30_17_11_20-time.txt
        3261_01-30_17_11_20-time.txt
        3337_01-30_17_11_20-time.txt
    S07_Eggs_6DOFv4/
    ...
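Before running the scripts, a minimal check like the following (a sketch, not part of the repository) can confirm that the unzipped directories are in place. The directory names come from the listing above; extend the list to cover every subject and recipe you downloaded.

import os

DATA_ROOT = "./data/cmu_sensor_data"

# Only S07's Brownie and Eggs sessions appear in the listing above;
# extend this list to cover everything you downloaded.
EXPECTED_DIRS = [
    "S07_Brownie_3DMGX1",
    "S07_Brownie_6DOFv4",
    "S07_Eggs_3DMGX1",
    "S07_Eggs_6DOFv4",
]

for name in EXPECTED_DIRS:
    path = os.path.join(DATA_ROOT, name)
    n_files = len(os.listdir(path)) if os.path.isdir(path) else 0
    status = "ok" if n_files > 0 else "MISSING"
    print(f"{status:7s} {path} ({n_files} files)")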

Resampling the sensor data

To resample the sensor data to 30 Hz, please run:

$ cd sh; bash resampling_sensor_raw_data_default.sh

You can edit the output path settings in ./setting/setting_sensor_default.toml.
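For reference, the sketch below shows one way timestamped IMU readings could be resampled to 30 Hz with pandas, reading the settings file with the toml package. It is illustrative only: the assumed file format (a 'timestamp' column in seconds plus one column per sensor channel) is a guess, not the repository's actual layout, and the shell script above remains the supported entry point.

import pandas as pd
import toml

# Output paths and related options live in the repository's settings file.
cfg = toml.load("./setting/setting_sensor_default.toml")

def resample_to_30hz(df: pd.DataFrame) -> pd.DataFrame:
    """Interpolate sensor channels onto a uniform 30 Hz time grid.

    Assumes (hypothetically) a 'timestamp' column in seconds plus one
    column per sensor channel.
    """
    idx = pd.to_timedelta(df["timestamp"], unit="s")
    df = df.drop(columns="timestamp").set_index(idx)
    # 30 Hz -> one sample every 1/30 s; average raw readings per bin,
    # then fill empty bins by linear interpolation.
    period = pd.Timedelta(seconds=1) / 30
    return df.resample(period).mean().interpolate(method="linear")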

Synchronizing video and sensor data

To synchronize video and sensor data, please run:

$ cd sh; bash sensor_selected_timestamp_default.sh

This step synchronizes the sensor data with the video and outputs 63-dimensional sensor sequences.
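Conceptually, the synchronization selects, for each video frame, the sensor reading whose timestamp is nearest. The sketch below illustrates that idea; the 'timestamp' column and the frame-time input are assumptions, and the shell script above is the supported way to run this step.

import numpy as np
import pandas as pd

def nearest_sensor_rows(sensor: pd.DataFrame,
                        frame_times: np.ndarray) -> pd.DataFrame:
    """For each frame time (seconds), select the sensor row whose
    timestamp is closest, giving one 63-dim vector per video frame."""
    times = sensor["timestamp"].to_numpy()  # assumed sorted ascending
    pos = np.searchsorted(times, frame_times)
    pos = np.clip(pos, 1, len(times) - 1)
    left, right = times[pos - 1], times[pos]
    # Step back one index where the left neighbour is the closer one.
    pos = np.where(frame_times - left < right - frame_times, pos - 1, pos)
    return sensor.iloc[pos].reset_index(drop=True)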

Requirements

  • Python 3.9.5
  • numpy
  • pandas
  • toml

Citation

Please consider citing our paper if it helps your research:

@inproceedings{nakamura2021sensoraugmented,
    title={Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention},
    author={Nakamura, Katsuyuki and Ohashi, Hiroki and Okada, Mitsuhiro},
    booktitle={ACM International Conference on Multimedia (MM)},
    year={2021},
}

Acknowledgement

The CMU-MMAC data used in this paper was obtained from http://kitchen.cs.cmu.edu/ and the data collection was funded in part by the National Science Foundation under Grant No. EEEC-0540865.

License

This software is released under the MIT License; see LICENSE.txt.

If you have questions, please contact mmac-captions at rdgml.intra.hitachi.co.jp
