
MM-office dataset


MM-Office is a multi-view and multi-modal dataset recorded in an office environment. It records events that occur during daily office work, e.g., entering the room, sitting down on a chair, and taking something out of a shelf. The events were captured simultaneously by eight non-directional microphones and four cameras. The audio and video recordings are divided into clips of about 30 to 90 seconds each, for a total of 880 clips per viewpoint and sensor. The labels available for training are weak multi-labels indicating which events each clip contains; only the test data is annotated with strong labels that include the onset/offset time of each event.

Download

You can download the dataset here.

Details of the dataset

The dataset has the following folder structure:

MM_Office_Dataset
├── audio
│   ├── test
│   └── train
├── video
│   ├── test
│   └── train
└── label
    ├── testlabel
    └── trainlabel
        ├── classinfo.csv
        ├── eventinfo.csv
        └── recinfo.csv
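For orientation, here is a minimal sketch of enumerating the clips once the archive is extracted. The root path MM_Office_Dataset is an assumption; adjust it to wherever you placed the download.

import glob
import os

# Assumed location of the extracted dataset; adjust as needed.
ROOT = "MM_Office_Dataset"

# Collect the training clips of both modalities.
train_wavs = sorted(glob.glob(os.path.join(ROOT, "audio", "train", "*.wav")))
train_mp4s = sorted(glob.glob(os.path.join(ROOT, "video", "train", "*.mp4")))

print(f"{len(train_wavs)} training audio clips")
print(f"{len(train_mp4s)} training video clips")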

audio/video

Audio and video were recorded synchronously using four cameras (GoPro HERO8) and eight non-directional microphones (HOSIDEN KUB4225) installed in the office, as shown in the room setup figure below. The audio was recorded at 48 kHz / 16 bit. The video was recorded at 1920×1440 / 30 fps and then resized to 480

[Figure: room setup showing the positions of the four cameras and eight microphones]

The naming convention for these recordings is as follows.

split[split index]_id[sensor index]_s[scene index]_recid[recording id]_[division].[wav or mp4]

The MM-Office dataset is divided into 10 splits for convenience; the split index (0 to 9) identifies the split. The sensor index is the number of the camera or microphone and corresponds to the room setup figure above (but starts at 0). The scene index identifies the scenario pattern of actions performed by the actors; refer to eventinfo.csv to see which actions and events each scene contains. The recording id is the serial number of the recording. After recording, each recording was split in half so that each half forms a single clip, so every recording id appears on two files; the division distinguishes them, with 0 for the first half and 1 for the second.
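Given this convention, a clip's metadata can be recovered from its file name. The sketch below uses a regular expression of our own; the field names (split, sensor, scene, recid, division) are our labels, not official ones, and the example file name is illustrative.

import re

# Regular expression for the naming convention described above.
CLIP_RE = re.compile(
    r"split(?P<split>\d+)_id(?P<sensor>\d+)_s(?P<scene>\d+)"
    r"_recid(?P<recid>\d+)_(?P<division>[01])\.(wav|mp4)$"
)

def parse_clip_name(filename):
    """Return the metadata encoded in a clip's file name as a dict of ints."""
    m = CLIP_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected clip name: {filename}")
    return {k: int(v) for k, v in m.groupdict().items()}

# Illustrative file name:
# parse_clip_name("split0_id3_s5_recid42_1.wav")
# -> {'split': 0, 'sensor': 3, 'scene': 5, 'recid': 42, 'division': 1}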

label

testlabel

Each file in testlabel holds the strong labels for one test clip: the class of each annotated event together with its onset (starttime) and offset (endtime).

index  eventclass  starttime  endtime
0      8           6          14
1      11          20         35
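Reading one of these files with pandas might look like the following sketch; the file name is hypothetical, and we assume the files are comma-separated with the header row shown above.

import pandas as pd

# Hypothetical file name; substitute a real file from label/testlabel/.
strong = pd.read_csv("MM_Office_Dataset/label/testlabel/example_clip.csv")

# Expected columns: index, eventclass, starttime, endtime.
for _, row in strong.iterrows():
    print(f"class {row['eventclass']}: {row['starttime']} -> {row['endtime']}")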

recinfo.csv

recinfo.csv maps each recording id to the scene and pattern it was recorded under.

recid  sceneid  patternid
0      1        1
...
679    11       1

eventinfo.csv

For each combination of sceneid, patternid, and division, eventinfo.csv gives a multi-hot vector over the 12 event classes indicating which events the clip contains.

sceneid  patternid  division  class1  class2  class3  class4  ...  class12
5        1          0         0       0       0       1       ...  0
...
3        6          1         0       1       0       0       ...  1
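Together, recinfo.csv and eventinfo.csv give the weak multi-label of a training clip: look up the clip's recid in recinfo.csv to get its sceneid and patternid, then select the matching row of eventinfo.csv for the clip's division. A sketch, assuming columns class1 through class12 as shown above:

import pandas as pd

LABEL_DIR = "MM_Office_Dataset/label/trainlabel"  # assumed dataset location

rec = pd.read_csv(f"{LABEL_DIR}/recinfo.csv")
event = pd.read_csv(f"{LABEL_DIR}/eventinfo.csv")

def weak_label(recid, division):
    # Map recid -> (sceneid, patternid) via recinfo.csv.
    scene = rec.loc[rec["recid"] == recid].iloc[0]
    # Select the matching multi-hot row of eventinfo.csv.
    row = event[(event["sceneid"] == scene["sceneid"])
                & (event["patternid"] == scene["patternid"])
                & (event["division"] == division)].iloc[0]
    return row[[f"class{i}" for i in range(1, 13)]].to_numpy()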

classinfo.csv

It contains the event name (e.g., 'stand up', 'phone') of each event class listed in eventinfo.csv, together with a description of the event.

Requirements for sample_data_loader.py

The expected operation of this program has been verified in the following execution environment.

OS

  • Ubuntu 22.04.4 LTS

GPU environment

  • NVIDIA GPU V100 32GB (x4)
  • NVIDIA Driver Version == 470.239.06
  • CUDA Version == 10.1

Python library

pytorch == 1.7.1
torchaudio == 0.7.2
torchvision == 0.8.2
numpy == 1.18.1
pandas == 1.0.0
glob2 == 0.7
tqdm == 4.42.0
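A quick way to compare the installed versions against these pins (a sketch; it prints versions only and does not verify CUDA or driver compatibility):

import numpy, pandas, torch, torchaudio, torchvision, tqdm

# Print each installed version for comparison with the pins above.
for mod in (torch, torchaudio, torchvision, numpy, pandas, tqdm):
    print(f"{mod.__name__} == {mod.__version__}")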

How to use sample_data_loader

  1. Prepare the above environment
  2. Download the MM-Office dataset and place it under mm-office/
  3. Run . training.sh
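As a sanity check before running the full pipeline, a single clip pair can be loaded with the pinned libraries. The clip name below is illustrative, not an actual file.

import torchaudio
import torchvision.io

clip = "split0_id0_s1_recid0_0"  # hypothetical clip name

# Audio from one microphone, recorded at 48 kHz.
wav, sr = torchaudio.load(f"MM_Office_Dataset/audio/train/{clip}.wav")

# Video frames as a (T, H, W, C) uint8 tensor recorded at 30 fps.
frames, _, info = torchvision.io.read_video(
    f"MM_Office_Dataset/video/train/{clip}.mp4", pts_unit="sec")

print(wav.shape, sr)       # e.g. torch.Size([1, N]) 48000
print(frames.shape, info)  # e.g. torch.Size([T, H, W, 3]) {'video_fps': 30.0, ...}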

License

See this license file.

Authors and Contact

Citing this work

If you'd like to cite this work, you may use the following.

Masahiro Yasuda, Yasunori Ohishi, Shoichiro Saito, and Noboru Harada, "Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor Fusion," in IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2022.

Link

Paper: arXiv
