
MM-office dataset


MM-Office is a multi-view and multi-modal dataset recorded in an office environment. It records events that occur during daily office work, e.g., entering the room, sitting down on a chair, and taking something out of a shelf. The events were captured simultaneously by eight non-directional microphones and four cameras. The audio and video recordings are divided into clips of about 30 to 90 seconds each, for a total of 880 clips per viewpoint and sensor. The labels available for training are weak multi-labels indicating which events each clip contains; only the test data is annotated with strong labels that include the onset/offset time of each event.

Download

You can download the dataset here.

Details of the dataset

The dataset has the following folder structure:

MM_Office_Dataset
├── audio
│   ├── test
│   └── train
├── video
│   ├── test
│   └── train
└── label
    ├── testlabel
    └── trainlabel
        ├── classinfo.csv
        ├── eventinfo.csv
        └── recinfo.csv
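For orientation, here is a minimal sketch of enumerating the clips once the archive is extracted. The root path MM_Office_Dataset is an assumption; adjust it to wherever you placed the download.

import glob
import os

# Assumed location of the extracted dataset; adjust as needed.
ROOT = "MM_Office_Dataset"

# Collect the training clips of both modalities.
train_wavs = sorted(glob.glob(os.path.join(ROOT, "audio", "train", "*.wav")))
train_mp4s = sorted(glob.glob(os.path.join(ROOT, "video", "train", "*.mp4")))

print(f"{len(train_wavs)} training audio clips")
print(f"{len(train_mp4s)} training video clips")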

audio/video

Audio and video were recorded synchronously using four cameras (GoPro HERO8) and eight non-directional microphones (HOSIDEN KUB4225) installed in the office, as shown in the room setup figure below. The audio was recorded at 48 kHz / 16 bit. The video was recorded at 1920×1440 / 30 fps and then resized to 480

[Figure: room setup showing the positions of the four cameras and eight microphones]

The naming convention for these recordings is as follows.

split[split index]_id[sensor index]_s[scene index]_recid[recording id]_[division].[wav or mp4]

The MM-Office dataset is divided into 10 splits for convenience; the split index (0 to 9) identifies the split. The sensor index is the number of the camera or microphone and corresponds to the room setup figure above (but starts at 0). The scene index identifies the scenario pattern of actions performed by the actors; refer to eventinfo.csv to see which actions and events each scene contains. The recording id is the serial number of the recording. After recording, each recording was split in half so that each half forms a single clip, so every recording id appears on two files; the division distinguishes them, with 0 for the first half and 1 for the second.
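Given this convention, a clip's metadata can be recovered from its file name. The sketch below uses a regular expression of our own; the field names (split, sensor, scene, recid, division) are our labels, not official ones, and the example file name is illustrative.

import re

# Regular expression for the naming convention described above.
CLIP_RE = re.compile(
    r"split(?P<split>\d+)_id(?P<sensor>\d+)_s(?P<scene>\d+)"
    r"_recid(?P<recid>\d+)_(?P<division>[01])\.(wav|mp4)$"
)

def parse_clip_name(filename):
    """Return the metadata encoded in a clip's file name as a dict of ints."""
    m = CLIP_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected clip name: {filename}")
    return {k: int(v) for k, v in m.groupdict().items()}

# Illustrative file name:
# parse_clip_name("split0_id3_s5_recid42_1.wav")
# -> {'split': 0, 'sensor': 3, 'scene': 5, 'recid': 42, 'division': 1}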

label

testlabel

Each file in testlabel holds the strong labels for one test clip: the class of each annotated event together with its onset (starttime) and offset (endtime).

index  eventclass  starttime  endtime
0      8           6          14
1      11          20         35
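Reading one of these files with pandas might look like the following sketch; the file name is hypothetical, and we assume the files are comma-separated with the header row shown above.

import pandas as pd

# Hypothetical file name; substitute a real file from label/testlabel/.
strong = pd.read_csv("MM_Office_Dataset/label/testlabel/example_clip.csv")

# Expected columns: index, eventclass, starttime, endtime.
for _, row in strong.iterrows():
    print(f"class {row['eventclass']}: {row['starttime']} -> {row['endtime']}")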

recinfo.csv

recinfo.csv maps each recording id to the scene and pattern it was recorded under.

recid  sceneid  patternid
0      1        1
...
679    11       1

eventinfo.csv

For each combination of sceneid, patternid, and division, eventinfo.csv gives a multi-hot vector over the 12 event classes indicating which events the clip contains.

sceneid  patternid  division  class1  class2  class3  class4  ...  class12
5        1          0         0       0       0       1       ...  0
...
3        6          1         0       1       0       0       ...  1
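Together, recinfo.csv and eventinfo.csv give the weak multi-label of a training clip: look up the clip's recid in recinfo.csv to get its sceneid and patternid, then select the matching row of eventinfo.csv for the clip's division. A sketch, assuming columns class1 through class12 as shown above:

import pandas as pd

LABEL_DIR = "MM_Office_Dataset/label/trainlabel"  # assumed dataset location

rec = pd.read_csv(f"{LABEL_DIR}/recinfo.csv")
event = pd.read_csv(f"{LABEL_DIR}/eventinfo.csv")

def weak_label(recid, division):
    # Map recid -> (sceneid, patternid) via recinfo.csv.
    scene = rec.loc[rec["recid"] == recid].iloc[0]
    # Select the matching multi-hot row of eventinfo.csv.
    row = event[(event["sceneid"] == scene["sceneid"])
                & (event["patternid"] == scene["patternid"])
                & (event["division"] == division)].iloc[0]
    return row[[f"class{i}" for i in range(1, 13)]].to_numpy()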

classinfo.csv

It contains the event name (e.g., 'stand up', 'phone') of each event class listed in eventinfo.csv, together with a description of the event.

Requirements for sample_data_loader.py

The expected operation of this program has been verified in the following execution environment.

OS

  • Ubuntu 22.04.4 LTS

GPU environment

  • NVIDIA GPU V100 32GB (x4)
  • NVIDIA Driver Version == 470.239.06
  • CUDA Version == 10.1

Python library

pytorch == 1.7.1
torchaudio == 0.7.2
torchvision == 0.8.2
numpy == 1.18.1
pandas == 1.0.0
glob2 == 0.7
tqdm == 4.42.0
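A quick way to compare the installed versions against these pins (a sketch; it prints versions only and does not verify CUDA or driver compatibility):

import numpy, pandas, torch, torchaudio, torchvision, tqdm

# Print each installed version for comparison with the pins above.
for mod in (torch, torchaudio, torchvision, numpy, pandas, tqdm):
    print(f"{mod.__name__} == {mod.__version__}")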

How to use sample_data_loader

  1. Prepare the above environment
  2. Download the MM-Office dataset and place it under mm-office/
  3. Run . training.sh
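As a sanity check before running the full pipeline, a single clip pair can be loaded with the pinned libraries. The clip name below is illustrative, not an actual file.

import torchaudio
import torchvision.io

clip = "split0_id0_s1_recid0_0"  # hypothetical clip name

# Audio from one microphone, recorded at 48 kHz.
wav, sr = torchaudio.load(f"MM_Office_Dataset/audio/train/{clip}.wav")

# Video frames as a (T, H, W, C) uint8 tensor recorded at 30 fps.
frames, _, info = torchvision.io.read_video(
    f"MM_Office_Dataset/video/train/{clip}.mp4", pts_unit="sec")

print(wav.shape, sr)       # e.g. torch.Size([1, N]) 48000
print(frames.shape, info)  # e.g. torch.Size([T, H, W, 3]) {'video_fps': 30.0, ...}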

License

See this license file.

Authors and Contact

Citing this work

If you'd like to cite this work, you may use the following.

Masahiro Yasuda, Yasunori Ohishi, Shoichiro Saito, and Noboru Harada, "Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor Fusion," in IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2022.

Link

Paper: arXiv
