Official PyTorch implementation of MOVAD, paper accepted to ICASSP 2024.
We propose MOVAD, a brand new architecture for online (frame-level) video anomaly detection.
Authors: Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini, Massimo Bertozzi, Andrea Prati.
IMP Lab - Dipartimento di Ingegneria e Architettura
University of Parma, Italy
The ability to understand the surrounding scene is of paramount importance for Autonomous Vehicles (AVs).
This paper presents a system capable of working in an online fashion, giving an immediate response to anomalies arising around the AV, exploiting only the videos captured by a dash-mounted camera.
Our architecture, called MOVAD, relies on two main modules: a Short-Term Memory Module, implemented by a Video Swin Transformer (VST), which extracts information related to the ongoing action, and a Long-Term Memory Module, injected inside the classifier, which also considers remote past information and action context thanks to a Long Short-Term Memory (LSTM) network.
The strengths of MOVAD lie not only in its excellent performance, but also in its straightforward and modular architecture, trained in an end-to-end fashion with only RGB frames and with as few assumptions as possible, which makes it easy to implement and play with.
We evaluated the performance of our method on the Detection of Traffic Anomaly (DoTA) dataset, a challenging collection of dash-mounted camera videos of accidents.
After an extensive ablation study, MOVAD is able to reach an AUC score of 82.17%, surpassing the current state-of-the-art by
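The online, frame-level operation described above can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the MOVAD implementation: `short_term_features` is a hypothetical stand-in for the VST, `LongTermState` is a hypothetical stand-in for the LSTM inside the classifier, and frames are plain floats rather than RGB images. What it does show is the control flow: a fixed-length buffer of recent frames, a state carried across the whole video, and one score emitted per incoming frame without looking at future frames.

```python
from collections import deque


def short_term_features(clip):
    """Toy stand-in for the short-term module: mean absolute frame-to-frame change."""
    frames = list(clip)
    if len(frames) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs)


class LongTermState:
    """Toy stand-in for the long-term module: an exponential moving average
    carried across frames (a real LSTM learns what to keep and what to forget)."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.value = 0.0

    def update(self, feature):
        self.value = self.decay * self.value + (1.0 - self.decay) * feature
        return self.value


def online_scores(frames, clip_length=4, decay=0.9):
    """Yield one score per incoming frame, using only past and current frames."""
    clip = deque(maxlen=clip_length)  # short-term memory: the most recent frames
    state = LongTermState(decay)      # long-term memory: persists over the video
    for frame in frames:
        clip.append(frame)
        feature = short_term_features(clip)
        context = state.update(feature)
        # Toy score: how far the current clip deviates from the long-term context.
        yield abs(feature - context)


video = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
for t, score in enumerate(online_scores(video)):
    print(t, round(score, 3))
```

Because the generator only ever sees frames up to the current one, the scores for a prefix of the video are identical to the first scores of the full video, which is the property that makes a detector usable online.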
$ git clone -b movad_vad https://github.com/IMPLabUniPr/movad.git
$ cd movad
$ wget https://github.com/SwinTransformer/storage/releases/download/v1.0.4/swin_base_patch244_window1677_sthv2.pth -O pretrained/swin_base_patch244_window1677_sthv2.pth
$ conda env create -n movad_env --file environment.yml
$ conda activate movad_env
Please download the dataset from the official website and save it inside the data/dota directory.
You should obtain the following structure:
data/dota
├── annotations
│   ├── 0qfbmt4G8Rw_000306.json
│   ├── 0qfbmt4G8Rw_000435.json
│   ├── 0qfbmt4G8Rw_000602.json
│   ...
├── frames
│   ├── 0qfbmt4G8Rw_000072
│   ├── 0qfbmt4G8Rw_000306
│   ├── 0qfbmt4G8Rw_000435
│   ...
└── metadata
    ├── metadata_train.json
    ├── metadata_val.json
    ├── train_split.txt
    └── val_split.txt
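Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `missing_dota_entries` is a hypothetical helper, and it only checks the directory and file names listed in the tree above, not the contents of the annotation or frame folders.

```python
from pathlib import Path

# Entries taken from the directory tree above.
EXPECTED_DIRS = ("annotations", "frames", "metadata")
EXPECTED_METADATA_FILES = (
    "metadata_train.json",
    "metadata_val.json",
    "train_split.txt",
    "val_split.txt",
)


def missing_dota_entries(root="data/dota"):
    """Return the expected paths (relative to `root`) that are not present."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [
        f"metadata/{name}"
        for name in EXPECTED_METADATA_FILES
        if not (root / "metadata" / name).is_file()
    ]
    return missing


missing = missing_dota_entries()
print("layout looks complete" if not missing else f"missing: {missing}")
```

Run it from the repository root so that the default `data/dota` path resolves correctly.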
Open the Release v1.0 page and download the .pt (pretrained model) and .pkl (results) files.
Unzip them inside the output directory, obtaining the following directory structure:
output/
├── v4_1
│   ├── checkpoints
│   │   └── model-640.pt
│   └── eval
│       └── results-640.pkl
└── v4_2
    ├── checkpoints
    │   └── model-690.pt
    └── eval
        └── results-690.pkl
python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase train --epochs 1000 --epoch -1
python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase test --epoch 690
python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase play --epoch 690
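The commands above take an explicit `--epoch` value. The sketch below lists which epochs have a checkpoint on disk and assembles the test command for the most recent one. It rests on an assumption inferred from the examples, not confirmed by the source: that the number in `model-<N>.pt` is the value expected by `--epoch`. `available_epochs` and `build_test_command` are hypothetical helpers, and only flags that appear in the commands above are used.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from model-640.pt and model-690.pt.
CHECKPOINT_RE = re.compile(r"model-(\d+)\.pt")


def available_epochs(output_dir):
    """Return the sorted epoch numbers found in `<output_dir>/checkpoints`."""
    ckpt_dir = Path(output_dir) / "checkpoints"
    if not ckpt_dir.is_dir():
        return []
    epochs = []
    for path in ckpt_dir.iterdir():
        match = CHECKPOINT_RE.fullmatch(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return sorted(epochs)


def build_test_command(config, output_dir, epoch):
    """Build the argument list for the test phase, mirroring the command above."""
    return [
        "python", "src/main.py",
        "--config", str(config),
        "--output", str(output_dir),
        "--phase", "test",
        "--epoch", str(epoch),
    ]


epochs = available_epochs("output/v4_2")
if epochs:
    print(" ".join(build_test_command("cfgs/v4_2.yml", "output/v4_2/", epochs[-1])))
```

The returned list can be passed to `subprocess.run` as-is if you prefer to launch the evaluation from Python.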
Memory modules effectiveness.
# | Short-term | Long-term | AUC | Conf |
---|---|---|---|---|
1 | | | 66.53 | conf |
2 | X | | 74.46 | conf |
3 | | X | 68.76 | conf |
4 | X | X | 79.21 | conf |
Short-term memory module.
Name | Conf |
---|---|
NF 1 | conf |
NF 2 | conf |
NF 3 | conf |
NF 4 | conf |
NF 5 | conf |
Long-term memory module.
Name | Conf |
---|---|
w/out LSTM | conf |
LSTM (1 cell) | conf |
LSTM (2 cells) | conf |
LSTM (3 cells) | conf |
LSTM (4 cells) | conf |
Video clip length (VCL).
Name | Conf |
---|---|
4 frames | conf |
8 frames | conf |
12 frames | conf |
16 frames | conf |
Comparison with the state of the art.
# | Method | Input | AUC | Conf |
---|---|---|---|---|
9 | Our (MOVAD) | RGB (320x240) | 80.09 | conf |
10 | Our (MOVAD) | RGB (640x480) | 82.17 | conf |
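For readers unfamiliar with the metric, the sketch below shows one way to compute an AUC from per-frame anomaly scores and binary frame labels. It assumes the AUC column denotes the area under the ROC curve over frames; the repository's own evaluation code may differ, for instance in how frames from different videos are pooled. `roc_auc` is a hypothetical helper written with the standard library only, using the rank-sum (Mann-Whitney U) formulation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: sequence of 0 (normal frame) or 1 (anomalous frame)
    scores: sequence of anomaly scores, higher means more anomalous
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative frame")

    # Give tied scores their average 1-based rank, so a tie counts as half a win.
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to scores that carry no information about the labels, and 1.0 to scores that rank every anomalous frame above every normal one.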
See GPL v2 License.
This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy.
If you find our work useful in your research, please cite:
@inproceedings{rossi2024memory,
title={Memory-augmented Online Video Anomaly Detection},
author={Rossi, Leonardo and Bernuzzi, Vittorio and Fontanini, Tomaso and Bertozzi, Massimo and Prati, Andrea},
booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={6590--6594},
year={2024},
organization={IEEE}
}