Memory-augmented Online Video Anomaly Detection (MOVAD)

Official PyTorch implementation of MOVAD, paper accepted to ICASSP 2024.

We propose MOVAD, a brand new architecture for online (frame-level) video anomaly detection.

Authors: Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini, Massimo Bertozzi, Andrea Prati.

IMP Lab - Dipartimento di Ingegneria e Architettura

University of Parma, Italy

Abstract

The ability to understand the surrounding scene is of paramount importance for Autonomous Vehicles (AVs).

This paper presents a system capable of working in an online fashion, giving an immediate response to anomalies arising around the AV, exploiting only the videos captured by a dash-mounted camera.

Our architecture, called MOVAD, relies on two main modules: a Short-Term Memory Module, implemented by a Video Swin Transformer (VST), that extracts information related to the ongoing action, and a Long-Term Memory Module, injected inside the classifier, that also considers remote past information and action context thanks to a Long Short-Term Memory (LSTM) network.

The strengths of MOVAD lie not only in its excellent performance, but also in its straightforward and modular architecture, trained in an end-to-end fashion on RGB frames only and with as few assumptions as possible, which makes it easy to implement and play with.

We evaluated the performance of our method on the Detection of Traffic Anomaly (DoTA) dataset, a challenging collection of dash-mounted camera videos of accidents.

After an extensive ablation study, MOVAD reaches an AUC score of 82.17%, surpassing the current state of the art by 2.87 AUC points.

Usage

Installation

$ git clone --branch movad_vad https://github.com/IMPLabUniPr/movad.git
$ cd movad
$ wget https://github.com/SwinTransformer/storage/releases/download/v1.0.4/swin_base_patch244_window1677_sthv2.pth -O pretrained/swin_base_patch244_window1677_sthv2.pth
$ conda env create -n movad_env --file environment.yml
$ conda activate movad_env

Download DoTA dataset

Please download the dataset from the official website and save it inside the data/dota directory.

You should obtain the following structure:

data/dota
├── annotations
│   ├── 0qfbmt4G8Rw_000306.json
│   ├── 0qfbmt4G8Rw_000435.json
│   ├── 0qfbmt4G8Rw_000602.json
│   ...
├── frames
│   ├── 0qfbmt4G8Rw_000072
│   ├── 0qfbmt4G8Rw_000306
│   ├── 0qfbmt4G8Rw_000435
│   ...
└── metadata
    ├── metadata_train.json
    ├── metadata_val.json
    ├── train_split.txt
    └── val_split.txt
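
Before training, it can help to confirm that the downloaded files match the tree above. The following is a minimal sketch, not part of the repository: the function name `check_dota_layout` is hypothetical, and the rule that every annotation file has a frames folder with the same name is an assumption inferred from the listing.

```python
from pathlib import Path

# Entries expected under data/dota, taken from the tree above.
EXPECTED_DIRS = ("annotations", "frames", "metadata")
EXPECTED_METADATA = (
    "metadata_train.json",
    "metadata_val.json",
    "train_split.txt",
    "val_split.txt",
)


def check_dota_layout(root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {root / name}")
    for name in EXPECTED_METADATA:
        if not (root / "metadata" / name).is_file():
            problems.append(f"missing file: {root / 'metadata' / name}")
    # Assumption: every annotations/<clip>.json has a matching frames/<clip> folder.
    ann_dir, frames_dir = root / "annotations", root / "frames"
    if ann_dir.is_dir() and frames_dir.is_dir():
        for ann in sorted(ann_dir.glob("*.json")):
            if not (frames_dir / ann.stem).is_dir():
                problems.append(f"no frames folder for annotation: {ann.name}")
    return problems


if __name__ == "__main__":
    for problem in check_dota_layout("data/dota"):
        print(problem)
```

Running the script from the repository root prints one line per problem and nothing when the layout matches.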

Download models pretrained on DoTA dataset

Open the Release v1.0 page and download the .pt (pretrained model) and .pkl (results) files. Unzip them inside the output directory to obtain the following directory structure:

output/
├── v4_1
│   ├── checkpoints
│   │   └── model-640.pt
│   └── eval
│       └── results-640.pkl
└── v4_2
    ├── checkpoints
    │   └── model-690.pt
    └── eval
        └── results-690.pkl
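
The checkpoint names above carry a number (640, 690) that matches the value passed to `--epoch` in the commands below. Assuming that this number is indeed the epoch to pass (the README does not state it explicitly), a small helper can list which epochs are available for a given output directory. The function name `available_epochs` is hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Matches file names such as model-690.pt and captures the number.
CHECKPOINT_RE = re.compile(r"^model-(\d+)\.pt$")


def available_epochs(output_dir):
    """Return the sorted epoch numbers found in <output_dir>/checkpoints."""
    ckpt_dir = Path(output_dir) / "checkpoints"
    if not ckpt_dir.is_dir():
        return []
    epochs = []
    for path in ckpt_dir.iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return sorted(epochs)


if __name__ == "__main__":
    for version in ("output/v4_1", "output/v4_2"):
        print(version, available_epochs(version))
```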

Train

python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase train --epochs 1000 --epoch -1

Eval

python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase test --epoch 690

Play: generate video

python src/main.py --config cfgs/v4_2.yml --output output/v4_2/ --phase play --epoch 690

Results

Table 1

Memory modules effectiveness.

#	Short-term	Long-term	AUC	Conf
1			66.53	conf
2	X		74.46	conf
3		X	68.76	conf
4	X	X	79.21	conf

Figure 2

Short-term memory module.

Name	Conf
NF 1	conf
NF 2	conf
NF 3	conf
NF 4	conf
NF 5	conf

Figure 3

Long-term memory module.

Name	Conf
w/out LSTM	conf
LSTM (1 cell)	conf
LSTM (2 cells)	conf
LSTM (3 cells)	conf
LSTM (4 cells)	conf

Figure 4

Video clip length (VCL).

Name	Conf
4 frames	conf
8 frames	conf
12 frames	conf
16 frames	conf
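
The video clip length is the number of consecutive frames fed to the short-term memory module for each prediction. As an illustration of what online operation means here, the sketch below builds one clip per incoming frame from current and past frames only. It is not the repository's data loader: the name `sliding_clips` is hypothetical, and waiting until `clip_len` frames have arrived before emitting the first clip is an assumption.

```python
from collections import deque


def sliding_clips(frames, clip_len):
    """Yield (frame_index, clip) pairs, where clip holds the last clip_len frames.

    Only current and past frames are used, so the loop can run online.
    Assumption: nothing is emitted until clip_len frames have been seen.
    """
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    window = deque(maxlen=clip_len)  # oldest frame is dropped automatically
    for index, frame in enumerate(frames):
        window.append(frame)
        if len(window) == clip_len:
            yield index, list(window)


if __name__ == "__main__":
    # Integers stand in for decoded RGB frames.
    for index, clip in sliding_clips(range(6), clip_len=4):
        print(index, clip)
```

With six frames and a clip length of 4, the loop emits clips for frames 3, 4 and 5, each ending at the current frame.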

Table 2

Comparison with the state of the art.

#	Method	Input	AUC	Conf
9	Ours (MOVAD)	RGB (320x240)	80.09	conf
10	Ours (MOVAD)	RGB (640x480)	82.17	conf
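
The AUC values in the tables are areas under the ROC curve. The README does not describe the exact evaluation protocol, so the following is only a sketch of how such a score can be computed from per-frame labels and anomaly scores, assuming all test frames are pooled together. It uses the rank-sum (Mann-Whitney U) formulation in plain Python, and the name `frame_level_auc` is hypothetical.

```python
def frame_level_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for anomalous frames, 0 for normal frames.
    scores: predicted anomaly score per frame (higher means more anomalous).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one frame of each class")

    # Sort by score and give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    # U counts (anomalous, normal) pairs ranked correctly, ties counting half.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    print(frame_level_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the example, three of the four (anomalous, normal) frame pairs are ordered correctly, which gives 0.75. A score of 0.5 corresponds to random ranking and 1.0 to perfect separation.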

License

See GPL v2 License.

Acknowledgement

This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy.

Citation

If you find our work useful in your research, please cite:

@inproceedings{rossi2024memory,
  title={Memory-augmented Online Video Anomaly Detection},
  author={Rossi, Leonardo and Bernuzzi, Vittorio and Fontanini, Tomaso and Bertozzi, Massimo and Prati, Andrea},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6590--6594},
  year={2024},
  organization={IEEE}
}
