This repository contains the implementation code for the paper "Pair-wise Layer Attention with Spatial Masking for Video Prediction".
Video prediction generates future frames from historical frames and has shown great potential in many applications, e.g., meteorological prediction and autonomous driving. Previous works often decode only the final high-level semantic features into future frames, which lack texture details and therefore degrade prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module that enhances the layer-wise semantic dependency of the feature maps derived from the U-shaped structure in the Translator by coupling low-level visual cues with high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics with the Translator but fail to sufficiently utilize the spatial features of the Encoder. This inspires us to design a Spatial Masking (SM) module that masks part of the encoding features during pretraining, which increases the visibility of the remaining feature pixels to the Decoder. To this end, we present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video prediction that captures the spatiotemporal dynamics, which reflect the motion trend.
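
To make the Spatial Masking idea more concrete, here is a minimal NumPy sketch of masking a fraction of the spatial positions of an encoder feature map. It is an illustration only, not the code in this repository: the function name `random_spatial_mask`, the `mask_ratio` default, the per-position (rather than per-patch) granularity, and the `(batch, channels, height, width)` layout are all assumptions.

```python
import numpy as np


def random_spatial_mask(features, mask_ratio=0.5, seed=None):
    """Zero out a random subset of spatial positions (hypothetical helper).

    features: array shaped (batch, channels, height, width).
    Returns the masked features and the binary mask (1 = visible, 0 = masked).
    """
    rng = np.random.default_rng(seed)
    b, _, h, w = features.shape
    num_masked = int(round(mask_ratio * h * w))

    # One independent mask per sample, shared across all channels.
    mask = np.ones((b, h * w), dtype=features.dtype)
    for i in range(b):
        hidden = rng.permutation(h * w)[:num_masked]
        mask[i, hidden] = 0

    mask = mask.reshape(b, 1, h, w)
    return features * mask, mask


if __name__ == "__main__":
    feats = np.ones((2, 3, 4, 4), dtype=np.float32)
    masked, mask = random_spatial_mask(feats, mask_ratio=0.5, seed=0)
    print(masked.shape, int(mask.sum()))
```

In this sketch the decoder would only see the positions where the mask is 1. How the actual SM module selects and handles masked positions during pretraining should be checked against the repository code.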
- torch=1.9.0
- scikit-image=0.19.3
- numpy=1.21.5
- argparse
- tqdm=4.64.1
- addict=2.4.0
- fvcore=0.1.5
- hickle=5.0.2
- opencv-python=4.6.0
- pandas=1.3.5
- pillow=9.2.0
- MinkowskiEngine (see `INSTALL.md` in the `facebookresearch/ConvNeXt-V2` GitHub repository)
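
A quick way to compare an environment against the pinned versions above is sketched below, using only the standard library. The helper name `check_requirements` and the prefix-matching rule are assumptions made for illustration (prefix matching tolerates local build suffixes such as `+cu111`); `argparse` and MinkowskiEngine are left out because the list gives no version for them.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the dependency list above.
PINNED = {
    "torch": "1.9.0",
    "scikit-image": "0.19.3",
    "numpy": "1.21.5",
    "tqdm": "4.64.1",
    "addict": "2.4.0",
    "fvcore": "0.1.5",
    "hickle": "5.0.2",
    "opencv-python": "4.6.0",
    "pandas": "1.3.5",
    "pillow": "9.2.0",
}


def check_requirements(pins):
    """Return {name: (wanted, installed_or_None, matches)} (hypothetical helper)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Prefix match: "1.9.0+cu111" counts as matching "1.9.0".
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (wanted, installed, ok)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_requirements(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:15s} wanted={wanted:8s} installed={installed} [{status}]")
```

A mismatch does not necessarily mean the code will fail; the list records the versions the authors report, and nearby versions may also work.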
- `simvp/api` contains an experiment runner.
- `simvp/core` contains core training plugins and metrics.
- `simvp/datasets` contains datasets and dataloaders.
- `simvp/methods/` contains training methods for various video prediction methods.
- `simvp/models/` contains the main network architectures of various video prediction methods.
- `simvp/modules/` contains network modules and layers.
- `tools/non_dist_train.py` is the executable Python file with possible arguments for training, validating, and testing pipelines.
cd ./data/moving_mnist
bash download_mmnist.sh  # download the Moving MNIST dataset
python main_pretrain.py  # pretraining stage
python main_train.py  # training stage
Method | MSE | MAE | SSIM |
---|---|---|---|
PLA-SM | 18.4 | 57.6 | 0.960 |
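
For readers unfamiliar with how MSE and MAE can reach values such as those in the table, the sketch below shows one common convention in video prediction codebases: errors are summed over the pixels of each frame and averaged over the batch and time axes. Whether this repository uses exactly this convention is an assumption; the function names and the `(batch, time, channels, height, width)` layout are also illustrative.

```python
import numpy as np


def mse_per_frame(pred, true):
    """Squared error summed over each frame, averaged over batch and time."""
    return float(np.mean((pred - true) ** 2, axis=(0, 1)).sum())


def mae_per_frame(pred, true):
    """Absolute error summed over each frame, averaged over batch and time."""
    return float(np.mean(np.abs(pred - true), axis=(0, 1)).sum())


if __name__ == "__main__":
    # Toy data: 2 clips, 3 frames each, 1 channel, 4x4 pixels.
    pred = np.zeros((2, 3, 1, 4, 4))
    true = np.full((2, 3, 1, 4, 4), 0.5)
    print(mse_per_frame(pred, true))  # 0.25 per pixel * 16 pixels = 4.0
    print(mae_per_frame(pred, true))  # 0.5 per pixel * 16 pixels = 8.0
```

Under this convention the numbers scale with frame resolution, so they are only comparable between methods evaluated at the same frame size and value range.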
If you find this repo useful, please cite the following paper.
@article{li-PLA-SM,
author = {Ping Li and Chenhan Zhang and Zheng Yang and Xianghua Xu and Mingli Song},
title = {Pair-wise Layer Attention with Spatial Masking for Video Prediction},
journal = {arXiv preprint arXiv:2311.11289},
year = {2023},
url = {https://arxiv.org/abs/2311.11289}
}
If you have any questions, please feel free to contact Mr. Zhang Chenhan via email (zch2020@hdu.edu.cn).
We would like to thank the authors of SimVP for making their source code public, which significantly accelerated the development of PLA-SM.