Group-based Bi-Directional Recurrent Wavelet Neural Network for Efficient Video Super-Resolution (VSR)
This repository is the official PyTorch implementation of the paper published in Pattern Recognition Letters (Elsevier).
Video super-resolution (VSR) is an important technology for enhancing the quality of video frames. The recurrent neural network (RNN)-based approach is suitable for sequential data because it can use accumulated temporal information. However, since existing methods tend to capture only slow and symmetrical motion at low frame rates, they remain limited in restoring the missing details of more dynamic motion. Most previous methods that use spatial information treat different types of spatial features identically, which limits their ability to extract meaningful information and enhance fine details. We propose a group-based bi-directional recurrent wavelet neural network (GBR-WNN) to exploit spatio-temporal information effectively. The proposed group-based bi-directional RNN (GBR) framework is built on a well-structured process over the group of pictures (GOP). In a GOP, we resolve the low-resolution (LR) frames from the border frames to the center target frame. Because super-resolved features in a GOP are cumulative, neighboring features are improved progressively and asymmetrical motion can be dealt with. We also propose a temporal wavelet attention (TWA) module that applies attention to spatial and temporal features simultaneously, based on the discrete wavelet transform. Experiments show that the proposed scheme achieves superior performance compared with state-of-the-art methods.
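The border-to-center processing order described above can be illustrated with a small sketch. This is not the repository's implementation; it only enumerates, for a GOP of odd length, the order in which a forward branch and a backward branch would visit frames before reaching the center target frame. The helper name `gop_processing_order` and the GOP length of 7 are assumptions made for illustration.

```python
from typing import List, Tuple


def gop_processing_order(gop_size: int) -> Tuple[List[int], List[int], int]:
    """Return (forward_order, backward_order, center_index) for one GOP.

    Hypothetical helper for illustration only: the forward branch walks
    from the first frame toward the center, the backward branch walks from
    the last frame toward the center, and the center frame is the target.
    """
    if gop_size < 3 or gop_size % 2 == 0:
        raise ValueError("gop_size must be an odd integer >= 3")
    center = gop_size // 2
    forward = list(range(0, center))                  # first frame ... center-1
    backward = list(range(gop_size - 1, center, -1))  # last frame ... center+1
    return forward, backward, center


if __name__ == "__main__":
    # A GOP length of 7 is an assumption, not a value stated in this README.
    fwd, bwd, target = gop_processing_order(7)
    print("forward branch :", fwd)
    print("backward branch:", bwd)
    print("target frame   :", target)
```

Both branches end next to the center frame, which matches the idea that features accumulated from each border contribute to the target frame.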
- Anaconda3
- Python == 3.6
  conda create --name gbrwnn python=3.6
- Trained on PyTorch 1.8.1 CUDA 10.2
  conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=10.2 -c pytorch
- tqdm, pyyaml, tensorboard, opencv-python, lmdb
  conda install -c conda-forge tqdm pyyaml tensorboard
  pip install opencv-python
  pip install lmdb
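After installing the packages above, a quick check that they are importable can save a failed training run. The following is a minimal sketch using only the standard library; the helper name `check_modules` is hypothetical and not part of this repository. Note that some import names differ from the install names (`pyyaml` is imported as `yaml`, `opencv-python` as `cv2`).

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the dependencies listed above.
REQUIRED_MODULES = ["torch", "torchvision", "tqdm", "yaml", "tensorboard", "cv2", "lmdb"]


def check_modules(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it inside the `gbrwnn` environment; any module reported as MISSING should be installed before training or testing.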
We used the Vimeo90K dataset for training and the Vid4, REDS4, SPMCS, and DAVIS-2019 datasets for testing.
- Prepare for Vimeo90K
  - Please refer to Dataset.md in our Deep-Video-Super-Resolution repository for more details.
  - Download dataset from the official website.
  - Put the dataset in ./datasets/
  - Generate LR data
    Run in ./codes/data_processing_scripts/
    python generate_LR_Vimeo90K.py
  - Generate LMDB
    Run in ./codes/data_processing_scripts/
    python generate_lmdb_Vimeo90K.py
- Prepare for Vid4
  - Please refer to Dataset.md in our Deep-Video-Super-Resolution repository for more details.
  - Download dataset from here.
  - Put the dataset in ./datasets/
  - Generate LR data
    Run in ./codes/data_processing_scripts/
    python generate_LR_Vid4.py
- Prepare for REDS4
  - Please refer to Dataset.md in our Deep-Video-Super-Resolution repository for more details.
  - Download dataset from the official website.
  - Put the dataset in ./datasets/
- Prepare for SPMCS
  - Download dataset from here.
  - Put the dataset in ./datasets/
  - Generate LR data
    Run in ./codes/data_processing_scripts/
    python generate_LR_SPMCS.py
- Prepare for DAVIS-2019
  - Download dataset from the official website.
  - Put the dataset in ./datasets/
  - Generate LR data
    Run in ./codes/data_processing_scripts/
    python generate_LR_DAVIS.py
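The preparation steps above can be collected into one small driver, sketched below. It assumes the datasets have already been downloaded into ./datasets/ and that it is launched from the repository root; the helper names `build_commands` and `run_preparation` are hypothetical and not part of this repository. Script names and the working directory are taken from the steps above, and REDS4 is omitted because no generation script is listed for it.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List

# Directory the scripts are run from, as stated in the steps above.
SCRIPTS_DIR = Path("./codes/data_processing_scripts")

# Scripts per dataset, in the order they are listed above.
PREP_SCRIPTS: Dict[str, List[str]] = {
    "Vimeo90K": ["generate_LR_Vimeo90K.py", "generate_lmdb_Vimeo90K.py"],
    "Vid4": ["generate_LR_Vid4.py"],
    "SPMCS": ["generate_LR_SPMCS.py"],
    "DAVIS-2019": ["generate_LR_DAVIS.py"],
}


def build_commands(dataset: str) -> List[List[str]]:
    """Return the commands to run, in order, for one dataset."""
    return [[sys.executable, script] for script in PREP_SCRIPTS[dataset]]


def run_preparation(dataset: str, dry_run: bool = True) -> None:
    """Print the commands for one dataset, and execute them unless dry_run is True."""
    for cmd in build_commands(dataset):
        print("cd", SCRIPTS_DIR, "&&", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing script.
            subprocess.run(cmd, cwd=SCRIPTS_DIR, check=True)


if __name__ == "__main__":
    for name in PREP_SCRIPTS:
        run_preparation(name, dry_run=True)
```

With `dry_run=True` the sketch only prints what it would run, which is a safe way to confirm the order before processing the full datasets.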
Pre-trained models are available at the link below.
Run in ./codes/
- GBR-WNN-L
  Using a single GPU
  python train.py -opt options/train/train_GBRWNN_L.yml
  Using multiple GPUs (nproc_per_node is the number of GPUs), with CUDA_VISIBLE_DEVICES set through the .yml file
  For example, set 'gpu_ids: [0,1,2,3,4,5,6,7]' in the .yml file for 8 GPUs
  python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 train.py -opt options/train/train_GBRWNN_L.yml --launcher pytorch
- GBR-WNN-M
  Using a single GPU
  python train.py -opt options/train/train_GBRWNN_M.yml
  Using multiple GPUs (nproc_per_node is the number of GPUs), with CUDA_VISIBLE_DEVICES set through the .yml file
  For example, set 'gpu_ids: [0,1,2,3,4,5,6,7]' in the .yml file for 8 GPUs
  python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 train.py -opt options/train/train_GBRWNN_M.yml --launcher pytorch
- GBR-WNN-S
  Using a single GPU
  python train.py -opt options/train/train_GBRWNN_S.yml
  Using multiple GPUs (nproc_per_node is the number of GPUs), with CUDA_VISIBLE_DEVICES set through the .yml file
  For example, set 'gpu_ids: [0,1,2,3,4,5,6,7]' in the .yml file for 8 GPUs
  python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 train.py -opt options/train/train_GBRWNN_S.yml --launcher pytorch
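The three variants differ only in the option file, and the single-GPU and multi-GPU commands differ only in the launcher arguments. The sketch below assembles the same commands shown above for a chosen variant and GPU count. The helper names `build_train_command` and `gpu_ids_line` are hypothetical and not part of this repository; the flags and paths are copied from the commands above.

```python
from typing import List

VARIANTS = ("L", "M", "S")


def build_train_command(variant: str, num_gpus: int = 1, master_port: int = 4321) -> List[str]:
    """Build the training command for one GBR-WNN variant as an argument list."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    opt = f"options/train/train_GBRWNN_{variant}.yml"
    if num_gpus == 1:
        return ["python", "train.py", "-opt", opt]
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",  # one process per GPU
        f"--master_port={master_port}",
        "train.py", "-opt", opt,
        "--launcher", "pytorch",
    ]


def gpu_ids_line(num_gpus: int) -> str:
    """Return the matching 'gpu_ids' entry to place in the .yml file."""
    return "gpu_ids: [" + ",".join(str(i) for i in range(num_gpus)) + "]"


if __name__ == "__main__":
    print(" ".join(build_train_command("M", num_gpus=4)))
    print(gpu_ids_line(4))
```

Keeping `nproc_per_node` and the `gpu_ids` list in sync is the main point: both should describe the same number of GPUs.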
Run in ./codes/
python test.py
You can test the GBR-WNN-L, GBR-WNN-M, and GBR-WNN-S models on the Vid4, REDS4, SPMCS, and DAVIS-2019 test datasets by modifying 'model_mode' and 'data_mode' in the source code.
@article{choi2022group,
title={Group-based bi-directional recurrent wavelet neural network for efficient video super-resolution (VSR)},
author={Choi, Young-Ju and Lee, Young-Woon and Kim, Byung-Gyu},
journal={Pattern Recognition Letters},
volume={164},
pages={246--253},
year={2022},
publisher={Elsevier}
}
The code is heavily based on EDVR and BasicSR. We thank the authors for their excellent work.
EDVR :
Wang, Xintao, et al. "EDVR: Video restoration with enhanced deformable convolutional networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2019.
BasicSR :
@misc{basicsr,
author = {Xintao Wang and Liangbin Xie and Ke Yu and Kelvin C.K. Chan and Chen Change Loy and Chao Dong},
title = {{BasicSR}: Open Source Image and Video Restoration Toolbox},
howpublished = {\url{https://github.com/XPixelGroup/BasicSR}},
year = {2022}
}