Single Image Super-Resolution with WDSR, EDSR and SRGAN
A Keras-based implementation of
- Wide Activation for Efficient and Accurate Image Super-Resolution (WDSR), winner of the NTIRE 2018 super-resolution challenge.
- Enhanced Deep Residual Networks for Single Image Super-Resolution (EDSR), winner of the NTIRE 2017 super-resolution challenge.
- Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (SRGAN).
This project also supports fine-tuning of EDSR models as generators in SRGAN-like networks.
Table of contents
- Environment setup
- Getting started
- Dataset
- Training
- Evaluation
- Pre-trained models
- JPEG compression
- Weight normalization
- Other implementations
- Limitations
Environment setup

On a system with a GPU, create a new conda environment with *)

```
conda env create -f environment-gpu.yml
```

On a system without a GPU, create an environment with

```
conda env create -f environment-cpu.yml
```

Activate the environment with

```
source activate sisr
```
*) It is assumed that appropriate CUDA and cuDNN versions for the current tensorflow-gpu version are already installed on your system. These libraries are not automatically installed when creating the environment.
Getting started

This section uses pre-trained models to super-resolve images with factor x4.

WDSR
Here, the pre-trained WDSR-A model wdsr-a-32-x4 is used. Click on the link to download the model. It is an experimental model, not described in the WDSR paper, that was trained with a pixel-wise loss function (mean absolute error). Assuming that the path to the downloaded model is ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5, the following command super-resolves the images in directory ./demo with factor x4 and writes the results to directory ./output:

```
python demo.py -i ./demo -o ./output --model ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5
```

The ./output directory only contains the super-resolved images. Below are figures that additionally compare the super-resolution (SR) results with the corresponding low-resolution (LR) and high-resolution (HR) images and an x4 resize with bicubic interpolation (code for generating these figures is not included yet).
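At its core, demo.py loads the saved Keras model and runs it on each input image. The following is a minimal sketch of that idea, assuming the checkpoint can be deserialized with load_model and using hypothetical file names; the actual script handles naming and batching differently:

```python
# Minimal sketch only; file names are hypothetical and demo.py is the
# authoritative implementation.
import os
import numpy as np
from PIL import Image
from keras.models import load_model

# compile=False avoids having to register the project's custom optimizer.
model = load_model(os.path.expanduser('~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5'),
                   compile=False)

lr = np.asarray(Image.open('./demo/some-image.png'))    # hypothetical input image
sr = model.predict(np.expand_dims(lr, axis=0))[0]       # add, then strip, batch axis
sr = np.clip(sr, 0, 255).astype('uint8')
Image.fromarray(sr).save('./output/some-image-sr.png')  # hypothetical output path
```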
EDSR + SRGAN
A problem with pixel-wise loss functions is that they fail to recover high-frequency details. Super-resolution results are typically overly smooth with lower perceptual quality, especially at scale x4. A perceptual loss as described in the SRGAN paper (a combination of a VGG-based content loss and an adversarial loss) is able to generate more realistic textures with higher perceptual quality but at the cost of lower PSNR values.
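As a rough Keras sketch of these two loss components (layer choice and weighting follow the SRGAN paper; VGG input preprocessing is omitted, and train_gan.py's implementation may differ):

```python
import keras.backend as K
from keras.applications.vgg19 import VGG19
from keras.models import Model

# Feature extractor for the VGG54 content loss: features of the 4th conv
# layer in block 5 of a fixed, ImageNet-trained VGG19.
vgg = VGG19(weights='imagenet', include_top=False, input_shape=(96, 96, 3))
vgg54 = Model(vgg.input, vgg.get_layer('block5_conv4').output)
vgg54.trainable = False

def content_loss(hr, sr):
    # MSE in VGG feature space instead of pixel space.
    return K.mean(K.square(vgg54(hr) - vgg54(sr)))

def adversarial_loss(d_sr):
    # d_sr: discriminator output for generated images; the generator is
    # rewarded when the discriminator classifies them as real (label 1).
    return K.binary_crossentropy(K.ones_like(d_sr), d_sr)

def perceptual_loss(hr, sr, d_sr, adversarial_weight=1e-3):
    return content_loss(hr, sr) + adversarial_weight * K.mean(adversarial_loss(d_sr))
```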
An EDSR baseline model that has been fine-tuned as a generator in an SRGAN-like network can be downloaded from here. Please note that support for SRGAN training is still work in progress. Assuming that the path to the downloaded model is ~/Downloads/edsr-16-x4-gen-epoch-088.h5, the following command super-resolves the image in ./demo/gan with factor x4 and writes the result to directory ./output:

```
python demo.py -i ./demo/gan -o ./output --model ~/Downloads/edsr-16-x4-gen-epoch-088.h5
```

The ./output directory only contains the super-resolved image. The following figure additionally compares the result
with that obtained from an EDSR model that has been trained with a pixel-wise loss only (mean squared error). One can
clearly see how training with a perceptual loss in a GAN improves recovery of high-frequency content.
Dataset

If you want to train and evaluate models, you need to download the DIV2K dataset and extract the downloaded archives to a directory of your choice (DIV2K in the following example). The resulting directory structure should look like:

```
DIV2K
  DIV2K_train_HR
  DIV2K_train_LR_bicubic
    X2
    X3
    X4
  DIV2K_train_LR_unknown
    X2
    X3
    X4
  DIV2K_valid_HR
  DIV2K_valid_LR_bicubic
    ...
  DIV2K_valid_LR_unknown
    ...
```
You only need to download DIV2K archives for those downgrade operators (unknown, bicubic) and super-resolution scales (x2, x3, x4) that you'll actually use for training.
Before the DIV2K images can be used, they must be converted to numpy arrays and stored in a separate location. Conversion to numpy arrays dramatically reduces image pre-processing times. Conversion can be done with the convert.py script:

```
python convert.py -i ./DIV2K -o ./DIV2K_BIN numpy
```

In this example, converted images are written to the DIV2K_BIN directory. By default, training and evaluation scripts read from this directory; this can be overridden with the --dataset command line option.
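Conceptually, the conversion just decodes each PNG once and stores the raw pixel array. A hedged sketch of what it amounts to (see convert.py for the actual implementation and naming scheme):

```python
import os
import numpy as np
from PIL import Image

def convert_to_numpy(src_dir, dst_dir):
    for root, _, files in os.walk(src_dir):
        for name in files:
            if not name.lower().endswith('.png'):
                continue
            img = np.asarray(Image.open(os.path.join(root, name)))
            out_dir = os.path.join(dst_dir, os.path.relpath(root, src_dir))
            os.makedirs(out_dir, exist_ok=True)
            # Loading .npy files later is much faster than decoding PNGs.
            np.save(os.path.join(out_dir, os.path.splitext(name)[0] + '.npy'), img)

convert_to_numpy('./DIV2K', './DIV2K_BIN')
```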
Training

Pixel-wise loss

WDSR, EDSR and SRResNet *) models can be trained with a pixel-wise loss function using train.py. The default loss for WDSR and EDSR is mean absolute error; for SRResNet it is mean squared error. For example, a WDSR-A baseline model with 8 residual blocks can be trained for scale x2 with

```
python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 2
```
The --dataset option sets the location of the DIV2K dataset and the --outdir option the output directory (defaults to ./output). Each training run creates a timestamped sub-directory in the specified output directory, which contains saved models, all command line options (default and user-defined) in an args.txt file, as well as TensorBoard logs. The super-resolution factor is set with the --scale option. The downgrade operator can be set with the --downgrade option; it defaults to bicubic and can be changed to bicubic_jpeg_90 (see also section JPEG compression).
By default, the model is validated against randomly cropped images from the DIV2K validation set. If you instead want to evaluate the model against full-sized DIV2K validation images after each epoch, you need to set the --benchmark command line option. This significantly slows down training, however, and only makes sense for smaller models. Alternatively, you can evaluate saved models later with evaluate.py, as described in section Evaluation.
To train models for higher scales (x3 or x4), it is possible to re-use the weights of models pre-trained for a smaller scale (x2). This can be done with the --pretrained-model option. For example,

```
python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 4 \
    --pretrained-model ./output/20181016-063620/models/epoch-294-psnr-34.5394.h5
```

trains a WDSR-A baseline model with 8 residual blocks for scale x4, re-using the weights of model epoch-294-psnr-34.5394.h5, a WDSR-A baseline model with the same number of residual blocks trained for scale x2.
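Conceptually, this initializes all layers whose weight shapes match between the two graphs, while the scale-specific upsampling layers start fresh. A hedged Keras sketch, where build_model is a hypothetical stand-in for the project's profile handling:

```python
# Hypothetical sketch; train.py performs this internally when
# --pretrained-model is given. build_model() does not exist under that
# name in this project.
x4_model = build_model(profile='wdsr-a-8', scale=4)
x4_model.load_weights('./output/20181016-063620/models/epoch-294-psnr-34.5394.h5',
                      by_name=True, skip_mismatch=True)
```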
For a more detailed overview of available command line options and profiles, take a look at train.py or run python train.py -h. Section Pre-trained models also shows the training command for each available model.
*) SRResNet is the super-resolution model used in the SRGAN paper.
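For orientation, the wide-activation residual block that gives the WDSR-A models trained above their name can be sketched in a few lines of Keras. This is a simplified illustration (no scaling factors or weight normalization), not the project's exact layer definitions:

```python
from keras.layers import Add, Conv2D, Input
from keras.models import Model

def res_block_a(x, num_filters=32, expansion=4):
    # Expand the channel dimension before the ReLU ("wide activation") ...
    h = Conv2D(num_filters * expansion, 3, padding='same', activation='relu')(x)
    # ... then project back to the width of the identity path.
    h = Conv2D(num_filters, 3, padding='same')(h)
    return Add()([x, h])

# Stack 8 such blocks, as in the wdsr-a-8 profile used above.
x = inputs = Input(shape=(None, None, 32))
for _ in range(8):
    x = res_block_a(x)
body = Model(inputs, x)
```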
Perceptual loss (SRGAN)
Training with a perceptual loss as described in the SRGAN paper requires a model that has been pre-trained with a pixel-wise loss. At the moment, only SRResNet and EDSR models at scale x4 are supported for SRGAN training. For example, SRResNet can be pre-trained with
```
python train.py --dataset ./DIV2K_BIN --profile sr-resnet
```
An EDSR baseline model that can be used as generator in an SRGAN-like network can be pre-trained with
```
python train.py --dataset ./DIV2K_BIN --profile edsr-gen --scale 4 --num-res-blocks 16
```
Selected models from pre-training can then be used as starting point for SRGAN training. For example,
```
python train_gan.py --dataset ./DIV2K_BIN --generator sr-resnet --label-noise 0.0 \
    --pretrained-model <path-to-pretrained-model>
```
starts SRGAN training as described in the SRGAN paper using a VGG54 content loss and SRResNet as generator whereas
```
python train_gan.py --dataset ./DIV2K_BIN --generator edsr-gen --scale 4 --num-res-blocks 16 \
    --pretrained-model <path-to-pretrained-model>
```
uses an EDSR baseline model with 16 residual blocks as generator. SRGAN training is still work in progress.
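The --label-noise option relates to a common GAN stabilization trick: perturbing the discriminator's hard 0/1 targets. Purely as an illustration of that general technique (the option's exact semantics are defined in train_gan.py):

```python
import numpy as np

def noisy_labels(real, batch_size, noise=0.05):
    # Real targets drawn from [1 - noise, 1) and fake targets from
    # [0, noise) instead of hard 1s and 0s; noise=0.0 recovers hard labels.
    if real:
        return 1.0 - noise * np.random.random(batch_size)
    return noise * np.random.random(batch_size)
```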
Evaluation

An alternative to the --benchmark training option is to evaluate saved models with evaluate.py and then select the model with the highest PSNR. For example,

```
python evaluate.py --dataset ./DIV2K_BIN -i ./output/20181016-063620/models -o eval.json
```
evaluates all models in directory ./output/20181016-063620/models and writes the results to eval.json. This JSON file maps model filenames to PSNR values. The evaluate.py script also writes the model with the best PSNR to stdout at the end of evaluation:

```
Best PSNR = 34.5394 for model ./output/20181016-063620/models/epoch-294-psnr-37.4630.h5
```
The higher PSNR value in the model filename should not be confused with the value computed by evaluate.py. The PSNR value in the filename was generated during training by validating against smaller, randomly cropped images, which tends to yield higher PSNR values.
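For reference, PSNR in dB is a direct function of the mean squared error between HR ground truth and SR output. A plain sketch (evaluate.py may differ in details such as border cropping):

```python
import numpy as np

def psnr(hr, sr, max_val=255.0):
    # Higher is better; identical images give infinity.
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    return 20.0 * np.log10(max_val) - 10.0 * np.log10(mse)
```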
Pre-trained models

The following list contains available pre-trained models. They were trained with images 1-800 from the DIV2K training set using the specified downgrade operator. Random crops and transformations were made as described in the EDSR paper. Model performance is measured in dB PSNR on the DIV2K validation set (images 801-900, RGB channels, without self-ensemble).
| Model | Scale | Residual blocks | Downgrade | Parameters | PSNR | Training |
|-------|-------|-----------------|-----------|------------|------|----------|
| edsr-16-x2 1) | x2 | 16 | bicubic | 1.37M | 34.64 dB | `python train.py --profile edsr-16 \` |
| edsr-16-x4-gen-pre 2) | x4 | 16 | bicubic | 1.52M | 28.89 dB | `python train.py --profile edsr-gen \` |
| edsr-16-x4-gen 3) | x4 | 16 | bicubic | 1.52M | n/a | `python train_gan.py --generator edsr-gen \` |
| wdsr-a-16-x2 4) | x2 | 16 | bicubic | 1.19M | 34.68 dB | `python train.py --profile wdsr-a-16 \` |
| wdsr-a-32-x2 5) | x2 | 32 | bicubic | 3.55M | 34.80 dB | `python train.py --profile wdsr-a-32 \` |
| wdsr-a-32-x4 5) | x4 | 32 | bicubic | 3.56M | 29.17 dB | `python train.py --profile wdsr-a-32 \` |
| wdsr-b-32-x2 6) | x2 | 32 | bicubic | 0.59M | 34.63 dB | `python train.py --profile wdsr-b-32 \` |
1) EDSR baseline, see also EDSR project page.
2) EDSR baseline pre-trained for usage as generator in an SRGAN-like network.
3) EDSR baseline fine-tuned as generator in an SRGAN-like network.
4) WDSR baseline, see also WDSR project page.
5) Experimental WDSR-A models trained with an expansion ratio of 6 (default is 4).
6) Experimental WDSR-B model.
JPEG compression

There is experimental support for adding JPEG compression artifacts to LR images and training with compressed images. The following commands convert bicubic downscaled DIV2K training and validation images to JPEG images with quality 90:

```
python convert.py -i ./DIV2K/DIV2K_train_LR_bicubic \
                  -o ./DIV2K/DIV2K_train_LR_bicubic_jpeg_90 \
                  --jpeg-quality 90 jpeg

python convert.py -i ./DIV2K/DIV2K_valid_LR_bicubic \
                  -o ./DIV2K/DIV2K_valid_LR_bicubic_jpeg_90 \
                  --jpeg-quality 90 jpeg
```
After having converted these JPEG images to numpy arrays, as described in section Dataset, models can be
trained with the
--downgrade bicubic_jpeg_90 option to additionally learn to recover from JPEG compression artifacts.
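In essence, the jpeg conversion mode re-encodes each LR image with lossy JPEG compression at the requested quality. A hedged one-off equivalent using PIL, with example file names:

```python
from PIL import Image

# Re-encode one bicubic-downscaled LR image at JPEG quality 90
# (file names are examples; convert.py processes whole directories).
img = Image.open('./DIV2K/DIV2K_train_LR_bicubic/X2/0001x2.png')
img.save('./DIV2K/DIV2K_train_LR_bicubic_jpeg_90/X2/0001x2.jpg', 'JPEG', quality=90)
```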
Two models trained in that manner are available as pre-trained models:
| Model | Scale | Residual blocks | Downgrade | Parameters | PSNR | Training |
|-------|-------|-----------------|-----------|------------|------|----------|
| wdsr-a-32-x2-q90 | x2 | 32 | bicubic + JPEG | 3.55M | 32.12 dB | `python train.py --profile wdsr-a-32 \` |
| wdsr-a-32-x4-q90 | x4 | 32 | bicubic + JPEG | 3.56M | 27.63 dB | `python train.py --profile wdsr-a-32 \` |
Weight normalization

WDSR models are trained with weight normalization. This branch uses a modified Adam optimizer for that purpose. The meanwhile outdated branch wip-conv2d-weight-norm instead uses a specialized Conv2DWeightNorm layer and a default Adam optimizer (experimental work inspired by the official WDSR Tensorflow port). The current plan is to replace this layer with a default Conv2D layer and a Tensorflow weight normalization wrapper when the wrapper becomes officially available.
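Both approaches implement the same reparameterization from Salimans & Kingma (2016): a kernel is expressed as w = g · v / ||v||, decoupling its direction from its magnitude. An illustrative Tensorflow sketch, independent of either branch's actual code:

```python
import tensorflow as tf

def weight_normalized_kernel(v, g):
    # Norm over all axes except the output-channel axis of a conv kernel,
    # so each output filter gets its own learnable magnitude g.
    norm = tf.sqrt(tf.reduce_sum(tf.square(v), axis=[0, 1, 2], keepdims=True))
    return g * v / norm
```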
Other implementations

- Official PyTorch implementation
- Official Torch implementation
- Tensorflow implementation by Josh Miller
Limitations

Code in this project requires the Keras Tensorflow backend.