# Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

<img src = "https://miro.medium.com/max/1400/1*uk_zSjdLzTlR8XTeYbRIag.png">

### Contributions
- We propose a high-order degradation process to model practical degradations, and use sinc filters to model common ringing and overshoot artifacts.
- We employ several essential modifications (e.g., a U-Net discriminator with spectral normalization) to increase discriminator capability and stabilize the training dynamics.
- Real-ESRGAN, trained with pure synthetic data, achieves better visual performance than previous works.


https://en.wikipedia.org/wiki/Ringing_artifacts

## 3. Methodology

### 3.1. Classical Degradation Model
Blind SR aims to restore high-resolution images from low-resolution ones that suffer from unknown and complex degradations. The classical degradation model synthesizes the low-resolution input as follows:
1. The ground-truth image $\mathbf{y}$ is first convolved with a blur kernel $\mathbf{k}$.
2. A downsampling operation with scale factor $r$ is then performed.
3. The low-resolution image $\mathbf{x}$ is obtained by adding noise $\mathbf{n}$.
4. Finally, JPEG compression is applied, since it is widely used in real-world images.

$$ \mathbf{x} = \mathcal{D}(\mathbf{y}) = [(\mathbf{y} \circledast \mathbf{k})\downarrow_{r} + \mathbf{n}]_\mathrm{JPEG} $$
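The order of operations in this formula can be sketched on a 1-D signal. This is a minimal illustration, not the paper's implementation: the helper names are hypothetical, the kernel is assumed symmetric, and `quantize` is only a crude stand-in for JPEG compression.

```python
import random

def blur(signal, kernel):
    """Filter a 1-D signal with a symmetric kernel, replicating border samples."""
    t = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - t, 0), n - 1)  # clamp the index at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out

def downsample(signal, r):
    """Keep every r-th sample (scale factor r)."""
    return signal[::r]

def add_gaussian_noise(signal, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in signal]

def quantize(signal, step):
    """Assumption: a stand-in for JPEG that rounds values to multiples of `step`."""
    return [round(v / step) * step for v in signal]

def classical_degradation(y, kernel, r, sigma, step, rng):
    """x = [(y * k) downsampled by r + n], then the compression stand-in."""
    x = blur(y, kernel)
    x = downsample(x, r)
    x = add_gaussian_noise(x, sigma, rng)
    return quantize(x, step)

rng = random.Random(0)
y = [0.0] * 8 + [255.0] * 8          # a sharp step edge
x = classical_degradation(y, [0.25, 0.5, 0.25], r=4, sigma=2.0, step=8, rng=rng)
print(len(y), "->", len(x))
```

Blur is applied before downsampling, and noise is added after it, which matches the bracket structure of the formula.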

#### Blur.
- Blur degradation is modeled as a convolution with a linear blur filter (kernel).
- Isotropic and anisotropic Gaussian filters are common choices.

For a Gaussian blur kernel $\mathbf{k}$ with a kernel size of $2t + 1$, the element at position $(i,j)$ is
$$ \mathbf{k}(i,j) = \frac{1}{N} \exp(-\frac{1}{2} \mathbf{C}^T \mathbf{\Sigma}^{-1} \mathbf{C}), \qquad  \mathbf{C} = [i,j]^T  $$
where $i, j \in [-t,t]$ are the spatial coordinates, $\mathbf{\Sigma}$ is the covariance matrix, and $N$ is the normalization constant.


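$$\mathbf{\Sigma} = \begin{bmatrix} \cos\theta & - \sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} \sigma_1^2 & 0\\ 0 & \sigma_2^2 \end{bmatrix} \begin{bmatrix} \cos\theta &  \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} $$
where $\sigma_1$ and $\sigma_2$ are the standard deviations along the two principal axes and $\theta$ is the rotation angle.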

- When $\sigma_1 = \sigma_2$,  $\mathbf{k}$ is an isotropic Gaussian blur kernel; otherwise $\mathbf{k}$ is an anisotropic kernel.
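The kernel definition above can be turned into a short sketch. It assumes that `sigma1` and `sigma2` are standard deviations (so the diagonal matrix holds their squares) and that `theta` is in radians; the function name and the nested-list layout are illustrative choices.

```python
import math

def gaussian_kernel(t, sigma1, sigma2, theta):
    """Return a (2t+1) x (2t+1) Gaussian blur kernel as nested lists.

    kernel[i + t][j + t] holds k(i, j) for i, j in [-t, t].
    """
    c, s = math.cos(theta), math.sin(theta)
    # Sigma = R diag(sigma1^2, sigma2^2) R^T, hence
    # Sigma^{-1} = R diag(1/sigma1^2, 1/sigma2^2) R^T.
    a, b = 1.0 / sigma1 ** 2, 1.0 / sigma2 ** 2
    inv_xx = c * c * a + s * s * b
    inv_xy = c * s * (a - b)
    inv_yy = s * s * a + c * c * b

    kernel = []
    for i in range(-t, t + 1):
        row = []
        for j in range(-t, t + 1):
            # Quadratic form C^T Sigma^{-1} C with C = [i, j]^T.
            q = inv_xx * i * i + 2.0 * inv_xy * i * j + inv_yy * j * j
            row.append(math.exp(-0.5 * q))
        kernel.append(row)

    # Divide by N so that the kernel sums to one.
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

isotropic = gaussian_kernel(t=2, sigma1=1.5, sigma2=1.5, theta=0.0)
anisotropic = gaussian_kernel(t=2, sigma1=3.0, sigma2=0.5, theta=math.pi / 4)
print(len(isotropic), "x", len(isotropic[0]))
```

With equal standard deviations the rotation has no effect, which is why `theta` only matters in the anisotropic case.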

https://datacarpentry.org/image-processing/06-blurring/

#### Noise.
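Two noise types are commonly used:
1. Additive Gaussian noise (sampled from a Gaussian distribution)
    - Color noise: independently sampled noise is added to each RGB channel.
    - Gray noise: the same sampled noise is applied to all three channels.
2. Poisson noise (follows a Poisson distribution)
    - It is usually used to approximately model the **sensor noise** caused by statistical quantum fluctuations; its intensity is proportional to the image intensity.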
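The difference between color noise, gray noise, and Poisson noise can be sketched with the standard library alone. The pixel layout (a list of `[R, G, B]` lists), the `scale` parameter, and the use of Knuth's multiplication method for Poisson sampling are assumptions made to keep the example self-contained.

```python
import math
import random

def add_gaussian_noise(image, sigma, rng, gray=False):
    """Add Gaussian noise to a list of [R, G, B] pixels."""
    out = []
    for pixel in image:
        if gray:
            sample = rng.gauss(0.0, sigma)
            noise = [sample] * len(pixel)      # gray noise: one sample for all channels
        else:
            noise = [rng.gauss(0.0, sigma) for _ in pixel]  # color noise: one per channel
        out.append([p + d for p, d in zip(pixel, noise)])
    return out

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (suitable for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def add_poisson_noise(image, scale, rng):
    """Replace each value v by a Poisson draw with mean v * scale, rescaled back.

    `scale` is a hypothetical knob: smaller values give stronger noise.
    """
    return [[poisson_sample(v * scale, rng) / scale for v in pixel] for pixel in image]

rng = random.Random(0)
image = [[10.0, 20.0, 30.0], [200.0, 100.0, 0.0]]
print(add_gaussian_noise(image, 5.0, rng))
print(add_gaussian_noise(image, 5.0, rng, gray=True))
print(add_poisson_noise(image, 0.5, rng))
```

Because the Poisson mean follows the pixel value, bright pixels receive larger fluctuations than dark ones, and a zero-valued pixel stays at zero.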

#### Resize (Downsampling)
- We consider both downsampling and upsampling, i.e., the resize operation.
- There are several resize algorithms:
    - nearest-neighbor interpolation
    - area resize
    - bilinear interpolation
    - bicubic interpolation

- We exclude nearest-neighbor interpolation since it introduces a misalignment issue, and randomly choose among the other three.
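As a rough illustration of two of these resize modes, the sketch below downsamples a grayscale image (nested lists) by an integer factor. It assumes that area resize with an integer factor reduces to block averaging, and that the nearest-neighbor variant picks the top-left pixel of each block; real libraries may use different sampling conventions.

```python
def area_downsample(image, r):
    """Average every r x r block of a 2-D grayscale image (integer factor r)."""
    h, w = len(image) // r, len(image[0]) // r
    return [
        [
            sum(image[y * r + dy][x * r + dx] for dy in range(r) for dx in range(r)) / (r * r)
            for x in range(w)
        ]
        for y in range(h)
    ]

def nearest_downsample(image, r):
    """Keep the top-left pixel of every r x r block.

    The sampled position is off-center within each block, which shifts the
    result relative to the block average. This is one illustration of a
    spatial misalignment, not necessarily the exact issue the paper means.
    """
    return [row[::r] for row in image[::r]]

image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(area_downsample(image, 2))
print(nearest_downsample(image, 2))
```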

#### JPEG compression. 
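- To be supplemented later.
- JPEG first converts images into the YCbCr color space and downsamples the chroma channels.
- Images are then split into 8 × 8 blocks and each block is transformed with a two-dimensional discrete cosine transform (DCT), followed by quantization of the DCT coefficients.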
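The block-splitting and DCT step can be sketched as follows. This is a direct, unoptimized orthonormal DCT-II on nested lists; it assumes the image sides are multiples of 8 and leaves out the other parts of JPEG (color conversion, level shift, quantization, entropy coding).

```python
import math

N = 8  # JPEG block size

def split_blocks(image):
    """Yield (row, col, block) for every 8 x 8 block of a 2-D grayscale image."""
    h, w = len(image), len(image[0])
    for r in range(0, h, N):
        for c in range(0, w, N):
            yield r, c, [row[c:c + N] for row in image[r:r + N]]

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

# A flat 16 x 16 image: each block should have energy only in its DC coefficient.
image = [[16.0] * 16 for _ in range(16)]
for r, c, block in split_blocks(image):
    coeffs = dct_2d(block)
    print((r, c), round(coeffs[0][0], 6))
```

For a constant block, all coefficients except the top-left (DC) one vanish, which is why smooth regions compress well after quantization.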

### 3.2. High-order Degradation Model

- When we adopt the above classical degradation model to synthesize training pairs, the trained model can indeed handle some real samples.

- However, it still cannot resolve some complicated degradations in the real world, especially unknown noise and complex artifacts.

- The classical degradation model only includes a fixed number of basic degradations, which can be regarded as first-order modeling.

- Real-life degradation processes are quite diverse. For example, consider a low-quality image downloaded from the Internet:
    - The original image might be taken with a cellphone.
    - The image was then edited with sharpening and resize operations.
    - It was uploaded to some social media applications, which introduce further compression.

- Thus, we propose a high-order degradation model: an $n$-order model applies $n$ repeated degradation processes, each following the classical degradation model with different hyper-parameters.

$$\mathbf{x}  = \mathcal{D}^n(\mathbf{y}) = (\mathcal{D}_n \circ \cdots \circ \mathcal{D}_2 \circ \mathcal{D}_1) (\mathbf{y})$$

<img src = "https://miro.medium.com/max/1400/1*Evfha3EKdZKK5SzlRHr2Gw.png">
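The composition $\mathcal{D}_n \circ \cdots \circ \mathcal{D}_1$ is ordinary function composition, which can be sketched as below. The toy first-order degradation (a box blur followed by subsampling on a 1-D signal) and all names are hypothetical; the point is only that each stage has the same form but its own hyper-parameters.

```python
from functools import reduce

def high_order(degradations):
    """Return D_n ∘ ... ∘ D_1 for a list [D_1, ..., D_n] of callables."""
    def apply(y):
        # reduce feeds the output of each stage into the next, D_1 first.
        return reduce(lambda x, d: d(x), degradations, y)
    return apply

def make_first_order(blur_width, r):
    """Toy first-order degradation: box blur, then keep every r-th sample."""
    def degrade(signal):
        n = len(signal)
        half = blur_width // 2
        blurred = []
        for i in range(n):
            window = signal[max(i - half, 0):min(i + half + 1, n)]
            blurred.append(sum(window) / len(window))
        return blurred[::r]
    return degrade

# Second-order model: same procedure twice, different hyper-parameters.
second_order = high_order([make_first_order(3, 2), make_first_order(5, 2)])
y = [float(v) for v in range(16)]
x = second_order(y)
print(len(y), "->", len(x))
```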

### 3.3. Ringing and overshoot artifacts

<img src = "https://miro.medium.com/max/1400/1*pYOD-TjDsGMG2lHJwuLrkQ.png">

- Needs to be supplemented.
- Top: Real samples suffering from ringing and overshoot artifacts.
- Bottom: Examples of sinc kernels (kernel size 21).
- Ringing artifacts often appear near edges as bands or ghost-like shapes that seem to escape from the object. Overshoot artifacts are usually produced in combination with ringing artifacts.

### 3.4. Networks and Training
#### ESRGAN generator
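- We extend the original ×4 ESRGAN architecture to perform super-resolution with scale factors of ×2 and ×1.
- As ESRGAN is a heavy network, we first employ pixel-unshuffle (the inverse operation of pixel-shuffle) to reduce the spatial size and enlarge the channel size before feeding inputs into the main ESRGAN architecture.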
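Pixel-unshuffle moves each s × s spatial block into the channel dimension, so a `[C][H][W]` input becomes `[C*s*s][H/s][W/s]`. The sketch below works on nested lists and assumes the channel ordering `c*s*s + i*s + j` used by common framework implementations; it is an illustration, not the network code.

```python
def pixel_unshuffle(x, s):
    """Rearrange a [C][H][W] nested list into [C*s*s][H//s][W//s]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % s or w % s:
        raise ValueError("height and width must be divisible by s")
    out = []
    for ch in range(c):
        for i in range(s):          # row offset inside each s x s block
            for j in range(s):      # column offset inside each s x s block
                out.append([
                    [x[ch][row * s + i][col * s + j] for col in range(w // s)]
                    for row in range(h // s)
                ])
    return out

x = [[
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]]
y = pixel_unshuffle(x, 2)
print(len(y), len(y[0]), len(y[0][0]))
```

No values are discarded: the spatial size shrinks by a factor of s in each dimension while the channel count grows by s², so most of the computation then runs at the lower resolution.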

<img src="https://velog.velcdn.com/images%2Fheaseo%2Fpost%2Fbc601e9b-96e4-4801-839f-979d9ddceada%2FESRGAN%20network.png">

#### U-Net discriminator with spectral normalization (SN).
- Specifically, the discriminator in Real-ESRGAN requires greater discriminative power for complex training outputs.
- The U-Net outputs a realness value for each pixel, and can provide detailed per-pixel feedback to the generator.
- We employ spectral normalization regularization [37] to stabilize the training dynamics.
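The core idea of spectral normalization is to divide a weight matrix by its largest singular value, which is commonly estimated with power iteration. The sketch below shows that idea on a plain nested-list matrix; the function names and the iteration count are illustrative, and deep-learning frameworks ship their own versions (for example `torch.nn.utils.spectral_norm` in PyTorch).

```python
import math

def spectral_norm(W, iters=50):
    """Estimate the largest singular value of W (rows x cols) by power iteration."""
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols           # unit-length starting vector
    sigma = 0.0
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = W v
        sigma = math.sqrt(sum(a * a for a in u))
        if sigma == 0.0:
            return 0.0
        u = [a / sigma for a in u]
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        norm_v = math.sqrt(sum(a * a for a in v))
        v = [a / norm_v for a in v]
    return sigma

def spectral_normalize(W, iters=50):
    """Return W / sigma_max(W), so the result has spectral norm close to 1."""
    sigma = spectral_norm(W, iters)
    if sigma == 0.0:
        return [row[:] for row in W]
    return [[w / sigma for w in row] for row in W]

W = [[3.0, 0.0], [0.0, 1.0]]
print(round(spectral_norm(W), 6))
print(spectral_normalize(W))
```

Bounding the spectral norm of each layer limits how sharply the discriminator output can change, which is the stabilizing effect the notes refer to.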

<img src = "https://miro.medium.com/max/940/1*yR86ulSzVJph3YwWIMw9mw.png">


## 4. Experiments
### 4.1. Datasets and Implementation
### 4.2. Comparisons with Prior Works

<img src ="https://miro.medium.com/max/1400/1*uVjrZKKHQ11xJ7fcm9LFyg.png">

### 4.3. Ablation Studies

#### Second-order degradation model.
#### sinc filters.
#### U-Net discriminator with SN regularization.
#### More complicated blur kernels.

### 4.4. Limitations

<img src ="https://miro.medium.com/max/1400/1*4lUdb2I1TP-PmTegBWk59A.png">