
issue about x2 SR training process #2

Closed · wwlCape opened this issue Apr 11, 2022 · 5 comments

wwlCape commented Apr 11, 2022

[image attached]
Hi, when I train the model under the setting1_x2 configuration, I cannot obtain a good result. Maybe I missed something in my training. Could you give me some advice?

#### general settings
name: DCLSx2_setting1
use_tb_logger: true
model: blind
distortion: sr
scale: 2
gpu_ids: [0, 1, 2, 3]
pca_matrix_path: ../../../pca_matrix/DCLS/pca_matrix.pth

degradation:
  random_kernel: True
  ksize: 21
  code_length: 10
  sig_min: 0.2
  sig_max: 2.0
  rate_iso: 1.0
  random_disturb: false

#### datasets
datasets:
  train:
    name: DIV2K
    mode: GT
    dataroot_GT: /datasets/DF2K/HR/x2HR.lmdb

    use_shuffle: true
    n_workers: 4  # per GPU
    batch_size: 64
    GT_size: 128
    LR_size: 64
    use_flip: true
    use_rot: true
    color: RGB
  val:
    name: Set5
    mode: LQGT
    dataroot_GT: /datasets/Set5/x2HR.lmdb
    dataroot_LQ: /datasets/Set5/x2LRblur.lmdb

#### network structures
network_G:
  which_model_G: DCLS
  setting:
    nf: 64
    nb: 10
    ng: 5
    input_para: 256
    kernel_size: 21

#### path
path:
  pretrain_model_G: ~
  strict_load: true
  resume_state: ~

#### training settings: learning rate scheme, loss
train:
  lr_G: !!float 4e-4
  lr_E: !!float 4e-4
  lr_scheme: MultiStepLR
  beta1: 0.9
  beta2: 0.99
  niter: 500000
  warmup_iter: -1  # no warm up
  lr_steps: [200000, 400000]
  lr_gamma: 0.5
  eta_min: !!float 1e-7

  pixel_criterion: l1
  pixel_weight: 1.0

  manual_seed: 0
  val_freq: !!float 100

#### logger
logger:
  print_freq: 20
  save_checkpoint_freq: !!float 1000

These are my config settings. Thank you in advance for your reply!

Algolzw (Collaborator) commented Apr 11, 2022

We also noticed that the training is unstable and often collapses after several epochs. To address this, the common solution is to resume from the last normal training state when the PSNR collapses. Alternatively, you can write a shell script that automatically restarts training from the previous state (a sketch of such a wrapper follows below).
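
For illustration, here is a minimal watchdog sketch in Python. The train.py entry point, -opt flag, and option path are assumptions modeled on common BasicSR-style repositories, not a confirmed DCLS command line; resuming itself would still rely on the resume_state field in the config's path: section.

# Hedged sketch of an auto-restart wrapper; not part of the DCLS repository.
# Assumption: the trainer exits with a nonzero code when a run dies (e.g.,
# it is stopped after a PSNR collapse), and it picks up the latest
# checkpoint via `path.resume_state` in the YAML config.
import subprocess
import sys
import time

CMD = [sys.executable, "train.py", "-opt", "options/setting1/train_setting1_x2.yml"]

while True:
    ret = subprocess.call(CMD)
    if ret == 0:
        break  # training finished normally
    print(f"Training exited with code {ret}; restarting from the last state...")
    time.sleep(10)  # brief pause before relaunching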

This phenomenon also occurs in DAN. Maybe it is an inherent drawback of the outer framework.

Sorry for this problem, and we will try to solve it in the next version.

wwlCape (Author) commented Apr 11, 2022

Thank you for your reply! Could you please send a training log file to my email? I want to check whether my training process is normal. Thank you very much! (Email address: wenwlmail@163.com)

Algolzw (Collaborator) commented Apr 13, 2022

The problem has been solved, so I am closing this issue.

Algolzw closed this as completed on Apr 13, 2022.
RC-Qiao commented Sep 6, 2022

May I ask why it takes 28,800 iterations to reach only two epochs? The training set has only 3,500 images in total; with a batch size of 64, wouldn't fewer than 100 iterations already complete one epoch?

Algolzw (Collaborator) commented Sep 6, 2022

Since each batch consists of cropped patches rather than whole images, the whole dataset is multiplied by 10 to count as one epoch.
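
As a rough illustration of that bookkeeping, a wrapper along these lines would produce the behavior described. The class and parameter names here are made up for the example; the actual DCLS dataset code may differ.

# Illustrative sketch only; the real DCLS dataset class may differ.
from torch.utils.data import Dataset

class RepeatedCropDataset(Dataset):
    # Wraps a dataset of N source images so that one "epoch" visits each
    # image `repeat` times, since every access returns a fresh random crop.
    def __init__(self, base, repeat=10):
        self.base = base
        self.repeat = repeat

    def __len__(self):
        # The reported length is inflated by `repeat`, so the epoch counter
        # advances 10x more slowly in terms of unique images.
        return len(self.base) * self.repeat

    def __getitem__(self, idx):
        # Map the inflated index back onto the underlying images.
        return self.base[idx % len(self.base)]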
