
issue about x2 SR training process #2

Closed · wwlCape opened this issue Apr 11, 2022 · 5 comments

wwlCape commented Apr 11, 2022

[image attached]
Hi, when I train the model under the setting1_x2 configuration, I cannot obtain a good result. Maybe I missed something in my training. Could you give me some advice?

#### general settings
name: DCLSx2_setting1
use_tb_logger: true
model: blind
distortion: sr
scale: 2
gpu_ids: [0, 1, 2, 3]
pca_matrix_path: ../../../pca_matrix/DCLS/pca_matrix.pth

degradation:
  random_kernel: True
  ksize: 21
  code_length: 10
  sig_min: 0.2
  sig_max: 2.0
  rate_iso: 1.0
  random_disturb: false

#### datasets
datasets:
  train:
    name: DIV2K
    mode: GT
    dataroot_GT: /datasets/DF2K/HR/x2HR.lmdb

    use_shuffle: true
    n_workers: 4  # per GPU
    batch_size: 64
    GT_size: 128
    LR_size: 64
    use_flip: true
    use_rot: true
    color: RGB
  val:
    name: Set5
    mode: LQGT
    dataroot_GT: /datasets/Set5/x2HR.lmdb
    dataroot_LQ: /datasets/Set5/x2LRblur.lmdb

#### network structures
network_G:
  which_model_G: DCLS
  setting:
    nf: 64
    nb: 10
    ng: 5
    input_para: 256
    kernel_size: 21

#### path
path:
  pretrain_model_G: ~
  strict_load: true
  resume_state: ~

#### training settings: learning rate scheme, loss
train:
  lr_G: !!float 4e-4
  lr_E: !!float 4e-4
  lr_scheme: MultiStepLR
  beta1: 0.9
  beta2: 0.99
  niter: 500000
  warmup_iter: -1  # no warm up
  lr_steps: [200000, 400000]
  lr_gamma: 0.5
  eta_min: !!float 1e-7

  pixel_criterion: l1
  pixel_weight: 1.0

  manual_seed: 0
  val_freq: !!float 100

#### logger
logger:
  print_freq: 20
  save_checkpoint_freq: !!float 1000

These are my config settings. Thank you in advance for your reply!

Algolzw (Collaborator) commented Apr 11, 2022

We also noticed that the training is unstable and often collapses after several epochs. To address this, the common solution is to resume from the last normal training state when the PSNR collapses. Alternatively, you can write a shell script that automatically restarts training from the previous state (a sketch of such a wrapper follows below).
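
For illustration, here is a minimal watchdog sketch in Python. The train.py entry point, -opt flag, and option path are assumptions modeled on common BasicSR-style repositories, not a confirmed DCLS command line; resuming itself would still rely on the resume_state field in the config's path: section.

# Hedged sketch of an auto-restart wrapper; not part of the DCLS repository.
# Assumption: the trainer exits with a nonzero code when a run dies (e.g.,
# it is stopped after a PSNR collapse), and it picks up the latest
# checkpoint via `path.resume_state` in the YAML config.
import subprocess
import sys
import time

CMD = [sys.executable, "train.py", "-opt", "options/setting1/train_setting1_x2.yml"]

while True:
    ret = subprocess.call(CMD)
    if ret == 0:
        break  # training finished normally
    print(f"Training exited with code {ret}; restarting from the last state...")
    time.sleep(10)  # brief pause before relaunching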

This phenomenon also occurs in DAN. Maybe it is an inherent drawback of the outer framework.

Sorry for this problem, and we will try to solve it in the next version.

wwlCape (Author) commented Apr 11, 2022

Thank you for your reply! Could you please send a training log file to my email? I want to check whether my training process is normal. Thank you very much! (Email address: wenwlmail@163.com)

Algolzw (Collaborator) commented Apr 13, 2022

The problem has been solved, so I am closing this issue.

Algolzw closed this as completed on Apr 13, 2022.
RC-Qiao commented Sep 6, 2022

May I ask why it takes 28,800 iterations to reach only two epochs? The training set has only 3,500 images in total; with a batch size of 64, wouldn't fewer than 100 iterations already complete one epoch?

Algolzw (Collaborator) commented Sep 6, 2022

Since each batch consists of cropped patches rather than whole images, the whole dataset is multiplied by 10 to count as one epoch.
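
As a rough illustration of that bookkeeping, a wrapper along these lines would produce the behavior described. The class and parameter names here are made up for the example; the actual DCLS dataset code may differ.

# Illustrative sketch only; the real DCLS dataset class may differ.
from torch.utils.data import Dataset

class RepeatedCropDataset(Dataset):
    # Wraps a dataset of N source images so that one "epoch" visits each
    # image `repeat` times, since every access returns a fresh random crop.
    def __init__(self, base, repeat=10):
        self.base = base
        self.repeat = repeat

    def __len__(self):
        # The reported length is inflated by `repeat`, so the epoch counter
        # advances 10x more slowly in terms of unique images.
        return len(self.base) * self.repeat

    def __getitem__(self, idx):
        # Map the inflated index back onto the underlying images.
        return self.base[idx % len(self.base)]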
