CIFAR-10 Reproduction #13

Closed
HieuPhan33 opened this issue Sep 2, 2021 · 6 comments

Comments

@HieuPhan33

Hi Shaojie,
I could not reproduce the result for MDEQ on CIFAR-10 image classification.

I only obtained 91.56% using MDEQ_large.
I'm using the same parameters as in cls_mdeq_LARGE_reg.yaml, except for the batch size and the number of GPUs.
The batch size per GPU is 512, with 2 GPUs.
I'm using 2 RTX 3090 graphics cards.

Hope that you can give me some advice.

Thanks Shaojie.

@jerrybai1995
Member

Hi @HieuPhan33 ,

Are you using an equivalent batch size of 1024? Could you try a smaller batch size like the default one (I usually use ~100, and found this to be important)?

In addition, when I reproduced the result, I also sometimes (but generally rarely) get <93%, which is part of the fluctuation. If you still encounter the issue, you can also reach me via email and I can send you a sample training log for you to compare... I believe you should expect ~92% after 100 epochs already.
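(For reference, a minimal sketch of the relevant yaml keys, using the key names from the config quoted later in this thread; the value below is only illustrative of the ~100 range. With 2 GPUs the effective batch size is 2 × BATCH_SIZE_PER_GPU, so 512 per GPU corresponds to 1024 overall.)

TRAIN:
  BATCH_SIZE_PER_GPU: 96   # effective batch size = this value × number of GPUs in GPUS
TEST:
  BATCH_SIZE_PER_GPU: 96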

@HieuPhan33
Author

Hi @jerrybai1995, thanks for the quick response. I will reduce the batch size and keep you updated.
Thumbs up.

@HieuPhan33
Author

Hi, I achieved 92.30% when using a batch size of 128.
Do you have any advice for pushing the accuracy further, to the expected ~93%?

@jerrybai1995
Member

jerrybai1995 commented Sep 3, 2021

Hmmm, 92.3% still sounds too low to me for the given default parameters (my logs are usually in the range 92.6% - 93.4%). Could you try increasing f_thres (e.g., 9) and b_thres (e.g., 8 or 9) in the yaml file and using the default batch size? I also think that increasing the momentum (e.g., to 0.99) would improve the performance but I believe you should be able to reproduce the ~93% level performance even without tuning these things.
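(A sketch of the corresponding yaml keys, with the key names taken from the config below; the values here are just the suggestions from this comment, not necessarily the ones used in the final run.)

DEQ:
  F_THRES: 9       # forward solver threshold, bumped up from the default
  B_THRES: 9       # backward solver threshold (8 or 9 suggested)
TRAIN:
  MOMENTUM: 0.99   # increased momentum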

I'll look into this but in case you might find it useful, feel free to contact me (shaojieb@cs.cmu.edu) and I'll send you some training logs.

@jerrybai1995
Member

Hi @HieuPhan33 ,

I was able to produce 93.04% and 92.78% on two (slightly different and) independent runs, basically with the modifications/settings mentioned above. E.g., I got 93.04% from the following yaml:

GPUS: (0,)
LOG_DIR: 'log/'
DATA_DIR: ''
OUTPUT_DIR: 'output/'
WORKERS: 2
PRINT_FREQ: 100

MODEL: 
  NAME: mdeq
  NUM_LAYERS: 8
  NUM_CLASSES: 10
  NUM_GROUPS: 8
  DROPOUT: 0.22
  WNORM: true
  DOWNSAMPLE_TIMES: 0
  EXPANSION_FACTOR: 5
  POST_GN_AFFINE: false
  IMAGE_SIZE: 
    - 32
    - 32
  EXTRA:
    FULL_STAGE:
      NUM_MODULES: 1
      NUM_BRANCHES: 4
      BLOCK: BASIC
      BIG_KERNELS:
      - 0
      - 0
      - 0
      - 0
      HEAD_CHANNELS:
      - 14
      - 28
      - 56
      - 112
      FINAL_CHANSIZE: 1680
      NUM_BLOCKS:
      - 1
      - 1
      - 1
      - 1
      NUM_CHANNELS:
      - 32
      - 64
      - 128
      - 256
      FUSE_METHOD: SUM
DEQ:
  F_SOLVER: 'broyden'
  B_SOLVER: 'broyden'
  STOP_MODE: 'rel'
  F_THRES: 8
  B_THRES: 7
  RAND_F_THRES_DELTA: 1
  SPECTRAL_RADIUS_MODE: false
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
LOSS:
  JAC_LOSS_FREQ: 0.02
  JAC_LOSS_WEIGHT: 0.4
  PRETRAIN_JAC_LOSS_WEIGHT: 0.0
  JAC_STOP_EPOCH: 90
DATASET:
  DATASET: 'cifar10'
  DATA_FORMAT: 'jpg'
  ROOT: 'data/cifar10/'
  TEST_SET: 'val'
  TRAIN_SET: 'train'
TEST:
  BATCH_SIZE_PER_GPU: 96
  MODEL_FILE: ''
TRAIN:
  BATCH_SIZE_PER_GPU: 96
  BEGIN_EPOCH: 0
  END_EPOCH: 220
  RESUME: false
  LR_SCHEDULER: 'cosine'
  PRETRAIN_STEPS: 12000
  LR_FACTOR: 0.1
  LR_STEP:
  - 30
  - 60
  - 90
  OPTIMIZER: adam
  LR: 0.001
  WD: 0.0
  MOMENTUM: 0.99
  NESTEROV: true
  SHUFFLE: true
DEBUG:
  DEBUG: false

Hope this helps!

@HieuPhan33
Author

Thanks Shaojie, really appreciate your help!
